Conference papers

MWU-aware Part-of-Speech Tagging with a CRF model and lexical resources

Abstract : This paper describes a new part-of-speech tagger that also identifies multiword units (MWUs). It is based on a Conditional Random Field (CRF) model that integrates language-independent features as well as features computed from external lexical resources. It is implemented in a finite-state framework consisting of a preliminary finite-state lexical analysis followed by CRF decoding using weighted finite-state transducer composition. We show that our tagger reaches state-of-the-art results for French under the standard evaluation conditions (i.e. each multiword unit is already merged into a single token). The evaluation of the tagger with integrated MWU recognition clearly shows the benefit of incorporating features based on MWU resources.
Document type :
Conference papers

Cited literature : 31 references

https://hal-upec-upem.archives-ouvertes.fr/hal-00621585
Contributor : Matthieu Constant
Submitted on : Wednesday, September 11, 2013 - 9:23:47 PM
Last modification on : Wednesday, February 26, 2020 - 7:06:05 PM
Long-term archiving on : Thursday, March 30, 2017 - 2:43:44 PM

File

constant_sigogne.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00621585, version 1

Citation

Matthieu Constant, Anthony Sigogne. MWU-aware Part-of-Speech Tagging with a CRF model and lexical resources. ACL Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE'11), 2011, Portland, Oregon, United States. pp.49-56. ⟨hal-00621585⟩
