Conference papers

MWU-aware Part-of-Speech Tagging with a CRF model and lexical resources

Abstract : This paper describes a new part-of-speech tagger that also identifies multiword units (MWUs). It is based on a Conditional Random Field (CRF) model that integrates language-independent features as well as features computed from external lexical resources. It is implemented in a finite-state framework consisting of a preliminary finite-state lexical analysis followed by CRF decoding using weighted finite-state transducer composition. We show that our tagger reaches state-of-the-art results for French under the standard evaluation conditions (i.e., each multiword unit is already merged into a single token). The evaluation of the tagger with MWU recognition integrated clearly shows the benefit of incorporating features based on MWU resources.
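
As an illustration of the idea summarized in the abstract, the following Python sketch shows one way MWU identification could be folded into a sequence-labeling task: each token receives a label combining a part-of-speech tag with a begin/inside marker, and a lookup in an MWU lexicon supplies an extra per-token feature. The `POS+B` / `POS+I` label scheme, the function names `encode_segments` and `lexicon_features`, the toy lexicon, and the greedy longest-match lookup are all assumptions made for this example and are not taken from this record. The finite-state lexical analysis and the CRF itself are not reproduced here.

```python
from typing import Dict, List, Tuple

Segment = Tuple[List[str], str]          # (tokens of one lexical unit, its POS tag)
Lexicon = Dict[Tuple[str, ...], str]     # lowercased MWU tokens -> POS tag


def encode_segments(segments: List[Segment]) -> Tuple[List[str], List[str]]:
    """Flatten lexical units into per-token labels (hypothetical scheme).

    The first token of a unit is labeled POS+B, the following ones POS+I.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for words, pos in segments:
        for i, word in enumerate(words):
            tokens.append(word)
            labels.append(f"{pos}+{'B' if i == 0 else 'I'}")
    return tokens, labels


def lexicon_features(tokens: List[str], mwu_lexicon: Lexicon,
                     max_len: int = 4) -> List[Dict[str, str]]:
    """Build one feature dict per token, with a lexicon-based MWU feature.

    Scans left to right and keeps the longest lexicon match starting at
    each position (a simplifying assumption for this sketch).
    """
    feats = [{"word": t.lower(), "in_mwu": "O"} for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first; single tokens are not MWUs.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in mwu_lexicon:
                pos = mwu_lexicon[key]
                feats[i]["in_mwu"] = f"{pos}+B"
                for j in range(i + 1, i + n):
                    feats[j]["in_mwu"] = f"{pos}+I"
                matched = n
                break
        i += matched if matched else 1
    return feats


if __name__ == "__main__":
    segments = [(["Il"], "CL"), (["mange"], "V"), (["une"], "DET"),
                (["pomme", "de", "terre"], "N")]
    toy_lexicon = {("pomme", "de", "terre"): "N"}
    tokens, labels = encode_segments(segments)
    for tok, lab, f in zip(tokens, labels, lexicon_features(tokens, toy_lexicon)):
        print(tok, lab, f["in_mwu"])
```

In a setup like this, the feature dicts and labels would be passed to a CRF trainer of the reader's choice. The lexicon feature only suggests a segmentation, and the model remains free to disagree with it.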
Document type : Conference papers

Cited literature [31 references]

https://hal-upec-upem.archives-ouvertes.fr/hal-00621585
Contributor : Matthieu Constant
Submitted on : Wednesday, September 11, 2013 - 9:23:47 PM
Last modification on : Tuesday, October 19, 2021 - 11:26:00 AM
Long-term archiving on : Thursday, March 30, 2017 - 2:43:44 PM

File

constant_sigogne.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00621585, version 1

Citation

Mathieu Constant, Anthony Sigogne. MWU-aware Part-of-Speech Tagging with a CRF model and lexical resources. ACL Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE'11), 2011, Portland, Oregon, United States. pp.49-56. ⟨hal-00621585⟩
