MWU-aware Part-of-Speech Tagging with a CRF model and lexical resources
Conference paper, 2011

Mathieu Constant
Anthony Sigogne
  • Function: Author
  • PersonId: 764797
  • IdRef: 167754998

Abstract

This paper describes a new part-of-speech tagger that includes multiword unit (MWU) identification. It is based on a Conditional Random Field (CRF) model integrating language-independent features as well as features computed from external lexical resources. It is implemented in a finite-state framework consisting of a preliminary finite-state lexical analysis followed by CRF decoding using weighted finite-state transducer composition. We show that our tagger reaches state-of-the-art results for French under standard evaluation conditions (i.e., each multiword unit is already merged into a single token). The evaluation of the tagger integrating MWU recognition clearly shows the benefit of incorporating features based on MWU resources.
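The pipeline summarized above (a lexical analysis against external resources, then CRF decoding) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the toy lexicons `SIMPLE_LEXICON` and `MWU_LEXICON`, the feature names, the hand-set weights `W` (a real CRF learns these from training data), and the assumed B/I tag scheme that folds MWU segmentation into the POS tags. It also swaps in a plain Viterbi pass over a token sequence for the weighted finite-state transducer composition the abstract describes, so it shows only the general shape of linear-chain CRF decoding with lexicon-derived features, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy resources standing in for external lexicons.
SIMPLE_LEXICON: Dict[str, Set[str]] = {
    "la": {"DET", "PRO"}, "pomme": {"NC"}, "de": {"P", "DET"},
    "terre": {"NC"}, "cuit": {"V", "ADJ"},
}
MWU_LEXICON: Dict[Tuple[str, ...], str] = {("pomme", "de", "terre"): "NC"}

POS = ["DET", "PRO", "NC", "P", "V", "ADJ"]
# Assumed MWU-aware tag set: each POS has a B (starts a unit) and I (continues it) variant.
TAGS = [f"{p}+{x}" for p in POS for x in ("B", "I")]

# Hand-set weights standing in for trained CRF parameters: (feature, tag) -> weight.
W: Dict[Tuple[str, str], float] = {("w=la", "DET+B"): 0.5, ("w=cuit", "V+B"): 0.5}
for p in POS:
    W[(f"lex={p}", f"{p}+B")] = 1.0    # simple lexicon allows POS p for this word
    W[(f"mwu={p}+B", f"{p}+B")] = 2.0  # token starts a lexicon MWU of category p
    W[(f"mwu={p}+I", f"{p}+I")] = 3.0  # token continues a lexicon MWU of category p


def mwu_marks(words: List[str]) -> List[str]:
    """Lexical analysis: left-to-right longest-match lookup of MWU candidates."""
    marks = ["O"] * len(words)
    max_len = max(len(k) for k in MWU_LEXICON)
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 1, -1):
            cand = tuple(w.lower() for w in words[i:i + n])
            if cand in MWU_LEXICON:
                cat = MWU_LEXICON[cand]
                marks[i:i + n] = [f"{cat}+B"] + [f"{cat}+I"] * (n - 1)
                i += n
                break
        else:
            i += 1  # no MWU candidate starts at position i
    return marks


def features(words: List[str], i: int, marks: List[str]) -> List[str]:
    """Word-level features plus features read off the lexical resources."""
    w = words[i].lower()
    feats = [f"w={w}", f"suf3={w[-3:]}", f"mwu={marks[i]}"]
    feats += [f"lex={t}" for t in sorted(SIMPLE_LEXICON.get(w, {"UNK"}))]
    return feats


def transition(prev: Optional[str], cur: str) -> float:
    """Hard constraint: an I tag must continue a unit of the same POS."""
    pos, bi = cur.split("+")
    if bi == "I" and (prev is None or prev.split("+")[0] != pos):
        return -math.inf
    return 0.0


def viterbi(words: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under the linear-chain model."""
    if not words:
        return []
    marks = mwu_marks(words)

    def emit(i: int, tag: str) -> float:
        return sum(W.get((f, tag), 0.0) for f in features(words, i, marks))

    # delta[t]: best score of any tag path ending in tag t at the current position
    delta = {t: transition(None, t) + emit(0, t) for t in TAGS}
    backptrs: List[Dict[str, str]] = []
    for i in range(1, len(words)):
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: delta[p] + transition(p, t))
            new_delta[t] = delta[best] + transition(best, t) + emit(i, t)
            ptr[t] = best
        delta = new_delta
        backptrs.append(ptr)
    path = [max(TAGS, key=lambda t: delta[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def segment(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge each B tag and its following I tags into one (unit, POS) pair."""
    units: List[Tuple[str, str]] = []
    for w, t in zip(words, tags):
        pos, bi = t.split("+")
        if bi == "I" and units:
            units[-1] = (units[-1][0] + " " + w, pos)
        else:
            units.append((w, pos))
    return units


sentence = ["La", "pomme", "de", "terre", "cuit"]
print(segment(sentence, viterbi(sentence)))
# [('La', 'DET'), ('pomme de terre', 'NC'), ('cuit', 'V')]
```

On this toy French sentence, the MWU-lexicon feature makes "pomme de terre" score higher as a single NC unit than as three separate tokens, and the transition constraint rules out any I tag that does not continue a unit of the same category. Folding segmentation into the tag set in this way lets a single decoding pass choose both the MWU boundaries and the POS tags.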
Main file
constant_sigogne.pdf (93.63 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00621585, version 1 (11-09-2013)

Identifiers

  • HAL Id: hal-00621585, version 1

Cite

Mathieu Constant, Anthony Sigogne. MWU-aware Part-of-Speech Tagging with a CRF model and lexical resources. ACL Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE'11), 2011, Portland, Oregon, United States. pp. 49-56. ⟨hal-00621585⟩