Combining Compound Recognition and PCFG-LA Parsing with Word Lattices and Conditional Random Fields

Abstract: Integrating compounds into a parsing procedure has been shown to improve accuracy in an artificial setting where such expressions have been perfectly pre-identified. This article evaluates two empirical strategies for incorporating such multiword units in a realistic PCFG-LA parsing setting: (1) using a grammar that includes compound recognition, by means of specialized annotation schemes for compounds; (2) using a state-of-the-art discriminative compound pre-recognizer that integrates endogenous and exogenous features. We show how these two strategies can be combined with word lattices representing the possible lexical analyses generated by the recognizer. The proposed systems display significant gains in multiword recognition and, often, in standard parsing accuracy. Moreover, an oracle analysis shows that this combined strategy opens promising new research directions.
Document type: Journal article

https://hal-upec-upem.archives-ouvertes.fr/hal-00841574
Contributor: Matthieu Constant
Submitted on: Friday, July 5, 2013 - 10:51:53 AM
Last modification on: Thursday, February 7, 2019 - 5:48:52 PM

Citation

Mathieu Constant, Anthony Sigogne, Joseph Le Roux. Combining Compound Recognition and PCFG-LA Parsing with Word Lattices and Conditional Random Fields. ACM - Transactions on Speech and Language Processing, Association for Computing Machinery, 2013, 10 (3), pp.8.1-8.24. ⟨10.1145/2483969.2483970⟩. ⟨hal-00841574⟩
