Construction of linguistic resources for the extraction of "complex text segments"

Abstract : The core of our work is the development of computational linguistic resources (electronic dictionaries and grammars) for the automatic extraction, identification, and fine-grained annotation of "complex text segments". We use and extend the notion of multi-word units (MWUs) to cover a broad range of linguistic objects: compound nouns, named entities, verbal forms (compound tenses and negative forms, clauses inserted between the auxiliary and the past participle, etc.) and frozen expressions (i.e. idioms). Complex sequences of text segments are identified with dictionary graphs, which combine the power and versatility of local grammars with the expressivity of electronic dictionaries.
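The abstract describes identifying MWUs by combining dictionary lookup with grammar-like patterns. The following is a minimal Python sketch of that general idea, not the authors' dictionary-graph implementation. A toy dictionary (`MWU_DICT`) stands in for the electronic dictionaries, and a hand-written pattern (`compound_tense`) stands in for a local grammar that allows adverbs between the auxiliary and the past participle. All entries, the tag names such as `N+Compound`, and the helpers `longest_match`, `compound_tense`, and `extract_segments` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary of multi-word units: token tuple -> category tag.
MWU_DICT: Dict[Tuple[str, ...], str] = {
    ("pomme", "de", "terre"): "N+Compound",
    ("pomme", "de", "terre", "frite"): "N+Compound",
    ("casser", "sa", "pipe"): "V+Idiom",
    ("union", "européenne"): "N+EntityName",
}

# Hypothetical closed word lists used by the toy "local grammar" below.
AUXILIARIES = {"a", "ai", "as", "avons", "avez", "ont", "est", "sont"}
PAST_PARTICIPLES = {"mangé", "parti", "vu"}
ADVERBS = {"déjà", "souvent", "pas", "jamais", "bien"}

Match = Optional[Tuple[int, str]]  # (end index, category) or None


def longest_match(tokens: List[str], start: int) -> Match:
    """Return the longest dictionary entry starting at `start`, if any."""
    max_len = max(len(key) for key in MWU_DICT)
    for length in range(min(max_len, len(tokens) - start), 0, -1):
        key = tuple(tokens[start:start + length])
        if key in MWU_DICT:
            return start + length, MWU_DICT[key]
    return None


def compound_tense(tokens: List[str], start: int, max_insert: int = 2) -> Match:
    """Match auxiliary + up to `max_insert` adverbs + past participle."""
    if tokens[start] not in AUXILIARIES:
        return None
    i, inserted = start + 1, 0
    # Allow a bounded number of inserted adverbs between auxiliary and participle.
    while i < len(tokens) and tokens[i] in ADVERBS and inserted < max_insert:
        i += 1
        inserted += 1
    if i < len(tokens) and tokens[i] in PAST_PARTICIPLES:
        return i + 1, "V+CompoundTense"
    return None


def extract_segments(tokens: List[str]) -> List[Tuple[str, str]]:
    """Scan left to right, keeping the longest match found at each position."""
    segments: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        candidates = [m for m in (longest_match(tokens, i), compound_tense(tokens, i)) if m]
        if candidates:
            end, category = max(candidates, key=lambda m: m[0])
            segments.append((" ".join(tokens[i:end]), category))
            i = end
        else:
            i += 1
    return segments


if __name__ == "__main__":
    print(extract_segments("il a déjà mangé une pomme de terre frite".split()))
```

On the example sentence, the sketch should return `[("a déjà mangé", "V+CompoundTense"), ("pomme de terre frite", "N+Compound")]`. The longest-match preference picks "pomme de terre frite" over the shorter "pomme de terre". Real dictionary graphs express such patterns declaratively rather than as hand-coded functions.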
Document type :
Poster communications

https://hal-upec-upem.archives-ouvertes.fr/hal-01448712
Contributor : Claude Martineau
Submitted on : Saturday, January 28, 2017 - 7:27:12 PM
Last modification on : Friday, July 13, 2018 - 3:54:02 PM

Identifiers

  • HAL Id : hal-01448712, version 1

Citation

Tita Kyriacopoulou, Claude Martineau, Cristian Martinez, Aggeliki Fotopoulou. Construction of linguistic resources for the extraction of "complex text segments". PARSEME 2nd general meeting, Mar 2014, Athens, Greece. ⟨hal-01448712⟩