Construction of linguistic resources for the extraction of "complex text segments"

Abstract: The core of our work is the development of computational linguistic resources (electronic dictionaries and grammars) for the automatic extraction, identification, and fine-grained annotation of "complex text segments". We use and extend the notion of multi-word units (MWUs) to cover a broad range of linguistic objects: compound nouns, entity names, verbal forms (compound tenses and negated forms, insertion of clauses between the auxiliary and the past participle, etc.) and frozen expressions (i.e. idioms). Complex sequences of text segments are identified using dictionary graphs, which combine the power and versatility of local grammars with the expressivity of electronic dictionaries.
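
To make the idea of pairing a dictionary with a local grammar concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not use their resources: the toy dictionary entries, the grammatical codes (AUX, NEG, ADV, PP) and the function names are all assumptions chosen for illustration. The dictionary supplies a code for each word form, and a hand-written pattern, standing in for a local grammar, recognises a compound tense with optional inserted material between the auxiliary and the past participle.

```python
from typing import Dict, List, Tuple

# Toy electronic dictionary: inflected form -> (lemma, grammatical code).
# Entries and codes are illustrative assumptions, not the authors' resources.
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "has": ("have", "AUX"),
    "have": ("have", "AUX"),
    "not": ("not", "NEG"),
    "always": ("always", "ADV"),
    "eaten": ("eat", "PP"),
    "seen": ("see", "PP"),
}


def tag(token: str) -> str:
    """Look up the grammatical code of a token; unknown words get 'UNK'."""
    return DICTIONARY.get(token.lower(), (token, "UNK"))[1]


def find_compound_tenses(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans matching the pattern AUX (NEG|ADV)* PP.

    The pattern plays the role of a very small local grammar; the end
    index is exclusive.
    """
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        if tag(tokens[i]) == "AUX":
            j = i + 1
            # Skip any inserted negation or adverbs after the auxiliary.
            while j < len(tokens) and tag(tokens[j]) in ("NEG", "ADV"):
                j += 1
            if j < len(tokens) and tag(tokens[j]) == "PP":
                spans.append((i, j + 1))
                i = j + 1
                continue
        i += 1
    return spans


if __name__ == "__main__":
    sentence = "She has not always eaten meat".split()
    for start, end in find_compound_tenses(sentence):
        print(" ".join(sentence[start:end]))
```

In this sketch the lexical knowledge lives entirely in the dictionary and the syntactic knowledge entirely in the pattern, so either can be extended without touching the other. A real dictionary graph would be far richer than this linear scan.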
Document type:
Poster
PARSEME 2nd general meeting, Mar 2014, Athens, Greece

https://hal-upec-upem.archives-ouvertes.fr/hal-01448712
Contributor: Claude Martineau
Submitted on: Saturday, January 28, 2017 - 19:27:12
Last modified on: Thursday, January 11, 2018 - 06:20:23

Identifiers

  • HAL Id : hal-01448712, version 1

Citation

Tita Kyriacopoulou, Claude Martineau, Cristian Martinez, Aggeliki Fotopoulou. Construction of linguistic resources for the extraction of "complex text segments". PARSEME 2nd general meeting, Mar 2014, Athens, Greece. 〈hal-01448712〉
