Normalisation of 16th and 17th century texts in French and geographical named entity recognition - HAL open archive
Conference paper, Year: 2020


Abstract

Both statistical and rule-based methods for named entity recognition are quite sensitive to the type of language used in the analysed texts. Previous studies have shown, for example, that named entities are harder to detect in SMS or microblog messages, where words are abbreviated or written in lowercase. In this article, we focus on old French texts and evaluate the impact of manual and automatic normalisation applied before five geographical named entity recognition tools, as well as an improved version of one of them, with the aim of helping to build maps of the locations mentioned in old texts. Our results show that manual normalisation improves results for all methods. The effect of automatic normalisation depends on the tool used to extract geographical named entities, but it yields a significant improvement for most methods.
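To make "automatic normalisation" concrete, here is a minimal sketch of a rule-based normaliser that rewrites old French spellings into modern ones before the text is passed to a named entity recognition tool. This is not the method evaluated in the paper. The character rules, the `LEXICON` entries and the function names `normalise` and `_match_case` are hypothetical and only illustrate the general idea: character-level substitutions followed by a word-level lookup.

```python
import re

# Hypothetical character-level rewrite rules (illustrative only, not from the paper).
CHAR_RULES = [
    ("ſ", "s"),   # long s -> round s
    ("&", "et"),  # ampersand -> conjunction "et"
]

# Hypothetical lexicon mapping a few old spellings to modern forms.
LEXICON = {
    "estoit": "était",
    "avoit": "avait",
    "faict": "fait",
    "espaigne": "espagne",
    "allemaigne": "allemagne",
}

WORD_RE = re.compile(r"\w+")  # Unicode-aware word tokens


def _match_case(original: str, modern: str) -> str:
    """Copy the capitalisation pattern of the original token onto its replacement."""
    if len(original) > 1 and original.isupper():
        return modern.upper()
    if original[0].isupper():
        return modern[0].upper() + modern[1:]
    return modern


def normalise(text: str) -> str:
    """Apply character rules first, then replace known old word forms."""
    for old, new in CHAR_RULES:
        text = text.replace(old, new)

    def replace_word(match: re.Match) -> str:
        word = match.group(0)
        modern = LEXICON.get(word.lower())
        # Leave unknown words untouched rather than guessing a modern form.
        return _match_case(word, modern) if modern else word

    return WORD_RE.sub(replace_word, text)


if __name__ == "__main__":
    print(normalise("Il eſtoit en Eſpaigne & en Allemaigne"))
```

Run on the sample sentence, the sketch prints `Il était en Espagne et en Allemagne`, where the place names are now in their modern spelling. Such a lexicon never covers every variant, and unknown words pass through unchanged. This is one reason automatic normalisation can help some extraction tools more than others.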
Main file
KogkitsidouGambette-2020-postprint.pdf (1.56 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02955867 , version 1 (15-12-2020)


Cite

Eleni Kogkitsidou, Philippe Gambette. Normalisation of 16th and 17th century texts in French and geographical named entity recognition. ACM SIGSPATIAL GeoHumanities'20, ACM, Nov 2020, Seattle (virtual), United States. pp.28-34, ⟨10.1145/3423337.3429437⟩. ⟨hal-02955867⟩
211 views
293 downloads
