
Normalisation of 16th and 17th century texts in French and geographical named entity recognition

Abstract: Both statistical and rule-based methods for named entity recognition are quite sensitive to the type of language used in the analysed texts. Previous studies have shown, for example, that named entities are harder to detect in SMS or microblog messages, where words are abbreviated or written in lowercase. In this article, we focus on 16th- and 17th-century French texts and evaluate the impact of manual and automatic normalisation applied before five geographical named entity recognition tools, as well as an improved version of one of them, with the aim of helping to build maps of the locations mentioned in these texts. Our results show that manual normalisation leads to better results for all methods, and that the effect of automatic normalisation depends on the tool used to extract geographical named entities, although it significantly improves results for most of them.
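To illustrate why spelling normalisation can matter for place-name extraction, here is a minimal Python sketch. It is not the authors' normaliser nor any of the five tools they evaluated: the two spelling rules (long s "ſ" rewritten as "s", and word-initial "I" before a vowel rewritten as "J"), the function names `normalise` and `find_places`, the sample sentence and the two-entry gazetteer are all hypothetical, chosen only to show an exact-match lookup failing on an old spelling and succeeding after normalisation.

```python
import re

# Hypothetical, deliberately crude spelling rules (NOT the paper's normaliser).
RULES = [
    (re.compile("ſ"), "s"),                      # long s -> modern s
    (re.compile(r"\bI(?=[aeiouAEIOU])"), "J"),   # word-initial I before a vowel -> J
]

def normalise(text: str) -> str:
    """Apply each rewrite rule in order to the whole text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

def find_places(text: str, gazetteer: set[str]) -> list[str]:
    """Toy exact-match lookup: return tokens that appear in the gazetteer."""
    tokens = re.findall(r"\w+", text)
    return [token for token in tokens if token in gazetteer]

# Hypothetical sample sentence in period spelling and a two-entry gazetteer.
SAMPLE = "Il alla en Ieruſalem, puis à Paris"
GAZETTEER = {"Jerusalem", "Paris"}

if __name__ == "__main__":
    print(find_places(SAMPLE, GAZETTEER))             # raw text
    print(find_places(normalise(SAMPLE), GAZETTEER))  # normalised text
```

On the raw sentence only "Paris" is found; after normalisation "Ieruſalem" becomes "Jerusalem" and both places match. Real systems rely on far richer rules or statistical models, and rules this crude will also rewrite words they should leave alone.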
Document type:
Conference papers

https://hal-upec-upem.archives-ouvertes.fr/hal-02955867
Contributor: Philippe Gambette
Submitted on: Friday, October 2, 2020 - 11:41:12 AM
Last modification on: Wednesday, October 14, 2020 - 3:55:14 AM

Identifiers

  • HAL Id: hal-02955867, version 1

Citation

Eleni Kogkitsidou, Philippe Gambette. Normalisation of 16th and 17th century texts in French and geographical named entity recognition. ACM SIGSPATIAL GeoHumanities'20, Nov 2020, Seattle, United States. ⟨hal-02955867⟩
