Conference papers

Normalisation of 16th and 17th century texts in French and geographical named entity recognition

Abstract : Both statistical and rule-based methods for named entity recognition are quite sensitive to the type of language used in the analysed texts. Previous studies have shown, for example, that named entities are harder to detect in SMS or microblog messages, where words are abbreviated or lowercased. In this article, we focus on old French texts and evaluate the impact of manual and automatic normalisation applied before running five geographical named entity recognition tools, as well as an improved version of one of them, with the aim of helping to build maps that display the locations mentioned in ancient texts. Our results show that manual normalisation leads to better results for all methods, and that the effect of automatic normalisation depends on the tool used to extract geographical named entities, with a significant improvement for most methods.
Document type : Conference papers

Contributor : Philippe Gambette
Submitted on : Tuesday, December 15, 2020 - 3:59:47 PM
Last modification on : Thursday, March 17, 2022 - 10:08:42 AM
Long-term archiving on : Tuesday, March 16, 2021 - 7:58:58 PM


Files produced by the author(s)



Eleni Kogkitsidou, Philippe Gambette. Normalisation of 16th and 17th century texts in French and geographical named entity recognition. ACM SIGSPATIAL GeoHumanities'20, ACM, Nov 2020, Seattle (virtual), United States. pp.28-34, ⟨10.1145/3423337.3429437⟩. ⟨hal-02955867⟩


