Real-time unsupervised classification of web documents

Abstract : This paper addresses the problem of clustering dynamic collections of web documents. We present an iterative algorithm based on fine-grained keyword extraction (simple words, compound words and proper nouns). Each new document inserted into the collection is either assigned to an existing class containing documents on the same topic or placed in a new class. After each step, classes are refined using statistical techniques when necessary. The implementation of this algorithm has been successfully integrated into an application used for Information Intelligence.
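
The assign-or-create step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the decision rule, or the refinement statistics, so the Jaccard overlap, the `threshold` value, and the names `TopicClass` and `insert_document` are all assumptions made for illustration. Keyword extraction is taken as already done, and the refinement step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TopicClass:
    """A class of documents, summarised by the union of their keywords."""
    doc_ids: list = field(default_factory=list)
    keywords: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two keyword sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def insert_document(classes, doc_id, doc_keywords, threshold=0.2):
    """Assign a document to the most similar class, or open a new class.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    Returns the index of the class that received the document.
    """
    best_index, best_score = -1, 0.0
    for index, topic in enumerate(classes):
        score = jaccard(topic.keywords, doc_keywords)
        if score > best_score:
            best_index, best_score = index, score

    # No class shares enough keywords with the document: start a new class.
    if best_index == -1 or best_score < threshold:
        classes.append(TopicClass())
        best_index = len(classes) - 1

    classes[best_index].doc_ids.append(doc_id)
    classes[best_index].keywords |= doc_keywords
    return best_index


# Documents arrive one at a time, each with pre-extracted keywords.
classes = []
stream = [
    ("d1", {"election", "parliament", "vote"}),
    ("d2", {"football", "league", "goal"}),
    ("d3", {"vote", "election", "candidate"}),
]
assignments = [insert_document(classes, doc_id, kws) for doc_id, kws in stream]
```

In this toy stream the first and third documents share keywords and end up in the same class, while the second opens a class of its own. A fuller version would add the refinement pass after each insertion, for example by splitting or merging classes.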
Document type :
Conference papers

https://hal-upec-upem.archives-ouvertes.fr/hal-00722749
Contributor : Anthony Sigogne
Submitted on : Friday, August 3, 2012 - 6:40:39 PM
Last modification on : Wednesday, February 26, 2020 - 7:06:06 PM
Long-term archiving on : Monday, November 5, 2012 - 11:00:16 AM

File

123.pdf
Files produced by the author(s)

Citation

Anthony Sigogne, Mathieu Constant. Real-time unsupervised classification of web documents. 4th International Multiconference on Computer Science and Information Technology (IMCSIT'09), Oct 2009, Mragowo, Poland. pp.281-286, ⟨10.1109/IMCSIT.2009.5352714⟩. ⟨hal-00722749⟩
