A new hybrid binarization method based on Kmeans

Abstract : Document binarization is a fundamental processing step toward Optical Character Recognition (OCR). It aims to separate the foreground text from the document background. In this article, we propose a novel binarization technique that combines local and global approaches using the Kmeans clustering algorithm. The proposed Hybrid Binarization based on Kmeans (HBK) performs robust binarization of scanned documents. Through several experiments, we demonstrate that the HBK method improves binarization quality while minimizing the amount of distortion. Moreover, it outperforms several well-known state-of-the-art methods in the OCR evaluation.
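As a rough illustration of the clustering idea only, the sketch below binarizes a list of grayscale intensities with a plain two-cluster, one-dimensional k-means. It is not the HBK method: the abstract does not detail how the local and global steps are combined, so none of that is reproduced here. The function name `kmeans_binarize`, its parameters, and the assumption that dark pixels are foreground are all hypothetical choices made for this example.

```python
def kmeans_binarize(pixels, max_iter=50):
    """Split grayscale intensities (0-255) into two clusters with 1-D k-means.

    Illustrative global clustering only; this is NOT the HBK algorithm.
    Returns (labels, threshold), where labels[i] is 1 for dark pixels
    (assumed to be foreground text) and 0 for bright pixels (background).
    """
    if not pixels:
        raise ValueError("pixels must be non-empty")

    # Initialise the two centroids at the darkest and brightest intensities.
    dark, bright = float(min(pixels)), float(max(pixels))

    for _ in range(max_iter):
        # With two 1-D centroids, nearest-centroid assignment is the same
        # as thresholding at their midpoint.
        threshold = (dark + bright) / 2.0
        dark_members = [p for p in pixels if p <= threshold]
        bright_members = [p for p in pixels if p > threshold]

        # Update each centroid to the mean of its members (keep it if empty).
        new_dark = sum(dark_members) / len(dark_members) if dark_members else dark
        new_bright = (
            sum(bright_members) / len(bright_members) if bright_members else bright
        )

        # Stop once the centroids no longer move.
        if new_dark == dark and new_bright == bright:
            break
        dark, bright = new_dark, new_bright

    threshold = (dark + bright) / 2.0
    labels = [1 if p <= threshold else 0 for p in pixels]
    return labels, threshold


labels, threshold = kmeans_binarize([10, 12, 15, 200, 210, 220])
print(labels)  # [1, 1, 1, 0, 0, 0]
```

Running the same routine on a whole page gives a single global threshold, while running it on small windows gives local ones; a hybrid scheme would need some rule for combining the two, which this sketch leaves out.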

https://hal-upec-upem.archives-ouvertes.fr/hal-01305856
Contributor : Rostom Kachouri
Submitted on : Thursday, April 21, 2016 - 9:49:47 PM
Last modification on : Thursday, February 7, 2019 - 5:23:57 PM
Long-term archiving on : Tuesday, November 15, 2016 - 9:31:23 AM

File

ISSCP14_HBK(publie).pdf
Files produced by the author(s)

Identifiers

HAL Id : hal-01305856
DOI : 10.1109/ISCCSP.2014.6877830

Citation

Mahmoud Soua, Rostom Kachouri, Mohamed Akil. A new hybrid binarization method based on Kmeans. 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), May 2014, Athens, Greece. ⟨10.1109/ISCCSP.2014.6877830⟩. ⟨hal-01305856⟩
