Conference papers

A new hybrid binarization method based on Kmeans

Abstract : Document binarization is a fundamental processing step toward Optical Character Recognition (OCR). It aims to separate the foreground text from the document background. In this article, we propose a novel binarization technique that combines local and global approaches using the Kmeans clustering algorithm. The proposed Hybrid Binarization based on Kmeans (HBK) performs robust binarization of scanned documents. Through several experiments, we demonstrate that the HBK method improves binarization quality while minimizing the amount of distortion. Moreover, it outperforms several well-known state-of-the-art methods in the OCR evaluation.
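
The abstract does not give the details of HBK, so the following is only a minimal sketch of the general idea of combining a global and a local threshold obtained by two-cluster Kmeans on gray levels. It is not the authors' algorithm. The function names `kmeans_threshold` and `hybrid_binarize`, the block size, the `min_contrast` rule, and the fallback to the global threshold in flat blocks are all assumptions made for illustration.

```python
def kmeans_threshold(pixels, iters=50):
    """Two-cluster 1-D k-means on gray levels.

    Returns the midpoint between the dark and bright centroids.
    (Illustrative helper, not taken from the paper.)
    """
    pixels = list(pixels)
    lo, hi = float(min(pixels)), float(max(pixels))
    if lo == hi:
        return lo  # flat region: no meaningful split
    for _ in range(iters):
        t = (lo + hi) / 2
        dark = [p for p in pixels if p <= t]
        bright = [p for p in pixels if p > t]
        new_lo = sum(dark) / len(dark)
        new_hi = sum(bright) / len(bright)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def hybrid_binarize(img, block=32, min_contrast=30):
    """Binarize a grayscale image given as a list of rows.

    Assumed hybrid rule: use a per-block (local) k-means threshold
    where the block has enough contrast, otherwise fall back to the
    whole-image (global) k-means threshold.
    """
    rows, cols = len(img), len(img[0])
    t_global = kmeans_threshold(p for row in img for p in row)
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1, c1 = min(r0 + block, rows), min(c0 + block, cols)
            tile = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            if max(tile) - min(tile) >= min_contrast:
                t = kmeans_threshold(tile)  # local decision
            else:
                t = t_global                # global decision
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = 255 if img[r][c] > t else 0
    return out


# Synthetic page: bright background (200) with a dark square of "ink" (50).
page = [[200] * 64 for _ in range(64)]
for r in range(10, 20):
    for c in range(10, 20):
        page[r][c] = 50
binary = hybrid_binarize(page)
```

In this sketch, dark pixels map to 0 and background pixels to 255. The contrast test is a simple stand-in for whatever criterion a real hybrid method would use to choose between the local and global decisions.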

Contributor : Rostom Kachouri
Submitted on : Thursday, April 21, 2016 - 9:49:47 PM
Last modification on : Saturday, January 15, 2022 - 3:57:17 AM
Long-term archiving on : Tuesday, November 15, 2016 - 9:31:23 AM





Mahmoud Soua, Rostom Kachouri, Mohamed Akil. A new hybrid binarization method based on Kmeans. 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), May 2014, Athens, Greece. ⟨10.1109/ISCCSP.2014.6877830⟩. ⟨hal-01305856⟩


