Scan-to-XML: Using Software Component Algebra for Intelligent Document Generation

Abstract: The main objective of this paper is to experiment with a new approach to developing a high-level document analysis platform by composing existing components from a comprehensive library of state-of-the-art algorithms. Starting from the observation that document analysis is conducted as a layered pipeline, in which each layer takes syntax as input and produces semantics as output, we introduce the concept of a Component Algebra as an approach to integrating existing document analysis algorithms in a coherent and self-contained manner. Based on XML for data representation and exchange on the one hand, and on a combination of scripting and compiled libraries on the other, we claim that this approach can eventually lead to a universal representation for real-world document analysis algorithms. The test case for this methodology is a fully automated method for generating a browsable, hyperlinked document from a simple scanned image. Our example is based on cutaway diagrams, which have the advantage of containing simple "browsing semantics": they consist of a clearly identifiable legend containing index references, plus a drawing containing one or more occurrences of the same indices.
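
As an illustration only, the idea of composing components that each read and enrich a shared XML document can be sketched as follows. This is a minimal sketch and not the authors' implementation: the `compose` helper, the two components, and the element and attribute names (`page`, `legend`, `drawing`, `token`, `kind`, `href`) are all hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# Hypothetical convention: a component takes an XML tree and returns it enriched.
Component = Callable[[ET.Element], ET.Element]

def compose(*components: Component) -> Component:
    """Chain components left to right into a single component."""
    return lambda doc: reduce(lambda d, c: c(d), components, doc)

def tag_indices(doc: ET.Element) -> ET.Element:
    """Layer 1: mark every purely numeric token as an index reference."""
    for tok in doc.iter("token"):
        if (tok.text or "").strip().isdigit():
            tok.set("kind", "index")
    return doc

def link_indices(doc: ET.Element) -> ET.Element:
    """Layer 2: link each index in the drawing to the matching legend entry."""
    legend = {tok.text.strip(): tok
              for tok in doc.iterfind("legend/token[@kind='index']")}
    for tok in doc.iterfind("drawing/token[@kind='index']"):
        key = tok.text.strip()
        if key in legend:
            legend[key].set("id", f"legend-{key}")
            tok.set("href", f"#legend-{key}")
    return doc

# The composed pipeline is itself a component, so it can be composed further.
scan_to_xml = compose(tag_indices, link_indices)

# Invented sample standing in for the output of earlier recognition layers.
SAMPLE = """<page>
  <legend><token>1</token><token>Rotor</token><token>2</token><token>Shaft</token></legend>
  <drawing><token>2</token><token>1</token><token>2</token></drawing>
</page>"""

result = scan_to_xml(ET.fromstring(SAMPLE))
links = [tok.get("href") for tok in result.iterfind("drawing/token")]
```

Each layer consumes the markup left by the previous one and adds its own, so the legend and drawing indices end up connected by `id`/`href` pairs that a browser-oriented export could turn into hyperlinks.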
Document type:
Conference paper
Dorothea Blostein and Young-Bin Kwon. 4th International Workshop on Graphics Recognition - Algorithms and Applications - GREC'2001, Sep 2001, Kingston, Ontario, Canada. Springer Verlag, 2390, pp.211-221, 2002, 〈10.1007/3-540-45868-9_18〉

https://hal-upec-upem.archives-ouvertes.fr/hal-00622124
Contributor: Laurent Najman
Submitted on: Sunday, September 11, 2011 - 18:13:07
Last modified on: Wednesday, November 29, 2017 - 14:36:59

Citation

Bart Lamiroy, Laurent Najman. Scan-to-XML: Using Software Component Algebra for Intelligent Document Generation. Dorothea Blostein and Young-Bin Kwon. 4th International Workshop on Graphics Recognition - Algorithms and Applications - GREC'2001, Sep 2001, Kingston, Ontario, Canada. Springer Verlag, 2390, pp.211-221, 2002, 〈10.1007/3-540-45868-9_18〉. 〈hal-00622124〉
