NAME
    Catmandu::HOCR - tools to work with HOCR documents

SYNOPSIS
        #From the command line

        #Extract OCR data
        $ catmandu convert HOCR --file input.xml to YAML

        #In a script
        use Catmandu::Sane;
        use Catmandu::Importer::HOCR;

        my $importer = Catmandu::Importer::HOCR->new( file => "/tmp/input.html" );
        $importer->each(sub{
            my $record = $_[0];
            #..
        });

EXAMPLE OUTPUT IN YAML
        ---
        h: 38
        page: 1
        page_h: 3316
        page_w: 2904
        page_x: 0
        page_y: 0
        text: '1'
        w: 17
        x: 2349
        y: 2717
        ...

INSTALLATION
    To install this package, the following system packages must be
    installed:

    CentOS
        * perl-devel
        * make
        * gcc
        * gcc-c++
        * libyaml-devel
        * libxml2 version 2.6.21 or higher. Reason: the module
          XML::LibXML::Reader uses the libxml2 pull parser to read XML
          documents incrementally.

AUTHORS
    Nicolas Franck

SEE ALSO
    Catmandu::Importer::HOCR, XML::LibXML::Reader, Catmandu,
    Catmandu::Importer

LICENSE AND COPYRIGHT
    This program is free software; you can redistribute it and/or modify
    it under the terms of either: the GNU General Public License as
    published by the Free Software Foundation; or the Artistic License.

    See http://dev.perl.org/licenses/ for more information.