Programming: Problems and Solutions: Installing tesseract ocr engine

To build the Tesseract OCR engine from source on my machine, I had to jump through a few hoops, so I figured I might as well document them here. Note: I am using Linux.

What I did:

Downloaded tesseract-3.01.tar.gz and tesseract-ocr-3.01.eng.tar.gz. Follow the instructions so that the English language files end up in the tessdata folder created when the tesseract tarball is extracted.
Downloaded and installed Leptonica. This is listed as a dependency in the tesseract README and was pretty straightforward: just a simple ./configure, make, make install.
To build tesseract itself, the standard ./configure, make, make install is not enough. You must run ./autogen.sh first. If you don't, you will get a cryptic error, something about a missing Makefile.in.
For autogen.sh to run, I had to sudo apt-get install autoconf automake libtool. This is not made obvious, and I had to dig around a bit to figure it out.
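
To save some digging, here is a minimal sketch of a check for the tools autogen.sh needs before you run it. The missing_tools helper is my own name, not part of tesseract, and I am assuming the libtool package puts a libtoolize command on the PATH, which is what I would expect on Debian and Ubuntu.

```shell
# Hypothetical helper: print each of the given commands that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: the libtool package provides the libtoolize command.
missing=$(missing_tools autoconf automake libtoolize)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
    echo "try: sudo apt-get install autoconf automake libtool"
else
    echo "autoconf, automake and libtoolize are all available"
fi
```

If anything is reported missing, installing the three packages above is what fixed it for me.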

Update: I was looking through the tesseract README later and found a section I missed before that explains all this. See http://code.google.com/p/tesseract-ocr/wiki/ReadMe and look for the Linux installation instructions.

After autogen.sh ran to successful completion, ./configure, make, make install in the tesseract directory all worked fine as well.
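
Put together, the order that worked for me can be sketched roughly like this. The run and build_tesseract helpers and the DRY_RUN switch are my own additions for illustration, and the directory name tesseract-3.01 is an assumption about where the tarball extracts to. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# DRY_RUN defaults to 1, so the commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: build tesseract in the directory given as $1.
# Each step only runs if the previous one succeeded.
build_tesseract() {
    src_dir=$1
    # autogen.sh has to come before configure.
    # make install may need root, depending on the install prefix.
    run cd "$src_dir" &&
    run ./autogen.sh &&
    run ./configure &&
    run make &&
    run make install
}

# Assumption: the tarball was extracted to ./tesseract-3.01
build_tesseract tesseract-3.01
```

Chaining the steps with && means a failure in autogen.sh stops the build right there instead of surfacing later as the confusing Makefile.in error.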

Thursday, July 12, 2012
