To build the Tesseract OCR engine from source on my machine, I had to jump through a few hoops, which I figure I might as well document here. Note: I am using Linux.
What I did:
- Downloaded tesseract-3.01.tar.gz and tesseract-ocr-3.01.eng.tar.gz. Follow the instructions so that the English language files end up in the tessdata folder created when the tesseract tarball is extracted.
- Downloaded and installed leptonica. This is listed as a dependency in the tesseract README, and was pretty straightforward: just a simple ./configure, make, make install.
- To build tesseract itself, the standard ./configure, make, make install is not enough. You must run ./autogen.sh first. If you don't, you will get a cryptic error, something about a missing Makefile.in.
- For autogen.sh to run, I had to sudo apt-get install autoconf automake libtool. This is not made obvious, and I had to dig around a bit to figure it out.
- Update: I was looking through the tesseract README later and found a section I had missed that explains all this: http://code.google.com/p/tesseract-ocr/wiki/ReadMe (look for the Linux installation instructions).
- After autogen.sh ran to completion, ./configure, make, and make install in the tesseract directory all worked fine.
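
For reference, here is a rough sketch of the tesseract steps above as a shell script. It only covers the order of commands from this post. The directory name tesseract-3.01 is my assumption about what the tarball extracts to, and the run and build_tesseract helpers are names made up for this sketch. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Sketch of the build order described in this post.
# Dry run by default: commands are printed, not executed.
# Set RUN=1 in the environment to really run them.

# run: hypothetical helper that either prints or executes a command.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# build_tesseract: hypothetical helper; $1 is the extracted source directory.
# The body runs in a subshell (parentheses) so the cd does not leak out.
build_tesseract() (
    # Tools that autogen.sh needed on my machine
    run sudo apt-get install autoconf automake libtool &&
    run cd "$1" &&
    # autogen.sh has to come before configure, or Makefile.in is missing
    run ./autogen.sh &&
    run ./configure &&
    run make &&
    run make install
)

# Directory name is an assumption; adjust to whatever the tarball extracted to.
build_tesseract "tesseract-3.01"
```

Leptonica needs to be installed before this, using the plain ./configure, make, make install sequence in its own source directory. Depending on your install prefix, make install may need root privileges.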