Thursday, July 12, 2012

Installing tesseract ocr engine

To build the tesseract OCR engine from source on my machine, I had to jump through a few hoops, which I figure I might as well document here. Note: I am using Linux.

What I did:
  • downloaded tesseract-3.01.tar.gz and tesseract-ocr-3.01.eng.tar.gz. Follow the instructions so that the English language files end up in the tessdata folder created when the tesseract tarball is extracted.
  • downloaded and installed leptonica. This is listed as a dependency in the tesseract README, and was pretty straightforward. Just a simple ./configure, make, make install.
  • To build tesseract itself, the standard ./configure, make, make install is not enough: you must run ./autogen.sh first. If you don't, you will get a cryptic error, something about a missing Makefile.in.
  • For autogen.sh to run, I had to sudo apt-get install autoconf automake libtool. This is not made obvious, and I had to dig around a bit to figure it out.
  • After autogen.sh ran to successful completion, ./configure, make, and make install in the tesseract directory all worked fine as well.

Tuesday, June 26, 2012

Losing PGM comments


Context: Qt, C++, and a custom image library that uses the PNM format to save images. I generated a set of small images from a larger image. The image library I use saves image properties in the header of the PNM image, and I use these properties in another program that analyzes the images. I did further editing on some of the small images with GIMP, and then left work for the day.


Problem: The next day, I ran my analysis program on the set of images, and it crashed in the image library. The error message basically stated that the index was out of bounds for the image property.

Solution: I modified the image library to print more debug information and recompiled the whole library, which cost me time figuring out its build system. I then discovered that when I had edited the images with GIMP and re-saved them, GIMP had not preserved the image properties my image library had saved in the header. I regenerated the images from my program, and the analysis program ran fine. If I need to edit them again, I will have to use something other than GIMP.

Lesson: If your program isn't working, check your data sources as well as the code itself.
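One way to catch this kind of data problem early is to check the image header before running the analysis. The sketch below is not my image library's code. It assumes the properties are stored as '#' comment lines in the header of a P1 to P6 PNM file, and the function name pnmHeaderComments is made up for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collects the text of every '#' comment line in a PNM header.
// Assumption: the stream holds a P1..P6 file and comments only start
// between header tokens. An empty result means no comments were found.
std::vector<std::string> pnmHeaderComments(std::istream& in)
{
    std::vector<std::string> comments;
    std::string magic;
    if (!(in >> magic) || magic.size() != 2 || magic[0] != 'P')
        return comments;  // not a PNM header

    // Bitmaps (P1, P4) have width and height; the others also have maxval.
    int valuesLeft = (magic == "P1" || magic == "P4") ? 2 : 3;
    while (valuesLeft > 0) {
        in >> std::ws;  // skip whitespace between header tokens
        if (in.peek() == '#') {
            std::string line;
            std::getline(in, line);
            assert(!line.empty());  // the line starts with '#'
            comments.push_back(line.substr(1));
        } else {
            int value;
            if (!(in >> value))
                break;  // truncated or malformed header
            --valuesLeft;
        }
    }
    return comments;  // pixel data after the header is left unread
}
```

Opening each image with std::ifstream and calling this before the analysis would have shown an empty list for the files GIMP re-saved, instead of an out-of-bounds crash later on.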

Friday, June 22, 2012

Useful Tricks in Qt


  • qPrintable(QString str) - returns the string as a const char*, skipping the several function calls it usually takes to get there (it is equivalent to str.toLocal8Bit().constData())
  • .pro file basics
    • Need to include a header from a new directory?
      • INCLUDEPATH += /path/to/directory/
      • HEADERS += myheader.h
      • #include <myheader.h> in the file you are working on.
      • For some reason, it took a while for the header to be recognized. I reran qmake and tried rebuilding a few times. This may not always happen.
        • update: I think the reason the header was not recognized was that I had misspelled it. Whoops! Once INCLUDEPATH and HEADERS were updated, the header appeared in the autocomplete.
  • Random Numbers
    • call the following first and only once: qsrand(QDateTime::currentDateTime().toTime_t());
    • to get a random number 0..max, do something like the function below
    • qint32 randNum(qint32 max)
      {
      return ((qreal)qrand())/RAND_MAX * max;
      }
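One thing to note about the function above: it returns max only when qrand() happens to equal RAND_MAX, so max almost never comes up. If a C++11 compiler is available, the standard <random> header gives an even spread over the whole range. The sketch below uses the standard library rather than Qt, and the name randNumStd is made up for illustration.

```cpp
#include <cassert>
#include <random>

// Returns a uniformly distributed integer in the closed range [0, max].
// Uses the C++11 standard library instead of qsrand()/qrand().
int randNumStd(int max)
{
    assert(max >= 0);  // the range [0, max] must not be empty
    // The engine is seeded once, on the first call.
    static std::mt19937 engine(std::random_device{}());
    std::uniform_int_distribution<int> dist(0, max);
    return dist(engine);
}
```

With this version there is no separate seeding call to remember, and randNumStd(10) returns each of 0 through 10 with equal probability.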
      
      

Segfault from Missing Library

Context: I'm using Qt for C++. External libraries include OpenCV, OpenGL, and pthread, among others. I left on vacation and didn't use my computer for a month, though others used the computer while I was gone. As far as I know, my code, .pro file, and everything else have remained untouched.


Problem: When I returned to work yesterday, my code compiled, but segfaulted almost immediately after execution began. Because it compiled but failed at runtime, I suspected a problem with shared libraries.


Things I discovered as I searched for a solution:
  • Definitions
    • DSO - Dynamic Shared Object. These are libraries that a program uses at run time.
  • ldd - a Linux command that tells you what DSOs are required for a program.
    • example execution: user@comp:~/directory$ ldd <Name-Of-Executable>
    • example output: libopencv_core.so.2.3 => /usr/local/lib/libopencv_core.so.2.3 (0x00007f33f28d0000)
    • if a library cannot be found, it will say so (like libpng12.so.0 => not found)
  • http://www.cyberciti.biz/tips/linux-shared-library-management.html - helpful for learning more about debugging shared library issues
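When a program links against many libraries, the ldd output gets long and the "not found" lines are easy to miss. As a rough sketch, the function below scans text in the format shown in the examples above and returns the names of the missing libraries. The name missingLibraries is made up for illustration, and getting the ldd output into a string (for example by redirecting it to a file) is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the names of libraries that ldd reported as "not found".
// Assumption: each line looks like "<name> => <path> (<address>)" or
// "<name> => not found", as in the example output above.
std::vector<std::string> missingLibraries(const std::string& lddOutput)
{
    const std::string marker = "=> not found";
    std::vector<std::string> missing;
    std::istringstream lines(lddOutput);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string::size_type pos = line.find(marker);
        if (pos == std::string::npos)
            continue;  // this library was resolved

        // Everything before the marker is the library name, padded
        // with whitespace on both sides.
        const std::string name = line.substr(0, pos);
        const std::string::size_type first = name.find_first_not_of(" \t");
        if (first == std::string::npos)
            continue;  // no name on this line
        const std::string::size_type last = name.find_last_not_of(" \t");
        assert(last >= first);
        missing.push_back(name.substr(first, last - first + 1));
    }
    return missing;
}
```

For the two example lines above, this returns a list containing only libpng12.so.0.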

Solution:
  • The segfault was occurring in an opencv file. I commented out the code that uses opencv, which I am not currently using anyway. Having done this, I now got linking errors from a colleague's code, which uses libjpeg, libpng, and libtiff. I had recently added sections of his code into my project.
  • To solve the linking errors, I added explicit links to these libraries in the .pro file. Specifically,
    • LIBS += /usr/lib/x86_64-linux-gnu/libpng12.so \
          /usr/lib/x86_64-linux-gnu/libjpeg.so \
          /usr/lib/x86_64-linux-gnu/libtiff.so

I'll admit that linker errors and runtime libraries trip me up way too often. My best guess is that the location of the libraries changed while I was gone.


Update: I uncommented the code that used opencv, and the program still executes correctly. Thus, the problem was solely with the missing libraries.