
Free OCR Software Reviewed

Many Indians who come to the Netherlands receive a lot of letters in Dutch, and it takes great effort to understand them. Google Translate (and other services like it) can translate digital text easily and with a good degree of accuracy, but it's still hard to translate printed material. You can always ask your colleagues, friends or neighbours for help, but at some point it becomes one letter too many. Also, if the letter is personal, you may not want someone you know to read it.

One option is to scan the documents you want translated and digitize their content by running the scans through OCR software. OCR stands for Optical Character Recognition. It converts the scanned image into digital text from which you can copy and paste sentences and paragraphs. This text can then be used as input to Google Translate.

There are many OCR packages on the market, but the idea is to use the cheapest possible option. Thankfully, many totally free options are available, though the results vary in speed and accuracy. So I put a few of these free options to the test. Here are my findings.

I typed the query 'free online ocr' into Google Search and got many pages of results. Relying on Google to rank the pages by some measure of popularity and usefulness, and ignoring the sponsored results, I shortlisted a few services for a simple test. I took a letter I got about the Electronic Patient Dossier sometime in 2009 and scanned it with my ageing Canon MP390 at 300 dpi in JPEG format. JPEG is a standard output format for scanners, and 300 dpi should be enough for a good OCR job. I then submitted the document to the shortlisted services and reviewed the output. The focus was on how easy it was to upload the document, how long the service took to return the digital text, and how accurate that text was.

free-ocr.com

This was the first search result for me. The interface is simple and clear. You can upload files in five formats, but each file is limited to 2 MB. To protect the service from abuse, reCAPTCHA has been implemented. The choice of input language is also very wide; you can choose from 19 languages, and Dutch is one of them.

The output was available in little time and was presented as plain ASCII text. It was easy to copy and paste into another service.

The output, sadly, did not look very nice. About 20 percent of it was not recognized correctly, so the result was not a coherent document that Google could translate.
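A figure like "about 20 percent" is an eyeball estimate. If you have typed out a short passage of the letter by hand, a rough similarity score can be computed with Python's standard difflib module. This is only a sketch: the function name ocr_accuracy and the sample sentences are made up for illustration (they are not from my letter), and the ratio is a character-level similarity, not a proper word error rate.

```python
from difflib import SequenceMatcher


def ocr_accuracy(reference: str, ocr_output: str) -> float:
    """Return a 0..1 similarity ratio between hand-checked text and OCR output."""
    # Collapse runs of whitespace so that different line breaks
    # in the OCR output are not counted as recognition errors.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    if not ref and not out:
        return 1.0  # two empty texts are trivially identical
    return SequenceMatcher(None, ref, out).ratio()


if __name__ == "__main__":
    # Invented sample: the letter "o" misread as the digit "0".
    reference = "Uw gegevens worden opgenomen in het dossier."
    ocr_output = "Uw gegevens w0rden opgen0men in het d0ssier."
    print(f"similarity: {ocr_accuracy(reference, ocr_output):.2f}")
```

Running the same passage through each service and comparing the scores gives a slightly more objective ranking than reading the outputs side by side.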

onlineocr.net

This was the second service in the list. It has advanced features: you can register and log in, which lets you store the outputs of your OCR jobs. Without logging in, you can use the service in 'guest mode', where you can upload up to 15 documents per hour.

You can upload documents in five formats. The maximum file size is 4 MB, better than the competition. The upload interface needs some improvement. You select the file, specify the source language and enter a simple captcha text. Then you have to go back up a little to click the 'Recognize' button, which is a bit odd.

This is where the bad news stops. The time to present the output is minimal, and the output quality is very good. I found only a few issues, like nl being recognized as n1 in the URL. Also, the end of one paragraph and the beginning of the next was sometimes not clear, but that is a really minor annoyance. The translation was a great success as well.
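The nl/n1 mix-up is easy to repair before pasting the text into a translator. Here is a minimal sketch using Python's re module. The function name fix_nl_tld and the example address are my own inventions, and the pattern only handles this one confusion (a host name ending in ".n1"), so treat it as a starting point rather than a general OCR cleaner.

```python
import re

# A dotted host name whose last label is "n1" (letter n + digit one),
# e.g. "www.example.n1". The negative lookahead rejects cases where the
# label continues ("n12") or where another label follows ("n1.com").
_N1_TLD = re.compile(r"\b((?:[A-Za-z0-9-]+\.)+)n1(?![A-Za-z0-9-]|\.[A-Za-z0-9])")


def fix_nl_tld(text: str) -> str:
    """Replace a trailing '.n1' in host names with '.nl'."""
    return _N1_TLD.sub(r"\g<1>nl", text)


if __name__ == "__main__":
    # Hypothetical address, used only to show the replacement.
    print(fix_nl_tld("Zie www.example.n1 voor meer informatie."))
```

The pattern deliberately requires at least one dot before "n1", so a stray "n1" in ordinary text is left alone.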

newocr.com

Perhaps the best service of all that were tested. The input interface is really easy: select the document to upload and select the source language. The selection of languages is really large, almost seventy, including Tamil, Telugu and other Indian languages. No upper limit on file size is mentioned, and I did not hit one with the 650 KB file I was using. You then click the Preview button. This step takes a while, but the result is great.
The preview screen shows you the whole file you have uploaded, with an overlay not unlike those provided by scanning software. You can restrict the OCR effort to a sub-section of the document and ignore things like logos at the top and the statutory footer information. This makes the OCR output clear, simple and precise. Here you also have the option to rotate the input file and to perform page layout analysis, which splits multi-column text into columns. This is great for documents like the fine print of services and bills, instruction pages of forms for the Gemeente and the IND, and the like.

Once you set the area of the document you wish to be recognized, you click the OCR button. In a few seconds, you get the recognized text of the selected area of the document. The output I received was

free-online-ocr.com
