
Free OCR Software Reviewed

Many Indians who come to the Netherlands receive a lot of letters in Dutch. It takes great effort to understand their content. Google Translate (and similar services) can translate digitized content easily and with a good degree of accuracy, but it's still hard to translate printed material. One option is always to seek the help of colleagues, friends or neighbours, but sometimes it becomes one letter too many. Also, if the letter is personal, you may not want someone you know to read the contents.

Another option is to scan the documents you wish to translate and digitize their content. The way to do this is to run the scanned document through OCR software. OCR stands for Optical Character Recognition. The idea is to convert the scanned document into digital text from which you can copy and paste sentences and paragraphs. This text can then be used as input to Google Translate.

There are many OCR programs available on the market, but the idea is to use the cheapest possible option. Thankfully, many totally free options are available, though the results vary in speed and accuracy. So I put a few of these free options to the test. Here are my findings.

I typed the query 'free online ocr' into Google Search and got many pages of results. Relying on Google to rank the pages by some measure of popularity and usefulness, and ignoring the paid results, I shortlisted a few services for a simple test. I took a letter I received about the Electronic Patient Dossier sometime in 2009 and scanned it with my ageing Canon MP390 at 300 dpi in JPEG format. This is the standard output of a scanner, and 300 dpi should be enough for a good OCR job. I then submitted the document to the shortlisted services and reviewed the output. The focus was on how easy it was to upload a document, how long it took to return the digital content, and how accurate that content was.

free-ocr.com

This was the first search result for me. The interface is simple and clear. You can upload files in five formats, but each file is limited to 2 MB. To protect the service from abuse, reCAPTCHA has been implemented. The choice of input language is also wide; you can choose from 19 languages, and Dutch is one of them.

The output was available in little time and was presented as ASCII text. It was easy to copy and paste into another service.

The output, sadly, did not look very nice. About 20 percent of it was not recognized correctly and hence it was not a coherent document that could be translated by Google.
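A figure like "about 20 percent" can be estimated roughly by typing out a short passage of the letter by hand and comparing it with what the OCR service returned. The snippet below is only a minimal sketch of that idea using Python's standard difflib module; the function name word_accuracy and the sample sentences are made up for illustration and are not taken from the actual letter.

```python
import difflib


def word_accuracy(reference: str, ocr_output: str) -> float:
    """Return the share of reference words that also appear, in order, in the OCR output."""
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    if not ref_words:
        return 1.0  # nothing to recognize, so nothing was missed
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical sample: a hand-typed sentence and an OCR result with two misread words.
reference = "Uw gegevens worden veilig bewaard in het dossier"
ocr_output = "Uw gegevens w0rden veilig bewaard in het d0ssier"
print(f"Recognized correctly: {word_accuracy(reference, ocr_output):.0%}")
```

This counts whole words, so a single misread character makes the entire word count as wrong. That is arguably the right measure here, since Google Translate also stumbles over a word with one bad character.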

onlineocr.net

This was the second service in the list. It has advanced features that become available when you register and log in; that way you can store the output of your OCR jobs. Without logging in, you can use the service in 'guest mode', where you can upload up to 15 documents per hour.

You can upload documents in five formats. The maximum file size is 4 MB, better than the competition. The interface where you upload the document needs some improvement. You select the file, specify the source language and enter a simple captcha text. Then you have to go back up a little to click the 'Recognize' button, which seems a bit odd.

This is where the bad news stops. The output appears almost immediately, and its quality is very good. I found very few issues, such as nl being recognized as n1 in the URL. Also, the end of one paragraph and the beginning of the next was sometimes not clear, but that is a really minor annoyance. The translation was a great success as well.
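A slip like n1 for nl is easy to clean up before pasting the text into a translator. The snippet below is a small sketch, assuming the mistake only needs fixing at the end of the domain in web addresses; the function name fix_dutch_tld and the address www.example.n1 are made up for illustration.

```python
import re

# Tokens that look like web addresses: they start with http://, https:// or www.
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)

# ".n1" that ends the domain: followed by a path, port, query or fragment,
# or by nothing except trailing punctuation.
BAD_TLD = re.compile(r"\.n1(?=[/:?#]|[.,;:!?)]*$)")


def fix_dutch_tld(text: str) -> str:
    """Replace a misread '.n1' with '.nl', but only inside URL-like tokens."""
    def repair(match: re.Match) -> str:
        return BAD_TLD.sub(".nl", match.group(0))
    return URL_PATTERN.sub(repair, text)


# Hypothetical OCR output with the digit 1 in place of the letter l.
print(fix_dutch_tld("Kijk op www.example.n1 voor meer informatie."))
```

Text outside web addresses is left alone, so a phrase such as "versie n1" elsewhere in the letter is not changed.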

newocr.com

Perhaps the best service of all that were tested. The input interface is really easy: select the document to upload and select the source language. The selection of languages is really large, almost seventy, including Tamil, Telugu and other Indian languages. There is no mention of an upper limit on file size, and I did not hit one with the 650 KB file I was using. You then click the Preview button. This step takes a while, but the result is great.
The preview screen shows you the whole file you have uploaded. You are given an overlay not unlike those in scanning software. You can use it to restrict the OCR to a sub-section of the document and ignore things like logos at the top and the statutory footer information. This makes the OCR output clear, simple and precise. Here you also have the option to rotate the input file and to perform page layout analysis, which splits multi-column text into columns. This is great for documents like the fine print of services and bills, the instruction pages of forms for the Gemeente and the IND, and the like.

Once you set the area of the document you wish to be recognized, you click the OCR button. In a few seconds, you get the recognized text of the selected area of the document. The output I received was

free-online-ocr.com
