Many Indians who move to the Netherlands receive a steady stream of letters in Dutch, and it takes great effort to understand them. Google Translate (and other services like it) can translate digital text easily and with a good degree of accuracy, but it's still hard to translate printed material. You can always ask your colleagues, friends or neighbours for help, but sometimes it becomes one letter too many. Also, if the letter is personal, you may not want someone you know to read it.
One option is to scan the documents you want translated and digitize their content by running the scans through OCR software. OCR stands for Optical Character Recognition. The idea is to convert the scanned image into digital text from which you can copy and paste sentences and paragraphs. That text can then be used as input to Google Translate.
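The last step, handing the recognized text to Google Translate, can also be done with a link instead of copy and paste. Here is a minimal Python sketch. It assumes the Google Translate web page accepts the `sl`, `tl`, `text` and `op` query parameters; that is a URL pattern observed in the browser, not a documented API, and it may change. The function name `translate_url` is made up for this example.

```python
from urllib.parse import urlencode

def translate_url(text, source="nl", target="en"):
    """Build a link that opens Google Translate with the text pre-filled."""
    # Assumption: these parameter names come from the web page's own URLs,
    # not from a documented API.
    query = urlencode({"sl": source, "tl": target, "text": text, "op": "translate"})
    return "https://translate.google.com/?" + query

# Made-up sample text; paste the OCR output here instead.
print(translate_url("Geachte heer, mevrouw"))
```

A whole letter makes for a very long link, so this is only practical for a paragraph or two at a time.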
There are many OCR packages on the market, but the idea is to use the cheapest possible option. Thankfully, many completely free options are available, although the results vary in speed and accuracy. So I put a few of these free options to the test. Here are my findings.
I typed the query 'free online ocr' into Google Search and got many pages of results. Relying on Google to rank the pages by popularity and usefulness, and ignoring the sponsored results, I shortlisted a few services for a simple test. I took a letter I received about the Electronic Patient Dossier sometime in 2009 and scanned it with my ageing Canon MP390 at 300 dpi in JPEG format. This is the standard output of a scanner, and 300 dpi should be enough for a good OCR job. I then submitted the scan to each shortlisted service and reviewed the output. I focused on how easy it was to upload the document, how long the service took to return the text, and how accurate that text was.
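To get a feel for what 300 dpi means for the upload, here is a rough calculation in Python. It assumes the letter is an A4 sheet (210 × 297 mm) scanned in 24-bit colour; neither is stated above, and `scan_dimensions` is a name made up for this sketch.

```python
MM_PER_INCH = 25.4

def scan_dimensions(width_mm, height_mm, dpi):
    """Return the pixel width and height of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))

width_px, height_px = scan_dimensions(210, 297, 300)  # assumed A4 page
raw_bytes = width_px * height_px * 3                   # 3 bytes per pixel, uncompressed
print(width_px, height_px, round(raw_bytes / 1_000_000, 1))  # prints 2480 3508 26.1
```

Uncompressed, such a scan would be about 26 MB, so it is the JPEG compression that brings the file down to a size that is practical to upload.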
free-ocr.com
This was the first search result for me. The interface is simple and clear. You can upload files in five formats, but each file is limited to 2 MB. To protect the service from abuse, reCAPTCHA has been implemented. The choice of input language is also very wide; you can choose from 19 languages, and Dutch is one of them.
The output was available in little time and was presented as ASCII text. It was easy to copy and paste into another service.
The output, sadly, did not look very nice. About 20 percent of it was not recognized correctly and hence it was not a coherent document that could be translated by Google.
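The 20 percent figure is an estimate by eye. If you want a number, one possible approach (not the one used for this test) is to type a short passage by hand and compare it with the OCR output word by word. The sketch below uses Python's standard `difflib` module; the function name `word_accuracy` and the sample sentence are made up for illustration.

```python
import difflib

def word_accuracy(reference, ocr_text):
    """Similarity between two texts, compared word by word (1.0 means identical)."""
    ref_words = reference.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    return matcher.ratio()

# Made-up sample sentence, not taken from the letter used in the test.
typed_by_hand = "Uw gegevens worden veilig bewaard"
from_ocr = "Uw gegevens w0rden veilig bewaard"
print(word_accuracy(typed_by_hand, from_ocr))  # prints 0.8 (4 of 5 words match)
```

A score around 0.8 would correspond to roughly one word in five being wrong, which is about what I saw here.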
onlineocr.net
This was the second service in the list. It has advanced features: you can register and log in, which lets you store the output of your OCR jobs. Without logging in, you can use the service in 'guest mode', where you can upload up to 15 documents per hour.
You can upload documents in five formats. The maximum file size is 4 MB, better than the competition. The upload interface needs some improvement. You select the file, specify the source language and enter a simple captcha text. Then you have to go back up the page a little to click the 'Recognize' button, which seems a bit odd.
This is where the bad news stops. The output arrives almost immediately, and its quality is very good. I found only a few issues, such as 'nl' being recognized as 'n1' in the URL. Also, the boundary between the end of one paragraph and the beginning of the next was sometimes unclear, but that is a really minor annoyance. The translation was a great success as well.
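Slips like 'n1' for 'nl' are easy to patch before the text goes to the translator. Here is a small Python sketch using a regular expression. The pattern and the function name `fix_nl_domain` are illustrative, and it handles only this one digit-for-letter confusion at the end of a dotted host name.

```python
import re

# A dotted host name whose last label is "n1", e.g. "www.example.n1".
_N1_DOMAIN = re.compile(r"\b((?:[a-z0-9-]+\.)+)n1\b", re.IGNORECASE)

def fix_nl_domain(text):
    """Replace a trailing ".n1" in host-like tokens with ".nl"."""
    return _N1_DOMAIN.sub(r"\g<1>nl", text)

# Made-up sample sentence with a placeholder address.
print(fix_nl_domain("Kijk op www.example.n1/info voor meer informatie."))
```

Text without a dot in front of 'n1' is left alone, so an ordinary word or code that happens to contain 'n1' is not touched.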
newocr.com
Perhaps the best service of all those tested. The input interface is really easy: select the document to upload and select the source language. The selection of languages is really large, almost seventy, including Tamil, Telugu and other Indian languages. No upper limit on file size is mentioned, and I did not hit one with the 650 KB file I was using. You then click the Preview button. This step takes a while, but the result is great.
The preview screen shows you the whole file you have uploaded, with an overlay not unlike those in scanning software. You can restrict the OCR to a sub-section of the document and ignore things like logos at the top and the statutory footer information. This makes the OCR output clear, simple and precise. You can also rotate the input file and perform page layout analysis, which splits multi-column text into columns. This is great for documents like the fine print of services and bills, the instruction pages of forms for the Gemeente and the IND, and the like.
Once you have set the area of the document you want recognized, you click the OCR button. In a few seconds, you get the recognized text of the selected area. The output I received was
free-online-ocr.com