Need a translation of a PDF file? Google Docs does it for you

Google recently added Optical Character Recognition (OCR) in multiple languages to Google Docs. This means you can now use Google Docs to convert PDF files to text and then translate them. Does it work well? Try it yourself:

1. Go to Google Docs and click the Upload button.

2. Check the "Convert text from PDF or image files to Google Docs documents" box. Then select the language the PDF is written in from the drop-down list. Important: because Google Docs uses OCR to convert the PDF to text, you must select the correct language so that Google can recognize the words and character structures.

3. Click the Start Upload button.

4. When the upload is complete, open the file in Google Docs. From the main menu, select Tools > Translate Document.

5. Enter a name for the translated file and click OK. The translation is displayed in Google Docs.

I tested this new feature on a few good-quality PDFs. The OCR on its own seems to work fairly well, but the combination of OCR and translation produced poor translations. But who can complain? Google Docs is free, and this new feature will be very useful to many people.

See also: last year I wrote a blog post on how to use Google Translate for translating a PDF file.

4 thoughts on “Need a translation of a PDF file? Google Docs does it for you”

  1. Combining two technologies that are well below 100% reliable, such as optical character recognition (OCR) and machine translation, multiplies the errors the system makes. Even a small error in recognizing characters (one a human would not care about) causes the machine to translate the wrong text, with unpredictable results. In effect, you at least double an already high error rate. This obviously limits the usefulness of OCR plus machine translation, but remember that this technology is trying to solve a problem for which there are currently no other solutions! So I think this solution from Google has its place in the market, despite its clear problems.

    A brief note about Google’s PDF translation technology in general: it removes all images and charts from the PDF, which sometimes makes the translated document useless. This has created a market niche for companies like ours: we have developed a tool for machine-translating PDF files that preserves the images and charts in the document.

