Statistical machine translation live

Friday, April 28, 2006 at 03:40:00 PM



Because we want to provide everyone with access to all the world's information, including information written in every language, one of the exciting projects at Google Research is machine translation. Most state-of-the-art commercial machine translation systems in use today have been developed using a rules-based approach and require a lot of work by linguists to define vocabularies and grammars.

Several research systems, including ours, take a different approach: we feed the computer billions of words of text, both monolingual text in the target language and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We have achieved very good results in research evaluations.
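To give a feel for what "statistical learning" on aligned text can look like, here is a minimal sketch. It uses IBM Model 1, a classic textbook word-translation model trained with expectation-maximization, and is not a description of the system discussed in this post. The function name `train_ibm_model1`, the toy German-English sentence pairs, and the iteration count are all assumptions made for illustration.

```python
def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) from aligned sentence pairs.

    pairs: list of (source_tokens, target_tokens) tuples.
    Returns a dict keyed by (target_word, source_word).
    Illustrative sketch only: no NULL word, no smoothing.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start with a uniform probability for every co-occurring word pair.
    t = {(tw, sw): uniform for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = {}  # expected count of (target_word, source_word) links
        total = {}  # expected count of links leaving each source word
        # E-step: spread each target word over the source words in its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                # Sum over every source word that tw could be aligned to.
                z = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / z
                    count[(tw, sw)] = count.get((tw, sw), 0.0) + frac
                    total[sw] = total.get(sw, 0.0) + frac
        # M-step: renormalize expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


# Hypothetical toy corpus: three aligned German-English sentence pairs.
toy_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
table = train_ibm_model1(toy_pairs)
for (tw, sw), p in sorted(table.items(), key=lambda kv: -kv[1])[:4]:
    print(f"t({tw} | {sw}) = {p:.3f}")
```

No dictionary is supplied: the model works out that "das" goes with "the" only because the two words keep appearing in the same sentence pairs. A full system would also need a language model trained on the monolingual text, plus a way to handle reordering, neither of which this sketch attempts.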

Now you can see the results for yourself. We recently launched an online version of our system for Arabic-English and English-Arabic. Try it out! Arabic is a very challenging language to translate to and from: it requires long-distance reordering of words and has a very rich morphology. Our system works better for some types of text (e.g. news) than for others (e.g. novels) -- and you probably should not try to translate poetry ... but do stay tuned for more exciting developments.

Update: We've just opened a discussion forum for all topics related to machine translation.

Update: Fixed broken link to NIST results.

8 comments:

Ken said...

Our company is proud to feature the Google Translate Tool: http://www.analtech.com/analtech.com-uses-google-translate-to-offer-site-in-more-than-20-languages.html

AK said...

Try translating the following word from Arabic to English with GT:
غاصب

I get:
Mystified vision also precluded discrimination
Word un-alignment or statistical learning gone horribly wrong? ;-)

Edward J. Yoon said...

Hmm. English-Korean and Korean-English translation of the corpus still seems really bad.

BTW, how and where do you gather translated texts?

Is map/reduce used for summing the probabilities of all alignments?

asiri said...

The link "very good results" is broken.
Can you fix it?

Beto said...

What about Spanish-English translations? More useful than Arabic.

lycos3 said...

That's an awesome technology. I'd really like to know more about it. I'm especially interested because Google has its own versatile way of doing things, so I am waiting for something exciting.

Amit Rathod said...

Do you have a Gujarati-English and English-Gujarati translator?

A CS Student said...

Malayalam???