Thursday, February 24, 2005

Lost in Translation

I wish I could have talked with some more people in the industry or on the research side of machine translation, but I couldn't get it coordinated with my deadline for the article. Hopefully some knowledgeable readers will add to the content... the beauty of blogs. This is going up tomorrow.

UPDATE: Since the old AlwaysOn site was taken down and posts were not properly transferred, I'm pasting my article from my original Word doc here:

The Blogosphere: Lost in Translation?
It could be if translation technology can't keep pace with the instantaneous, spontaneous nature of communications on blogs and social networks.

Last week I was sitting with Vassil Mladjov (AO streaming guru) at Canvas Café talking about various business ideas, and he briefly brought up the subject of online translation software. I asked about the quality of the translation, and he said that at least for Russian to English, it’s good enough to get the basic ideas. In my experience, Korean to English was horrible for anything beyond basic words and short phrases. Anyway, I was not strongly interested at the time, so I carried on for a few more sentences and we moved on to another topic. Last Friday, as I did my daily read-through of various technology news rags, I came across an article from Reuters, “Google Online Book Plan Sparks French War of Words”, that discussed the “war cry” France’s national library raised over concerns that Google’s effort to put some of the world's great libraries on the Internet could lead to a “domination of American ideas” and the English language. Even though I was somewhat amused by the French, I began to think about how much of a real concern this was to the rest of the world, and wondered whether language would become such a barrier to the communication and transfer of ideas in the future. I thought about my mother’s favorite Christian philosopher, Jacques Ellul, since she tried to force his books on me as much as Allan Bloom’s “The Closing of the American Mind” during my high school years. If his books hadn’t been translated, my mother would never have experienced those joys and I would have escaped the torment of her persistence.

Then I thought: if Ellul had a blog, how would someone like my mother be able to benefit from and be edified by his writing, since it would be in French? I began to think about other topics and issues around the globe that people would be interested in reading about from people at ground zero. What would other Iraqis, those who do not know English, unlike Mohammed and Omar of the Iraq The Model blog, write about if there were a blog service in their native tongue? What about listening to the voices from Rwanda’s horrible past? Or learning about the latest Japanese gadgets from Japanese bloggers months or years before they reach the pages of WIRED Magazine or Engadget?

So it became clear to me that one of the next big things for the blogosphere is instantaneous translation of languages for blogs. English to Spanish, French to English, Arabic to Japanese, etc. Even though “globalization” is an old, hot buzzword and people consistently talk about a “smaller world,” blogs and social networks are driving it smaller still. People talk about the dominance of Asians on Friendster, the large Brazilian contingent on Orkut, or the changing mix on LinkedIn. In the blogosphere, there is a growing world of timely, relevant and important information on all subject matters in many languages. To be limited by language would be an unfortunate bind.

The technology for translating text in documents or on the web is called Machine Translation (MT). Some of the leaders are France-based SYSTRAN and Israel-based Babylon. Companies such as Ford, Cisco, and Google use SYSTRAN’s MT technology. Have you ever seen “translate this page” when a foreign language site comes up in Google? SYSTRAN. Have you ever clicked through? If you try Korean for a news article or blog post, the translation results are at best 50% accurate and typically worse. IDC states:

“MT systems work with natural language - a data set that is infinitely varying, ambiguous, and structurally complex. To translate adequately, an MT system must encode knowledge of hundreds of syntactic patterns, variations, and exceptions, as well as relationships among these patterns… A human translator prioritizes and selectively applies linguistic rules based on this knowledge. MT software, unless explicitly coded for each possibility, cannot. Thus, MT will never attain the overall quality of human translation. The primary advantages of MT over human translation are speed, cost, and consistency.”

Since the current technology is far from adequate, it seems it may be several years before we can reap the benefits of MT technology capable of translating Japanese blogs on cool gadgets and Italian blogs on the latest wines.

