Following up on the previous post and Mark Liberman’s Language Log post, I thought I’d delve a little deeper into the Babel Fish translation of the opening paragraph of the Leiden University newsletter article on the WNT going online. In Dutch, it goes as follows:

Met ingang van zaterdag 27 januari is het Woordenboek der Nederlandsche Taal (WNT) voor iedereen gratis op het internet te raadplegen. Is dit nieuws alleen van belang voor neerlandici, filologen en taalkundigen? “Magnifiek”, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde.

I gave a translation in my previous post, which I will modify here slightly to be more literal:

As of Saturday 27 January, the Dictionary of the Dutch Language will be on the internet for everyone to consult for free. Is this news only important to scholars of Dutch, philologists and linguists? “Magnificent,” responds Harm Beukers, professor in history of medicine.

Altavista’s Babel Fish service gives the following translation, as already provided by Mark in his LL post:

As of Saturday 27 January the dictionary is for free consult language (WNT) for everyone of the Nederlandsche on the Internet. Is this news important only for neerlandici, philologists and linguists? “magnificent”, Harm tired cherry, hoogleraar history of medicine react.

Surprisingly, the type of quotation marks used around magnifiek matters to Babel Fish. I used double quotes, whereas Mark used single quotes, resulting in a slightly different translation:

  • “Magnifiek”, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde. is translated as “magnificent”, Harm tired cherry, hoogleraar history of medicine react.
  • ‘Magnifiek’, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde. is translated as ‘ magnificently, react Harm tired cherry, hoogleraar history of medicine.

Why is the capital M lost in both cases? Why is the closing quotation mark lost in the single-quotes case? Why is a space inserted after the opening quotation mark in the single-quotes case? Why does the position of react depend on the type of quotation marks?

Moving on to the rest of the text, it is clear that the old-style spelling of Nederlandsche is confusing. Using modern spelling (Nederlandse), the translation is better, but still not good: As of Saturday 27 January the dictionary is for free consult of the Dutch language (WNT) for everyone on the Internet.

The word neerlandici (scholars of the Dutch language) is left untranslated. The singular form, neerlandicus, is also unknown to Babel Fish. The word hoogleraar (professor), a rather common Dutch word and easy to translate, poses another problem.

Most puzzling is the transformation of the name Harm Beukers into Harm tired cherry. If I remove the rest of the text, leaving only the name, it yields the same translation. Removing the first name, leaving only Beukers, results in a translation of tired cherry. This makes absolutely zero sense to me. A Google search on {“beukers” “tired cherry”} comes up empty, only adding to my wonder. Where did Babel Fish pull this from? [Update (January 24th, 2007): See the comments for a further discussion and a likely answer.]

On a related note, can anyone tell me what Beukers means as a last name? As a word, it is something like batterers or bashers, from beuken, to batter, to bash, but I doubt it means the same as a name. If no one knows, I’ll consult the WNT in a few days and see what I can find there.

[Update 2 (January 24th, 2007): Mark Liberman posted a follow-up on Language Log, adding his thoughts on Babel Fish’s handling of Beukers and hoogleraar and machine translation in general.]
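The explanation suggested in the comments below is that Babel Fish treats an unknown word as a compound and translates the pieces, so that Beukers becomes beu (tired) plus kers (cherry). Here is a minimal sketch of that idea in Python. It is not Babel Fish's actual algorithm, which I have no way of inspecting. The LEXICON table and the function names split_compound and translate are made up for illustration, and the word list contains only a few of the words mentioned in the comments.

```python
# Toy Dutch-to-English lexicon (hypothetical; only words from the comments).
LEXICON = {
    "beu": "tired",
    "kers": "cherry",
    "beuk": "beech",
    "peer": "pear",
    "appel": "apple",
    "zoete": "sweet",
    "rode": "red",
    "visser": "fisherman",
}


def split_compound(word, lexicon=LEXICON):
    """Split `word` into known lexicon entries, or return None if impossible.

    Tries the longest matching prefix first and backtracks when the
    remainder cannot be split.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def translate(word, lexicon=LEXICON):
    """Translate a word part by part; leave it untouched if it cannot be split."""
    parts = split_compound(word, lexicon)
    if parts is None:
        return word  # unknown word: pass through untranslated
    return " ".join(lexicon[part] for part in parts)


if __name__ == "__main__":
    for w in ["Beukers", "Beubeuk", "beuzoetekers", "hoogleraar"]:
        print(w, "->", translate(w))
```

With this toy lexicon, Beukers comes out as tired cherry and beuzoetekers as tired sweet cherry, while a word that cannot be split, such as hoogleraar here, is passed through unchanged. A real system would presumably restrict which words may act as the first part of a compound, which is exactly what seems to have gone wrong with beu.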

9 thoughts on “Professor Harm tired cherry”

  1. Beats me. I have never seen it do that before, though, truth be told, I hardly ever use it, being a living and breathing translator myself. It does the same with the made-up name Beubeuk, rendering ‘tired beech’. That would seem to suggest a pattern, but Beueik is left untranslated.

  2. The “tired cherry” pattern also holds for other fruits, including those with more than one syllable: beupeer (pear), beuappel (apple), beubanaan (banana), beumandarijn (mandarin) and even beusinaasappel (orange) are all translated as tired X. Don’t like fruits? Babel Fish provides tired vegetables as well, like beusla (lettuce) and beuwortel (carrot). That goes with a beubiefstuk (steak) and some beuaardappelen (potatoes); beupatat (fries/chips) is not on the menu, unfortunately. All of this is served by be(a)utiful, though somewhat weary, beumannen (men) and beuvrouwen (women) in your local beurestaurant.

    Oddly, another Dutch word for tired (in the sense of needing sleep), moe, does not produce tired X with any of the words above. Why not? Or perhaps the better question is, why does it happen with beu? I don’t know any Dutch compound words with beu as the first part, certainly not if the second part is a noun.

    I can also answer the question from my previous comment: Babel Fish does not invent a space. It simply thinks all these beuxyz words are compounds. It goes even further, correctly picking out the adjectives zoete and rode in beuzoetekers and beurodekers to yield tired sweet cherry and tired red cherry.

  3. Somehow Babelfish has been made to conceive of the word ‘beu’ as an extremely productive adjectival prefix (or whatever it is called in linguistics).

    By the way, ‘beuvisser’ yields ‘tired fisherman’…

  4. It seems clear that Babelfish simply does not recognize the open single quote as valid Dutch, so it inserts it into the output as a stand-alone word, the first word in the sentence (though it cannot be capitalized). Since “Magnifiek” is no longer quoted, it must be an adverb modifying “reageert”, and since it is no longer the first word in the sentence, it is not capitalized either.

    Why it can’t deal with “hoogleraar” is a mystery.

  5. The site of the Meertens Instituut says that the name Beukers is related to a profession:

    My 1970 Dikke Van Dale writes that “beuken” means to hit the stockfish until it becomes soft. Apparently, it used to be someone’s profession to do this all day.

    On a Language Log-related side note: I was a bit surprised to see the Eskimo snow hoax translated into a Dutch water hoax, in last week’s NRC, by someone from Leiden University.

  6. For every target language, Babelfish apparently uses different algorithms to analyse the source language. Beuker in the Dutch-to-French version is correctly rendered as Beuker…

