I see a number of very positive and useful suggestions being offered
on this Group, but I rarely see any comments by GT staff. There is
only a small number of comments per day, not exactly overwhelming.
For example, I had a simple question and suggestion regarding words
that are not translated. It would seem like GT could provide a simple
answer, and indeed, would wish to incorporate the suggestion. But
simply no reply.
Permit me to repeat that recommendation, and to offer another
straightforward issue that so far has been completely ignored by GT.
-------------
Untranslated words.
When GT is unable to translate a word, it simply inserts the same word
into the translation. There is no indication that this is not a valid
translation but simply the original word. It is very difficult to find
such errors, especially in similar languages such as English <> Dutch.
Since GT knows that it cannot translate the word, it would be easy to
flag the translation with something like *gobblygoop* that would
alert the user to the fact that GT found no suitable translation. In
other words, use an * or two *s, or something else, to flag
untranslated words.
-------------------
"The quick brown fox jumps over the lazy dog." (a standard test
sentence) is not translated AT ALL from English into Dutch. I
submitted the correct translation a couple of times in the last year,
but no improvement in GT.
There is one person from Google who posts here, and apparently several
more who read without posting. That they don't say anything doesn't
mean they don't hear you. Maybe they aren't allowed to say exactly
what they are working at. I would be surprised if they weren't working
at some sort of transliteration/pronunciation scheme, since it's the
#1 most requested feature.
Also remember that although there isn't all that much volume on the
group, there are a lot of weird and ill-explained suggestions, a lot
of questions that are explained in the FAQ, etc. It would not be
surprising if they missed a few good ones in all the noise.
"The quick brown fox jumps over the lazy dog" is a classic test
sentence to help you judge the look of a font, since it includes every
letter in the (English) alphabet at least once - that's what they call
a pangram. If you translate it literally, for example to
Norwegian ("Den raske brune reven hopper over den late hunden"), it is
no longer a pangram!
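A minimal sketch of the pangram test described above, assuming the plain 26-letter English alphabet as the yardstick. The function names are made up for illustration.

```python
import string


def missing_letters(sentence, alphabet=string.ascii_lowercase):
    """Return the letters of `alphabet` that never occur in `sentence`."""
    seen = set(sentence.lower())
    return [ch for ch in alphabet if ch not in seen]


def is_pangram(sentence, alphabet=string.ascii_lowercase):
    """A sentence is a pangram when no letter of the alphabet is missing."""
    return not missing_letters(sentence, alphabet)


english = "The quick brown fox jumps over the lazy dog"
norwegian_literal = "Den raske brune reven hopper over den late hunden"

print(is_pangram(english))
print(is_pangram(norwegian_literal))
print(missing_letters(norwegian_literal))
```

The literal Norwegian sentence drops letters such as q, x and z, so it no longer serves the font-testing purpose of the original.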
In most contexts, the correct translation would be to a pangram in the
target language, like for Norwegian: "Høvdingens kjære squaw får litt
pizza i Mexico by" (The chief's dear squaw gets some pizza in Mexico
City). You can't really expect the translator to come up with pangrams
on its own; it has to learn them from somewhere. It's not really
surprising that it hasn't learned a Dutch pangram yet (like "Pa's
wijze lynx bezag vroom het fikse aquaduct" - was it that one you
submitted?).
I think there may be some sort of sanity check in the suggestions
system to prevent suggestions that have a meaning widely different
from what the translator would believe. The translator would probably
expect (wrongly) that a translation of "The quick brown fox jumps over
the lazy dog" should include some words about dogs and foxes. I've
tried to teach the translator that the French game "La Guerre des
Moutons" is called "Wooly Bully" in English, but it still hasn't
picked it up...
But on the whole, I appreciate this sanity check, otherwise we'd see a
lot more translations about where to get viagra!
> "The quick brown fox jumps over the lazy dog" is a classic test
> sentence to help you judge the look of a font, since it includes every
> letter in the (English) alphabet at least once - that's what they call
> a pangram. If you translate it literally, for example to
> Norwegian ("Den raske brune reven hopper over den late hunden"), it is
> no longer a pangram!
Yes. However, Google Translate should not be a place where people go
to find out what are the pangrams in other languages. It should be a
place where a person comes to find out what the sentence of the
pangram in one language is in another language. In other words, it
should be the translation of the sentence, not the replacement of one
pangram with another. That is more the function of professional
translation software or Wikipedia, etc.
My other suggestion, that untranslated words be clearly identified, is
so basic and fundamental that I can't believe that it is not already a
part of Google Translate. Can you imagine having professional
translation software that didn't tell you when it was unable to
translate a word, but simply used the same word as if it were the
correct translation? You would surely object. You would quickly lose
trust in it.
Maybe if Google Translate people would provide some feedback on this
Forum, the ratio of good to bad posts would significantly improve. Not
many people are going to bother posting here if it appears nobody from
GT is even reading them.
This is essentially the difference between human quality and near
human quality. In Arabic I am simply asking for technical accuracy:
for example, the 4th power of the temperature rather than 4 times the
temperature, and the correct surface area for a sphere.
On Sep 16, 9:16 am, Harald Korneliussen wrote:
> There is one person from Google who posts here, and apparently several
> more who read without posting. That they don't say anything doesn't
> mean they don't hear you. Maybe they aren't allowed to say exactly
> what they are working at. I would be surprised if they weren't working
> at some sort of transliteration/pronunciation scheme, since it's the
> #1 most requested feature.
Not only that but Google is working on robots and avatars. Why can't
an Arab or Chinese avatar (which I am sure they are working on) be
incorporated into Google Translate?
> Also remember that although there isn't all that much volume on the
> group, there are a lot of weird and ill-explained suggestions, a lot
> of questions that are explained in the FAQ, etc. It would not be
> surprising if they missed a few good ones in all the noise.
> "The quick brown fox jumps over the lazy dog" is a classic test
> sentence to help you judge the look of a font, since it includes every
> letter in the (English) alphabet at least once - that's what they call
> a pangram. If you translate it literally, for example to
> Norwegian ("Den raske brune reven hopper over den late hunden"), it is
> no longer a pangram!
> In most contexts, the correct translation would be to a pangram in the
> target language, like for Norwegian: "Høvdingens kjære squaw får litt
> pizza i Mexico by" (The chief's dear squaw gets some pizza in Mexico
> City). You can't really expect the translator to come up with pangrams
> on its own; it has to learn them from somewhere. It's not really
> surprising that it hasn't learned a Dutch pangram yet (like "Pa's
> wijze lynx bezag vroom het fikse aquaduct" - was it that one you
> submitted?).
> I think there may be some sort of sanity check in the suggestions
> system to prevent suggestions that have a meaning widely different
> from what the translator would believe. The translator would probably
> expect (wrongly) that a translation of "The quick brown fox jumps over
> the lazy dog" should include some words about dogs and foxes. I've
> tried to teach the translator that the French game "La Guerre des
> Moutons" is called "Wooly Bully" in English, but it still hasn't
> picked it up...
That surprises me. Google is said to work on n-grams. Can't "guerre
des moutons" be put in as an n-gram?
I found these references on a simple Google search. The French one was
the top of the list. The second reference actually gives "guerre des
moutons" and then translates as "war of the sheep".
Check: the Wiki reference is top of the list in BOTH cases. Google is
capable of finding out these things in a straight search. Why can't it
make use of such information when translating?
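To make the n-gram idea concrete, here is a toy sketch of a phrase table with greedy longest-match lookup. It is a deliberate simplification: a statistical system weighs competing candidates rather than doing a fixed lookup, and every table entry below is invented for illustration.

```python
def translate(tokens, phrase_table, max_n=4):
    """Greedy longest-match lookup: try the longest n-gram first, then shorter ones."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in phrase_table:
                out.append(phrase_table[key])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown token: copied through unchanged
            i += 1
    return " ".join(out)


# Toy entries, invented for illustration only.
PHRASES = {
    ("la", "guerre", "des", "moutons"): "Wooly Bully",
    ("j'aime",): "I like",
    ("la",): "the",
    ("guerre",): "war",
    ("des",): "of the",
    ("moutons",): "sheep",
}
WORDS_ONLY = {k: v for k, v in PHRASES.items() if len(k) == 1}

sentence = "J'aime La Guerre des Moutons".split()
print(translate(sentence, PHRASES))     # the 4-gram entry wins
print(translate(sentence, WORDS_ONLY))  # word-by-word fallback
```

With the four-word entry present the title comes out as "Wooly Bully"; without it the lookup falls back to the word-by-word "the war of the sheep".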
Here is a clear example of why it is so important for Google Translate
to clearly mark words that it can't translate.
From English into Dutch:
The arm of the man was extra large.
De arm van de man was extra groot.
That is a perfectly good translation because arm, man and extra are
the same in English and Dutch. However, if you replace man with a
non-word or difficult word that GT cannot translate:
The arm of the goroply was extra large.
De arm van de goroply was extra groot.
How is the user to know that goroply was not translated? I assume this
problem occurs in all language pairs in GT. For languages that are
completely different, that is not a problem. But for all those that
are similar (and that is many), then it is a serious problem. The user
would have to question every word that was translated as the same
word. For language pairs like English <> Dutch, that is a lot of
words. Why can't GT simply mark untranslated words?
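A rough sketch of the marking being asked for, using a toy word-for-word lexicon. The lexicon, the function name and the asterisk marker are all assumptions made for illustration, not a description of how GT works internally.

```python
# Toy English -> Dutch lexicon covering only the example sentence.
EN_NL = {
    "the": "de", "arm": "arm", "of": "van", "man": "man",
    "was": "was", "extra": "extra", "large": "groot",
}


def translate_and_flag(sentence, lexicon, mark="*"):
    """Word-for-word toy translation; words with no lexicon entry are wrapped in `mark`."""
    out = []
    for word in sentence.rstrip(".").split():
        target = lexicon.get(word.lower())
        if target is None:
            out.append(f"{mark}{word}{mark}")  # flag instead of copying silently
        else:
            out.append(target.capitalize() if word[:1].isupper() else target)
    return " ".join(out) + "."


print(translate_and_flag("The arm of the man was extra large.", EN_NL))
print(translate_and_flag("The arm of the goroply was extra large.", EN_NL))
```

Here "arm", "man" and "extra" pass through unmarked because the lexicon has entries for them, while "goroply" is flagged because it has none. That is the distinction the user cannot currently see.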
> > "The quick brown fox jumps over the lazy dog" is a classic test
> > sentence to help you judge the look of a font, since it includes every
> > letter in the (English) alphabet at least once - that's what they call
> > a pangram. If you translate it literally, for example to
> > Norwegian ("Den raske brune reven hopper over den late hunden"), it is
> > no longer a pangram!
> Yes. However, Google Translate should not be a place where people go
> to find out what are the pangrams in other languages. It should be a
> place where a person comes to find out what the sentence of the
> pangram in one language is in another language. In other words, it
> should be the translation of the sentence, not the replacement of one
> pangram with another. That is more the function of professional
> translation software or Wikipedia, etc.
Oo, no no no no no. Unlike traditional machine translation systems, GT
is in principle capable of recognising language features like this (it
hasn't for this one, yet, but the fact that it's left untranslated
probably means GT has already noticed there's SOMETHING unusual about
that sentence!) This is what's awesome about GT, why would you want it
different?
There hardly is such a thing as a literal translation anyway.
Translation always needs context. Sometimes "the quick brown fox..."
is a pangram, and should be recognized as such, just as "Little Rock"
should not be translated to "Lille Stein" if we're talking about the
place in Arkansas!
The request for HTML/XML tags to mark text that should not be
translated was made some time ago. It does not seem that complicated
to me. It would be a useful facility if you want to design a
multilingual website.
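For what it is worth, HTML does have markers of this kind: the `translate="no"` attribute and the `notranslate` class. Below is a small sketch of how a tool could honour them when collecting text to translate. The parser class and the sample markup are hypothetical, and this says nothing about what GT itself does.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # elements with no end tag


class TranslatableText(HTMLParser):
    """Collect text segments, skipping anything marked as not to be translated."""

    def __init__(self):
        super().__init__()
        self.skip_stack = []  # one flag per open element
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        marked = attrs.get("translate") == "no" or "notranslate" in classes
        inherited = self.skip_stack[-1] if self.skip_stack else False
        self.skip_stack.append(marked or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_stack:
            self.skip_stack.pop()

    def handle_data(self, data):
        skipping = self.skip_stack[-1] if self.skip_stack else False
        if data.strip() and not skipping:
            self.segments.append(data.strip())


SAMPLE = '<p>Open the table <code translate="no">tbl_orders</code> first.</p>'
parser = TranslatableText()
parser.feed(SAMPLE)
print(parser.segments)
```

Only the text outside the marked element is collected, so an identifier such as a table name would reach the page untouched.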
> I see a number of very positive and useful suggestions being offered
> on this Group, but I rarely see any comments by GT staff. There is
> only a small number of comments per day, not exactly overwhelming.
> For example, I had a simple question and suggestion regarding words
> that are not translated. It would seem like GT could provide a simple
> answer, and indeed, would wish to incorporate the suggestion. But
> simply no reply.
> Permit me to repeat that recommendation, and to offer another
> straightforward issue that so far has been completely ignored by GT.
> -------------
> Untranslated words.
> When GT is unable to translate a word, it simply inserts the same word
> into the translation. There is no indication that this is not a valid
> translation but simply the original word. It is very difficult to find
> such errors, especially in similar languages such as English <> Dutch.
> Since GT knows that it cannot translate the word, it would be easy to
> flag the translation with something like *gobblygoop* that would
> alert the user to the fact that GT found no suitable translation. In
> other words, use an * or two *s, or something else, to flag
> untranslated words.
> -------------------
> "The quick brown fox jumps over the lazy dog." (a standard test
> sentence) is not translated AT ALL from English into Dutch. I
> submitted the correct translation a couple of times in the last year,
> but no improvement in GT.
Could I take a simple example? If a document is in Arabic, why can't
there be a rule that everything in any other script (Roman or Greek)
is a formula? In that way we would get the surface area of a sphere
correct. There are other means of detecting formulae, like having
single letters, symbols and numbers.
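A rough sketch of that rule: in Arabic text, any run of tokens without Arabic characters that contains at least one Latin or Greek letter is treated as a formula and left alone. The function names and the sample sentence are my own assumptions, and the script test relies on Unicode character names, which is only a coarse heuristic.

```python
import unicodedata


def script_of(ch):
    """Return a coarse script label taken from the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in ("ARABIC", "LATIN", "GREEK"):
        if name.startswith(script):
            return script
    return "OTHER"


def find_formulas(text):
    """Return runs of non-Arabic tokens that contain a Latin or Greek letter."""
    formulas, run = [], []

    def flush():
        candidate = " ".join(run)
        if any(script_of(ch) in ("LATIN", "GREEK") for ch in candidate):
            formulas.append(candidate)
        run.clear()

    for token in text.split():
        if any(script_of(ch) == "ARABIC" for ch in token):
            flush()  # an Arabic word ends the current run
        else:
            run.append(token)
    flush()
    return formulas


sample = "مساحة سطح الكرة هي A = 4πr² تقريبا"
print(find_formulas(sample))
```

Digits and symbols on their own (a year, say) are not reported, because the run must contain at least one Latin or Greek letter.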
> What happens if a person who knows nothing about a pangram, (and
> couldn't care less) wants to translate the following sentence into
> Dutch?
> "A quick brown fox chased a rabbit into the woods."
GT's strength and limitation is that it's a statistical translation
system - I really suggest you read up on it if you haven't.
Statistically, people are very unlikely to want to talk about quick
brown foxes, or even brown foxes, unless it's in the context of that
pangram, or someone making an oblique reference to it (such as the
Quick Brown Fox design consultancy in Dublin, Ireland). The Google
translator _will fail_ when faced with sentences that are very
unlikely - just how unlikely they can be allowed to get while still
giving sensible translations, is what improving the system is all
about.
Statistical translation has the potential to _both_ translate "A quick
brown fox jumped over... " into the appropriate pangram/example
phrase, and still translate your person's query correctly - although
it doesn't today, it may eventually encounter enough different cases
that it "understands" what to do with them. (But as I said, since non-
pangram quick brown foxes are extremely rare, it's no surprise it
doesn't yet).
Yes. Quick brown fox very rarely means a fast brown fox. So, yes, GT
goes with the vast majority. I have read quite a bit about statistical
translation, but there is not much information about how ST is
corrected when it gets things wrong. You have to have a mix of
statistics and rules, and they fight each other. That is probably the
reason there doesn't seem to be much happening when GT users submit
improvements. How do they tell GT to ignore the statistics and use a
look-up table for some words? And how is GT going to handle things
like a new phrase that becomes very popular, such as the name of a hot
new music band with a name such as "The Blue Sky"? To borrow from
another recent post in this Forum, why hasn't GT picked up on 't as an
abbreviation for "het" (the) in Dutch? It occurs probably millions of
times. And for all the phrases that are very common in a language but
that rarely if ever occur in another language, do all those have to be
entered manually? For example: "Tis the season to be jolly." is from a
song but now popularly refers to Christmas. It gets 7,540,000 hits in
Google. But GT fails on "tis" into Dutch. Sure, GT does great when
there is a lot of very accurate parallel text available, such as EU
parliament and Canadian parliament transcripts. But the overwhelming
portion of the Internet isn't even vaguely parallel. So much as I like
GT (I use it a lot), it may have a lot of trouble getting much better,
unless a huge amount of "manpower" is applied to help balance the
statistics and the rules.
>For example: "Tis the season to be jolly." is from a
> song but now popularly refers to Christmas.
I entered "'tis the season to be jolly, falalalala la la lala
la" (notice the initial ' ) and got out
"det er sesongen for å være blid, falalalala la la lala la"
Can't judge the Dutch translation, but this one is pretty OK (though
not perfect: "season" should become "årstiden" or "tiden", not
"sesongen", which is more the kind of season you have in "hunting
season").
> So much as I like
> GT (I use it a lot), it may have a lot of trouble getting much better,
> unless a huge amount of "manpower" is applied to help balance the
> statistics and the rules.
The essence is that n-grams in the source language are matched to n-
grams in the target language. There is a lot of complexity about
weighting etc., but that is what essentially happens. Google does not
use generics, e.g. {Proper Name} or {Country}. This is the way in which
a language is taught to humans. Google effectively needs a "Direct
Product" to use the mathematical juju term. The fact that Direct
Products can be large is the reason why US troops committed atrocities
in Burma.
There is also truth. I have spoken about the Stefan-Boltzmann law and
the surface area of a sphere. The Vietnam war ended in 1975 and no US
troops have been in SE Asia since then. Google searches establish
this.
I do appreciate that many people would prefer to see untranslated
words flagged, rather than just left in place. However, there are
others who would prefer GT to be left the way it is, myself included.
I am a professional translator, and my specialty is the translation of
technical documents from Dutch into English. I use GT as a tool, and
as such, I am aware that it has limitations. Sometimes the translation
is perfect and can be used as-is, sometimes it can be used with some
rearrangement or re-translation of words, and sometimes the
translation is terrible, even reversing the sense of the original in
some cases. I find the dictionary view of single root words to be
especially useful.
My documents frequently contain words that cannot, or should not, be
translated. Examples would include the names of software objects such
as database tables, etc. It is common to have several of these in a
sentence, and I would be most unhappy if these were replaced by
********'s or some such.
My only real gripe about the service is the non-translation of
abbreviations. I am not talking about common acronyms, but abbreviated
forms such as enz (enzovoorts, and so forth, 'etc') or t/m (tot en met
= up to and including - used for ranges). Abbreviations are very
common in most Dutch texts, and lack of GT support means that most
documents will be improperly translated. I hate to say this, but
Yahoo's Babelfish has the edge here, although this too fails on some
common abbreviations.
If GT is relying on parallel texts from the European Parliament, then
it is likely that these *extremely formal* documents would not contain
any abbreviations and thus fail to be truly representative of the
Dutch language.
> It is common to have several of these in a
> sentence, and I would be most unhappy if these were replaced by
> ********'s or some such.
That's not what we want (well, not what I want anyway). Rather, I'd
like an option for these words to be coloured differently in the web
interface. You could still cut and paste, and uploaded documents etc.
would not be affected.
> > It is common to have several of these in a
> > sentence, and I would be most unhappy if these were replaced by
> > ********'s or some such.
> That's not what we want (well, not what I want anyway). Rather, I'd
> like an option for these words to be coloured differently in the web
> interface. You could still cut and paste, and uploaded documents etc.
> would not be affected.
Yes, this could also be a good solution. It would also be a good idea
to make the list of untranslated words available through an AJAX
interface.
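A sketch of what such a response could look like: the translation itself stays clean, and the untranslated words travel alongside it as a list that a web page could use for colouring. The field names and the toy lexicon are invented for illustration; no real GT interface is implied.

```python
import json

# Toy English -> Dutch lexicon, for illustration only.
EN_NL = {
    "the": "de", "arm": "arm", "of": "van", "man": "man",
    "was": "was", "extra": "extra", "large": "groot",
}


def translate_with_report(sentence, lexicon):
    """Translate word for word and report which words had no lexicon entry."""
    translated, untranslated = [], []
    for index, word in enumerate(sentence.rstrip(".").split()):
        target = lexicon.get(word.lower())
        if target is None:
            untranslated.append({"index": index, "word": word})
            target = word  # leave the text itself unchanged
        elif word[:1].isupper():
            target = target.capitalize()
        translated.append(target)
    return {"translation": " ".join(translated) + ".", "untranslated": untranslated}


report = translate_with_report("The arm of the goroply was extra large.", EN_NL)
print(json.dumps(report, indent=2))
```

Because the markers live outside the text, cut and paste and uploaded documents would be unaffected, which addresses the objection about asterisks in the output.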
"Pa's wijze lynx bezag vroom het fikse aquaduct"
In English: "Pa's wise lynx devoutly saw the firm aqueduct".
The amazing thing is that GT gives this as "Dad's way lynx piously
looked the lazy dog". No kidding - try it yourself.
One other point: the sentence is supposedly a pangram but it *doesn't*
contain the letter 'j'.
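Whether the 'j' counts as missing depends on how the Dutch digraph "ij" is treated. A small sketch, where the `digraphs` parameter is my own illustrative addition: with a plain 26-letter check the j in "wijze" counts, whereas if "ij" is taken as a single letter there is no standalone j left.

```python
import string


def missing_letters(sentence, alphabet=string.ascii_lowercase, digraphs=()):
    """Return letters of `alphabet` absent from `sentence`.

    Any string in `digraphs` is treated as one unit, so its component
    letters do not count towards the alphabet check.
    """
    text = sentence.lower()
    for digraph in digraphs:
        text = text.replace(digraph, " ")
    seen = set(text)
    return [ch for ch in alphabet if ch not in seen]


dutch = "Pa's wijze lynx bezag vroom het fikse aquaduct"

print(missing_letters(dutch))                    # plain 26-letter check
print(missing_letters(dutch, digraphs=("ij",)))  # "ij" counted as one letter
```

The first call reports nothing missing; the second reports only j, since the i still appears in "fikse".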