Mashable published an article titled “Google Translate Might have a Gender Problem”, citing a series of tweets as evidence. The complaint was that Google Translate renders the Turkish phrase “o bir doktor” as “he is a doctor”, even though the Turkish gives no gender information.
How did this happen? English uses gendered pronouns, he and she, but not all languages do. Turkish uses one pronoun, “o”, regardless of gender. This means that to translate a text from Turkish to English, a translator must decide whether to translate “o” as he or she. A human translator will look for evidence within the document to determine which pronoun to use in the translation.
Google Translate works in a different way. It is essentially a big data project that draws on existing translations on the internet and a statistical analysis of the proximity of words in phrases.
So the Google Translate engine has seen multiple instances where “o bir doktor” in Turkish was translated as “he is a doctor” in English. Or, where there are few matching translations to draw on, the word sequence “he is a doctor” is simply more frequent in English text. In fact another Google tool, Ngrams, illustrates how much more commonly we think of doctors as male. Ngrams compares data from books rather than internet sites, but it does reflect how our culture assigns gender to the occupation of doctor.
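To make the frequency argument concrete, here is a toy sketch of how a purely statistical approach ends up with “he”. It is not Google's actual system: the counts are invented for illustration, and the names `observed_translations`, `most_frequent_translation` and `translation_probabilities` are hypothetical. It only shows that picking the most commonly seen translation reproduces whatever skew exists in the source data.

```python
from collections import Counter

# Hypothetical counts of English translations seen for the Turkish phrase
# "o bir doktor". These numbers are invented for illustration only.
observed_translations = Counter({
    "he is a doctor": 80,
    "she is a doctor": 20,
})


def most_frequent_translation(counts: Counter) -> str:
    """Return the translation seen most often in the toy data."""
    translation, _count = counts.most_common(1)[0]
    return translation


def translation_probabilities(counts: Counter) -> dict:
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {text: count / total for text, count in counts.items()}


# The engine has no notion of who "o" refers to; it just follows the counts.
print(most_frequent_translation(observed_translations))
print(translation_probabilities(observed_translations))
```

With these made-up numbers the gendered choice comes entirely from the data: the selection step never looks at the document for clues about who “o” is, which is exactly what a human translator would do.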