The vastness of internet content can shrink or grow according to the language in which the user chooses to browse.
A first-of-its-kind report measures the scale of linguistic inequality on the internet worldwide: to use the 39 platforms analyzed, which include Wikipedia, YouTube and Facebook, 90% of Africans and Asians depend on a second language.
The State of Internet Languages Report, the first of its kind in the world, brought together the organizations Whose Knowledge?, the Oxford Internet Institute and India’s Center for Internet and Society to produce and gather information on linguistic diversity across leading online platforms. In addition to the institutions, more than a thousand collaborators took part, including speakers, reviewers and translators.
According to the report, more than three quarters of Internet users browse in just ten languages.
Of all internet users, 25.9% browse in English and 19.4% in a language from the Chinese family, such as Mandarin. Spanish speakers, the third group in the ranking, trail by more than ten percentage points, accounting for only 7.9% of internet users. Portuguese, used by 3.7%, ranks sixth.
The content offered on the internet follows a similar logic: European colonial languages predominate. Wikipedia, the collaborative online encyclopedia, is available in more than 300 languages, but only 20 of them have more than 1 million articles, and only 70 have more than 100,000.
“Information on places in Europe and North America is highly detailed, while several other regions of the world are relatively underrepresented, especially places in Africa, parts of Asia and other regions of the Global South,” the report says.
This inequality can lead to the paradoxical situation of users having to switch from their mother tongue to another language to learn more about their own country.
Lack of content can also be a problem, the report said, especially for already excluded populations.
“When there is information and knowledge in other, more marginalized languages, content in those languages is limited by who has access and power to create it, or to prevent others from producing alternative information,” the report says.
“For example, the lack of feminist online content in Sinhalese [spoken in Sri Lanka] or the lack of positive content for LGBT+ people and people with disabilities in Bengali [spoken in Bangladesh] or Bahasa [spoken in Indonesia].”
Just as the knowledge accessed depends on the language, the world also expands or contracts according to the language spoken by the user. The finding comes from an analysis of 44 search terms in Google Maps, the main geolocation tool in the world. The terms included words for common places such as cafe, church and hairdresser.
While in English the information is voluminous and spread around the globe, in Bengali the results are found almost exclusively in India, Bangladesh and Bhutan. Some languages are barely represented on the map at all.
“Despite efforts to examine coverage of Zulu and Xhosa in South Africa and Guarani in Paraguay, these languages are barely represented on Google Maps, despite being spoken by millions,” the report concludes.
“Even Swahili — one of the 15 most spoken languages in the world — is virtually absent, and in two of the cities where it is spoken, English content is predominant.”
This reinforces researchers’ impression that African languages are less supported on major platforms.
There are also technical limitations for these languages. Unicode, a standard that allows the encoding and manipulation of text on computers, currently has 143,859 characters for about 30 writing systems, which do not cover all languages.
The report's launch coincides with the first year of the decade of action for indigenous languages, which runs until 2032. UNESCO defined the initiative at the end of 2019, which was itself the year of indigenous languages.
“Digital technologies offer us many possibilities to represent the forms of language that are based on texts, sounds, gestures and much more,” states the report.
Of the more than 7,000 languages that exist today, 4,000 have writing systems, many of them developed during periods of colonization and incomprehensible to most speakers.
“Languages with an oral tradition don’t fit on the internet we have today,” says Ana Alonso, a Zapotec linguist from Oaxaca, Mexico, in an excerpt from the document.
According to the report, and contrary to what the study observed in practice, technology could help preserve languages at risk of extinction, a situation that applies to 40% of the 7,000 languages spoken today.
“Each month, two indigenous languages and the knowledge they express die and are lost to us,” says the report.