Amid the global race to develop artificial intelligence, and the problems stemming from its expansion, Europeans are competing to build their own AI chatbots in order to protect their economies, culture and even their languages.

From Madrid to Sofia, European Union countries have begun supporting a slew of initiatives aimed at creating chatbots that speak local languages fluently, Politico reports.

The latest AI-powered tools, such as the popular chatbot ChatGPT, are based on large language models, or LLMs. Language is at the core of these innovations, and the EU, a Tower of Babel with 24 official languages, from Lithuanian to Maltese, wants this technology to work within its own cultural context.

A case in point is French Economy Minister Bruno Le Maire, who said at a technology event in Cannes in February: “we don’t want to be content with just English… Moving forward, we don’t want our language to be weakened by algorithms and artificial intelligence systems.”

The United States leads the field, with ChatGPT maker OpenAI and its major backer Microsoft, and Google with its Gemini model, while Anthropic, Meta and Elon Musk’s xAI are also racing to build leading models.

The pace of the American industry worries European governments, which fear a repeat of the dominance American companies achieved in the age of social media and Web 2.0.

From academic ventures to government-funded masterplans to startups and groups of indie coders, Europe is fighting back against the California behemoths. In the last year alone, 13 European countries have announced or taken steps to develop local models focused on their local languages, according to a POLITICO investigation.

Most of the projects, whether existing or in development, are open source, relying on a huge community of volunteer developers in an effort to bridge the computing and funding gaps with the US.

For some countries, such as Spain, models in the local language could help strengthen their influence in places culturally and historically connected to them. Madrid, which is funding the creation of a Spanish-language LLM based on a set of high-quality Spanish content for AI training, is seeking closer cooperation with Ibero-American countries through this technology.

France has also spearheaded the creation of Alt-EDIC, a consortium of 12 EU countries dedicated to cooperation within the bloc to develop LLMs in European languages.

Lost in translation

The irony is that to be truly competitive, European LLMs will need to be fluent in English, which remains the language of most of the world’s scientific work and accounts for just over half the pages on the World Wide Web, according to online surveys by W3Techs.

“There is a power imbalance in the quantity and quality of training data: just look at how big the English Wikipedia is compared to its versions in other languages,” said Sebastian Ruder, a researcher at the Canada-based multilingual AI firm Cohere.

Some US-made LLMs know languages other than English, but they don’t always have the proficiency and detail needed to respond to local users.

For chatbots designed to hold entire conversations with users ranging from a country’s citizens to a company’s customers, this can pose problems. An August 2023 “cultural alignment” assessment by researchers at University College London found that OpenAI’s and Google’s LLMs do not meet cultural standards in countries such as China, Saudi Arabia and Slovakia.

As artificial intelligence becomes entrenched across all sectors of society, the impact of such culture clashes could be significant. Kris Shrishak, technology fellow at the Irish Council for Civil Liberties, said: “A US tech company can train their model in, say, Lithuanian, but that’s damaging. So they usually train it in English and then improve it.”

The solution, according to Ruder, is for European AI developers to train their bots in both their own language and in English, thereby allowing the LLM to tap into the knowledge encoded in English when speaking its native language.