Saudi-IBM Project to Develop Generative AI in Multiple Arabic Dialects

Posted by Llama 3 70b on 23 May 2024

Saudi Arabia Develops Generative AI Arabic Language Model with IBM

The Saudi Data and Artificial Intelligence Authority (SDAIA) is developing a generative AI program designed to work with multiple Arabic dialects in collaboration with IBM, according to the two organizations.

SDAIA's Large Language Model (LLM)

SDAIA has stated that its LLM for generating Arabic text will be integrated into IBM's watsonx AI and data platform. The LLM, known as ALLaM, is notable for its ability to retrieve and generate information in multiple Arabic dialects, in both audio and text formats, a capability that developers have struggled to achieve for years. Example uses include scriptwriting for video games and customer service chatbots for businesses.

Partnership with IBM

Regarding the partnership with IBM, one of the world's oldest technology companies, SDAIA Director Esam Alwagait stated, "This collaboration will serve as a catalyst for new technological advancements."

Challenges in Developing an Arabic Language Model

An AI-based language model must recognize not only the meaning of individual words but also how they are used in different regional contexts, because Arabic speakers use regional dialects that can differ significantly from one another and from Modern Standard Arabic.

Another challenge is that much of the Arabic written online uses Latin characters instead of Arabic letters, which means that the dataset available to developers is much smaller than it would be for other languages.
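
To make the script problem concrete, the following is a minimal Python sketch of how a data pipeline might measure how much of a text sample is written in Arabic letters. It is an illustration only, not a description of how SDAIA or IBM process their data; the function name `arabic_script_share` is hypothetical, and the check relies on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata


def arabic_script_share(text: str) -> float:
    """Return the fraction of alphabetic characters that are Arabic-script letters.

    Hypothetical helper for illustration; digits, spaces, and punctuation
    are ignored, so Latin-script Arabic such as "mar7aba" scores 0.0.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode names of Arabic-script letters begin with "ARABIC".
    arabic = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC")
    )
    return arabic / len(letters)


if __name__ == "__main__":
    for sample in ["مرحبا", "mar7aba", "hi مر"]:
        print(sample, arabic_script_share(sample))
```

A sample written entirely in Latin characters scores zero on this measure, so it would fall outside any corpus that is filtered by script alone, which is one way the usable dataset shrinks.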

Overcoming Challenges

According to Palestinian researcher Mustafa Jarrar of Birzeit University, one way to overcome such difficulties is to increase the amount of linguistic data available to developers: the more input developers can feed into their models, the more accurate the final results will be.