The KSAA-2026 Shared Task introduces a new multimodal benchmark focused on transforming raw Arabic speech transcripts into fully diacritized text. This challenge targets diacritic restoration, a persistent and unresolved problem in Arabic NLP, stemming from lexical ambiguity, syntactic variation, and the absence of diacritics in most written texts.
Automatic diacritization of speech dictation remains a challenging task due to the mismatch between speech-based transcriptions and traditional text-only diacritization approaches. While ASR systems often produce undiacritized or partially normalized text, text-based diacritization models are not designed to leverage acoustic information. This shared task aims to bridge this gap by focusing on speech-aware diacritization.
This shared task comprises two subtasks: Task 1 (Data Contribution) and Task 2 (Automatic Diacritization of Speech Dictation).
Participants need to register via this link: https://forms.office.com/r/KF4bvNNASP?origin=lprLink
CODABENCH platform link: https://www.codabench.org/competitions/11859/
The dataset consists of approximately five hours of Arabic speech audio collected via VoiceWall, a crowdsourcing audio platform developed by the King Salman Global Academy for Arabic Language. The recordings were obtained from male and female speakers and cover Modern Standard Arabic (MSA) as well as Arabic dialectal speech.
All utterances are short, with a maximum duration of nine seconds, to support accurate speech–text alignment and automatic diacritization. The recordings span multiple domains and underwent automatic validation and manual review to ensure audio quality and transcription accuracy.
The annotation process involved aligning the speech with written transcripts and ensuring diacritic accuracy. Multiple layers of quality control were implemented, including file normalization, systematic labeling, and manual reviews of diacritization to guarantee consistency and reliability.
Task 1: Data Contribution
Each team is required to contribute at least one hour of speech data. All submitted recordings will undergo automatic validation followed by a manual review conducted by another member within the same team, based on a shared evaluation guideline provided to all participants.
Contributors are required to follow predefined transcription and diacritization guidelines to ensure consistency between speech content and textual representation.
After validation, the contributed data will be released to all participating teams to support fair benchmarking and encourage continuous dataset growth and diversity.
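As a rough illustration of the contribution requirements, the sketch below shows the kind of local pre-check a team might run before submitting, assuming the contribution is a directory of WAV clips. The directory name, the check itself, and the nine-second limit taken from the dataset description are illustrative only; the organizers' official validation pipeline may differ.

```python
import wave
from pathlib import Path

MAX_DURATION_S = 9.0  # utterances in the dataset are at most nine seconds long


def clip_duration_seconds(path: Path) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_contribution(audio_dir: str) -> None:
    """Report the total contributed duration and flag clips that are too long."""
    total = 0.0
    for path in sorted(Path(audio_dir).glob("*.wav")):
        duration = clip_duration_seconds(path)
        total += duration
        if duration > MAX_DURATION_S:
            print(f"TOO LONG ({duration:.1f}s): {path.name}")
    print(f"Total contributed speech: {total / 3600:.2f} hours (target: at least 1 hour)")


if __name__ == "__main__":
    check_contribution("my_contribution/audio")  # hypothetical directory of WAV clips
```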
Task 2: Automatic Diacritization of Speech Dictation
In this task, participants are required to build systems that take speech audio and undiacritized transcripts as input and generate fully diacritized Arabic text.
The task requires predicting full Arabic diacritics at the character level, including fatḥa, ḍamma, kasra, sukūn, shaddah, and tanwīn marks, for each character in the undiacritized text.
The table below illustrates the input–output data structure for the task.
| | Field | Example |
| --- | --- | --- |
| Input | Speech clip | |
| Input | Undiacritized transcript | أريد أن أشرب كوبًا من الشاي |
| Output | Diacritized transcript | أُرِيدُ أَنْ أَشْرَبَ كُوبًا مِنَ الشَّاي |
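For clarity, the sketch below shows one way to relate the diacritized output to the undiacritized input at the character level, using the Unicode combining marks for the diacritics named above (fatḥa, ḍamma, kasra, sukūn, shadda, and the tanwīn marks). It is purely illustrative: the helper names are our own, and the official undiacritized inputs may retain conventional marks (for example, the tanwīn in كوبًا in the table above).

```python
# Unicode combining marks for the diacritics listed above (illustrative set).
DIACRITICS = {
    "\u064B",  # fathatan (tanwin fath)
    "\u064C",  # dammatan (tanwin damm)
    "\u064D",  # kasratan (tanwin kasr)
    "\u064E",  # fatha
    "\u064F",  # damma
    "\u0650",  # kasra
    "\u0651",  # shadda
    "\u0652",  # sukun
}


def strip_diacritics(text: str) -> str:
    """Remove diacritic marks, approximating the undiacritized transcript."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def char_labels(diacritized: str) -> list:
    """Pair each base character with the diacritic sequence attached to it,
    i.e. the character-level labels a system has to predict."""
    pairs = []
    for ch in diacritized:
        if ch in DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


example = "أُرِيدُ أَنْ أَشْرَبَ كُوبًا مِنَ الشَّاي"
print(strip_diacritics(example))  # base characters only
print(char_labels(example)[:4])   # first few (character, diacritics) pairs
```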
Task 1
The data contribution will be evaluated based on the duration and quality of the contributed speech recordings.
Task 2
Systems are evaluated using three complementary metrics: Diacritic Error Rate (DER), Word Error Rate (WER), and Sentence Error Rate (SER).
Among these, WER is considered the primary evaluation measure, as it requires the full word to be diacritized correctly and therefore provides a stricter assessment of system performance. DER and SER are reported as complementary metrics, providing finer-grained and sentence-level analysis, respectively.
To ensure a comprehensive and transparent evaluation, results are reported under two evaluation settings that reflect different levels of linguistic difficulty: with case endings and without case endings.
Case endings (iʿrāb) correspond to the final-word diacritics that encode grammatical roles and represent the most challenging aspect of Arabic diacritization due to their strong dependence on syntactic context.
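As a rough illustration of these metrics and of the case-ending setting, the sketch below scores hypothesis sentences against references under the simplifying assumption that both share the same base (undiacritized) characters word for word, so only the predicted diacritics are compared. The function names are ours, the "including/excluding no diacritic" distinction is not implemented, and the official scoring script may handle alignment and edge cases differently.

```python
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")


def split_chars(word: str) -> list:
    """Pair each base character with its attached diacritic marks."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            pairs[-1] = (pairs[-1][0], pairs[-1][1] + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def drop_case_ending(word: str) -> str:
    """Remove the diacritics on the last base character (the i'rab mark)."""
    pairs = split_chars(word)
    if pairs:
        pairs[-1] = (pairs[-1][0], "")
    return "".join(base + marks for base, marks in pairs)


def score(refs: list, hyps: list, case_ending: bool = True) -> dict:
    """Compute DER, WER and SER (in %) for sentence pairs whose base
    characters already match; only the predicted diacritics are compared."""
    char_err = char_tot = word_err = word_tot = sent_err = 0
    for ref, hyp in zip(refs, hyps):
        if not case_ending:  # "without case ending" setting
            ref = " ".join(drop_case_ending(w) for w in ref.split())
            hyp = " ".join(drop_case_ending(w) for w in hyp.split())
        sent_err += int(ref != hyp)
        for rw, hw in zip(ref.split(), hyp.split()):
            word_tot += 1
            word_err += int(rw != hw)
            for (_, rm), (_, hm) in zip(split_chars(rw), split_chars(hw)):
                char_tot += 1
                char_err += int(rm != hm)
    return {"DER": 100 * char_err / max(char_tot, 1),
            "WER": 100 * word_err / max(word_tot, 1),
            "SER": 100 * sent_err / max(len(refs), 1)}
```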
We provide two baseline systems corresponding to the two participation tracks. These baselines are intended as reference implementations to illustrate the task setup and are not optimized for performance.
Baseline results are reported under both evaluation settings described above to illustrate the impact of case endings (iʿrāb) on system performance.
| Evaluation Setting (%) | | Text+ASR | | | Text-only | | | Fine-Tuned Text+ASR | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | DER | WER | SER | DER | WER | SER | DER | WER | SER |
| Including no diacritic | With case ending | 16.16 | 47.96 | 86.54 | 19.38 | 54.21 | 95.00 | 10.70 | 36.60 | 90.77 |
| Including no diacritic | Without case ending | 10.98 | 28.22 | 79.62 | 13.49 | 32.07 | 85.38 | 7.47 | 21.35 | 76.92 |
| Excluding no diacritic | With case ending | 17.57 | 43.33 | 83.08 | 22.28 | 51.44 | 94.62 | 12.04 | 34.32 | 89.23 |
| Excluding no diacritic | Without case ending | 10.72 | 22.21 | 73.08 | 14.35 | 27.29 | 82.31 | 7.78 | 18.39 | 73.85 |

* Lower is better
[ { "id": "utt_00123", "text_diacritized": "النص المُشَكَّل هنا" }, { "id": "utt_00124", "text_diacritized": "هَذا نَصٌّ مُشَكَّلٌ آخَر" } ] |
Participants must generate a fully diacritized transcript
for each provided audio + undiacritized text pair.
Output format will follow a simple JSON structure (released
with the final data package):
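As a minimal sketch of producing a submission in this structure, the snippet below writes predictions for a list of test items. The diacritize placeholder, the item fields other than id, and the output file name are assumptions; the final data package will define the exact format.

```python
import json


def diacritize(audio_path: str, text: str) -> str:
    """Placeholder for a participant's model: return the fully diacritized text."""
    raise NotImplementedError


def write_submission(test_items: list, out_path: str = "submission.json") -> None:
    """Write predictions in the JSON structure shown above."""
    predictions = [{"id": item["id"],
                    "text_diacritized": diacritize(item["audio_path"], item["text"])}
                   for item in test_items]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)
```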
We are pleased to announce the awards for the Shared Task at LREC 2026. The top-ranked teams in each task will receive cash prizes. The winners will be determined based on the official evaluation metrics specified for each task. Best of luck to all the teams, and we look forward to announcing the winners at the conclusion of the competition!
Email: aalwazrah@ksaa.gov.sa, ralrasheed@ksaa.gov.sa