{
"id":"ar.45",
"word":"عين",
"gloss":"عضو الإبصار في ...",
"pos":"n",
"electra":[0.4, 0.3, …],
"bertseg":[0.7, 2.9, …],
"bertmsa":[0.8, 1.4, …]
}
{
"context_id":"context.301",
"context":"يأتي برمجان اللغة العربية...",
"word": "اللغة",
"gloss_id":"gloss.205",
"lemma_id":"ar.301"
}
Each WSD dictionary entry contains the word (referenced by its lemma ID), a gloss, and a
gloss ID. The dictionary is derived from the "Contemporary Arabic Language Dictionary" by
Ahmed Mokhtar Omar (Omar, 2008). For example, the following two entries from the WSD
dictionary give two different glosses for the same lemma:
{
"lemma_id": "ar.301",
"gloss_id":"gloss.205",
"gloss":" كُلُّ وسيلة لتبادل المشاعر والأفكار كالإشارات ..."
}
{
"lemma_id": "ar.301",
"gloss_id":"gloss.211",
"gloss":"علم يخْتص بدراسة اللُّغة دراسة منهجيَّة في إطار..."
}
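The link between a WSD context entry and the dictionary can be sketched as a lookup on `lemma_id`. This is a minimal illustration, assuming the field names shown in the samples above; the helper names `build_gloss_index` and `candidate_glosses` are hypothetical and not part of the released data or tooling.

```python
from collections import defaultdict


def build_gloss_index(dictionary_entries):
    """Group dictionary entries by lemma_id -> list of (gloss_id, gloss)."""
    index = defaultdict(list)
    for entry in dictionary_entries:
        index[entry["lemma_id"]].append((entry["gloss_id"], entry["gloss"]))
    return index


def candidate_glosses(context_entry, index):
    """Return every gloss listed for the lemma of the target word."""
    return index.get(context_entry["lemma_id"], [])


# Entries copied from the samples above (glosses stay truncated as in the source).
dictionary_entries = [
    {"lemma_id": "ar.301", "gloss_id": "gloss.205",
     "gloss": " كُلُّ وسيلة لتبادل المشاعر والأفكار كالإشارات ..."},
    {"lemma_id": "ar.301", "gloss_id": "gloss.211",
     "gloss": "علم يخْتص بدراسة اللُّغة دراسة منهجيَّة في إطار..."},
]
context_entry = {
    "context_id": "context.301",
    "word": "اللغة",
    "gloss_id": "gloss.205",
    "lemma_id": "ar.301",
}

index = build_gloss_index(dictionary_entries)
candidates = candidate_glosses(context_entry, index)
for gloss_id, gloss in candidates:
    # Mark the gloss that the context entry labels as correct.
    marker = "*" if gloss_id == context_entry["gloss_id"] else " "
    print(marker, gloss_id, gloss)
```

Under this reading, disambiguation amounts to choosing one of the candidate glosses that share the context entry's `lemma_id`.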
Task | Train | Dev | Test |
---|---|---|---|
RD Entries | 31,372 | 3,921 | 3,921 |
WSD Entries | 22,404 | 2,801 | 2,801 |
WSD dictionary | 15,865 | | |
Model | Embedding | Dev cosine similarity | Dev MSE | Dev rank |
---|---|---|---|---|
CamelBERT | bertmsa | 81.85 | 21.95 | 1.09 |
CamelBERT | bertseg | 84.36 | 5.55 | 1.26 |
CamelBERT | electra | 51.13 | 24.28 | 3.34 |
MARBERT | bertmsa | 69.48 | 50.16 | 3.34 |
MARBERT | bertseg | 76.03 | 8.18 | 3.34 |
MARBERT | electra | 73.68 | 14.57 | 0.84 |
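The three metrics in the table above compare a predicted embedding with the target embedding. The sketch below shows one common way to compute them in plain Python. The definitions are assumptions, since this section does not spell them out: in particular, `cosine_rank` is taken here to be the fraction of targets that are more similar to a prediction than its own target, and the scaling used in the table (for example, percentages) is not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_squared_error(u, v):
    """Average squared difference between corresponding components."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def cosine_rank(predictions, targets, i):
    """Assumed definition: share of targets that have a strictly higher
    cosine with prediction i than the true target i has."""
    own = cosine_similarity(predictions[i], targets[i])
    closer = sum(
        1 for t in targets if cosine_similarity(predictions[i], t) > own
    )
    return closer / len(targets)


# Toy vectors, made up for illustration only.
predictions = [[1.0, 0.0], [0.0, 1.0]]
targets = [[0.0, 1.0], [1.0, 0.0]]

print(cosine_similarity(predictions[0], targets[1]))
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
print(cosine_rank(predictions, targets, 0))
```

With these definitions, higher cosine similarity is better, while lower MSE and lower rank are better.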
Model | Dev | Test |
---|---|---|
CamelBERT | 91.54% | 91.61% |
AraBERTv2 | 91.32% | 91.25% |
E5 +LSTTM | 89.50% | 88.83% |
{
"context_id":"context.301",
"gloss_id":"gloss.305",
"ranking_score": 0.9
}
{
"context_id":"context.301",
"gloss_id":"gloss.466",
"ranking_score": 0.7
}
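Output rows of this shape can be reduced to one predicted gloss per context by keeping the row with the highest `ranking_score`. The sketch below assumes that the top-ranked gloss is the one compared against the gold `gloss_id`; that scoring rule, the helper names, and the `context.999` rows are illustrative assumptions rather than the official evaluation.

```python
def top_gloss_per_context(predictions):
    """Keep, for each context_id, the gloss_id with the highest ranking_score."""
    best = {}
    for row in predictions:
        current = best.get(row["context_id"])
        if current is None or row["ranking_score"] > current["ranking_score"]:
            best[row["context_id"]] = row
    return {cid: row["gloss_id"] for cid, row in best.items()}


def accuracy(predictions, gold):
    """Fraction of gold contexts whose top-ranked gloss matches the gold gloss."""
    chosen = top_gloss_per_context(predictions)
    correct = sum(1 for cid, gid in gold.items() if chosen.get(cid) == gid)
    return correct / len(gold)


predictions = [
    # The two rows shown above.
    {"context_id": "context.301", "gloss_id": "gloss.305", "ranking_score": 0.9},
    {"context_id": "context.301", "gloss_id": "gloss.466", "ranking_score": 0.7},
    # Hypothetical rows, added only to make the example non-trivial.
    {"context_id": "context.999", "gloss_id": "gloss.1", "ranking_score": 0.2},
    {"context_id": "context.999", "gloss_id": "gloss.2", "ranking_score": 0.6},
]
gold = {
    "context.301": "gloss.205",  # gold label from the context sample earlier
    "context.999": "gloss.2",    # hypothetical gold label
}

print(top_gloss_per_context(predictions))
print(accuracy(predictions, gold))
```

Keeping the full ranked list in the output, rather than a single gloss, leaves room for rank-based scoring as well, should the evaluation use one.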