Key NLP Resources & Collaborations

Your main access points for datasets, models, and long-term collaborations

Explore the core sources powering Arabic NLP research — models, datasets, repositories, and institutional collaborations.

Featured Arabic NLP Resources

AraFinNews

212,500 financial news articles from Argaam.com for summarisation, event extraction, and financial NLP.

ArabJobs

8,546 job ads with metadata for gender, salary, profession, and dialect-sensitive labour market analysis.

MCWC

Multilingual constitutions from 191 nations, aligned for legal NLP and MT research.

Kalimat

18,256 cleaned Arabic news articles in a modern ML-ready format.

Habibi

30,000+ Arabic song lyrics across 18 countries for dialect, authorship, and cultural NLP.

EASC

153 documents with 765 human-made summaries, a foundational resource for Arabic extractive summarisation.

MultiLing

Multilingual single-document and multi-document abstractive summarisation in over 10 languages.

Arabic Dialects

A corpus focused on bivalency and code-switching, covering five major Arabic dialect varieties.

DARES

Dataset for Arabic Readability Estimation of School Materials: transformer-based readability evaluation of school materials in Saudi Arabia, covering grades 1 to 12.

OSMAN

OSMAN is a readability metric for Arabic and the closest counterpart to the Flesch measure. The repository also includes a parallel Arabic–English dataset from the UN corpus.




