Sundanese Text Classification

Text classification is the process of labeling or organizing text data into groups. It forms a fundamental part of Natural Language Processing.

By Wilson Wongso, Steven Limcorn and the AI-Research.id team

June 1, 2021

Datasets

Name	Description	Author	Link
WiLI-2018	The Wikipedia language identification benchmark dataset. It contains 235,000 paragraphs across 235 languages. The dataset is balanced, and a train-test split is provided.	Thoma, Martin	HuggingFace
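
Because the dataset is described as balanced, the figures in the table imply an equal number of paragraphs per language. The sketch below works out that number and shows one way a label list could be checked for balance. The helper names (`paragraphs_per_language`, `is_balanced`) and the sample labels are hypothetical and used only for illustration; they are not part of WiLI-2018 or of any library.

```python
from collections import Counter

# Figures taken from the dataset description in the table above.
TOTAL_PARAGRAPHS = 235_000
NUM_LANGUAGES = 235


def paragraphs_per_language(total: int, num_languages: int) -> int:
    """Return the per-language paragraph count of a perfectly balanced dataset."""
    if num_languages <= 0:
        raise ValueError("num_languages must be positive")
    if total % num_languages != 0:
        raise ValueError("total is not evenly divisible, so the data cannot be balanced")
    return total // num_languages


def is_balanced(labels) -> bool:
    """Return True if every label occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


if __name__ == "__main__":
    # 235,000 paragraphs / 235 languages
    print(paragraphs_per_language(TOTAL_PARAGRAPHS, NUM_LANGUAGES))  # 1000

    # Hypothetical label list, for illustration only.
    print(is_balanced(["sun", "sun", "jav", "jav"]))  # True
```

In practice, `is_balanced` could be run on the label column of the actual train and test splits once they are loaded. Assuming the Hugging Face `datasets` library is used, the dataset identifier and column names should be checked against the dataset page linked in the table.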