Sundanese Text Classification

Models and Datasets for Sundanese Text Classification

Text classification is the process of labeling or organizing text data into groups. It is a fundamental task in Natural Language Processing (NLP).
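As a concrete illustration of labeling text into groups, here is a minimal sketch of a multinomial naive Bayes classifier over character trigrams that separates Sundanese from Indonesian sentences. The technique, the label codes `sun`/`ind`, and the six toy training sentences are assumptions chosen for illustration. They are not the models or data this page catalogues.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split lowercased text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesClassifier:
    """Multinomial naive Bayes over character n-grams with add-one (Laplace) smoothing.

    Illustrative sketch only; not the classifier used by the authors.
    """

    def __init__(self, n=3):
        self.n = n
        self.class_counts = Counter()               # number of documents per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()                          # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: "sun" = Sundanese, "ind" = Indonesian.
train_texts = [
    "abdi bade angkat ka pasar", "hatur nuhun pisan", "kumaha damang",
    "saya akan pergi ke pasar", "terima kasih banyak", "apa kabar",
]
train_labels = ["sun", "sun", "sun", "ind", "ind", "ind"]
clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("hatur nuhun"))   # sun
print(clf.predict("terima kasih"))  # ind
```

Character n-grams are used here because they tolerate the shared vocabulary and spelling variation between closely related languages better than whole words when training data is tiny. A real system would train on a corpus such as the one listed below.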

By Wilson Wongso, Steven Limcorn and the AI-Research.id team

June 1, 2021

Datasets

Name: WiLI-2018
Description: WiLI-2018, the Wikipedia language identification benchmark dataset, contains 235,000 paragraphs in 235 languages. The dataset is balanced, and a train-test split is provided.
Author: Martin Thoma
Link: HuggingFace
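Because WiLI-2018 is balanced across 235 languages, each language contributes 235,000 / 235 = 1,000 paragraphs. A minimal sketch of turning such multi-language records into a Sundanese-vs-rest task follows; it assumes records are `(paragraph, label)` pairs and that Sundanese is labeled `sun` (an ISO 639-3 style code). Both the record layout and the label code are assumptions, and the helper names are hypothetical.

```python
from collections import Counter


def sundanese_vs_rest(records, target="sun"):
    """Map (paragraph, language_label) pairs to (paragraph, 1 or 0) for target-vs-rest.

    Assumes Sundanese is labeled "sun"; this label code is an assumption.
    """
    return [(text, int(label == target)) for text, label in records]


def summarize(records):
    """Count examples per label and report whether every label has the same count."""
    counts = Counter(label for _, label in records)
    balanced = len(set(counts.values())) == 1
    return counts, balanced


# Arithmetic from the dataset description: 235,000 paragraphs over 235 languages.
TOTAL, LANGUAGES = 235_000, 235
per_language = TOTAL // LANGUAGES  # paragraphs per language in a balanced split
negatives = TOTAL - per_language   # non-Sundanese paragraphs after relabeling
print(per_language, negatives)     # 1000 234000
```

The relabeled task is heavily imbalanced (roughly 1 positive per 234 negatives), so in practice one would likely subsample the negatives or weight the classes rather than train on the raw binary split.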