Sundanese Text Classification
Models and Datasets for Sundanese Text Classification
Text classification is the process of labeling or organizing text data into groups. It is a fundamental task in Natural Language Processing.
By Wilson Wongso, Steven Limcorn and AI-Research.id team
June 1, 2021
Datasets
Name | Description | Author | Link |
---|---|---|---|
WiLI-2018 | WiLI-2018, the Wikipedia language identification benchmark dataset, contains 235,000 paragraphs in 235 languages. The dataset is balanced and a train-test split is provided. | Thoma, Martin | HuggingFace |
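WiLI-2018 frames language identification as a text classification problem: each paragraph is assigned one language label. A common lightweight baseline for this kind of task is a multinomial Naive Bayes classifier over character n-grams. The sketch below is a minimal illustration under stated assumptions, not the model or training setup used on this page. The class name `NaiveBayesLangID`, the helper `char_ngrams`, and the six training sentences are hypothetical toy examples written for illustration (not drawn from WiLI-2018). The labels `sun` (Sundanese) and `ind` (Indonesian) are ISO 639-3 codes chosen for the example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangID:
    """Hypothetical sketch: multinomial Naive Bayes over character n-grams
    with add-one (Laplace) smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram frequency table
        self.doc_counts = Counter()               # label -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior of the label plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + vocab_size
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (three Sundanese and three Indonesian sentences).
train_texts = [
    "abdi badé angkat ka pasar",
    "kumaha damang, baraya?",
    "hatur nuhun pisan kana bantosanana",
    "saya akan pergi ke pasar",
    "apa kabar, saudara?",
    "terima kasih banyak atas bantuannya",
]
train_labels = ["sun", "sun", "sun", "ind", "ind", "ind"]

model = NaiveBayesLangID(n=3).fit(train_texts, train_labels)
print(model.predict("hatur nuhun"))   # → sun
print(model.predict("terima kasih"))  # → ind
```

Character n-grams are used here because closely related languages such as Sundanese and Indonesian share much vocabulary but differ in characteristic spellings and affixes. A real experiment on WiLI-2018 would train on its provided train split and evaluate on its test split rather than on toy sentences.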