Indonesian Text Summarization
Model and its Dataset for Indonesian Text Summarization
Text summarization is the task of producing a shorter version of one or more documents that preserves most of the input's meaning.
By Wilson Wongso, Steven Limcorn and AI-Research.id team
June 1, 2021
Models
Name | Description | Author | Link |
---|---|---|---|
Indonesian T5 Summarization Base Model | t5-base-indonesian-summarization-cased model is based on t5-base-bahasa-summarization-cased by huseinzol05, finetuned using id_liputan6 dataset. | Cahya Wirawan | HuggingFace |
Indonesian BERT2GPT Summarization Model | bert2gpt-indonesian-summarization model is based on cahya/bert-base-indonesian-1.5G and cahya/gpt2-small-indonesian-522M by cahya, finetuned using id_liputan6 dataset. | Cahya Wirawan | HuggingFace |
Indonesian BERT2BERT Summarization Model | bert2bert-indonesian-summarization model is based on cahya/bert-base-indonesian-1.5G by cahya, finetuned using id_liputan6 dataset. | Cahya Wirawan | HuggingFace |
Indonesian T5 Summarization Small Model | t5-small-indonesian-summarization-cased model is based on t5-small-bahasa-summarization-cased by huseinzol05, finetuned using indosum dataset. | Panggi Libersa Jasri Akadol | HuggingFace |
Indonesian T5 Summarization Base Model | t5-base-indonesian-summarization-cased model is based on t5-base-bahasa-summarization-cased by huseinzol05, finetuned using indosum dataset. | Panggi Libersa Jasri Akadol | HuggingFace |
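
As an illustration of how one of the checkpoints above might be used, here is a minimal sketch built on the Hugging Face `transformers` library. Several things in it are assumptions rather than facts from this page: the Hub ID `cahya/t5-base-indonesian-summarization-cased` is inferred from the model name and the `cahya` namespace of its sibling checkpoints, the 512-token input limit and the beam-search settings are illustrative values, and `summarize` and `compression_ratio` are hypothetical helper names. Some T5 checkpoints expect a task prefix on the input, so check the model card before relying on the output.

```python
# Assumption: Hub ID inferred from the model name in the table and the
# "cahya" namespace of its sibling checkpoints; verify it on the model page.
MODEL_ID = "cahya/t5-base-indonesian-summarization-cased"


def summarize(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 80) -> str:
    """Generate an abstractive summary of `text` with a seq2seq checkpoint."""
    # Imported lazily so compression_ratio() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long articles to an assumed encoder budget of 512 tokens.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    # Illustrative decoding settings, not the authors' configuration.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,
        no_repeat_ngram_size=2,
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def compression_ratio(document: str, summary: str) -> float:
    """Return summary length divided by document length, in whitespace tokens."""
    doc_len = len(document.split())
    if doc_len == 0:
        raise ValueError("document must contain at least one token")
    return len(summary.split()) / doc_len
```

Calling `summarize(article)` downloads the checkpoint on first use and returns the decoded summary as a string. Passing the article and the result to `compression_ratio` gives a quick check that the output really is shorter than the input; it says nothing about whether the meaning was preserved.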
Datasets
Name | Description | Author | Link |
---|---|---|---|
WikiLingua | A large-scale, multilingual dataset for the evaluation of crosslingual abstractive summarization systems. The authors extracted article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors. | Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown | HuggingFace |
Liputan6 | A large-scale Indonesian summarization dataset. The authors harvested articles from an online news portal and obtained 215,827 document-summary pairs. | Fajri Koto, Jey Han Lau and Timothy Baldwin | HuggingFace |
XLSum | A comprehensive and diverse dataset comprising 1.35 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 45 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation. | Tahmid Hasan, Abhik Bhattacharjee, Md. Saiful Islam, Kazi Mubasshir, Yuan-Fang Li, Yong-Bin Kang, M. Sohel Rahman and Rifat Shahriyar | HuggingFace |