Indonesian Question Answering

Indonesian Question Answering and its Datasets

Building systems that automatically answer questions posed by humans in natural language

By Wilson Wongso, Steven Limcorn, and the AI-Research.id team

June 1, 2021

Models

| Name | Description | Author | Link |
| --- | --- | --- | --- |
| IndoBERT-Lite base fine-tuned on Translated SQuAD v2 | IndoBERT-Lite trained by IndoBenchmark and fine-tuned on Translated SQuAD 2.0 for the Q&A downstream task. | Akmal | HuggingFace |
| IndoBERT-Lite-SQuAD base fine-tuned on Full Translated SQuAD v2 | IndoBERT-Lite trained by IndoBenchmark and fine-tuned on Translated SQuAD 2.0 for the Q&A downstream task. | Akmal | HuggingFace |
| SQuAD Bahasa Albert Model | Fine-tuned ALBERT base language model on translated SQuAD. Based on huseinzol05’s Albert Bahasa. | Akmal | HuggingFace |
| SQuAD IndoBERT-Lite Base Model | Fine-tuned IndoBERT-Lite from IndoBenchmark using Translated SQuAD datasets. | Akmal | HuggingFace |
| IndoBERT Base-Uncased fine-tuned on Translated SQuAD v2.0 | IndoBERT trained by IndoLEM and fine-tuned on Translated SQuAD 2.0 for the Q&A downstream task. | Rifky | HuggingFace |
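
The models listed above are fine-tuned on translated SQuAD 2.0, a benchmark that also contains unanswerable questions. Readers of this kind usually score every token as a possible answer start and a possible answer end, then pick the best-scoring span. The following is a minimal sketch of that decoding step in plain Python. The function name `best_span`, the `max_answer_len` parameter, the toy tokens and scores, and the use of index 0 as the "no answer" position are illustrative assumptions, not details taken from any of the listed model cards.

```python
from typing import List, Optional, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 10,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token indices of the highest-scoring span.

    Assumption: index 0 is a special token whose score stands for
    "no answer". None is returned when that null score is at least as
    high as the best real span.
    """
    null_score = start_scores[0] + end_scores[0]
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for start in range(1, len(start_scores)):
        # Only consider ends at or after the start, within the length limit.
        last_end = min(start + max_answer_len, len(end_scores))
        for end in range(start, last_end):
            score = start_scores[start] + end_scores[end]
            if score > best_score:
                best, best_score = (start, end), score
    if best is None or null_score >= best_score:
        return None
    return best


# Hypothetical tokens and scores, for illustration only.
tokens = ["[CLS]", "Jakarta", "adalah", "ibu", "kota", "Indonesia"]
start_scores = [0.1, 4.0, 0.2, 0.1, 0.3, 1.0]
end_scores = [0.2, 3.5, 0.1, 0.2, 0.4, 1.2]

span = best_span(start_scores, end_scores)
answer = " ".join(tokens[span[0]: span[1] + 1]) if span else ""
print(answer)
```

Returning `None` when the null score wins is one simple way to represent an unanswerable question; real systems often compare the two scores against a tuned threshold instead.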

Datasets

| Name | Description | Author | Link |
| --- | --- | --- | --- |
| FacQA | The goal of the FacQA dataset is to find the answer to a question in a short passage taken from a news article. Each row consists of a question, a short passage, and a label phrase that can be found inside the corresponding passage. There are six categories of questions: date, location, name, organization, person, and quantitative. | Ayu Purwarianti, Masatoshi Tsuchiya, and Seiichi Nakagawa | HuggingFace |
| TyDi QA | TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology (the set of linguistic features that each language expresses), such that we expect models performing well on this set to generalize across a large number of the languages in the world. | Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki | HuggingFace |
| mLAMA | This dataset provides the data for mLAMA, a multilingual version of LAMA. For LAMA itself, see https://github.com/facebookresearch/LAMA. For mLAMA, the TREx and GoogleRE parts of LAMA were machine translated using Google Translate together with the Wikidata and Google Knowledge Graph APIs. | Nora Kassner, Philipp Dufter, and Hinrich Schütze | HuggingFace |
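
The FacQA description above says that each row pairs a question and a short passage with a label phrase found inside that passage. The sketch below illustrates that structure and shows how the phrase might be located as a character span. The `FacQARow` class, its field names, and the sample row are hypothetical and are not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FacQARow:
    """Hypothetical row layout; the real dataset's field names may differ."""
    question: str
    passage: str
    label_phrase: str


def find_answer_span(row: FacQARow) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the label phrase in the passage.

    The end offset is exclusive. None is returned if the phrase is absent,
    which would indicate a malformed row under the assumption above.
    """
    start = row.passage.find(row.label_phrase)
    if start == -1:
        return None
    return start, start + len(row.label_phrase)


# Invented example row (not taken from FacQA).
# Question: "Where was the conference held?"
row = FacQARow(
    question="Di mana konferensi itu diadakan?",
    passage="Konferensi itu diadakan di Bandung pada hari Senin.",
    label_phrase="Bandung",
)

span = find_answer_span(row)
if span is not None:
    start, end = span
    print(row.passage[start:end])
```

Character offsets of this kind are what SQuAD-style training data usually stores, so a check like this can be used to validate rows before fine-tuning.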