Indonesian Question Answering

Indonesian Question Answering and its Datasets

Building systems that automatically answer questions posed by humans in natural language

By Wilson Wongso, Steven Limcorn, and the AI-Research.id team

June 1, 2021

Models

Name	Description	Author	Link
IndoBERT-Lite base fine-tuned on Translated SQuAD v2	IndoBERT-Lite, trained by IndoBenchmark and fine-tuned on Translated SQuAD 2.0 for the downstream Q&A task.	Akmal	HuggingFace
IndoBERT-Lite-SQuAD base fine-tuned on Full Translated SQuAD v2	IndoBERT-Lite, trained by IndoBenchmark and fine-tuned on the full Translated SQuAD 2.0 for the downstream Q&A task.	Akmal	HuggingFace
SQuAD Bahasa Albert Model	ALBERT base language model fine-tuned on Translated SQuAD. Based on huseinzol05’s Albert Bahasa.	Akmal	HuggingFace
SQuAD IndoBERT-Lite Base Model	Fine-tuned IndoBERT-Lite from IndoBenchmark using Translated SQuAD datasets.	Akmal	HuggingFace
IndoBERT Base-Uncased fine-tuned on Translated SQuAD v2.0	IndoBERT, trained by IndoLEM and fine-tuned on Translated SQuAD 2.0 for the downstream Q&A task.	Rifky	HuggingFace

Datasets

Name	Description	Author	Link
FacQA	The goal of the FacQA dataset is to find the answer to a question from a provided short passage from a news article. Each row in the FacQA dataset consists of a question, a short passage, and a label phrase, which can be found inside the corresponding short passage. There are six categories of questions: date, location, name, organization, person, and quantitative.	Ayu Purwarianti, Masatoshi Tsuchiya, and Seiichi Nakagawa	HuggingFace
TyDi QA	TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology – the set of linguistic features that each language expresses – such that we expect models performing well on this set to generalize across a large number of the languages in the world.	Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki	HuggingFace
mLAMA	This dataset provides the data for mLAMA, a multilingual version of LAMA. For LAMA, see https://github.com/facebookresearch/LAMA. For mLAMA, the TREx and GoogleRE parts of LAMA were taken and machine translated using Google Translate and the Wikidata and Google Knowledge Graph APIs.	Nora Kassner, Philipp Dufter, and Hinrich Schütze	HuggingFace
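
The FacQA row structure described in the table above (a question, a short passage, and a label phrase found inside that passage) can be illustrated with a minimal sketch. It assumes the passage and the answer phrase are already tokenized and encodes the answer position as B/I/O tags, which is one common way to represent an in-passage answer span. The field names (`question`, `passage`, `answer`), the helper `bio_labels`, and the example row are hypothetical and made up for illustration; they are not taken from the dataset itself.

```python
from typing import List


def bio_labels(passage_tokens: List[str], answer_tokens: List[str]) -> List[str]:
    """Tag the first occurrence of the answer phrase in the passage.

    The first answer token gets "B", the remaining answer tokens get "I",
    and every other passage token gets "O". If the phrase is empty or not
    found, all tokens are tagged "O".
    """
    labels = ["O"] * len(passage_tokens)
    n = len(answer_tokens)
    if n == 0:
        return labels
    # Slide a window of the answer's length over the passage.
    for start in range(len(passage_tokens) - n + 1):
        if passage_tokens[start:start + n] == answer_tokens:
            labels[start] = "B"
            for i in range(start + 1, start + n):
                labels[i] = "I"
            break  # only the first match is tagged
    return labels


# Hypothetical row, invented for illustration (not real FacQA data).
row = {
    "question": "Siapa presiden pertama Indonesia ?".split(),
    "passage": "Soekarno adalah presiden pertama Indonesia .".split(),
    "answer": ["Soekarno"],
}

labels = bio_labels(row["passage"], row["answer"])
for token, label in zip(row["passage"], labels):
    print(f"{token}\t{label}")
```

Exact token matching keeps the sketch short; real data may need case or punctuation normalization before the answer phrase can be located in the passage.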