Knowledge Engineering and Data Science (KEDS)  pISSN 2597-4602 

Vol 6, No 2, October 2023, pp. 129–144  eISSN 2597-4637 

 
https://doi.org/10.17977/um018v6i22023p129-144 

©2023 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id 

This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/) 

Systematic Literature Review on Ontology-based Indonesian 
Question Answering System 

Fadhila Tangguh Admojo a,1, Adidah Lajis b,2,*, Haidawati Nasir b,3 

a Universitas Bina Darma 
Jl. Jenderal Ahmad Yani No. 3, Palembang 30111, Indonesia 

b Universiti Kuala Lumpur, Malaysian Institute of Information Technology (MIIT) 
1016, Jalan Sultan Ismail, Kuala Lumpur 50250, Malaysia 

1 fadhila.tangguh@binadarma.ac.id; 2 adidahl@unikl.edu.my*; 3 haidawati@unikl.edu.my 
* corresponding author 

 
I. Introduction 

As an application of Artificial Intelligence (AI), Question Answering (QA) stands at the 

intersection of Natural Language Processing (NLP), Information Retrieval (IR), knowledge 

representation, and computational linguistics [1][2]. The primary objective of the QA system is to 

provide relevant responses to queries presented in the form of natural language [3]. With the 

increasing amount of available online information, QA systems offer greater convenience and 

efficiency than search engines by presenting the final answer directly to the question rather than 

returning a list of relevant information or hyperlinks [4]. 

In an ontology-based QA system, the Knowledge Base (KB) structure, which is the source of 

answers to questions, is defined in an ontology [5]. Ontology describes the concepts in the domain 

and the relationships among these concepts [6][7]. Ontologies help create a shared understanding of 

data, information, and knowledge for human-to-machine or machine-to-machine communication 

and collaboration [8]. The World Wide Web Consortium (W3C) has proposed several standards for specifying ontologies: the Resource Description Framework (RDF) for modeling the KB, the Web Ontology Language (OWL) for supporting description-logic inference, and the SPARQL query language for accessing the data. 

ARTICLE INFO

Article history:
Received 06 September 2023
Revised 18 September 2023
Accepted 30 September 2023
Published online 19 October 2023

ABSTRACT

Question-Answering (QA) systems, at the intersection of natural language processing, information retrieval, and knowledge representation, aim to provide efficient responses to natural language queries. These systems have seen extensive development in English, whereas languages like Indonesian present unique challenges and opportunities. This literature review examines the state of ontology-based Indonesian QA systems and highlights critical challenges. The first challenge lies in sentence understanding, variations, and complexity. Most systems rely on syntactic analysis and struggle to grasp sentence semantics. Complex sentences, especially in Indonesian, pose difficulties in parsing, semantic interpretation, and knowledge extraction. Addressing these linguistic intricacies is pivotal for accurate responses. Secondly, template-based SPARQL query construction, commonly used in Indonesian QA systems, suffers from semantic gaps and inflexibility. Advanced techniques like semantic matching algorithms and dynamic template generation can bridge these gaps and adapt to evolving ontologies. Thirdly, lexical gaps and ambiguity hinder QA systems. Bridging vocabulary mismatches between user queries and ontology labels remains a challenge. Strategies like synonym expansion, word embedding, and ontology enrichment must be explored further to overcome these challenges. Lastly, the review discusses the potential of developing multi-domain ontologies to broaden the knowledge coverage of QA systems. While this presents complex linguistic and ontological challenges, it offers the advantage of responding to diverse user queries across many domains. This literature review identifies crucial challenges in developing ontology-based Indonesian QA systems and suggests innovative approaches to address them. 


Keywords: 

Literature Review 

Ontology-based 

Indonesian QA System 

Semantic Parser    



 F. T. Admojo et al. / Knowledge Engineering and Data Science 2023, 6 (2): 129–144 130 

 
The essence of an ontology-based QA system is to retrieve the desired information from one or more ontologies using a natural language query. To achieve this, the QA system must first understand the meaning or intention of the natural language query and then transform it into an appropriate SPARQL query to obtain the desired information from the ontologies [9]. The main challenge in developing such a system therefore stems from the complexity and ambiguity of human language. Many systems have been developed using different approaches for different questions in different languages. However, most available QA systems operate exclusively in English, and the existing approaches are not designed to adapt quickly to new knowledge bases and other languages [10][11][12][13][14][15][16][17][18][19][20][21]. 

Indonesian, locally called Bahasa Indonesia, is the national language of Indonesia, spoken by 

over 198 million people, and is one of the most frequently spoken languages in the world 

[22][23][24]. According to Internet World Stats data, as of Mar. 31, 2020, Indonesian was the sixth 

among the top ten languages utilized on the web, with 306 million users and an internet penetration 

rate of 64.6% [25]. Nevertheless, compared with other top-ten languages such as English, Mandarin, Hindi, Spanish, French, and Arabic, the development of Indonesian QA still lags far behind. Over the past decade, increasing demand for ontology-based Indonesian QA systems has 

been driven by the need to efficiently navigate and utilize ever-increasing sources of digital content 

while addressing language diversity [26]. Critical to facilitating access to digital information, these 

systems have become essential tools in various sectors, including education, research, business, and 

government. 

The literature review explores the landscape of existing ontology-based Indonesian QA systems, 

aiming to uncover the challenges, limitations, and gaps that hinder the system from reaching its full 

potential. The challenges facing the ontology-based Indonesian QA system have many aspects and 

require different considerations. These challenges include the sophistication of semantic 

understanding, the complexity of sentence variations, the constraints of template-based query 

construction, the traps of lexical gaps and ambiguities, and the uncharted territory of multi-domain 

ontology. Each challenge presents opportunities for more profound research, innovation, and 

collaboration among linguists, computer scientists, and domain experts. 

This section provides an overview of previous reviews and surveys of NLP and QA systems for Indonesian to confirm that the topic of this paper has not been covered before. Seven publications were found that review and survey Indonesian QA systems and NLP resources. None of these papers discusses ontology-based Indonesian QA systems. A summary of the previous reviews is shown in Table 1. 

Table 1. A summary of reviews on Indonesian QA systems 

Reference | Brief explanation | Coverage
Sulistyanto et al. 2013 [27] | Identify approaches and methods that have been used in QAS Indonesia, as well as discuss development trends and emerging challenges | 2008-2013
Wongso et al. 2017 [28] | Review the Indonesian QA System using Named Entity Recognition (NER) | 2005-2015
Utomo et al. 2017 [29] | Reviewing the state of question analysis, document processing, and answer extraction techniques | -
Abdiansyah et al. 2018 [30] | Survey on answer validation (AV) | 2005-2017
Puspitarani et al. 2021 [31] | Review of current research trends, challenges, and information extraction opportunities using Indonesian | 2014-2019
Aji et al. 2022 [32] | Provides an overview of the current state of NLP research and highlights challenges in Indonesian NLP | 2011-2021

 
The contributions of this paper are as follows: 

• An updated review of the existing literature, covering recent work on ontology-based Indonesian QA systems. 

• Several previous reviews have focused on components and techniques. This paper instead analyzes ontology-based Indonesian QA systems from a linguistic perspective to evaluate their ability to understand natural language input. 

• An identification of gaps and a presentation of existing challenges for future development. 

The remaining sections of the paper are structured as follows. Section II describes the methodology employed in the literature review. Section III provides an overview of the architecture and classification of QA systems, covering their fundamental components and typologies. Section IV categorizes and explains the systems featured in the selected papers, analyzing their characteristics and functionalities. Section V outlines the gaps and challenges that characterize ontology-based Indonesian QA systems and identifies the areas that need further exploration. Finally, Section VI draws conclusions from the review and charts directions for the future development of ontology-based Indonesian QA systems. 

II. Methods 

This literature review follows a systematic and structured methodology to identify, select, and 

analyze relevant literature on the challenges encountered in ontology-based Indonesian QA systems. 

The primary goal of this literature review is to investigate the challenges and limitations faced by 

ontology-based QA systems when applied to the Indonesian language. This review seeks to identify 

the key issues hindering such systems' development and effectiveness and understand potential 

solutions or strategies proposed in the existing literature. 

The selection of appropriate databases and a well-defined search strategy are essential to ensure 

comprehensive coverage of the relevant literature. Google Scholar will be utilized for the literature 

search. The search strategy will employ a combination of keywords and phrases relevant to the 

research topic. These include: "ontology-based question answering", "Indonesian QA systems", 

"Indonesian question answering", "ontology-driven QA challenges", "semantic parsing in 

Indonesian", "knowledge base-driven QA issues", "ontology-based QA limitations", "Indonesian 

language processing challenges", "QA system semantic analysis problems", "ontology-based QA 

system difficulties", "challenges in Indonesian language QA". The search process will involve 

iterative refinements of the search terms to ensure the retrieval of the most relevant literature. 

Explicit inclusion and exclusion criteria will be applied to select relevant literature while excluding irrelevant work. The inclusion and exclusion criteria are described in Table 2. 

Table 2. Inclusion and exclusion criteria 

Inclusion Criteria | Exclusion Criteria
Papers written in English | Papers not written in English
Papers published in conferences or online journal platforms | Duplicated papers
Papers about ontology-based Indonesian QA systems | Full content of the paper not available or could not be found
Papers published between 2010 and 2022 | Papers containing theoretical concepts without proof of implementation

 
The screening process will encompass an initial title and abstract screening, followed by a full-

text review. For the selected literature, relevant data will be extracted and organized. This data will 

include publication details, research objectives, methodologies employed, key findings, and 

challenges identified. The extracted data will be synthesized to identify common themes, patterns, 

and recurring challenges across the literature. 

The quality and rigor of the selected literature will be assessed through critical appraisal. Each 

paper's research methodology, experimental design, and contribution to understanding ontology-

based Indonesian QA system challenges will be evaluated. The assessment will consider factors 

such as the validity of findings and the credibility of the research. 

The synthesized data will undergo thematic analysis to identify overarching themes and patterns 

related to the challenges faced by ontology-based Indonesian QA systems. The identified challenges 

will be categorized and discussed in detail in the literature review. 

The findings of this literature review will be documented in a structured report format, including 

sections dedicated to the introduction, methodology, literature review, thematic analysis, discussion, 


and conclusion. Proper citation and referencing of sources will be ensured, adhering to academic 

writing standards and citation guidelines. Ethical guidelines for conducting research will be followed 

diligently. 

III. QA System Architecture and Classification 

Typically, QA systems follow a pipeline design, where data undergoes sequential processing, 

ensuring that the outcome of one component serves as the input for the subsequent one. The 

architecture of QA systems comprises three key components: question analysis, document analysis, 

and answer analysis [26]. In practice, the components of the QA system depend on the approach 

used and the type of data source underlying it [27]. Unlike corpus-based systems, an ontology-based QA system does not involve document analysis to extract candidate answers from unstructured text or documents. Instead, it uses the SPARQL query language to extract information from the ontology that represents the data source [28]. 

An ontology-based QA system’s working stages include question analysis, query construction, 

and answer analysis. Question analysis is the first stage of processing natural language queries, 

which aims to understand the question using various analyses, including morphological analysis to 

separate words into individual units, syntactic analysis to identify grammar, usually involving 

constructing parse trees, and semantic analysis to identify relationships between words, phrases, 

clauses, and sentence levels to obtain the correct meaning. The second stage is query construction, 

aimed at producing SPARQL queries based on the results from the question analysis stage. The final 

stage is answer analysis, executing SPARQL queries to extract answers from the ontology. 
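
The three stages above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any reviewed system: the triple list, the stop-word set, the lexicon, and the `ex:` prefix are all hypothetical, and the answer stage filters an in-memory list instead of running the SPARQL string against a real triple store.

```python
# Toy knowledge base of (subject, predicate, object) triples standing in for an ontology.
# All names below are hypothetical and exist only for this illustration.
TRIPLES = [
    ("Jakarta", "capitalOf", "Indonesia"),
    ("Kuala_Lumpur", "capitalOf", "Malaysia"),
    ("Palembang", "locatedIn", "Indonesia"),
]

STOP_WORDS = {"apa", "dari", "yang", "adalah"}

# Hypothetical lexicon that links Indonesian keywords to ontology terms.
LEXICON = {
    "ibukota": ("predicate", "capitalOf"),
    "indonesia": ("entity", "Indonesia"),
    "malaysia": ("entity", "Malaysia"),
}


def analyze_question(question):
    """Stage 1: tokenize, drop stop words, and map keywords to ontology terms."""
    tokens = [t.strip("?.,!").lower() for t in question.split()]
    keywords = [t for t in tokens if t and t not in STOP_WORDS]
    analysis = {}
    for word in keywords:
        if word in LEXICON:
            kind, term = LEXICON[word]
            analysis[kind] = term
    return analysis


def build_query(analysis):
    """Stage 2: fill a fixed SPARQL-style template with the analysis result."""
    return "SELECT ?x WHERE { ?x ex:%s ex:%s . }" % (
        analysis["predicate"],
        analysis["entity"],
    )


def answer(analysis):
    """Stage 3: emulate query execution by filtering the toy triple list."""
    return [
        s for (s, p, o) in TRIPLES
        if p == analysis["predicate"] and o == analysis["entity"]
    ]


analysis = analyze_question("Apa ibukota dari Indonesia?")
print(build_query(analysis))
print(answer(analysis))
```

In a real system the query string would be sent to a SPARQL endpoint or an RDF library; the list filter here only mimics that step so the sketch stays self-contained.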

According to [17][27], QA systems can be classified based on specific criteria, as shown in Figure 1. Questions can be categorized into factoid, non-factoid, list, and confirmation types. Factoid questions usually contain When, Where, How many/much, What, or Who, and their answers are usually short and specific. Non-factoid questions ask for explanations, usually using Why or How to, and their answers are presented as definitions. List questions are almost the same as factoid questions; the difference lies in the number of answers. Confirmation questions, also known as Boolean-type questions, are answered with yes or no, true or false. The domains of QA systems are divided into closed and open. A closed-domain system focuses on answering questions within a specific domain, such as sports, history, or culture. An open-domain system does not focus on a single domain; its goal is to answer questions across a wide range of topics. A QA system relies entirely on its data sources to generate answers; these sources can be structured, semi-structured, or unstructured. According to [26][29][33][34][35], the approach that forms the basis of the working stages of a QA system can be classified as linguistic-based, rule-based, statistical-based, or pattern-based. 

 
Fig. 1. QA system classification 
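
The question-type distinction can be illustrated with a simple cue-word heuristic. The sketch below uses the English question words named in the text; the auxiliary-verb cues for confirmation questions and the "list" cues are assumptions added for the example, and a practical Indonesian system would need its own cue words and a more robust classifier.

```python
def classify_question(question):
    """Guess the question type from its leading words (illustrative heuristic only)."""
    q = question.lower().strip()
    # Non-factoid questions ask for explanations (Why, How to).
    if q.startswith(("why", "how to")):
        return "non-factoid"
    # Confirmation questions expect yes/no; these auxiliary cues are an assumption.
    if q.startswith(("is ", "are ", "does ", "do ", "did ", "was ", "were ")):
        return "confirmation"
    # List questions expect several answers; these cues are an assumption.
    if q.startswith(("list ", "name all ")):
        return "list"
    # Factoid questions use When, Where, How many/much, What, Who.
    if q.startswith(("when", "where", "how many", "how much", "what", "who")):
        return "factoid"
    return "unknown"


for example in [
    "Who wrote the manuscript?",
    "Why is the ontology needed?",
    "Is Jakarta a city?",
    "List all provinces in Sumatra",
]:
    print(example, "->", classify_question(example))
```

The heuristic cannot separate a list question phrased with "What" from a factoid question, which reflects the point in the text that the two types differ mainly in the number of answers.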

 
IV. Ontology-based Indonesian QA system 

Based on the methodology stages that have been carried out and filtering using inclusion and 

exclusion criteria of previous papers regarding the ontology-based Indonesian QA system from 2010 

to 2022, only 14 papers were selected and arranged based on their classification, as shown in Table 

3. This section reviews the 14 ontology-based Indonesian QA systems and their novelties to 

highlight current challenges. The reviews are grouped according to four main approaches: (1) 

linguistic, (2) rule-based, (3) pattern matching, and (4) statistical.  

Table 3. Classification of 14 selected papers 

Citation | Ref. | Question Type | Domain | Approach
Darari et al. 2010 | [36] | Factoid | Closed | Linguistic
Putra et al. 2016 | [37] | Factoid | Closed | Statistical
Atina et al. 2017 | [38] | Factoid | Closed | Pattern Matching
Wahyudi et al. 2018 | [39] | Factoid | Open | Pattern Matching
Utomo et al. 2019 | [40] | Factoid | Closed | Statistical
Yunmar et al. 2019 | [41] | Factoid | Closed | Pattern Matching
Amalia et al. 2020 | [42] | Factoid | Closed | Pattern Matching
Ishlakhuddin et al. 2021 | [43] | Factoid | Closed | Rule-Based
Rahajeng et al. 2021 | [44] | Factoid | Closed | Statistical
Perangin-Angin et al. 2022 | [45] | Factoid | Closed | Statistical
Hasanah et al. 2022 | [46] | Factoid, Non-Factoid | Closed | Statistical
Jasmi et al. 2022 | [47] | Factoid | Closed | Statistical
Saldhi et al. 2022 | [48] | Factoid | Closed | Statistical
Anggrayni et al. 2022 | [49] | Factoid | Closed | Pattern Matching

A. Linguistic Approach 

The ontology-based Indonesian QA system presented by [36] consists of an NLP semantic analyzer and a SPARQL query generator module. [36] reused the Semantic Analyzer module developed by [50]. This module has four parts: (1) a lexicon, which contains vocabulary words and linguistic information; (2) a grammar, which determines the syntactic structure of sentences; (3) lexical semantics, which store the semantic value of each word in the lexicon; and (4) semantic attachment rules, which are instructions for producing semantic representations based on grammatical rules. 

The NLP semantic analyzer receives an interrogative sentence as input and parses it to produce semantic notation (lambda calculus) using a syntax-driven semantic analysis technique. The analyzer generates question variables and conditional variables. The SPARQL query generator translates the semantic notation into a SPARQL query by placing the question variables in the SELECT clause and the conditional variables in the WHERE clause, and the final answer is generated from the query execution result. The SPARQL query formation process is illustrated in Figure 2. 

 
Fig. 2. The formation of SPARQL query 
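
The mapping from question variables to the SELECT clause and from conditional variables to the WHERE clause can be sketched as follows. This is not the generator of [36]; the function names, the triple representation of conditions, and the `ex:` prefix are assumptions made for illustration, and a complete query would also need a PREFIX declaration.

```python
def _term(value, prefix):
    """Keep SPARQL variables (?x) as they are; qualify everything else with the prefix."""
    return value if value.startswith("?") else f"{prefix}:{value}"


def generate_sparql(question_vars, conditions, prefix="ex"):
    """Place question variables in SELECT and condition triples in WHERE."""
    select_clause = "SELECT " + " ".join(f"?{v}" for v in question_vars)
    where_lines = [
        f"  {_term(s, prefix)} {prefix}:{p} {_term(o, prefix)} ."
        for (s, p, o) in conditions
    ]
    return "\n".join([select_clause, "WHERE {", *where_lines, "}"])


# Hypothetical analyzer output for a question such as "which country has Jakarta as capital?"
query = generate_sparql(
    question_vars=["x"],
    conditions=[("?x", "type", "Country"), ("?x", "hasCapital", "Jakarta")],
)
print(query)
```

Keeping the conditions as plain triples makes the generator independent of how the semantic analyzer derives them, which is the separation of concerns the two-module design implies.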

B. Rule-based Approach 

A QA system with a rule-based approach was developed by [43]. The system receives input in the form of interrogative sentences. The question analysis process begins by breaking sentences into tokens (tokenizing), then removing stop words (filtering), and classifying the sequence of tokens according to their classes in the ontology. The classification results are stored in variables, which are examined using specific rules to determine answers. The final answer is not extracted from the ontology, and no SPARQL queries are used. The ontology developed by [43] serves only as a lexicon for the classification process. 
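
A rough sketch of this tokenize, filter, classify, and apply-rules flow is given below. The stop words, the class lexicon, and the rule table are invented for the example and are not taken from [43]; the point is only that the ontology classes act as a lookup table while the rules, not a SPARQL query, select the response.

```python
# Hypothetical stop words, class lexicon, and rules for a computer-server domain.
STOP_WORDS = {"berapa", "apa", "yang", "pada", "di"}

# The ontology is used only as a lexicon: word -> ontology class.
CLASS_LEXICON = {
    "kapasitas": "Attribute",
    "ram": "Component",
    "server": "Device",
}

# Each rule maps a sequence of classes to an answer handler label.
RULES = {
    ("Attribute", "Component", "Device"): "report the attribute of a device component",
    ("Component", "Device"): "describe the component of a device",
}


def classify_tokens(question):
    """Tokenize, remove stop words, and replace known words with their classes."""
    tokens = [t.strip("?.,!").lower() for t in question.split()]
    kept = [t for t in tokens if t and t not in STOP_WORDS]
    return [CLASS_LEXICON[t] for t in kept if t in CLASS_LEXICON]


def apply_rules(question):
    """Look up the class sequence in the rule table."""
    classes = tuple(classify_tokens(question))
    return RULES.get(classes, "no rule matched")


print(apply_rules("Berapa kapasitas RAM pada server?"))
print(apply_rules("Siapa pemilik server?"))
```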

 
C. Pattern Matching Approach 

The authors of [38] developed an information retrieval system that accepts command 

sentences as input. NLP tasks consist of case folding, tokenization, and filtering to generate a series 

of words to be matched against specific patterns to generate SPARQL queries. The final information 

is presented based on the results of the SPARQL query execution. 

The authors of reference [39] suggest a Question Answering (QA) system employing the graph 

pattern association rule (QAGPAR) within the YAGO knowledge base. The input question using 

interrogative sentences is translated into graph form through four steps: (1) question classification, 

(2) graph component formulation, (3) query formulations, and (4) query processing. In the first step, 

the question must match the existing templates. Next, in the second step, the output from the first 

step will be transformed into a graph form. Then, the third step produces a query based on the model 

from the graph component. Finally, query processing executes the query to obtain the answer from 

the KB. Moreover, [39] adds a query optimization feature that uses graph-pattern association to handle questions whose answers are not found in the database. 

The authors of [41] designed an ontology-based Indonesian QA system that can process incomplete question sentences, such as a question without a question word or question object, or a question with unclear adjectives. The stages in question analysis are stemming, stop word removal, tokenizing, POS tagging, and keyword identification. The SPARQL query is formulated using keyword association, predicate identification, and property identification to fill the slots in a prepared query template. [41] also designed an ontology with a keyword property that functions as a thesaurus to find the question objective. Another study that used steps similar to [41] in the analysis process to form SPARQL queries is [49]; the difference lies in the ontology domain. An illustration of sentence processing with pattern matching is presented in Figure 3. 

 
Fig. 3. Illustration of sentence processing with pattern matching 
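
Template-based query construction of this kind can be sketched with a regular expression whose named groups fill the slots of a query template. The pattern, the stop words, the property names, and the `ex:` prefix below are hypothetical and do not reproduce the templates of [38], [41], or [49].

```python
import re

STOP_WORDS = {"tolong", "saya", "yang"}

# Each entry pairs a sentence pattern with a SPARQL template; slots are named groups.
TEMPLATES = [
    (
        re.compile(r"^tampilkan (?P<prop>\w+) naskah (?P<title>.+)$"),
        'SELECT ?v WHERE {{ ?m ex:judul "{title}" . ?m ex:{prop} ?v . }}',
    ),
]


def to_sparql(sentence):
    """Case-fold, tokenize, filter stop words, then match against the templates."""
    tokens = re.findall(r"\w+", sentence.lower())
    words = [t for t in tokens if t not in STOP_WORDS]
    normalized = " ".join(words)
    for pattern, template in TEMPLATES:
        match = pattern.match(normalized)
        if match:
            return template.format(**match.groupdict())
    return None  # no template fits the sentence


print(to_sparql("Tolong tampilkan penulis naskah Babad Tanah Jawi"))
print(to_sparql("Siapa penulis Babad Tanah Jawi?"))
```

The second call returns `None` because no template covers that phrasing, which illustrates why template-based construction is sensitive to sentence variation.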

D. Statistical Approach 

In research conducted by [37], to obtain semantic similarity between the questions and each verse 

in the KB, the questions are transformed into weighted vectors using the term frequency-inverse 

document frequency (TF-IDF). Then, the semantic similarity is measured using the cosine similarity 

algorithm between questions and verses in the knowledge base to retrieve relevant verses. After 

obtaining the semantic relevance verses, named entity recognition (NER) and feature extraction are 

performed to select the best verse and extract the answers. The verse with the highest score based on the correctness of the probability calculation is returned as the answer. 

The authors of reference [40] built a QA framework consisting of six steps: (1) pre-processing, (2) morphological analysis, (3) question classification, (4) query expansion, (5) document processing, and (6) answer extraction. 

The first step, pre-processing, consists of tokenization, punctuation removal, stop-word removal, and stemming. The output of the morphological analysis is the essential keyword in root form, which then becomes the input for question classification and query expansion. In the third step, the Radial Basis Function Network (RBFN) algorithm is used to extract and determine the answer type; [40] uses its own training data set, created with the TF-IDF technique. Query expansion finds synonyms and extends the keyword using the available Indonesian WordNet. The answer type and the keyword are used as inputs in the fifth step, which generates the SPARQL query to be executed with the answer type (class) and keyword (instance) as parameters. Lastly, the execution result from the fifth step becomes the input (candidate answers) for answer extraction, which applies a word-matching scoring technique: it lists the answers by counting the number of similar words between the synonyms and each candidate answer, and the candidate with the highest score is selected as the best answer. 
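
Both [37] and [40] rely on TF-IDF. The retrieval idea described for [37], ranking knowledge-base passages by cosine similarity to the question, can be sketched in plain Python as below. The smoothed IDF formula, the whitespace tokenizer, and the sample passages are assumptions chosen for the example; the reviewed systems may weight and preprocess text differently.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF, chosen here for simplicity."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(question, passages):
    """Return passage indices ordered from most to least similar to the question."""
    vectors = tfidf_vectors(passages + [question])
    question_vec = vectors[-1]
    scores = [cosine(question_vec, v) for v in vectors[:-1]]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


passages = ["kucing makan ikan", "anjing minum air", "burung terbang tinggi"]
print(rank("siapa makan ikan", passages))
```

Because the score depends only on shared terms, a question that paraphrases a passage with different words receives a similarity of zero, which is the lexical-gap problem discussed later in the paper.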

The authors of reference [44] developed an Indonesian QA system that employs a Knowledge 

Graph as its data source. This system comprises four distinct modules: (1) question classification, (2) 

information extraction, (3) token mapping, and (4) query construction. The first module involves 

classifying the question to determine the appropriate class for the 'SELECT' statement. The second 

module identifies a set of extracted tokens and assigns them token-type labels, which can be 

correspondingly mapped to the 'WHERE' statement. The third module uses a set of extracted tokens, 

a token type label, and a lexicalization dictionary sourced from the Knowledge Graph resources. 

This dictionary is established using translations and synonyms from the training data. Each extracted 

token's lexical similarity to resources of the same type is computed. The token is then paired with 

the resource exhibiting the highest similarity value, which becomes the input for the final module. In 

the last module, the results of token mapping and the answer type class are utilized, employing basic 

query templates to formulate SPARQL queries. 
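
The token mapping and query construction modules can be illustrated with a small sketch. The source does not state which lexical similarity measure [44] uses, so `difflib.SequenceMatcher` from the Python standard library is swapped in here as an assumption; the lexicalization dictionary, resource names, and the `ex:suitableFor` property are likewise hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical lexicalization dictionary: token type -> resource -> known labels
# (translations and synonyms).
LEXICALIZATIONS = {
    "Product": {
        "ex:Moisturizer": ["moisturizer", "pelembap"],
        "ex:Sunscreen": ["sunscreen", "tabir surya"],
    },
    "SkinType": {
        "ex:OilySkin": ["oily skin", "kulit berminyak"],
        "ex:DrySkin": ["dry skin", "kulit kering"],
    },
}


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_token(token, token_type):
    """Pair a token with the most similar resource of the same type."""
    best_resource, best_score = None, 0.0
    for resource, labels in LEXICALIZATIONS[token_type].items():
        score = max(similarity(token, label) for label in labels)
        if score > best_score:
            best_resource, best_score = resource, score
    return best_resource, best_score


def build_query(product, skin_type):
    """Fill a basic query template with the mapped resources."""
    return f"SELECT ?p WHERE {{ ?p a {product} . ?p ex:suitableFor {skin_type} . }}"


product, _ = map_token("pelembab", "Product")  # spelling variant of "pelembap"
skin, _ = map_token("kulit berminyak", "SkinType")
print(build_query(product, skin))
```

Since the mapping always returns the closest label, a token with no good match is still mapped to some resource; a similarity threshold would be one way to guard against that.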

In the first and second modules, the authors of [44] compare three models (SVM, LSTM, and fine-tuned IndoBERT) and three text representations (TF-IDF, FastText, and IndoBERT). The labeled data for training and testing were collected from 503 questions. An illustration of a system architecture with a statistical approach is presented in Figure 4. A summary of the evaluation of ontology-based Indonesian QA systems can be seen in Table 4. 

 
Fig. 4. Illustration of sentence processing with the statistical approach 

 
Table 4. Summary of the evaluation of ontology-based Indonesian QA system 

Ref. | Ontology Domain | QA Analysis | Evaluation
[36] | Economic | Semantic Analyzer [50]; SPARQL generator | -
[37] | Indonesian Translated Quran | Stemming, NER; TF-IDF, Cosine similarity; NER, Feature Extraction | -
[38] | Indonesian Manuscript | Case Folding, Tokenizing, Filtering; SPARQL query template; Indexing | Precision: 100%; Recall: 93.3%
[39] | - | Question classification (template pattern); Graph transformation (Neo4j), Query generation; Query processing with graph-pattern association rules | Accuracy: 90%
[40] | Quran | Tokenizing, punctuation and Stop word remover, Stemming; TF-IDF + Classification (RBFN), Query expansion; SPARQL query template, word scoring, answer ranking | -
[41] | University Admission | Stemming, Stop word Remover, Tokenizing, POS Tagging; Keyword Identification; SPARQL query template | -
[42] | Drug and Disease | Case folding, tokenizing, phrase detection, Stemming, Filtering; Keyword Identification; SPARQL query template | Accuracy: 90%
[43] | Computer Server | Tokenizing, Stop word remover, classification; Rule-based syntactic parser; data extraction, answer generating | Accuracy: 95%
[44] | Face Beauty Product | Question classification and Information Extraction (SVM + IndoBERT, LSTM + IndoBERT); Token Mapping: Lexical similarity; SPARQL query template | Precision: 0.8823529; Recall: 0.8418301; F-Measure: 0.8499703
[45] | National History | Case Folding, Tokenizing, POS tagging; N-Gram + Classification (SVM); SPARQL query template | Accuracy: base question 93%, non-base question 80%
[46] | National History | Case Folding, Tokenizing, POS tagging; N-Gram + Classification (Multi-Layer Perceptron); SPARQL query template | Accuracy: 57.37%
[47] | National History | Case Folding, Stemming (Sastrawi), stop word removal, TF-IDF; Cosine Similarity (sklearn); SPARQL query template | Precision: 0.70; Recall: 0.94; F-Measure: 0.80
[48] | National History | Case Folding, Stemming (Sastrawi), stop word removal, TF-IDF; Question Analysis: Classification (Naive Bayes); SPARQL query template | Classification accuracy: 67% (8:2 and 7:3 ratio)
[49] | National History | Case Folding, Stop word removal (Sastrawi), Tokenizing, POS tagging; Question Analysis: Word Identification; SPARQL query template | Accuracy: 87%
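
For readers comparing the figures in Table 4, the sketch below shows how precision, recall, and F-measure relate, assuming the standard definitions with F-measure as the harmonic mean of precision and recall. The reviewed papers may compute or average these metrics differently, so the functions are illustrative and the counts in the usage example are hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts: 7 correct answers, 3 wrong answers, 1 missed answer.
p, r = precision_recall(true_pos=7, false_pos=3, false_neg=1)
print(p, r, f_measure(p, r))

# Under this definition, the precision and recall reported for [47] in Table 4
# give the reported F-measure after rounding to two decimals.
print(round(f_measure(0.70, 0.94), 2))
```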

V. Challenges in ontology-based Indonesian QA system 

Exploring the challenges faced by ontology-based Indonesian QA systems is a crucial effort in 

advancing the field of natural language processing and knowledge retrieval for the Indonesian 

language. These systems, designed to leverage structured knowledge representations to answer user 

questions, face complex, diverse, and often interconnected challenges. These challenges encompass 

a spectrum of linguistic, semantic, and technical intricacies. Uncovering and addressing this 

complexity effectively is not only crucial for developing QA systems that are more accurate, 

context-aware, and user-friendly but also holds substantial implications for the advancement of 

natural language understanding and information retrieval adapted to Indonesia's linguistic and 

semantic landscape. This section discusses the gaps and challenges the ontology-based Indonesian 

QA system faces based on an investigation of the reviewed literature. In addition, since most of the 

challenges mentioned in this section are open, and some still need to be explored, it also highlights 

potential solutions and their implications for future progress. 


A. Sentence meaning, sentence variation, and complex sentence. 

The ability to understand sentences semantically remains a challenge in developing ontology-based Indonesian QA systems. Almost all existing ontology-based Indonesian QA systems rely on syntactic analysis to understand questions. The parsing process has two main tasks: checking whether the word sequence fits predefined patterns (a domain-specific grammar), and identifying keywords based on instances (subjects or objects), properties, classes, or relationships between words from the ontology. To address these weaknesses, [41][49] seek to improve performance and accuracy by adding semantic features such as synonyms, hyponyms, antonyms, and co-occurring words (neighbors on the left or right). The aim is to expand the meaning of keywords based on the similarity of context or meaning to specific labels in the ontology or a predefined dictionary. However, understanding semantics at the word level is not enough to understand the meaning of a sentence. For example, to understand the relationship between "aku memiliki kaka tua" (I have a cockatoo) and "aku punya peliharaan" (I have a pet), or to distinguish between "apa yang Joko makan" (what does Joko eat) and "apa yang memakan Joko" (what eats Joko), the QA system must also understand how words relate to each other in a sentence. As reported in studies [44][45][46][47], limitations in understanding questions semantically also occur in statistical-based approaches. Several classification techniques combined with Indonesian text representations achieve excellent performance in classifying questions. However, the problem lies in the information extraction process, which relies on lexical similarity calculations to determine which word or phrase corresponds to an instance (subject or object), property, or class in the ontology. This can cause prediction errors, so that the resulting SPARQL query does not match the sentence's meaning or the question's context, and the system gives the wrong answer. The only work that discusses deep syntactic and semantic processing for Indonesian is OWLizr [36]. OWLizr can perform deep parsing by combining syntax rules with semantic composition to produce semantic interpretations, as shown in Figure 5. 

 
Fig. 5. Parsing illustration with semantic attachment

In this way, the sentence can be fully understood from a semantic point of view (deep structure) 

so that the system can understand that "nasi memakan joko" ("rice eats Joko") is semantically incorrect even though it

is syntactically correct. Although deep syntactic and semantic processing models used by [36] are 

promising, the complexity inherent in using lambda calculus notation and Prolog can be an obstacle 

when dealing with complex linguistic phenomena, such as long, convoluted sentences or sentences 

with many clauses. Parsing processes are driven by rigid rules and formalisms, not easily adapted to 

accommodate the structural complexity often encountered in real-world questions. The model's 

performance will also suffer when faced with sentences involving nested or embedded structures, 

where the interaction of various grammatical rules and lexical semantics can become very 

complicated. The model's ability to handle sentence variations, including differences in word order, 

sentence length, and syntactic construction, is also a limitation. Although this model's grammatical 

rules and lexical semantics are precise, it will be challenging to adapt to the diversity of linguistic 

expressions found in Indonesian, where word order and phrase structure can vary significantly based 

on context and language style. These limitations may cause the interpretation of questions to be 

 
inaccurate or incomplete. Moreover, the effectiveness of the models in [36] relies on the completeness and

accuracy of their linguistic resources. Incomplete or outdated resources can limit performance and 

hamper a model's ability to provide accurate answers in a rapidly evolving linguistic landscape. 
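The pairing of syntax rules with semantic attachments discussed above can be illustrated with a small Python sketch. It is not the grammar or the Prolog implementation used in [36]: the lexicon, the semantic types, and the names `NOUNS`, `VERBS`, and `interpret` are hypothetical, and only three-word subject-verb-object sentences are covered.

```python
# Toy semantic attachment for subject-verb-object sentences.
# All lexicon entries and semantic types are illustrative assumptions.

# Each noun carries a semantic type.
NOUNS = {"joko": "person", "nasi": "food", "harimau": "animal"}

# Each verb carries a curried lambda (its semantic attachment) plus
# selectional restrictions on the types of its subject and object.
VERBS = {
    "makan": {
        "semantics": lambda subj: lambda obj: ("eat", subj, obj),
        "subject_types": {"person", "animal"},
        "object_types": {"food", "person", "animal"},
    },
}
VERBS["memakan"] = VERBS["makan"]  # same meaning, different surface form


def interpret(sentence):
    """Return a predicate tuple, or None if the sentence is rejected."""
    words = sentence.lower().split()
    if len(words) != 3:
        return None  # syntax rule: only S-V-O is covered in this sketch
    subj, verb, obj = words
    if subj not in NOUNS or obj not in NOUNS or verb not in VERBS:
        return None  # unknown word
    entry = VERBS[verb]
    # Semantic check: argument types must satisfy the verb's restrictions.
    if NOUNS[subj] not in entry["subject_types"]:
        return None
    if NOUNS[obj] not in entry["object_types"]:
        return None
    # Semantic composition: apply the verb's lambda to subject, then object.
    return entry["semantics"](subj)(obj)
```

Under these assumptions, `interpret("joko makan nasi")` yields a predicate structure, while `interpret("nasi memakan joko")` is rejected by the type check even though it satisfies the S-V-O syntax rule. Every new verb, type, or sentence shape needs another hand-written rule, which is the rigidity the paragraph above describes.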

Complex sentences are still recognized as one of the toughest challenges faced by QA systems 

and are still the focus of discussion in recent research, even for the most widely studied language, 

English [27][51][52]. The ability of the ontology-based QA system to understand sentences 

semantically is related to its ability to handle complex sentence structures. The reviewed
literature shows that the development of ontology-based Indonesian QA systems often focuses on
more straightforward questions based on facts and structured knowledge, so handling complex
sentences, including sentence variations, remains less explored and
discussed. According to [53], Indonesian complex sentences, commonly called multilevel compound

sentences, consist of two clauses, and one of the clauses becomes part of the other. Complex 

questions usually contain many subjects, express many relationships, and include numerical 

operations [54][55]. 

An example of a complex question in Indonesian is "bisakah anda menyajikan peta lokasi gunung
yang ada di pulau jawa yang ketinggiannya lebih dari 3000 mdpl yang memiliki lebih dari 1 jalur
pendakian?", or in English, "Can you present a map of the locations of mountains on the island of
Java that are more than 3000 meters above sea level and have more than 1 climbing route?". Such
sentences are rich in meaning but often cause difficulties in automatic parsing, semantic
interpretation, and knowledge extraction. Moreover, this question involves geographic entities
and concepts that require spatial operations, which remain an open problem for QA systems [1].

Ontology-based Indonesian QA systems typically depend on well-defined entities and relationships, 

making extracting precise information from sentences with multiple levels of embedding or intricate 

phrasing challenging. Moreover, complex sentences often involve various linguistic phenomena, 

including idiomatic expressions, subordination, and coordination, which can further complicate the 

process of knowledge extraction. 
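To make the difficulty concrete, the sketch below renders the constraints of the mountain question above as a single SPARQL 1.1 query string. It is only an illustration under stated assumptions: the `ex:` namespace and every class and property name (`ex:Mountain`, `ex:locatedOn`, `ex:elevation`, `ex:hasClimbingRoute`, and so on) are hypothetical and do not come from any of the reviewed systems, and the function `build_mountain_query` is invented for this example.

```python
# Sketch: the example question expressed as one SPARQL 1.1 query.
# The ex: namespace and all class/property names are hypothetical.

def build_mountain_query(island, min_elevation, min_routes):
    """Render a query for mountains on `island` higher than
    `min_elevation` with more than `min_routes` climbing routes."""
    return f"""PREFIX ex: <http://example.org/mountain#>
SELECT ?mountain ?lat ?long
WHERE {{
  ?mountain a ex:Mountain ;
            ex:locatedOn ex:{island} ;
            ex:elevation ?elevation ;
            ex:latitude ?lat ;
            ex:longitude ?long ;
            ex:hasClimbingRoute ?route .
  FILTER(?elevation > {min_elevation})
}}
GROUP BY ?mountain ?lat ?long
HAVING (COUNT(DISTINCT ?route) > {min_routes})"""


# "gunung di pulau jawa, lebih dari 3000 mdpl, lebih dari 1 jalur pendakian"
query = build_mountain_query("Jawa", 3000, 1)
```

Even in this simplified form, the question needs six triple patterns, a numeric FILTER, and an aggregate with HAVING. The parser must recover all of these from nested relative clauses ("yang ... yang ... yang ...") before any query can be built.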

The problem of handling complex sentences is inherently related to the issue of sentence 

variation because complex sentences represent a subset of sentence variations. Both challenges 

demand robust natural language understanding capabilities and sophisticated linguistic analysis. 

Complex sentences, with their layered syntactic and semantic structures, exemplify the nuances

and intricacies of sentence variation. Therefore, addressing complex sentences effectively is a 

foundational step toward tackling the broader problem of sentence variation.  

In Indonesian, sentence variation includes various grammatical forms, word orders, and 

contextual dependencies. For example, the sentences "Siapakah yang menulis buku The Study in
Scarlet?", "Buku The Study in Scarlet ditulis oleh siapa?", and "Ditulis oleh siapa buku The
Study in Scarlet?" differ syntactically but have the same meaning: each asks who wrote The Study
in Scarlet. Most QA research in Indonesia has

traditionally focused on more fundamental challenges, such as ontology design, knowledge 

representation, information retrieval techniques, or improving entity recognition. While sentence 

variation remains a relatively under-discussed problem in ontology-based Indonesian QA research, it 

is a crucial challenge that requires more dedicated attention. Addressing this issue can lead to the 

development of more accurate and adaptable QA systems that can effectively navigate the linguistic 

intricacies of Indonesian and provide contextually relevant answers to a broader range of user 

queries.  
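The cost of handling sentence variation with predefined patterns can be sketched as follows. This is a hypothetical illustration rather than the method of any reviewed system: the regular expressions, the intent name `author_of`, and the function `parse_author_question` are assumptions made for this example.

```python
import re

# Hypothetical surface patterns: one regular expression per sentence variation.
AUTHOR_PATTERNS = [
    re.compile(r"^siapakah yang menulis buku (?P<title>.+)\?$", re.IGNORECASE),
    re.compile(r"^buku (?P<title>.+) ditulis oleh siapa\?$", re.IGNORECASE),
    re.compile(r"^ditulis oleh siapa buku (?P<title>.+)\?$", re.IGNORECASE),
]


def parse_author_question(question):
    """Map a question to ("author_of", title), or None if no pattern fits."""
    text = question.strip()
    for pattern in AUTHOR_PATTERNS:
        match = pattern.match(text)
        if match:
            return ("author_of", match.group("title"))
    return None
```

The three variations quoted above all resolve to the same intent and title. A fourth paraphrase with the same meaning, such as "Siapa penulis buku The Study in Scarlet?", matches none of the patterns and returns `None` until someone adds another rule by hand.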

Understanding sentence semantics, handling complex sentence structures, and dealing with 

sentence variations are interconnected challenges at the core of developing a robust and advanced 

ontology-based Indonesian QA system. These challenges have a symbiotic relationship, where each 

aspect significantly influences and impacts the other aspects, thus underscoring the need for a 

holistic approach. Addressing these interconnected challenges in an ontology-based Indonesian QA 

system will dramatically improve the accuracy and relevance of the answers provided, making them 

more valuable for information retrieval and increasing user satisfaction. Therefore, further research 


needs to focus on answering these challenges in the context of the ontology-based Indonesian QA 

system. Several strategies can be studied further to answer the challenges. 

First, leveraging advanced NLP techniques that combine syntactic analysis with semantic parsing
can empower the system to deconstruct complex sentences and extract meaningful information
accurately. Second, an Indonesian linguistic ontology can be built as a fundamental resource for
semantic parsing. Developing a semantic parser requires a foundation that provides structured
knowledge about language-specific concepts, word meanings, and relationships between words to
perform accurate and contextually relevant semantic analysis. A linguistic ontology can help
semantic parsers interpret language effectively and overcome weaknesses such as those identified
in [36]. Linguistic ontologies

differ from general domain ontologies in their scope and purpose. Linguistic ontology captures 

semantic nuances, contextual variations, synonyms, antonyms, and other language-specific attributes 

essential for accurate language understanding. 

In contrast, general domain ontologies focus on representing knowledge and concepts within a 

specific subject area or domain, such as medicine, finance, or geography. The development of such
domain ontologies dominates ontology-based Indonesian QA research. Building a linguistic

ontology is critical to improving the precision and reliability of semantic parsers, ultimately 

improving the overall performance of a natural language understanding system and its ability to 

provide meaningful responses to user queries. 

Third, for the statistical approach employed in ontology-based Indonesian QA systems, it is worth
considering more sophisticated, context-aware language models, such as transformer-based models
like GPT-4 or RoBERTa, or more recent models as they emerge. These models capture nuanced
semantic information, understand context, and produce coherent text. Upgrading the language model
can significantly improve the accuracy of question classification and information extraction.
Using pre-trained representations such as FastText and IndoBERT, as in [44], is also a sound
approach. However, there may be benefits in evaluating other embeddings from the natural language
processing field, such as Word2Vec or ELMo, to determine which provides the richest semantic
representation for tokens.

Moreover, the ontology-based Indonesian QA systems covered in this review rely on small datasets.
Expanding the labeled datasets could be beneficial, as it would allow models to generalize better
and handle the wide variety of semantic nuances in real-world questions. It is also essential to
keep the datasets up to date to accommodate changes and developments in the domain ontology.
Likewise, continuously updating and maintaining the lexicalization dictionary is critical to
ensure the system remains aligned with the evolving semantics and additions to the knowledge
resources.

B. Template-based SPARQL query construction 

Most existing ontology-based Indonesian QA systems use simple template-based SPARQL query
construction because simple questions can generally be answered with a set of triple patterns. A
query template is a predefined query consisting of SELECT and WHERE clauses with one or two
triple-pattern slots to fill, optionally supplemented with FILTER expression slots. Template-based
SPARQL query construction is undoubtedly the fastest and easiest way to develop a new QA system,
especially for simple questions in a single-domain QA system. However, based on evidence from
research [44][45][46][47][49], template-based queries are prone to semantic gaps caused by exact
string matching: the resulting query does not reflect what was inferred from the question, which
ultimately produces the wrong answer. The limited number of existing query templates is also a
weakness; not all questions can be handled using existing templates, and it is almost impossible
to manually provide query forms for every kind of question, especially for complex questions in a
multi-domain QA system. In addition, template-based queries are not flexible in dealing with
changes in the ontology: problems arise when new relations are added.
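A minimal sketch of such a template, with one triple pattern whose subject and predicate slots are filled by exact string matching, might look as follows. The `ex:` namespace, the lexicon entries, and the function `fill_template` are hypothetical and serve only to illustrate the mechanism and its failure mode.

```python
# Sketch of a single-triple SPARQL template with exact-match slot filling.
# The ex: namespace, lexicon entries, and property names are hypothetical.

TEMPLATE = """PREFIX ex: <http://example.org/book#>
SELECT ?answer
WHERE {{
  {subject} {predicate} ?answer .
}}"""

# Exact-match lexicons from question keywords to ontology terms.
INSTANCES = {"the study in scarlet": "ex:TheStudyInScarlet"}
PROPERTIES = {"penulis": "ex:hasAuthor", "penerbit": "ex:hasPublisher"}


def fill_template(instance_keyword, property_keyword):
    """Fill the template, or return None if a keyword has no exact match."""
    subject = INSTANCES.get(instance_keyword.lower())
    predicate = PROPERTIES.get(property_keyword.lower())
    if subject is None or predicate is None:
        return None
    return TEMPLATE.format(subject=subject, predicate=predicate)
```

With the keyword "penulis" (author) the template is filled correctly, but the synonym "pengarang" has no exact match and produces no query at all. Likewise, a relation added to the ontology later stays unreachable until the `PROPERTIES` lexicon is updated by hand.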


Future work on ontology-based Indonesian QA systems needs to explore more sophisticated query
generation techniques. This exploration may include semantic matching algorithms that go beyond
exact string matching by using semantic similarity measures, or natural language understanding
techniques that align queries more precisely with the user's intended meaning. Such algorithms
are important because they can bridge the semantic gap in template-based queries. Additionally,
it is recommended to develop systems that dynamically generate query templates from user
questions. This approach can accommodate a broader spectrum of query forms, thereby reducing
reliance on manually created templates. Ontology evolution techniques can also be applied so that
systems adapt to changes in ontology structure, ensuring that new relationships or concepts are
integrated into the QA system smoothly.
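One way to go beyond exact string matching is to compare a question keyword with ontology labels by cosine similarity in an embedding space. The sketch below uses invented three-dimensional vectors purely for illustration; in practice the vectors would come from a pre-trained model, and the names `TOY_VECTORS`, `PROPERTY_LABELS`, and `match_property`, as well as the 0.8 threshold, are assumptions of this example.

```python
import math

# Toy 3-dimensional vectors standing in for pre-trained word embeddings.
# The numbers are invented for illustration only.
TOY_VECTORS = {
    "penulis": (0.9, 0.1, 0.0),      # author
    "pengarang": (0.85, 0.2, 0.05),  # author (synonym)
    "penerbit": (0.1, 0.9, 0.1),     # publisher
    "harga": (0.0, 0.1, 0.95),       # price
}

# Hypothetical ontology properties, keyed by their Indonesian label.
PROPERTY_LABELS = {
    "penulis": "ex:hasAuthor",
    "penerbit": "ex:hasPublisher",
    "harga": "ex:hasPrice",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_property(keyword, threshold=0.8):
    """Return the property whose label is most similar to the keyword,
    or None if the keyword is unknown or nothing reaches the threshold."""
    if keyword not in TOY_VECTORS:
        return None
    best_label, best_score = None, threshold
    for label in PROPERTY_LABELS:
        score = cosine(TOY_VECTORS[keyword], TOY_VECTORS[label])
        if score >= best_score:
            best_label, best_score = label, score
    return PROPERTY_LABELS[best_label] if best_label else None
```

Here the keyword "pengarang" is mapped to `ex:hasAuthor` although it never appears as a label, because its vector lies close to that of "penulis". The threshold trades coverage against the risk of wrong matches, so it would need tuning on real data.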

C. Lexical gap and ambiguity. 

A question can only be answered if every term that refers to an entity can be identified in the
ontology. Terms (vocabulary) for an RDF resource can be described through the rdfs:label
property. Several labels are commonly used to model synonyms for words that refer to the same RDF
resource, as shown in [41][49]. However, knowledge bases usually contain only some of the
different terms that can refer to a particular entity. When the vocabulary used in a question
differs from that used in the ontology labels, a lexical gap occurs, significantly reducing the
percentage of questions that a system can answer.

On the other hand, strategies that enrich vocabulary to overcome lexical gaps can introduce
ambiguity because the same word can have different meanings [56]. Ambiguity hinders correct

interpretation of user questions and retrieval of contextually relevant information. Research [46] 

shows that ambiguity dramatically affects the system's accuracy.  
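The interplay between label coverage, lexical gaps, and ambiguity can be sketched with a small label index. The resources and labels below are hypothetical, as are the function names `build_label_index` and `resolve`; the Indonesian word "bunga" is used because it can mean both "flower" and "interest".

```python
from collections import defaultdict

# Hypothetical (resource, rdfs:label) pairs. Several labels on one
# resource model synonyms; one label on several resources is ambiguous.
LABELS = [
    ("ex:Flower", "bunga"),
    ("ex:Flower", "kembang"),
    ("ex:Interest", "bunga"),
]


def build_label_index(pairs):
    """Index resources by lower-cased label."""
    index = defaultdict(set)
    for resource, label in pairs:
        index[label.lower()].add(resource)
    return index


def resolve(term, index):
    """Classify a question term as 'gap', 'unique', or 'ambiguous'."""
    resources = index.get(term.lower(), set())
    if not resources:
        return ("gap", [])
    if len(resources) == 1:
        return ("unique", sorted(resources))
    return ("ambiguous", sorted(resources))


index = build_label_index(LABELS)
```

In this toy index, "kembang" resolves to a single resource, "bunga" is ambiguous between two resources, and "puspa" (another word for flower) falls into a lexical gap because no label covers it. Adding more labels closes gaps but can also create new ambiguous entries.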

Another cause of the lexical gap is vocabulary differences that arise from using another language
to label resources, as reported in [44]. Since every QA system depends heavily on language
resources, it should be noted that Indonesian is still classified as a low-resource language
[25], and the limited availability of NLP resources and tools is also a challenge in developing a
QA system. Using resources and tools built for other languages can be a temporary solution, but
it introduces language barriers. As reported in [37], using an English stemmer (Lucene) causes
over-stemming, which dramatically affects accuracy. Several studies also report that already
available Indonesian resources, such as POS taggers, still lack standardization. Although
research on Indonesian language resources has increased in the last decade, the availability of
these resources still needs to be improved, and further contributions are needed to increase
their use.

Several strategies can be applied to overcome the challenges of lexical gaps, such as synonym 

expansion, contextual understanding, word embedding, and ontology enrichment. However, 

addressing lexical gaps in ontology-based QA systems involves a trade-off between enhancing 

vocabulary coverage and managing semantic ambiguity. Therefore, further investigation is needed 

to combine several strategies, such as synonym expansion, contextual understanding, and ontology 

enrichment, to create a hybrid approach that effectively overcomes lexical gaps and ambiguity. 

Although the described solutions offer a way to bridge the gap between user queries and ontology
labels, they also depend on resources whose availability is limited (POS tagging, named entity
recognition, lexical databases, and others). Therefore, this paper encourages collaboration
between the NLP community and related organizations to standardize linguistic resources for
low-resource languages such as Indonesian, and encourages the exchange of linguistic resources
and data between researchers and organizations in Indonesian language processing. Open data
initiatives can help overcome limitations in resources, system complexity, and evaluation.

D. Multiple-domain ontology. 

The ontology-based Indonesian QA systems reviewed in this paper, except for [39], which uses
YAGO [57] as its KB, operate in limited and closed domains. These systems are designed to answer
questions on one particular topic and usually use one ontology to cover one particular knowledge
domain, such as economics, history, or tourism. The grammar and lexicon built are also limited to
the ontology specifications of the domain scope. Therefore, designing an Indonesian QA system
based on a multi-domain ontology poses challenges arising from varied linguistic features,
domain-specific nuances, and the complexity of managing multiple ontological frameworks. Adapting
a system to a different domain amplifies its complexity. Each domain requires a different
ontology, terminology, and contextual interpretation. Creating and maintaining these
domain-specific resources requires significant time and effort, especially in a language with
relatively few existing ontological resources.

Despite the complexity, developing a multi-domain ontology is very important to advance the 

ontology-based Indonesian QA system because it can significantly expand the scope of knowledge. 

Closed-domain QA systems are limited to a particular subject or domain, which restricts their
usefulness in answering user questions. By developing a multi-domain ontology, the QA system can
respond to topics from various domains, ranging from economics and history to science and
culture. Comprehensive

knowledge coverage provides a more flexible and user-friendly experience, where users can ask 

questions across multiple domains without needing a separate dedicated QA system. Enabling a 

single QA system to handle questions across multiple domains will increase user satisfaction and 

convenience. In real-world scenarios, questions often require a multidisciplinary approach or cross-

domain knowledge. For example, understanding complex issues such as climate change requires 

meteorology, environmental science, and economics knowledge. Multi-domain ontologies equip QA 

systems to provide comprehensive and contextually relevant responses, reflecting complex 

knowledge interactions in real-life scenarios. 

Moreover, by dealing with multiple domains, QA systems must understand a broader spectrum 

of linguistic expressions, including domain-specific terminologies and nuances. This linguistic 

challenge drives advancement in natural language processing and understanding, enhancing QA systems and other NLP

applications. Developing multiple-domain ontologies also stimulates research and development 

opportunities. It encourages collaboration among domain experts, linguists, and NLP researchers, 

driving innovation in ontology construction, semantic interoperability, and knowledge 

representation. These advancements have broader implications beyond QA systems, benefiting the 

broader field of artificial intelligence and human-computer interaction. 

Addressing the challenge of developing multiple-domain ontologies for ontology-based 

Indonesian QA systems requires a multifaceted approach, especially in the context of limited 

previous discussions. Firstly, future research should emphasize knowledge sharing and collaboration 

among researchers, domain experts, and linguists to create a collective pool of domain-specific 

ontologies. Establishing a centralized repository or platform for sharing ontological resources and 

experiences can accelerate progress in this area. 

Secondly, research should emphasize the development of domain-agnostic ontology frameworks 

that can be adapted efficiently to various domains. It involves designing ontological structures that 

are inherently flexible and capable of accommodating new domains without requiring extensive 

manual adjustments. Lastly, research should explore semi-automatic or machine-assisted approaches 

for ontology adaptation. Leveraging natural language processing and machine learning techniques 

can facilitate the automatic alignment of ontologies with new domains, making the process more 

efficient and less resource-intensive. 

VI. Conclusion 

Ontology-based Indonesian QA systems face a series of complex challenges that must be addressed
to advance natural language processing and knowledge retrieval in the Indonesian

language context. These challenges encompass various linguistic, semantic, and technical aspects, 

and they are intricately interconnected. Some primary challenges are understanding sentence 

semantics, handling complex sentence structures, and dealing with sentence variations. Addressing 

these issues is critical to developing a robust and sophisticated ontology-based Indonesian QA 

system, as they directly impact the system's ability to provide contextually relevant and accurate 

answers to user questions. Addressing these challenges requires a holistic approach that combines 

advanced natural language processing techniques, linguistic ontologies, and sophisticated language 

models. 


Although convenient, template-based SPARQL query construction has limitations, such as 

semantic gaps and inflexibility in adapting to ontology changes. Future advances should explore 

semantic matching algorithms, dynamic query templating, and ontology evolution techniques to 

improve the precision and adaptability of query construction. Lexical gaps and ambiguities in
Indonesian QA systems are challenges of vocabulary coverage and semantic understanding. Strategies

such as synonym expansion, contextual analysis, word embedding, and ontology enrichment require 

further investigation to bridge this gap effectively. 

Developing multiple-domain ontologies is crucial for expanding the knowledge coverage of 

Indonesian QA systems. While it is a complex effort, it offers significant benefits regarding user 

convenience and the ability to address cross-domain questions. Collaboration among experts, 

flexible ontology frameworks, and machine-assisted ontology adaptation are vital strategies to tackle 

this challenge. Addressing these challenges requires a collaborative effort from researchers, domain 

experts, and linguists. Overcoming these obstacles will improve the performance of ontology-based 

Indonesian QA systems and advance the broader field of natural language processing and 

knowledge retrieval in low-resource languages like Indonesian. 

Declarations  

Author contribution  

All authors contributed equally as the main contributors of this paper. All authors read and approved the final paper.

Funding statement  

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit 
sectors.  

Conflict of interest  

The authors declare no known conflict of financial interest or personal relationships that could have appeared to 
influence the work reported in this paper.  

Additional information  

Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds. 

Publisher’s Note: Department of Electrical Engineering and Informatics - Universitas Negeri Malang remains neutral with 

regard to jurisdictional claims and institutional affiliations. 

 
References 

[1] G. Mai, K. Janowicz, R. Zhu, L. Cai, and N. Lao, “Geographic Question Answering: Challenges, Uniqueness, 
Classification, and Future Directions,” AGILE: GIScience Series, vol. 2, no. 8, 2021. 

[2] E. M. Nabil Alkholy, M. Hassan Haggag, and A. Aboutabl, “Question Answering Systems: Analysis and Survey,” 
International Journal of Computer Science & Engineering Survey, vol. 09, no. 06, 2018. 

[3] W. Franco et al., “Ontology-based Question Answering Systems over Knowledge Bases: A Survey,” in Proceedings 
of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS, 2020, pp. 532–539. 

[4] I. Mahmoud Ibrahim Alturani and M. Pouzi Bin Hamzah, “An Efficient Semantic Analysis Technique for the 
Question Answering Systems,” Journal of Engineering and Applied Sciences, vol. 14, no. 22, 2019. 

[5] A. Abdi, N. Idris, and Z. Ahmad, “QAPD: An ontology-based question answering system in the physics domain,” 
Soft comput, vol. 22, no. 1, pp. 213–230, 2018. 

[6] C. Trojahn, R. Vieira, D. Schmidt, A. Pease, and G. Guizzardi, “Foundational ontologies meet ontology matching: A 
survey,” Semant Web, vol. 13, no. 4, pp. 685–704, 2022. 

[7] M. B. Canciglieri, A. L. Szejka, O. Canciglieri Junior, and L. Yoshida, “Current issues in multiple domain semantic 
reconciliation for ontology-driven interoperability in product design and manufacture,” in IFIP Advances in 
Information and Communication Technology, 2018. 

[8] G. R. Roldán-Molina, D. Ruano-Ordás, V. Basto-Fernandes, and J. R. Méndez, “An ontology knowledge inspection 
methodology for quality assessment and continuous improvement,” Data Knowl Eng, vol. 133, 2021. 

[9] A. F. Khan et al., “When linguistics meets web technologies. Recent advances in modelling linguistic linked data,” 
Semant Web, vol. 13, no. 6, pp. 987–1050, 2022. 

[10] D. Diefenbach, A. Both, K. Singh, and P. Maret, “Towards a question answering system over the Semantic Web,” 
Semant Web, vol. 11, no. 3, pp. 421–439, 2020. 

[11] T. H. Alwaneen, A. M. Azmi, H. A. Aboalsamh, E. Cambria, and A. Hussain, “Arabic question answering system: a 
survey,” Artif Intell Rev, vol. 55, no. 1, pp. 207–253, Jan. 2022. 

[12] A. Arbaaeen and A. Shah, “Ontology-Based Approach to Semantically Enhanced Question Answering for Closed 
Domain: A Review,” Information (Switzerland), vol. 12, no. 5, 2021. 

[13] A. Abdi, S. Hasan, M. Arshi, S. M. Shamsuddin, and N. Idris, “A question answering system in hadith using 
linguistic knowledge,” Comput Speech Lang, vol. 60, 2020. 

[14] M. Jarrar, “The Arabic ontology – an Arabic wordnet with ontologically clean content,” Appl Ontol, vol. 16, no. 1, 
pp. 1–26, 2021. 

[15] G. M. R. I. Rasiq, A. Al Sefat, T. Hossain, Md. I.-E.-H. Munna, J. J. Jisha, and M. M. Hoque, “Question Answering 
System over Linked Data: A Detailed Survey,” ABC Research Alert, vol. 8, no. 1, 2020. 

[16] M. A. Calijorne Soares and F. S. Parreiras, “A literature review on question answering techniques, paradigms and 
systems,” Journal of King Saud University - Computer and Information Sciences, vol. 32, no. 6. King Saud bin 
Abdulaziz University, pp. 635–646, Jul. 01, 2020. 

[17] C. Antoniou and N. Bassiliades, “A survey on semantic question answering systems,” The Knowledge Engineering 
Review, vol. 37, no. 3. 2022. 

[18] A. Pereira, A. Trifan, R. P. Lopes, and J. L. Oliveira, “Systematic review of question answering over knowledge 
bases,” IET Software, vol. 16, no. 1, pp. 1–13, Feb. 2022. 

[19] A. Albarghothi, F. Khater, and K. Shaalan, “Arabic Question Answering Using Ontology,” Procedia Comput Sci, 
vol. 117, pp. 183–191, 2017. 

[20] M. Breja and S. K. Jain, “A survey on non-factoid question answering systems,” International Journal of Computers 
and Applications, vol. 44, no. 9, pp. 830–837, 2022. 

[21] M. Mattila and A. Dahanayke, “Systematic Literature Review of Question Answering Systems,” in Lecture Notes in 
Networks and Systems, 2021. 

[22] D. Eberhard, G. Simons, and C. Fennig, “Languages of the World,” Ethnologue. 25th ed. Dallas, Texas: SIL 
International, 2022. Accessed: Oct. 11, 2022. 

[23] I. Ghosh, “Ranked: The 100 Most Spoken Languages Worldwide,” 2020. Accessed: Oct. 11, 2022.

[24] F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, “IndoLEM and IndoBERT: a benchmark dataset and pre-trained
language model for Indonesian NLP,” in Proceedings of the 28th International Conference on Computational
Linguistics, 2020, pp. 757–770.

[25] S. Li, N. Lin, L. Xiao, and S. Jiang, “IndoAbbr: A New Benchmark Dataset for Indonesian Abbreviation 
Identification,” in 2020 International Conference on Asian Language Processing, IALP 2020, 2020. 

[26] S. S. Alanazi, N. Elfadil, M. Jarajreh, and S. Algarni, “Question Answering Systems: A Systematic Literature 
Review,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 3, 2021. 

[27] E. Dimitrakis, K. Sgontzos, and Y. Tzitzikas, “A survey on question answering systems over linked data and 
documents,” J Intell Inf Syst, vol. 55, no. 2, pp. 233–259, 2020. 

[28] F. T. Admojo and E. Winarko, “Sistem Pencarian Informasi Berbasis Ontologi untuk Jalur Pendakian Gunung 
Menggunakan Query Bahasa Alami dengan Penyajian Peta Interaktif,” IJCCS (Indonesian Journal of Computing 
and Cybernetics Systems), vol. 10, no. 1, pp. 23–34, Jan. 2017. 

[29] A. A. Shah, S. D. Ravana, S. Hamid, and M. A. Ismail, “Accuracy evaluation of methods and techniques in Web-
based question answering systems: a survey,” Knowl Inf Syst, vol. 58, no. 3, pp. 611–650, 2019. 

[30] H. Sulistyanto and A. SN, “A Few Survey of Developments and Challenges Arising on General and Indonesian 
Question Answering System,” in International Conference on Information Systems for Business Competitiveness 
(ICISBC 2013), 2013, pp. 71–75. Accessed: Sep. 05, 2023  

[31] R. Wongso, Meiliana, and D. Suhartono, “A Literature Review of Question Answering System using Named Entity 
Recognition,” in Proceedings - 2016 3rd International Conference on Information Technology, Computer, and 
Electrical Engineering, ICITACEE 2016, 2016, pp. 274–277. 

[32] F. S. Utomo, N. Suryana, and M. S. Azmi, “Question Answering System : A Review On Question Analysis,
Document Processing, And Answer Extraction Techniques,” Journal of Theoretical and Applied Information
Technology, vol. 95, no. 14, pp. 3158–3174, 2017.

[33] A. Abdiansah, A. Azhari, and A. K. Sari, “Survey on Answer Validation for Indonesian Question Answering System 
(IQAS),” International Journal of Intelligent Systems and Applications, vol. 10, no. 4, pp. 68–78, Apr. 2018. 

[34] Y. Puspitarani, “Indonesian Information Extraction : Challenges and Opportunities,” JATISI (Jurnal Teknik 
Informatika dan Sistem Informasi), vol. 8, no. 1, pp. 421–429, 2021. 

[35] A. F. Aji et al., “One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in 
Indonesia,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2022, pp. 
7226–7249. 

[36] F. Darari, A. A. Krisnadhi, and R. Manurung, “OWLizr: Knowledge Representation System for Bahasa Indonesia 
Based on Web Ontology Language Description Logic (OWL DL),” in International Conference on Advanced 
Computer Science and Information Systems 2010, 2010, pp. 293–298. Accessed: Sep. 05, 2023. 

[37] S. J. Putra, R. H. Gusmita, K. Hulliyah, and H. T. Sukmana, “A semantic-based question answering system for 
Indonesian translation of Quran,” in Proceedings of the 18th International Conference on Information Integration 
and Web-Based Applications and Services, in iiWAS ’16. New York, NY, USA: Association for Computing 
Machinery, 2016, pp. 504–507. 

[38] V. Atina, E. Sediyono, and R. Rizal, “Information Retrieval System for Indonesian Manuscript using Semantic 
Web,” Int J Comput Appl, vol. 170, no. 8, 2017. 

[39] Wahyudi, M. L. Khodra, A. S. Prihatmanto, and C. Machbub, “A Question Answering System Using Graph-Pattern 
Association Rules (QAGPAR) on YAGO Knowledge Base,” in 2018 International Conference on Information 
Technology Systems and Innovation, ICITSI 2018 - Proceedings, 2018. 

[40] F. S. Utomo, N. Suryana, and M. S. Azmi, “New instances classification framework on Quran ontology applied to 
question answering system,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 17, no. 1, 
pp. 139–146, Feb. 2019. 

[41] R. A. Yunmar and I. W. W. Wisesa, “Design of Ontology-based Question Answering System for 
Incompleted Sentence Problem,” in IOP Conference Series: Earth and Environmental Science, 2019. 




 
[42] A. Amalia, P. Y. C. Sipahutar, E. Elviwani, and F. Purnamasari, “Chatbot Implementation with Semantic 
Technology for Drugs Information Searching System,” in Journal of Physics: Conference Series, 2020. 

[43] F. Ishlakhuddin and A. SN, “Ontology-based Chatbot to Support Monitoring of Server Performance and Security By 
Rule-base,” IJCCS (Indonesian Journal of Computing and Cybernetics Systems), vol. 15, no. 2, p. 131, Apr. 2021. 

[44] M. I. Rahajeng and A. Purwarianti, “Indonesian Question Answering System for Factoid Questions using Face 
Beauty Products Knowledge Graph,” Jurnal Linguistik Komputasional, vol. 4, no. 2, pp. 59–63, 2021. 

[45] E. S. B. Perangin-Angin, Z. K. A. Baizal, and D. Richasdy, “Question Answering using Ontology for Sumedang 
Larang History with Support Vector Machine Based on Telegram Bot,” Jurnal Media Informatika Budidarma, vol. 
6, no. 4, pp. 2438–2445, Oct. 2022. 

[46] A. N. Hasanah, A. Baizal, and R. Dharayani, “Question Answering For Sumedang Larang Kingdom Using The 
Multilayer Perceptron Algorithm,” JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika), vol. 7, no. 4, 2022. 

[47] R. Jasmi, Z. K. A. Baizal, and D. Richasdy, “Question Answering Chatbot using Ontology for History of the 
Sumedang Larang Kingdom using Cosine Similarity as Similarity Measure,” Jurnal Media Informatika 
Budidarma, vol. 6, no. 4, 2022. 

[48] R. F. Saldhi, Z. K. A. Baizal, and R. Dharayani, “Question Answering System at the Kingdom of Sumedang Larang 
with Naïve Bayes Method,” Journal of Computer System and Informatics (JoSYC), vol. 3, no. 4, pp. 322–329, 2022. 

[49] S. A. Anggrayni, Z. K. A. Baizal, and D. Richasdy, “Question Answering System Using Semantic Reasoning on 
Ontology for The History of The Sumedang Larang Kingdom,” Building of Informatics, Technology and Science 
(BITS), vol. 4, no. 2, pp. 545–553, 2022. 

[50] R. Mahendra, S. D. Larasati, and R. Manurung, “Extending an Indonesian semantic analysis-based question 
answering system with linguistic and world knowledge axioms,” in Proceedings of the 22nd Pacific Asia Conference 
on Language, Information and Computation, PACLIC 22, 2008. 

[51] K. Höffner, S. Walter, E. Marx, R. Usbeck, J. Lehmann, and A. C. Ngonga Ngomo, “Survey on challenges of 
Question Answering in the Semantic Web,” Semant Web, vol. 8, no. 6, pp. 895–920, 2017.  

[52] A. Farea, Z. Yang, K. Duong, N. Perera, and F. Emmert-Streib, “Evaluation of Question Answering Systems: 
Complexity of judging a natural language,” arXiv:2209.12617, 2022. 

[53] A. M. Moeliono, H. Lapoliwa, H. Alwi, S. S. Tjatur, W. Sasangka, and S. Sugiyono, Tata Bahasa Baku Bahasa 
Indonesia, 4th ed. Jakarta: Kementerian Pendidikan dan Kebudayaan Republik Indonesia, 2017. 

[54] Y. Lan, G. He, J. Jiang, J. Jiang, W. X. Zhao, and J. R. Wen, “A Survey on Complex Knowledge Base Question 
Answering: Methods, Challenges and Solutions,” in Proceedings of the Thirtieth International Joint Conference on 
Artificial Intelligence, 2021, pp. 4483–4491. 

[55] C. Zhang, Y. Lai, Y. Feng, and D. Zhao, “A review of deep learning in question answering over knowledge bases,” 
AI Open, vol. 2, pp. 205–215, 2021. 

[56] A. Dhandapani and V. Vadivel, “Question Answering System over Semantic Web,” IEEE Access, vol. 9, pp. 
46900–46910, 2021. 

[57] T. Rebele, F. Suchanek, J. Hoffart, J. Biega, E. Kuzey, and G. Weikum, “YAGO: A multilingual knowledge base 
from Wikipedia, WordNet, and GeoNames,” in Lecture Notes in Computer Science (including subseries Lecture Notes 
in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016. 

 