Knowledge Engineering and Data Science (KEDS), Vol 6, No 2, October 2023, pp. 129–144
pISSN 2597-4602, eISSN 2597-4637
https://doi.org/10.17977/um018v6i22023p129-144
©2023 Knowledge Engineering and Data Science | W: http://journal2.um.ac.id/index.php/keds | E: keds.journal@um.ac.id
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)

Systematic Literature Review on Ontology-based Indonesian Question Answering System

Fadhila Tangguh Admojo a,1, Adidah Lajis b,2,*, Haidawati Nasir b,3
a Universitas Bina Darma, Jl. Jenderal Ahmad Yani No. 3, Palembang 30111, Indonesia
b Universiti Kuala Lumpur, Malaysian Institute of Information Technology (MIIT), 1016, Jalan Sultan Ismail, Kuala Lumpur 50250, Malaysia
1 fadhila.tangguh@binadarma.ac.id; 2 adidahl@unikl.edu.my*; 3 haidawati@unikl.edu.my
* corresponding author

ARTICLE INFO
Article history: Received 06 September 2023; Revised 18 September 2023; Accepted 30 September 2023; Published online 19 October 2023
Keywords: Literature Review; Ontology-based Indonesian QA System; Semantic Parser

ABSTRACT
Question-Answering (QA) systems, at the intersection of natural language processing, information retrieval, and knowledge representation, aim to provide efficient responses to natural language queries. These systems have seen extensive development in English, while languages like Indonesian present unique challenges and opportunities. This literature review delves into the state of ontology-based Indonesian QA systems, highlighting critical challenges. The first challenge lies in sentence understanding, variations, and complexity. Most systems rely on syntactic analysis and struggle to grasp sentence semantics. Complex sentences, especially in Indonesian, pose difficulties in parsing, semantic interpretation, and knowledge extraction. Addressing these linguistic intricacies is pivotal for accurate responses. Secondly, template-based SPARQL query construction, commonly used in Indonesian QA systems, suffers from semantic gaps and inflexibility. Advanced techniques like semantic matching algorithms and dynamic template generation can bridge these gaps and adapt to evolving ontologies. Thirdly, lexical gaps and ambiguity hinder QA systems. Bridging vocabulary mismatches between user queries and ontology labels remains a challenge. Strategies like synonym expansion, word embedding, and ontology enrichment must be explored further to overcome these challenges. Lastly, the review discusses the potential of developing multi-domain ontologies to broaden the knowledge coverage of QA systems. While this presents complex linguistic and ontological challenges, it offers the advantage of responding to a wide range of user queries across domains. This literature review identifies crucial challenges in developing ontology-based Indonesian QA systems and suggests innovative approaches to address these challenges.

I. Introduction

As one of the applications of Artificial Intelligence (AI), Question Answering (QA) stands at the intersection of Natural Language Processing (NLP), Information Retrieval (IR), knowledge representation, and computational linguistics [1][2]. The primary objective of a QA system is to provide relevant responses to queries presented in natural language [3]. With the increasing amount of information available online, QA systems offer greater convenience and efficiency than search engines by presenting the final answer directly rather than returning a list of relevant information or hyperlinks [4].

In an ontology-based QA system, the structure of the Knowledge Base (KB), which is the source of answers to questions, is defined in an ontology [5]. An ontology describes the concepts in a domain and the relationships among these concepts [6][7]. Ontologies help create a shared understanding of data, information, and knowledge for human-to-machine or machine-to-machine communication and collaboration [8]. The standards proposed by the World Wide Web Consortium (W3C) for specifying ontologies are the Resource Description Framework (RDF) for modeling the KB, the Web Ontology Language (OWL) for supporting description logic inference, and the SPARQL query language for accessing the data.

The essence of an ontology-based QA system is to retrieve the desired information from one or many ontologies using a natural language query. To achieve this, the QA system must first understand the meaning or intention of the natural language query and then transform it into an appropriate SPARQL query that obtains the desired information from the ontologies [9]. Hence, the main challenge in developing a QA system stems from the complexity and ambiguity of human language. Many systems have been developed using different approaches for different questions in different languages. However, most of the available QA systems operate exclusively in English, and the existing approaches are not designed to adapt quickly to new knowledge bases and other languages [10][11][12][13][14][15][16][17][18][19][20][21].

Indonesian, locally called Bahasa Indonesia, is the national language of Indonesia, spoken by over 198 million people, and is one of the most frequently spoken languages in the world [22][23][24]. According to Internet World Stats data, as of Mar.
31, 2020, Indonesian was the sixth among the top ten languages used on the web, with 306 million users and an internet penetration rate of 64.6% [25]. Nevertheless, compared with other languages among the world's ten most used, such as English, Mandarin, Hindi, Spanish, French, and Arabic, the development of Indonesian QA is still far behind.

Over the past decade, increasing demand for ontology-based Indonesian QA systems has been driven by the need to efficiently navigate and utilize ever-increasing sources of digital content while addressing language diversity [26]. Critical to facilitating access to digital information, these systems have become essential tools in various sectors, including education, research, business, and government. This literature review explores the landscape of existing ontology-based Indonesian QA systems, aiming to uncover the challenges, limitations, and gaps that hinder these systems from reaching their full potential.

The challenges facing ontology-based Indonesian QA systems have many aspects and require different considerations. These challenges include the sophistication of semantic understanding, the complexity of sentence variations, the constraints of template-based query construction, the traps of lexical gaps and ambiguities, and the uncharted territory of multi-domain ontology. Each challenge presents opportunities for deeper research, innovation, and collaboration among linguists, computer scientists, and domain experts.

This section provides an overview of previous reviews and surveys regarding NLP and QA systems for Indonesian, to confirm that the topic of this paper has not been covered before. Seven publications can be found that review and survey the Indonesian QA system and NLP resources. None of these papers discuss the ontology-based Indonesian QA system. A summary of the previous reviews is shown in Table 1.

Table 1.
A summary of reviews on the Indonesian QA system

Reference | Brief explanation | Coverage
--- | --- | ---
Sulistyanto et al. 2013 [27] | Identifies approaches and methods that have been used in Indonesian QA systems, and discusses development trends and emerging challenges | 2008-2013
Wongso et al. 2017 [28] | Reviews the Indonesian QA system using Named Entity Recognition (NER) | 2005-2015
Utomo et al. 2017 [29] | Reviews the state of question analysis, document processing, and answer extraction techniques | -
Abdiansyah et al. 2018 [30] | Survey on answer validation (AV) | 2005-2017
Puspitarani et al. 2021 [31] | Reviews current research trends, challenges, and information extraction opportunities using Indonesian | 2014-2019
Aji et al. 2022 [32] | Provides an overview of the current state of NLP research and highlights challenges in Indonesian NLP | 2011-2021

The contributions of this paper are as follows:
• An updated review of the existing literature, covering recent work on Indonesian ontology-based QA systems.
• Several previous reviews have focused on components and techniques. This paper attempts to analyze the ontology-based Indonesian QA system from a linguistic aspect to evaluate the system's ability to understand natural language-based input.
• An identification of gaps and existing challenges for future development.

The remaining sections of the paper are structured as follows. Section 2 describes the methodology employed in the literature review and the rigorous approach adopted. Section 3 provides an overview of the architecture and classification of QA systems, covering their fundamental components and typologies. Section 4 categorizes and explains all the systems featured in the selected papers, providing a detailed analysis of their respective characteristics and functionalities.
Section 5 outlines the gaps and challenges that characterize the landscape of ontology-based Indonesian QA systems, underscoring the critical areas that need further exploration and resolution. Finally, Section 6 draws conclusions from the review and charts directions for the future development of ontology-based Indonesian QA systems.

II. Methods

This literature review follows a systematic and structured methodology to identify, select, and analyze relevant literature on the challenges encountered in ontology-based Indonesian QA systems. The primary goal of this literature review is to investigate the challenges and limitations faced by ontology-based QA systems when applied to the Indonesian language. This review seeks to identify the key issues hindering the development and effectiveness of such systems and to understand the potential solutions or strategies proposed in the existing literature.

The selection of appropriate databases and a well-defined search strategy are essential to ensure comprehensive coverage of the relevant literature. Google Scholar will be utilized for the literature search. The search strategy will employ a combination of keywords and phrases relevant to the research topic. These include: "ontology-based question answering", "Indonesian QA systems", "Indonesian question answering", "ontology-driven QA challenges", "semantic parsing in Indonesian", "knowledge base-driven QA issues", "ontology-based QA limitations", "Indonesian language processing challenges", "QA system semantic analysis problems", "ontology-based QA system difficulties", "challenges in Indonesian language QA". The search process will involve iterative refinements of the search terms to ensure the retrieval of the most relevant literature.

Explicit inclusion and exclusion criteria will be applied to select relevant literature while excluding irrelevant work. The inclusion and exclusion criteria are described in Table 2.

Table 2.
Inclusion and exclusion criteria

Inclusion Criteria | Exclusion Criteria
--- | ---
Papers are written in English. | Papers that are not written in English.
Papers published in conferences or online journal platforms. | Duplicated papers.
Papers about ontology-based Indonesian QA systems. | Full content of the paper is not available or could not be found.
Papers published between 2010 and 2022. | Paper contains theoretical concepts without proof of implementation.

The screening process will encompass an initial title and abstract screening, followed by a full-text review. For the selected literature, relevant data will be extracted and organized. This data will include publication details, research objectives, methodologies employed, key findings, and challenges identified. The extracted data will be synthesized to identify common themes, patterns, and recurring challenges across the literature.

The quality and rigor of the selected literature will be assessed through critical appraisal. Each paper's research methodology, experimental design, and contribution to understanding ontology-based Indonesian QA system challenges will be evaluated. The assessment will consider factors such as the validity of findings and the credibility of the research.

The synthesized data will undergo thematic analysis to identify overarching themes and patterns related to the challenges faced by ontology-based Indonesian QA systems. The identified challenges will be categorized and discussed in detail in the literature review.

The findings of this literature review will be documented in a structured report format, including sections dedicated to the introduction, methodology, literature review, thematic analysis, discussion, and conclusion. Proper citation and referencing of sources will be ensured, adhering to academic writing standards and citation guidelines. Ethical guidelines for conducting research will be followed diligently.
III. QA System Architecture and Classification

Typically, QA systems follow a pipeline design in which data is processed sequentially, so that the output of one component serves as the input of the next. The architecture of QA systems comprises three key components: question analysis, document analysis, and answer analysis [26]. In practice, the components of a QA system depend on the approach used and the type of data source underlying it [27]. Unlike corpus-based systems, an ontology-based QA system does not perform document analysis to extract candidate answers from unstructured text or documents. Instead, it uses the SPARQL query language to extract information from the ontology that represents the data source [28]. The working stages of an ontology-based QA system are question analysis, query construction, and answer analysis.

Question analysis is the first stage of processing natural language queries. It aims to understand the question using various analyses: morphological analysis to separate words into individual units; syntactic analysis to identify grammar, usually by constructing parse trees; and semantic analysis to identify relationships at the word, phrase, clause, and sentence levels in order to obtain the correct meaning. The second stage is query construction, which produces SPARQL queries based on the results of the question analysis stage. The final stage is answer analysis, which executes the SPARQL queries to extract answers from the ontology.

According to [17][27], QA systems can be classified based on specific criteria, as shown in Figure 1. Questions can be categorized into factoid, non-factoid, list, and confirmation. Factoid questions usually contain When, Where, How many/much, What, and Who. Answers to factoid questions are usually short and specific. Non-factoid questions, which usually use Why and How to, ask for explanations, and their answers are presented as definitions.
List questions are almost the same as factoid questions; the difference lies in the number of answers. A Boolean-type question is also known as a confirmation question; its answer is yes or no, true or false.

The types of domains in the QA system are divided into closed and open domains. A closed domain focuses on answering questions within a specific domain, such as sports, history, or culture. An open domain does not focus on just one domain; the goal is to answer questions across a wide range. The QA system relies entirely on data sources to generate answers. Data source types can be structured, semi-structured, and unstructured. According to [26][29][33][34][35], the approach that forms the basis of the work phase of the QA system can be classified into linguistic-based, rule-based, statistical-based, and pattern-based approaches.

Fig. 1. QA system classification

IV. Ontology-based Indonesian QA system

After carrying out the methodology stages and filtering previous papers on ontology-based Indonesian QA systems from 2010 to 2022 with the inclusion and exclusion criteria, only 14 papers were selected and arranged based on their classification, as shown in Table 3. This section reviews the 14 ontology-based Indonesian QA systems and their novelties to highlight current challenges. The reviews are grouped according to four main approaches: (1) linguistic, (2) rule-based, (3) pattern matching, and (4) statistical.

Table 3. Classification of 14 selected papers

Citation | Ref. | Question Type | Domain | Approach
--- | --- | --- | --- | ---
Darari et al. 2010 | [36] | Factoid | Closed | Linguistic
Putra et al. 2016 | [37] | Factoid | Closed | Statistical
Atina et al. 2017 | [38] | Factoid | Closed | Pattern Matching
Wahyudi et al. 2018 | [39] | Factoid | Open | Pattern Matching
Utomo et al. 2019 | [40] | Factoid | Closed | Statistical
Yunmar et al. 2019 | [41] | Factoid | Closed | Pattern Matching
Amalia et al. 2020 | [42] | Factoid | Closed | Pattern Matching
Ishlakhuddin et al. 2021 | [43] | Factoid | Closed | Rule-Based
Rahajeng et al. 2021 | [44] | Factoid | Closed | Statistical
Perangin-Angin et al. 2022 | [45] | Factoid | Closed | Statistical
Hasanah et al. 2022 | [46] | Factoid, Non-Factoid | Closed | Statistical
Jasmi et al. 2022 | [47] | Factoid | Closed | Statistical
Saldhi et al. 2022 | [48] | Factoid | Closed | Statistical
Anggrayni et al. 2022 | [49] | Factoid | Closed | Pattern Matching

A. Linguistic Approach

The ontology-based Indonesian QA system presented by [36] consists of an NLP semantic analyzer and a SPARQL query generator module. [36] reused the Semantic Analyzer module developed by [50]. This module has four parts: (1) a lexicon, which contains vocabulary words and linguistic information; (2) a grammar, which determines the syntactic structure of sentences; (3) lexical semantics, which store semantic values for each word in the lexicon; and (4) semantic attachment rules, which are instructions for producing semantic representations based on grammatical rules. The NLP semantic analyzer receives an interrogative sentence as input and then parses it to produce semantic notation (lambda calculus) using a syntax-driven semantic analysis technique. The NLP semantic analyzer generates question and conditional variables. The SPARQL query generator translates the semantic notation into a SPARQL query by placing the question variable in the SELECT clause and the conditional variable in the WHERE clause, and the final answer is generated from the query execution result. The SPARQL query formation process is illustrated in Figure 2.

Fig. 2. The formation of SPARQL query

B. Rule-based Approach

A QA system with a rule-based approach was developed by [43]. The system receives input in the form of interrogative sentences. The question analysis process begins by breaking sentences into tokens (tokenizing), then removing stop words (filtering), and classifying the sequence of tokens according to their class in the ontology.
Classification results are stored in variables, which are then examined using specific rules to determine answers. The final answer is not extracted from the ontology and does not use SPARQL queries. The ontology developed by [43] is only used as a lexicon for the classification process.

C. Pattern Matching Approach

The authors of reference [38] developed an information retrieval system that accepts command sentences as input. The NLP tasks consist of case folding, tokenization, and filtering to generate a series of words that are matched against specific patterns to generate SPARQL queries. The final information is presented based on the results of the SPARQL query execution.

The authors of reference [39] propose a Question Answering (QA) system employing the graph pattern association rule (QAGPAR) within the YAGO knowledge base. The input question, given as an interrogative sentence, is translated into graph form through four steps: (1) question classification, (2) graph component formulation, (3) query formulation, and (4) query processing. In the first step, the question must match one of the existing templates. In the second step, the output from the first step is transformed into graph form. The third step produces a query based on the model from the graph component. Finally, query processing executes the query to obtain the answer from the KB. Moreover, [39] adds a query optimization feature that uses graph-pattern association to handle questions whose answers are not found in the database.

The authors of reference [41] design an ontology-based Indonesian QA system that can process incomplete question sentences, such as a question sentence without a question word or an object of the question, and a question sentence with unclear adjectives. The stages in question analysis are Stemming, Stop word Removal, Tokenizing, POS Tagging, and Keyword Identification.
The SPARQL query is formulated using keyword association, predicate identification, and property identification to fill the slots in the prepared query template. [41] also designs an ontology with a keyword property that functions as a thesaurus to find the question objective. Another study that used steps similar to [41] in the analysis process to form SPARQL queries was [49]. The difference lies in the ontology domain. An illustration of sentence processing with pattern matching is presented in Figure 3.

Fig. 3. Illustration of sentence processing with pattern matching

D. Statistical Approach

In the research conducted by [37], the questions are transformed into weighted vectors using term frequency-inverse document frequency (TF-IDF) in order to obtain the semantic similarity between the questions and each verse in the KB. The semantic similarity between questions and verses in the knowledge base is then measured using the cosine similarity algorithm to retrieve relevant verses. After the semantically relevant verses are obtained, named entity recognition (NER) and feature extraction are performed to select the best verse and extract the answers. The verse with the highest score in the probability calculation is returned as the answer.

The authors of reference [40] build a QA framework consisting of six steps: (1) pre-processing, (2) morphological analysis, (3) question classification, (4) query expansion, (5) document processing, and (6) answer extraction. The first step, pre-processing, consists of tokenization, punctuation removal, stop-word removal, and stemming. The output of the morphological analysis is the essential keyword in its root form, which then becomes the input for the question classification and query expansion.
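The pre-processing operations listed above can be sketched in a few lines. This is a minimal illustration, not the implementation of [40]: the stop-word set and the stem dictionary are hypothetical toy resources, case folding is an added assumption, and a real system would rely on a full Indonesian stop-word list and stemmer.

```python
import string

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"yang", "itu", "di", "dan"}
TOY_STEMS = {"membangun": "bangun", "dibangun": "bangun"}


def preprocess(question):
    """Tokenize, strip punctuation, drop stop words, and stem a question."""
    # Tokenization on whitespace (with case folding, an added assumption)
    tokens = question.lower().split()
    # Punctuation removal at token boundaries
    tokens = [t.strip(string.punctuation) for t in tokens]
    tokens = [t for t in tokens if t]
    # Stop-word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming by dictionary lookup (toy stand-in for a real stemmer)
    return [TOY_STEMS.get(t, t) for t in tokens]


keywords = preprocess("Siapakah yang membangun masjid itu?")
print(keywords)  # ['siapakah', 'bangun', 'masjid']
```

The question word is deliberately kept in the output here, since a later classification step may depend on it; whether [40] does the same is not stated in the reviewed description.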
In the third step, the Radial Basis Function Network (RBFN) algorithm is used to extract and determine the answer type. [40] uses its own training data set, created with the TF-IDF technique. Query expansion finds synonyms and extends the keyword using the available Indonesian WordNet. The answer type and the keyword are used as the inputs in the fifth step, which generates the SPARQL query with the answer type (CLASS) and keyword (Instance) as the parameters to be executed. Lastly, the execution result from the fifth step becomes the input (candidate answers) for answer extraction. A word-matching scoring technique is applied to create a list of answers by counting the number of similar words between synonyms and candidate answers. The best answer is the one with the highest score.

The authors of reference [44] developed an Indonesian QA system that employs a Knowledge Graph as its data source. This system comprises four distinct modules: (1) question classification, (2) information extraction, (3) token mapping, and (4) query construction. The first module classifies the question to determine the appropriate class for the 'SELECT' statement. The second module identifies a set of extracted tokens and assigns them token-type labels, which can be correspondingly mapped to the 'WHERE' statement. The third module uses the set of extracted tokens, the token-type labels, and a lexicalization dictionary sourced from the Knowledge Graph resources. This dictionary is established using translations and synonyms from the training data. The lexical similarity of each extracted token to resources of the same type is computed. The token is then paired with the resource exhibiting the highest similarity value, which becomes the input for the final module.
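The token mapping step just described can be sketched as follows. This is a minimal sketch under stated assumptions: the resource names (prefixed `ex:`), the token types, and the dictionary entries are hypothetical, and the character-based ratio from Python's `difflib.SequenceMatcher` stands in for whatever lexical similarity measure [44] actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical lexicalization dictionary: token type -> resource -> labels
# (translations and synonyms). None of these entries come from [44].
LEXICON = {
    "category": {
        "ex:Moisturizer": ["pelembap", "pelembab", "moisturizer"],
        "ex:Sunscreen": ["tabir surya", "sunscreen", "sunblock"],
    },
    "skin_type": {
        "ex:OilySkin": ["kulit berminyak", "oily skin"],
        "ex:DrySkin": ["kulit kering", "dry skin"],
    },
}


def similarity(a, b):
    """Character-level similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_token(token, token_type):
    """Pair a token with the same-type resource of highest similarity."""
    best_resource, best_score = None, -1.0
    for resource, labels in LEXICON[token_type].items():
        score = max(similarity(token, label) for label in labels)
        if score > best_score:
            best_resource, best_score = resource, score
    return best_resource, best_score


resource, score = map_token("sunscren", "category")  # misspelled on purpose
print(resource, round(score, 2))
```

Restricting candidates to resources of the same token type keeps the comparison small, but the choice still rests on surface similarity alone, which is the source of the mapping errors discussed later in this review.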
In the last module, the results of token mapping and the answer type class are combined with basic query templates to formulate SPARQL queries. In the first and second modules, the authors of [44] compare three models (SVM, LSTM, and fine-tuned IndoBERT) and three text representations (TF-IDF, FastText, and IndoBERT). The labeled data for training and testing comprised 503 questions. An illustration of system architecture with a statistical approach is presented in Figure 4. A summary of the evaluation of ontology-based Indonesian QA systems can be seen in Table 4.

Fig. 4. Illustration of sentence processing with the statistical approach

Table 4. Summary of the evaluation of ontology-based Indonesian QA system

Ref. | Ontology Domain | QA Analysis | Evaluation
--- | --- | --- | ---
[36] | Economic | Semantic Analyzer [50]; SPARQL generator | -
[37] | Indonesian Translated Quran | Stemming, NER; TF-IDF, Cosine similarity; NER, Feature Extraction | -
[38] | Indonesian Manuscript | Case Folding, Tokenizing, Filtering; SPARQL query template; Indexing | Precision: 100%, Recall: 93.3%
[39] | - | Question classification (template pattern); Graph transformation (Neo4j), Query generation; Query processing with graph-pattern association rules | Accuracy: 90%
[40] | Quran | Tokenizing, punctuation and Stop word remover, Stemming; TF-IDF + Classification (RBFN), Query expansion; SPARQL query template, word scoring, answer ranking | -
[41] | University Admission | Stemming, Stop word Remover, Tokenizing, POS Tagging; Keyword Identification; SPARQL query template | -
[42] | Drug and Disease | Case folding, tokenizing, phrase detection, Stemming, Filtering; Keyword Identification; SPARQL query template | Accuracy: 90%
[43] | Computer Server | Tokenizing, Stop word remover, classification; Rule-based syntactic parser; data extraction, answer generating | Accuracy: 95%
[44] | Face Beauty Product | Question classification and Information Extraction (SVM + IndoBERT, LSTM + IndoBERT); Token Mapping: Lexical similarity; SPARQL query template | Precision: 0.8823529, Recall: 0.8418301, F-Measure: 0.8499703
[45] | National History | Case Folding, Tokenizing, POS tagging; N-Gram + Classification (SVM); SPARQL query template | Accuracy: base question 93%, non-base question 80%
[46] | National History | Case Folding, Tokenizing, POS tagging; N-Gram + Classification (Multi-Layer Perceptron); SPARQL query template | Accuracy: 57.37%
[47] | National History | Case Folding, Stemming (Sastrawi), stop word removal, TF-IDF; Cosine Similarity (sklearn); SPARQL query template | Precision: 0.70, Recall: 0.94, F-Measure: 0.80
[48] | National History | Case Folding, Stemming (Sastrawi), stop word removal, TF-IDF; Question Analysis: Classification (Naive Bayes); SPARQL query template | Classification accuracy: 67% (8:2 and 7:3 ratio)
[49] | National History | Case Folding, Stop word removal (Sastrawi), Tokenizing, POS tagging; Question Analysis: Word Identification; SPARQL query template | Accuracy: 87%

V. Challenges in ontology-based Indonesian QA system

Exploring the challenges faced by ontology-based Indonesian QA systems is a crucial effort in advancing the field of natural language processing and knowledge retrieval for the Indonesian language. These systems, designed to leverage structured knowledge representations to answer user questions, face complex, diverse, and often interconnected challenges. These challenges encompass a spectrum of linguistic, semantic, and technical intricacies. Uncovering and addressing this complexity effectively is not only crucial for developing QA systems that are more accurate, context-aware, and user-friendly but also holds substantial implications for the advancement of natural language understanding and information retrieval adapted to Indonesia's linguistic and semantic landscape.
This section discusses the gaps and challenges that ontology-based Indonesian QA systems face, based on an investigation of the reviewed literature. In addition, since most of the challenges mentioned in this section are open, and some still need to be explored, it also highlights potential solutions and their implications for future progress.

A. Sentence meaning, sentence variation, and complex sentences

Semantic understanding of sentences remains a challenge in developing an ontology-based Indonesian QA system. Almost all existing ontology-based Indonesian QA systems rely on syntactic analysis to understand questions. The parsing process has two main tasks: checking whether the word sequence fits predefined patterns (domain-specific grammar), and identifying keywords based on instances (subjects or objects), properties, classes, or relationships between words from the ontology. To address these weaknesses, [41][49] seek to improve performance and accuracy by adding semantic features such as synonyms, hyponyms, antonyms, and co-occurring words (neighbors on the left or right). The aim is to expand the meaning of keywords based on the similarity of context or meaning to specific labels in the ontology or a predefined dictionary.

Understanding semantics at the word level is not enough to understand the meaning of a sentence. For example, to understand the relationship between "aku memiliki kaka tua" and "aku punya peliharaan", or to distinguish between "apa yang Joko makan" and "apa yang memakan Joko", the QA system must also understand how words relate to each other in sentences. As reported in studies [44][45][46][47], limitations in understanding questions semantically also occur in statistical-based approaches.
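The point about word order can be made concrete with a small sketch. It assumes a bag-of-words representation after stemming, which is a simplification and not the representation of any specific reviewed system; the one-entry stem dictionary is a toy stand-in for a real Indonesian stemmer.

```python
from collections import Counter

# Toy stem dictionary (an assumption for illustration only).
TOY_STEMS = {"memakan": "makan"}


def bag_of_words(sentence):
    """Return stemmed, lowercased word counts, ignoring word order."""
    return Counter(TOY_STEMS.get(w, w) for w in sentence.lower().split())


q1 = bag_of_words("apa yang Joko makan")    # "what does Joko eat"
q2 = bag_of_words("apa yang memakan Joko")  # "what eats Joko"

# Both questions collapse to the same multiset of words, so any model that
# sees only this representation cannot tell them apart.
same_bag = (q1 == q2)
print(same_bag)  # True
```

Distinguishing the two questions requires information that this representation discards, namely which word fills the subject role and which fills the object role.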
Classification techniques combined with Indonesian text representations perform well in classifying questions. However, the problem lies in the information extraction process, which relies on lexical similarity calculations to determine which word or phrase corresponds to an instance (subject or object), property, or class in the ontology. This can cause prediction errors, so that the resulting SPARQL query does not match the meaning of the sentence or the context of the question, and the system gives the wrong answer.

The only work that discusses deep syntactic and semantic processing for Indonesian is OWLizr [36]. OWLizr can perform deep parsing by combining syntax rules with semantic composition to produce semantic interpretations, as shown in Figure 5.

Fig. 5. Parsing illustration with semantic attachment

In this way, the sentence can be fully understood from a semantic point of view (deep structure), so that the system can understand that "nasi memakan joko" is semantically incorrect even though it is syntactically correct.

Although the deep syntactic and semantic processing models used by [36] are promising, the complexity inherent in using lambda calculus notation and Prolog can be an obstacle when dealing with complex linguistic phenomena, such as long, convoluted sentences or sentences with many clauses. The parsing process is driven by rigid rules and formalisms that are not easily adapted to accommodate the structural complexity often encountered in real-world questions. The model's performance will also suffer when faced with sentences involving nested or embedded structures, where the interaction of various grammatical rules and lexical semantics can become very complicated. The model's ability to handle sentence variations, including differences in word order, sentence length, and syntactic construction, is also a limitation.
Although this model's grammatical rules and lexical semantics are precise, they are difficult to adapt to the diversity of linguistic expressions found in Indonesian, where word order and phrase structure can vary significantly with context and language style. These limitations may cause the interpretation of questions to be inaccurate or incomplete. Moreover, the effectiveness of the model in [36] relies on the completeness and accuracy of its linguistic resources. Incomplete or outdated resources can limit performance and hamper the model's ability to provide accurate answers as the language evolves.

Complex sentences are still recognized as one of the toughest challenges faced by QA systems and remain a focus of recent research, even for the most widely studied language, English [27][51][52]. The ability of an ontology-based QA system to understand sentences semantically is related to its ability to handle complex sentence structures. The reviewed literature shows that the development of ontology-based Indonesian QA systems often focuses on handling simpler questions based on facts and structured knowledge, so handling complex sentences, including sentence variations, is less explored and discussed. According to [53], Indonesian complex sentences, commonly called multilevel compound sentences, consist of two clauses, one of which becomes part of the other. Complex questions usually contain many subjects, express many relationships, and include numerical operations [54][55]. An example of a complex question in Indonesian is "bisakah anda menyajikan peta lokasi gunung yang ada di pulau jawa yang ketinggiannya lebih dari 3000 mdpl yang memiliki lebih dari 1 jalur pendakian?"
In English: "Can you present a map of the locations of mountains on the island of Java that are more than 3000 meters above sea level and have more than one climbing route?" Such sentences are rich in meaning but often cause difficulties for automatic parsing, semantic interpretation, and knowledge extraction. Moreover, the example involves geographic entities and concepts that require spatial operations, which remain an open problem for QA systems [1]. Ontology-based Indonesian QA systems typically depend on well-defined entities and relationships, which makes it challenging to extract precise information from sentences with multiple levels of embedding or intricate phrasing. Complex sentences also often involve various linguistic phenomena, including idiomatic expressions, subordination, and coordination, which further complicate knowledge extraction.

The problem of handling complex sentences is inherently related to the issue of sentence variation, because complex sentences are a subset of sentence variations. Both challenges demand robust natural language understanding and sophisticated linguistic analysis. Complex sentences, with their intricate syntactic and semantic structures, exemplify the nuances of sentence variation. Addressing complex sentences effectively is therefore a foundational step toward tackling the broader problem of sentence variation. In Indonesian, sentence variation includes various grammatical forms, word orders, and contextual dependencies. For example, the sentences "Siapakah yang menulis buku The Study in Scarlet?", "Buku The Study in Scarlet ditulis oleh siapa?", and "Ditulis oleh siapa buku The Study in Scarlet?" differ syntactically but have the same meaning: each asks who wrote The Study in Scarlet.
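To make the cost of sentence variation concrete for pattern-based parsing, the sketch below maps the three example questions to one canonical frame. It is only an illustration under stated assumptions: the regular expressions, the `author_of` relation name, and the `to_frame` function are hypothetical and are not taken from any of the reviewed systems.

```python
import re

# One hand-written pattern per surface variant (illustrative assumption).
PATTERNS = [
    re.compile(r"^siapakah yang menulis buku (?P<title>.+)$"),
    re.compile(r"^buku (?P<title>.+) ditulis oleh siapa$"),
    re.compile(r"^ditulis oleh siapa buku (?P<title>.+)$"),
]


def to_frame(question):
    """Map a question to a canonical (relation, entity) frame, or None
    when no hand-written pattern covers its surface form."""
    text = question.strip().rstrip("?").strip().lower()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return ("author_of", match.group("title"))
    return None


variants = [
    "Siapakah yang menulis buku The Study in Scarlet?",
    "Buku The Study in Scarlet ditulis oleh siapa?",
    "Ditulis oleh siapa buku The Study in Scarlet?",
]
for question in variants:
    print(to_frame(question))  # ('author_of', 'the study in scarlet')

# A fourth phrasing with the same meaning is not covered by any pattern.
print(to_frame("Siapa penulis buku The Study in Scarlet?"))  # None
```

All three listed variants reach the same frame, but only because each has its own pattern. Every additional phrasing requires another rule, which is the maintenance burden that makes sentence variation hard for purely syntactic approaches.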
Most QA research in Indonesia has traditionally focused on more fundamental challenges, such as ontology design, knowledge representation, information retrieval techniques, or improving entity recognition. Although sentence variation remains relatively under-discussed in ontology-based Indonesian QA research, it is a crucial challenge that requires dedicated attention. Addressing it can lead to more accurate and adaptable QA systems that navigate the linguistic intricacies of Indonesian and provide contextually relevant answers to a broader range of user queries.

Understanding sentence semantics, handling complex sentence structures, and dealing with sentence variations are interconnected challenges at the core of developing a robust ontology-based Indonesian QA system. Each aspect significantly influences the others, which underscores the need for a holistic approach. Addressing these interconnected challenges can substantially improve the accuracy and relevance of the answers provided, making them more valuable for information retrieval and increasing user satisfaction. Further research therefore needs to focus on these challenges in the context of ontology-based Indonesian QA systems.

Several strategies can be studied further. First, advanced NLP techniques that combine syntactic analysis with semantic parsing can enable the system to deconstruct complex sentences and extract meaningful information accurately. Second, an Indonesian linguistic ontology can be built as a fundamental source for semantic parsing.
Developing a semantic parser requires a foundation that provides structured knowledge about language-specific concepts, word meanings, and relationships between words, so that the parser can perform accurate and contextually relevant semantic analysis. Building a linguistic ontology can help semantic parsers interpret language effectively and overcome weaknesses such as those of [36]. Linguistic ontologies differ from general domain ontologies in scope and purpose. A linguistic ontology captures semantic nuances, contextual variations, synonyms, antonyms, and other language-specific attributes essential for accurate language understanding. In contrast, general domain ontologies represent knowledge and concepts within a specific subject area, such as medicine, finance, or geography. The development of such domain ontologies dominates ontology-based Indonesian QA research. Building a linguistic ontology is critical to improving the precision and reliability of semantic parsers, and ultimately the overall performance of a natural language understanding system and its ability to provide meaningful responses to user queries.

Third, for the statistical approaches employed in ontology-based Indonesian QA systems, it is worth considering more sophisticated, context-aware language models, such as the transformer-based GPT-4 and RoBERTa, or more recent models as they emerge. These models capture nuanced semantic information, understand context, and produce coherent text. Upgrading the language model can significantly improve the accuracy of question classification and information extraction. Using pre-trained embeddings such as FastText and IndoBERT, as in [44], is also a sound approach.
However, there may be benefits in exploring alternative embeddings from the natural language processing field, such as Word2Vec or ELMo, and in comparing the semantic representations they provide for tokens. Moreover, the ontology-based Indonesian QA systems covered in this review rely on small datasets. Expanding the labeled datasets could be beneficial, as it would allow the models to generalize better and handle the wide variety of semantic nuances in real-world questions. It is also essential to keep the datasets up to date to accommodate changes and developments in the domain ontology, and to continuously update and maintain the lexicalization dictionary so that the system remains aligned with the evolving semantics of, and additions to, the knowledge resources.

B. Template-based SPARQL query construction

Most existing ontology-based Indonesian QA systems use simple template-based SPARQL query construction, because simple questions can generally be answered with a set of triple patterns. A query template is a predefined query consisting of SELECT and WHERE clauses with one or two triple-pattern slots to fill, optionally supplemented with FILTER expression slots. Templates are undoubtedly the fastest and easiest way to build SPARQL queries in a new QA system, especially for simple questions in a single-domain QA system. However, based on evidence from [44][45][46][47][49], template-based queries are prone to semantic gaps caused by exact string matching: the resulting query does not correspond to the interpretation of the question, which ultimately produces the wrong answer.
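The template mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any reviewed system: the `ex:` namespace, the resource and property names, the lexicon entries, and the `build_query` function are all hypothetical.

```python
from string import Template

# Hypothetical template: one triple-pattern slot plus an optional FILTER slot.
QUERY_TEMPLATE = Template(
    "PREFIX ex: <http://example.org/> "
    "SELECT ?answer WHERE { $subject $predicate ?answer . $filter}"
)

# Hypothetical lexicon mapping question keywords to ontology terms.
# Lookup is by exact string match, as in the systems discussed above.
LEXICON = {
    "gunung semeru": "ex:Gunung_Semeru",
    "ketinggian": "ex:hasElevation",
}


def build_query(entity_keyword, property_keyword, filter_expr=""):
    """Fill the template slots; return None if a keyword has no exact match."""
    subject = LEXICON.get(entity_keyword.lower())
    predicate = LEXICON.get(property_keyword.lower())
    if subject is None or predicate is None:
        return None
    filter_clause = f"FILTER({filter_expr}) " if filter_expr else ""
    return QUERY_TEMPLATE.substitute(
        subject=subject, predicate=predicate, filter=filter_clause
    )


# Keyword present in the lexicon: the template is filled.
print(build_query("Gunung Semeru", "ketinggian"))
# Same query shape with the FILTER slot used.
print(build_query("Gunung Semeru", "ketinggian", "?answer > 3000"))
# "tinggi" (a near-synonym of "ketinggian") has no exact match, so no query.
print(build_query("Gunung Semeru", "tinggi"))
```

The last call shows the gap in miniature: a user who writes "tinggi" instead of "ketinggian" gets no query at all, even though the intent is the same, because slot filling depends on an exact lexicon hit.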
The limited number of existing query templates is also a weakness: not all questions can be handled using existing templates, and it is almost impossible to provide every query form manually, especially for complex questions in a multi-domain QA system. In addition, template-based queries are not flexible with respect to changes in the ontology; problems arise when new relations are added.

Future ontology-based Indonesian QA systems therefore need more sophisticated query generation techniques. These may include semantic matching algorithms that go beyond exact string matching by using semantic similarity measures, or natural language understanding techniques that align queries more precisely with the user's intended meaning. Such algorithms are important because they bridge the semantic gap in template-based queries. It is also recommended to develop systems that generate query templates dynamically from user questions. This approach accommodates a broader spectrum of query forms and reduces reliance on manually created templates. Ontology evolution techniques can also be applied so that systems adapt to changes in ontology structure and integrate new relationships or concepts smoothly.

C. Lexical gap and ambiguity

A question can only be answered if every term that refers to an entity can be identified in the ontology. Terms (vocabulary) for an RDF resource can be described through the value of the rdfs:label property. Multiple labels are commonly used to model synonyms that refer to the same RDF resource, as shown in [41][49]. However, knowledge bases usually contain only some of the different terms that can refer to a particular entity.
When the vocabulary used in a question differs from that used in the ontology labels, a lexical gap occurs, significantly reducing the percentage of questions that a system can answer. On the other hand, strategies that enrich vocabulary to overcome lexical gaps can introduce ambiguity, because the same word can have different meanings [56]. Ambiguity hinders correct interpretation of user questions and retrieval of contextually relevant information. Research [46] shows that ambiguity strongly affects system accuracy. Another cause of the lexical gap is the vocabulary difference that arises when resources are labeled in another language, as reported in [44].

Since every QA system depends heavily on language resources, it should be noted that Indonesian is still classified as a low-resource language [25], and the limited availability of NLP resources and tools is also a challenge in developing a QA system. Using resources and tools built for other languages can be a temporary solution, but it introduces language barriers. As reported in [37], using an English stemmer (Lucene) causes over-stemming, which strongly affects accuracy. Several studies also report that the Indonesian resources already available, such as POS tagging, still need standardization. Although research on Indonesian language resources has increased in the last decade, the availability of these resources still needs to improve, and further contributions are needed to increase their use.

Several strategies can be applied to overcome lexical gaps, such as synonym expansion, contextual understanding, word embedding, and ontology enrichment. However, addressing lexical gaps in ontology-based QA systems involves a trade-off between enhancing vocabulary coverage and managing semantic ambiguity.
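The trade-off between coverage and ambiguity can be seen in a small Python sketch of label lookup with optional synonym expansion. Everything here is assumed for illustration: the label table stands in for rdfs:label values, and the `ex:` resources, the synonym dictionary, and the `resolve` function are hypothetical.

```python
# Stand-in for rdfs:label values in an ontology (hypothetical resources).
# "bunga" is ambiguous in Indonesian: it can mean "flower" or "interest".
LABELS = {
    "penulis": {"ex:author"},
    "bunga": {"ex:Flower", "ex:InterestRate"},
}

# Hypothetical synonym dictionary used for expansion.
SYNONYMS = {
    "pengarang": ["penulis"],  # both mean "author/writer"
    "kembang": ["bunga"],      # "kembang" means "flower"
}


def resolve(term, expand=False):
    """Return the sorted ontology resources whose label matches the term,
    optionally also trying the term's dictionary synonyms."""
    term = term.lower()
    candidates = set(LABELS.get(term, ()))
    if expand:
        for synonym in SYNONYMS.get(term, ()):
            candidates |= LABELS.get(synonym, set())
    return sorted(candidates)


print(resolve("pengarang"))               # []  (lexical gap)
print(resolve("pengarang", expand=True))  # ['ex:author']
print(resolve("kembang", expand=True))    # ['ex:Flower', 'ex:InterestRate']
```

Expansion closes the gap for "pengarang", but for "kembang" it pulls in the ambiguous label "bunga" and returns a finance concept that the user never meant. Choosing between the candidates would need context, which is why expansion alone is unlikely to be sufficient.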
Therefore, further investigation is needed into combining strategies such as synonym expansion, contextual understanding, and ontology enrichment into a hybrid approach that effectively overcomes both lexical gaps and ambiguity. Although these solutions offer a way to bridge the gap between user queries and ontology labels, they also depend on the availability of resources (POS tagging, named entity recognition, lexical databases, and others). This paper therefore encourages collaboration between the NLP community and related organizations to standardize linguistic resources for low-resource languages such as Indonesian, and encourages the exchange of linguistic resources and data between researchers and organizations working on Indonesian language processing. Open data initiatives can help overcome limitations in resources, system complexity, and evaluation.

D. Multiple-domain ontology

The ontology-based Indonesian QA systems reviewed in this paper, except for [39], which uses YAGO [57] as its KB, operate in limited and closed domains. These systems are designed to answer questions on one particular topic and usually use one ontology to cover one knowledge domain, such as economics, history, or tourism. The grammar and lexicon they build are also limited to the ontology specification of that domain. Designing an Indonesian QA system based on a multi-domain ontology therefore poses challenges arising from varied linguistic features, domain-specific nuances, and the complexity of managing multiple ontological frameworks. Adapting a system to a different domain amplifies its complexity: each domain requires its own ontology, terminology, and contextual interpretation. Creating and maintaining these domain-specific resources requires significant time and effort, especially in a language with relatively few existing ontological resources.
Despite the complexity, developing a multi-domain ontology is very important for advancing ontology-based Indonesian QA systems because it can significantly expand the scope of knowledge. Closed-domain QA systems are restricted to a particular subject, which limits their usefulness in answering user questions. With a multi-domain ontology, a QA system can respond to topics from various domains, such as economics, history, science, and culture. Comprehensive knowledge coverage provides a more flexible and user-friendly experience, in which users can ask questions across multiple domains without needing a separate dedicated QA system for each, which increases user satisfaction and convenience.

In real-world scenarios, questions often require a multidisciplinary approach or cross-domain knowledge. For example, understanding complex issues such as climate change requires knowledge of meteorology, environmental science, and economics. Multi-domain ontologies equip QA systems to provide comprehensive and contextually relevant responses that reflect the complex knowledge interactions of real-life scenarios. Moreover, a QA system that deals with multiple domains must understand a broader spectrum of linguistic expressions, including domain-specific terminology and nuances. This linguistic challenge drives advances in NLP, benefiting QA systems and other NLP applications.

Developing multiple-domain ontologies also creates research and development opportunities. It encourages collaboration among domain experts, linguists, and NLP researchers, driving innovation in ontology construction, semantic interoperability, and knowledge representation. These advances have implications beyond QA systems, benefiting the broader fields of artificial intelligence and human-computer interaction.
Addressing the challenge of developing multiple-domain ontologies for ontology-based Indonesian QA systems requires a multifaceted approach, especially given how little it has been discussed previously. First, future research should emphasize knowledge sharing and collaboration among researchers, domain experts, and linguists to create a collective pool of domain-specific ontologies. Establishing a centralized repository or platform for sharing ontological resources and experiences can accelerate progress in this area. Second, research should emphasize the development of domain-agnostic ontology frameworks that can be adapted efficiently to various domains. This involves designing ontological structures that are inherently flexible and can accommodate new domains without extensive manual adjustment. Lastly, research should explore semi-automatic or machine-assisted approaches to ontology adaptation. Natural language processing and machine learning techniques can facilitate the automatic alignment of ontologies with new domains, making the process more efficient and less resource-intensive.

VI. Conclusion

The field of ontology-based Indonesian QA systems faces a series of complex challenges that must be addressed to advance natural language processing and knowledge retrieval for the Indonesian language. These challenges span linguistic, semantic, and technical aspects and are closely interconnected. Among the primary challenges are understanding sentence semantics, handling complex sentence structures, and dealing with sentence variations. Addressing these issues is critical to developing a robust and sophisticated ontology-based Indonesian QA system, as they directly affect the system's ability to provide contextually relevant and accurate answers to user questions.
Addressing these challenges requires a holistic approach that combines advanced natural language processing techniques, linguistic ontologies, and sophisticated language models.

Although convenient, template-based SPARQL query construction has limitations, such as semantic gaps and inflexibility in adapting to ontology changes. Future work should explore semantic matching algorithms, dynamic query templating, and ontology evolution techniques to improve the precision and adaptability of query construction.

Lexical gaps and ambiguities in Indonesian QA systems are challenges of vocabulary coverage and semantic understanding. Strategies such as synonym expansion, contextual analysis, word embedding, and ontology enrichment require further investigation to bridge these gaps effectively.

Developing multiple-domain ontologies is crucial for expanding the knowledge coverage of Indonesian QA systems. Although it is a complex effort, it offers significant benefits in user convenience and the ability to address cross-domain questions. Collaboration among experts, flexible ontology frameworks, and machine-assisted ontology adaptation are vital strategies for tackling this challenge.

Addressing these challenges requires a collaborative effort from researchers, domain experts, and linguists. Overcoming these obstacles will improve the performance of ontology-based Indonesian QA systems and advance the broader field of natural language processing and knowledge retrieval in low-resource languages like Indonesian.

Declarations

Author contribution
All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Conflict of interest The authors declare no known conflict of financial interest or personal relationships that could have appeared to influence the work reported in this paper. Additional information Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds. Publisher’s Note: Department of Electrical Engineering and Informatics - Universitas Negeri Malang remains neutral with regard to jurisdictional claims and institutional affiliations. References [1] G. Mai, K. Janowicz, R. Zhu, L. Cai, and N. Lao, “Geographic Question Answering: Challenges, Uniqueness, Classification, and Future Directions,” AGILE: GIScience Series, vol. 2, no. 8, 2021. [2] E. M. Nabil Alkholy, M. Hassan Haggag, and A. Aboutabl, “Question Answering Systems: Analysis and Survey,” International Journal of Computer Science & Engineering Survey, vol. 09, no. 06, 2018. [3] W. Franco et al., “Ontology-based Question Answering Systems over Knowledge Bases: A Survey,” in Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS, 2020, pp. 532–539. [4] I. Mahmoud Ibrahim Alturani and M. Pouzi Bin Hamzah, “An Efficient Semantic Analysis Technique for the Question Answering Systems,” Journal of Engineering and Applied Sciences, vol. 14, no. 22, 2019. [5] A. Abdi, N. Idris, and Z. Ahmad, “QAPD: An ontology-based question answering system in the physics domain,” Soft comput, vol. 22, no. 1, pp. 213–230, 2018. [6] C. Trojahn, R. Vieira, D. Schmidt, A. Pease, and G. Guizzardi, “Foundational ontologies meet ontology matching: A survey,” Semant Web, vol. 13, no. 4, pp. 685–704, 2022. [7] M. B. Canciglieri, A. L. Szejka, O. Canciglieri Junior, and L. Yoshida, “Current issues in multiple domain semantic reconciliation for ontology-driven interoperability in product design and manufacture,” in IFIP Advances in Information and Communication Technology, 2018. [8] G. R. Roldán-Molina, D. Ruano-Ordás, V. Basto-Fernandes, and J. R. 
Méndez, “An ontology knowledge inspection methodology for quality assessment and continuous improvement,” Data Knowl Eng, vol. 133, 2021. [9] A. F. Khan et al., “When linguistics meets web technologies. Recent advances in modelling linguistic linked data,” Semant Web, vol. 13, no. 6, pp. 987–1050, 2022. [10] D. Diefenbach, A. Both, K. Singh, and P. Maret, “Towards a question answering system over the Semantic Web,” Semant Web, vol. 11, no. 3, pp. 421–439, 2020. [11] T. H. Alwaneen, A. M. Azmi, H. A. Aboalsamh, E. Cambria, and A. Hussain, “Arabic question answering system: a survey,” Artif Intell Rev, vol. 55, no. 1, pp. 207–253, Jan. 2022. [12] A. Arbaaeen and A. Shah, “Ontology-Based Approach to Semantically Enhanced Question Answering for Closed Domain: A Review,” Information (Switzerland), vol. 12, no. 5, 2021. http://journal2.um.ac.id/index.php/keds https://doi.org/10.5194/agile-giss-2-8-2021 https://doi.org/10.5194/agile-giss-2-8-2021 https://doi.org/10.5121/ijcses.2018.9601 https://doi.org/10.5121/ijcses.2018.9601 https://doi.org/10.5220/0009392205320539 https://doi.org/10.5220/0009392205320539 https://doi.org/10.36478/jeasci.2019.8289.8292 https://doi.org/10.36478/jeasci.2019.8289.8292 https://doi.org/10.1007/s00500-016-2328-2 https://doi.org/10.1007/s00500-016-2328-2 https://doi.org/10.3233/SW-210447 https://doi.org/10.3233/SW-210447 https://doi.org/10.1007/978-3-030-01614-2_12 https://doi.org/10.1007/978-3-030-01614-2_12 https://doi.org/10.1007/978-3-030-01614-2_12 https://doi.org/10.1016/j.datak.2021.101889 https://doi.org/10.1016/j.datak.2021.101889 https://doi.org/10.3233/SW-222859 https://doi.org/10.3233/SW-222859 https://doi.org/10.3233/SW-190343 https://doi.org/10.3233/SW-190343 https://doi.org/10.1007/s10462-021-10031-1 https://doi.org/10.1007/s10462-021-10031-1 https://doi.org/10.3390/info12050200 https://doi.org/10.3390/info12050200 143 F. T. Admojo et al. / Knowledge Engineering and Data Science 2023, 6 (2): 129–144 [13] A. Abdi, S. Hasan, M. 
Arshi, S. M. Shamsuddin, and N. Idris, “A question answering system in hadith using linguistic knowledge,” Comput Speech Lang, vol. 60, 2020. [14] M. Jarrar, “The Arabic ontology – an Arabic wordnet with ontologically clean content,” Appl Ontol, vol. 16, no. 1, pp. 1–26, 2021. [15] G. M. R. I. Rasiq, A. Al Sefat, T. Hossain, Md. I.-E.-H. Munna, J. J. Jisha, and M. M. Hoque, “Question Answering System over Linked Data: A Detailed Survey,” ABC Research Alert, vol. 8, no. 1, 2020. [16] M. A. Calijorne Soares and F. S. Parreiras, “A literature review on question answering techniques, paradigms and systems,” Journal of King Saud University - Computer and Information Sciences, vol. 32, no. 6. King Saud bin Abdulaziz University, pp. 635–646, Jul. 01, 2020. [17] C. Antoniou and N. Bassiliades, “A survey on semantic question answering systems,” The Knowledge Engineering Review, vol. 37, no. 3. 2022. [18] A. Pereira, A. Trifan, R. P. Lopes, and J. L. Oliveira, “Systematic review of question answering over knowledge bases,” IET Software, vol. 16, no. 1, pp. 1–13, Feb. 2022. [19] A. Albarghothi, F. Khater, and K. Shaalan, “Arabic Question Answering Using Ontology,” Procedia Comput Sci, vol. 117, pp. 183–191, 2017. [20] M. Breja and S. K. Jain, “A survey on non-factoid question answering systems,” International Journal of Computers and Applications, vol. 44, no. 9, pp. 830–837, 2022. [21] M. Mattila and A. Dahanayke, “Systematic Literature Review of Question Answering Systems,” in Lecture Notes in Networks and Systems, 2021. [22] D. Eberhard, G. Simons, and C. Fennig, “Languages of the World,” Ethnologue. 25rd ed. Dallas, Texas: SIL International, 2022. Accessed: Oct. 11, 2022. [23] I. Ghosh, “Ranked: The 100 Most Spoken Languages Worldwide,” 2020. Accessed: Oct. 11, 2022. [24] F. Koto, A. Rahimi, J. H. Lau, and T. 
Baldwin, “IndoLEM and IndoBERT: a benchmark dataset and pre-trained language model for Indonesian NLP,” in Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 757–770. [25] S. Li, N. Lin, L. Xiao, and S. Jiang, “IndoAbbr: A New Benchmark Dataset for Indonesian Abbreviation Identification,” in 2020 International Conference on Asian Language Processing, IALP 2020, 2020. [26] S. S. Alanazi, N. Elfadil, M. Jarajreh, and S. Algarni, “Question Answering Systems: A Systematic Literature Review,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 3, 2021. [27] E. Dimitrakis, K. Sgontzos, and Y. Tzitzikas, “A survey on question answering systems over linked data and documents,” J Intell Inf Syst, vol. 55, no. 2, pp. 233–259, 2020. [28] F. T. Admojo and E. Winarko, “Sistem Pencarian Informasi Berbasis Ontologi untuk Jalur Pendakian Gunung Menggunakan Query Bahasa Alami dengan Penyajian Peta Interaktif,” IJCCS (Indonesian Journal of Computing and Cybernetics Systems), vol. 10, no. 1, pp. 23–34, Jan. 2017. [29] A. A. Shah, S. D. Ravana, S. Hamid, and M. A. Ismail, “Accuracy evaluation of methods and techniques in Web- based question answering systems: a survey,” Knowl Inf Syst, vol. 58, no. 3, pp. 611–650, 2019. [30] H. Sulistyanto and A. SN, “A Few Survey of Developments and Challenges Arising on General and Indonesian Question Answering System,” in International Conference on Information Systems for Business Competitiveness (ICISBC 2013), 2013, pp. 71–75. Accessed: Sep. 05, 2023 [31] R. Wongso, Meiliana, and D. Suhartono, “A Literature Review of Question Answering System using Named Entity Recognition,” in Proceedings - 2016 3rd International Conference on Information Technology, Computer, and Electrical Engineering, ICITACEE 2016, 2016, pp. 274–277. [32] S. Fandy, Utomo, N. Suryana, and M. S. 
Azmi, “Question Answering System : A Review On Question Analysis, Document Processing, And Answer Extraction Techniques,” Journal of Theoretical and Applied Information Technology, vol. 95, no. 14. pp. 3158–3174, 2017. [33] A. Abdiansah, A. Azhari, and A. K. Sari, “Survey on Answer Validation for Indonesian Question Answering System (IQAS),” International Journal of Intelligent Systems and Applications, vol. 10, no. 4, pp. 68–78, Apr. 2018. [34] Y. Puspitarani, “Indonesian Information Extraction : Challenges and Opportunities,” JATISI (Jurnal Teknik Informatika dan Sistem Informasi), vol. 8, no. 1, pp. 421–429, 2021. [35] A. F. Aji et al., “One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2022, pp. 7226–7249. [36] F. Darari, A. A. Krisnandhi, and R. Manurung, “OWLizr: Knowledge Representation System for Bahasa Indonesia Based on Web Ontology Language Description Logic (OWL DL),” in International Conference on Advanced Computer Science And Information Systems 2010, 2010, pp. 293–298. Accessed: Sep. 05, 2023. [37] S. J. Putra, R. H. Gusmita, K. Hulliyah, and H. T. Sukmana, “A semantic-based question answering system for indonesian translation of Quran,” in Proceedings of the 18th International Conference on Information Integration and Web-Based Applications and Services, in iiWAS ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 504–507. [38] V. Atina, E. Sediyono, and R. Rizal, “Information Retrieval System for Indonesian Manuscript using Semantic Web,” Int J Comput Appl, vol. 170, no. 8, 2017. [39] Wahyudi, M. L. Khodra, A. S. Prihatmanto, and C. Machbub, “A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) on YAGO Knowledge Base,” in 2018 International Conference on Information Technology Systems and Innovation, ICITSI 2018 - Proceedings, 2018. [40] F. S. Utomo, N. Suryana, and M. 
S. Azmi, “New instances classification framework on Quran ontology applied to question answering system,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 17, no. 1, pp. 139–146, Feb. 2019. [41] R. A. Yunmar and I. Wayan Wiprayoga Wisesa, “Design of Ontology-based Question Answering System for Incompleted Sentence Problem,” in IOP Conference Series: Earth and Environmental Science, 2019. https://doi.org/10.1016/j.csl.2019.101023 https://doi.org/10.1016/j.csl.2019.101023 https://doi.org/10.3233/ao-200241 https://doi.org/10.3233/ao-200241 https://doi.org/10.18034/abcra.v8i1.449 https://doi.org/10.18034/abcra.v8i1.449 https://doi.org/10.1016/j.jksuci.2018.08.005 https://doi.org/10.1016/j.jksuci.2018.08.005 https://doi.org/10.1016/j.jksuci.2018.08.005 https://doi.org/10.1017/S0269888921000138 https://doi.org/10.1017/S0269888921000138 https://doi.org/10.1049/sfw2.12028 https://doi.org/10.1049/sfw2.12028 https://doi.org/10.1016/j.procs.2017.10.108 https://doi.org/10.1016/j.procs.2017.10.108 :%20https:/doi.org/10.1080/1206212X.2021.1949117 :%20https:/doi.org/10.1080/1206212X.2021.1949117 https://doi.org/10.1007/978-3-030-68476-1_5 https://doi.org/10.1007/978-3-030-68476-1_5 :%20http:/www.ethnologue.com/ :%20http:/www.ethnologue.com/ https://www.visualcapitalist.com/100-most-spoken-languages/ https://aclanthology.org/2020.coling-main.66 https://aclanthology.org/2020.coling-main.66 https://aclanthology.org/2020.coling-main.66 https://doi.org/10.1109/IALP51396.2020.9310514 https://doi.org/10.1109/IALP51396.2020.9310514 https://doi.org/10.14569/IJACSA.2021.0120359 https://doi.org/10.14569/IJACSA.2021.0120359 https://doi.org/10.1007/s10844-019-00584-7 https://doi.org/10.1007/s10844-019-00584-7 https://doi.org/10.22146/ijccs.11186 https://doi.org/10.22146/ijccs.11186 https://doi.org/10.22146/ijccs.11186 https://doi.org/10.1007/s10115-018-1203-0 https://doi.org/10.1007/s10115-018-1203-0 http://eprints.undip.ac.id/41689/ http://eprints.undip.ac.id/41689/ 
[42] A. Amalia, P. Y. C. Sipahutar, E. Elviwani, and F. Purnamasari, “Chatbot Implementation with Semantic Technology for Drugs Information Searching System,” in Journal of Physics: Conference Series, 2020.
[43] F. Ishlakhuddin and A. SN, “Ontology-based Chatbot to Support Monitoring of Server Performance and Security By Rule-base,” IJCCS (Indonesian Journal of Computing and Cybernetics Systems), vol. 15, no. 2, p. 131, Apr. 2021.
[44] M. I. Rahajeng and A. Purwarianti, “Indonesian Question Answering System for Factoid Questions using Face Beauty Products Knowledge Graph,” Jurnal Linguistik Komputasional, vol. 4, no. 2, pp. 59–63, 2021.
[45] E. S. B. Perangin-Angin, Z. K. A. Baizal, and D. Richasdy, “Question Answering using Ontology for Sumedang Larang History with Support Vector Machine Based on Telegram Bot,” Jurnal Media Informatika Budidarma, vol. 6, no. 4, pp. 2438–2445, Oct. 2022.
[46] A. N. Hasanah, A. Baizal, and R. Dharayani, “Question Answering For Sumedang Larang Kingdom Using The Multilayer Perceptron Algorithm,” JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika), vol. 7, no. 4, 2022.
[47] R. Jasmi, Z. K. A. Baizal, and D. Richasdy, “Question Answering Chatbot using Ontology for History of the Sumedang Larang Kingdom using Cosine Similarity as Similarity Measure,” Jurnal Media Informatika Budidarma, vol. 6, no. 4, 2022.
[48] R. F. Saldhi, Z. K. A. Baizal, and R. Dharayani, “Question Answering System at the Kingdom of Sumedang Larang with Naïve Bayes Method,” Journal of Computer System and Informatics (JoSYC), vol. 3, no. 4, pp. 322–329, 2022.
[49] S. A. Anggrayni, Z. K. A. Baizal, and D. Richasdy, “Question Answering System Using Semantic Reasoning on Ontology for The History of The Sumedang Larang Kingdom,” Building of Informatics, Technology and Science (BITS), vol. 4, no. 2, pp. 545–553, 2022.
[50] R. Mahendra, S. D. Larasati, and R. Manurung, “Extending an Indonesian semantic analysis-based question answering system with linguistic and world knowledge axioms,” in Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, PACLIC 22, 2008.
[51] K. Höffner, S. Walter, E. Marx, R. Usbeck, J. Lehmann, and A. C. Ngonga Ngomo, “Survey on challenges of Question Answering in the Semantic Web,” Semant Web, vol. 8, no. 6, pp. 895–920, 2017.
[52] A. Farea, Z. Yang, K. Duong, N. Perera, and F. Emmert-Streib, “Evaluation of Question Answering Systems: Complexity of judging a natural language,” arXiv:2209.12617, 2022.
[53] A. M. Moeliono, H. Lapoliwa, H. Alwi, S. S. Tjatur, W. Sasangka, and S. Sugiyono, Tata Bahasa Baku Bahasa Indonesia, 4th ed. Jakarta: Kementerian Pendidikan dan Kebudayaan Republik Indonesia, 2017.
[54] Y. Lan, G. He, J. Jiang, J. Jiang, W. X. Zhao, and J. R. Wen, “A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions,” in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021, pp. 4483–4491.
[55] C. Zhang, Y. Lai, Y. Feng, and D. Zhao, “A review of deep learning in question answering over knowledge bases,” AI Open, vol. 2, pp. 205–215, 2021.
[56] A. Dhandapani and V. Vadivel, “Question Answering System over Semantic Web,” IEEE Access, vol. 9, pp. 46900–46910, 2021.
[57] T. Rebele, F. Suchanek, J. Hoffart, J. Biega, E. Kuzey, and G. Weikum, “YAGO: A multilingual knowledge base from Wikipedia, WordNet, and GeoNames,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016.