International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol 16 No 16 (2022)
Paper—A Systematic Literature Review of Keyphrases Extraction Approaches

A Systematic Literature Review of Keyphrases Extraction Approaches

https://doi.org/10.3991/ijim.v16i16.33081

Lahbib Ajallouda1, Fatima Zahra Fagroud2, Ahmed Zellou1, El habib Benlahmar2
1SPM-ENSIAS, Mohammed V University, Rabat, Morocco
2LTIM–FSBM, FSBM Hassan II University, Casablanca, Morocco
lahbib_ajallouda@um5.ac.ma

Abstract—The keyphrases of a document are the textual units that characterize its content, such as the topics it addresses, its ideas, and its field. Thousands of books, articles, and web pages are published every day, and extracting keyphrases manually is a tedious and time-consuming task. Automatic keyphrase extraction is an area of text mining that aims to identify the most useful and important phrases that give meaning to the content of a document. Keyphrases can be used in many Natural Language Processing (NLP) applications, such as text summarization, text clustering, and text classification. This article provides a Systematic Literature Review (SLR) to investigate, analyze, and discuss existing relevant contributions and efforts that use new concepts and tools to improve keyphrase extraction. We studied the supervised and unsupervised approaches to keyphrase extraction published in the period 2015–2022. We also identified the steps most commonly used by the different approaches. Additionally, we looked at the criteria that should be evaluated to improve the accuracy of keyphrase extraction. Each selected approach was evaluated for its ability to extract keyphrases. Our findings highlight the importance of keyphrase extraction and provide researchers and practitioners with information about proposed solutions and their limitations, which contributes to extracting keyphrases effectively and meaningfully.
Keywords—keyphrases extraction, systematic literature review, text mining, natural language processing

1 Introduction

The considerable volume of documents published each year makes it difficult to analyze or summarize them. For example, according to [1], nearly 17,000 articles concerning COVID-19 alone were published in the first quarter of 2020. To make better use of this textual data, keyphrases provide information that helps to understand the content of a text. Many methods have provided practical solutions to improve automatic keyphrase extraction. According to [2], these methods fall into two sets: unsupervised methods and supervised methods. They have been exploited in many NLP applications such as information retrieval, text summarization, text classification, and text clustering, but their performance has not been satisfactory. Some reviews [2], [3] shed light on the challenges faced by these methods and provide solutions to improve their performance. However, these reviews only include methods published before 2019, whereas many modern methods have appeared since and were not covered, especially methods that predict keyphrases not mentioned in the document.

Therefore, this paper aims to provide a comprehensive review of the techniques for extracting and predicting the keyphrases of a document. The review analyzes and discusses the literature on proposed solutions published in recent years. In addition to supervised and unsupervised methods for keyphrase extraction, our article reviews methods that predict keyphrases not mentioned in the document.
We also study the identification of candidate keyphrases and discuss the criteria that should be evaluated to improve the accuracy of keyphrase extraction and prediction. For each approach, we examine the evaluation metrics, the datasets used, and the extraction accuracy, and we discuss the evaluation findings. Finally, we provide solutions to improve the performance of keyphrase extraction and generation, and we suggest promising research directions.

The rest of this paper is structured as follows. Section 2 presents the background and preliminaries. Our research objectives are detailed in Section 3. Section 4 presents the methodology used to carry out this systematic review. Section 5 reports and analyzes the results, while Section 6 discusses and critiques the findings, describes the directions of the research, and states the limitations of this review. We conclude our article with Section 7, which also contains future directions for research.

2 Background and preliminaries

Automatic keyphrase extraction (AKE) is a domain of text mining that aims to identify the most useful and important terms that give meaning to document content [4]. This section introduces the steps in the AKE process and the domains that could benefit from keyphrase extraction techniques.

2.1 Applications

Automatic keyphrase extraction is used in many domains dealing with textual data, such as text classification [5], document clustering [6], document summarization [7], and search engines [8]. Some studies, such as [9], have restricted its use to five domains. However, given the importance of the information provided by keyphrases, AKE can also be exploited in many other domains, such as recommender systems [10], web mining [11], bibliometric analysis [12], and sentiment analysis [13].

2.2 Keyphrases extraction process

The keyphrase extraction process goes through a set of steps. Merrouni et al.
in [3] define it in five main steps, as shown in Figure 1. The text first goes through the preprocessing step, which removes unnecessary textual units in order to eliminate noise from the raw text. Many techniques are used for this purpose, such as tokenization, stop word removal, stemming, and normalization.

In the second step, candidate keyphrases are selected. According to [14] and [15], candidate keyphrases are terms that do not contain punctuation or stop words and that have the morphosyntactic structure "adjective* noun+", for example "Big data" or "Computer engineering". Many techniques are used to select candidate keyphrases, such as Part-Of-Speech tagging, N-grams [16], and Noun-Phrase-Chunks [17].

Fig. 1. Keyphrases extraction process

In the third step, each method selects the features of the candidate phrases on which it will rely to determine the keyphrases. According to [18], these features can be classified into statistical, positional, linguistic, and contextual features. The keyphrases are then extracted via either a supervised or an unsupervised approach. Supervised approaches mainly learn to classify candidate keyphrases as "keyphrases" or "non-keyphrases". Unsupervised approaches view the task as a ranking problem: the set of candidate keyphrases is ranked according to a weighting score, and the first n candidates are considered keyphrases.

Evaluation is the last step. It aims to measure the performance of the approach used to extract the keyphrases. This evaluation is based on datasets available in the literature (scientific articles and news) and can be carried out manually or automatically, via several metrics such as precision, recall, F-score, Mean Reciprocal Rank (MRR), and Mean Average Precision (MAP).

3 Research objectives

In recent years, a number of technologies have appeared to improve the automatic processing of text data.
This development has greatly improved the performance of the techniques used to extract keyphrases. It highlights the importance of a systematic literature review that provides a comprehensive overview of the recent techniques used to extract keyphrases. Through this SLR, we identify and evaluate the conclusions of the AKE approaches published between 2015 and 2022. The research objectives include studying candidate keyphrase selection techniques and defining the criteria and requirements that must be evaluated to improve the accuracy of keyphrase extraction. Additionally, for each selected article, we review and discuss the evaluation metrics and results as well as the features and datasets used. Our findings will therefore provide researchers and practitioners with information for future investigations into automatic keyphrase extraction.

3.1 Research methodology

An SLR follows a specific sequence of detailed steps in order to gather as much relevant research as possible. Several works on the methodology of conducting systematic literature reviews have been proposed, such as [19] and [20]. Our study follows the guidelines provided in [21]. We also read some published SLRs, such as [22] and [23], to get a general idea of how to build an SLR. Our SLR has three main phases. The planning phase includes the definition of the desired objectives and of the research strategy to be followed. The conduction phase includes the selection of primary studies, the assessment of their quality, and the extraction and synthesis of applicable information. The last phase, the results, includes an interpretation of the results obtained according to the objectives of the review.

Research Protocol. In this study, we applied a scientific research protocol comprising several steps.
Figure 2 presents the steps followed to perform this protocol.

Fig. 2. Phases of the research protocol

The research questions. We identified a set of research questions (see Table 1) to achieve the main objective of our study, which is to establish the state of the art of keyphrase extraction techniques by examining the articles published during the period 2015–2022.

Table 1. The research questions
Research question | Motivation and purpose
RQ1: What techniques can be exploited to identify candidate keyphrases? | Identify the techniques used by keyphrase extraction approaches to eliminate unnecessary phrases.
RQ2: What techniques can be used to extract keyphrases? | Highlight the algorithms most commonly used by keyphrase extraction systems.
RQ3: How to estimate the precision of the proposed approaches? | Identify the techniques and datasets used to validate the solutions.
RQ4: What are the most realistic and scalable AKE software? | Introduce researchers to this kind of software and motivate them to develop it further or to implement others.
RQ5: What obstacles must be overcome to improve the accuracy of keyphrases extraction? | Specify the requirements that markedly affect the efficiency and performance of keyphrase extraction systems.

Electronic Data Sources. In this study, we used a strategy based on multiple electronic data sources (EDS) to collect related work. We conducted an online search across five electronic data sources (see Table 2). These EDS cover the high-quality journals and conference proceedings relevant to automatic keyphrase extraction approaches. We also applied a snowball search strategy, analyzing the bibliographies of the selected articles to find more related articles.

Table 2.
Electronic data sources adopted in the study
Num | EDS name | Address
EDS1 | ACM Digital Library | https://dl.acm.org
EDS2 | DBLP | https://dblp.org
EDS3 | IEEE Xplore | https://ieeexplore.ieee.org
EDS4 | ScienceDirect | https://www.sciencedirect.com
EDS5 | Google Scholar | https://scholar.google.com

Search keywords. We defined the search keywords using specific terms in order to collect as many relevant articles as possible. The keywords used to carry out the SLR are "keyphrase extraction", "keyphrase generation", and "keyword extraction". Next, we designed the search strings for each data source to check the title, abstract, and keywords, except for Google Scholar, which only allows title search.

Exclusion and inclusion criteria. This study focuses on articles published during the period 2015–2022. We first analyzed the studies according to their titles, years of publication, keywords, and abstracts. Table 3 defines the criteria used to include or exclude an article.

Table 3. Inclusion and exclusion criteria
Inclusion criteria | Exclusion criteria
IC1: Articles related to the research questions (RQ1–RQ5) | EC1: Articles not dealing with keyphrase extraction
IC2: Journal or conference articles | EC2: Duplicate papers in EDS
IC3: Articles written in English only | EC3: Working papers
IC4: Articles published between 2015 and 2022 | EC4: PhD dissertations, tutorials, editorials, magazines

Quality assessment. During this phase, each paper in the final group went through an evaluation process to measure its quality. For this, we used an evaluation checklist containing six qualities (see Table 4). We assigned the highest weight to qualities Q3 and Q5, which deal respectively with the architecture of the proposed solution and with the comparison of the article's results with those of other articles. The remaining qualities, namely study objectives (Q1), related work (Q2), evaluation of results (Q4), and statement of results (Q6), were given a lower weight.

Table 4. Quality assessment checklist
Num | Quality | Answer (score) | Weight
Q1 | Are the objectives of the study clear in the article? | Yes (1); Partly (0.5); No (0) | 1
Q2 | Did the study examine related work? | Yes (1); Partly (0.5); No (0) | 1
Q3 | Did the study clearly identify and discuss the proposed solution? | Proposes a new solution to extract keyphrases and describes its architecture (1); The study discusses the proposed solution (0.5); The proposed solution is not well defined or discussed (0) | 2
Q4 | Was the study evaluated empirically? | Implements the proposed solution and uses it in a real application (1); The study provided only the implementation (0.5); The study did not provide any implementation or results (0) | 1
Q5 | Did the study compare the results of the proposed solution with other studies? | Yes (1); Partly (0.5); No (0) | 2
Q6 | Did the study present a clear statement of findings? | Yes (1); Partly (0.5); No (0) | 1

For each article, the quality score QS is calculated using formula (1), which takes into account the score S_i of each question and its weight W_i. The denominator 8 is the maximum weighted sum (1 + 1 + 2 + 1 + 2 + 1), so QS is expressed as a percentage.

$$QS = \frac{\sum_{i=1}^{6} S_i \times W_i}{8} \times 100 \tag{1}$$

4 Results

This section summarizes the data extraction results obtained by applying the research protocol detailed in Section 3. The aim is to analyze the results for each research question in order to provide a comprehensive review of automatic keyphrase extraction.

4.1 Overview of research articles

The first step allowed us to collect 607 articles (see Figure 3). Next, the article titles were checked for duplicates. This process allowed us to remove 187 articles and keep only 420. These articles were reviewed according to the exclusion and inclusion criteria described above.
This review reduced the number of articles to 159. After examining them, we found that only 61 of the 159 articles were relevant to keyphrase extraction. These articles were supplemented with five more articles found by reviewing the reference lists of related articles. In the last step, the six quality assessment criteria in Table 4 were applied to ensure that the included articles would make a valuable contribution to our SLR. Articles whose score was less than 62% (the average of the scores) were eliminated. In the end, we kept 60 articles.

Fig. 3. Overview of final papers selection process

4.2 Classification of selected papers

To classify the selected articles, we distributed them by EDS and by year of publication. Figure 4 shows the distribution by year of publication (2015–2022) of the 159 articles obtained after applying the exclusion and inclusion criteria. The first thing to note is the increase in the number of studies from 2018 to 2019, which indicates the growing interest in developing keyphrase extraction methods. It should also be noted that the results for 2022 are not final. We also show the distribution of the articles before and after the quality assessment phase.

Fig. 4. Distribution of scientific articles according to the year of publication

Our statistics on data sources show that Google Scholar and DBLP contain the highest number of relevant articles (see Figure 5). Of the 60 articles selected, 55% were published in journals and 45% were presented at conferences.

Fig. 5.
Distribution of scientific articles according to data sources

4.3 Review and discuss results

In this part, we discuss the results of this systematic review in order to shed light on the research questions raised in Section 3.1 (Table 1).

RQ1: "What techniques can be exploited to identify candidate keyphrases?" Keyphrases are always chosen from a group of candidate keyphrases, hence the importance of knowing the techniques for extracting candidate keyphrases from a text. To address RQ1, we review the techniques adopted in the studies to improve the extraction of candidate keyphrases.

Most of the articles reviewed note that keyphrases are noun phrases that consist of one or more words and do not include stop words. The process of extracting candidate keyphrases must therefore take this into account. Noun phrases appear in many different patterns. In [24], the authors found 56 such patterns in a training dataset. The most important patterns are shown in Table 5. To identify noun phrases, the majority of articles use Part-Of-Speech (POS) tagging, which converts a phrase into a list of (word, tag) tuples by assigning a part of speech, such as verb, noun, or adjective, to each word.

Table 5. The important POS patterns
POS pattern | Description
<DT>? <JJ>* <NN.>+ | Begins with an optional determiner DT, followed by zero or more adjectives JJ, followed by one or more nouns NN.
<JJ>? <NN>+ | Begins with an optional adjective JJ, followed by one or more nouns NN.
(<JJ>|<NN>)* <IN>? (<JJ.>|<NN.>)* <NN.> | Begins with adjectives or nouns, followed by an optional subordinating conjunction IN, followed by several adjectives or nouns, and ends with a noun NN.
<NN>+ <JJ>? | Begins with one or more nouns NN, followed by an optional adjective JJ.
<PRP>? <JJ>* <NN>+ | Begins with an optional personal pronoun PRP, followed by zero or more adjectives JJ, followed by one or more nouns NN.

A set of techniques has been exploited to reduce the number of candidate keyphrases. The authors of [25] suggest eliminating n-grams that do not reach a minimum number of occurrences. The remaining phrases are then ranked according to TF-IDF to keep the most important ones. The authors of [26] use a measure called phraseness, which estimates the probability that a sequence of words can be considered a phrase. The authors of [27] propose a three-layer neural model (an embedding layer, a token-level BiLSTM layer, and a CRF tagging layer) to extract a set of candidates. The authors of [28] perform a deep syntactic analysis of the text using the feeling parser and obtain a syntax tree of the text in order to achieve higher coverage of the document. The authors of [29] treat the past participle of a verb (VBN) as an adjective and the gerund of a verb (VBG) as a noun, and they modify the noun phrase pattern to take the VBN and VBG tags into account when extracting candidate phrases. There are also studies that suggest a complete process for extracting candidate keyphrases [15].

RQ2: "What techniques can be used to extract keyphrases?" Having reviewed methods for identifying candidate keyphrases, we now focus on the techniques used by approaches that extract keyphrases from documents. Several approaches have been published in recent years, and they employ many techniques. The methods we studied can be classified into four categories: graph-based models, statistical models, embedding models, and deep learning models.

Statistical approaches. The statistical technique is one of the oldest techniques used to extract keyphrases. However, we find that some new methods have been adopted in the AKE process.
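The candidate-selection ideas reviewed under RQ1 and the frequency statistics on which this family builds can be illustrated together. The following is only a minimal sketch, not the method of any cited paper: it assumes the text has already been POS-tagged with Penn Treebank style tags, matches the "adjective* noun+" structure, drops candidates below a minimum count in the spirit of the filtering attributed to [25], and ranks the rest with a plain TF-IDF score. The function names `extract_candidates` and `rank_by_tfidf` and the parameter `min_count` are hypothetical.

```python
import math
from collections import Counter


def extract_candidates(tagged):
    """Return lowercase phrases matching adjective* noun+.

    `tagged` is a list of (word, tag) tuples; tags are assumed to follow the
    Penn Treebank convention (JJ* for adjectives, NN* for nouns).
    """
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1].startswith("JJ"):  # zero or more adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun was found: keep the whole run
            phrases.append(" ".join(word.lower() for word, _ in tagged[i:k]))
            i = k
        else:  # no noun after the adjectives: move past them
            i = max(j, i + 1)
    return phrases


def rank_by_tfidf(corpus_candidates, doc_index, min_count=1):
    """Rank the candidates of one document by TF-IDF.

    `corpus_candidates` holds one list of candidate phrases per document.
    Candidates occurring fewer than `min_count` times in the target document
    are discarded before scoring (illustrative threshold).
    """
    n_docs = len(corpus_candidates)
    doc_freq = Counter()
    for candidates in corpus_candidates:
        doc_freq.update(set(candidates))  # count each phrase once per document
    term_freq = Counter(corpus_candidates[doc_index])
    scores = {
        phrase: count * math.log(n_docs / doc_freq[phrase])
        for phrase, count in term_freq.items()
        if count >= min_count
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In use, each document would be tagged, passed through `extract_candidates`, and then scored with `rank_by_tfidf`; the top n entries of the returned list play the role of the extracted keyphrases. Real systems typically add stemming and stop-word handling, which this sketch leaves out.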
Among these approaches, Giamblanco et al. propose in [30] Key-LUG, an unsupervised approach that extracts keyphrases using Newton's law of gravitation. Key-LUG uses a new weighting method that combines the character length of a word with its frequency in the document. Rabby et al. in [31] provide a tree-based automatic keyphrase extraction technique that uses nominal statistical knowledge. Campos et al. in [32] offer YAKE, an unsupervised approach that uses statistical text features to extract keyphrases from a text. The advantage of YAKE is that it requires neither a training dataset nor dictionaries, and it handles documents of various sizes. Rabby et al. in [9] propose TeKET, a domain-independent technique that uses limited statistical knowledge and does not require any training data. To extract keyphrases, it uses KePhEx (Keyphrase Extraction Tree), a new variant of a binary tree. KP-Rank [33] is an AKE approach based on LSA (latent semantic analysis) and clustering techniques, together with an algorithm that ranks candidate phrases according to phrase, paragraph, and section frequencies. Merrouni et al. propose in [34] an unsupervised method that combines linguistic, statistical, semantic, and structural features of a text to identify keyphrases, especially in long texts. The method defines candidate phrases using the parse tree, filtering, and part-of-speech tagging, which helps to control the computational complexity. Badrul et al. propose in [35] the keyphrase concentrated area (KCA) as a new feature for extracting keyphrases by applying some statistical operations. The proposed method is multilingual and not tied to a specific field.

Graph-Based approaches. Representing text in graph form is among the most attractive techniques for researchers, as it has been used in a large number of keyphrase extraction methods (see Figure 6). Most of the traditional methods rely on the co-occurrence relation between the phrases of the document.
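The traditional co-occurrence scheme can be sketched as follows. This is a simplified illustration in the spirit of TextRank-style ranking, which the reviewed text does not name, and it is not the implementation of any cited method: words that appear within a small sliding window are linked by weighted undirected edges, and a PageRank-style power iteration scores the nodes. The function names, the `window` size, the damping factor, and the iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph from a token list.

    Two different tokens are linked when they fall inside the same sliding
    window of `window` tokens (window=2 links adjacent tokens only).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            right = tokens[j]
            if left != right:
                graph[left][right] += 1.0
                graph[right][left] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score nodes with a weighted PageRank computed by power iteration."""
    nodes = list(graph)
    if not nodes:
        return {}
    # Total edge weight leaving each node, used to normalise contributions.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        score = {
            node: (1 - damping) / len(nodes)
            + damping * sum(
                weight * score[neighbor] / out_weight[neighbor]
                for neighbor, weight in graph[node].items()
            )
            for node in nodes
        }
    return score
```

Sorting the returned dictionary by score, for example with `sorted(scores, key=scores.get, reverse=True)[:n]`, yields the top n words. The extensions surveyed in this section change mainly the edge weights or the initial and teleport distributions, for instance by adding position or semantic similarity information.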
Therefore, modern methods have attempted to add the position of phrases and the syntactic and semantic relations between them in order to improve keyphrase extraction. Wen et al. propose in [36] the use of similarity and co-occurrence between phrases as a new edge weight. To do so, they use Word2Vec to represent the candidate phrases and the cosine distance to calculate the similarity between connected phrases. Florescu et al. consider in [37] that exploiting the locations of phrases in the document when calculating the weight can improve keyphrase extraction. They therefore propose a method called PositionRank, which exploits all the positions of a phrase's occurrences when calculating its score. Figueroa et al. propose in [38] RankUp, which applies backpropagation to improve graph-based keyphrase extraction algorithms. Chen et al. [39] combine the co-occurrence relation and the semantic relation to build a multi-relational graph. Perez et al. propose in [40] an approach that combines lexico-syntactic models and graph-based topic modeling.

Fig. 6. Distribution of articles according to the technique used

Boudin in [41] represents the document as a complete directed multipartite graph. This representation allows the ranking algorithm to fully exploit the interrelationship between topics and candidates and allows keyphrase selection preferences to be incorporated into the model. Sun et al. propose in [42] DivGraphPointer, an approach that combines recent neural network-based approaches with the advantages of graph-based ranking methods. Li et al. [43] suggest adding the idea of topic-based clustering to graph-based ranking in order to embed semantic information. Prasad et al.
propose in [44] the Glocal technique (a portmanteau of global and local), a graph convolution model that incorporates the global importance of a node into the local convolution process for supervised learning on graphs. Tosi et al. propose in [45] the C-Rank approach, which explores concept links to improve keyphrase extraction. C-Rank constructs a co-occurrence graph with the annotated concepts as vertices and then weighs the vertices using their centrality in the graph. Dong et al. [46] use several features, such as the total number of sentences in which the target word appears, the sum of the inverses of the word's positions in the document, and an LDA model to extract topic information. These features are used when the PageRank algorithm calculates the scores. Yeom et al. propose in [47] a model that exploits a modified C-Value to overcome biased co-occurrence frequency and the loss of position information. The proposed model takes a corpus of documents as input and applies the modified C-Value method to recalculate the scores of the keyphrase candidates. Chen et al. propose in [48] an approach based on a three-way decision graph. Using TWDT (three-way decision theory), candidate keyphrases are divided into a positive domain, a border domain, and a negative domain according to graph-based attributes. Luo et al. propose in [49] a model that classifies candidate phrases according to a structural score and a semantic score. The structural score is calculated using a graph ranking algorithm. The semantic score is calculated from the similarity between the candidate and all phrases. Yeon et al. propose in [50] an HSN (hierarchical semantic network) to extract keyphrases using centrality metrics. The identification of hierarchical relationships between keyphrases is motivated by [9]. TOP-Rank [51] is a technique that integrates both topical and positional information. TOP-Rank is based on PositionRank [37] and applies topic clustering to similar phrases using cosine similarity and TF-IDF. The highest ranked phrase from each topic is considered a keyphrase. Yang et al. propose in [52] the use of a graph convolutional network (GCN) on the document graph to capture the core features of the text, with the aim of ensuring that the generated keyphrases are consistent with the document. Venktesh et al. propose in [53] a method that represents the text as a graph in which the nodes contain the candidate phrases and the edge weights represent the semantic association between these nodes. The candidate phrases are represented using embedding techniques, and a ranking algorithm selects the keyphrases.

Embedding approaches. In recent years, various embedding techniques have been proposed [54]; Table 6 lists them. This development has encouraged researchers to develop keyphrase extraction methods based on these techniques. Zeng et al. propose in [55] a semi-supervised algorithm that uses word embedding and integrates word frequency, the effects of word co-occurrence, and the semantic relationship between words. The words in the document are grouped according to the vector distance between them. Bennani et al. [56] consider the candidate keyphrases most similar to the vector representing the document to be the keyphrases. They propose EmbedRank, an approach that embeds phrases and documents for keyphrase extraction, as opposed to the standard embedding of individual words. Papagiannopoulou et al. propose in [57] an unsupervised keyphrase extraction method based on a reference vector, which is the average vector of the words in the title and abstract of a document. The candidate keyphrases are ranked according to their cosine similarity with the reference vector. Mahata et al.
propose in [58] Key2vec, an approach that uses the title and the abstract as the theme excerpt. Each phrase in the theme excerpt is represented by a phrase embedding model. The final theme vector is obtained by adding the vectors of the theme excerpt. Key2vec calculates the cosine distance between the theme vector and each candidate. The candidates are then ranked using PageRank weighted by theme similarity. Toleu et al. propose in [59] KeyVector, an unsupervised approach based on three ranking scores: a global semantic score, calculated from the semantic relationship between the document and the candidate phrases; a weighted topic score, calculated from the semantic relationship between the topic and the document; and a topic-internal score, which is the ranking of keyphrases within each topic. Each candidate keyphrase is ranked according to its values for the three scores.

Table 6. Different models of word and phrase embedding
Model | Vector dimension
Word2vec [60] | 200
Doc2Vec [61] | 300
GloVe [62] | 300
Sent2vec [63] | 700
InferSent [64] | 4096
ELMo [65] | 1024
BERT [66] | 700
USE [67] | 512

Fan et al. propose in [68] to incorporate the local context of the word graph, the topical information expressed in the document, and the co-occurrence between words, all of which are important for keyphrase extraction. A new PageRank-based ranking model is designed to extract keyphrases by taking advantage of these features. Sun et al. [69] use SIF, a phrase embedding model [70], to extract the relationship between phrase embeddings and the topic of the document. They then combine the autoregressive ELMo model [65] with SIF to calculate the phrase embeddings and the document embedding. Cosine similarity is used to calculate the distance between the candidate phrases and the topic. Rafiei et al.
propose in [29] GLEAKE (Global and Local Embedding Automatic Keyphrase Extraction), an unsupervised AKE method that combines local indices with semantic information from the candidate phrases. GLEAKE relies on a local word embedding model to assign a syntax vector to each candidate keyphrase and to the document. Ajallouda et al. confirm in [71] that calculating the similarity between candidate phrases and the document does not perform well on long documents. They therefore suggest dividing the document into parts and then calculating the average similarity of the candidate phrases with the parts of the document that are likely to contain keyphrases. Wang et al. propose in [72] to rely on BERT embeddings to extract keyphrases from the document, using the whitening operation and dimensionality reduction as well as word frequency information.

Deep Learning approaches. Deep learning (DL) approaches to keyphrase extraction typically cast the AKE process in an encoder-decoder framework, which first encodes the input documents into a vector representation and then generates keyphrases with decoders. Several DL approaches have been proposed in recent years. Jonathan et al. propose in [73] an approach that incorporates a DBN (Deep Belief Network) as a classifier and uses factual sentiment as a new keyphrase feature. Helmy et al. [74] consider that recent deep learning AKE approaches are not applicable to documents in Arabic. They therefore propose a DL model based on the LSTM, with AraVec for word embedding, to extract keyphrases from Arabic documents. Alzaidy et al. [75] treat keyphrase extraction as a sequence labeling problem. They propose an AKE model that combines CRF and Bi-LSTM to extract keyphrases from scientific documents. This model inserts a Bi-LSTM layer between the input and output layers in order to exploit dependencies in the text. Patel et al.
[76] build a complex labeling model using the Bi-LSTM-CRF network, which incorporates long-distance information about the input sequence as well as about the output sequence. Sahrawat et al. [77] formulate AKE as a sequence tagging task using a BiLSTM-CRF, where the input text is represented using deep embeddings. They propose to use deep contextual embedding models (BERT, SciBERT, and ELMo) instead of fixed word embedding models (word2vec, GloVe and FastText).

Xiong et al. propose in [78] Beyond Language Understanding Keyphrase Extraction (BLING-KPE), an approach that addresses the challenges of AKE in documents of varying domains and content quality. BLING-KPE uses a convolutional transformer architecture to model language properties in web documents. It has two main components: hybrid word embeddings, in which each word is represented by its ELMo embedding, position embedding and visual features; and a convolutional transformer that models n-grams and their interactions. BLING-KPE first composes the hybrid word embeddings into n-gram embeddings using a CNN. The final score of an n-gram is calculated by a feedforward layer on top of the transformer. Zhu et al. propose in [79] a neural network-based approach, which uses bidirectional long short-term memory (BLSTM) and a conditional random field (CRF), for the extraction of scientific keyphrases. The word representation is obtained by concatenating the word embedding, the POS embedding, and the dependency embedding. The Bi-LSTM layer then takes the word representation as input and generates more complex features for the input sentence. Finally, a CRF layer is added to predict the sequence of labels for the sentence. Zhou et al. propose in [80] an approach that uses the memory array [81] to capture the long-range contextual information hidden in the textual data.
They then use a CRF model to capture dependencies between adjacent words in a text sequence and determine whether a candidate phrase is a keyphrase. Huanqin et al. propose in [82] to exploit the keyphrases mentioned in the document in order to generate keyphrases not mentioned in it, using the mask-predict method.

RQ 3 “How to estimate the precision of the proposed approaches?” After analyzing the proposed methods for extracting keyphrases, we review the measures proposed to evaluate these methods, as well as the datasets used.

To determine the precision of keyphrase extraction for any approach, an evaluation process must be performed. The approach is applied to a dataset, and the extracted keyphrases are compared, via evaluation metrics, to a set of manually assigned keyphrases. Many datasets have been used in the evaluation process by the methods studied. Figure 7 presents the ten datasets used by most of the articles studied.

Fig. 7. Comparison of datasets based on the number of articles in which they were used (bar chart; axis: number of articles, 0 to 35; datasets shown: Inspec, SemEval 2010, Krapivin, DUC 2001, NUS, KDD, WWW, 500N-KPCrowd, SemEval 2017, KP20k)

Inspec, SemEval2010 and Krapivin are the most used datasets. This is not surprising, because their scientific publications have a professional style and clear semantics compared to other datasets. Each paper in these datasets has its own keyphrases assigned by the authors, which makes the evaluation process reasonably accurate. However, most datasets depend on articles in English, and therefore methods that extract keyphrases from articles in another language, such as Arabic, are difficult to evaluate, despite some attempts such as WikiAll [83], composed of 100 documents collected from Arabic Wikipedia. Each document has its own keyphrases written by its authors.
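The comparison between extracted keyphrases and manually assigned ones is commonly summarized by precision, recall and F1-score. The following is a minimal sketch of that computation, assuming exact matching after lowercasing and whitespace normalization (an assumption; individual papers may also apply stemming or partial matching). The function names and the example phrases are hypothetical and serve only as illustration.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace (an assumed, deliberately simple matching rule)."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(extracted, gold, k=None):
    """Exact-match precision, recall and F1 over the top-k extracted keyphrases."""
    ranked = extracted if k is None else extracted[:k]
    # Normalize and drop duplicates while keeping the ranking order.
    predicted = list(dict.fromkeys(normalize(p) for p in ranked))
    reference = {normalize(p) for p in gold}
    correct = sum(1 for p in predicted if p in reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example data, not taken from any dataset.
extracted = ["Keyphrase extraction", "text mining", "graph ranking", "deep learning"]
gold = ["keyphrase extraction", "deep learning", "word embedding"]
print(precision_recall_f1(extracted, gold))       # all extracted phrases
print(precision_recall_f1(extracted, gold, k=1))  # only the top-ranked phrase
```

Restricting the computation to the top k phrases illustrates why reported scores depend on the number of keyphrases extracted.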
Datasets. To evaluate and develop their approaches, authors need textual sources, namely scientific publications, news documents, and abstracts of articles. Table 7 presents these sources.

Table 7. Datasets used for AKE evaluation

Type              Dataset               Language     Docs   KP/Doc
Full-text Papers  NUS [84]              English      211    10
                  Krapivin [85]                      2300   6
                  PubMed [86]                        1300   5
                  Citeulike-180 [87]                 180    5
                  Semeval2010 [88]                   282    15
Paper Abstracts   Inspec [89]           English      2000   10
                  KDD [90]                           755    4
                  WWW [90]                           1300   5
                  KP20k [91]                         550K   4
                  KPTimes [96]                       260K   5
News              DUC-2001 [92]         English      308    10
                  WikiAll [83]          Arabic       100    –
                  110-PT-BN-KP [93]     Portuguese   110    28
                  500N-KPCrowd [94]     English      500    46
                  Wikinews [95]         French       100    10

Metrics. Most authors rely on three metrics, precision, recall and F1-score, because they are accurate and easy to use. Some works use additional metrics, such as [32], which also relies on mean average precision (MAP), and [68], which uses mean reciprocal rank (MRR). Figure 8 shows the percentages of use of these metrics in the articles we studied.

Fig. 8. The proportions of using evaluation metrics

RQ 4 “What are the most realistic and exploitable AKE systems?” The growing demand for keyphrases in NLP fields has prompted researchers to develop efficient keyphrase extraction systems. In this section, we present these systems so that researchers can identify and use them, either to improve them or to build new, more efficient systems.

In recent years, several AKE systems have been developed. Some of them apply a single AKE approach, such as KEA [4], a keyphrase extraction system based on the technique in [97]. KEA is developed in Java and is available under the GNU General Public License.
Pytextrank is a Python implementation that works like TextRank with some modifications: notably, the graph also contains verbs, although they are not selected as keyphrases, and it uses lemmatization instead of stemming. RAKE (Rapid Automatic Keyword Extraction), implemented in Python, is based on the technique in [98]. RAKE selects phrases that are at least five characters long, contain at most three words, and appear in the text at least four times. TopicCoRank, also implemented in Python [99], builds two graphs, the first representing the document and the second representing the domain, which allows it to extract keyphrases belonging to the document's domain. The source code of TopicCoRank is available on GitHub. Seq2Seq is one of the few systems that rely on neural networks to extract keyphrases. It is also implemented in Python [91]. Finally, YAKE is a lightweight system implemented in Python that relies on the statistical approach [100] to select keyphrases. This system does not need an external corpus. It is also available on GitHub.

Table 8. Automatic keyphrase extraction systems

Implementation language  Software                                                       Approach type            Language
Python                   Pytextrank [101]                                               Unsupervised             Multilanguage
                         RAKE [98]                                                      Unsupervised             Multilanguage
                         TopicCoRank [99]                                               Unsupervised             En/Fr
                         Seq2Seq [91]                                                   Supervised               En
                         YAKE [100]                                                     Unsupervised             Multilanguage
                         PKE [37], [41], [83], [92], [95], [102], [103], [104]          Supervised/Unsupervised  Multilanguage
Java                     KEA [97]                                                       Supervised               Multilanguage
                         Maui [87]                                                      Supervised               Multilanguage
                         CiteTextRank [92], [101]                                       Unsupervised             En
                         Sequential Labeling [87], [102], [105], [106]                  Supervised               En
C++                      KE package [92], [101]                                         Unsupervised             En

Other systems combine several techniques. CiteTextRank is a system implemented in Java that uses several techniques, including TF-IDF, TextRank, SingleRank and ExpandRank. PKE is a system implemented in Python that supports eight techniques [37], [41], [83], [92], [95], [102], [103], and [104] (cf. Table 8).
KE package is a system implemented in C++ that uses the same techniques as CiteTextRank, but it only processes English texts. The last system is Sequential Labeling, implemented in Java and based on the techniques [87], [102], [105], and [106] (cf. Table 8). English is the default language for all these systems, but users can use other languages, except with TopicCoRank, which supports English and French, and with Seq2Seq, CiteTextRank, Sequential Labeling, and KE package, which only support English. Python and Java remain among the preferred programming languages for developing keyphrase extraction systems. Figure 9 shows the proportion of systems developed in each programming language.

Fig. 9. Percentage of use of programming languages in AKE systems

RQ 5 “What obstacles must be overcome to improve the accuracy of keyphrases extraction?” The different methods that we studied tried to improve the performance of extracting keyphrases from the document. This performance varies according to the language and domain of the document, the length, number and type of the keyphrases extracted, the values of the hyperparameters, and the availability of training data for supervised methods. The evaluations carried out by the authors of these methods show that no method can extract keyphrases very efficiently.

Considering these results, various challenges arise at the different stages of keyphrase extraction: preprocessing, the features used, and the identification of candidate keyphrases. At the preprocessing stage, most methods remove stopwords and use stemming and normalization techniques. These operations always depend on the language in which the document is written. Additionally, a phrase taken in isolation may carry a meaning different from the one it has in the context of the text, which negatively affects the performance of the keyphrase extraction method.
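To illustrate why preprocessing is tied to the language of the document, the sketch below applies stopword removal and a toy suffix stripper with language-specific resources. Everything in it is an assumption made for illustration: the stopword lists are tiny, the suffix stripper is a stand-in for a real stemmer, and the names `STOPWORDS`, `SUFFIXES`, `strip_suffix` and `preprocess` are hypothetical rather than taken from any of the reviewed systems.

```python
import re

# Tiny illustrative stopword lists (assumptions, far from complete).
STOPWORDS = {
    "en": {"the", "of", "and", "a", "in", "to", "is", "for"},
    "fr": {"le", "la", "les", "de", "des", "du", "et", "un", "une", "pour"},
}

# Toy suffix lists standing in for a real stemmer (assumption).
SUFFIXES = {
    "en": ("ing", "es", "s"),
    "fr": ("ement", "es", "s"),
}


def strip_suffix(token, suffixes, min_stem=3):
    """Remove the first matching suffix if at least `min_stem` characters remain."""
    for suffix in suffixes:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text, lang):
    """Lowercase, tokenize on letters, drop stopwords, then strip suffixes."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS[lang]]
    return [strip_suffix(t, SUFFIXES[lang]) for t in kept]


sentence = "Extraction des phrases du texte"
print(preprocess(sentence, "fr"))  # French resources applied to French text
print(preprocess(sentence, "en"))  # English resources leave French stopwords in place
```

Running the same French sentence through the English resources keeps function words such as "des" and "du" as if they were content words, which is the kind of language dependence described above.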
When defining candidate keyphrases, most methods consider these phrases either as a single word or as a multi-word noun phrase and thus exclude any phrases that differ from this structure. In addition, despite the exploitation of resources such as Wikipedia and external terminology databases, most of these methods neglect the relationship between synonyms and phrase abbreviations, resulting in keyphrases that are written differently but may be linguistically equivalent. Future approaches should therefore take the understanding of the text into account. Most current AKE methods select only the keyphrases mentioned in the text, while a substantial share of a document's keyphrases are not mentioned in it (more than 50% in SemEval2010), as shown by the datasets used to evaluate AKE performance. Table 9 shows the percentages of present and absent keyphrases in the five datasets most used for AKE evaluation. Thus, care should be taken to generate keyphrases that are not mentioned in the document.

Table 9. Percentage of present and absent keyphrases in datasets

Dataset       Present KP   Absent KP
Inspec        73.58%       26.42%
Krapivin      55.67%       44.33%
NUS           54.64%       45.36%
SemEval2010   44.37%       55.63%
KP20K         62.77%       37.23%

The difficulties that reduce the effectiveness of keyphrase extraction vary from one method to another. The performance of graph-based approaches is affected by changes to the co-occurrence window. Therefore, using semantic information when creating the graph could be an alternative to word windows. In addition, most of them create graphs of words rather than phrases, and the score of a multi-word phrase is based on the sum or the average of the scores of its words. This negatively affects the results of these methods, especially for long documents.
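The two design choices discussed for graph-based approaches, the co-occurrence window and the scoring of a multi-word phrase from its words, can be sketched as follows. This is a simplified TextRank-style illustration written for this review, not the implementation of any cited method; the function names, the `window` and `damping` values, and the sample tokens are assumptions.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected weighted word graph: link words at most `window` positions apart."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over the symmetric word graph."""
    nodes = list(graph)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m passes on a share of its score proportional to the edge weight.
            incoming = sum(
                scores[m] * graph[m][n] / sum(graph[m].values()) for m in graph[n]
            )
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def score_phrases(candidates, word_scores):
    """Score each candidate phrase as the sum of the scores of its words."""
    return {
        phrase: sum(word_scores.get(w, 0.0) for w in phrase.split())
        for phrase in candidates
    }


# Hypothetical token sequence (already preprocessed).
tokens = "keyphrase extraction graph ranking keyphrase extraction model".split()
word_scores = pagerank(cooccurrence_graph(tokens, window=2))
print(score_phrases(["keyphrase extraction", "graph ranking", "model"], word_scores))
```

Changing `window` adds or removes edges and therefore changes the word scores, and summing word scores tends to favour longer phrases, which is one reason an average is sometimes used instead.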
For statistical methods, the features they adopt only capture the importance of each phrase in the document on the basis of its frequency and its co-occurrence with other phrases, while the semantic relationship between words is neglected. This leads to ignoring rare keyphrases and focusing on repeated words in the document, especially in short texts. In addition, the values of the hyperparameters used also affect the performance of these approaches. The performance of embedding approaches is affected by the quality of the phrase embeddings. In addition, the vector representation of multi-word phrases always depends on the representation of their words, and some specific phrases, such as biomedical terms and abbreviations, are difficult to represent. Deep learning approaches depend on the training dataset and require a large dataset to be effective. Although deep learning reduces the need for hand-crafted features, these approaches require considerable computational time for training. This explains why they are rarely used by AKE systems. Also, most of these methods formulate keyphrase extraction as a sequence tagging task rather than a ranking task, so an important phrase may not be extracted first. Additionally, these methods do not exploit document topics during the extraction process, which could increase their effectiveness.

5 Discussion

In this section, we discuss in general terms the problems encountered in the approaches proposed in the articles studied. In addition, we present the limits of this SLR and suggest recommendations that should be taken into account in the future.

5.1 Discussion of problems

Researchers have made considerable efforts in recent years to develop keyphrase extraction.
However, our study of 60 articles published between 2015 and 2022 makes it clear that these methods still have not reached the desired level, owing to certain problems that limit their effectiveness. Most of these methods deal with scientific documents and are therefore difficult to use on other types of texts. Texts lose information during preprocessing because of stopword removal, stemming, and normalization. Candidate phrase extraction is based on choosing the noun phrases that are repeated in the text, which sometimes leads to the exclusion of phrases that may be important, as in biomedical texts, in which important phrases are rarely repeated. In addition, most of these techniques do not work well for keyphrases containing synonyms or abbreviations.

As for the use of features, its implementation depends on the approach used. Some authors prefer statistical features, while others prefer grammatical, morphological or semantic features. Others have used external features based on resources such as Wikipedia and terminology databases, despite their computational cost. However, most approaches do not use all of the features of the phrase.

The above-mentioned issues negatively affect the keyphrase extraction step, which also faces other issues that limit its performance. One problem that arises during keyphrase extraction is that some phrases are contained in others. Sometimes extracted phrases are written differently but have similar meanings. Most methods still have not solved the problem of discarding keyphrases that rarely appear in the document. Note that most of these methods have been tested on scientific articles, so the textual structure affects their performance.
There are some shortcomings in the evaluation of keyphrase extraction approaches, as testing is limited to comparing the proposed method with traditional approaches on limited types of data. In addition, the evaluation results change depending on the number of keyphrases extracted and the length and nature of the text. Our study also shows that no method has been applied to all types of data to guarantee its effectiveness. Often the proposed method is designed according to the target application and the dataset to be processed.

5.2 Recommendations

The discussion of the problems encountered in the different AKE approaches gave us an idea of the directions in which researchers should work in the future in order to improve the AKE process. If more attention is paid to improving preprocessing, and the POS and N-gram modules are merged in the process of extracting candidate keyphrases, we may obtain more suitable candidate keyphrases. The keyphrase extraction process is mainly applied to candidate keyphrases by exploiting their features, which are mainly statistical, linguistic, structural, or semantic. These features, especially semantic ones, can be exploited to overcome redundancy issues. The features of a phrase should not be treated independently of the text. Rather, the phrase should be treated through its meaning in the text and not in its general sense. Moreover, despite their computational cost, dictionaries and external corpora will help AKE methods to highlight keyphrases that rarely appear in the text.

Improving the training process can improve the performance of supervised methods, especially if a complete body of data is available. Embedding keyphrases and contextual information into deep learning models is showing encouraging results.
Therefore, researchers should develop deep learning models with a view to using them especially on large documents, all the more so because these approaches are more able than others to generate keyphrases that are not mentioned in the document.

Unsupervised methods remain preferred by researchers: 77% of the articles studied in our SLR propose unsupervised approaches. They are easy to apply in various areas of NLP, and they do not require any prior domain knowledge or training data. However, their performance, especially on large documents, remains poor. Exploiting certain characteristics of the document, such as knowledge of its domain and its topic, can help these approaches to partially overcome the scarcity problem. Research should therefore focus on this direction in order to determine keyphrases regardless of their presence or absence in the document. Also, embedding techniques have allowed these methods to process phrases according to their meaning in the text; these techniques also remain among the research directions, especially for long texts.

5.3 Limitations of this SLR

Although five digital libraries were used as research resources, which gave us access to a large collection of related scientific articles, our study is not exhaustive and does not cover all work related to keyphrase extraction, because the methodology adopted is based on articles published in English. Therefore, all studies published in other languages are ignored. In addition, our study focused on work published from January 2015 to May 2022. Work published after May 2022 was not studied and may be included in future work. In addition, because of time constraints, we did not verify the evaluation of the proposed methods ourselves and limited ourselves to the results published by the authors.
6 Conclusion and future work

In this SLR, we adopted a comprehensive scientific methodology to understand the research directions related to the extraction of keyphrases from text and to study the problems of their extraction through the solutions proposed in 60 research articles published between 2015 and 2022. This work included a series of steps related to a broad search strategy, from the choice of keyphrases for the search to the adoption of a set of inclusion and exclusion criteria. Our study included five research questions that allowed us to identify the techniques used to remove unnecessary phrases, the algorithms and techniques used in AKE approaches, and the most important evaluation metrics and datasets used to verify the performance of the proposed solutions. They also allowed us to identify the most important AKE systems and their implementation languages, and then the elements that significantly affect the efficiency and performance of AKE systems. Therefore, this study is useful not only to guide researchers and practitioners but can also serve as a support for the development of new, more precise AKE systems.

In future work, we aim to develop the steps of the keyphrase extraction process, in particular preprocessing and the selection of candidate keyphrases. We will look for a new method of transforming the text into a graph that does not depend on the word window. We will create a dataset of documents written in Arabic in order to evaluate AKE approaches on Arabic texts, and we will propose an AKE approach that not only extracts keyphrases mentioned in the document but can also generate keyphrases that are not mentioned in it.

7 References

[1] A. Aristovnik, D. Ravšelj, and L. Umek. “A bibliometric analysis of COVID-19 across science and social science research landscape.” Sustainability 12.21 (2020): 9132. https://doi.org/10.3390/su12219132
[2] E. Papagiannopoulou, and G. Tsoumakas. “A review of keyphrase extraction.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10.2 (2020): e1339. https://doi.org/10.1002/widm.1339
[3] Z. A. Merrouni, B. Frikh, and B. Ouhbi. “Automatic keyphrase extraction: A survey and trends.” Journal of Intelligent Information Systems 54.2 (2020): 391–424. https://doi.org/10.1007/s10844-019-00558-9
[4] Z. Nasar, S. W. Jaffry, and M. K. Malik. “Textual keyword extraction and summarization: State-of-the-art.” Information Processing & Management 56.6 (2019): 102088. https://doi.org/10.1016/j.ipm.2019.102088
[5] T. Sabri, O. El Beggar, and M. Kissi. “Comparative study of Arabic text classification using feature vectorization methods.” Procedia Computer Science 198 (2022): 269–275. https://doi.org/10.1016/j.procs.2021.12.239
[6] L. Abualigah, et al. “Efficient text document clustering approach using multi-search Arithmetic Optimization Algorithm.” Knowledge-Based Systems 248 (2022): 108833. https://doi.org/10.1016/j.knosys.2022.108833
[7] R. Srivastava, P. Singh, K. P. S. Rana, & V. Kumar. “A topic modeled unsupervised approach to single document extractive text summarization.” Knowledge-Based Systems 246 (2022): 108636. https://doi.org/10.1016/j.knosys.2022.108636
[8] A. Erdmann, A. Ramón, and J. M. Ponzoa. “Search engine optimization: The long-term strategy of keyword choice.” Journal of Business Research 144 (2022): 650–662. https://doi.org/10.1016/j.jbusres.2022.01.065
[9] G. Rabby, S. Azad, M. Mahmud, K. Z. Zamli, & M. Rahman. “Teket: A tree-based unsupervised keyphrase extraction technique.” Cognitive Computation 12.4 (2020): 811–833. https://doi.org/10.1007/s12559-019-09706-3
[10] D. Pramod, and B. Prafulla. “Conversational recommender systems techniques, tools, acceptance, and adoption: A state of the art review.” Expert Systems with Applications (2022): 117539.
https://doi.org/10.1016/j.eswa.2022.117539
[11] S. Kumar, and R. Kumar. “A study on different aspects of web mining and research issues.” IOP Conference Series: Materials Science and Engineering. Vol. 1022. No. 1. IOP Publishing, 2021. https://doi.org/10.1088/1757-899X/1022/1/012018
[12] J. Ali, A. Jusoh, N. Idris, A. F. Abbas, & A. H. Alsharif. “Everything is going electronic, so do services and service quality: bibliometric analysis of E-services and E-service quality.” International Journal of Interactive Mobile Technologies 15.18 (2021): 149. https://doi.org/10.3991/ijim.v15i18.24519
[13] U. Jha, L. Tyagi, D. Kansal, S. Chakraborty, & A. Singhal. “A review of sentiment analysis techniques using soft computing approaches.” 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE, 2021. https://doi.org/10.1109/Confluence51648.2021.9377031
[14] M. Haddoud, A. Mokhtari, T. Lecroq, & S. Abdeddaïm. “Accurate keyphrase extraction from scientific papers by mining linguistic information.” CLBib@ ISSI. 2015.
[15] L. Ajallouda, O. Hourrane, A. Zellou, & E. H. Benlahmar.
“Toward a new process for candidate key-phrases extraction.” International Conference on Digital Technologies and Applications. Springer, Cham, 2022. https://doi.org/10.1007/978-3-031-02447-4_48
[16] C. Lioma, and C. K. van Rijsbergen. “Part of speech n-grams and information retrieval.” Revue française de linguistique appliquée 13.1 (2008): 9–22. https://doi.org/10.3917/rfla.131.0009
[17] A. Handler, M. Denny, H. Wallach, & B. O’Connor. “Bag of what? simple noun phrase extraction for text analysis.” Proceedings of the First Workshop on NLP and Computational Social Science. 2016. https://doi.org/10.18653/v1/W16-5615
[18] N. Firoozeh, A. Nazarenko, F. Alizon, & B. Daille. “Keyword extraction: Issues and methods.” Natural Language Engineering 26.3 (2020): 259–291. https://doi.org/10.1017/S1351324919000457
[19] J. Biolchini, P. G. Mian, A. C. C. Natali, & G. H. Travassos. “Systematic review in software engineering.” System engineering and computer science department COPPE/UFRJ, Technical Report ES 679.05 (2005): 45.
[20] J. Renaud, V. Martin, and P. Dagenais. Les normes de production des revues systématiques: Guide méthodologique. Institut national d’excellence en santé et en services sociaux (INESSS). (2013).
[21] B. Kitchenham, and P. Brereton. “A systematic review of systematic review process research in software engineering.” Information and software technology 55.12 (2013): 2049–2075. https://doi.org/10.1016/j.infsof.2013.07.010
[22] I. Mustapha, N. Khan, M. I. Qureshi, A. A. Harasis, & N. T. Van. “Impact of Industry 4.0 on healthcare: A systematic literature review (SLR) from the last decade.” International Journal of Interactive Mobile Technologies 15.18 (2021). https://doi.org/10.3991/ijim.v15i18.25531
[23] S. S. A. Shah Kazmi, M. Hassan, S. A. Khawaj, & S. F. Padlee. “The use of AR technology to overcome online shopping phobia.” International Journal of Interactive Mobile Technologies 15.5 (2021).
https://doi.org/10.3991/ijim.v15i05.21043
[24] S. Popova, and V. Danilova. “Keyphrase extraction abstracts instead of full papers.” 2014 25th International Workshop on Database and Expert Systems Applications. IEEE, 2014. https://doi.org/10.1109/DEXA.2014.57
[25] S. Danesh, T. Sumner, and J. H. Martin. “Sgrank: Combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction.” Proceedings of the fourth joint conference on lexical and computational semantics. 2015. https://doi.org/10.18653/v1/S15-1013
[26] H. Jia, and E. Saule. “Addressing overgeneration error: An effective and efficient approach to keyphrase extraction from scientific papers.” BIRNDL@ SIGIR. 2018.
[27] Q. Liu, D. Kawahara, and S. Li. “Scientific Keyphrase extraction: extracting candidates with semi-supervised data augmentation.” Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Springer, Cham, 2018. 183–194. https://doi.org/10.1007/978-3-030-01716-3_16
[28] M. Barreiro-Guerrero, A. Simón-Cuevas, Y. Pérez-Guadarrama, F. P. Romero, & J. A. Olivas.
“Applying OWA operator in the semantic processing for automatic keyphrase extraction.” Iberoamerican Congress on Pattern Recognition. Springer, Cham, 2019. https://doi.org/10.1007/978-3-030-33904-3_6
[29] J. R. Asl, and J. M. Banda. “GLEAKE: Global and local embedding automatic keyphrase extraction.” arXiv preprint arXiv:2005.09740 (2020).
[30] N. Giamblanco, and P. Siddavaatam. “Keyword and keyphrase extraction using newton’s law of universal gravitation.” 2017 IEEE 30th Canadian conference on electrical and computer engineering (CCECE). IEEE, 2017. https://doi.org/10.1109/CCECE.2017.7946724
[31] G. Rabby, S. Azad, M. Mahmud, K. Z. Zamli, & M. M. Rahman. “A flexible keyphrase extraction technique for academic literature.” Procedia Computer Science 135 (2018): 553–563. https://doi.org/10.1016/j.procs.2018.08.208
[32] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, & A. Jatowt. “YAKE! Keyword extraction from single documents using multiple local features.” Information Sciences 509 (2020): 257–289. https://doi.org/10.1016/j.ins.2019.09.013
[33] M. Aman, S. J. Abdulkadir, I. A. Aziz, H. Alhussian, & I. Ullah. “KP-Rank: a semantic-based unsupervised approach for keyphrase extraction from text data.” Multimedia Tools and Applications 80.8 (2021): 12469–12506. https://doi.org/10.1007/s11042-020-10215-x
[34] Z. A. Merrouni, B. Frikh, and B. Ouhbi. “HAKE: An unsupervised approach to automatic keyphrase extraction for multiple domains.” Cognitive Computation (2022): 1–23. https://doi.org/10.1007/s12559-021-09979-7
[35] M. B. A. Miah, S. Awang, M. S. Azad, & M. M. Rahman. “Keyphrases concentrated area identification from academic articles as feature of keyphrase extraction: A new unsupervised approach.” International Journal of Advanced Computer Science and Applications 13.1 (2022). https://doi.org/10.14569/IJACSA.2022.0130192
[36] Y. Wen, H. Yuan, and P. Zhang.
“Research on keyword extraction based on word2vec weighted textrank.” 2016 2nd IEEE International Conference on Computer and Communications (ICCC). IEEE, 2016.
[37] C. Florescu, and C. Caragea. “Positionrank: An unsupervised approach to keyphrase extraction from scholarly documents.” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. https://doi.org/10.18653/v1/P17-1102
[38] G. Figueroa, P. Chen, and Y. Chen. “RankUp: Enhancing graph-based keyphrase extraction methods with error-feedback propagation.” Computer Speech & Language 47 (2018): 112–131. https://doi.org/10.1016/j.csl.2017.07.004
[39] W. Chen, Z. Liu, W. Shi, & J. X. Yu. “Keyphrase extraction based on optimized random walks on multiple word relations.” Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-96893-3_27
[40] Y. Perez-Guadarrama, A. Simón-Cuevas, W. Hojas-Mazo, J. A. Olivas, & F. P. Romero. “A fuzzy approach to improve an unsupervised automatic keyphrase extraction process.” 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2018. https://doi.org/10.1109/FUZZ-IEEE.2018.8491487
[41] F. Boudin. “Unsupervised keyphrase extraction with multipartite graphs.” arXiv preprint arXiv:1803.08721 (2018). https://doi.org/10.18653/v1/N18-2105
[42] Z. Sun, J. Tang, P. Du, Z. H. Deng, & J. Y. Nie. “Divgraphpointer: A graph pointer network for extracting diverse keyphrases.” Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2019. https://doi.org/10.1145/3331184.3331219
[43] T. F. Li, L. Hu, J. F. Chu, H. T. Li, & L. Chi. “An unsupervised approach for keyphrase extraction using within-collection resources.” IEEE Access 7 (2019): 126088–126097. https://doi.org/10.1109/ACCESS.2019.2938213
[44] A. Prasad, and M. Y. Kan. “Glocal: Incorporating global information in local convolution for keyphrase extraction.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
[45] M. Tosi, D. Lucca, and J. Cesar dos Reis. “C-rank: a concept linking approach to unsupervised keyphrase extraction.” Research Conference on Metadata and Semantics Research. Springer, Cham, 2019. https://doi.org/10.1007/978-3-030-36599-8_21
[46] H. Dong, J. Wan, and Z. Wan.
“Keyphrase Extraction Based on Multi-Feature.” 2019 International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). IEEE, 2019. https://doi.org/10.1109/MLBDBI48998.2019.00047
[47] H. Yeom, Y. Ko, and J. Seo. “Unsupervised-learning-based keyphrase extraction from a single document by the effective combination of the graph-based model and the modified C-value method.” Computer Speech & Language 58 (2019): 304–318. https://doi.org/10.1016/j.csl.2019.04.008
[48] T. Chen, D. Miao, and Y. Zhang. “A graph-based keyphrase extraction model with three-way decision.” International Joint Conference on Rough Sets. Springer, Cham, 2020. https://doi.org/10.1007/978-3-030-52705-1_8
[49] L. Luo, L. Zhang, and H. Peng. “An unsupervised keyphrase extraction model by incorporating structural and semantic information.” Progress in Artificial Intelligence 9.1 (2020): 77–83. https://doi.org/10.1007/s13748-019-00200-3
[50] Y. Yeon Sung, and S. B. Kim. “Topical keyphrase extraction with hierarchical semantic networks.” Decision Support Systems 128 (2020): 113163. https://doi.org/10.1016/j.dss.2019.113163
[51] M. N. Awan, and M. O. Beg. “Top-rank: a topicalpostionrank for extraction and classification of keyphrases in text.” Computer Speech & Language 65 (2021): 101116. https://doi.org/10.1016/j.csl.2020.101116
[52] P. Yang, Y. Ge, Y. Yao, & Y. Yang. “GCN-based document representation for keyphrase generation enhanced by maximizing mutual information.” Knowledge-Based Systems 243 (2022): 108488. https://doi.org/10.1016/j.knosys.2022.108488
[53] V. Venktesh, M. Mohania, and V. Goyal. “Topic aware contextualized embeddings for high quality phrase extraction.” European Conference on Information Retrieval. Springer, Cham, 2022. https://doi.org/10.1007/978-3-030-99736-6_31
[54] L. Ajallouda, K. Najmani, A. Zellou, & E. L. Benlahmar.
“Doc2Vec, SBERT, InferSent, and USE Which embedding technique for noun phrases?” 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET). IEEE, 2022. https://doi.org/10.1109/IRASET52964.2022.9738300
[55] P. Zeng, Q. Tan, Y. Yan, Q. Xie, J. Xu, & W. Cao. “Automatic keyword extraction using word embedding and clustering.” 2017 International Conference on Computer Systems, Electronics and Control (ICCSEC). IEEE, 2017. https://doi.org/10.1109/ICCSEC.2017.8447033
[56] K. Bennani-Smires, C. Musat, A. Hossmann, M. Baeriswyl, & M. Jaggi. “Simple unsupervised keyphrase extraction using sentence embeddings.” arXiv preprint arXiv:1801.04470 (2018). https://doi.org/10.18653/v1/K18-1022
[57] E. Papagiannopoulou, and G. Tsoumakas. “Unsupervised keyphrase extraction based on outlier detection.” arXiv preprint arXiv:1808.03712 (2018).
[58] D. Mahata, J. Kuriakose, R. Shah, & R. Zimmermann.
“Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings.” Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018. https://doi.org/10.18653/v1/N18-2100
[59] A. Toleu, G. Tolegen, and R. Mussabayev. “Keyvector: Unsupervised keyphrase extraction using weighted topic via semantic relatedness.” Computación y Sistemas 23.3 (2019): 861–869. https://doi.org/10.13053/cys-23-3-3264
[60] T. Mikolov, K. Chen, G. Corrado, & J. Dean. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).
[61] Q. Le, and T. Mikolov. “Distributed representations of sentences and documents.” International conference on machine learning. PMLR, 2014.
[62] J. Pennington, R. Socher, and C. D. Manning. “Glove: Global vectors for word representation.” Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. https://doi.org/10.3115/v1/D14-1162
[63] M. Pagliardini, P. Gupta, and M. Jaggi. “Unsupervised learning of sentence embeddings using compositional n-gram features.” arXiv preprint arXiv:1703.02507 (2017). https://doi.org/10.18653/v1/N18-1049
[64] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, & A. Bordes. “Supervised learning of universal sentence representations from natural language inference data.” arXiv preprint arXiv:1705.02364 (2017). https://doi.org/10.18653/v1/D17-1070
[65] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, & L. Zettlemoyer. “Deep contextualized word representations.” arXiv preprint arXiv:1802.05365 (2018). https://doi.org/10.18653/v1/N18-1202
[66] J. Devlin, M. W. Chang, K. Lee, & K. Toutanova. “Bert: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).
[67] D. Cer, et al.
“Universal sentence encoder.” arXiv preprint arXiv:1803.11175 (2018).
[68] W. Fan, H. Liu, S. Wang, Y. Zhang, & Y. Chang. “Extracting keyphrases from research papers using word embeddings.” Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, 2019. https://doi.org/10.1007/978-3-030-16142-2_5
[69] Y. Sun, H. Qiu, Y. Zheng, Z. Wang, & C. Zhang. “SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model.” IEEE Access 8 (2020): 10896–10906. https://doi.org/10.1109/ACCESS.2020.2965087
[70] S. Arora, Y. Liang, and T. Ma. “A simple but tough-to-beat baseline for sentence embeddings.” International conference on learning representations. 2017.
[71] L. Ajallouda, F. Z. Fagroud, A. Zellou, & E. H. Benlahmar. “KP-USE: An Unsupervised Approach for Key-Phrases Extraction from Documents.” International Journal of Advanced Computer Science and Applications 13.4 (2022). https://doi.org/10.14569/IJACSA.2022.0130433
[72] H. Wang, and J. Li. “Unsupervised Keyphrase Extraction from Single Document Based on Bert.” 2022 International Seminar on Computer Science and Engineering Technology (SCSET). IEEE, 2022. https://doi.org/10.1109/SCSET55041.2022.00068
[73] F. C. Jonathan, and O. Karnalim. “Semi-supervised keyphrase extraction on scientific article using fact-based sentiment.” Telkomnika 16.4 (2018): 1771–1778. https://doi.org/10.12928/telkomnika.v16i4.5473
[74] M. Helmy, R. M. Vigneshram, G. Serra, & C. Tasso. “Applying deep learning for Arabic keyphrase extraction.” Procedia Computer Science 142 (2018): 254–261. https://doi.org/10.1016/j.procs.2018.10.486
[75] R. Alzaidy, C. Caragea, and C. L. Giles. “Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents.” The world wide web conference. 2019. https://doi.org/10.1145/3308558.3313642
[76] K. Patel, and C. Caragea. “Exploring word embeddings in crf-based keyphrase extraction from research papers.” Proceedings of the 10th International Conference on Knowledge Capture. 2019. https://doi.org/10.1145/3360901.3364447
[77] D. Sahrawat, et al. “Keyphrase extraction from scholarly articles as sequence labeling using contextualized embeddings.” arXiv preprint arXiv:1910.08840 (2019).
[78] L. Xiong, C. Hu, C. Xiong, D. Campos, & A. Overwijk. “Open domain web keyphrase extraction beyond language modeling.” arXiv preprint arXiv:1911.02671 (2019). https://doi.org/10.18653/v1/D19-1521
[79] X. Zhu, C. Lyu, D. Ji, H. Liao, & F. Li. “Deep neural model with self-training for scientific keyphrase extraction.” Plos one 15.5 (2020): e0232547. https://doi.org/10.1371/journal.pone.0232547
[80] T. Zhou, Y. Zhang, and H. Zhu. “Multi-level memory network with crfs for keyphrase extraction.” Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, 2020.
https://doi.org/10.1007/978-3-030-47426-3_56
[81] Y. Uzun. “Keyword extraction using naive bayes.” Bilkent University, Department of Computer Science, Turkey, 2005. http://www.cs.bilkent.edu.tr/~guvenir/courses/CS550/Workshop/Yasin_Uzun.pdf
[82] H. Wu, B. Ma, W. Liu, T. Chen, & D. Nie. “Fast and constrained absent keyphrase generation by prompt-based learning.” (2022). https://doi.org/10.1609/aaai.v36i10.21402
[83] S. R. El-Beltagy, & A. Rafea. “KP-Miner: A keyphrase extraction system for English and Arabic documents.” Information Systems 34.1 (2009): 132–144. https://doi.org/10.1016/j.is.2008.05.002
[84] T. D. Nguyen, and M. Y. Kan. “Keyphrase extraction in scientific publications.” International conference on Asian digital libraries. Springer, Berlin, Heidelberg, 2007.
[85] M. Krapivin, A. Autaeu, and M. Marchese. “Large dataset for keyphrases extraction.” (2009).
[86] A. T. Schutz. “Keyphrase extraction from single documents in the open domain exploiting linguistic and statistical methods.” M. App. Sc Thesis (2008).
[87] O. Medelyan, E. Frank, and I. H. Witten. “Human-competitive tagging using automatic keyphrase extraction.” Association for Computational Linguistics, 2009. https://doi.org/10.3115/1699648.1699678
[88] S. N. Kim, O. Medelyan, M. Y. Kan, T. Baldwin, & L. P. Pingar. “SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific.”
[89] A. Hulth. “Improved automatic keyword extraction given more linguistic knowledge.” Proceedings of the 2003 conference on Empirical methods in natural language processing. 2003. https://doi.org/10.3115/1119355.1119383
[90] S. D. Gollapalli, and C. Caragea. “Extracting keyphrases from research papers using citation networks.” Proceedings of the AAAI conference on artificial intelligence. Vol. 28. No. 1. 2014. https://doi.org/10.1609/aaai.v28i1.8946
[91] R. Meng, S. Zhao, S. Han, D. He, P. Brusilovsky, & Y. Chi. “Deep keyphrase generation.” arXiv preprint arXiv:1704.06879 (2017).
https://doi.org/10.18653/v1/P17-1054
[92] X. Wan, and J. Xiao. “Single document keyphrase extraction using neighborhood knowledge.” AAAI. Vol. 8. 2008.
[93] L. Marujo, M. Viveiros, and J. Paulo da S. Neto. “Keyphrase cloud generation of broadcast news.” arXiv preprint arXiv:1306.4606 (2013).
[94] L. Marujo, A. Gershman, J. Carbonell, R. Frederking, & J. P. Neto. “Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization.” arXiv preprint arXiv:1306.4886 (2013).
[95] A. Bougouin, F. Boudin, and B. Daille. “Topicrank: Graph-based topic ranking for keyphrase extraction.” International joint conference on natural language processing (IJCNLP). 2013.
[96] Y. Gallina, F. Boudin, and B. Daille. “KPTimes: A large-scale dataset for keyphrase generation on news documents.” arXiv preprint arXiv:1911.12559 (2019). https://doi.org/10.18653/v1/W19-8617
[97] E. Frank, G. W. Paynter, I. H. Witten, C. Gutwin, & C. G. Nevill-Manning. “Domain-specific keyphrase extraction.” Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. San Francisco, CA, USA, 1999: 668–673.
[98] S. Rose, D. Engel, N. Cramer, & W. Cowley. “Automatic keyword extraction from individual documents.” Text Mining: Applications and Theory 1 (2010): 1–20. https://doi.org/10.1002/9780470689646.ch1
[99] A. Bougouin, F. Boudin, and B. Daille. “Keyphrase annotation with graph co-ranking.” arXiv preprint arXiv:1611.02007 (2016).
[100] R. Campos, V. Mangaravite, A. Pasquali, A. M. Jorge, C. Nunes, & A. Jatowt. “Yake! collection-independent automatic keyword extractor.” European Conference on Information Retrieval. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-76941-7_80
[101] R. Mihalcea, and P. Tarau. “Textrank: Bringing order into text.” Proceedings of the 2004 conference on empirical methods in natural language processing. 2004.
[102] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, & C. G. Nevill-Manning. “Kea: Practical automated keyphrase extraction.” Design and Usability of Digital Libraries: Case Studies in the Asia Pacific. IGI global, 2005. 129–152. https://doi.org/10.4018/978-1-59140-441-5.ch008
[103] T. D. Nguyen, and M. T. Luong. “WINGNUS: Keyphrase extraction utilizing document logical structure.” Proceedings of the 5th international workshop on semantic evaluation. 2010.
[104] L. Sterckx, T. Demeester, J. Deleu, & C. Develder. “Topical word importance for fast keyphrase extraction.” Proceedings of the 24th International Conference on World Wide Web. 2015. https://doi.org/10.1145/2740908.2742730
[105] C. Caragea, F. Bulgarov, A. Godea, & S. D. Gollapalli. “Citation-enhanced keyphrase extraction from research papers: A supervised approach.” Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. https://doi.org/10.3115/v1/D14-1150
[106] S. Gollapalli, D. X. Li, and P. Yang.
“Incorporating expert knowledge into keyphrase extraction.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1. 2017. https://doi.org/10.1609/aaai.v31i1.10986

8 Authors

Lahbib Ajallouda is a PhD student at the Computer Science and Systems Analysis School (ENSIAS), Mohammed V University, Rabat, Morocco. His research interests are primarily in the areas of the internet of things, search engines, cloud computing, and machine learning, and he is the author or co-author of more than 6 research publications. He can be contacted by email: lahbib_ajallouda@um5.ac.ma.

Fatima Zahra Fagroud is a PhD student at the Faculty of Sciences Ben M’sick, Hassan II University of Casablanca, Morocco. Her research interests are primarily in the areas of the internet of things, search engines, cloud computing, and machine learning, and she is the author or co-author of more than 14 research publications. She can be contacted by email: fagroudfatimazahra0512@gmail.com.

Ahmed Zellou received his Ph.D. in Applied Sciences from the Mohammedia School of Engineers, Mohammed V University, Rabat, Morocco, in 2008. He is currently a coordinator of the IWIM Web Engineering & Mobile Computing branch at ENSIAS, Mohammed V University, Rabat, Morocco. His research interests include Parallel Computing, Information Systems (Business Informatics), and Distributed Computing, and he is the author or co-author of more than 72 research publications.
He can be contacted by email: ahmed.zellou@um5.ac.ma.

El Habib Benlahmar received his PhD in computer science from the Computer Science and Systems Analysis School (ENSIAS), Mohammed V University, Rabat, Morocco. He is currently a coordinator of the Master's program in Data Science & Big Data at FSBM, Hassan II University, Casablanca, Morocco. His research interests include Educational Technology, Software Engineering, Information Systems (Business Informatics), and Human-Computer Interaction, and he is the author or co-author of more than 165 research publications. He can be contacted by email: h.benlahmer@gmail.com.

Article submitted 2022-06-07. Resubmitted 2022-07-06. Final acceptance 2022-07-06. Final version published as submitted by the authors.