LONTAR KOMPUTER VOL. 10, NO. 2 AUGUST 2019                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2019.v10.i02.p02  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

 

Web Scraping and Winnowing Algorithms for 
Plagiarism Detection of Final Project Titles  

 
Neng Ika Kurniati (a,1), Alam Rahmatulloh (a,2), Ridwan Nur Qomar (a,3)

(a) Program Studi Informatika, Fakultas Teknik, Universitas Siliwangi
Siliwangi Street Number 24, Tasikmalaya City 46115, West Java, Indonesia
(1) nengikakurniati@unsil.ac.id, (2) alam@unsil.ac.id, (3) ridwan.nurqomar14@student.unsil.ac.id

 
Abstract 
 

Plagiarism in research can be accidental or intentional. It violates copyright and harms
others. When students submit research titles, for example for a final project, many have
their titles rejected repeatedly on suspicion of plagiarism because the proposed title already
exists. A system is therefore needed that detects the similarity between a title to be
submitted and existing titles, so that plagiarism can be reduced. This study uses the
winnowing algorithm to compute the percentage similarity between titles. Google Scholar is
used as the source of previously published research titles for comparison. Web scraping with
CURL (Client URL) and the Simple HTML DOM parser retrieves the title data from Google
Scholar. Applying the winnowing algorithm to the Google Scholar data yields a similarity
percentage, categorized as mild, moderate, or severe plagiarism, and supports early detection
as a means of preventing plagiarism.

  
Keywords: Final Project, Google Scholar, Plagiarism, Web Scraping, Winnowing Algorithm 
  
 
1. Introduction 

At present, whether a Final Project title is acceptable, and whether it already exists, is
determined through review and selection by lecturers or supervisors. This review relies on
each lecturer's or supervisor's memory, which is limited, so some duplicate titles go
unnoticed.

Title duplication is a common form of plagiarism in final project writing [1], [2], [3]. One way
to address this problem is a system that measures the percentage similarity between the
research title submitted by a student and existing research titles. Research titles available
on Google Scholar, which indexes online journals and other scientific publications [4], can
serve as a source of pre-existing titles for reference or comparison.

Web scraping with CURL (Client URL) and the Simple HTML DOM parser can retrieve title data
from Google Scholar for comparison with the submitted title [5]. Web scraping is a technique
for retrieving information from a website [6], [7]. CURL transfers data to and from a server
and is available both as a library and as a command-line tool, which makes it useful for
retrieving data from sites [8], [9]. The Simple HTML DOM parser helps manipulate HTML
elements and also works with HTML that does not pass W3C validation, because it is not
limited to valid HTML. DOM elements can be deleted, added, or changed, and data is
retrieved from the DOM by tag, class, ID, and so on [10], [11].
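
To illustrate retrieval by class from an HTML document, the following is a minimal Python sketch; it is not the PHP Simple HTML DOM code used in this study. It relies only on the standard-library html.parser module. The sample markup, the class name result-title, and the helper names ClassTextExtractor and extract_by_class are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.parts = []     # text pieces of the current match
        self.results = []   # one cleaned string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1              # a new match starts here
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace in the collected text.
            self.results.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_by_class(html, class_name):
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup standing in for a search result page.
SAMPLE_HTML = """
<div id="results">
  <h3 class="result-title"><a href="#">Web Scraping <b>Techniques</b></a></h3>
  <p class="snippet">Not a title</p>
  <h3 class="result-title">Winnowing for Title Similarity</h3>
</div>
"""
titles = extract_by_class(SAMPLE_HTML, "result-title")
```

The depth counter assumes that matching elements are properly closed; markup with unclosed tags inside a match would need a more tolerant parser.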


 

The winnowing algorithm can be used to compute the percentage similarity between the text of
a proposed research title and research title data from Google Scholar. Google Scholar is a
reference search engine for scientific publications, so its data is a suitable body of scientific
work against which to compare the proposed titles of student final projects.

The winnowing algorithm fulfills the prerequisites of a text similarity detection algorithm,
including whitespace insensitivity: only letters and digits are processed further, and
irrelevant characters such as punctuation and spaces are discarded [12], [13]. It can detect
plagiarism in text or documents even when the sentence structure has been altered by
spinning or paraphrasing [14]. Compared to the Rabin-Karp algorithm, the winnowing algorithm
is reported to produce better similarity percentages with a faster processing time [15].
Previous studies [16], [17], [18], [19], [20] have addressed web scraping, the winnowing
algorithm, or Google Scholar, but none has combined them to use Google Scholar as
comparison data for Final Project titles with the winnowing algorithm.

To reduce plagiarism through early detection when student research titles are submitted,
this study, entitled "Web Scraping and Winnowing Algorithms for Plagiarism Detection of
Final Project Titles", was conducted.

2. Research Methods 

2.1. Related Works 

Research related to web scraping, winnowing algorithms, and Google Scholar includes the
following studies, which are compared in Table 1:
1. This study built a system to collect a parallel corpus between Indonesian and English.
The scraping process with the HTML DOM method produced 38,712 pairs of parallel
corpus documents [17].

2. This study built a system to check thesis titles using the winnowing algorithm, to help
the final project coordinator or the Chair of the Study Program determine the
percentage of similarity. The system detects the similarity of an entered title to the
title data stored in a database [18].

3. This study built a website for finding a desired collection of journals. The website
streamlines the search for scientific journals in Mendeley and Google Scholar by using
citation data extracted from papers with ParsCit [19].

4. This study discusses the use of Google Scholar, which makes it easier for final-year
students to find legitimate reference sources for their theses. Google Scholar also
makes it easy for examiners to search for words or sentences that students have
plagiarized from other people's work [20].

Table 1. Comparison of Related Research 

No. | Research | Web Scraping | Winnowing Algorithm | Google Scholar
1. | [17] | Yes | No | No
2. | [18] | No | Yes | No
3. | [19] | No | No | Yes
4. | [20] | No | No | Yes
5. | Proposed Research | Yes | Yes | Yes

 
 

2.2. Web Scraping Architecture from Google Scholar 

 
Figure 1. Web Scraping Architecture from Google Scholar 

Figure 1 shows the web scraping architecture. The web application sends a request to Google
Scholar, which responds with an HTML resource. Simple HTML DOM parses the HTML and
manipulates its elements to extract the required data, namely the titles. The titles are then
stored in the database and compared using the winnowing algorithm, which yields the
comparison result as a plagiarism percentage.

2.3. Flowchart of Plagiarism Detection using Web Scraping and Winnowing Algorithms 

Figure 2 shows the flowchart for web scraping and the winnowing algorithm. First, the user
enters the title to be checked for plagiarism. The system then uses web scraping to retrieve
title data from Google Scholar that matches the user's input. Next, each title from Google
Scholar is compared with the title entered by the user using the winnowing algorithm. Finally,
the system displays the title data along with the similarity percentages.

 
Figure 2. Flowchart of Plagiarism Detection using Web Scraping and Winnowing 
Algorithms 


 

2.4. Textual Analysis 

The system is intended to help reduce duplication of research titles, or plagiarism. The user
starts a check by entering a final project title. The system then retrieves title data from
Google Scholar by web scraping, according to the title entered. The retrieved titles are
processed with the winnowing algorithm to find the percentage similarity between the title
entered by the user and each title from Google Scholar.

2.5.  Use Case 

 
Figure 3. Use Case Diagram 

The similarity check form in Figure 3 is a menu for checking the similarity of a research title
against research titles that already exist in Google Scholar; the actor enters the research
title to be checked. Web scraping retrieves existing research titles from Google Scholar as
reference or comparison data. The winnowing algorithm then computes the percentage
similarity by comparing the title entered by the actor with the title data from Google Scholar.


 

2.6. Coding 

 
Figure 4. Source Code for Title Data Collection 

Figure 4 shows the PHP code of the web scraping program that retrieves research title data
from Google Scholar. Titles are retrieved one page at a time, and each page holds ten titles.
The function url_request() uses CURL to send browser-like user agent information to Google
Scholar, so that Google Scholar treats the request as coming from a user with a web browser,
and to store the cookies that Google Scholar returns. The function scholar() obtains the title
data by selecting elements of the Google Scholar HTML by id, using functions from the
Simple HTML DOM parser library.
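
The behaviour described for url_request() can be approximated in Python as follows. This is a sketch under stated assumptions, not the code of Figure 4: the endpoint URL, the q and start query parameters, and the user agent string are assumptions, and the helper names build_opener_with_cookies, build_search_url, and fetch_page are hypothetical.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Assumed search endpoint and an example browser-like user agent string.
SCHOLAR_URL = "https://scholar.google.com/scholar"
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/70.0 Safari/537.36")


def build_opener_with_cookies():
    """Create an opener that sends a browser-like User-Agent and keeps cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def build_search_url(title, page=0, per_page=10):
    """Build the URL for one result page (ten titles per page, as in the text)."""
    query = urllib.parse.urlencode({"q": title, "start": page * per_page})
    return f"{SCHOLAR_URL}?{query}"


def fetch_page(opener, title, page=0, timeout=10):
    """Download one result page as text. This performs a network request."""
    with opener.open(build_search_url(title, page), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Reusing one opener for successive pages keeps the cookies received on the first response, which mirrors the cookie handling described for url_request().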

  
3. Result and Discussion 

The user checks the similarity of a title by filling in the "enter the title" input form. After
the user fills in the form and presses the search button, the system displays the research title
data obtained from Google Scholar along with the similarity percentages, as shown in Figure 5.

 
 

Figure 5. Title similarity search menu

3.1. Black-Box Testing 

Black-box testing is a method for testing software against its functional specification
without examining the design or program code. The tests determine whether the functions,
inputs, and outputs of the software meet the requirements. Table 2 shows the results of
black-box testing of the application.

Table 2. Black Box Testing 

Data Input | Scenario | Result
The title of the research to be searched | Displays the title data obtained from Google Scholar along with the percentage of similarity | Success
The title of the research to be searched is not available on Google Scholar | Displays no research title data and no percentage of similarity | Success

 
3.2. Testing the Winnowing Algorithm Manually, on the System, and with Plagiarism Tools

3.2.1. Manual Testing 

The manual calculation is carried out by hand, without using the application. It detects the
similarity of the first title, "Implementasi Teknik Web Scraping Pada Aplikasi Pemesanan
Tiket Kereta Api" (Implementation of Web Scraping Techniques on Train Ticket Booking
Applications), to the second title, "Implementasi Teknik Web Scraping Pada Aplikasi
Pemesanan Tiket Pesawat" (Implementation of Web Scraping Techniques in Airplane Ticket
Booking Applications).

a. Discard irrelevant characters and change all letters to lowercase in the first and second title 
text. 

First title: 

implementasiteknikwebscrapingpadaaplikasipemesanantiketkeretaapi 

 
 

Second title: 

implementasiteknikwebscrapingpadaaplikasipemesanantiketpesawat 
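
Step a can be sketched in Python as follows. This is a minimal illustration, not the system's PHP code; the function name normalize is hypothetical, and the sketch assumes that only ASCII letters and digits count as relevant characters.

```python
import re


def normalize(title):
    """Lowercase the title and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


first = normalize(
    "Implementasi Teknik Web Scraping Pada Aplikasi Pemesanan Tiket Kereta Api")
second = normalize(
    "Implementasi Teknik Web Scraping Pada Aplikasi Pemesanan Tiket Pesawat")
```

With these inputs, first and second are the two strings shown in step a, with 64 and 62 characters respectively.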

 
b. Form the n-gram sequence of each title with n = 6:

n-gram first title: 

implem mpleme plemen lement ementa mentas entasi ntasit tasite asitek sitekn itekni 
teknik eknikw knikwe nikweb ikwebs kwebsc webscr ebscra bscrap scrapi crapin raping 
apingp pingpa ingpad ngpada gpadaa padaap adaapl daapli aaplik aplika plikas likasi 
ikasip kasipe asipem sipeme ipemes pemesa emesan mesana esanan sanant ananti 
nantik antike ntiket tiketk iketke ketker etkere tkeret kereta eretaa retaap etaapi 

n-gram second title: 

implem mpleme plemen lement ementa mentas entasi ntasit tasite asitek sitekn itekni 
teknik eknikw knikwe nikweb ikwebs kwebsc webscr ebscra bscrap scrapi crapin raping 
apingp pingpa ingpad ngpada gpadaa padaap adaapl daapli aaplik aplika plikas likasi 
ikasip kasipe asipem sipeme ipemes pemesa emesan mesana esanan sanant ananti 
nantik antike ntiket tiketp iketpe ketpes etpesa tpesaw pesawa esawat 
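
The n-gram formation of step b amounts to sliding a window of n characters over the normalized string one character at a time. A minimal Python sketch, with the hypothetical helper name ngrams:

```python
def ngrams(text, n=6):
    """Return every substring of length n, moving one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


first = "implementasiteknikwebscrapingpadaaplikasipemesanantiketkeretaapi"
second = "implementasiteknikwebscrapingpadaaplikasipemesanantiketpesawat"
first_grams = ngrams(first)
second_grams = ngrams(second)
```

A string of length L yields L - n + 1 n-grams, so the two titles produce 59 and 57 n-grams, which matches the lists in step b.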

 
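c. Calculate the hash value of each n-gram, starting with the first n-gram "implem", with base value (b) = 3 and n-gram length (n) = 6. The values listed in this step are consistent with the polynomial hash $H(c_1 \ldots c_n) = \sum_{i=1}^{n} c_i \, b^{\,n-i}$, where $c_i$ is the ASCII code of the i-th character. For "implem": $105 \cdot 3^5 + 109 \cdot 3^4 + 112 \cdot 3^3 + 108 \cdot 3^2 + 101 \cdot 3^1 + 109 \cdot 3^0 = 38752$.
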
The results of all calculations of the first title hash value are: 

38752 39812 40085 38723 37534 39088 37908 40211 40544 37175 40922 39036 40670 
37565 39167 39596 38713 39693 41190 36916 37231 40356 37343 39961 36889 40051 
38605 39367 38008 39049 35607 36213 35846 36922 40168 38961 38263 38345 37141 
40811 38713 39691 37535 39073 37868 40091 36543 39023 36980 40343 40946 38375 
38694 38180 41027 38614 37936 40291 37872 

The results of all calculations of the second title hash value are: 

38752 39812 40085 38723 37534 39088 37908 40211 40544 37175 40922 39036 
40670 37565 39167 39596 38713 39693 41190 36916 37231 40356 37343 39961 
36889 40051 38605 39367 38008 39049 35607 36213 35846 36922 40168 38961 
38263 38345 37141 40811 38713 39691 37535 39073 37868 40091 36543 39023 
36980 40343 40951 38390 38740 38314 41432 39829 37955 
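
The hash values of step c can be reproduced with a short Python sketch. It assumes a polynomial hash over ASCII codes with base b = 3, an assumption that agrees with the listed values (for example 38752 for "implem"); the function name ngram_hash is hypothetical.

```python
def ngram_hash(gram, base=3):
    """Polynomial hash: c1*b^(n-1) + c2*b^(n-2) + ... + cn, via Horner's rule."""
    value = 0
    for ch in gram:
        value = value * base + ord(ch)
    return value


# The first two n-grams of both titles, and the n-grams where the titles diverge.
h_implem = ngram_hash("implem")
h_mpleme = ngram_hash("mpleme")
h_tiketk = ngram_hash("tiketk")   # first title
h_tiketp = ngram_hash("tiketp")   # second title
```

Because "tiketk" and "tiketp" differ only in the last character, their hashes differ by exactly the difference of the two ASCII codes (112 - 107 = 5).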

 
d. Form windows of w = 4 consecutive hash values.

Window first title: 

W-1 : {38752 39812 40085 38723} 

W-2 : {39812 40085 38723 37534} 

W-3 : {40085 38723 37534 39088} 

. . . 

W-56 : {38614 37936 40291 37872} 

Window second title: 

W-1 : {38752 39812 40085 38723} 

W-2 : {39812 40085 38723 37534} 


 

W-3 : {40085 38723 37534 39088} 

. . . 

W-54 : {38314 41432 39829 37955} 
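
The window formation of step d can be sketched in Python as follows; the helper name windows is hypothetical, and the input is the first six hash values of the first title.

```python
def windows(hashes, w=4):
    """Return every run of w consecutive hash values."""
    return [hashes[i:i + w] for i in range(len(hashes) - w + 1)]


head = [38752, 39812, 40085, 38723, 37534, 39088]
first_three = windows(head)
```

A sequence of m hash values yields m - w + 1 windows, so 59 and 57 hashes give the 56 and 54 windows listed for the two titles.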

 
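e. Select the fingerprint values from the windows: the smallest hash value of each window is taken, and a value is recorded only once while the same hash remains the minimum of consecutive windows.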

fingerprint first title: 

38723 37534 37908 37175 37565 38713 36916 37231 36889 38008 35607 35846 
36922 38263 37141 37535 36543 36980 38375 38180 37936 37872 

fingerprint second title: 

38723 37534 37908 37175 37565 38713 36916 37231 36889 38008 35607 35846 
36922 38263 37141 37535 36543 36980 38390 38314 37955 
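
The fingerprint selection of step e can be sketched in Python. The sketch assumes the selection rule of Schleimer et al. [13]: take the minimum of each window, prefer the rightmost occurrence on ties, and record a hash only when the selected position changes. The paper does not state its tie-breaking rule, so that part is an assumption; the function name winnow is hypothetical. The input is the hash list of the first title from step c.

```python
def winnow(hashes, w=4):
    """Select one fingerprint per window, recording each position only once."""
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Position of the rightmost occurrence of the minimum in this window.
        pos = start + (w - 1 - window[::-1].index(smallest))
        if pos != last_pos:
            fingerprints.append(hashes[pos])
            last_pos = pos
    return fingerprints


FIRST_HASHES = [
    38752, 39812, 40085, 38723, 37534, 39088, 37908, 40211, 40544, 37175,
    40922, 39036, 40670, 37565, 39167, 39596, 38713, 39693, 41190, 36916,
    37231, 40356, 37343, 39961, 36889, 40051, 38605, 39367, 38008, 39049,
    35607, 36213, 35846, 36922, 40168, 38961, 38263, 38345, 37141, 40811,
    38713, 39691, 37535, 39073, 37868, 40091, 36543, 39023, 36980, 40343,
    40946, 38375, 38694, 38180, 41027, 38614, 37936, 40291, 37872,
]
first_fingerprints = winnow(FIRST_HASHES)
```

Under these assumptions the sketch yields the same 22 fingerprints as listed for the first title.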

 
f. Jaccard coefficient: 

Fingerprints common to the first and second titles (intersection):

(38723 37534 37908 37175 37565 38713 36916 37231 36889 38008 35607 35846
36922 38263 37141 37535 36543 36980) = 18

All distinct fingerprints of the first and second titles (union):

(38723 37534 37908 37175 37565 38713 36916 37231 36889 38008 35607 35846
36922 38263 37141 37535 36543 36980 38375 38390 38180 38314 37936 37955
37872) = 25

Similarity, where A and B are the fingerprint sets of the first and second titles:

$\text{Similarity} = \frac{|A \cap B|}{|A \cup B|} \times 100\% = \frac{18}{25} \times 100\% = 72\%$

The manual calculation therefore gives a text similarity of 72% between the first and second
titles, based on their fingerprints.
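
The Jaccard coefficient of step f can be checked with a short Python sketch that uses the two fingerprint lists from step e. The function name jaccard_percent is hypothetical, and returning 0.0 when both sets are empty is a convention chosen for this sketch.

```python
def jaccard_percent(fp_a, fp_b):
    """Jaccard coefficient of two fingerprint collections, as a percentage."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0          # convention of this sketch for two empty sets
    return 100.0 * len(a & b) / len(a | b)


FIRST_FP = [38723, 37534, 37908, 37175, 37565, 38713, 36916, 37231, 36889,
            38008, 35607, 35846, 36922, 38263, 37141, 37535, 36543, 36980,
            38375, 38180, 37936, 37872]
SECOND_FP = [38723, 37534, 37908, 37175, 37565, 38713, 36916, 37231, 36889,
             38008, 35607, 35846, 36922, 38263, 37141, 37535, 36543, 36980,
             38390, 38314, 37955]
similarity = jaccard_percent(FIRST_FP, SECOND_FP)
```

The two lists share 18 values out of 25 distinct values, so similarity evaluates to 72.0.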

3.2.2. Calculations on the system 

 
Figure 6. The results of the calculation of the winnowing algorithm on the system 

Figure 6 shows the result of the winnowing calculation on the system with n = 6, w = 4, and
b = 3, which yields a similarity of 72%.

These results show that the manual winnowing calculation and the system produce the same
result, namely 72%. Plagiarism can be grouped by the proportion or percentage of plagiarized
sentences or paragraphs: mild plagiarism (<30%), moderate plagiarism (30–70%), and severe
plagiarism (>70%) [21], [22].
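
The three categories can be expressed as a small Python function. This is a sketch with the hypothetical name plagiarism_category; it treats both 30% and 70% as moderate, following the ranges given in the text.

```python
def plagiarism_category(percent):
    """Map a similarity percentage to mild (<30), moderate (30-70), or severe (>70)."""
    if percent < 30:
        return "mild"
    if percent <= 70:
        return "moderate"
    return "severe"


category = plagiarism_category(72.0)
```

For the 72% similarity obtained in the manual and system calculations, the function returns "severe".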


 

3.3. Testing with Plagiarism Checker X Tools 

This test compares the similarity percentages produced by the system proposed in this study
with those of the Plagiarism Checker X tool. Plagiarism Checker X is a tool that helps detect
plagiarism in research papers, blogs, assignments, and websites. In Plagiarism Checker X,
the title similarity percentage is obtained through a side-by-side comparison, by entering the
tested title and the comparison title.

Table 3. The title tested and the comparison title 

No. | Tested Title | Comparative Title
1. | Implementation of Web Scraping Techniques on Train Ticket Booking Applications | Implementation of Web Scraping Techniques in Airplane Ticket Booking Applications
2. | Implementation of RESTful Web Service for Election Vote Calculation System | Implementation of RESTful Web Service for Rapid Vote Counting System in Local Election
3. | CRM Implementation to Increase Customer Loyalty | Analysis of Electronic CRM Implementation at PT Cordova Garment to Increase Customer Loyalty
4. | Medical Record Information System at RSUD Pacitan General Hospital Based on Android | Medical Record Information System at the Regional General Hospital of RSUD Pacitan Based on Web Base
5. | Similarity Thesis Detection System using Rabin Karp's Algorithm | Thesis Title Similarity Detection System Using Winnowing Algorithms
6. | Scientific Article Search Website by Utilizing Google Scholar and Mendeley API | Website Search for Scientific Articles by Utilizing Parscit's Google Scholar and Mendeley API
7. | Web Scraping Implementation on Ontology-Based Web for Drug Data | Web Scraping Implementation on Ontology-Based Web for Drug Data and Disease
8. | Implementation of Customer Relationship Management in the Hotel Reservation System | Implementation of Customer Relationship Management CRM in a Website and Desktop-based Hotel Reservation System
9. | Designing Information Systems for competitive advantages of modern companies | Analysis and Design of Information Systems for competitive advantages of modern companies and organizations
10. | Information System Distribution of Information Technology Research Sites in Garut | Designing Geographic Information Systems Distribution of Information Technology Research Sites in the City of Garut
11. | Designing Achievement Decision Selection System for Student Achievement | Designing the Decision Support System for the Selection of Outstanding Students using the AHP and Promethee Methods

 
Table 3 lists the tested titles and their comparison titles, for which the plagiarism
percentage is obtained both with the system proposed in this study and with the Plagiarism
Checker X tool.

Table 4. Similarity percentage comparison 

No. | This Research | Plagiarism Checker X
1. | 72% | 89%
2. | 68.75% | 67%
3. | 38.89% | 0%
4. | 80.65% | 86%
5. | 70.37% | 88%
6. | 87.5% | 92%
7. | 83.87% | 85%
8. | 58.97% | 62%
9. | 54.29% | 58%
10. | 46.47% | 46%
11. | 67.57% | 62%
Average | 66.30% | 66.82%

 
Table 4 shows the plagiarism percentages from the comparison between the system proposed
in this study and the Plagiarism Checker X tool. The proposed system has a slightly lower
average, 66.30%, than Plagiarism Checker X, with an average of 66.82%.
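
The averages in Table 4 can be recomputed with a few lines of Python. The sketch also reports the mean absolute difference per title, a complementary measure that is not part of the original analysis; the helper names mean and mean_abs_diff are hypothetical.

```python
# Similarity percentages from Table 4, rows 1 to 11.
THIS_RESEARCH = [72, 68.75, 38.89, 80.65, 70.37, 87.5,
                 83.87, 58.97, 54.29, 46.47, 67.57]
CHECKER_X = [89, 67, 0, 86, 88, 92, 85, 62, 58, 46, 62]


def mean(values):
    return sum(values) / len(values)


def mean_abs_diff(a, b):
    """Average of the absolute per-row differences between two columns."""
    return mean([abs(x - y) for x, y in zip(a, b)])


avg_this = round(mean(THIS_RESEARCH), 2)
avg_checker = round(mean(CHECKER_X), 2)
avg_gap = mean_abs_diff(THIS_RESEARCH, CHECKER_X)
```

The recomputed averages agree with the table (66.30 and 66.82). The per-title mean absolute difference is about 9 percentage points, so the two tools can differ noticeably on individual titles even though their averages are close.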

 
4. Conclusion 

Based on the test results, the following conclusions can be drawn. Web scraping with CURL
and the Simple HTML DOM parser can be applied to retrieve research title data from Google
Scholar in an application for early detection of plagiarism in submitted student research
titles. Google Scholar can serve as a source of existing research titles for reference or
comparison in such an application, with web scraping as the data retrieval method. The
winnowing algorithm can be applied to compute the percentage similarity between a proposed
research title and existing research titles on Google Scholar. This research still has
limitations: the comparison titles come only from Google Scholar, only titles are compared,
and the author of the scientific work is not identified. In addition, the method as applied in
this study cannot yet detect similar research titles written in different languages.

  
References 
 

[1]  N. Kock and R. Davison, “Dealing with Plagiarism in the Information Systems,” MIS
Quarterly, vol. 27, pp. 511-532, 2003.

[2]  Mulyana, “Pencegahan Tindak Plagiarisme Dalam Penulisan Skripsi,” Cakrawala
Pendidikan, 2010.

[3]  A. Y. Gasparyan, B. Nurmashev, B. Seksenbayev, V. I. Trukhachev, E. I. Kostyukova and
G. D. Kitas, “Plagiarism in the Context of Education and Evolving Detection Strategies,”
Journal of Korean Medical Science, vol. 32, no. 8, pp. 1220-1227, 2017.

[4]  Google, “Tentang Google Cendikia,” [Online]. Available:
https://scholar.google.com/intl/id/scholar/about.html. [Accessed 9 September 2018].

[5]  R. Gunawan, A. Rahmatulloh, I. Darmawan and F. Firdaus, “Comparison of Web Scraping
Techniques: Regular Expression, HTML DOM and Xpath,” in 2018 International
Conference on Industrial Enterprise and System Engineering (IcoIESE 2018), Atlantis
Press, 2019.

[6]  B. G. Dastidar, D. Banerjee and S. Sengupta, “An Intelligent Survey of Personalized
Information Retrieval using Web Scraper,” International Journal of Education and
Management Engineering, vol. 5, no. 3, pp. 24-31, 2016.

[7]  M. Turland, “php|architect's Guide to Web Scraping with PHP,” Marco Tabini & Associates,
2010.

[8]  D. Stenberg, “CURL: curl groks URLs,” 2015.

[9]  M. I. Khalid, PHP/CURL Book with Examples Version 1.8, 2006.

[10]  V. B. Kadam and G. K. Pakle, “A Survey on HTML Structure Aware and Tree Based Web
Data Scraping Technique,” International Journal of Computer Science and Information
Technologies (IJCSIT), vol. 5, no. 2, pp. 1655-1658, 2014.

[11]  V. Janjic, “PHP Simple HTML DOM Parser: Editing HTML Elements in PHP,” 7 September
2011. [Online]. Available:
https://phpbuilder.com/php-simple-html-dom-parser-editing-html-elements-in-php/.
[Accessed 6 October 2018].

[12]  X. Duan, M. Wang and J. Mu, “A Plagiarism Detection Algorithm based on Extended
Winnowing,” in 2017 International Conference on Electronic Information Technology
and Computer Engineering (EITCE 2017), 2017.


 

[13]  S. Schleimer, D. S. Wilkerson and A. Aiken, “Winnowing: Local Algorithms for Document
Fingerprinting,” Proceedings of the ACM SIGMOD international conference on
management of data, pp. 76-85, 2003.

[14]  H. Tri Nugroho I, “Pengaruh Algoritma Stemming Nazief-Adriani Terhadap Kinerja
Algoritma Winnowing Untuk Mendeteksi Plagiarisme Bahasa Indonesia,” ULTIMA
Computing, vol. 9, no. 1, pp. 36-40, 2017.

[15]  N. Alamsyah, “Perbandingan Algoritma Winnowing dengan Algoritma Rabin Karp untuk
Mendeteksi Plagiarisme pada Kemiripan Teks Judul Skripsi,” Technologia, vol. 8, no. 3, pp.
124-134, 2017.

[16]  I. P. A. Darmawan and I. N. P. I. P. A. Dharmaadi, “Ekstrak Hirarki Data Dari Situs Web A-Z
Animals Menggunakan Web Scraping,” Lontar Komputer : Jurnal Ilmiah Teknologi
Informasi, vol. 8, no. 3, pp. 124-134, 2017.

[17]  V. Mitra, H. Sujaini and A. B. Putra Negara, “Rancang Bangun Aplikasi Web Scraping
Untuk Korpus Paralel Indonesia - Inggris Dengan Metode HTML DOM,” Jurnal Sistem dan
Teknologi Informasi (JUSTIN), vol. 5, no. 1, pp. 36-41, 2017.

[18]  Nurdin and A. Munthoha, “Sistem Pendeteksi Kemiripan Judul Skripsi Menggunakan
Algoritma Winnowing,” InfoTekJar (Jurnal Nasional Informatika dan Teknologi Jaringan),
vol. 2, no. 1, pp. 90-97, 2017.

[19]  I. Ruslan, A. Wibowo and R. Lim, “Website Penelusuran Artikel Ilmiah dengan
Memanfaatkan Parscit, Google Scholar, dan Mendeley Api,” Jurnal Infra, vol. 1, no. 2,
2013.

[20]  K. Tiara, U. Rahardja and I. A. Rosalinda, “Pemanfaatan Google Scholar Dan Citation
Dalam Memenuhi Kebutuhan Pembuatan Skripsi Mahasiswa Pada Perguruan Tinggi,”
Technomedia Journal (TMJ), vol. 1, no. 1, pp. 95-113, 2016.

[21]  S. Sastroasmoro, “Beberapa Catatan tentang Plagiarisme,” Majalah Kedokteran Indonesia,
vol. 57, no. 8, August, 2007.

[22]  J. D. Velásquez and E. M. Taylor, “Tools for External Plagiarism Detection in DOCODE,”
in WI-IAT '14 Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences
on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014.