


Proceedings of Engineering and Technology Innovation, vol. 7, 2017, pp. 31 - 36 

 
Using Text Mining to Extract Issues for School: an Empirical Study of the 

Social Platform-Dcard 

Hsin-Yi Wang, Shu-Fen Chiou
*
, Jung-Wen Lo 

Department of Information Management, National Taichung University of Science & Technology, Taichung, Taiwan, ROC. 

Received 18 July 2017; received in revised form 15 August  2017; accept ed 30 August  2017 

 
Abstract 

Nowadays, social network within sentiment analysis has become the main trend in text mining domain. There are 

many platforms have been analyzed, such as Facebook, Twitter, Instagram, and so on. In our manuscript, we attempt 

to extract the information about the sentiment polarity of messages (positive, neutral or negative) in a social platform 

“Dcard”. The users of Dcard are Taiwanese college students, and anonymou s post is being used this  in social platform, 

therefore, the user can express their opinion more freedom. We use Dcard to the sentiment polarity of messages in 

extract the information about the school; moreover, the school could get the feedback from this finding to improve their 

policy. In this paper, we used python to scrap the web page, and the sentiment lexicon would be built. 

 
Keywords: text mining, big data, social platform, sentiment 

 
1. Introduction 

Nowadays, Internet is used for communication widely. People used the internet to be browsing the web and collecting data 

(79.9%), using community websites (24.1%), playing online games (19.4%), listening to music or watching movies (17.8%), (15%) 

shopping online (10.3%) [1]. For community websites, people prefer communicating via the Internet Services  over talking 

face-to-face or writing letters. They are more often writing blogs or posting messages on social networks and the personality will 

be presented by habitual vocabularies they used. In this research, we t ry to s analyzed Chinese vocabularies on a social platform 

named Dcard [2].   

With the accumulation of large amounts of educational data, the use of advanced statistical techniques, such as data mining, 

exploring the potentially useful information or realizing some knowledge from bunch of data [3]. Another important material for 

structured data is unstructured data composed of free text ; for example, scientific research papers, patented technical documents, 

qualitative interview data, and open questionnaires the content of the analysis, etc. with the analysis or research value. Analysis 

of these unstructured data exploration methods, relying on the free text of the advanced processing and statistical operation s, 

that is, text mining technology[4].  

Dcard anonymous posting mechanism has successfully attracted tens of thousands of domestic and foreign college 

students to become platform members, this  research is using Dcard community behavior of the Chinese corpus , the part of the 

analysis  from the Dcard text behavior can be analyzed everyone Comment on the community on the web.  

                                                                 
* Corresponding author. E-m ail address: fen057@ nutc.edu.tw  

Tel.: +886-4-22196391
 

 Proceedings of Engineering and Technology Innovation, vol. 7, 2017, pp. 31 - 36 

Copyright ©  TAETI 

32 

In this paper, we use the word surveying technology to analyze the administrative quality of the campus in order to avoid 

the loss of the students, but also to attract the resources and  improve the quality of education. The results data can be used as 

reference for the follow-up research reference and campus administrators. 

2. Related Works 

In this section, we first introduce the technique which we use to extract Chinese vocabularies. And we describe the social 

platform named Dcard 

2.1.   Python 

Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum 

and first released in 1991. An interpreted language, Python has a design philosoph y which emphasizes code readability (notably 

using whitespace indentation to delimit code blocks rather than curly braces or keywords), and a syntax which allows 

programmers to express concepts in fewer lines of code than possible in languages such as C++ or Java. The language provides 

constructs intended to enable writing clear programs on both a small and large scale [5].  

2.2.   Dcard 

Dcard is the first exclusive college student dating site platform in Taiwan. The official website is show in Fig.  1 [6].  The 

Dcard platform attracts many college students to join to expand their circle of friends, but to join must provide school e -mail, with 

the real name and photos through the administrator can be certified . Dcard using anonymous system to register, many s tudents 

published articles to take anonymous system to express their own emotions, and these acts through the text presented in the 

article. 

 
Fig. 1 Dcard website 

2.3.   Text mining 

In a large number of data, there are digital structure of the structured  data and text, sound, image of the unstructured data. 

There are already hundreds of ways to deal with structured data. [7] 

Text mining from biological literature is emerging as one of the main issues in bioinformatics research, and NLP methods are 

regarded as being useful to raise the potential of text mining from this literature. While the techniques are  separated relatively into 

domain-portable, reference materials, except for Corpora. [8]. 


Proceedings of Engineering and Technology Innovation, vol. 7, 2017, pp. 31 - 36 

Copyright ©  TAETI 

33 

Text mining is a new field, the text of the survey can be collec ted from the text of meaningful messages, for the file can also 

be analysed out of his specific purpose of the target, the text is undiscovered, invisible, and difficult in the algorithm. H owever, 

in modern culture, the document is the most common tool for the formal exchange of messages. The field of textual exploration 

usually involves a document whose function is to convey a factual message or opinion. [9] 

The steps of knowledge exploration : [10] 

 Data collection 

 Data cleansing 

 Data conversion 

 Application of exploration technology 

 The results are presented and interpreted  

The methods of knowledge exploration: 

 Association analysis  

 Classification 

 Clustering 

 Summarization 

 Prediction 

 Sequence analysis  

3. Our Proposed Method 

The work presented in this article covers, on the one hand, the extraction of information about user’s 

positive/neutral/negative sentiments from the text title they write. We investigate the detection of sentiment changes with 

respect to the “usual” sentiment of each us er [11]. In Fig. 2, we present the methods ’ flowchart with each purpose below: 

 
Fig. 2 Method 

Table 1 Marking 

Positive 其實我們的學校真的挺不錯的(Actually  our school is really  good) 

Negative 畢業證書之學校爛行政(Graduation certificate of the school rotten administration) 

Neutral 問學校語言學習軟體(Ask for school language learning software) 
 

 Proceedings of Engineering and Technology Innovation, vol. 7, 2017, pp. 31 - 36 

Copyright ©  TAETI 

34 

Table 2 Dcard to word 

Web Text 

  
Table 3 Preprocessing 2 

Before the process After the process 

【沒朋友 徵室友，我又來了】 
(【No friends levy  roommate, I have come】) 

沒朋友徵室友我又來了 
(No friends levy  roommate, I have come) 

Table 4 Keywords 

Main Classification Keyword 

Course 課程 課名 老師 
Administration 足球隊(Football team) 行政(Administrative) 校園(Camp us) 

 
Step 1: We need data collection and marking. This research using the article title to classification to Positive , Neutral and 

Negative includes school course and administration tow part. We give an example in Table 1. 

Step 2: We use Python to extract Dcard website to the text content. We extract Dcard website by using Python API URL. We 

extract Dcard website convert to text content, combine and export to document. Table 2 are our examples. 

Step 3: Sorting out the data we export, and remove the extra number and English letter (the web content which is not created by 

the Dcard user), and convert the full shape character to half shape character. 

Step 4: Remove the punctuation in the data (e.g. #, semicolon, comma and space) like Table 3. 

Step 5: Create Keywords. Classification the pre-process data we created for the school course and administration. For an 

example in Table 4, we set the course and its name for the school course partition, and the “soccer team” for 

administration due to the students usually use 

4. Data Analysis and Results 

4.1.   Data 

We collect the Dcard information from 2014 to 2017 in Table 5. The results of the original data have number of 24939. The 

data have been 4941 after pre-process  finishing. We the separate these into "Administration", "Courses" and "Other" three 

classifications :  

Table 5 Data 

Main Classification Data 

Administration 1098 

Courses 729 

Other 3114 

Table 6 Keywords dictionary 

Main 
Classification 

Keywords 

Administration 

行政 
(Administrative) 

足球隊 
(Football team) 

停車場 
(Parking lot) 

學校 
(School) 

校園 
(Camp us) 

系學會 
(Dep artment of Science) 

智慧大師 
(M aster of wisdom) 

 
Courses 
課 

(Class) 
課程名稱 

(Class name) 
老師名稱 
(Teacher) 

RS 
(RS) 

多益 
(TOEIC) 

科系 
(Dep artment) 

 
Proceedings of Engineering and Technology Innovation, vol. 7, 2017, pp. 31 - 36 

Copyright ©  TAETI 

35 

Table 7 Polarity 

 Administration Courses  

Positive 251 211 

Negative 321 215 

Neutral 526 303 

4.2.   Keyword dictionary 

In the classification keywords shown in Table 6, we found that students had certain classification of keywords . For instance, 

students often use the "足球隊" to describe the attitude of the campus administration. The following keywords which are 

usually being used: 

4.3.   Data polarity 

We will separate relatively into positive, negative and neutral, the following is a review of a variet y of data classification   

number: 

5. Discussion 

In Fig. 3, studies have investigated the title Dcard reviews, we found that 22% of students expressed  the article ideas for the 

administration, 15% of people have questions or ideas for courses in Dcard, and the others are expressed Other deman ds. 

  
Fig. 3 Data Fig. 4 Administrative 

In Fig. 4, we also found that students in the comments can really achieve the administration and course, there are 1098 

administrative from administration. 251 is the positive title, 321 negative title and 526 neutral title. 52 percent of the students 

published their perspective, point of view and opinions on the Dcard, 29% of whom expressed negative title, which showed that 

students were less likely to be administratively in school (Including school systems). 

 
Fig. 5 Course 

There are 729 titles in the course, with 211 positive titles, 215 negative titles and 303 neutral titles, which shows that st udents 

have 58% of the ideas and suggestions in the course or teacher, and the 42% of the people have other problem in course. We can 

see the results in Fig. 5. 


 Proceedings of Engineering and Technology Innovation, vol. 7, 2017, pp. 31 - 36 

Copyright ©  TAETI 

36 

6. Conclusions 

This research explores students' perceptions of campus quality, extracts web pages, use keywords and the terminology of 

classified sentences to achieve the desired results. Use the positive and negative sentences to explore the ideas, perspectives, 

point of views  of students on the administration and the curriculum the negative title, it can be extracted separately review and 

achieve to improvement the purpose campus administration and curriculum. 

Furthermore, future research can be published on Dcard and use the text to explore the students for the campus 

administration ideas and recommendations  make research more in-depth can improve the improvement. There are two limitations 

to this paper, which can be discussed in subsequent studies : 

1. Research methods can only see the school administration is missing, but cannot see what kind of administrative issues  are 

2. Title cannot tell what kind of problem, is it can only know the advantages and disadvantages 

References 

[1] S. W. Wu, “Applying text mining technique to analyze the software quality characteristics of mobile games, ” In department 

of industrial management, National Pingtung University of Science and Technology , pp. 1-49, 2013. 

[2] X. Z. Chang and Y. M. Li, “Using text mining to predict personality based on social behavior,” 2012. 

[3] R. S. Baker and K. Yacef, “The state of educational data mining in 2009: a review and future visions ,” Journal of Educational 

Data Mining, vol. 1, no. 1, pp. 3-17, 2009. 

[4] Y. H. Tseng and Y. I. Lin, “The application of content mining techniques to the analysis of educational evaluation research 

trends ,” Journal of Research in Education Sciences, vol. 56, no. 1, pp. 129-166, 2011. 

[5] G. van Rossum, Python, https://zh.wikipedia.org/wiki/Python#.E5.8F.82.E8.80.83.E6.96.87.E7.8C.AE, 1991. 

[6] C. Y. Jian, “Dcard,” https://www.dcard.tw/, 2011. 

[7] C. M. Young, “Applies by text mining to the support systems for coding ICD-9-CM —a study of admission note and 

discharge summary,” Taipei Medical University Information Management, pp. 1-68, 2004. 

[8] J. D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii, “GENIA corpus —a semantically annotated corpus for bio -textmining,” 

Bioinformatics, vol. 19, pp. i180-i182, 2003. 

[9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” 

ACM SIGKDD explorations newsletter, vol. 11, no. 1, pp. 10-18, 2009. 

[10] Y. H. Tseng, “Research and development on automatic information organization and subject analysis in recent decades ,” 

Journal of Educational Media and  Library Sciences , vol. 51, pp. 3-26, 2014. 

[11] A. Ortigosa, J. M. Martin, and R. M. Carro, “Sentiment analysis in Facebook and its application to e-learning,” Computers in 

Human Behavior, vol. 31, pp. 527-541, 2014.