Mathematical Problems of Computer Science 51, 57-65, 2019. UDC 004.031.42 Realization of Recommender Framework Based on Community Detection Karen K. Mkhitaryan Institute for Informatics and Automation Problems of NAS RA e-mail: karenmkhitaryan@gmail.com Abstract Recommender systems play an important role in suggesting relevant information to users based on their available preferences about items. Utilizing a recommender system allows companies to increase revenues, customer satisfaction and enable personalization and discovery. Content-based and collaborative filtering approaches are the most popular techniques in recommender systems predicting users preferences based on “collaborative” data about users and items in the system. However, their use is not justified in certain applications, particularly when user-item collaboration data is very sparse or missing. In this paper, a recommender framework based on community detection is developed outperforming other popular recommendation methods in some applications. Keywords: Community detection, Recommender systems, Collaborative filtering, Similarity measures. 1. Introduction A recommender system is a type of information filtering system predicting users’ preferences about items based on user-item collected data. People listen to songs, watch movies, read books, use products, etc., and express their opinions by giving ratings, liking, disliking or reviewing items online more than ever. In recent years, various recommendation techniques and algorithms were proposed to generate recommendations based on collected user-item interaction data. Effective recommender systems increase revenues, customer satisfaction, enable personalization and discovery. Several real world examples of recommender systems are Last.fm, recommending songs, Amazon, recommending products to buyers and many more recommending people, movies, books, etc. Despite the existence of many recommendation algorithms, there are some applications, where traditional recommendation approaches are not beneficial. This is the case when users 57 Realization of Recommender Framework Based on Community Detection 58 cannot explicitly rate the items, thus “collaborative” data is not available, while content-based and collaborative filtering approaches rely only on user-item rating data. In this paper, we propose a framework based on community detection that can generate recommendations in such scenarios. Community detection is a research area from network science dealing with the investigation of complex networks aiming to detect communities of nodes that are densely connected inside the community and sparsely connected with nodes from other communities. Detection of communities has many applications in various disciplines, such as investigating protein-to- protein interactions in biology, friendship circles in social network analysis, discovering fraudulent websites on the web, etc. Several successful attempts have been made in the literature to incorporate community detection tools and techniques into recommender systems. Abdrabbah et al. introduced a novel architecture called Dynamic Community-based Collaborative filtering (D2CF), combining both collaborative filtering and dynamic community detection techniques. Experiments on MovieLens dataset showed that the proposed technique has considerably higher recommendation accuracy outperforming the methods based on static community detection and item-based collaborative filtering [1]. Lalwani et al. presented a social recommender system based on community detection and collaborative filtering techniques (CCSRS) using a map-reduce framework [2]. The proposed approach improves scalability, coverage and cold start issue compared with item- based collaborative filtering. There are also other papers pointing out the possible interconnections of community detection and recommender systems [3], [4], [5]. The recommender framework suggested in this paper is based only on community detection therefore allows to apply it in the situations when other methods are useless. The paper is constructed as follows: in Section 2 we discuss traditional recommendation approaches, introduce the framework in Section 3 and show experimental results in Section 4. 2. Recommender System The analysis of data generated by users can be used to predict preferences or ratings a user will give to an item that he/she did not interact in the past. The most popular recommendation approaches are content-based, collaborative and hybrid filtering techniques (Fig.1). In content-based filtering, terms or keywords (e.g., music genre, article keywords) are used to describe an item and the user profile. Content-based recommendation algorithms learn the features of items the user liked in the past and recommend items that share similar features or characteristics. The advantage of this method over collaborative filtering is that it does not need other user’s ratings and avoids the cold start problem (new items or users in the system for which the information is missing) [6]. The idea behind collaborative filtering is to use the “collaboration” data between users and items in the system to generate recommendations. Memory-based and model-based approaches are two types of collaborative filtering technique [7]. Memory-based approach takes the user-item rating matrix 𝑀𝑀 as input, where each 𝑀𝑀𝑢𝑢𝑢𝑢 element represents the rating that the user 𝑢𝑢 gave to item 𝑖𝑖 and predicts the missing ratings in the matrix. User-user filtering and item-item filtering are two realizations of memory-based approach. In user-user filtering, items that users similar to the given user 𝑢𝑢 like, are recommended to user 𝑢𝑢, while in item-item filtering recommendation is made using the preferences of users about other items that also liked an item the user 𝑢𝑢 liked. K. Mkhitaryan 59 Fig. 1: Types of recommender system algorithms. For user or item comparison, various similarity measures are used, such as Euclidean distance, cosine similarity, Minkowski distance, Manhattan distance, etc. To predict the missing rating 𝑀𝑀𝑢𝑢𝑢𝑢 in matrix 𝑀𝑀 using user-user filtering we first find the weighted sum of ratings from other users � 𝑠𝑠𝑖𝑖𝑠𝑠(𝑢𝑢, 𝑞𝑞)𝑀𝑀𝑞𝑞𝑢𝑢 𝑞𝑞∈𝑈𝑈 , where 𝑈𝑈 is the set of users in the system, 𝑠𝑠𝑖𝑖𝑠𝑠(𝑢𝑢, 𝑞𝑞) is the similarity score between the users 𝑢𝑢 and 𝑞𝑞, and 𝑀𝑀𝑞𝑞𝑢𝑢 is the rating the user 𝑞𝑞 gave to item 𝑖𝑖. Finally, the predicted rating is obtained by normalizing the above weighted sum: 𝑀𝑀𝑢𝑢𝑢𝑢 = ∑ 𝑠𝑠𝑖𝑖𝑠𝑠(𝑢𝑢, 𝑞𝑞)𝑀𝑀𝑞𝑞𝑢𝑢𝑞𝑞∈𝑈𝑈 ∑ |𝑠𝑠𝑖𝑖𝑠𝑠(𝑢𝑢, 𝑞𝑞)|𝑞𝑞∈𝑈𝑈 . Item-item filtering uses the same idea but instead of user similarities, item similarities need to be calculated. Model-based approaches use clustering, matrix factorization and deep learning techniques to make recommendations. Well-known 𝐾𝐾-means clustering technique can be used to find the top 𝑘𝑘 similar users or items from user-item rating matrix to predict the unknown ratings. Neural networks from deep learning can also be used to build recommender systems [8]. There are also hybrid recommender systems that combine both memory-based and model-based approaches. Although they are the most popular techniques in the literature, they have some common drawbacks in real life applications listed below: Realization of Recommender Framework Based on Community Detection 60  Collaborative filtering Data sparsity: Users do not rate too many items, therefore user-item rating data is sparse containing many unknown ratings. Scalability: With millions or billions of users or items, it is a computationally extensive task to run recommendation algorithms.  Content-based filtering Over-specialization: Recommending items that are very similar to what the user liked, limits the novelty, and the recommended item is not interesting to the user. New user problem: When a new user enters the system or the user profile is empty, it is a difficult task to predict his/her preferences. 3. Community Detection-Based Recommender Framework There are some applications when it is not possible to obtain the user-item rating data or to compare the items in the system. In that case, collaborative and content-based approaches will fail to make recommendations. We develop a community detection recommender framework that is able to generate recommendations in scenarios when user-item rating data is very sparse or missing and/or item comparison cannot be accomplished using community detection techniques. The main steps of the framework to generate recommendations to the target user are as follows: Step 1 - Feature identification. To be able to quantitatively compare users in the system and obtain the similarity scores, feature identification is needed. Features represent attributes about users that can be acquired in different ways, such as survey questions, past user records, etc. Step 2 - Ground-truth network construction. Based on predefined features, users can be compared with each other using similarity measures designed for user or item comparison. To make recommendation to the target user, we first need a ground-truth user data, i.e., users that have already shown their preferences. Using similarity scores a weighted ground-truth network is constructed, where nodes and weighted edges represent the users in the ground-truth and similarities between them, respectively. Step 3 - Community detection algorithm validation. In the literature of community detection, various algorithms were designed for weighted networks. In [9] we studied the subsequent state- of-the-art algorithms that will be applied in this paper: multilevel modularity optimization [10], fast greedy modularity optimization [11], Infomap [12], Walktrap [13], Label propagation [14] and Edge betweenness [15]. The algorithm takes the ground-truth network as input and partitions it into a community structure. In order to validate the results of algorithms, their detected community structures are compared with a real community structure, i.e., communities based on preferences of users. For evaluation, there are also well-designed measures for network partition comparison. A new modified 𝜒𝜒2-divergence measure was suggested [16], where its advantages are discussed. We use this measure to validate an algorithm for the next step by comparing its detected community structure with the real one. Step 4 - Network updating including the target user. In this step, a new user enters the system (in the network) and based on predefined features, similarities between the new user and the users in the ground-truth are calculated using the similarity measures. Next the ground-truth network is updated including the target user. Step 5 - Community detection and recommendation. The validated community detection algorithm is applied to an updated network, and the target user is assigned to a community with users sharing similar preferences. Recommendation to the target user is made considering the K. Mkhitaryan 61 preferences of users from the same community. Items that are mostly preferred by the users in the community are recommended to the target user. 4. Experimental Results Consider an example of recommender system using the proposed framework that recommends professions or career paths to people. Traditional recommendation techniques will require preference data of users about professions in order to generate recommendations, while in reality it is impossible for a person to deal with many professions to share preference. Developed framework considers the similarities of people rather than their preferences and generates recommendations based on the preference of other people being similar to the target person. Experiment is implemented on an artificially created dataset (ground-truth) containing 100 users and their answers to 100 survey questions. We considered that an answer to a question is an integer value from 1 (strongly disagree) to 5 (strongly agree). Let 𝑄𝑄 be the generated matrix containing 100 rows (users) and 100 columns (number of questions), where each 𝑄𝑄𝑢𝑢𝑖𝑖 ∈ {1,2,3,4,5} element represents the answer that the user 𝑖𝑖 gave to the question 𝑗𝑗. We also consider that the ground-truth network has 10 communities, and each community contains 10 people. Next, the ground-truth network is constructed where nodes and edges represent the users and similarity scores between them, respectively. We used adjusted cosine similarity to measure the similarity between two users defined as: 𝑠𝑠𝑖𝑖𝑠𝑠(𝑢𝑢, 𝑣𝑣) = ∑ (𝑄𝑄𝑢𝑢𝑢𝑢 − 𝜇𝜇𝑢𝑢)(𝑄𝑄𝑣𝑣𝑢𝑢 − 𝜇𝜇𝑣𝑣)𝑢𝑢 �∑ (𝑄𝑄𝑢𝑢𝑢𝑢 − 𝜇𝜇𝑢𝑢)2𝑢𝑢 �∑ (𝑄𝑄𝑣𝑣𝑢𝑢 − 𝜇𝜇𝑣𝑣)2𝑢𝑢 , where 𝑄𝑄𝑢𝑢𝑢𝑢 and 𝑄𝑄𝑣𝑣𝑢𝑢 are the answers of users 𝑢𝑢 and 𝑣𝑣 to question 𝑖𝑖, and 𝜇𝜇𝑢𝑢 and 𝜇𝜇𝑣𝑣 are the average of all the answers to 100 questions of users 𝑢𝑢 and 𝑣𝑣, respectively. In Fig. 2 (left) the constructed ground truth network is shown, composed of 100 user nodes and 5050 edges, where the weight of an edge between users 𝑢𝑢 and 𝑣𝑣 is the similarity score obtained using adjusted cosine similarity, 𝑠𝑠𝑖𝑖𝑠𝑠(𝑢𝑢, 𝑣𝑣). Next six community detection algorithms, discussed in the previous section, are applied to partition the network into a community structure. Let 𝑋𝑋 = (𝑥𝑥1, 𝑥𝑥2, … , 𝑥𝑥𝑁𝑁) and 𝑌𝑌 = (𝑦𝑦1, 𝑦𝑦2, … , 𝑦𝑦𝑁𝑁) be two partitions of the network with 𝑁𝑁 nodes into 𝐾𝐾𝑋𝑋 and 𝐾𝐾𝑌𝑌 number of communities, respectively, where each 𝑥𝑥𝑢𝑢 ∈ {1, … , 𝐾𝐾𝑋𝑋} and 𝑦𝑦𝑢𝑢 ∈ {1, … , 𝐾𝐾𝑌𝑌}, 𝑖𝑖 ∈ {1, … , 𝑁𝑁} denotes the community label of node 𝑖𝑖 in partitions 𝑋𝑋 and 𝑌𝑌, respectively. An information theoretic measure is called modified 𝜒𝜒2-divergence defined as: 𝑀𝑀𝑀𝑀𝜒𝜒2(𝑋𝑋, 𝑌𝑌) = 1 − ∑ 𝑝𝑝 2(𝑥𝑥, 𝑦𝑦) 𝑝𝑝(𝑥𝑥)𝑝𝑝(𝑦𝑦)𝑥𝑥,𝑦𝑦 − 1 max(𝐾𝐾𝑋𝑋, 𝐾𝐾𝑌𝑌) − 1 , where 𝑝𝑝(𝑥𝑥, 𝑦𝑦) is the joint probability function of 𝑋𝑋 and 𝑌𝑌, and 𝑝𝑝(𝑥𝑥) and 𝑝𝑝(𝑦𝑦) are the marginal probability distribution functions of 𝑋𝑋 and 𝑌𝑌, respectively, and can be used to quantitatively validate the community structure detected by an algorithm, i.e., which one is most similar to real community structure. The overview of community structures detected by all six algorithms is given in Table 1. Thus, the closest network partition to real network partition is detected by the fast greedy modularity optimization algorithm shown in Fig. 2 (right). Realization of Recommender Framework Based on Community Detection 62 Fig.2: Weighted network composed of 𝟏𝟏𝟏𝟏𝟏𝟏 user nodes and 𝟓𝟓𝟏𝟏𝟓𝟓𝟏𝟏 weighted edges (left) and community structure detected by fast greedy modularity optimization algorithm (right). Table 1: Overview of community structures detected by six algorithms. Algorithm No of communities Modified 𝝌𝝌𝟐𝟐-divergence Multilevel modularity optimization 10 0.903 Fast greedy modularity optimiation 10 0.899 Infomap 1 1 Walktrap 1 1 Label propagation 100 0.909 Edge betweenness 94 0.903 To recommend a profession to a person who has already answered 100 questions, the similarities between the target user and other users in the ground-truth network are calculated resulting in a construction of a new network composed of 101 nodes and 5151 edges. Fast greedy modularity optimization algorithm validated on ground-truth network is applied to the new network and detected 10 communities, where the 101th target user lies in a community of 17 users (Fig. 3). Finally the profession selected by the majority of users among the 17 users is recommended to the target user. K. Mkhitaryan 63 Fig. 3: Community structure of a new network including the target user (circled in white). 5. Conclusion In this paper, we developed a community detection-based recommender framework that is able to make recommendations when user-item rating data is very sparse or missing, and/or items cannot be compared with each other. The framework was implanted into an artificially generated dataset to demonstrate how it can be used. For the work of the recommender system a real life data is needed. References [1] S. B. Abdrabbah, R. Ayachi and N. B. Amor, "Collaborative filtering based on dynamic community detection," in DYNAK'14 Proceedings of the 2nd International Conference on Dynamic Networks and Knowledge Discovery, 2014. [2] D. Lalwani, D. V. L. N. Somayajulu and P. R. Krishna, "A community driven social recommendation system," in 2015 IEEE International Conference on Big Data (Big Data), DOI:10.1109/BigData.2015.7363828, 2015. [3] W. Deng, R. Patil, L. Najjar, Y. Shi and Z. Chen, "Incorporating community detection and clustering techniques into collaborative filtering model," Procedia Computer Science, vol. 31, pp. 66-74, 2014. [4] F. Gasparetti, G. Sansonetti and A. Micarelli, "Community detection and recommender systems," Encyclopedia of Social Network Analysis and Mining, pp. 1-14, 2017. [5] D. Deshmuk and D. D. R. Ingle, "A community detection and recommendation system," International Research Journal of Engineering and Technology (IRJET) , vol. 4, no. 7, pp. Realization of Recommender Framework Based on Community Detection 64 752-757, 2017. [6] M. J. Pazzani and D. Billsus, "Content-based recommendation systems," The Adaptive Web, vol. 4321, pp. 325-341, 2007. [7] X. Su and T. M. Khoshgoftaar, "A survey of collaborative filtering techniques," Advances in Artificial Intelligence, vol. 2009, no. 421425, 2009. [8] L. Zhang, T. Luo, F. Zhang and Y. Wu, "A recommendation model based on deep neural network," IEEE Access, vol. 6, pp. 9454 - 9463, 2018. [9] J. Mothe, K. Mkhitaryan and M. Haroutunian, "Community detection: comparison of state of the art algorithms," Proceedings of International Conference Computer Science and Information Technologies, CSIT, Yerevan, Armenia, pp. 252-258, 2017. [10] V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, DOI: https://doi.org/10.1088/1742-5468/2008/10/P10008, 2008. [11] A. Clauset, M. E. J. Newman and C. Moore, "Finding community structure in very large networks," Phys. Rev. E, vol. 70, no. 066111, 2004. [12] M. Rosvall, D. Axelsson and C. T. Bergstrom, "The map equation," The European Physical Journal Special Topics, vol. 178, no. 1, pp. 13-23, 2009. [13] P. Pons and M. Latapy, "Computing communities in large networks using random walks," International Symposium on Computer and Information Sciences, pp. 284-293, 2005. [14] U. N. Raghavan, R. Albert and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Phys. Rev. E, vol. 76, no. 036106, 2007. [15] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," PNAS, vol. 99, no. 12, pp. 7821-7826, 2002. [16] M. Haroutunian, K. Mkhitaryan and J. Mothe, "f-Divergence measures for evaluation in community detection," in Workshop on Collaborative Technologies and Data Science in Smart City Applications (CODASSCA 2018), Yerevan, pp. 137--144, 2018. Submitted 14.01.2019, accepted 16.04.2019. Խորհրդատվական միջավայրի մշակում համայնքների հայտնաբերման միջոցով Կարեն Կ. Մխիթարյան ՀՀ ԳԱԱ Ինֆորմատիկայի և ավտոմատացման պրոբլեմների ինստիտուտ e-mail: karenmkhitaryan@gmail.com Ամփոփում Խորհրդատվական համակարգերը օգտատերերին ինֆորմացիա են առաջարկում` հիմնվելով նրանց վերաբերյալ առկա տվյալների վրա։ Խորհրդատվա- կան համակարգերի կիրառումը հնարավորություն է տալիս կազմակերպու- թյուններին ավելացնել եկամուտները, ընդլայնել ցանցը, ինչպես նաև խթանել https://doi.org/10.1088/1742-5468/2008/10/P10008 K. Mkhitaryan 65 օգտատերերի համար նոր ինֆորմացիայի բացահայտումը։ Բովանդակային և համագործակցային ֆիլտրման մոտեցումները ամենահաճախ օգտագործվող մեթոդներն են խորհրդատվական համակարգերում, որոնք կանխագուշակում են օգտատերերի նախընտրությունները։ Սակայն այդ մեթոդները պիտանի չեն որոշ կիրառություններում, մասնավորապես, երբ համագործակցային տվյալները քիչ են կամ բացակայում են։ Այս աշխատանքում մշակվել է համայնքների հայտնաբերման վրա հիմնված խորհրդատվական միջավայր, որը որոշ կիրառություններում ունի առավելություն խորհրդատվական այլ մոտեցումների համեմատ։ Բանալի բառեր` համայնքների հայտնաբերում, խորհրդատվական համակարգեր, համագործակցային ֆիլտրում, նմանության չափեր։ Реализация рекомендательной среды на основе обнаружения сообществ Карен К. Мхитарян Институт проблем информатики и автоматизации НАН РА e-mail: karenmkhitaryan@gmail.com Аннотация Рекомендательные системы предлагают информацию пользователям на основе доступных данных. Использование рекомендательных систем позволяет организациям увеличивать свои доходы, расширять сеть и способствовать открытию новой информации для пользователей. Подходы контентной и коллаборативной фильтрации являются наиболее популярными методами в рекомендательных системах, которые прогнозируют предпочтения пользователей. Однако эти методы не подходят для некоторых приложений, особенно когда коллаборативные данные отсутствуют или их очень мало. В этой работе разработана рекомендательная среда на основе обнаружения сообществ, которая в некоторых приложениях имеет преимущества перед другими подходами. Ключевые слова: обнаружение сообществ, рекомендательная система, коллаборатив- ная фильтрация, меры сходства. References