Engineering, Technology & Applied Science Research Vol. 7, No. 5, 2017, 2022-2025 2022 www.etasr.com Modi et al.: FMFinder: A Functional Module Detector for PPI Networks FMFinder: A Functional Module Detector for PPI Networks Manali Modi Computer Engineering Department Marwadi Education Foundation Rajkot, Gujarat manalim.21@gmail.com Navjyotsinh Jadeja Information Technology Department Marwadi Education Foundation Rajkot, Gujarat noon2night88@gmail.com Kirtirajsinh Zala Marwadi Education Foundation Rajkot, Gujarat kirtirajzala@gmail.com Abstract—Bioinformatics is an integrated area of data mining, statistics and computational biology. Protein-Protein Interaction (PPI) network is the most important biological process in living beings. In this network a protein module interacts with another module and so on, forming a large network of proteins. The same set of proteins which takes part in the organic courses of biological actions is detected through the Function Module Detection method. Clustering process when applied in PPI networks is made of proteins which are part of a larger communication network. As a result of this, we can define the limits for module detection as well as clarify the construction of a PPI network. For understating the bio-mechanism of various living beings, a detailed study of FMFinder detection by clustering process is called for. Keywords-functional modules; protein; PPI network; detection methods; inferring PPI network I. INTRODUCTION One of the major applications of Biotechnology is in the field of Bioinformatics especially when working with organic data. Analysis of biological processes is the primary objective of bioinformatics. The primary research revolves around hereditary connections, structural alignment of proteins, various protein to protein interaction methods and evaluation methods. Large part of bioinformatics is concerned with various biological processes which are part of Protein-Protein Interaction (PPI). PPI network, is a network of proteins interacting with each other to carry out various biological processes inside an organism. Hence it is very important to study and analyze how these protein modules interact to perform and carry out various metabolic activities. Systematical examination of the properties which are concerned with proteins which give concise depictions of consistent structures in wellbeing and diseases is known as Proteomics [1]. Normally, protein infrequently goes about as a solitary isolated component. Proteins, including those in the indistinguishable cell forms, regularly associate with one another to consolidate into an extensive atom to perform the organic capacities. For example, absorption framework, quality outpouring control, cell spread, cell signal transduction, the ways of action and movements of GSC and cell apoptosis depend on PPI. As needs be, the examination of PPI frameworks frequently serves as the reason to a better comprehension of cell affiliation, strategies, and limits and thusly elucidation of protein correspondence which is a central issue in science [1]. II. PPI MAJOR DATASETS The postgenomic period is recognized by the accessibility of colossal measure of organic information sets which are truly heterogeneous in nature and hard to examine. Vast scale PPI network throughput such as tandem affinity purification and yeast two hybrid give associations stable as well as transient in nature [4]. PPI networks also have mass spectrometry showing the protein edifices [4]. These datasets, notwithstanding being inadequate, additionally comprise of false positives, and, thusly, the cooperation found in different information sets may not concur with one another. Owing to this difference, it is basic to make utilization of statistic techniques to induce the PPI arranges by discovering solid and reproducible connections and anticipate the associations not discovered yet in the accessible data. III. INFERRING PPI NETWORK This segment depicts the statistical techniques that are utilized to discover solid and complete protein-protein association systems. The inference of PPI systems can be done in sundry courses, for example, phylogenetic profiling and ID of basic examples. It is to be noticed that in contrast with quality systems, a great deal of work can be flawed in a protein-protein system surmising utilizing the probabilistic strategies. In a living life form, a few proteins cooperate to do different undertakings framing a protein complex. A large sum of Protein-Protein Interaction information compromises the interaction and it is extremely uncommon the discovering communications amongst large protein numbers. Subsequently, recognizable proof of protein buildings is of prime significance to pick up a superior comprehension of the cell system. Distinguishing protein edifices is a crucial zone of investigation of protein systems, for which different grouping techniques were connected. One of the different methods for recognizing Engineering, Technology & Applied Science Research Vol. 7, No. 5, 2017, 2022-2025 2023 www.etasr.com Modi et al.: FMFinder: A Functional Module Detector for PPI Networks the protein buildings incorporate diagram division, where the chart is grouped into subgraphs utilizing expense based pursuit calculations. Another methodology is extensively ordered as protection crosswise over species, where arrangement devices are utilized to discover the edifices that are normal in numerous information sets originating from distinctive species. The conclusion of human genome sequencing in turn makes proteomic examination a standout amongst the most critical of life science processes. A systematical investigation procedure in which various properties concerned with are investigated. This investigation helps in representing the structure, capacity and the control of natural frameworks defining health and sickness [1]. At times proteins go about as single disengaged elements. On the other hand, proteins included in the same cell regularly associated with one another to join into an extensive particle. Case in point, the methods and exercises of the hereditary substance duplicate, quality declaration control, cell signal transduction, digestion system, cell proliferation are related to PPI. They are the foundation of organic courses of action occurring in life forms. Consequently, the examination of PPI systems commonly serves as the premise to a superior comprehension of cell association, courses of action, capacities and subsequently to the explanation of protein communication which is a focal issue in science [2]. Last decade saw PPI information dissected by high throughput test systems. Such as two hybrid frameworks, protein chip innovation and mass spectrometry. Established approach to focus protein capacity is to discover homologies between an unannotated protein and other protein utilizing grouping comparability calculation [3]. Numerous coordinated PPI systems have performed identification processes. The huge size of PPI system information becomes a tedious task to effectively distinguish various organic modules as well as essential examination subject in genomic time. There are various organic trial systems to identify functional modules in PPI systems. A. Module Detection Survey [1] The authors have dissected existing issue and present metrics for distance. Also they categorized the overall arrangement of practical module discovery as well as execution of numerous calculations by using known values. Final execution in the current scenario is the investigating of the network system. Essential idea driving the research is to distinguish utilitarian modules from existing network. By organizing and utilizing diverse bunching routines with distinctive calculation. In the current research different problems are introduced to assess the discovery of module and its quality and also counting location system's execution. Parameters are Sensitivity, Precision, F-Measure, Recall, Accuracy and Positive Predictive Value, and p-value measure. This research portrays procedures which recognize utilitarian module in comparison with current network. B. DFM-CIN Algorithm [2] The authors have presented a novel structure which identifies protein complexes. This paper also proposes practical approach by acclimatizing hereditary component statement information into datasets of PPI. DFM-CIN is the proposed method which calculates revelation of useful module in view of the distinguished edifices. Authors developed Protein Protein Interaction network as a part of static systems of TSNs. The proposed structure not only calculates but can also recognize protein complexes and modules. The research findings of the authors recommend functional modules identification with the help of protein complexes. C. Protein Function Prediction Using ANN [3] Authors have built a model by using weighted graph for protein interactions. This acts as a base to put forward and reflect facts related to small world network property. This property filters protein interaction network reliably. In certain situations, individual protein has multiple functions. This makes it an issue of multi labeling in a weighted graph problem. The procedure shows very high reliability amongst the connections of protein in the network. The suggested approach has been tested on MIPS datasets showing high performance in terms of precision and recall while using ANN. D. MOFinder Algorithm for Overlapping Modules [4] MOFinder for large PPI networks is proposed by the authors in [4]. PPI data file is primarily converted into sparse matrix in this approach. Then this sparse matrix is processed with global Approximate Minimum Degree Ordering as well as local AMD. Local sparse matrix and local AMD are generated using sliding window protocol along the diagonal. Clustering coefficient for this matrix is required to be calculated and if found higher than the cutoff then we save the sub modules. Else they are discarded. Finally sliding window goes diagonally to fetch remaining modules. Detection of small modules is not possible with this algorithm. E. Overlapping Modules Mining Using LGT [5] Authors proposed a method in which they have used Line Graph Transformation for discovering utilitarian modules from large network as well as accumulate ones identifying protein module structure. Resulting modules are identified with projected algorithm which shows high scope among fly, yeast, and worm network of proteins. Investigation on yeast protein networks recommends enormous protein modules which have been found with association of capacity annotation, localization as well as buildings blocks of proteins. F. COACH Algorithm [6] In this paper authors promoted COACH method for anticipating constructions with the help of recognizing protein complexes as well as containing protein associations. They measured and investigated protein groups from different viewpoints. From first view, they execute widespread correlation among projected method as well as present approaches with consideration of expected groups in contradiction of targeted protein structures. Second, they admit focal point linking edifices exploiting dissimilar biological evidence as well as learning. G. Fuzzy Clustering Algorithm [7] Authors have proposed a method in which they have secluded the capacity of Q fluffy cluster. This means they have Engineering, Technology & Applied Science Research Vol. 7, No. 5, 2017, 2022-2025 2024 www.etasr.com Modi et al.: FMFinder: A Functional Module Detector for PPI Networks applied grouping strategy in order to discern the covering group structure. The research shows the higher efficiency of the new algorithm in detecting appropriate number of clusters and good clustering. H. Functional Module Detection Using Metrices [8] Authors have addressed various problems arising while developing protein protein connections and resultant issues of protein communication information. Hence, they advised for usage of betweenness commonality decomposition for calculating customs edge shared trait and for the recognizing of practical modules from the extensive networks. Fig. 1. Intferring PPI Network IV. PROPOSED FRAMEWORK All the above procedures devised some shortcomings with the reference of quantity of practical module recognition. Hence, FMFinder approach is designed for amending enactment of prevailing procedure. The steps for the proposed flow of the algorithm are shown in Figure 2. For the proposed FMFinder, PPI file in terms of protein network and clustering coefficient 0.45 are the inputs and in turn functional modules from the Human and Yeast database are the production of the protein networks. Figure 3 shows the proposed algorithm FMFinder for functional module detection. V. IMPLEMENTATION AND COMPARISON WITH EXISTING TECHNIQUES Human PPI Dataset and Yeast PPI Dataset were used to perform FMFinder algorithm based implementation. Interacting protein data is stored in these datasets. Human dataset was collected from Human Protein Release Dataset (HPRD). Yeast dataset was collected from Dataset of Interacting Proteins (DIP). 1800 protein interactions are included in the Yeast dataset and 39200 interactions of proteins are found in human PPI network. FMFinder algorithm processes on these datasets to detect functional modules. Analysis and comparison with earlier algorithms and approaches along with their limitations are described in TABLE I. Table II depicts modules identified with its major modules for human dataset. Table III depicts modules identified with its major modules for yeast dataset. From this review it is obvious that FMFinder outflanks when applying to yeast and human datasets for overlapping module detection. This relative study will be helpful for the examination of the diverse algorithm which is valuable for the detection of protein modules. Despite the fact that LPCF has most anticipated proteins and covered proteins it has less utilitarian rate. Figure 5 shows modules that are predicted and covered proteins by the algorithm in Human Database. As shown in the Figure 5 and Table II FMfinder algorithm has highest predicted proteins and covered proteins as compared to other algorithms for human database. Figure 6 shows functional percentage of identifying modules from the large network by different algorithms in Yeast Database. It can easily seen from the Figure 6 and Table III that FMFinder shows highest functional percentage for detecting functional modules for Yeast database. Thus FMFinder has highest accuracy for discovering overlapping modules from PPI complex network. Fig. 2. Proposed Flow of FMFinder Algorithm Fig. 3. Proposed Algorithm Engineering, Technology & Applied Science Research Vol. 7, No. 5, 2017, 2022-2025 2025 www.etasr.com Modi et al.: FMFinder: A Functional Module Detector for PPI Networks Fig. 4. Flow of MMD Algorithm TABLE I. ALGORITHMIC ANALYSIS Algorithm Description Limitation MCODE Identifies densely connected group of proteins Detects only connected graphs of proteins within the PPI network DPClus Based on the agglomerate and divisive algorithm Unable to detect overlapping functional modules MOFinder Identifies functional modules, specially overlapping modules Detects only small size of modules, less than 12, from human and yeast database FMFinder Identifies functional modules Detects 256 modules from human database and 109 modules from yeast database. Major size module is 4. TABLE II. FUNCTIONAL MODULES IDENTIFICATION ON HUMAN DATABASE Algorithm Modules Identified Major Module Size MCODE 21 3 DPClus 102 8 MOFinder 221 12 FMFinder 265 4 TABLE III. FUNCTIONAL MODULES IDENTIFICATION ON YEAST DATABASE Algorithm Modules Identified Major Module Size MCODE 15 3 DPClus 54 8 MOFinder 90 3 FMFinder 109 3 Fig. 5. Comparision Graph for Human Database VI. CONCLUSION The current paper portrays different methods which are exploited as a part of reviling functional modules from a large database. We examined each dataset, inferred PPI network, analysis of existing algorithms and its covered proteins and functional percentage of modules which are actually identified from the existing algorithms. Results show that the FMFinder proposed algorithm outperforms previous algorithms (MCODE, DPClus, MOFinder) by detecting more modules of proteins in both human and yeast databases. Fig. 6. Comparision Graph for Yeast Database Fig. 7. Graph Comparison between FMFinder and MOFinder REFERENCES [1] J. Ji, A. Zhang, C. Liu, X. Quan, Z. Liu, “Survey: Functional Module Detection from Protein-Protein Interaction Networks”, IEEE Transaction on Knowledge and Data Engineering, Vol. 26, No. 2, pp. 261-273, 2014 [2] M. Li, X. Wu, J. Wang, Y. Pan, “Towards the Identification of Protein Complexes and Functional Modules by Integrating PPI Network and Gene Expression Data”, BCM Bioinformatics, pp. 1-12, 2012 [3] L. Shi, Y. R. Cho, A. Zhang, “Prediction of Protein Function from Connectivity of Protein Interaction Network”, International Journal of Computational Bioscience, Vol. 1, pp. 1-5, 2010 [4] Q. Yu, G. H. Li, J. F. Huang, “MOfinder: A Novel Algorithm for Detecting Overlapping Modules from Protein-Protein Interaction Network”, Journal of Biomedicine and Biotechnology, Vol. 2012, pp. 1- 10, 2012 [5] ] S. Zhang, H. W. Liu, X. M. Ning, X. S. Zhang, “A hybrid graph- theoretic method for mining overlapping functional modules in large sparse protein interaction networks”, International Journal of Data Mining and Bioinformatics, Vol. 3, No. 1, pp. 68–84, 2009 [6] M. Wu, X. Li, C. K. Kwoh, S. K. Ng, “A core-attachment based method to detect protein complexes in PPI networks”, BMC Bioinformatics, Vol. 10, pp. 1-5, 2009 [7] S. Zhang, R. S. Wang, X. S. Zhang, “Identification of overlapping community structure in complex networks using fuzzy c-means clustering”, Physica A, Vol. 374, No. 1, pp. 483– 4490, 2007 [8] C. Wang, C. Ding, Q. Yang, S. R. Holbrook, “Consistent dissection of the protein interaction network by combining global and local metrics”, Genome Biology, Vol.8, No.12, pp. 1-10, 2007