COMPARATIVE METAGENOMICS ANALYSIS OF PALM OIL MILL EFFLUENT (POME) USING THREE DIFFERENT BIOINFORMATICS PIPELINES IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. COMPARATIVE METAGENOMICS ANALYSIS OF PALM OIL MILL EFFLUENT (POME) USING THREE DIFFERENT BIOINFORMATICS PIPELINES ADIBAH PARMAN1, MOHD NOOR MAT ISA2, FARAH FADWA BENBELGACEM1, IBRAHIM ALI NOORBATCHA1 AND HAMZAH MOHD. SALLEH3 1Bioprocess & Molecular Engineering Research Unit (BPMERU), Department of Biotechnology Engineering, Kulliyyah of Engineering, International Islamic University Malaysia, 2Malaysia Genome Institute, Jalan Bangi, 43000 Kajang, Selangor, Malaysia 3International Institute for Halal Research and Training (INHART), International Islamic University Malaysia, P.O. Box 10, 50728 Kuala Lumpur, Malaysia. *Corresponding authors: ibrahiman@iium.edu.my, hamzah@iium.edu.my (Received: 9th March 2018; Accepted: 25th Jan 2019; Published on-line: 1st June 2019) https://doi.org/10.31436/iiumej.v20i1.909 ABSTRACT: The substantial cost reduction and massive production of next-generation sequencing (NGS) data have contributed to the progress in the rapid growth of metagenomics. However, production of the massive amount of data by NGS has revealed the challenges in handling the existing bioinformatics tools related to metagenomics. Therefore, in this research we have investigated an equal set of DNA metagenomics data from palm oil mill effluent (POME) sample using three different freeware bioinformatics pipelines’ websites of metagenomics RAST server (MG-RAST), Integrated Microbial Genomes with Microbiome Samples (IMG/M) and European Bioinformatics Institute (EBI) Metagenomics, in term of the taxonomic assignment and functional analysis. We found that MG-RAST is the quickest among these three pipelines. However, in term of analysis of results, IMG/M provides more variety of phylum with wider percent identities for taxonomical assignment and IMG/M provides the highest carbohydrates, amino acids, lipids, and coenzymes transport and metabolism functional annotation beside the highest in total number of glycoside hydrolase enzymes. Next, in identifying the conserved domain and family involved, EBI Metagenomics would be much more appropriate. All the three bioinformatics pipelines have their own specialties and can be used alternately or at the same time based on the user’s functional preference. ABSTRAK: Pengurangan kos dalam skala besar dan pengeluaran data ‘next-generation sequencing’ (NGS) secara besar-besaran telah menyumbang kepada pertumbuhan pesat metagenomik. Walau bagaimanapun, pengeluaran data dalam skala yang besar oleh NGS telah menimbulkan cabaran dalam mengendalikan alat-alat bioinformatika yang sedia ada berkaitan dengan metagenomik. Justeru itu, dalam kajian ini, kami telah menyiasat satu set data metagenomik DNA yang sama dari sampel effluen kilang minyak sawit dengan menggunakan tiga laman web bioinformatik percuma iaitu dari laman web ‘metagenomics RAST server’ (MG-RAST), ‘Integrated Microbial Genomes with Microbiome Samples’ (IMG/M) dan ‘European Bioinformatics Institute’ (EBI) Metagenomics dari segi taksonomi dan analisis fungsi. Kami mendapati bahawa MG- RAST ialah yang paling cepat di antara ketiga-tiga ‘pipeline’, tetapi mengikut keputusan analisa, IMG/M mengeluarkan maklumat philum yang lebih pelbagai bersama peratus identiti yang lebih luas berbanding yang lain untuk pembahagian taksonomi dan IMG/M 1 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. juga mempunyai bacaan tertinggi dalam hampir semua anotasi fungsional karbohidrat, amino asid, lipid, dan koenzima pengangkutan dan metabolisma malah juga paling tinggi dalam jumlah enzim hidrolase glikosida. Kemudian, untuk mengenal pasti ‘domain’ terpelihara dan keluarga yang terlibat, EBI metagenomics lebih bersesuaian. Ketiga-tiga saluran ‘bioinformatics pipeline’ mempunyai keistimewaan mereka yang tersendiri dan boleh digunakan bersilih ganti dalam masa yang sama berdasarkan pilihan fungsi penggun. KEYWORDS: bioinformatics pipeline; metagenomics analysis; MG-RAST; IMG/M; EBI; metagenomics; palm oil mill effluent 1. INTRODUCTION In the last few decades, metagenomics has become one of the crucial tools in mining the hidden microbial treasure without the use of conventional laboratory culture techniques. Metagenomics involves the study of genetic material extracted from the diverse microbial population of environmental samples. The early stage of genomics relied on the standard laboratory cultivation method which is insufficient to identify the entire microbial population as compared to metagenomics. Furthermore, the change in biotechnology development within this era, such as inexpensive next-generation sequencing (NGS) technologies, high throughput screening technique for metagenomics library and advances in bioinformatics tools, have left a huge impact in the field of metagenomics [1]. Illumina is the most widely used NGS platform in metagenomics studies. The Illumina system has advantages to other NGS platforms in terms of its high throughput sequencing at an economical price with high accuracy (> 99%) reads [2]. The Illumina platform could initially only produce a short-read sequence length which has gradually been improved to a readable length and consequently made it more popular compared to the other platforms in NGS tools [2]. This fast evolution by NGS technologies allows researchers to achieve more variety of data with a high level of detailed sequencing results. NGS has also been developed continuously and rapidly, starting from its launch in 2006, resulting in the accumulation of massive amounts of sequences [2]. Hence, several bioinformatics tools for metagenomics annotation are needed to accurately analyze the enormous amount of data. Palm oil mill effluent (POME) is a colloidal suspension of the final stage effluent in the palm oil industry production. The composition of POME includes 95-96% water, 4-5% total solids, and 0.6-0.7% oil [3]. Besides that, the raw POME also contains significant concentrations of carbohydrates, proteins, nitrogenous compounds, lipids, and minerals that enable this effluent to be used in various biotechnological applications like fermentation media, production of antibiotics, bio-insecticides, polyhydroxyalkanoates (PHA), organic acids, enzymes, and hydrogen [3]. The present study attempts to make a comparative bioinformatics analysis of POME’s sample metagenome constructed using different automated bioinformatics pipelines of MG-RAST, IMG/MER, and EBI Metagenomics to evaluate their accuracy in the taxonomical assignment and functional annotation for future research directed to the industrial application. Metagenome Rapid Annotation using Subsystem Technology (MG-RAST) is a free web-based server with a fully automated system that provides sequence alignment, gene prediction, structural and functional annotation, comparative metagenomics and archiving services [4]. It was launched at the Argonne National Library in 2007 to address the computational needs of huge metagenomics data production analysis [5]. This 2 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. bioinformatic pipeline is being used by researchers around the world with the analysis record of over 250,000 datasets with 100 tera-basepairs of DNA being successfully and completely analyzed to date [6]. Besides, it also has a graphical user interface (GUI) that allows the researcher to study the composition of microbial communities with their specific function [6]. MG-RAST is also one of the bioinformatics pipelines that allows the submission of raw sequence data in the fastq, fasta, and sff format which will then be normalized and processed until annotation is completed by several integrated bioinformatic tools [6]. The Integrated Microbial Genomes with Microbiomes (IMG/M) is quite similar to MG-RAST, which can also examine the taxonomy and function or metabolic potential of microbiomes [7]. IMG/M is a metagenomics data management system supported by the DOE-JGI metagenome annotation pipeline (MAP V.4) which allows the submission of fasta or fastq format assembled and unassembled 454, Illumina, and pacBio nucleotide sequences [8]. In early 2016, all these unassembled reads could no longer be accepted; meanwhile the sequence data generated outside JGI has been limited to the fasta format in assembled data formed only. Until now, IMG/M still supports the external submission of assembled genomes only with the condition that the metagenomes submission and metadata have to be registered with Genomes Online Database (GOLD) version 5 [9]. European Bioinformatics Institute (EBI) Metagenomics is an expanding metagenomics analysis and archiving resource that uses the European Nucleotide Archive (ENA) data scheme developed by the European Molecular Biology Laboratory (EMBL). ENA is needed for the initial submission and archiving purposes in a long-term period storage for reuse in the future [10]. Besides, EBI Metagenomics is a free web-based server that enables users to perform analysis on large scale platforms from Ion Torrent, Roche 454, and Illumina metagenomic sequence data [11]. Similar to MG-RAST and IMG/M, EBI Metagenomics also has an established standardized system and analysis pipeline that includes a variety of analytical and visualization tools in generating the analysis of taxonomic and functional features of user-submitted sequence [12]. A comparison of MG-RAST to Qualitative Insights into Microbial Ecology (QIIME) based on 16S rRNA method found QIIME to be more accurate in term of taxonomic assignment compared to MG-RAST [13]. When MG-RAST was compared to QIIME but with MOTHUR as an additional bioinformatic tool, the results showed that QIIME was again the fastest compared to the other two [14]. QIIME is a bioinformatic tool used by EBI Metagenomics Version 3.0 to perform the taxonomical annotation that currently is being replaced with MAPseq in EBI Metagenomics Version 4.0. Even though in previous research QIIME produced a better result, it lacks the facility to manage, store, and analyze the metagenomics data. In this work, we provide a comparative analysis of the metagenomics data using a fully automated bioinformatics pipeline that integrates the work of management, analysis, storage, and sharing of metagenomics projects [15] instead of analysis of 16S rRNA data only. The three bioinformatics pipelines, namely MG-RAST, IMG/M, and EBI Metagenomics are chosen because within the existing web resources in metagenomic studies, the three bioinformatics tools are highly rated in terms of ease in data uploading, online user support availability, analysis spectrum, citation, and storage capacity [16]. Hence, this research will only focus on the metagenomics analysis by these three bioinformatic pipelines: MG-RAST, IMG/MER, and EBI Metagenomics. These three tools, especially MG-RAST, have been used repeatedly to analyze many metagenome sequencing datasets from a variety of sources. In the present work, we compare the 3 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. analysis result of the same input data using the common three web-based automated bioinformatics pipelines of MG-RAST, IMG/MER, and EBI metagenomics to evaluate their accuracy in taxonomy and functional annotation on the microbial diversity and several functional genes contain in POME. 2. MATERIALS AND METHODS 2.1 Collection of Samples and Creation of Metagenomics Libraries POME samples were collected from FELDA Palm Industries Sdn. Bhd. of KKS Mempaga, Pahang, Malaysia. After sample collection, the metagenomic DNA was extracted using Meta-G-Nome™ DNA Isolation Kit (Epicentre, U.S), sheared, end- repaired, ligated to pCC1FOS fosmid vector and phage-mediated transfection to surrogate host EPI300T1 resistant E.coli to perform the cloning of metagenomic DNA grown in LB agar with the recommended antibiotic. The libraries (>100,000) were constructed by preparing 384-well transparent microplates with LB broth and 20% glycerol and each colony were inoculated from LB agar to the microplates and of glycerol media and stored at -80 oC. 2.2 High Throughput Screening and Next-Generation Sequencing (NGS) Colonies from each plate were inoculated to 384-well microplates filled with LB broth with antibiotic and inducer (respective to their positions in the library plates). Screening buffer (potassium acetate) and lysis mix (10% Triton X-100, 100 mM Tris and 10 mM EDTA) were added to break open the cells and methylumbelliferyl-β-D- glucopyranoside (MUGlc), methylumbelliferyl-β-D-cellobioside (MUC) and chlorocoumarin-xylobioside (CCX) fluorogenic substrates were added to each well. After an overnight incubation, the plates were screened using a microplate reader for the presence of cellulose- and xylan-degrading enzymes. The relative fluorescence units given by the microplate reader were changed to robust z-score to select the high rated hits. Robust z-score was calculated for each microplate independently and results of all plates (109,834 clones) were combined to select the 100 high rated clones [17,18] and sent for Illumina HiSeq2000 next-generation sequencing (NGS) at the Malaysia Genome Institute (MGI), Bangi, Selangor. 2.3 Post-Processing of NGS Data Initial raw data of NGS in the fastq format had undergone sequence quality trimming and short read removal by SolexaQA++ program [19]. The fosmid vector and internal control phiX sequences were removed using bowtie2 [20]. The high-quality sequences were assembled with Velvet based on the de Bruijn graph algorithm to organize the sequences in contigs. Velvet is a new strategy developed to merge very short reads in combination with read pairs to produce useful assemblies [21]. 2.4 Metagenomics Analysis using Three Bioinformatics Pipelines The same dataset of assembled data from POME’s NGS results with fasta format was uploaded into three automated bioinformatics pipelines of MG-RAST, IMG/M, and EBI Metagenomics. These free web-based bioinformatic pipelines were automatically run after submission of the file with related metadata. The details on the uploading method, databases, and system included are summarized in Table 1. 4 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. 2.4.1 MG-RAST The fasta format file and metadata were uploaded in the upload segment of MG- RAST. The metadata was also uploaded using Microsoft Excel template prior to the submission. Throughout the process, the percentage of analysis and process done could be seen in the progress segment. In MG-RAST, the file had undergone the quality control for data hygiene first and before proceeding with feature identification of coding DNA sequence (CDS) using FragGeneScan. This database could predict the DNA coding region of higher than 75 bp. The similarity searches on taxonomic classification in MG-RAST were conducted using the BLAST-Like Alignment Tool (BLAT) that is used to find sequence hits in seven taxonomic categories. For functional annotation, m5nr was used by providing non-redundant integration of several databases like SEED, KEGG, Genbank, IMG, UniProt, and eggNOGs [6]. Table 1: Technical comparison of MG-RAST, IMG/M, and EBI Metagenomics MG-RAST IMG/M EBI Metagenomics Analysis Duration 5 h 21 mins 5 days 22 h 46 mins 3 days License Open-source Open-source Open-source Current Version 4.0.3 4.570 Pipeline Version 4.0 Website http://metagenomics.anl .gov/ https://img.jgi.doe.gov/m/ https://www.ebi.ac.uk/m etagenomics/ Association - GOLD & DOE-JGI MAP ENA Primary Usage GUI GUI GUI Feature Identification rRNA & CDS (FragGeneScan) CRISPR, tRNA, rRNA & CDS Prediction (GeneMark, Prodigal, MetaGeneAnnotator, FragGeneScan) rRNA & CDS (FragGeneScan, Prodigal) Homology- Based Algorithm BLAT BLAST, HMMER, USEARCH - Functional Annotation UniProt, IMG/M, COGs, eggNOGs, KEGG, SEED, STRING COGs, Pfam, KO, EC, MetaCyc, KEGG Comparing CDS using InterProScan Taxonomic Annotation BLAT BLAST Comparing rRNA using MAPseq 2.4.2 IMG/M To submit our own data for metagenomics analysis in IMG/M, we used the Integrated Microbial Genomes with Microbiomes for expert review (IMG/MER). IMG/M has been associated with GOLD v.5 and DOE’s Joint Genome Institute (JGI). The initial step involved is by login to IMG/MER and then registering the project metadata at GOLD prior to uploading the data at JGI. After completion of data submission, the quality control (QC) pre-processing like trimming and removing sequences shorter than 150 bp would take 5 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. place. Next, the genes prediction of CRISPR, tRNA, rRNA, and CDS was conducted using several databases like GeneMark, Prodigal, MetaGeneAnnotator, and FragGeneScan. Finally, the functional annotation was completed by associating the protein-coding genes with the COG, Pfam, KO, EC, MetaCyc, and KEGG [8]. 2.4.3 EBI Metagenomics (EMG) To analyze the data from POME’s NGS results using EBI metagenomics, the data file needs to be uploaded to the European Nucleotide Archive (ENA) using either Webin Uploader, FileZilla Client-Server, or Aspera. In this research, FileZilla Client-Server was utilized, which is easier to use compared to the other methods and has been used widely by many researchers. After QC and filtering out ncRNA reads, the CDS of proteins were predicted using FragGeneScan for short reads and Prodigal for longer reads or assembled ones. In EBI Metagenomics, MAPseq was used for taxonomic classification by assigning taxonomy and OTU classification to rRNA sequence. In the meantime, InterProScan was also used for functional annotation by predicting the domains and classifying them into families using a compilation of several databases like Pfam, TIGRFAM, PRINTS, PROSITE patterns, and Gene3d. Finally the functional results were produced in Gene Ontology (GO). 3. RESULTS AND DISCUSSION To ensure the similar type of dataset for data comparison, the file in fasta format with assembled form has been selected for this research. The assembled format is used instead of the raw sequence because one of the bioinformatics pipelines, IMG/M cannot process the unassembled reads if they are generated outside the Joint Genome Institute (JGI) [9]. The data was submitted to get a clear result related to the taxonomic and functional annotation involving POME DNA metagenome sample. Specifically, the results of major phylum and genus distribution existing in POME will represent the taxonomic classification. On the other hand, major functional annotation and glycoside hydrolase enzyme present in POME’s metagenomics DNA results will represent the functional annotation part of this research. Overall, MG-RAST was found to be the quickest in 5 hours 20 minutes (Table 1) compared to IMG/M and EBI Metagenomics. However, QIIME was the quickest compared to MG-RAST and MOTHUR [14]. On the contrary, MG-RAST is the quickest in this research, even though QIIME was included in EBI Metagenomics system Version 3.0. In this condition, QIIME is not a stand-alone system, and EBI metagenomics also involves other bioinformatic tools like FragGeneScan, Prodigal, and InterProScan which explains the slowness of EBI Metagenomics containing QIIME compared to MG-RAST. Figure 1(a) showed the results of major phylum distribution present in effluent of FELDA palm oil mill, Mempaga, Pahang by each bioinformatic pipelines. All three bioinformatic pipelines have similar dominant phylum of Proteobacteria and Firmicutes with MG-RAST 73.7% and 25.3%, IMG/M 48% and 28%, and EBI Metagenomics 67% and 33%, respectively. The difference can be seen in IMG/M ER with additional dominant phylum of Actinobacteria (12%), dsDNA viruses (5%), and Bacteroidetes (4%). In this phylum composition, IMG/M ER appeared to be more diverse and details compared to the other two bioinformatics pipeline. MG-RAST using a BLAT algorithm while IMG/M ER used BLAST which is the most commonly used bioinformatics tool for sequence similarity analysis. Besides that, IMG/M could provide the percent identities greater than 30% which ensured the larger amount of results produced, while the setting in MG-RAST is only for the percent identities of 60% and above. 6 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. For genus distribution result, the comparison table only involves MG-RAST and IMG/MER as shown in Fig. 1(b). EBI Metagenomics was excluded because no details genes distribution could be detected except Staphylococcus. The latest version of EBI Metagenomics Pipeline 4.0 uses MAPseq framework for taxonomic classification whichallows the metagenomic analysis of reference based rRNA only. Consequently, only small amount of 16S rRNA could be found in this sequence and finally bring the limitation of Staphylococcus as an output. On the contrary, the other two pipelines of MG-RAST and IMG/M, besides rRNA as reference-based, they also used coding DNA sequence (CDS) of protein as the reference which results thousands of CDS can be found from the data submission. Fig. 1: (a) Phyla distribution of palm oil mill effluent (POME) computed by each tool and (b) Major genus distribution in POME analyzed by MG-RAST and IMG/M only. 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 G en es A nn ot at ed (x 10 ^3 ) Taxonomic Hits (Genus) mg-rast img/ mer 0% 20% 40% 60% 80% 100% 120% MG-RAST IMG/M ER EMG Ph yl a R el at iv e A bu nd an ce (% ) DsDNA viruses Proteobacteria Firmicutes Cyanobacteria Chlamydiae Bacteroidetes Actinobacteria (a) (b) 7 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. The major genus found in MG-RAST in Fig. 1(b) is similar to IMG/MER which are Pseudomonas, Staphylococcus, Serratia, Escherichia, Burkholderia, Yersinia, and Salmonella. From the same figure too, the quantity of major genes found in all MG- RAST’s genus is greater than IMG/M, and the total number of the genus found in MG- RAST initially also is greater than IMG/M. Soleimaninanadegani and Manshad‘s work on POME reported that they found Bacillus, Micrococcus, Pseudomonas and Staphylococcus genus in POME sample [22]. This finding explains the existence of Pseudomonas and Staphylococcus as one of the major genus found in POME while the other genus like Serratia, Burkholderia, Yersinia, and Salmonella are new findings in this research. Besides that, all the genus are identified under the family of Firmicutes and Proteobacteria which also explain the two families as the main families present in POME. For functional annotation, several major categories of functional annotation can been extracted from the results obtained by each pipeline which are transport and metabolism of nucleotide, lipid, inorganic ion, coenzyme, carbohydrate, and amino acid as shown in Fig. 2. Raw POME contains a high concentration of carbohydrate, protein, nitrogenous compounds, lipids, and minerals [23], which explains why in all three pipelines’ analysis of the metabolism that lipid, carbohydrate, amino acid, inorganic ion, and coenzyme are among the highest annotated reads of functional annotation. Once again, IMG/M has the highest number of annotated reads nearly in all the functional categories. Fig. 2: Major functional annotation of palm oil mill effluent (POME) analyzed by MG-RAST, IMG/M, and EBI. All the glycoside hydrolase enzymes analyzed by each pipeline are listed in Table 2 in order for future research related to the study of useful glycoside hydrolases related to POME. The POME glycoside hydrolase enzymes identified by bioinfomatics tools suggest that POME can be a potential local source for novel enzymes and future potential enzymes for industrial applications. The blank spaces of EBI Metagenomics part (Table 2) probably occurred because EBI metagenomics used InterProScan in functional annotation. InterProScan is a database that works by classifying the CDS into respected families and also by predicting any important domains and sites related to protein sequence [24]. Therefore, the output result of EBI Metagenomics will be more related to predicted domains and families rather than the protein or enzyme names. 0 200 400 600 800 1000 1200 1400 1600 Amino acid transport and metabolism Carbohydrate transport and metabolism Coenzyme transport and metabolism Inorganic ion transport and metabolism Lipid transport and metabolism Functional Annotation EBI IMG/M MG-RAST 8 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. Table 2: Total number of enzyme glycoside hydrolase (EC 3.2.1.-) in palm oil mill effluent (POME) identified by MG-RAST, IMG/M ER, and EBI Metagenomics Enzyme ID Enzyme Name MG-RAST IMG/M EBI EC 3.2.1.1 Alpha-amylase 1 2 5 EC 3.2.1.14 Chitinase - 4 12 EC 3.2.1.141 Malto-oligosyltrehalose trehalohydrolase - 2 - EC 3.2.1.17 Lysozyme - 1 - EC 3.2.1.20 Alpha-glucosidase 1 3 - EC 3.2.1.21 Beta-glucosidase 1 3 - EC 3.2.1.23 Beta-galactosidase 2 2 7 EC 3.2.1.26 Beta-fructofuranosidase 3 3 - EC 3.2.1.4 Cellulase 3 1 2 EC 3.2.1.52 Beta-N-acetylhexosaminidase 11 5 - EC 3.2.1.85 6-phospho-beta-galactosidase 1 1 - EC 3.2.1.86 6-phospho-beta-glucosidase 4 4 2 EC 3.2.1.93 Alpha,alpha-phosphotrehalase 1 2 2 28 33 30 4. CONCLUSIONS Throughout the results of taxonomic and functional annotation of palm oil mill effluent (POME) DNA metagenomics sample, IMG/M could be seen to provide a better analysis result compared to the other pipelines. IMG/M could give a diverse phylum result as it involves a much wider range of percent identity. Besides that, this bioinformatics pipeline also is the one with the highest in nearly all major functional annotation of carbohydrate, amino acid, lipid and coenzyme transport and metabolism as well as the highest in the total number of glycoside hydrolase enzymes contained in POME. On the other hand, the other two pipelines of MG-RAST and EBI metagenomics also have their own specialties. These two bioinformatic tools allow the submission of raw data compared with IMG/M, which is limited to the assembled one only. MG-RAST is also good for obtaining data analysis in a short period as it uses the BLAT system which is 500 times faster for nucleic acid sequence alignment and 50 times faster for protein sequence alignment than other famous existing tools including the one being used in IMG/M and EBI Metagenomics server [25]. Next, EBI metagenomics is more suitable for a deeper understanding of the protein families and domains involved in any specific sequence rather than focusing on taxonomical annotation. In conclusion, all the three bioinformatics pipelines have their own specialty that the researcher needs to know in detail so that the specific bioinformatic pipeline can be used based on individual functional interest 9 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. ACKNOWLEDGMENT The authors would like to thank Ministry of Education, Malaysia for giving the financial support through the Fundamental Research Grant Scheme (FRGS) with project ID FRGS 13-086-0327. Besides that, the authors also deeply grateful to Malaysia Genome Institute (MGI) for the technical assistance in analyzing the metagenomic data of POME. The authors also would like to express their appreciation to Professor S.G. Withers and Dr Hong Ming (Chemistry Department, University of British Columbia, Vancouver, Canada) for providing substrates used for the high-throughput screening of this work. REFERENCES [1] Kumar S, Krishnani KK, Bhushan B, and Brahmane MP. (2015) Metagenomics : Retrospect and Prospects in High Throughput Age, vol. 2015. [2] Mincheol K, Lee KH, Yoon SW, Kim BS, Chun J, and Hana Y. (2013) Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era, 11(3): 102-113. [3] Abdullah N and Sulaiman F. (2013) The oil palm wastes in Malaysia, Biomass Now – Sustain. Growth Use, pp. 75-100. [4] Meyer F, Paarmann D, Souza MD, Olson R, Glass EM, Kubal M, Edwards RA. (2008) The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes, 8: 1-8. https://doi.org/10.1186/1471-2105-9-386. [5] Tang W, Bischof J, Desai N, Mahadik K, Gerlach W, Harrison T, Meyer F. (2014) Workload characterization for MG-RAST metagenomic data analytics service in the cloud. 2014 IEEE International Conference on Big Data (Big Data), 56-63. https://doi.org/10.1109/BigData.2014.7004394. [6] Wilke A, Gerlach W, Harrison T, Paczian T, Trimble WL, & Meyer F. (2016) MG-RAST Manual for version 4, revision 1. [7] Markowitz VM, Chen IMA, Chu K, Szeto E, Palaniappan K, Pillay M, Kyrpides NC. (2014) IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Research, 42(D1): 568-573. https://doi.org/10.1093/nar/gkt919. [8] Huntemann M, Ivanova NN, Mavromatis K, Tripp HJ, Paez-espino D, Tennessen K, Kyrpides NC. (2016) The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v . 4). Standards in Genomic Sciences, 1: 1-5. https://doi.org/10.1186/s40793-016-0138-x. [9] Chen IMA, Markowitz VM, Chu K, Palaniappan K, Szeto E, Pillay M, Kyrpides NC. (2017) IMG/M: Integrated genome and metagenome comparative data analysis system. Nucleic Acids Research, 45(D1): D507-D516. https://doi.org/10.1093/nar/gkw929. [10] Hunter S, Corbett M, Denise H, Fraser M, Gonzalez-beltran A, Hunter C, Sansone S. (2014) EBI metagenomics — a new resource for the analysis and archiving of metagenomic data, 42: 600-606. https://doi.org/10.1093/nar/gkt961. [11] Denise H, Mitchell A, Bucchini F, Cochrane G, Denise H, Hoopen P, Finn RD. (2016) EBI metagenomics in 2016 - An expanding and evolving resource for the analysis and archiving of metagenomic data EBI metagenomics in 2016 - An expanding and evolving resource for the analysis and archiving of metagenomic data, (November 2015). https://doi.org/10.1093/nar/gkv1195. [12] Mitchell AL, Scheremetjew M, Denise H, Potter S, Tarkowska A, Qureshi M, Finn RD. (2017) EBI Metagenomics in 2017: Enriching the analysis of microbial communities, from sequence reads to assemblies. Nucleic Acids Research, 46(D1): D726-D735. https://doi.org/10.1093/nar/gkx967. [13] D’Argenio V, Casaburi G, Precone V, & Salvatore F. (2014) Comparative metagenomic analysis of human gut microbiome composition using two different bioinformatic pipelines. Biomed Res Int, 325340. https://doi.org/10.1155/2014/325340. 10 IIUM Engineering Journal, Vol. 20, No. 1, 2019 Parman et al. [14] Plummer E, Twin J. (2015) A Comparison of Three Bioinformatics Pipelines for the Analysis of Preterm Gut Microbiota using 16S rRNA Gene Sequencing Data. J. Proteomics & Bioinformatics, 8(12): 283–291. https://doi.org/10.4172/jpb.1000381. [15] Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Iliopoulos I. (2015) Metagenomics: Tools and Insights for Analyzing Next-Generation Sequencing Data Derived from Biodiversity Studies. Bioinformatics and Biology Insights, 9: 75-88. https://doi.org/10.4137/BBi.s12462. [16] Dudhagara P, Bhavsar S, Bhagat C, Ghelani A, Bhatt S, & Patel R. (2015) Web Resources for Metagenomics Studies, 13: 296–30. [17] Malo N, Hanley JA, Cerquozzi S, Pelletier J, & Nadon R. (2006) Statistical practice in high- throughput screening data analysis. Nature Biotechnology, 24(2): 167-175. https://doi.org/10.1038/nbt1186. [18] Birmingham A, Selfors LM, Forster T, Wrobel D, Kennedy CJ, Shanks E, Shamu CE. (2009) Statistical methods for analysis of high-throughput RNA interference screens. Nat Methods, 6(8): 569-575. https://doi.org/10.1038/nmeth.1351. [19] Cox MP, Peterson DA, & Biggs PJ. (2010) SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics, 11: 485. https://doi.org/10.1186/1471-2105-11-485. [20] Langmead B, & Salzberg SL. (2013) Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4): 357-359. https://doi.org/10.1038/nmeth.1923.Fast. [21] Zerbino DR, Birney E. (2008) Velvet Manual Version 1.1. Genome Research, 18(5): 821- 829. https://doi.org/10.1101/gr.074492.107. [22] Soleimaninanadegani M, Manshad S. (2014) Enhancement of Biodegradation of Palm Oil Mill Effluents by Local Isolated Microorganisms. International Scholarly Research Notices, 1-8. https://doi.org/10.1155/2014/727049. [23] Habib MAB, Yusoff FM, Phang SM., Ang KJ, & Mohamed S. (1997) Nutritional values of chironomid larvae grown in palm oil mill effluent and algal culture. Aquaculture, 158(1-2): 95-105. https://doi.org/http://dx.doi.org/10.1016/S0044-8486(97)00176-2. [24] Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, Mitchell AL. (2017) InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Research, 45(D1): D190-D199. https://doi.org/10.1093/nar/gkw1107. [25] Kent WJ. (2002) BLAT — The BLAST -Like Alignment Tool. Genome Research, 12: 656- 664. https://doi.org/10.1101/gr.229202. 11 << /ASCII85EncodePages false /AllowTransparency false /AutoPositionEPSFiles false /AutoRotatePages /None /Binding /Left /CalGrayProfile (Gray Gamma 2.2) /CalRGBProfile (None) /CalCMYKProfile (None) /sRGBProfile (sRGB IEC61966-2.1) /CannotEmbedFontPolicy /Warning /CompatibilityLevel 1.7 /CompressObjects /Off /CompressPages true /ConvertImagesToIndexed true /PassThroughJPEGImages true /CreateJobTicket false /DefaultRenderingIntent /Default /DetectBlends true /DetectCurves 0.0000 /ColorConversionStrategy /LeaveColorUnchanged /DoThumbnails false /EmbedAllFonts true /EmbedOpenType false /ParseICCProfilesInComments true /EmbedJobOptions true /DSCReportingLevel 0 /EmitDSCWarnings false /EndPage -1 /ImageMemory 1048576 /LockDistillerParams true /MaxSubsetPct 100 /Optimize false /OPM 0 /ParseDSCComments false /ParseDSCCommentsForDocInfo false /PreserveCopyPage true /PreserveDICMYKValues true /PreserveEPSInfo false /PreserveFlatness true /PreserveHalftoneInfo true /PreserveOPIComments false /PreserveOverprintSettings true /StartPage 1 /SubsetFonts false /TransferFunctionInfo /Remove /UCRandBGInfo /Preserve /UsePrologue false /ColorSettingsFile () /AlwaysEmbed [ true ] /NeverEmbed [ true ] /AntiAliasColorImages false /CropColorImages true /ColorImageMinResolution 200 /ColorImageMinResolutionPolicy /OK /DownsampleColorImages true /ColorImageDownsampleType /Bicubic /ColorImageResolution 300 /ColorImageDepth -1 /ColorImageMinDownsampleDepth 1 /ColorImageDownsampleThreshold 1.50000 /EncodeColorImages true /ColorImageFilter /DCTEncode /AutoFilterColorImages false /ColorImageAutoFilterStrategy /JPEG /ColorACSImageDict << /QFactor 0.76 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /ColorImageDict << /QFactor 0.76 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /JPEG2000ColorACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 15 >> /JPEG2000ColorImageDict << /TileWidth 256 /TileHeight 256 /Quality 15 >> /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 200 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages false /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict << /QFactor 0.76 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /GrayImageDict << /QFactor 0.76 /HSamples [2 1 1 2] /VSamples [2 1 1 2] >> /JPEG2000GrayACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 15 >> /JPEG2000GrayImageDict << /TileWidth 256 /TileHeight 256 /Quality 15 >> /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 400 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 600 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict << /K -1 >> /AllowPSXObjects false /CheckCompliance [ /None ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (None) /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False /CreateJDFFile false /Description << /ARA /BGR /CHS /CHT /CZE /DAN /DEU /ESP /ETI /FRA /GRE /HEB /HRV /HUN /ITA (Utilizzare queste impostazioni per creare documenti Adobe PDF adatti per visualizzare e stampare documenti aziendali in modo affidabile. I documenti PDF creati possono essere aperti con Acrobat e Adobe Reader 6.0 e versioni successive.) /JPN /KOR /LTH /LVI /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken waarmee zakelijke documenten betrouwbaar kunnen worden weergegeven en afgedrukt. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 6.0 en hoger.) /NOR /POL /PTB /RUM /RUS /SKY /SLV /SUO /SVE /TUR /UKR /ENU (Use these settings to create Adobe PDF documents suitable for reliable viewing and printing of business documents. Created PDF documents can be opened with Acrobat and Adobe Reader 6.0 and later.) >> >> setdistillerparams << /HWResolution [600 600] /PageSize [595.440 841.680] >> setpagedevice