Engineering, Technology & Applied Science Research Vol. 13, No. 2, 2023, 10571-10577 10571 www.etasr.com Hiremath & Dayananda: Differential Gene Expression Analysis of Non-Small Cell Lung Cancer …

Differential Gene Expression Analysis of Non-Small Cell Lung Cancer Samples to Classify Candidate Genes

Neelambika B. Hiremath
Department of Computer Science and Engineering, JSS Academy of Technical Education, Bengaluru, India
neelambika@ieee.org

Pruthviraja Dayananda
Department of Information Science and Engineering, JSS Academy of Technical Education, Bengaluru, India
dayanandap@gmail.com (corresponding author)

Received: 9 February 2023 | Revised: 25 February 2023 | Accepted: 4 March 2023

ABSTRACT
Differential gene expression is an analysis of gene data, in which the RNA sequence data after next-generation sequencing are to be visualized for any quantitative changes in the levels of the experimental data set. This work aims to derive the transcript statistics on a gene transcript file with a fold change of genes on a normalized scale, in order to identify quantitative changes in gene expression of the difference between the reference genome and Non-Small Cell Lung Cancer (NSCLC) samples. This insight makes a clinical impact in assessing and characterizing candidate genes. The pipeline comprises tuxedo protocol and programming language R with the standard ballgown package. The resultant data set and the plot displays depict the candidate genes in their respective location which are significant in expressing their changes in NSCLC samples. The samples are compared with prominent gene labels of NSCLC samples. The results explain the differential expression of particular samples across samples from both genders.

Keywords-differential gene expression; next-generation sequencing; RNA sequence; machine learning; non-small cell lung cancer; classification

I. INTRODUCTION
Lung cancer is the leading type of cancer worldwide.
About 1.76 million deaths were caused by lung cancer during 2019. The disease is caused by lung malignancies. Despite many advances in lung cancer therapies, the prognosis of the disease remains unfavourable [1], because the five-year survival rate after diagnosis is less than 15%. Understanding the molecular characteristics that identify cancer will speed up the diagnosis. Cancer is one of the most challenging diseases [2]. The recent advancement of next-generation sequencing of human genes has created opportunities to use gene expression information for decision making [3]. Lung cancer can be categorised into NSCLC, which accounts for about 85% of the cases, and small cell lung cancer (SCLC). The study of molecular signatures shows that carcinogenesis is caused by oncogenic drivers. This has led research towards oncogenic driver identification rather than clinical parameter studies [4]. The most common types of NSCLC are squamous cell carcinoma, large-cell carcinoma, and adenocarcinoma, with several other types occurring less frequently. All types can occur in unusual histologic variants and as mixed cell-type combinations [5]. The lack of early detection of NSCLC leads to an increased mortality rate, specifically in developing countries. Cancer is globally considered a leading cause of death [6]. It was reported that 1 in 6 deaths in 2018 was due to cancer. Lung cancer accounts for about 2.09 million cases worldwide. In India, about 70 thousand cases of lung cancer are reported every year [7]. Depending on the stage of cancer, the general condition of the person, age, reaction to chemotherapy, and other considerations, such as the possible side effects of the procedure, more than one method of treatment is used.
Usually, NSCLC patients are categorized as patients with early, non-metastatic disease (stages I and II, and select stage III tumors) and patients with locally advanced thoracic cavity-bound disease (e.g. large tumors). Several facets of cancer science are now being reshaped by the many implementations and uses of Next Generation Sequencing (NGS) technologies. Given aberrant mutagenesis and its contribution to carcinogenesis, these technologies have enabled researchers to explain changes at any level of regulation carried out in a cell [8]. RNA sequencing has become a widely used tool to understand and characterize the genes involved in cancer diagnostics and therapy. The reduced cost of sequencing and its advantages over microarrays have made it widely accessible, which is shortening the time taken to detect and treat cancer patients. RNA sequencing provides a comprehensive view of the structure of the transcriptome [9]. It does not depend on any prior sequence knowledge and it can detect structural variations such as alternative splicing events and gene fusions, but data storage and analysis are more complex, there is no single standard protocol, and the process can still be expensive. The most significant factor in NSCLC is that its identification is delayed, causing NSCLC to have the lowest survival rate. Researchers have concluded that faster diagnosis is possible by understanding gene expression at a molecular level, as gene sequence expression plays a very significant role in NSCLC [10]. As the latest NGS technologies provide a comprehensive genome structure with high throughput, identifying the candidate genes by analyzing the RNA sequence data set allows focusing on those sets of genes.
The candidate genes mark a subset of biomarkers [11] or signatures of this biological condition. The present study aims to analyze and map the significant mutations in the genes responsible for NSCLC, which might help biomarker identification, leading to early detection and supporting adjuvant therapies in personalised medicine that increase the survival rate. DNA analysis at the nucleotide level is a new research area [12], and research on the identification of biomarkers not only supports medication treatment, but also predicts the proteins which are created and activated after gene expression [13].

II. MATERIALS AND METHODS

A. Datasets
This work is based on data from real biological samples. The dataset is available in the Sequence Read Archive database maintained by the National Center for Biotechnology Information (NCBI). Data from the project SRP117020 were selected to analyze representative samples from both genders, aged over 50 years. The data consisted of RNA sequences from poorly to well-differentiated adenocarcinomas and squamous cell cancers, which were sequenced using Illumina HiSeq 2500 [14]. The details of the sequence runs according to accession number are shown in Table I.

TABLE I. DETAILS ABOUT SEQUENCE ARCHIVE RUN (SRR) ACCESSION, WITH ACTUAL SEQUENCING DATA FROM THE PARTICULAR SEQUENCING EXPERIMENT [15]

| Run | Average spot length | Bases (Gbp) | Size (Gb) | Histology | Date published | Access type | Gender (age > 50 years) |
|---|---|---|---|---|---|---|---|
| SRR6013475 | 199 | 6.3 | 3.86 | Squamous cell carcinoma | 2018-08-03 | Public | Male |
| SRR6013476 | 199 | 6.16 | 3.84 | Squamous cell carcinoma | 2018-08-03 | Public | Male |
| SRR6013477 | 199 | 5.46 | 3.33 | Adenocarcinoma | 2018-08-03 | Public | Male |
| SRR6013479 | 199 | 6.00 | 3.65 | Adenocarcinoma | 2018-08-03 | Public | Male |
| SRR6013492 | 199 | 5.56 | 3.38 | Adenocarcinoma | 2018-08-03 | Public | Female |
| SRR6013502 | 199 | 6.82 | 4.20 | Adenocarcinoma | 2018-08-03 | Public | Female |
| SRR6013508 | 199 | 5.56 | 3.45 | Adenocarcinoma | 2018-08-03 | Public | Female |
| SRR6013509 | 197 | 7.97 | 4.86 | Adenocarcinoma | 2018-08-03 | Public | Female |

B. Data Analysis
The analysis of the dataset started from raw sequencing data in FASTQ format. The reference genome was obtained from the assembly resources of the NCBI.

1) Quality Check
The quality of the RNA-seq reads was validated with the FastQC software [16].

2) Read Alignment and Assembly of Transcripts
RNA-seq reads were aligned to the human reference genome using a fast and sensitive alignment programme called HISAT2 [17]. The visual exploration and differences in expression were obtained using the Ballgown R package. The experiment was carried out using the standard protocol of the tuxedo suite tools, which is useful in the analysis of RNA-seq data. The protocol consists of separate steps that make the analysis of large volumes of raw sequences [18] more convenient. The detailed flow chart of the processes is depicted in Figure 1. The hardware environment utilized to run the tuxedo protocol was a 64-bit computer running Linux with 8 GB of RAM.
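The alignment and assembly steps of this protocol can be sketched as a small command builder. This is a minimal illustration under stated assumptions, not the authors' actual invocation: paired-end input is assumed, all file and index names are hypothetical placeholders, and the StringTie merge step of the published protocol [18] is omitted for brevity. The functions only build the command lines and do not run them.

```python
import shlex


def tuxedo_commands(sample, reads_1, reads_2, index, annotation, threads=8):
    """Build (but do not run) the per-sample alignment and assembly commands.

    All file names are hypothetical placeholders; paired-end input is assumed.
    """
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    return [
        # 1. Spliced alignment; --dta reports alignments suited to transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # 2. Sort the SAM output and write it as compressed BAM.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # 3. Assemble transcripts, guided by the reference annotation (-G).
        ["stringtie", "-p", str(threads), "-G", annotation,
         "-o", f"{sample}.gtf", "-l", sample, bam],
    ]


def ballgown_table_command(sample, merged_gtf, threads=8):
    """StringTie abundance step: -e restricts estimation to the given
    transcripts and -B writes the table files that Ballgown reads."""
    return ["stringtie", "-e", "-B", "-p", str(threads), "-G", merged_gtf,
            "-o", f"ballgown/{sample}/{sample}.gtf", f"{sample}.bam"]


if __name__ == "__main__":
    # Hypothetical file names for one run from Table I.
    for cmd in tuxedo_commands("SRR6013475", "SRR6013475_1.fastq",
                               "SRR6013475_2.fastq", "grch38_index",
                               "grch38_annotation.gtf"):
        print(shlex.join(cmd))
    print(shlex.join(ballgown_table_command("SRR6013475", "stringtie_merged.gtf")))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` once the tools and input files are available, and to repeat the same steps for each of the eight runs.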
3) Differential Gene Expression and Pathway Analysis
The transcripts and expression levels obtained from StringTie were subjected to differential expression analysis, which was performed using the Ballgown package [18]. The package uses statistical methods to identify the differentially expressed genes. The obtained list of all the differentially expressed genes was subjected to pathway and gene ontology analysis. This was carried out using the KOBAS 3.0 annotation module [19]. The annotation module accepts gene sequences in FASTA format or gene IDs as input and presently covers 5944 different species.

4) Enrichment Analysis
From the pathway analysis results, only genes involved in the NSCLC disease pathway were taken, and functional enrichment analysis was performed using the KOBAS 3.0 enrichment module [19]. Enrichment gives the gene ontologies that are statistically significant. The output is based on the hypergeometric distribution.

5) Obtaining the Candidate Genes
To obtain the main candidate genes responsible for NSCLC, the mutations responsible for NSCLC were obtained from the ClinVar database [20]. The genes involved in mutations were taken into account and their corresponding expression values obtained from the Ballgown package were evaluated to shortlist the candidate genes.

Fig. 1. The protocol to produce tables and plots and obtain differentially expressed genes.

III. RESULTS

A. Data Analysis
The experimental data were analyzed as shown in Table II.

1) Quality Check
Analysis of the raw reads with FastQC gave a quick impression of the dataset with various features including sequence quality, GC content, N content, and basic statistics. A brief overview of the obtained results is shown in Table II.
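To make the quality metrics above concrete, the following sketch shows how a Phred score relates to a base-calling error probability and how per-read quality and GC content can be derived from a FASTQ record. It is an illustration only, not FastQC's implementation: the Phred+33 quality encoding is an assumption about the input files, the function names are hypothetical, and the example record is made up.

```python
def phred_to_error_probability(q):
    """Phred definition: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_line, offset=33):
    """Decode one FASTQ quality line into integer Phred scores.

    The offset of 33 (Phred+33 encoding) is an assumption about the data.
    """
    return [ord(ch) - offset for ch in quality_line]


def gc_percent(sequence):
    """Percentage of G and C among the called (non-N) bases of a read."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(base in "GC" for base in called) / len(called)


if __name__ == "__main__":
    # A made-up four-line FASTQ record, for illustration only.
    record = ["@read1", "GGCCAATTNN", "+", "IIIIIIII##"]
    scores = decode_qualities(record[3])
    print("mean Phred score:", sum(scores) / len(scores))
    print("GC content (%):", gc_percent(record[1]))
    print("error probability at Q30:", phred_to_error_probability(30))
```

Under this definition, a per-base score of 30 corresponds to roughly one expected error in a thousand base calls, which is why the score ranges in Table II are read as indicating good overall read quality.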
2) Read Alignment and Assembly of Transcripts
Aligning the reads to the reference genome is a very important step. When the alignment is poor, transcriptome reconstruction becomes difficult, especially for genes expressed at lower levels. The alignment of the reads to the human reference genome gave SAM files with alignment rates above 70%, which were later converted to BAM files, the compressed binary version of SAM. When the alignment files were subjected to transcriptome reconstruction using StringTie, annotation files were obtained with the expression levels of all genes and transcripts.

3) Differential Expression and Pathway Analysis
The relevant packages under Ballgown were loaded, and the phenotype data, containing the sample ID and gender of each sample, were loaded in .csv format. Ballgown gives two tables with differentially expressed genes and transcripts between genders, each reporting three types of values. First, the fold change, which is the ratio between the expression levels in male and female samples: values <1 indicate that the gene is expressed at a lower level and values >1 indicate that it is expressed at a higher level. Second, the p-value, which is the probability of observing data at least as extreme as those obtained if no difference existed. Third, the q-value, which is the p-value adjusted by controlling the false discovery rate. The gene abundance, measured in FPKM (fragments per kilobase of exon model per million mapped reads), is visualised across all samples in Figure 2. The y-axis shows the log2 transformation of FPKM and the x-axis shows the accession numbers of the data.

Fig. 2. Distribution of FPKM values across 8 samples.

TABLE II. OVERVIEW OF THE FASTQC RESULTS [15]

| Metric | SRR6013475 | SRR6013476 | SRR6013477 | SRR6013479 | SRR6013492 | SRR6013502 | SRR6013509 | SRR6013508 |
|---|---|---|---|---|---|---|---|---|
| Basic stats (total sequences) | 31,505,786 | 30,890,798 | 27,486,767 | 30,087,832 | 27,904,343 | 34,210,392 | 40,279,600 | 27,867,756 |
| Per base sequence quality average score for all bases | 32-40 | 34-40 | 32-40 | 34-40 | 34-40 | 34-40 | 34-40 | 34-39 |
| Sequence quality score (Phred mean) | Ave 2-32 | Ave 2-30 | Ave 2-32 | Ave 2-30 | Ave 2-30 | Ave 2-30 | Ave 2-28 | Ave 2-28 |
| Sequence GC content (%) | 46 | 48 | 47 | 45 | 44 | 45 | 43 | 45 |
| Base N content | 0-2 | Nil | Nil | Nil | Nil | Nil | Nil | Nil |

Additionally, the structure and expression levels of the isoforms of KRAS were obtained from Ballgown and are shown in Figures 3-4. The KRAS gene belongs to a set of genes known as oncogenes. When mutated, oncogenes have the potential to cause normal cells to become cancerous [21]. Figure 3 shows the structure and isoforms of the KRAS gene in sample SRR6013502. The expression levels are shown in varying colors (from yellow to red). The isoform expressed at a higher level is shown in a darker shade. The x-axis indicates the genomic position of the KRAS gene. Figure 4 compares the expression and structure of the KRAS gene in male and female samples. The y-axis indicates the genomic position. The darker color indicates a higher expression level, which means the KRAS gene is expressed at a higher rate in females than in males. When all the differentially expressed genes were subjected to pathway analysis, setting the organism as Homo sapiens and the method as gene symbol ID mapping, a total of 5071 genes in different pathways were involved.
The results showed different sections of output for a particular gene: the pathways the gene was involved in, the diseases the gene was involved in, and the gene ontology indicating the biological functions the gene was related to.

Fig. 3. Structure and expression level of KRAS gene in SRR6013502.

4) Enrichment Analysis
Out of the 5702 hits from the pathway analysis, the genes specifically related to NSCLC were chosen and were again subjected to pathway analysis to see their commonality with breast cancer and small cell lung cancer (SCLC). The complete pathway analysis of these genes is provided as a supplementary document.

Fig. 4. The structure and expression of KRAS in male and female samples.

The genes responsible for NSCLC and their involvement in breast cancer and SCLC are indicated in Figure 5 and Table III. Figure 5 indicates that 20% of the genes are involved in both NSCLC and breast cancer, 22% are involved in NSCLC and SCLC, and 58% are involved only in NSCLC. The genes given in Table III were subjected to enrichment analysis, and a detailed table of the functional enrichment was computed for the genes involved in NSCLC. The enrichment analysis and its corrected p-values are based on a hypergeometric distribution, which is used for over-representation analysis of genes. The barplot representation of functional enrichment is shown in Figure 6. Each row represents an enriched function, and the length of the bar represents the enrichment ratio, which is calculated as input gene number/background gene number. The indicated color codes are based on the module number, which has been assigned based on the enrichment ratio. Functional annotations with the same module number are grouped and indicated by the same color.

Fig. 5. Pie chart for genes involved in different cancers.
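The over-representation test described above can be sketched in a few lines. This is a minimal illustration rather than the KOBAS implementation: the function names are hypothetical, and the use of the Benjamini-Hochberg procedure for the corrected p-values is an assumption, since the text does not state which correction was applied.

```python
from math import comb


def hypergeom_enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_ratio(input_gene_number, background_gene_number):
    """Enrichment ratio as defined in the text: input genes / background genes."""
    return input_gene_number / background_gene_number


def benjamini_hochberg(pvalues):
    """Adjust p-values with the Benjamini-Hochberg step-up procedure
    (assumed here; other corrections are possible)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for three terms: (k, n, K, N).
    terms = [(5, 29, 60, 20000), (2, 29, 400, 20000), (1, 29, 900, 20000)]
    raw = [hypergeom_enrichment_pvalue(*t) for t in terms]
    for t, p, q in zip(terms, raw, benjamini_hochberg(raw)):
        print(t, "ratio:", enrichment_ratio(t[0], t[2]), "p:", p, "corrected p:", q)
```

The counts in the usage example are invented for illustration; in practice `k` and `K` come from the annotation of the input gene list and of the background gene set, respectively.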
TABLE III. GENES INVOLVED IN NSCLC AND THEIR COMMONALITY WITH BREAST CANCER AND SCLC. THE HIGHLIGHTED GENES ARE THE ONES INVOLVED ONLY IN NSCLC

| Gene name | Associated diseases |
|---|---|
| ABCC4 | Breast cancer, NSCLC |
| AGER | NSCLC |
| AKT1 | Breast cancer, NSCLC |
| AKT2 | NSCLC |
| ARAF | Breast cancer, NSCLC |
| BAD | NSCLC |
| BAK1 | NSCLC, SCLC |
| BAX | NSCLC, SCLC |
| BRAF | NSCLC, SCLC |
| CASP9 | NSCLC, SCLC |
| CCND1 | NSCLC, SCLC |
| CDK4 | NSCLC, SCLC |
| CDK6 | NSCLC, SCLC |
| CDKN1A | NSCLC, SCLC |
| DCBLD1 | NSCLC, SCLC |
| DDB2 | NSCLC, SCLC |
| DLST | NSCLC, SCLC |
| E2F3 | NSCLC, SCLC |
| EGFR | Breast cancer, NSCLC, SCLC |
| EIF4E2 | NSCLC, SCLC |
| EML4 | Breast cancer, NSCLC, SCLC |
| ERBB2 | Breast cancer, NSCLC, SCLC |
| ETS2 | Breast cancer, NSCLC, SCLC |
| FHIT | NSCLC, SCLC |
| FOXO3 | NSCLC, SCLC |
| KRAS | NSCLC, SCLC |
| MAP2K1 | NSCLC, SCLC |
| NRAS | NSCLC, SCLC |
| GADD45A | NSCLC, SCLC |

The circular enrichment network of the functional annotation is shown in Figure 7. Each node represents an enriched term, and the edges represent the connections between two enriched terms that have a gene-overlap ratio above a specific cut-off (default > 0.5). Modules shown in different colors represent nodes belonging to specific topological communities in the structured network, which were defined using the Infomap algorithm. The network is arranged in a circular layout according to the gravity of the two nodes. The color of the bar in the bar plot is the same as the color in the circular network, which represents different modules and their interactions [19].

5) Obtaining the Candidate Genes
The results obtained from the ClinVar database considered only single nucleotide polymorphisms (SNPs) and the corresponding expression analysis from Ballgown. The genes BRAF, NRAS, KRAS, EGFR, and MAP2K1 were each found to be involved in multiple SNPs associated with NSCLC, which gives a broad picture of the SNPs involved.
Table III provides detailed information about the nucleotide change and the corresponding amino acid change, chromosome number and the position in the human genome (GRCh38). The expression values of genes involved in mutations are highlighted in Table IV.

TABLE IV. GENES INVOLVED IN MULTIPLE SNPS (AMINO ACID CHANGES)

| Gene | Fold change | p-value | q-value |
|---|---|---|---|
| BRAF | 0.8108669 | 0.5386988 | 0.9979959 |
| MAP2K1 | 1.004441 | 0.9728909 | 0.9993876 |
| KRAS | 0.7801372 | 0.2109766 | 0.9979959 |
| EGFR | 0.4406753 | 0.1661845 | 0.9979959 |
| NRAS | 1.0061834 | 0.9705951 | 0.9993876 |

Fig. 6. Barplot representation of functional enrichment.

Fig. 7. Circular enrichment network.

IV. DISCUSSION
The epidermal growth factor receptor (EGFR) was the first oncogenic target uncovered in NSCLC. 40% of Asian patients have EGFR mutations, compared to 11-17% of Caucasian patients. Almost all EGFR mutations involve exons 18 to 21. Around 40-50% of EGFR mutations are represented by small in-frame deletions in exon 19 (del 19), whereas p.Leu858Arg amino acid substitutions in exon 21 account for 30-40% [22]. BRAF mutations are found in 2%-8% of people with NSCLC. The BRAF exon 15 p.Val600Glu activating mutation accounts for 50% of all BRAF mutations. Additional mutations are found in exons 11 and 15 and are classified as either activating (p.Gly469X, p.Leu597Arg, or p.Lys601Glu) or inactivating (p.Gly466Val, p.Asp594X, p.Gly596Cys). As predicted by melanoma results [23], single-agent BRAF inhibitors (i.e. vemurafenib or dabrafenib) elicit cell cycle arrest and death in p.Val600Glu-mutated NSCLC. KRAS-activating mutations are detected in around 30% of the cancer population and are being employed as an exclusion biomarker. In smokers, KRAS-mutated tumours are more prevalent and host other drug-related drivers less frequently.
The precise type of KRAS mutation may provide information regarding the aggressiveness of a disease or its sensitivity to certain drugs [24]. For instance, the G12D mutation in NSCLC has been linked to a more favourable prognosis than the G12V or G12R variants [25]. MAP2K1 mutations are infrequent in NSCLC and are assumed to be mutually exclusive with known driver mutations. MEK1 cascade activation is believed to play a major role in the resistance to selective treatment regimens, and the MAP2K1 K57N mutation has been associated with resistance in preclinical models [26]. The fold change values for MAP2K1 and NRAS are greater than 1, indicating that these two genes are expressed at a higher level than BRAF, KRAS, and EGFR. BRAF and KRAS are also expressed with values close to 1, although EGFR has a relatively low expression value. PIK3CA, ROS1, and FGFR1 are additional prevalent genes whose mutations are associated with NSCLC, with PIK3CA and FGFR1 being much more expressed than ROS1. The functional enrichment employed in the research focuses on a novel classification strategy for biomarker genes across three major diseases: breast cancer, NSCLC, and SCLC.

V. CONCLUSION
The differential gene expression of the human genome (GRCh38) identifies the most frequently mutated genes as candidates. Mutated genes BRAF, NRAS, KRAS, EGFR, MAP2K1, PIK3CA, MET, ROS1, and FGFR1 were found to be expressed more in the selected data of lung adenoma cancer patients. The knowledge gained from genomic profiling can be utilized to clinically correlate and target medicine for diseases caused by genomic abnormalities. By examining the gene mapping for the specific activity, it is possible to classify these medicines as tumour suppressors or chemotherapy resistant. To comprehend and categorize the common genes associated with breast cancer and other malignancies, the employed enrichment technique has identified mutant genes as candidates.
After a comprehensive validation of mutant genes, potential genes were identified. The discovery of a drug's interactions is prompted by a process of validation that must be carried out to match clinical therapy. The process of chemotherapy and tumor suppression can be revisited with a decreased dose to make the therapies less hazardous.

REFERENCES
[1] D. Wang et al., "The predictive effect of the systemic immune-inflammation index for patients with small-cell lung cancer," Future Oncology, vol. 15, no. 29, pp. 3367–3379, Oct. 2019, https://doi.org/10.2217/fon-2019-0288.
[2] N. Behar and M. Shrivastava, "A Novel Model for Breast Cancer Detection and Classification," Engineering, Technology & Applied Science Research, vol. 12, no. 6, pp. 9496–9502, Dec. 2022, https://doi.org/10.48084/etasr.5115.
[3] S. Garinet, P. Laurent-Puig, H. Blons, and J.-B. Oudart, "Current and Future Molecular Testing in NSCLC, What Can We Expect from New Sequencing Technologies?," Journal of Clinical Medicine, vol. 7, no. 6, Jun. 2018, Art. no. 144, https://doi.org/10.3390/jcm7060144.
[4] A. El-Telbany and P. C. Ma, "Cancer Genes in Lung Cancer: Racial Disparities: Are There Any?," Genes & Cancer, vol. 3, no. 7–8, pp. 467–480, Jul. 2012, https://doi.org/10.1177/1947601912465177.
[5] S. A. Kenfield, E. K. Wei, M. J. Stampfer, B. A. Rosner, and G. A. Colditz, "Comparison of aspects of smoking among the four histological types of lung cancer," Tobacco Control, vol. 17, no. 3, pp. 198–204, Jun. 2008, https://doi.org/10.1136/tc.2007.022582.
[6] J. R. Molina, P. Yang, S. D. Cassivi, S. E. Schild, and A. A. Adjei, "Non-Small Cell Lung Cancer: Epidemiology, Risk Factors, Treatment, and Survivorship," Mayo Clinic Proceedings, vol. 83, no. 5, pp. 584–594, May 2008, https://doi.org/10.4065/83.5.584.
[7] "Fact sheets," WHO. https://www.who.int/news-room/fact-sheets.
[8] D. E. Dupuy and M. Shulman, "Current Status of Thermal Ablation Treatments for Lung Malignancies," Seminars in Interventional Radiology, vol. 27, no. 3, pp. 268–275, Sep. 2010, https://doi.org/10.1055/s-0030-1261785.
[9] S.-S. Han et al., "RNA sequencing identifies novel markers of non-small cell lung cancer," Lung Cancer, vol. 84, no. 3, pp. 229–235, Jun. 2014, https://doi.org/10.1016/j.lungcan.2014.03.018.
[10] S. Coco et al., "Next generation sequencing in non-small cell lung cancer: new avenues toward the personalized medicine," Current Drug Targets, vol. 16, no. 1, pp. 47–59, 2015, https://doi.org/10.2174/1389450116666141210094640.
[11] S. Cheng et al., "Predicting the regrowth of clinically non-functioning pituitary adenoma with a statistical model," Journal of Translational Medicine, vol. 17, no. 1, May 2019, Art. no. 164, https://doi.org/10.1186/s12967-019-1915-2.
[12] V. Mero and D. Machuve, "The Usability Testing of SSAAT, a Bioinformatic Web Application for DNA Analysis at a Nucleotide Level," Engineering, Technology & Applied Science Research, vol. 11, no. 3, pp. 7075–7078, Jun. 2021, https://doi.org/10.48084/etasr.4107.
[13] S. Tahzeeb and S. Hasan, "A Neural Network-Based Multi-Label Classifier for Protein Function Prediction," Engineering, Technology & Applied Science Research, vol. 12, no. 1, pp. 7974–7981, Feb. 2022, https://doi.org/10.48084/etasr.4597.
[14] S. Bakr et al., "A radiogenomic dataset of non-small cell lung cancer," Scientific Data, vol. 5, no. 1, Oct. 2018, Art. no. 180202, https://doi.org/10.1038/sdata.2018.202.
[15] N. B. Hiremath and P. Dayananda, "Identification and Characterization of SNP Mutation in Genes Related to Non-small Cell Lung Cancer," Current Signal Transduction Therapy, vol. 16, no. 3, pp. 253–261, http://doi.org/10.2174/1574362415999200819202218.
[16] E. Frenkel, "Gauge theory and Langlands duality," Astérisque, vol. 332, no. 1010, pp. 369–403, 2010.
[17] D. Kim, J. M. Paggi, C. Park, C. Bennett, and S. L. Salzberg, "Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype," Nature Biotechnology, vol. 37, no. 8, pp. 907–915, Aug. 2019, https://doi.org/10.1038/s41587-019-0201-4.
[18] M. Pertea, D. Kim, G. M. Pertea, J. T. Leek, and S. L. Salzberg, "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown," Nature Protocols, vol. 11, no. 9, pp. 1650–1667, Sep. 2016, https://doi.org/10.1038/nprot.2016.095.
[19] C. Xie et al., "KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases," Nucleic Acids Research, vol. 39, no. suppl_2, pp. W316–W322, Jul. 2011, https://doi.org/10.1093/nar/gkr483.
[20] M. J. Landrum et al., "ClinVar: public archive of relationships among sequence variation and human phenotype," Nucleic Acids Research, vol. 42, no. D1, pp. D980–D985, Jan. 2014, https://doi.org/10.1093/nar/gkt1113.
[21] S. Jančík, J. Drábek, D. Radzioch, and M. Hajdúch, "Clinical Relevance of KRAS in Human Cancers," BioMed Research International, vol. 2010, Jun. 2010, Art. no. e150960, https://doi.org/10.1155/2010/150960.
[22] F. Barlesi et al., "Routine molecular profiling of patients with advanced non-small-cell lung cancer: results of a 1-year nationwide programme of the French Cooperative Thoracic Intergroup (IFCT)," The Lancet, vol. 387, no. 10026, pp. 1415–1426, Apr. 2016, https://doi.org/10.1016/S0140-6736(16)00004-0.
[23] C. S. Baik, N. J. Myall, and H. A. Wakelee, "Targeting BRAF‐Mutant Non‐Small Cell Lung Cancer: From Molecular Profiling to Rationally Designed Therapy," The Oncologist, vol. 22, no. 7, pp. 786–796, Jul. 2017, https://doi.org/10.1634/theoncologist.2016-0458.
[24] P. A. Jänne et al., "Selumetinib Plus Docetaxel Compared With Docetaxel Alone and Progression-Free Survival in Patients With KRAS-Mutant Advanced Non–Small Cell Lung Cancer: The SELECT-1 Randomized Clinical Trial," JAMA, vol. 317, no. 18, pp. 1844–1853, May 2017, https://doi.org/10.1001/jama.2017.3438.
[25] M. Román et al., "KRAS oncogene in non-small cell lung cancer: clinical perspectives on the treatment of an old target," Molecular Cancer, vol. 17, no. 1, Feb. 2018, Art. no. 33, https://doi.org/10.1186/s12943-018-0789-x.
[26] M. Scheffler et al., "Co-occurrence of targetable mutations in Non-small cell lung cancer (NSCLC) patients harboring MAP2K1 mutations," Lung Cancer, vol. 144, pp. 40–48, Jun. 2020, https://doi.org/10.1016/j.lungcan.2020.04.020.