Please check the attached documents. - Biology
Grading Rubric for Bioinformatics Midterm

Student:

Criterion | Points Available | Points Earned
Format: Is the midterm written in the style of a term paper, and does it compare and contrast the two papers read for the assignment? | 10 |
Comprehension: Does the midterm provide evidence that the author comprehends key concepts presented in the papers, including, for example, the research questions addressed and the major findings? | 10 |
Synthesis: Does the midterm present a combination of ideas that, as a whole, reflects on the importance or relevance of the papers within their field or for society? | 10 |
Bioinformatics: Does the midterm include discussion of key bioinformatic techniques, software, and/or databases that facilitated the research? | 10 |
Technical: Does the midterm provide evidence that the writer has a fundamental technical understanding of the methods used? | 10 |
Figures: Does the midterm include discussion of key figures that are central to the papers read, and does it explain how the bioinformatic methods used enabled the formation of those figures? Is an opinion on the effectiveness/quality of the figures provided? | 10 |
Distinction: Does the midterm clearly distinguish the source of the various details discussed? | 10 |
Connection: Does the midterm link the overall importance of the scientific findings with a perspective that has relevance for the writer/reader? | 10 |
Conclusions: Does the midterm provide concluding statements that reflect on the key themes presented in the writing? | 10 |
Writing Style: Is the writing concise? Is it easy to understand, and are logical connections between ideas made? Overall, does the writer use correct spelling and proper grammar? | 10 |
Point Total | 100 |

BIOL 57601 Bioinformatics Midterm Exam

Papers: Read and consider the following papers, which are found in the Bioinformatics Brightspace ‘Midterm Exam’ content folder:

Dong, X., Huang, D., Yi, X., Zhang, S., Wang, Z., Yan, B., Chung Sham, P., Chen, K., & Jun Li, M. (2020). Diversity spectrum analysis identifies mutation-specific effects of cancer driver genes. Communications Biology, 3: e6. https://doi.org/10.1038/s42003-019-0736-4

Zhang, J., Hu, H., Xu, S., Jiang, H., Zhu, J., Qin, E., He, Z., & Chen, E. (2020). The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Frontiers in Genetics, 11: e17. https://doi.org/10.3389/fgene.2020.00017

Optional background:
https://www.cdc.gov/genomics/about/precision_med.htm
Stratton, M. R., Campbell, P. J., & Futreal, P. A. (2009). The cancer genome. Nature, 458: 719–724. https://doi.org/10.1038/nature07943

Midterm Exam Assignment:

Focus on research:
Each of the following questions should be addressed in a 2–3 page paper that will be turned in on Sunday one week from today (due 24 Oct. @ 10:30pm CST). Compare and contrast the papers of Zhang et al. 2020 and Dong et al. 2020. In essay format, please address the following questions:
Why do the introductions from these papers suggest there is a need to carry out these types of studies?
What scientific question/s was/were asked in each paper, and why did the researchers select the particular areas of research that were the focus of each of these papers?
What are two major findings from each paper? Which figures from these papers most directly reflect these major findings? Why do you say this?
How are the papers similar? How are they different?

Focus on bioinformatics:
What general bioinformatic methods were used in each of these papers, and what specific programs/methods were used?
Were any specialized databases used? If so, what were they specifically, and how were they used?
Consider key figures (see above) that are related to central parts of the stories that both of these papers are telling, name the specific figures you will be focusing on, and then discuss how the bioinformatic methods enabled the formation of those figures. What elements of these figures help them tell the story?
Do you consider the figures to be well executed, or are there elements that you would improve on?

Wrap-up:
Synthesize all the information you have reviewed here. What is the overall significance of these papers, and what are the implications of this work in the current era?

The cancer genome

Michael R. Stratton (1, 2), Peter J. Campbell (1, 3), and P. Andrew Futreal (1)

1 Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
2 Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK
3 Department of Haematology, University of Cambridge, Cambridge CB2 2XY, UK

Abstract
All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.

Cancer is responsible for one in eight deaths worldwide1. It encompasses more than 100 distinct diseases with diverse risk factors and epidemiology which originate from most of the cell types and organs of the human body and which are characterized by relatively unrestrained proliferation of cells that can invade beyond normal tissue boundaries and metastasize to distant organs. Early insights into the central role of the genome in cancer development emerged in the late nineteenth and early twentieth centuries from studies by David von Hansemann2 and Theodor Boveri3. Examining dividing cancer cells under the microscope, they observed the presence of bizarre chromosomal aberrations.
This led to the proposal that cancers are abnormal clones of cells characterized by and caused by abnormalities of hereditary material. Following the discovery of DNA as the molecular substrate of inheritance4 and determination of its structure5, this speculation was supported by the demonstration that agents that damage DNA and generate mutations also cause cancer6. Subsequently, increasingly refined analyses of cancer cell chromosomes showed that specific and recurrent genomic abnormalities, such as the translocation between chromosomes 9 and 22 in chronic myeloid leukaemia (known as the ‘Philadelphia’ translocation7,8), are associated with particular cancer types. Finally, it was demonstrated that introduction of total genomic DNA from human cancers into phenotypically normal NIH3T3 cells could convert them into cancer cells9,10. Isolation of the specific DNA segment responsible for this transforming activity led to the identification of the first naturally occurring, human cancer-causing sequence change: the single base G > T substitution that causes a glycine to valine substitution in codon 12 of the HRAS gene11,12. This seminal discovery in 1982 inaugurated an era of vigorous searching for the abnormal genes underlying the development of human cancer that continues today.

©2009 Macmillan Publishers Limited. All rights reserved.
Correspondence and requests for materials should be addressed to M.R.S.
Author Information: Reprints and permissions information is available at www.nature.com/reprints.
Europe PMC Funders Group Author Manuscript. Nature. Author manuscript; available in PMC 2010 February 15. Published in final edited form as: Nature. 2009 April 9; 458(7239): 719–724. doi:10.1038/nature07943.

Here we review the principles of our current understanding of cancer genomes.
We look forward to the explosion of information about cancer genomes that is imminent and the insights into the process of oncogenesis that this promises to generate.

Cancer is an evolutionary process

All cancers are thought to share a common pathogenesis. Each is the outcome of a process of Darwinian evolution occurring among cell populations within the microenvironments provided by the tissues of a multicellular organism. Analogous to Darwinian evolution occurring in the origins of species, cancer development is based on two constituent processes: the continuous acquisition of heritable genetic variation in individual cells by more-or-less random mutation, and natural selection acting on the resultant phenotypic diversity. The selection may weed out cells that have acquired deleterious mutations, or it may foster cells carrying alterations that confer the capability to proliferate and survive more effectively than their neighbours. Within an adult human there are probably thousands of minor winners of this ongoing competition, most of which have limited abnormal growth potential and are invisible or manifest as common benign growths such as skin moles. Occasionally, however, a single cell acquires a set of sufficiently advantageous mutations that allows it to proliferate autonomously, invade tissues and metastasize.

The catalogue of somatic mutations in a cancer genome

Like all the cells that constitute the human body, a cancer cell is a direct descendant, through a lineage of mitotic cell divisions, of the fertilized egg from which the cancer patient developed and therefore carries a copy of its diploid genome (Fig. 1). However, the DNA sequence of a cancer cell genome, and indeed of most normal cell genomes, has acquired a set of differences from its progenitor fertilized egg. These are collectively termed somatic mutations to distinguish them from germline mutations that are inherited from parents and transmitted to offspring.
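Because somatic mutations are defined by their absence from the genome the patient inherited, cancer genome studies generally compare a tumour sample with a matched normal sample from the same person. A minimal sketch of that comparison, assuming variants have already been called and written in one consistent representation, is a set difference. The `Variant` record, the `somatic_candidates` function and the example calls are hypothetical; real somatic callers model read counts, tumour purity and sequencing error rather than simply subtracting call sets.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: chromosome, 1-based position, alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumour_calls: Iterable[Variant],
                       normal_calls: Iterable[Variant]) -> List[Variant]:
    """Return tumour variants that are absent from the matched normal sample.

    Variants also present in the normal sample are treated as germline and
    discarded. This is naive set subtraction, not a statistical somatic caller.
    """
    germline = set(normal_calls)
    return sorted(v for v in tumour_calls if v not in germline)


# Hypothetical calls for one patient (positions are made up).
tumour = [
    Variant("chr1", 5_000, "G", "T"),
    Variant("chr2", 1_200, "C", "A"),
    Variant("chr2", 900, "A", "G"),
]
normal = [Variant("chr1", 5_000, "G", "T")]

print(somatic_candidates(tumour, normal))
```

The shared chr1 variant is treated as inherited, leaving the two chr2 variants as somatic candidates.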
The somatic mutations in a cancer cell genome may encompass several distinct classes of DNA sequence change. These include substitutions of one base by another; insertions or deletions of small or large segments of DNA; rearrangements, in which DNA has been broken and then rejoined to a DNA segment from elsewhere in the genome; copy number increases from the two copies present in the normal diploid genome, sometimes to several hundred copies (known as gene amplification); and copy number reductions that may result in complete absence of a DNA sequence from the cancer genome (Fig. 2). In addition, the cancer cell may have acquired, from exogenous sources, completely new DNA sequences, notably those of viruses such as human papilloma virus, Epstein Barr virus, hepatitis B virus, human T lymphotropic virus 1 and human herpes virus 8, each of which is known to contribute to the genesis of one or more types of cancer13. Compared to the fertilized egg, the cancer genome will also have acquired epigenetic changes which alter chromatin structure and gene expression, and which manifest at DNA sequence level by changes in the methylation status of some cytosine residues. Epigenetic changes can be subject to the same Darwinian natural selection as genetic events, provided that there is epigenetic variation in the population of competing cells, that the epigenetic changes are stably heritable from the mother to the daughter cell and that they generate phenotypic effects for selection to act on. Finally, it should not be forgotten that another genome is harboured within the cancer cell. The thousands of mitochondria present each carry a circular genome of approximately 17 kilobases.
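Some of the structural classes listed above can be told apart from very simple features, at least for small variants and copy number states. The sketch below is illustrative only: the function names are hypothetical, it assumes VCF-style alleles in which insertions and deletions share a leading anchor base, and the amplification cut-off of five copies is an arbitrary assumption rather than a standard. Rearrangements and acquired viral sequences need different evidence (for example, read pairs that map to two distant loci or to a viral genome) and are not covered here.

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Label a small variant from its reference and alternate alleles.

    Assumes VCF-style alleles in which insertions and deletions share a
    leading anchor base with the reference (e.g. A -> ATT is an insertion).
    """
    if len(ref) == len(alt):
        return "substitution" if len(ref) == 1 else "multi-base substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def classify_copy_number(copies: int, normal_copies: int = 2,
                         amplification_threshold: int = 5) -> str:
    """Label an absolute copy number relative to the diploid baseline.

    `amplification_threshold` is an illustrative cut-off, not a standard value.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        return "homozygous deletion"  # sequence completely absent
    if copies < normal_copies:
        return "copy number loss"
    if copies >= amplification_threshold:
        return "amplification"
    if copies > normal_copies:
        return "copy number gain"
    return "normal"


print(classify_small_variant("G", "T"), classify_small_variant("A", "ATT"))
print(classify_copy_number(0), classify_copy_number(200))
```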
Somatic mutations in mitochondrial genomes have been reported in many human cancers, although their role in the development of the disease is not clear14.

Acquisition of somatic mutations in cancer genomes

The mutations found in a cancer cell genome have accumulated over the lifetime of the cancer patient. Some were acquired when ancestors of the cancer cell were biologically normal, showing no phenotypic characteristics of a cancer cell (Fig. 1). DNA in normal cells is continuously damaged by mutagens of both internal and external origins. Most of this damage is repaired. However, a small fraction may be converted into fixed mutations, and DNA replication itself has a low intrinsic error rate. Our understanding of somatic mutation rates in normal human cells is still relatively rudimentary. However, it is likely that the mutation rates of each of the various structural classes of somatic mutation differ and that there are differences among cell types too. Mutation rates increase in the presence of substantial exogenous mutagenic exposures, for example tobacco smoke carcinogens, naturally occurring chemicals such as aflatoxins (which are produced by fungi), or various forms of radiation including ultraviolet light. These exposures are associated with increased rates of lung, liver and skin cancer, respectively, and somatic mutations within such cancers often exhibit the distinctive mutational signatures known to be associated with the mutagen15. The rates of the different classes of somatic mutation are also increased in several rare inherited diseases, for example Fanconi anaemia, ataxia telangiectasia, mosaic variegated aneuploidy and xeroderma pigmentosum, each of which is also associated with increased risks of cancer16,17. The rest of the somatic mutations in a cancer cell genome have been acquired during the segment of the cell lineage in which predecessors of the cancer cell already show phenotypic evidence of neoplastic change (Fig. 1).
Whether the somatic mutation rate is always higher during this part of the lineage is controversial18,19. For some cancers this is clearly the case. For example, colorectal and endometrial cancers with defective DNA mismatch repair due to abnormalities in genes such as MLH1 and MSH2 exhibit increased rates of acquisition of single nucleotide changes and small insertions/deletions at polynucleotide tracts20. Other classes of such ‘mutator phenotypes’ may exist, for example leading to abnormalities in chromosome number or increased rates of genomic rearrangement, although these are generally less well characterized20. The merit of an increased somatic mutation rate with respect to the development of cancer is that it increases the DNA sequence diversity on which selection can act. However, it has been suggested that the mutation rates of normal cells may be sufficient to account for the development of some cancers, without the requirement for a mutator phenotype18,19. The course of mutation acquisition need not be smooth, and predecessors of the cancer cell may suddenly acquire a large number of mutations. This is sometimes termed ‘crisis’21, and can occur after attrition of the telomeres that normally cap the ends of chromosomes, with the cell having to substantially reorganize its genome to survive. Although complex and potentially cryptic to decipher, the catalogue of somatic mutations present in a cancer cell therefore represents a cumulative archaeological record of all the mutational processes the cancer cell has experienced throughout the lifetime of the patient. It provides a rich, and predominantly unmined, source of information for cancer epidemiologists and biologists with which to interrogate the development of individual tumours.
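The archaeological record of mutational processes is usually read by tallying single-base substitutions by type and by their immediate sequence context. Writing each substitution from the pyrimidine (C or T) strand collapses the 12 possible base changes into 6 classes, and adding one flanking base on each side gives 6 × 16 = 96 classes. The sketch below assumes the reference trinucleotide around each mutation has already been looked up; the function names and the example mutations are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def substitution_class(context: str, alt: str) -> str:
    """Label a single-base substitution with its flanking bases, e.g. 'T[C>G]A'.

    `context` is the reference trinucleotide centred on the mutated base.
    Changes at G or A are reverse-complemented so that every class is
    written from the pyrimidine (C or T) strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or alt == context[1]:
        raise ValueError("expected a trinucleotide and a different single alt base")
    if context[1] in "GA":
        context, alt = reverse_complement(context), reverse_complement(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_spectrum(mutations: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions per context class across (context, alt) pairs."""
    return Counter(substitution_class(ctx, alt) for ctx, alt in mutations)


# Hypothetical mutations: the first two are the same event seen from opposite strands.
print(mutation_spectrum([("TCA", "G"), ("TGA", "C"), ("ACG", "T")]))
```

With this labelling, an excess of C-to-G changes at cytosines that follow a thymine would show up as raised counts in the T[C>G]N classes.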
Driver and passenger mutations

Each somatic mutation in a cancer cell genome, whatever its structural nature, may be classified according to its consequences for cancer development. ‘Driver’ mutations confer growth advantage on the cells carrying them and have been positively selected during the evolution of the cancer. They reside, by definition, in the subset of genes known as ‘cancer genes’. The remainder of mutations are ‘passengers’ that do not confer growth advantage, but happened to be present in an ancestor of the cancer cell when it acquired one of its drivers (see Box 1). The number of driver mutations, and hence the number of abnormal cancer genes, in an individual cancer is a central conceptual parameter of cancer development, but is not well established. It is highly likely that most cancers carry more than one driver and that the number varies between cancer types. On the basis of age–incidence statistics it has been suggested that common adult epithelial cancers such as breast, colorectal and prostate require 5–7 rate-limiting events, possibly equating to drivers, whereas cancers of the haematological system may require fewer22. These estimates are supported by experimental studies which show that engineering changes in the functions of at least five or six genes in normal primary human cells is necessary to convert them into cancer cells23. However, recent analyses of somatic mutation data from cancers indicate that the number of drivers might be much higher24. Ultimately, direct estimates of the number of drivers in individual cancers will be provided by identifying all the cancer genes and systematically measuring the prevalence of mutations in them.

Box 1 | Driver and passenger mutations

All cancers arise as a result of somatically acquired changes in the DNA of cancer cells.
That does not mean, however, that all the somatic abnormalities present in a cancer genome have been involved in development of the cancer. Indeed, it is likely that some have made no contribution at all. To embody this concept, the terms ‘driver’ and ‘passenger’ mutation have been coined. A driver mutation is causally implicated in oncogenesis. It has conferred growth advantage on the cancer cell and has been positively selected in the microenvironment of the tissue in which the cancer arises. A driver mutation need not be required for maintenance of the final cancer (although it often is) but it must have been selected at some point along the lineage of cancer development shown in Fig. 1. A passenger mutation has not been selected, has not conferred clonal growth advantage and has therefore not contributed to cancer development. Passenger mutations are found within cancer genomes because somatic mutations without functional consequences often occur during cell division. Thus, a cell that acquires a driver mutation will already have biologically inert somatic mutations within its genome. These will be carried along in the clonal expansion that follows and therefore will be present in all cells of the final cancer. Some somatic mutations may actually impair cell survival. These will usually be subject to negative selection and hence be absent from the cancer genome. The traces of negative selection in cancer genomes are currently limited, but it would be surprising if it were not operative.

A central goal of cancer genome analysis is the identification of cancer genes that, by definition, carry driver mutations. A key challenge will therefore be to distinguish driver from passenger mutations. The main strategy generally used exploits a number of structural signatures associated with mutations that are under positive selection. For example, driver mutations cluster in the subset of genes that are cancer genes, whereas passenger mutations are more or less randomly distributed. This has been the approach adopted fruitfully in the past to identify most somatically mutated cancer genes in studies targeted at small regions of the genome. Whole-genome sequencing, however, incorporating analysis of more than 20,000 protein-coding genes and unknown numbers of functional elements in intronic and intergenic DNA, presents a greater challenge, one rendered more daunting by the likelihood that passenger mutations in most cancer genomes substantially outnumber drivers. Because many cancer genes seem to contribute to cancer development in only a small fraction of tumours, large sample sets will have to be analysed to distinguish infrequently mutated cancer genes from genes with random clusters of passenger mutations. Furthermore, it is conceivable that some mutational processes are directed at specific genomic regions and thus generate clusters of passenger mutations that may be mistaken for drivers. Therefore, all such signatures of positive selection need to be interpreted with caution. In practice, however, if used in an informed and critical manner, they will remain effective and reliable guides to the identification of cancer genes. Investigation of the biological consequences of putative driver mutations will often consolidate the evidence implicating them in oncogenesis and will provide insight into the subverted biological processes by which they contribute to cancer development.

One important subclass of driver is a mutation that confers resistance to cancer therapy (Fig. 1). These are typically found in recurrences of cancers that have initially responded to treatment but that are now resistant. Resistance mutations often confer limited growth advantage on the cancer cell in the absence of therapy.
Some seem to predate initiation of treatment, existing as passengers in minor subclones of the cancer cell population until the selective environment is changed by the initiation of therapy25,26. The passenger is then converted into a driver and the resistant subclone preferentially expands, manifesting as the recurrence.

The repertoire of somatically mutated cancer genes

The identification of driver mutations and the cancer genes that they alter has been a central aim of cancer research for more than a quarter of a century. It has been a remarkably successful endeavour, with at least 350 (1.6%) of the ~22,000 protein-coding genes in the human genome reported to show recurrent somatic mutations in cancer with strong evidence that these contribute to cancer development27 (http://www.sanger.ac.uk/genetics/CGP/Census/). Most were identified by first establishing their physical location in the genome through low-resolution genome-wide screens, in particular cytogenetics for chromosomal translocations in leukaemias and lymphomas. A few were discovered using biological assays for transforming activity of whole cancer cell DNA and others through targeted mutational screens guided by biologically well-informed guesswork. Mutations in ~10% of these genes are also found in the germ line, where they confer an increased risk of developing cancer, and these were often initially identified by genetic linkage analysis of affected families. The size of the full repertoire of human cancer genes is a matter of speculation. However, studies in mice have suggested that more than 2,000 genes, when appropriately altered, may have the potential to contribute to cancer development28. The known cancer genes run the gamut of tissue specificities and mutation prevalences. Some, for example TP53 and KRAS, are frequently mutated in diverse types of cancer, whereas others are rare and/or restricted to one cancer type (http://www.sanger.ac.uk/genetics/CGP/cosmic/).
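Calling a gene "recurrently mutated" rests on the Box 1 idea that drivers cluster in cancer genes while passengers are more or less randomly distributed. The simplest version of that test asks whether a gene carries more mutations than expected from a background rate, given its coding length and the number of tumours screened. The sketch below is a deliberately crude null model, assuming one uniform background rate per base; the function names and numbers are hypothetical, and published methods additionally model how background rates vary across genes, samples and sequence contexts. With roughly 20,000 genes tested, a multiple-testing correction is needed (a Bonferroni threshold would be 0.05 / 20,000 = 2.5 × 10^-6), and, as the text warns, regional mutational processes can still produce passenger clusters that pass such a test.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space from k upward."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = log(p), log1p(-p)
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        term = exp(log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += term
        # Past the mean the terms shrink monotonically; stop once negligible.
        if i > n * p and term <= 1e-16 * total:
            break
    return min(total, 1.0)


def recurrence_p_value(observed: int, coding_bases: int, samples: int,
                       background_rate: float) -> float:
    """One-sided p-value that a gene is mutated more often than background.

    Treats each coding base in each tumour as an independent trial mutated
    with probability `background_rate`, a deliberately crude null model.
    """
    return binomial_upper_tail(observed, coding_bases * samples, background_rate)


# Hypothetical gene: 1,500 coding bases screened in 100 tumours, with an
# assumed background rate of one mutation per million bases.
print(recurrence_p_value(6, 1_500, 100, 1e-6))
print(recurrence_p_value(1, 1_500, 100, 1e-6))
```

Six mutations where about 0.15 are expected gives a very small p-value, whereas a single mutation is unremarkable, which is why infrequently mutated cancer genes need large sample sets before they stand out.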
In some cancer types, for example colorectal and pancreatic cancer, abnormalities in several known cancer genes are common. In contrast, in gastric cancer, relatively few mutations in known cancer genes have been reported. Approximately 90% of the known somatically mutated cancer genes are dominantly acting; that is, mutation of just one allele is sufficient to contribute to cancer development. The mutation in such cases usually results in activation of the encoded protein. Ten per cent act in a recessive manner, requiring mutation of both alleles, and the mutations usually result in abrogation of protein function (these are sometimes known as tumour suppressor genes). Patterns of mutation differ between dominant and recessive cancer genes. Recessive cancer genes are characterized by diverse mutation types, ranging from single base substitutions to whole gene deletions, which have the common outcome of abolishing the function of the encoded protein. In each dominantly acting cancer gene, however, the repertoire of cancer-causing somatic mutations is usually more constrained, both with respect to the type of mutation and its location in the gene. Missense amino acid changes (often restricted to certain key amino acids), in-frame insertions and deletions, and gene amplification are all common mutational mechanisms for activating dominantly acting cancer genes. Most, however, are activated through genomic rearrangement.
This may join the sequences of two different genes to create a fusion gene or it may position the cancer gene adjacent to regulatory elements from elsewhere in the genome, resulting in abnormal expression patterns. Most of the known rearranged cancer genes are operative in the relatively rare subset of cancers constituted by leukaemias, lymphomas and sarcomas. Recently, however, rearranged cancer fusion genes were discovered in more than half of prostate cancer cases29 and in lung adenocarcinomas30. Their late discovery probably reflects the difficulty of identifying them amidst the jumble of passenger rearrangements present in many cancer genomes and hints that there are many more rearranged cancer genes to be found in common cancers. Much of what we know about the biological pathways and processes that are subverted in cancer has originated from experiments exploring the functions of cancer genes. Certain gene families, notably the protein kinases, feature particularly prominently among cancer genes. Furthermore, cancer genes cluster on certain signalling pathways. For example, in the classical MAPK/ERK pathway31 upstream mutations are found in cell-membrane-bound receptor tyrosine kinases such as EGFR, ERBB2, FGFR1, FGFR2, FGFR3, PDGFRA and PDGFRB and also in the downstream cytoplasmic components NF1, PTPN11, HRAS, KRAS, NRAS and BRAF. Recent exhaustive mutational analyses in gliomas have indicated that almost all cases have a mutation at one of the genes on these critical signalling pathways32. For some cancers, classification and treatment protocols are now defined by the presence of abnormal cancer genes. Acute myeloid leukaemia, for example, is subclassified on the basis of the presence of abnormalities involving specific cancer genes33. Each subtype has a characteristic gene expression profile, cellular morphology, clinical syndrome, prognosis and opportunity for targeted therapy. 
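The observation that cancer genes cluster on signalling pathways suggests tallying mutations at the pathway level as well as gene by gene. The sketch below reports the fraction of tumours with at least one mutated gene from a pathway, using the MAPK/ERK genes named above as the gene set. The tumour data and function names are hypothetical, and a real analysis would also count amplifications, deletions and rearrangements, not just a list of mutated genes per sample.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set

# Genes named in the MAPK/ERK example in the text.
MAPK_ERK_GENES: FrozenSet[str] = frozenset({
    "EGFR", "ERBB2", "FGFR1", "FGFR2", "FGFR3", "PDGFRA", "PDGFRB",
    "NF1", "PTPN11", "HRAS", "KRAS", "NRAS", "BRAF",
})


def pathway_hit_fraction(mutated_genes: Dict[str, Set[str]],
                         pathway: FrozenSet[str]) -> float:
    """Fraction of samples with at least one mutated gene in `pathway`."""
    if not mutated_genes:
        raise ValueError("no samples supplied")
    hits = sum(1 for genes in mutated_genes.values() if genes & pathway)
    return hits / len(mutated_genes)


def per_gene_counts(mutated_genes: Dict[str, Set[str]],
                    pathway: FrozenSet[str]) -> Counter:
    """Number of samples carrying a mutation in each pathway gene."""
    return Counter(g for genes in mutated_genes.values() for g in genes & pathway)


# Hypothetical tumours and their mutated genes.
samples = {
    "tumour_1": {"KRAS", "TP53"},
    "tumour_2": {"BRAF"},
    "tumour_3": {"TP53", "PTEN"},
    "tumour_4": {"EGFR"},
    "tumour_5": {"NF1", "CDKN2A"},
}
print(pathway_hit_fraction(samples, MAPK_ERK_GENES))
print(per_gene_counts(samples, MAPK_ERK_GENES))
```

In this toy example no pathway gene is mutated in more than one tumour, yet four of the five tumours carry a hit somewhere in the pathway.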
Moreover, because cancer cells are dependent on the abnormal proteins encoded by mutated cancer genes, these proteins have become targets for the development of new cancer therapeutics. Flagships for this new generation of treatments include imatinib, an inhibitor of the proteins encoded by the ABL and KIT genes, which are mutated and activated, respectively, in chronic myeloid leukaemia34 and gastrointestinal stromal tumours35, and trastuzumab, an antibody directed against the protein encoded by ERBB2 (also known as HER2), which is commonly amplified and overexpressed in breast cancer36.

Early systematic sequencing of cancer genomes

Provision of the reference human genome sequence at the turn of the millennium offered new strategies and opportunities for surveying cancer genomes. Rather than depending on low-resolution maps, the highest possible resolution map, the DNA sequence itself, became available and has empowered investigation of cancer genomes in several ways. For example, much higher-resolution arrays have been developed, allowing finer mapping of copy number changes in cancer genomes leading to the identification of several new amplified cancer genes. The availability of the human genome sequence has also raised the possibility that DNA sequencing itself could become the primary tool for exploration of cancer genomes. This has prompted several pilot experiments. So far, most have sequenced large numbers of PCR products to detect the base substitutions and small insertions and deletions (collectively termed ‘point’ mutations) present in the coding exons of protein-coding genes32,37-44.
Typically, such studies have covered several hundred megabases of cancer genome with designs ranging from hundreds of genes analysed in a few hundred cancers to most of the ~22,000 protein-coding genes in 10–20 examples of a particular cancer class. Several insights have been provided by these screens. They have brought success in the identification of point-mutated cancer genes including BRAF45, PIK3CA46, EGFR47, HER2 (ref. 48), JAK2 (ref. 49), UTX (ref. 50) and IDH1 (ref. 41). Some of these were unique discoveries, whereas others were simultaneously discovered in targeted mutational screens. Some were previously known cancer genes, but the discovery of point mutations highlighted new mechanisms and cancer types in which they are operative. Some were surprising and highlight the virtue of systematic and comprehensive screens, for example the discovery of the enzyme isocitrate dehydrogenase (IDH1), which constitutes part of the Krebs cycle of oxidative phosphorylation, as a cancer gene mutated in glioma41. Because many are kinases that are activated by the mutations found in cancer, they have prompted a wave of drug discovery to find inhibitors that may serve as anticancer therapeutics51, some of which are already in clinical trials.

Exposing the landscape of the cancer genome

Important insights into the general parameters and patterns of somatic mutation in cancer have also emerged from these early studies. It appears that most somatic point mutations in cancer genomes are passengers39. Although this might have been predicted for mutations in intergenic and intronic DNA, it applies even in protein-coding exons. There is, however, statistical evidence in favour of many more driver mutations than can be accounted for by known cancer genes.
These drivers appear to be distributed across a large number of genes, each of which is mutated infrequently, suggesting that the repertoire of somatically mutated human cancer genes is much larger than the ~350 currently catalogued39,44. Conceivably, these infrequently mutated cancer genes confer less selective growth advantage on a clone of cancer cells than more commonly mutated cancer genes, but other explanations can also be invoked. Some analyses also indicate that there may be as many as 20 driver mutations in individual cancers, considerably more than the 5–7 previously predicted24. Understanding of the prevalence and types of somatic mutation in cancer genomes has been greatly fostered by these studies. Some cancer genomes carry >100,000 point mutations whereas others have fewer than 1,000. Some of this variation can be accounted for by previous heavy mutagenic exposures or the existence of known DNA repair defects. However, in a subset of breast cancers there are large numbers of C-to-G base substitutions, almost always occurring at cytosines that follow a thymine, for which there is no obvious explanation and for which unknown exposures and/or mutator phenotypes are presumably responsible42,43. The effects of chemotherapy on the cancer genome have also been revealed by systematic sequencing experiments. For example, gliomas that recur after treatment with the DNA alkylating agent temozolomide have been shown to carry huge numbers of mutations with a signature typical of such agents32,52,53. The fact that the mutations could be detected at all indicates that these recurrences are clonal.
Thus, these studies indicate that, although temozolomide extends the patient's lifespan only briefly, almost all cells in a glioma respond, and a single cell that is resistant to the chemotherapy proliferates to form the recurrence. Additional studies guided by these observations led to the identification of the underlying mutated resistance gene (refs 52, 53). Beyond point mutations, some investigations have begun to explore the features of genomic rearrangements in common cancers, about which remarkably little is known. Early studies using conventional Sanger sequencing indicated that there is substantial complexity of rearrangement in these genomes (refs 54, 55). The recent advent of massively parallel, second-generation sequencing technologies has enabled more comprehensive genome-wide screens revealing that some cancer genomes carry hundreds of somatically acquired rearrangements, whereas others carry very few. Moreover, the distinctive patterns of rearrangement found indicate that currently uncharacterized mutational processes may be at work (ref. 56).
Sequencing of cancer genomes in the future
The large-scale, systematic sequencing studies conducted so far have been constrained by the relatively low throughput and high cost of sequencing. They have therefore generally been restricted to components of the cancer genome (for example, coding exons), to small numbers of cancer samples or to a subset of the mutational classes present.
In principle, however, all the structural classes of somatic mutation can be detected genome-wide by randomly fragmenting the cancer genome and sequencing large numbers …
Frontiers in Genetics | www.frontiersin.org
Edited by: Tao Huang, Shanghai Institutes for Biological Sciences (CAS), China
Reviewed by: Jing Feng, Tianjin Medical University General Hospital, China; Xiaoying Huang, Wenzhou Medical University, China
*Correspondence: Zhengfu He; Enguo Chen
Specialty section: This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics
Received: 11 October 2019; Accepted: 07 January 2020; Published: 04 February 2020
Citation: Zhang J, Hu H, Xu S, Jiang H, Zhu J, Qin E, He Z and Chen E (2020) The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Front. Genet. 11:17. doi: 10.3389/fgene.2020.00017
ORIGINAL RESEARCH
The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer
Jisong Zhang1, Huihui Hu1, Shan Xu1, Hanliang Jiang1, Jihong Zhu2, E. Qin3, Zhengfu He4* and Enguo Chen1*
1 Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
2 Department of Anesthesiology, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
3 Department of Respiratory Medicine, Shaoxing People's Hospital (Shaoxing Hospital, Zhejiang University School of Medicine), Shaoxing, China
4 Department of Thoracic Surgery, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
Lung cancer is a common malignancy. Mutations in the Kirsten rat sarcoma oncogene (KRAS) are considered a key driver of lung cancer. KRAS p.G12C mutations are the most prevalent KRAS mutations in non-small-cell lung cancer (NSCLC) and occur in about 11–16% of lung adenocarcinomas (p.G12C accounts for 45–50% of mutant KRAS).
However, it is still unclear how KRAS mutations trigger lung cancer. To study the molecular mechanisms of KRAS mutation in lung cancer, we analyzed the gene expression profiles of 156 KRAS mutation samples and other negative samples with a two-stage feature selection approach: (1) minimal Redundancy Maximal Relevance (mRMR) and (2) Incremental Feature Selection (IFS). Finally, 41 predictive genes for KRAS mutation were identified and a KRAS mutation predictor was constructed. Its leave-one-out cross-validation (LOOCV) Matthews correlation coefficient (MCC) was 0.879. These results help in understanding the roles of KRAS mutation in lung cancer.
Keywords: Kirsten rat sarcoma oncogene (KRAS), mutation, lung cancer, predictor, gene expression
INTRODUCTION
Lung cancer, a malignancy defined by the uncontrolled growth of cells in lung tissue, is a leading cause of cancer death. Each year, 1.3 million people die of lung cancer (Jemal et al., 2006; Jemal et al., 2011). Non-small-cell lung cancer (NSCLC) accounts for more than 85% of diagnosed lung cancer patients (Morgensztern et al., 2010). NSCLC can be further divided into adenocarcinoma, squamous cell carcinoma (SCC), and large cell carcinoma (Sandler et al., 2006; Morgensztern et al., 2010). At present, the pathogenesis of lung cancer is not fully understood, but it is generally believed that one of the most important causes is the accumulation of mutations, including single-nucleotide substitutions, small insertions and deletions, copy-number changes, and chromosomal rearrangements. Moreover, these mutations are closely associated with cell proliferation, invasion, metastasis, and apoptosis (Scagliotti et al., 2008; Liu et al., 2012). Studying mutations in living systems will therefore help clarify how mutations are associated with the biological processes of lung cancer.
February 2020 | Volume 11 | Article 17
Zhang et al. Functional Effects of KRAS Mutations
In the last decade, molecular studies have identified mutations in the Kirsten rat sarcoma oncogene (KRAS) as one of the important classes of mutations in lung cancer (Gautschi et al., 2007). KRAS is the principal isoform of RAS. KRAS p.G12C mutations are the most prevalent KRAS mutations in NSCLC and occur in about 11–16% of lung adenocarcinomas (p.G12C accounts for 45–50% of mutant KRAS) (Cox et al., 2014). Other common KRAS mutations in lung cancer are G12V and G12D. KRAS mutations are also frequent in other cancers, such as pancreatic and colorectal cancer. Based on the TCGA data in cBioPortal (Gao et al., 2013), the most frequent KRAS mutations in pancreatic cancer are G12D, G12V, and G12R, and those in colorectal cancer are G12D, G12V, and G13D. KRAS may therefore be a promising therapeutic target in the search for lung cancer drugs.
As mentioned above, KRAS mutations are among the most common mutations in lung cancer, especially in NSCLC (Mao et al., 1994; Mills et al., 1995; Nakamoto et al., 2001). KRAS mutations are more frequent in Caucasians than in Asians, and smokers may carry KRAS mutations more often than nonsmokers (Westcott and To, 2013; Ferrer et al., 2018). Single amino acid substitutions in codon 12 are the most common KRAS mutations in NSCLC (Graziano et al., 1999). Understanding how KRAS mutations affect gene expression in lung cancer has therefore been a long-standing goal in cancer biology. In this study, to characterize the functional effects of key driver KRAS mutations on gene expression in lung cancer, we analyzed the gene expression profiles of 156 lung cancer cell lines with KRAS mutations and 3,582 other lung cancer cell lines without KRAS mutations. Forty-one discriminative genes for KRAS mutations were identified using a two-stage feature selection approach: (1) minimal Redundancy Maximal Relevance (mRMR) and (2) Incremental Feature Selection (IFS).
METHODS
The Gene Expression Profiles of Cell Lines With and Without KRAS Mutations
To identify the key genes that distinguish key driver KRAS mutations from other mutations, we downloaded the gene expression profiles of 156 lung cancer cell lines with KRAS mutations (positive samples) and 3,582 other lung cancer cell lines without KRAS mutations (negative samples) from the publicly available Gene Expression Omnibus (GEO) database under accession number GSE83744 (Berger et al., 2016). The expression levels of 978 representative genes from the Broad Institute Human L1000 landmark gene set were measured. The L1000 landmark set was derived from the Connectivity Map (CMap) project (Subramanian et al., 2017). CMap is a large gene-expression dataset of human cells perturbed with many chemicals and genetic reagents (Lamb et al., 2006).
These approximately 1,000 landmark genes are sensitive to perturbations and can reflect 81% of non-measured transcripts (Subramanian et al., 2017).
Two Stage Feature Selection Approach
We applied a two-stage feature selection approach to select the biomarker genes. First, the genes were ranked with the mRMR algorithm (Peng et al., 2005), based not only on their relevance to KRAS mutation status but also on their redundancy with other genes. mRMR has been widely applied for feature selection in bioinformatics (Chen et al., 2018c; Chen et al., 2019e; Li and Huang, 2018; Li et al., 2019b; Wang and Huang, 2019a). In the equations below, Ωs, Ωt, and Ω denote the set of m selected genes, the set of n to-be-selected genes, and the set of all m + n genes, respectively. We use mutual information (I) to measure the relevance D of the expression levels of gene g from Ωt to KRAS mutation status t (Huang and Cai, 2013):
$$D = I(g, t) \tag{1}$$
Meanwhile, the redundancy R of gene g with the selected genes in Ωs is calculated as:
$$R = \frac{1}{m} \sum_{g_i \in \Omega_s} I(g, g_i) \tag{2}$$
The optimal gene g_j from Ωt, with maximal relevance to KRAS mutation status t and minimal redundancy with the selected genes in Ωs, is selected by maximizing the mRMR function:
$$\max_{g_j \in \Omega_t} \left[ I(g_j, t) - \frac{1}{m} \sum_{g_i \in \Omega_s} I(g_j, g_i) \right], \quad j = 1, 2, \ldots, n \tag{3}$$
After N rounds of evaluation, the genes are ranked as
$$S = \{g'_1, g'_2, \ldots, g'_h, \ldots, g'_N\} \tag{4}$$
The top-ranked genes are associated with KRAS mutation status and have little redundancy with other genes, which makes them suitable as biomarkers. The top 200 genes were further analyzed in the second stage.
The second stage determined the number of genes to select using the IFS method (Chen et al., 2018b; Chen et al., 2019b; Chen et al., 2019c; Chen et al., 2019d; Chen et al., 2019f; Li et al., 2019a; Pan et al., 2019a; Pan et al., 2019b). To do so, 200 classifiers were constructed using the top 1, top 2, …, top 200 genes.
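The first-stage mRMR ranking in Eqs. (1)–(4) can be sketched as a greedy loop. This is a minimal illustration and not the C/C++ mRMR tool the authors used. It assumes the expression values have already been discretized, which the text does not describe. It uses the difference criterion of Eq. (3), and it runs on hypothetical toy data in which one gene duplicates another.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """I(x; y) in nats for two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))), using raw counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features, target, n_select):
    """Greedy mRMR ranking with the difference criterion of Eq. (3).

    features: dict mapping gene name -> discretized expression values (assumed)
    target:   class labels, e.g. 1 = KRAS mutation +, 0 = KRAS mutation -
    """
    relevance = {g: mutual_information(v, target) for g, v in features.items()}  # Eq. (1)
    selected, remaining = [], sorted(features)  # sorted: deterministic tie-breaking

    def score(g):
        if not selected:
            return relevance[g]  # first pick: relevance only
        redundancy = sum(mutual_information(features[g], features[s]) for s in selected)
        return relevance[g] - redundancy / len(selected)  # Eq. (2) inside Eq. (3)

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "A_copy" duplicates "A"; "B" is independent of "A".
features = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "A_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0, 0, 0, 1, 0, 0, 0, 1]  # positive only when A and B are both 1
ranking = mrmr_rank(features, target, n_select=3)
print(ranking)
```

All three toy genes are equally relevant to the label. The duplicate "A_copy" still drops to last place because it is fully redundant with "A". This behaviour separates mRMR from a purely univariate ranking.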
The LOOCV (leave-one-out cross-validation) MCC (Matthews correlation coefficient) of each top-k-gene classifier was calculated. We tried several classifiers: (1) SVM (Support Vector Machine) (Jiang et al., 2019; Yan et al., 2019; Chen et al., 2019a; Li et al., 2019a; Pan et al., 2019a; Wang and Huang, 2019b; Chen et al., 2019d), (2) 1NN (1 Nearest Neighbor) (Lei et al., 2013; Chen et al., 2016; Wang et al., 2017a), (3) 3NN (3 Nearest Neighbors), (4) 5NN (5 Nearest Neighbors), (5) Decision Tree (DT) (Huang et al., 2008; Huang et al., 2011; Chen et al., 2015), and (6) Neural Network (NN) (Liu et al., 2017; Pan et al., 2018; Chen et al., 2019e). These classification algorithms were applied with the svm function from the R package e1071, the knn function from the R package class, the rpart function from the R package rpart, and the nnet function from the R package nnet.
The IFS curve plots the number of genes on the x-axis against the corresponding LOOCV MCC on the y-axis. The best gene combination is the one at the peak of this curve.
Prediction Performance Evaluation of the Classifier
As mentioned above, the prediction performance of each classifier was evaluated with leave-one-out cross-validation (LOOCV) (Cui et al., 2013; Yang et al., 2014). LOOCV runs N rounds, where N is the number of samples. In each round, one sample is tested using the model trained on the other N−1 samples, so every sample is tested exactly once. It can therefore evaluate all samples objectively (Chou, 2011).
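The second stage, IFS scored by LOOCV MCC, can be sketched in plain Python for the 3NN case. The authors used the knn function from the R package class. This stand-in assumes Euclidean distance with a simple majority vote, and the toy matrix is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict by majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a margin of the table is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def loocv_mcc(x, y, k=3):
    """LOOCV: each sample is predicted by a model built from the other N-1 samples."""
    tp = tn = fp = fn = 0
    for i in range(len(x)):
        pred = knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
        if pred == 1 and y[i] == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y[i] == 0:
            tn += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


def incremental_feature_selection(x_full, y, ranked, k=3):
    """Score the top-1, top-2, ..., top-n ranked features; return (best_n, curve)."""
    curve = []
    for n in range(1, len(ranked) + 1):
        cols = ranked[:n]
        x = [[row[c] for c in cols] for row in x_full]
        curve.append(loocv_mcc(x, y, k))
    best_n = max(range(len(curve)), key=lambda i: curve[i]) + 1  # peak of the IFS curve
    return best_n, curve


# Hypothetical toy data: column 0 separates the classes, column 1 is large-scale noise.
X = [[0.0, 5], [0.1, 0], [0.2, 5], [0.3, 0], [1.0, 0], [1.1, 5], [1.2, 0], [1.3, 5]]
Y = [0, 0, 0, 0, 1, 1, 1, 1]
best_n, curve = incremental_feature_selection(X, Y, ranked=[0, 1], k=3)
print(best_n, curve)
```

On this toy input the curve is [1.0, -1.0]. Adding the noisy second feature destroys the nearest-neighbour structure, so the IFS peak falls at one feature. With real data, `ranked` would be the mRMR order and the loop would run to 200.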
The performance metrics, including sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC), were calculated as:
$$Sn = \frac{TP}{TP + FN} \tag{5}$$
$$Sp = \frac{TN}{TN + FP} \tag{6}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{8}$$
Here TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative samples, respectively. The KRAS mutation + and KRAS mutation − sample sizes were imbalanced, and MCC balances sensitivity against specificity (Chen et al., 2018a; Li et al., 2018; Pan et al., 2018; Pan et al., 2019a; Pan et al., 2019b). MCC was therefore used as the main performance metric.
RESULTS AND DISCUSSION
The Genes That Showed Different Expression Patterns Between KRAS Mutation and Other Mutation Samples
The top 200 most informative genes for KRAS mutations were identified using the mRMR method, which has been widely used in the bioinformatics field (Zhao et al., 2013; Zhang et al., 2016). The C/C++ implementation of mRMR by Peng et al. (Peng et al., 2005; Best et al., 2017) (http://home.penglab.com/proj/mRMR/) was used. Traditional univariate feature selection methods rely on statistical tests. Unlike them, mRMR considers both the relevance between gene expression and KRAS mutation status and the redundancy among genes.
The Optimal Biomarkers Identified From the mRMR Gene List With IFS Methods
After the genes were ranked by mRMR, the IFS procedure was applied to find the optimal number of genes. The IFS curves in Figure 1 show the relationship between the number of genes and the resulting MCCs.
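As a worked check of Eqs. (5)–(8), the sketch below plugs in the 3NN confusion-matrix counts from Table 2 (TP = 131, FN = 25, FP = 10, TN = 3572). It also scores a hypothetical baseline that labels every sample KRAS mutation −. That baseline shows why MCC, rather than accuracy, suits these imbalanced classes.

```python
import math


def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC as in Eqs. (5)-(8)."""
    sn = tp / (tp + fn)                    # Eq. (5)
    sp = tn / (tn + fp)                    # Eq. (6)
    acc = (tp + tn) / (tp + tn + fp + fn)  # Eq. (7)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Eq. (8); 0 if undefined (assumed convention)
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Counts from the 3NN confusion matrix (Table 2)
m = metrics(tp=131, tn=3572, fp=10, fn=25)
print({name: round(v, 3) for name, v in m.items()})

# Hypothetical baseline: every sample predicted as KRAS mutation -
baseline = metrics(tp=0, tn=3582, fp=0, fn=156)
print({name: round(v, 3) for name, v in baseline.items()})
```

The first result reproduces the reported 0.840, 0.997, 0.991, and 0.879. The all-negative baseline still reaches an accuracy of about 0.958, but its MCC is 0 because it never detects a KRAS-mutant sample.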
The peak LOOCV MCCs of SVM, 1NN, 3NN, 5NN, DT, and NN were 0.858 with 8 genes, 0.853 with 48 genes, 0.879 with 41 genes, 0.878 with 59 genes, 0.871 with 69 genes, and 0.842 with 174 genes, respectively. 3NN performed best, and the corresponding 41 genes are listed in Table 1.
The Prediction Metrics of the 41 Genes
The 41 genes were chosen with the two-stage feature selection method (mRMR and IFS). To evaluate their predictive power more carefully, we examined their confusion matrix of actual versus predicted KRAS mutation status using 3NN (Table 2). The LOOCV sensitivity, specificity, accuracy, and MCC were 0.840, 0.997, 0.991, and 0.879, respectively.
The Network Associations Between KRAS and the 41 Genes
We searched KRAS and the 41 selected genes in the STRING database, version 11.0 (https://string-db.org), and Figure 2 shows their functional association network. Twenty of the 41 genes (CCND3, CDK19, CEBPA, CEBPD, CSNK1E, CTSL, DUSP6, GRB10, HMGA2, MMP1, MTHFD2, NR3C1, PAK4, PMAIP1, RAP1GAP, SDHB, STX1A, TP53, TRIB3, UBE2L6) had direct interactions with KRAS.
FIGURE 1 | The IFS curves of six different classifiers. The x-axis is the number of genes and the y-axis is the corresponding leave-one-out cross-validation (LOOCV) MCC. The red, blue, brown, black, orange, and purple curves are the IFS results of SVM, 1NN, 3NN, 5NN, DT, and NN, respectively. Peak LOOCV MCCs of SVM, 1NN, 3NN, 5NN, DT, and NN were 0.858 with 8 genes, 0.853 with 48 genes, 0.879 with 41 genes, 0.878 with 59 genes, 0.871 with 69 genes, and 0.842 with 174 genes. 3NN performed best. Therefore, the corresponding 41 genes were finally selected.
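The count of direct KRAS interactors can be cross-checked against Table 1 with simple set operations. This sketch uses only the gene lists printed in the text and does not query STRING. It assumes that STRING's CTSL corresponds to the CTSL1 entry in Table 1, because the text itself names CTSL1 as the gene encoding cathepsin L.

```python
# Table 1: the 41 genes in mRMR/IFS rank order (ranks 1-41)
TABLE1 = [
    "CTSL1", "GNPDA1", "TRIB3", "STX1A", "PHKA1", "CSNK1E", "COL4A1", "CEBPA",
    "CEBPD", "NSDHL", "TP53", "MTHFD2", "RGS2", "NR3C1", "PPIC", "BAMBI",
    "PAK4", "FEZ2", "KTN1", "HMGA2", "MMP1", "CCDC92", "BRP44", "CDK19",
    "CD320", "ATP1B1", "DRAP1", "DUSP6", "RAP1GAP", "GALE", "SSBP2", "UBE2L6",
    "CCND3", "PAFAH1B1", "RBM6", "C5", "SDHB", "GRB10", "UFM1", "ARL4C", "PMAIP1",
]
# Genes reported as direct KRAS interactors in the STRING network (Figure 2)
DIRECT = [
    "CCND3", "CDK19", "CEBPA", "CEBPD", "CSNK1E", "CTSL", "DUSP6", "GRB10",
    "HMGA2", "MMP1", "MTHFD2", "NR3C1", "PAK4", "PMAIP1", "RAP1GAP", "SDHB",
    "STX1A", "TP53", "TRIB3", "UBE2L6",
]
# Assumption: STRING's "CTSL" is the gene Table 1 lists under the symbol "CTSL1"
ALIASES = {"CTSL": "CTSL1"}

direct_in_table = {ALIASES.get(g, g) for g in DIRECT} & set(TABLE1)
indirect = [g for g in TABLE1 if g not in direct_in_table]
print(f"{len(direct_in_table)}/{len(TABLE1)} = {len(direct_in_table) / len(TABLE1):.1%}")
print("no direct KRAS edge:", indirect)
```

The check returns 20 of 41 genes, about 48.8%. RBM6, RGS2, BAMBI, PAFAH1B1 and COL4A1 all fall in the second list. The text later discusses these genes as lacking a direct KRAS interaction.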
The STRING network results support direct interactions with KRAS for nearly half (20 of 41) of the selected genes.
The Biological Significance of the Selected Genes in Lung Cancer
As mentioned earlier, we used the mRMR algorithm and the IFS procedure to screen out 41 genes that may be molecular markers for identifying KRAS mutations. We then reviewed studies of these genes in lung cancer and in other cancers with a high frequency of KRAS mutations, such as colorectal and pancreatic cancer. Zhang X et al. found that the Tribbles-3 (TRIB3) pseudokinase can activate the β-catenin signaling pathway, which in turn promotes the proliferation and migration of NSCLC cells (Zhang et al., 2019). In addition, blocking TRIB3 activity may be one mechanism for treating lung cancer (Ding et al., 2018). Wang X et al. found that PAK4 is significantly associated with poor prognosis in NSCLC (Wang et al., 2016b), and PAK4-mediated LIMK1 phosphorylation regulates the migration and invasion of NSCLC cells. PAK4 may therefore be an important prognostic indicator and a potential molecular target for NSCLC treatment (Cai et al., 2015). HMGA2 is highly expressed in metastatic lung adenocarcinoma (LUAD) and affects apoptosis through caspase-3/9 and Bcl-2. It is also considered a biomarker and potential therapeutic target for lung cancer (Kumar et al., 2014; Gao et al., 2017b). A meta-analysis showed that the matrix metalloproteinase 1 (MMP1) -1607 1G/2G polymorphism is a risk factor for lung cancer in Asians (Li et al., 2015). In addition, the DUSP6 rs2279574 polymorphism is thought to predict the survival time of NSCLC patients after chemotherapy (Wang et al., 2016a). The cyclin D3 gene (CCND3) is a key cell cycle gene in NSCLC and can promote the growth of LUAD (Zhang et al., 2017).
Casein kinase I epsilon (CSNK1E) is a circadian rhythm gene whose genetic variation is significantly correlated with lung cancer risk (Ortega and Mas-Oliva, 1986). CEBPA may act as a novel tumor suppressor. Through clinical experiments, Lu H et al. found that up-regulating CEBPA is an effective approach to treating human NSCLC (Halmos et al., 2002; Lu et al., 2015). In addition, a comprehensive analysis of lung cancer genes by Lv and Wang suggests that CEBPD may be involved in the development of lung cancer (Lv and Wang, 2015). TP53 mutation is very common in NSCLC and is considered a marker of poor prognosis in lung cancer (Gao et al., 2017a; Labbe et al., 2017). Methylenetetrahydrofolate dehydrogenase 2 (MTHFD2) contributes to redox homeostasis and may be a target for lung cancer treatment (Nishimura et al., 2019). NR3C1 is reported to be involved in pathways related to lung cancer biology, and as a gene marker it is significantly correlated with survival in LUAD (Zhao et al., 2015; Luo et al., 2018). Cathepsin L1, the protein encoded by the CTSL1 gene, can degrade the extracellular matrix and activate proteolytic cascades, thereby promoting invasion and metastasis (Duffy, 1996; Turk et al., 2012). Elevated expression of extracellular cathepsin L is related to the progression of lung cancer cells (Okudela et al., 2016). Moreover, cathepsin L is viewed as a downstream target of oncogenic KRAS mutations. These genes have been shown to be closely related to the prognosis, diagnosis, and treatment of lung cancer, and they also interact directly with KRAS. Some of the 41 selected genes have no direct interaction with KRAS but are considered to be involved in the occurrence and development of lung cancer.
The RBM6 gene is located at 3p21.3, and changes in its expression regulate many of the most common aberrant splicing events in lung cancer (Sutherland et al., 2010; Coomer et al., 2019). Double up-regulation of the RGS2 gene is associated with poor overall survival in patients with lung adenocarcinoma (Yin et al., 2016). Epigenetic silencing of BAMBI has been identified as a marker of NSCLC, and BAMBI overexpression may become a new target for treating this cancer (Marwitz et al., 2016; Wang et al., 2017b). Overexpression of PAFAH1B1 can lead to the occurrence and poor prognosis of lung cancer (Lo et al., 2012). Collagen alpha-1(IV) chain, encoded by the COL4A1 gene, was previously found to play a crucial role in coordinating alveolar morphogenesis and forming the epithelium and vasculature of lung tissue (Abe et al., 2017).
TABLE 1 | The 41 genes selected by mRMR and IFS.
Rank | Gene | Rank | Gene
1 | CTSL1 | 22 | CCDC92
2 | GNPDA1 | 23 | BRP44
3 | TRIB3 | 24 | CDK19
4 | STX1A | 25 | CD320
5 | PHKA1 | 26 | ATP1B1
6 | CSNK1E | 27 | DRAP1
7 | COL4A1 | 28 | DUSP6
8 | CEBPA | 29 | RAP1GAP
9 | CEBPD | 30 | GALE
10 | NSDHL | 31 | SSBP2
11 | TP53 | 32 | UBE2L6
12 | MTHFD2 | 33 | CCND3
13 | RGS2 | 34 | PAFAH1B1
14 | NR3C1 | 35 | RBM6
15 | PPIC | 36 | C5
16 | BAMBI | 37 | SDHB
17 | PAK4 | 38 | GRB10
18 | FEZ2 | 39 | UFM1
19 | KTN1 | 40 | ARL4C
20 | HMGA2 | 41 | PMAIP1
21 | MMP1 | |
TABLE 2 | The confusion matrix of actual sample classes and predicted sample classes using 3NN.
Actual class | Predicted KRAS mutation + | Predicted KRAS mutation −
Actual KRAS mutation + | 131 | 25
Actual KRAS mutation − | 10 | 3572
MCC = 0.879; Sensitivity = 0.840; Specificity = 0.997
The Potential Roles of the Selected Genes in Other Cancers
KRAS-related genes are likely to be diagnostic and prognostic markers and therapeutic targets in lung cancer. We also looked for studies of these genes in other cancers with high-frequency KRAS mutations, mainly colorectal and pancreatic cancer. According to Hua F et al., TRIB3 gene knockout reduces the occurrence of colon tumors in mice, the migration of colorectal cancer cells, and their growth in mouse transplanted tumors. Blocking TRIB3 activity could therefore be a strategy for treating colorectal cancer (Hua et al., 2019). Tyagi N et al. found that PAK4 maintains the stem cell phenotype of pancreatic cancer cells by activating STAT3 signaling and may serve as a new therapeutic target (Tyagi et al., 2016). TP53 mutation is associated with the early stage of colorectal cancer (Laurent et al., 2011). MMP1 is significantly correlated with colon cancer mortality (Slattery and Lundgreen, 2014).
DATA AVAILABILITY STATEMENT
We downloaded the blood gene expression profiles of 156 KRAS mutation samples (positive samples) and 3,582 other mutation samples (negative samples) from the publicly available GEO (Gene Expression Omnibus) database under accession number GSE83744.
AUTHOR CONTRIBUTIONS
JZha conceived and designed the study. HH and SX performed data analysis. HJ wrote the paper. JZhu, EC and ZH reviewed and edited the manuscript. JZha approved the final version of the manuscript. All authors read and approved the manuscript.
FIGURE 2 | The functional association network of KRAS and the selected genes based on the STRING database. Twenty of the 41 genes (CCND3, CDK19, CEBPA, CEBPD, CSNK1E, CTSL, DUSP6, GRB10, HMGA2, MMP1, MTHFD2, NR3C1, PAK4, PMAIP1, RAP1GAP, SDHB, STX1A, TP53, TRIB3, UBE2L6) had direct interactions with KRAS. Each line represents an interaction, colored by the type of evidence supporting it.
The sky-blue, purple, green, red, blue, grass-green, black, and navy-blue edges are interactions from curated databases, experiments, gene neighborhood, gene fusions, gene co-occurrence, text mining, co-expression, and protein homology, respectively. For more detailed explanations, please refer to the STRING database (https://string-db.org).
FUNDING
This study was supported by the Funds from Science Technology Department of Zhejiang Province (LGF19H010010), Medical and Health Research Foundation of Zhejiang Province (2016ZDB005, 2017ZD020), China, WU JIEPING MEDICAL foundation (320.6750.19092-12), Beijing Xisike Clinical Oncology Research Foundation (Y-HS2017-037) and Medical Health and Scientific Technology Project of Zhejiang Province (2019RC182).
REFERENCES
Abe, Y., Matsuduka, A., Okanari, K., Miyahara, H., Kato, M., Miyatake, S., et al. (2017). A severe pulmonary complication in a patient with COL4A1-related disorder: a case report. Eur. J. Med. Genet. 60 (3), 169–171. doi: 10.1016/j.ejmg.2016.12.008
Berger, A. H., Brooks, A. N., Wu, X., Shrestha, Y., Chouinard, C., Piccioni, F., et al. (2016). High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell 30 (2), 214–228. doi: 10.1016/j.ccell.2016.06.022
Best, M. G., Sol, N., In ‘t Veld, S., Vancura, A., Muller, M., Niemeijer, A. N., et al. (2017). Swarm intelligence-enhanced detection of non-small-cell lung cancer using tumor-educated platelets. Cancer Cell 32 (2), 238–252.e239. doi: 10.1016/j.ccell.2017.07.004
Cai, S., Ye, Z., Wang, X., Pan, Y., Weng, Y., Lao, S., et al. (2015). Overexpression of P21-activated kinase 4 is associated with poor prognosis in non-small cell lung cancer and promotes migration and invasion. J. Exp. Clin. Cancer Res. 34, 48. doi: 10.1186/s13046-015-0165-2
Chen, L., Chu, C., Huang, T., Kong, X., and Cai, Y. D. (2015). Prediction and analysis of cell-penetrating peptides using pseudo-amino acid composition and random forest models. Amino Acids 47 (7), 1485–1493. doi: 10.1007/s00726-015-1974-5
Chen, L., Zhang, Y. H., Huang, T., and Cai, Y. D. (2016). Gene expression profiling gut microbiota in different races of humans. Sci. Rep. 6, 23075. doi: 10.1038/srep23075
Chen, L., Li, J., Zhang, Y. H., Feng, K., Wang, S., Zhang, Y., et al. (2018a). Identification of gene expression signatures across different types of neural stem cells with the Monte-Carlo feature selection method. J. Cell Biochem. 119 (4), 3394–3403. doi: 10.1002/jcb.26507
Chen, L., Zhang, Y.-H., Pan, X., Liu, M., Wang, S., Huang, T., et al. (2018b). Tissue Expression difference between mRNAs and lncRNAs. Int. J. Mol. Sci. 19 (11), 3416. doi: 10.3390/ijms19113416
Chen, L., Zhang, Y. H., Huang, G., Pan, X., Wang, S., Huang, T., et al. (2018c). Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection. Mol. Genet. Genomics 293 (1), 137–149. doi: 10.1007/s00438-017-1372-7
Chen, L., Pan, X., Zeng, T., Zhang, Y., Huang, T., and Cai, Y. (2019a). Identifying essential signature genes and expression rules associated with distinctive development stages of early embryonic cells. IEEE Access 7, 128570–128578. doi: 10.1109/ACCESS.2019.2939556
Chen, L., Pan, X., Zhang, Y.-h., Hu, X., Feng, K., Huang, T., et al. (2019b). Primary tumor site specificity is preserved in patient-derived tumor xenograft models. Front. Genet. doi: 10.3389/fgene.2019.00738
Chen, L., Pan, X., Zhang, Y.-H., Huang, T., and Cai, Y.-D. (2019c). Analysis of gene expression differences between different pancreatic cells. ACS Omega 4 (4), 6421–6435. doi: 10.1021/acsomega.8b02171
Chen, L., Pan, X., Zhang, Y.-H., Kong, X., Huang, T., and Cai, Y.-D. (2019d). Tissue differences revealed by gene expression profiles of various cell lines. J. Cell. Biochem. 120 (5), 7068–7081. doi: 10.1002/jcb.27977
Chen, L., Pan, X., Zhang, Y.-H., Liu, M., Huang, T., and Cai, Y.-D. (2019e). Classification of widely and rarely expressed genes with recurrent neural network. Comput. Struct. Biotechnol. J. 17, 49–60. doi: 10.1016/j.csbj.2018.12.002
Chen, L., Zhang, S., Pan, X., Hu, X., Zhang, Y. H., Yuan, F., et al. (2019f). HIV infection alters the human epigenetic landscape. Gene Ther. 26 (1-2), 29–39. doi: 10.1038/s41434-018-0051-6
Chou, K. C. (2011). Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 273 (1), 236–247. doi: 10.1016/j.jtbi.2010.12.024
Coomer, A. O., Black, F., Greystoke, A., Munkley, J., and Elliott, D. J. (2019). Alternative splicing in lung cancer. Biochim. Biophys. Acta Gene Regul. Mech. 1862 (11-12), 194388. doi: 10.1016/j.bbagrm.2019.05.006
Cox, A. D., Fesik, S. W., Kimmelman, A. C., Luo, J., and Der, C. J. (2014). Drugging the undruggable RAS: mission possible? Nat. Rev. Drug Discovery 13 (11), 828–851. doi: 10.1038/nrd4389
Cui, W., Chen, L., Huang, T., Gao, Q., Jiang, M., Zhang, N., et al. (2013). Computationally identifying virulence factors based on KEGG pathways. Mol. Biosyst. 9 (6), 1447–1452. doi: 10.1039/c3mb70024k
Ding, C. Z., Guo, X. F., Wang, G. L., Wang, H. T., Xu, G. H., Liu, Y. Y., et al. (2018). High glucose contributes to the proliferation and migration of non-small cell lung cancer cells via GAS5-TRIB3 axis. Biosci. Rep. 38 (2), BSR20171014. doi: 10.1042/BSR20171014
Duffy, M. J. (1996). PSA as a marker for prostate cancer: a critical review. Ann. Clin. Biochem. 33 (Pt 6), 511–519. doi: 10.1177/000456329603300604
Ferrer, I., Zugazagoitia, J., Herbertz, S., John, W., Paz-Ares, L., and Schmid-Bindert, G. (2018). KRAS-Mutant non-small cell lung cancer: From biology to therapy. Lung Cancer 124, 53–64. doi: 10.1016/j.lungcan.2018.07.013
Gao, J., Aksoy, B. A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S. O., et al. (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal 6 (269), pl1. doi: 10.1126/scisignal.2004088
Gao, W., Jin, J., Yin, J., Land, S., Gaither-Davis, A., Christie, N., et al. (2017a). KRAS and TP53 mutations in bronchoscopy samples from former lung cancer patients. Mol. Carcinog. 56 (2), 381–388. doi: 10.1002/mc.22501
Gao, X., Dai, M., Li, Q., Wang, Z., Lu, Y., and Song, Z. (2017b). HMGA2 regulates lung cancer proliferation and metastasis. Thorac. Cancer 8 (5), 501–510. doi: 10.1111/1759-7714.12476
Gautschi, O., Huegli, B., Ziegler, A., Gugger, M., Heighway, J., Ratschiller, D., et al. (2007). Origin and prognostic value of circulating KRAS mutations in lung cancer patients. Cancer Lett. 254 (2), 265–273. doi: 10.1016/j.canlet.2007.03.008
Graziano, S. L., Gamble, G. P., Newman, N. B., Abbott, L. Z., Rooney, …
ARTICLE
Diversity spectrum analysis identifies mutation-specific effects of cancer driver genes
Xiaobao Dong1*, Dandan Huang2, Xianfu Yi3, Shijie Zhang4, Zhao Wang4, Bin Yan5,6, Pak Chung Sham6, Kexin Chen7 & Mulin Jun Li1,4*
Mutation-specific effects of cancer driver genes influence drug responses and the success of clinical trials. We reasoned that these effects could unbalance the distribution of each mutation across different cancer types. As a result, cancer preference could be used to distinguish the effects of the causal mutation. Here, we developed a network-based framework to systematically measure cancer diversity for each driver mutation. We found that half of the driver genes harbor cancer type-specific and pancancer mutations simultaneously. This suggests pervasive functional heterogeneity among mutations, even among those from the same driver gene. We further demonstrated that the specificity of the mutations could influence patient drug responses.
Moreover, we observed that diversity was generally increased in advanced tumors. Finally, we scanned for potentially novel cancer driver genes based on the diversity spectrum. Diversity spectrum analysis provides a new approach to defining driver mutations and optimizing off-label clinical trials.
https://doi.org/10.1038/s42003-019-0736-4
1 Department of Genetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
2 Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China.
3 School of Biomedical Engineering, Tianjin Medical University, Tianjin, China.
4 Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China.
5 School of Biomedical Sciences, Department of Anesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
6 Centre of Genomics Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China.
7 Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
COMMUNICATIONS BIOLOGY | (2020) 3:6 | https://doi.org/10.1038/s42003-019-0736-4 | www.nature.com/commsbio
Cancer-promoting genetic events and the genes that carry them (so-called driver mutations and driver genes) have not only been identified in most types of cancer but also linked to novel therapeutic opportunities, such as EGFR mutations in lung cancer, BRAF mutations in melanoma, and KIT mutations in gastrointestinal stromal tumors [1,2]. Off-label targeted therapy trials such as NCI-MATCH aim to treat tumors across anatomical sites based on their genomic alterations [3]. However, cancer type-specific and mutation-specific oncogenic signaling has been observed in a number of recent clinical and preclinical studies [4,5]. Quantitative characterization of the cancer type preference of driver mutations, and of its biological and clinical significance, remains inadequate.
Mutation-specific effects of driver mutations have been demonstrated in multiple well-characterized cancer driver genes [6–13], which implies that functional heterogeneity among driver mutations in the same cancer gene could be very common. For example, NRAS mutations at codons 12, 13, and 61 have been characterized as driver mutations in many cancers, yet only NRAS Q61 mutations efficiently promote melanoma [9]. Recently, BRAF driver mutations were categorized into at least three classes with different kinase activity, RAS dependency, and dimer dependency [6].
More importantly, these mutation-specific effects appear tightly connected with the clinical features of patients. A multicenter clinical study [10] of the HER kinase inhibitor neratinib showed that patient responses were determined by both cancer type and mutation. This is consistent with an earlier clinical study [14] in which the BRAF inhibitor vemurafenib was tested in patients who had different cancer types but all harbored BRAF V600 mutations. Thus, beyond sophisticated studies at the driver-gene level, a unified approach to defining the role of each driver mutation will be important for deepening our understanding of cancer genomics and guiding clinical trial design [15,16].
Much work has been done to characterize cancer drivers at subgene resolution, including at the level of the protein linear sequence, protein domain, protein 3D structure, and protein–protein interface [17]. While these methods can provide mutation-level classifications, all of them classify mutations based only on molecular information about the gene or protein itself and neglect cancer context, which may lead to misleading conclusions about mutation effects. Specifically, the roles of driver genes may vary across cancer types [18]. Genome-wide screening experiments [19] and a pancancer analysis of evolutionary selection on driver mutations [20] showed that this phenomenon is widespread. To understand the functions of driver mutations precisely, subgene resolution and cancer-context information need to be integrated. Mutation-specific effects, if functional, may unbalance the distribution of each driver mutation across cancer types; NRAS Q61R, for example, is observed almost exclusively in melanoma. Given the cancer distributions of multiple driver mutations from one driver gene, we could distinguish their potential functional differences by comparing their cancer preferences.
In this study, we developed a network-based framework to quantify and compare the cancer preference of driver mutations. By projecting mutations onto a cancer diversity spectrum, we classified them into three categories: cancer-specific mutations (SPMs), relatively specific mutations (RSMs), and pancancer mutations (PCMs). We systematically characterized the distribution of these mutations across protein domains, genes, and cellular pathways, as well as their comutation patterns. To demonstrate the potential value of the cancer diversity spectrum for clinical and biological problems, we used it to predict patient drug responses and to identify new cancer driver genes. Finally, we developed a web portal to visualize the cancer diversity of driver mutations at http://mulinlab.org/firework.
Results
Network-based measurement of driver mutation specificity. We first characterized a compendium of driver mutations across 33 TCGA cancer types (see legend of Fig. 1) using more than three million somatic mutations from 10,429 patients. We identified driver mutations (Supplementary Data 1) in well-characterized cancer driver genes listed in the Cancer Gene Census [18] using a rule-based approach that has been widely used in clinical cancer studies [21,22]. We chose this approach to stay consistent with the conventions of the clinical genomic literature and to minimize the influence of biased curation in existing cancer genomics databases. For instance, a missense mutation in an oncogene (OG) was taken as a driver mutation if it was highly recurrent in cancer patients (recurrence rule). In contrast, a frameshift insertion or damaging missense mutation was selected as a driver only if it occurred in a tumor suppressor gene (TSG) (damaging rule).
We constructed a bipartite network (Fig. 1a) to summarize the relationships between patients from the 33 TCGA cancer types and driver mutations. In this network, each patient or driver mutation is a node, and a patient and a driver mutation are connected if the mutation was detected in that patient. To improve the reliability of subsequent analyses of mutation cancer diversity, mutations occurring fewer than three times in the whole TCGA dataset were removed from the network. The final patient–mutation network (Supplementary Fig. 1, Supplementary Files) contains 1570 mutations, 6286 patients (Fig. 1b), and 12,924 edges between them. These mutations belong to 314 cancer driver genes (Fig. 1c). The largest contribution (16%) is from TP53, the most frequently mutated gene in cancers [23]. However, no individual gene or cancer type dominates the network.
By compressing all patients from the same cancer type into one node (Fig. 2a), we investigated and visualized the similarity of mutations among cancer types using a force-directed layout algorithm [24]. This algorithm is an intuitive method for spatially organizing network data, usually within a two-dimensional plane. Nodes repel each other like charged bubbles, while each edge acts like a spring pulling a pair of connected nodes together. As a result, cancer types associated with similar driver mutation sets cluster together in the final network (Fig. 2a). They are also pushed away from cancer types with different mutation profiles, which lets us observe similarity among cancer types in a global and flexible manner. The results showed that 79% (26/33) of cancer types shared at least two driver mutations with other cancer types, and 54% (18/33) of cancer types contained at least two private mutations.
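The network construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The input is a hypothetical list of (patient, cancer type, driver mutation) records, and the helper names are hypothetical. The "fewer than three occurrences" filter is interpreted as fewer than three distinct carrier patients. The second function mimics the compression of patients into cancer-type nodes used for Fig. 2a, and the third counts shared and private mutations per cancer type.

```python
from collections import Counter, defaultdict

def build_network(records, min_count=3):
    """Return the set of (patient, mutation) edges, dropping rare mutations.

    records: iterable of (patient_id, cancer_type, mutation) tuples (hypothetical format).
    A mutation is kept only if it is found in at least `min_count` distinct patients.
    """
    edges = {(patient, mutation) for patient, _, mutation in records}
    carriers = Counter(mutation for _, mutation in edges)
    return {(p, m) for p, m in edges if carriers[m] >= min_count}

def compress_by_cancer(records, edges):
    """Collapse patient nodes into cancer-type nodes: mutation -> Counter of cancer types."""
    cancer_of = {patient: cancer for patient, cancer, _ in records}
    profile = defaultdict(Counter)
    for patient, mutation in edges:
        profile[mutation][cancer_of[patient]] += 1
    return profile

def shared_and_private(profile):
    """Count, per cancer type, mutations shared with other types vs. private to it."""
    shared, private = Counter(), Counter()
    for by_cancer in profile.values():
        target = shared if len(by_cancer) > 1 else private
        for cancer in by_cancer:
            target[cancer] += 1
    return shared, private

# Toy example with hypothetical patients.
records = [
    ("P1", "LUAD", "KRAS G12C"), ("P2", "LUAD", "KRAS G12C"), ("P3", "COAD", "KRAS G12C"),
    ("P4", "SKCM", "BRAF V600E"), ("P5", "SKCM", "BRAF V600E"), ("P6", "SKCM", "BRAF V600E"),
    ("P7", "LUAD", "EGFR L858R"), ("P8", "LUAD", "EGFR L858R"),  # only 2 carriers: dropped
]
edges = build_network(records)
profile = compress_by_cancer(records, edges)
shared, private = shared_and_private(profile)
print(len(edges), dict(profile["KRAS G12C"]), dict(shared), dict(private))
```

Each edge links one patient to one mutation, so a mutation's occurrence count here equals its number of distinct carriers.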
Cancer types from the same tissue or organ clustered together, such as the two squamous cell carcinomas LUSC and HNSC or the two brain cancers GBM and LGG. This suggests that the driver mutation profile can partly reflect the origin of a cancer. Relatively rare cancer types, including ACC, CHOL, KICH, PCPG, SARC, THYM, and UVM, shared few driver mutations with others. This might be attributed both to the small size of their patient cohorts and to their distinct molecular characteristics, for example KICH compared with other kidney cancers [25]. Thus, the patient–mutation network is composed of both shared and distinct driver mutations, which motivated us to quantify the tumor preference of each mutation precisely.
Specificity-based classification of driver mutations. We followed a network diversity approach [26] to compute the cancer preference of each mutation (Supplementary Data 2). Network diversity is an entropy-based index initially proposed to measure the diversity of an individual's relationships in social networks. In our measurement, network diversity values range from 0 to 1, and a higher value indicates that the mutation is observed in patients of multiple cancer types with more similar probabilities. If a mutation occurs in multiple cancer types but one cancer type dominates its composition, its network diversity value will be low. Conversely, if the mutation occurs at similar rates across multiple cancer types, its network diversity value will be high. For example, although both KRAS G12V and KRAS G12R occur in more than five different cancer types, their probability distributions across cancer types differ.
In our data, 37 patients carry KRAS G12R, and more than 75% of them are PAAD patients. In contrast, among the 176 patients carrying KRAS G12V, three cancer types account for much of the composition (23% PAAD, 22% LUAD, and 19% COAD). Thus, the network diversity of G12V (0.40) is higher than that of G12R (0.28), reflecting a different cancer specificity. Note that network diversity is normalized so that a high-frequency mutation can be compared directly with a rare one. This property is required because cancer mutation frequencies follow a long-tailed distribution.
Fig. 1 Measurement of the cancer distribution of driver mutations with network diversity (ND). a Driver mutations identified from patients of 33 cancer types are used to construct a patient–mutation bipartite network. The 33 cancer types include adrenocortical carcinoma (ACC), bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), lymphoid neoplasm diffuse large B-cell lymphoma (DLBC), esophageal carcinoma (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), mesothelioma (MESO), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), testicular germ cell tumors (TGCT), thyroid carcinoma (THCA), thymoma (THYM), uterine corpus endometrial carcinoma (UCEC), uterine carcinosarcoma (UCS), and uveal melanoma (UVM). Based on this network, the ND value of each mutation is calculated and mapped onto the cancer diversity spectrum. According to the spectrum, driver mutations are classified into specific, relatively specific, and pancancer mutations. b The overall composition of cancer types in the patient–mutation network for the 1570 driver mutations analyzed in the study. c The genes that harbor the 1570 mutations and their relative contributions.
Fig. 2 Classification of driver mutations and corresponding functional analysis. a The compressed patient–mutation network, in which patients from the same cancer type are summarized in a red node. Mutations with the same connection pattern to cancer types are compressed into one blue node; the number in a blue node is the number of mutations it contains. Only nodes containing at least two mutations are shown. b The distribution of network diversity values on the cancer diversity spectrum and the classification of driver mutations. The mutations above the bar plot are example cases from the corresponding categories; different-colored nodes connected to a mutation represent patients from different cancer types. c The overlap of genes harboring the three types of driver mutations. The GO biological process enrichment results of the SPM (d), RSM (e), and PCM (f) enriched gene networks are shown.
A continuum of network diversity values formed a cancer diversity spectrum comprising all driver mutations, allowing us to systematically classify these mutations and characterize their biological and clinical implications. We found three dominant peaks in the cancer diversity spectrum, near network diversity values of 0, 0.5, and 1.0. This trimodal distribution suggests that driver mutations could be split into three distinct populations (Fig. 2b). Consequently, we classified the mutations into three categories using two theoretically estimated network diversity cutoffs, 0.3 and 0.64 (see Methods for details). This yielded 230 specific mutations (SPMs; network diversity < 0.3), 622 RSMs (0.3 ≤ network diversity < 0.64), and 718 PCMs (network diversity ≥ 0.64). Of note, APC, EGFR, PTEN, SPOP, and LRP1B are the most frequent driver genes in the SPM category (Supplementary Fig. 2A). This category also includes many known biomarkers for cancer diagnosis or targeted treatment, such as APC Q1291* (for COAD), EGFR L858R (for LUAD), BRAF V600E (for THCA and SKCM), and DNMT3A R882H and NPM1 W288Cfs*12 (for LAML). RSMs are exemplified by SF3B1 K700E, which was mostly observed in BRCA patients (9/15), with sporadic low-frequency cases in other cancer types (LAML, PRAD, SARC, SKCM, and THYM).
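The network diversity index and the two cutoffs can be illustrated with a short sketch. As a hypothetical simplification, it assumes network diversity is the Shannon entropy of a mutation's cancer-type distribution divided by log(k), with k = 33 cancer types. The exact formula and any weighting are given in the paper's Methods, so this sketch is not expected to reproduce the reported values for KRAS G12V (0.40) or G12R (0.28). Only the cutoffs 0.3 and 0.64 are taken from the text; the function names are illustrative.

```python
import math

# Cutoffs taken from the text; the paper estimates them in its Methods.
SPM_CUTOFF = 0.3
PCM_CUTOFF = 0.64

def network_diversity(cancer_counts, k=33):
    """Entropy of a mutation's cancer-type distribution, normalized by log(k).

    Assumed form: ND = -sum(p_j * log p_j) / log(k), where p_j is the fraction of the
    mutation's carriers in cancer type j and k is the number of cancer types.
    Returns 0 for a mutation seen in one cancer type and 1 for a perfectly even spread.
    """
    total = sum(cancer_counts.values())
    entropy = 0.0
    for n in cancer_counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log(p)
    return entropy / math.log(k)

def classify(nd):
    """Map an ND value to SPM (< 0.3), RSM (0.3 to < 0.64) or PCM (>= 0.64)."""
    if nd < SPM_CUTOFF:
        return "SPM"
    if nd < PCM_CUTOFF:
        return "RSM"
    return "PCM"

# Hypothetical carrier counts per cancer type.
single_type = {"SKCM": 12}
four_types = {"PAAD": 5, "LUAD": 5, "COAD": 5, "UCEC": 5}
even_33 = {f"type{i}": 1 for i in range(33)}

for name, counts in [("single", single_type), ("four", four_types), ("even", even_33)]:
    nd = network_diversity(counts)
    print(name, round(nd, 3), classify(nd))
```

Dividing by log(k) is what makes values comparable between frequent and rare mutations, because the result depends only on the proportions and not on the carrier count.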
For the RSM category, TP53, PIK3CA, APC, and PTEN mutations were most common (Supplementary Fig. 2B). In contrast to the other two mutation classes, TP53 mutations significantly dominated the PCM spectrum (Fisher's exact test, p < 0.001; Supplementary Fig. 2C). This is consistent with a previous integrative study [23] of 12 major cancer types showing that TP53 was the only gene mutated in nearly half of the tumors.
Driver genes that harbor multiple types of mutations are common: 18% of genes harbor all three types of mutations, and 50% harbor at least two types (Fig. 2c). Except for TP53 (Fisher's exact test, p < 0.01, q < 0.01, Benjamini–Hochberg correction), no driver gene was significantly enriched in any specific category after multiple-hypothesis correction (Supplementary Data 3). Thus, functional heterogeneity among mutations could be common, even within the same cancer driver gene. More details about the cancer types associated with each mutation can be found in our web portal or the Supplementary Data.
To explore the biological pathways involved in the different categories, we constructed gene subnetworks by mapping the enriched genes of each category onto protein functional networks from the STRING database [27]. We then performed Gene Ontology (GO) enrichment analysis to nominate related pathways or biological processes (Fig. 2d–f, Supplementary Fig. 3, Supplementary Data 4–6). The functional analysis showed that DNA repair and cell cycle processes were observed in all three categories. However, some processes were category-specific. Signal transduction processes, such as the ERK cascade and peptidyl-tyrosine modification, were mainly enriched in the SPM gene network. Immune response genes were enriched only in the RSM gene network, and chromatin remodeling was the most prominent process in the PCM gene network.
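The gene-level enrichment test with Benjamini–Hochberg correction mentioned above can be sketched as follows. The 2×2 table layout and the one-sided ("greater") alternative are assumptions, because the text does not state the sidedness used. In practice, scipy.stats.fisher_exact would typically replace the hand-written hypergeometric sum. The Benjamini–Hochberg step converts the per-gene p-values into q-values.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: a = the gene's mutations in the category, b = the gene's
    mutations in other categories, c = other genes' mutations in the category,
    d = other genes' mutations in other categories. Returns P(X >= a) under the
    hypergeometric null with all margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

# Hypothetical counts: all 3 of a gene's mutations fall in a category holding 3 of 6 mutations.
p = fisher_greater(3, 0, 0, 3)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p, q)
```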
These results suggest that certain biological pathways could influence tumorigenesis in specific tissues, while others, such as epigenetic processes, might have a broad impact on tumorigenesis across many cancer types.
Cancer diversity spectrum and patients' drug responses. Presumably, if one driver gene contains multiple driver mutations with different specificities, these mutations should fall in separate protein domains corresponding to their specificity categories. To test this hypothesis, we annotated driver mutations within the functional protein domains of each driver gene using the UniProt database [28]. Although some domains were enriched with driver mutations, we unexpectedly found that the majority of them harbored more than two types of mutations in the same region (Fig. 3a). A typical example is the protein kinase domain of BRAF. In this domain, S467L, G469V, V600M, V600G, and V600E are SPMs, whereas K601E, G466E, G466V, G469R, G469A, N581S, and D594N are RSMs or PCMs. One possible explanation is that protein domain annotations are incomplete or inaccurate. However, that explanation would not account for mutations at the same position belonging to different categories. Examples include the BRAF mutations G469V (SPM), G469R (RSM), and G469A (PCM) and the KRAS mutations G12C (SPM), G12R (SPM), G12D (RSM), and G12V (RSM). Previous biochemical studies of BRAF and SPOP mutations showed that driver mutations can induce very different biochemical behaviors of a protein and exhibit opposite pharmacological effects, even when they are very close in linear sequence [6,11]. Our analysis also revealed that cancer diversity classification could distinguish drug response-related mutation effects within the same protein domain.
For example, BRAF mutations sensitive to vemurafenib were classified as SPMs (V600M and V600E), whereas insensitive mutations were classified as RSMs or PCMs (G469A, G469R, G466V, G466E, N581S, D594N, and K601E) [6]. SPOP mutations likewise showed BET inhibitor sensitivities consistent with our network diversity-based classifications, but in the reverse direction. Ishikawa cells overexpressing the SPOP SPMs Y87C, W131G, and F133L were resistant to the BET inhibitor JQ1, whereas cells expressing the RSMs R121Q and D140N were sensitive [11].
To comprehensively investigate the association between the cancer diversity of mutations and antineoplastic therapy, we integrated the cancer diversity spectrum with drug response data predicted by an imputed drug-wide association study (IDWAS) [29]. IDWAS learned statistical models from cell line-based drug response data and gene expression profiles to predict responses to 138 cancer drugs for 5548 TCGA patients. This allowed us to analyze the relationship between mutation cancer diversity and drug response in an unbiased manner. Moreover, IDWAS uses only gene expression data, so its results are independent of gene mutation information. We evaluated whether drug responses differed among patients harboring SPMs, RSMs, and PCMs in the same drug target (see Methods for details). The IDWAS drug response values are predicted by a gene expression-based statistical model. As a result, they have no clearly defined biological meaning and are not directly comparable with traditional drug sensitivity values such as IC50 (the drug concentration that reduces cell viability by 50%); however, a lower value indicates greater drug sensitivity.
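The group comparison described above, among patients carrying SPMs, RSMs, or PCMs in the same drug target, amounts to a one-way ANOVA on predicted drug-response values. The sketch below computes only the F statistic and its degrees of freedom in pure Python. Obtaining a p-value would additionally require the F distribution, for example via scipy.stats.f_oneway. The response values are hypothetical placeholders, not IDWAS output.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical predicted responses (lower = more sensitive) for carriers of each class.
spm = [1.0, 2.0, 3.0]
rsm = [4.0, 5.0, 6.0]
pcm = [7.0, 8.0, 9.0]
f_stat, df_b, df_w = one_way_anova_f([spm, rsm, pcm])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```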
In approximately one-third of the tested drug–gene pairs (30/89), drug response appeared to be influenced by the cancer diversity of mutations (ANOVA, p < 0.2; Supplementary Data 7). Examples include temsirolimus–BRAF (ANOVA, p = 0.005), afatinib–EGFR (ANOVA, p = 2.92 × 10^-8), gemcitabine–KRAS (ANOVA, p = 0.0009), and AZD6482–PTEN (ANOVA, p = 0.118) (Fig. 3b). We also observed in multiple cases that drug sensitivity decreased as the cancer diversity of mutations increased. For example, patients with KRAS SPMs were sensitive to gemcitabine, whereas patients with RSMs and PCMs showed resistance. The same trend was observed for EGFR-mutated patients treated with erlotinib, BRAF-mutated patients treated with PLX4720, and PTEN-mutated patients treated with AZD6482. One exception is paclitaxel–KRAS, for which drug sensitivity increased with mutation cancer diversity. Each mutation class was also compared with the mutation-negative group, i.e., patients without driver mutations in the corresponding drug target. In this comparison, SPMs yielded the largest number of significantly different drug responses (two-sided t-test, p < 0.05), nearly twice or more the number observed for RSMs or PCMs (Fig. 3c). We also overlapped driver mutations with actionable mutations collected from OncoKB [30] and found that the majority of actionable mutations were SPMs (Fig. 3d, Supplementary Data 8). Overall, our results suggest that the cancer diversity of mutations, especially for SPMs, is more strongly correlated with patient drug responses. They also suggest that such effects cannot be readily inferred from the functional domains in which mutations occur.
Fig. 3 Distribution of three types of driver mutations in functional protein domains and the association between cancer diversity and drug sensitivity. a The distribution of driver mutations in the functional domains of three representative genes. Functional protein domains are annotated according to UniProt records. The three types of driver mutations are distinguished by the color and height of the dots in the lollipop plots: SPMs (blue, short), RSMs (green, medium height), and PCMs (red, tall). b The drug sensitivity of patients harboring SPMs, RSMs, and PCMs. Red stars mark groups that differ significantly from the corresponding negative groups (*p < 0.05, **p < 0.01, two-sided t-test). Drug sensitivity is predicted by IDWAS. c The number of drug–mutation combinations significantly associated with drug response. Drug sensitivity data are from IDWAS. d The composition of OncoKB evidence levels in the three types of mutations. From level 1 to level 4, the strength of evidence for clinical recommendation decreases.
Cancer diversity spectrum and cancer evolution. To understand the impact of cancer evolution on the cancer diversity spectrum, we first examined the correlation between the cancer diversity spectrum and the variant allele frequency (VAF) of driver mutations. VAF reflects the burden of a mutation in a patient and is used as a proxy for the relative size of tumor clones harboring a given driver mutation. A high VAF in a primary tumor usually implies that the corresponding mutation arose in an early/founder clone. To exclude confounders that might distort VAF, we selected tumors with cancer cell purity > 70% and mutation data from copy number-neutral regions. We computed the Pearson correlation coefficient (PCC) between mutations' VAFs and network diversity values for genes with ten or more mutations (Fig. 4a, Supplementary Fig. 4).
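The VAF–diversity correlation analysis can be sketched as follows. The record fields (gene, mutation, vaf, purity, copy_neutral) and the function names are hypothetical. The purity threshold (> 0.7) comes from the text. "Ten or more mutations" is interpreted here as ten or more qualifying mutation calls per gene, which is an assumption. Each qualifying call contributes one (VAF, ND) point to its gene's correlation.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def vaf_nd_correlations(calls, nd, min_purity=0.7, min_calls=10):
    """Per-gene PCC between VAF and network diversity after purity/copy-number filtering."""
    points = defaultdict(lambda: ([], []))
    for call in calls:
        if call["purity"] > min_purity and call["copy_neutral"] and call["mutation"] in nd:
            vafs, nds = points[call["gene"]]
            vafs.append(call["vaf"])
            nds.append(nd[call["mutation"]])
    return {gene: pearson(v, d) for gene, (v, d) in points.items() if len(v) >= min_calls}

# Hypothetical data: gene G has 10 qualifying calls whose VAF falls as ND rises.
nd = {f"G_m{i}": i / 10 for i in range(10)}
calls = [
    {"gene": "G", "mutation": f"G_m{i}", "vaf": 0.5 - 0.03 * i, "purity": 0.8, "copy_neutral": True}
    for i in range(10)
]
# Excluded by the purity filter.
calls.append({"gene": "G", "mutation": "G_m0", "vaf": 0.9, "purity": 0.5, "copy_neutral": True})
# Gene H has too few calls to be tested.
nd.update({f"H_m{i}": 0.5 for i in range(3)})
calls += [{"gene": "H", "mutation": f"H_m{i}", "vaf": 0.3, "purity": 0.9, "copy_neutral": True} for i in range(3)]
r = vaf_nd_correlations(calls, nd)
print(r)
```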
Among the significant correlations, the cancer diversity of mutations correlated negatively with VAF in BRAF, KIT, PREX2, NRAS, and SF3B1 but positively in FBXW7, KMT2D, NF1, and SPOP. Examining the mode of action of these driver genes, we found that OGs showed more negative correlations and TSGs more positive correlations (Fig. 4b). The average PCC values for OGs and TSGs are -0.08 and 0.01, respectively, but the difference is not significant (Wilcoxon rank-sum test, p = 0.079). Because a high VAF generally indicates an early tumor clone, our results imply that some OG-related SPMs and TSG-related PCMs tend to occur early in tumorigenesis.
To explore the patterns of different mutation types over long-term cancer evolution, we compared the network diversity values of driver mutations between primary and advanced tumors. We used MSK-IMPACT data [31], which include genetic aberrations in approximately 400 cancer-related genes from more than 10,000 patients with advanced tumors. These data represent the mutation landscape of late-stage tumorigenesis. Gene mutation frequencies in the TCGA and MSK-IMPACT cohorts are highly consistent [31]. We calculated and compared the network diversity values of 625 driver mutations common to the TCGA and MSK-IMPACT groups (Fig. 4c, Supplementary Data 9). The cancer diversity classifications of 57% (359/625) of the driver mutations were conserved. Nevertheless, 140 RSMs in TCGA increased in cancer diversity and converted to PCMs in MSK-IMPACT (Fig. 4d). Overall, the cancer diversity of mutations was significantly higher in advanced tumors than in primary tumors (Fig. 4e).
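Comparing classifications between the primary (TCGA) and advanced (MSK-IMPACT) cohorts reduces to counting category transitions for mutations present in both. This sketch reuses the 0.3 and 0.64 cutoffs from the text; the input dictionaries of network diversity values and the function names are hypothetical. The conserved fraction plays the role of the 57% (359/625) figure reported above. A transition such as ("RSM", "PCM") corresponds to the 140 RSMs that became PCMs.

```python
from collections import Counter

def classify(nd, low=0.3, high=0.64):
    """SPM below `low`, RSM from `low` up to `high`, PCM at or above `high`."""
    if nd < low:
        return "SPM"
    return "RSM" if nd < high else "PCM"

def compare_cohorts(nd_primary, nd_advanced):
    """Count class transitions, the conserved fraction, and ND increases for shared mutations."""
    common = sorted(set(nd_primary) & set(nd_advanced))
    transitions = Counter((classify(nd_primary[m]), classify(nd_advanced[m])) for m in common)
    conserved = sum(n for (before, after), n in transitions.items() if before == after)
    fraction = conserved / len(common) if common else float("nan")
    increased = sum(nd_advanced[m] > nd_primary[m] for m in common)
    return transitions, fraction, increased

# Hypothetical ND values for a primary (TCGA-like) and an advanced (MSK-IMPACT-like) cohort.
nd_tcga = {"m1": 0.10, "m2": 0.50, "m3": 0.70, "m4": 0.50}
nd_msk = {"m1": 0.20, "m2": 0.70, "m3": 0.10, "m4": 0.40, "m5": 0.90}
transitions, fraction, increased = compare_cohorts(nd_tcga, nd_msk)
print(transitions, fraction, increased)
```

Only mutations observed in both cohorts are compared, mirroring the restriction to the 625 common driver mutations.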
Interestingly, three mutations, EGFR L861Q, MAP2K4 S184L, and TP53 E285V, were PCMs in TCGA but became SPMs in MSK-IMPACT tumors. This suggests that cancer-specific selection may act on them during the continued progression of the related tumors. A previous study linked EGFR L861Q to resistance to EGFR-TKI therapy in lung cancer [7], suggesting that the increased cancer specificity in advanced tumors might result from selection during targeted cancer therapy. Taken together, the cancer diversity of driver mutations may influence clonal evolution and can also be reshaped during cancer progression.
Comutation patterns between mutations from different classes. Driver mutations have been shown to have complex dependencies that are related to clonal evolution and the clinical prognosis of tumors [32]. We asked whether mutations with different cancer type specificities show unique dependencies. To answer this question, we performed comutation analysis for all driver mutation pairs and constructed …
CATEGORIES
Economics Nursing Applied Sciences Psychology Science Management Computer Science Human Resource Management Accounting Information Systems English Anatomy Operations Management Sociology Literature Education Business & Finance Marketing Engineering Statistics Biology Political Science Reading History Financial markets Philosophy Mathematics Law Criminal Architecture and Design Government Social Science World history Chemistry Humanities Business Finance Writing Programming Telecommunications Engineering Geography Physics Spanish ach e. Embedded Entrepreneurship f. Three Social Entrepreneurship Models g. Social-Founder Identity h. Micros-enterprise Development Outcomes Subset 2. Indigenous Entrepreneurship Approaches (Outside of Canada) a. Indigenous Australian Entrepreneurs Exami Calculus (people influence of  others) processes that you perceived occurs in this specific Institution Select one of the forms of stratification highlighted (focus on inter the intersectionalities  of these three) to reflect and analyze the potential ways these ( American history Pharmacology Ancient history . Also Numerical analysis Environmental science Electrical Engineering Precalculus Physiology Civil Engineering Electronic Engineering ness Horizons Algebra Geology Physical chemistry nt When considering both O lassrooms Civil Probability ions Identify a specific consumer product that you or your family have used for quite some time. This might be a branded smartphone (if you have used several versions over the years) or the court to consider in its deliberations. Locard’s exchange principle argues that during the commission of a crime Chemical Engineering Ecology aragraphs (meaning 25 sentences or more). Your assignment may be more than 5 paragraphs but not less. 
INSTRUCTIONS:  To access the FNU Online Library for journals and articles you can go the FNU library link here:  https://www.fnu.edu/library/ In order to n that draws upon the theoretical reading to explain and contextualize the design choices. Be sure to directly quote or paraphrase the reading ce to the vaccine. Your campaign must educate and inform the audience on the benefits but also create for safe and open dialogue. A key metric of your campaign will be the direct increase in numbers.  Key outcomes: The approach that you take must be clear Mechanical Engineering Organic chemistry Geometry nment Topic You will need to pick one topic for your project (5 pts) Literature search You will need to perform a literature search for your topic Geophysics you been involved with a company doing a redesign of business processes Communication on Customer Relations. Discuss how two-way communication on social media channels impacts businesses both positively and negatively. Provide any personal examples from your experience od pressure and hypertension via a community-wide intervention that targets the problem across the lifespan (i.e. includes all ages). Develop a community-wide intervention to reduce elevated blood pressure and hypertension in the State of Alabama that in in body of the report Conclusions References (8 References Minimum) *** Words count = 2000 words. *** In-Text Citations and References using Harvard style. *** In Task section I’ve chose (Economic issues in overseas contracting)" Electromagnetism w or quality improvement; it was just all part of good nursing care.  The goal for quality improvement is to monitor patient outcomes using statistics for comparison to standards of care for different diseases e a 1 to 2 slide Microsoft PowerPoint presentation on the different models of case management.  Include speaker notes... .....Describe three different models of case management. visual representations of information. 
They can include numbers SSAY ame workbook for all 3 milestones. You do not need to download a new copy for Milestones 2 or 3. When you submit Milestone 3 pages): Provide a description of an existing intervention in Canada making the appropriate buying decisions in an ethical and professional manner. Topic: Purchasing and Technology You read about blockchain ledger technology. Now do some additional research out on the Internet and share your URL with the rest of the class be aware of which features their competitors are opting to include so the product development teams can design similar or enhanced features to attract more of the market. The more unique low (The Top Health Industry Trends to Watch in 2015) to assist you with this discussion.         https://youtu.be/fRym_jyuBc0 Next year the $2.8 trillion U.S. healthcare industry will   finally begin to look and feel more like the rest of the business wo evidence-based primary care curriculum. Throughout your nurse practitioner program Vignette Understanding Gender Fluidity Providing Inclusive Quality Care Affirming Clinical Encounters Conclusion References Nurse Practitioner Knowledge Mechanics and word limit is unit as a guide only. The assessment may be re-attempted on two further occasions (maximum three attempts in total). All assessments must be resubmitted 3 days within receiving your unsatisfactory grade. You must clearly indicate “Re-su Trigonometry Article writing Other 5. June 29 After the components sending to the manufacturing house 1. In 1972 the Furman v. Georgia case resulted in a decision that would put action into motion. Furman was originally sentenced to death because of a murder he committed in Georgia but the court debated whether or not this was a violation of his 8th amend One of the first conflicts that would need to be investigated would be whether the human service professional followed the responsibility to client ethical standard.  