Grading Rubric for Bioinformatics Midterm
Student:
Criterion | Points Available | Points Earned
Format: Is the midterm written in the style of a term paper, and does it compare and contrast the two papers read for the assignment? | 10 |
Comprehension: Does the midterm provide evidence that the author comprehends key concepts presented in the papers, including, for example, the research questions addressed and the major findings? | 10 |
Synthesis: Does the midterm present a combination of ideas that, as a whole, reflects on the importance/relevance of the papers within their field or for society? | 10 |
Bioinformatics: Does the midterm include discussion of key bioinformatic techniques, software, and/or databases that facilitated the research? | 10 |
Technical: Does the midterm provide evidence that the writer has a fundamental technical understanding of the methods used? | 10 |
Figures: Does the midterm include discussion of key figures that are central to the papers read, and does it present information on how the bioinformatic methods used enabled the formation of those figures? Is an opinion on the effectiveness/quality of the figures provided? | 10 |
Distinction: Does the midterm provide information that clearly distinguishes the source of the various details discussed? | 10 |
Connection: Does the midterm link the overall importance of the scientific findings with a perspective that has relevance for the writer/reader? | 10 |
Conclusions: Does the midterm provide concluding statements that reflect on the key themes presented in the writing? | 10 |
Writing Style: Is the writing concise? Is it easy to understand, and are logical connections between ideas made? Overall, does the writer use correct spelling and proper grammar? | 10 |
Point Total | 100 |
BIOL 57601 Bioinformatics Midterm Exam
Papers:
Read and consider the following papers that are found in the Bioinformatics Brightspace ‘Midterm Exam’
content folder:
Dong, X., Huang, D., Yi, X., Zhang, S., Wang, Z., Yan, B., Sham, P. C., Chen, K., & Li, M. J.
(2020). Diversity spectrum analysis identifies mutation-specific effects of cancer driver genes.
Communications Biology, 3: e6. https://doi.org/10.1038/s42003-019-0736-4
Zhang, J., Hu, H., Xu, S., Jiang, H., Zhu, J., Qin, E., He, Z., & Chen, E. (2020). The Functional Effects of
Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Frontiers in Genetics, 11: e17.
https://doi.org/10.3389/fgene.2020.00017
Optional background:
https://www.cdc.gov/genomics/about/precision_med.htm
Stratton, M. R., Campbell, P. J., & Futreal, P. A. (2009). The cancer genome. Nature, 458: 719–724.
https://doi.org/10.1038/nature07943
Midterm Exam Assignment:
Focus on research: Each of the following questions should be addressed in a 2–3 page paper that will be
turned in on Sunday one week from today (due 24 Oct. @ 10:30pm CST). Compare and contrast the papers
of Zhang et al. 2020 and Dong et al. 2020. In essay format, please address the following questions: Why do
the introductions of these papers suggest there is a need to carry out these types of studies? What
scientific question(s) was/were asked in each paper, and why did the researchers select the particular areas
of research that were the focus of each paper? What are two major findings from each paper?
Which figures from these papers most directly reflect these major findings? Why do you say this? How are
the papers similar? How are they different?
Focus on bioinformatics: What general bioinformatic methods were used in each of these papers, and what
specific programs/methods were used? Were any specialized databases used? If so, what were they
specifically, and how were they used? Consider the key figures (see above) that relate to central parts of
the stories both of these papers are telling: name the specific figures you will be focusing on, and then
discuss how the bioinformatic methods enabled the formation of those figures. What elements of these
figures help them tell the story? Do you consider the figures to be well executed, or are there elements that
you would improve on?
Wrap-up: Synthesize all the information you have reviewed here. What is the overall significance of these
papers, and what are the implications of this work in the current era?
The cancer genome
Michael R. Stratton (1, 2), Peter J. Campbell (1, 3), and P. Andrew Futreal (1)
(1) Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
(2) Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK
(3) Department of Haematology, University of Cambridge, Cambridge CB2 2XY, UK
Abstract
All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of
cancer cells. Over the past quarter of a century much has been learnt about these mutations and the
abnormal genes that operate in human cancers. We are now, however, moving into an era in which
it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes.
These studies will provide us with a detailed and comprehensive perspective on how individual
cancers have developed.
Cancer is responsible for one in eight deaths worldwide [1]. It encompasses more than 100
distinct diseases with diverse risk factors and epidemiology which originate from most of
the cell types and organs of the human body and which are characterized by relatively
unrestrained proliferation of cells that can invade beyond normal tissue boundaries and
metastasize to distant organs.
Early insights into the central role of the genome in cancer development emerged in the late
nineteenth and early twentieth centuries from studies by David von Hansemann [2] and
Theodor Boveri [3]. Examining dividing cancer cells under the microscope, they observed the
presence of bizarre chromosomal aberrations. This led to the proposal that cancers are
abnormal clones of cells characterized by and caused by abnormalities of hereditary
material. Following the discovery of DNA as the molecular substrate of inheritance [4] and
determination of its structure [5], this speculation was supported by the demonstration that
agents that damage DNA and generate mutations also cause cancer [6]. Subsequently,
increasingly refined analyses of cancer cell chromosomes showed that specific and recurrent
genomic abnormalities, such as the translocation between chromosomes 9 and 22 in chronic
myeloid leukaemia (known as the ‘Philadelphia’ translocation [7,8]), are associated with
particular cancer types. Finally, it was demonstrated that introduction of total genomic DNA
from human cancers into phenotypically normal NIH3T3 cells could convert them into
cancer cells [9,10]. Isolation of the specific DNA segment responsible for this transforming
activity led to the identification of the first naturally occurring, human cancer-causing
sequence change: the single base G > T substitution that causes a glycine to valine
substitution in codon 12 of the HRAS gene [11,12]. This seminal discovery in 1982
inaugurated an era of vigorous searching for the abnormal genes underlying the
development of human cancer that continues today.
©2009 Macmillan Publishers Limited. All rights reserved
Correspondence and requests for materials should be addressed to M.R.S. ([email protected]).
Author Information: Reprints and permissions information is available at www.nature.com/reprints
Europe PMC Funders Group
Author Manuscript
Nature. Author manuscript; available in PMC 2010 February 15.
Published in final edited form as:
Nature. 2009 April 9; 458(7239): 719–724. doi:10.1038/nature07943.
Here we review the principles of our current understanding of cancer genomes. We look
forward to the explosion of information about cancer genomes that is imminent and the
insights into the process of oncogenesis that this promises to generate.
Cancer is an evolutionary process
All cancers are thought to share a common pathogenesis. Each is the outcome of a process
of Darwinian evolution occurring among cell populations within the microenvironments
provided by the tissues of a multicellular organism. Analogous to Darwinian evolution
occurring in the origins of species, cancer development is based on two constituent
processes: the continuous acquisition of heritable genetic variation in individual cells by
more-or-less random mutation, and natural selection acting on the resultant phenotypic
diversity. The selection may weed out cells that have acquired deleterious mutations or it
may foster cells carrying alterations that confer the capability to proliferate and survive more
effectively than their neighbours. Within an adult human there are probably thousands of
minor winners of this ongoing competition, most of which have limited abnormal growth
potential and are invisible or manifest as common benign growths such as skin moles.
Occasionally, however, a single cell acquires a set of sufficiently advantageous mutations
that allows it to proliferate autonomously, invade tissues and metastasize.
The catalogue of somatic mutations in a cancer genome
Like all the cells that constitute the human body, a cancer cell is a direct descendant, through
a lineage of mitotic cell divisions, of the fertilized egg from which the cancer patient
developed and therefore carries a copy of its diploid genome (Fig. 1). However, the DNA
sequence of a cancer cell genome, and indeed of most normal cell genomes, has acquired a
set of differences from its progenitor fertilized egg. These are collectively termed somatic
mutations to distinguish them from germline mutations that are inherited from parents and
transmitted to offspring.
The somatic mutations in a cancer cell genome may encompass several distinct classes of
DNA sequence change. These include substitutions of one base by another; insertions or
deletions of small or large segments of DNA; rearrangements, in which DNA has been
broken and then rejoined to a DNA segment from elsewhere in the genome; copy number
increases from the two copies present in the normal diploid genome, sometimes to several
hundred copies (known as gene amplification); and copy number reductions that may result
in complete absence of a DNA sequence from the cancer genome (Fig. 2).
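To make these structural classes concrete for a bioinformatics reader, the sketch below sorts simple variant calls into the classes named above. It assumes a VCF-like REF/ALT representation in which insertions and deletions share their first (anchor) base, and an absolute copy number per genomic segment. The function names and the amplification threshold of 8 copies are hypothetical choices made for illustration, not values from the text. Rearrangements are deliberately left out, because detecting them requires breakpoint evidence (for example, discordant read pairs) rather than a single REF/ALT pair.

```python
def classify_small_variant(ref, alt):
    """Classify a left-aligned REF/ALT pair as a substitution, insertion or deletion.

    Insertions and deletions are assumed to share their first (anchor) base,
    as in VCF; anything else is reported as 'complex'.
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        return "substitution" if ref != alt else "no change"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        return "insertion"
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        return "deletion"
    return "complex"


def classify_copy_number(copies, amplification_threshold=8):
    """Classify an absolute copy number relative to the diploid baseline of two copies.

    amplification_threshold is a hypothetical cut-off; studies use different values.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        return "homozygous deletion"  # sequence completely absent from the cancer genome
    if copies == 1:
        return "loss"
    if copies == 2:
        return "normal diploid"
    if copies < amplification_threshold:
        return "gain"
    return "amplification"


for ref, alt in [("G", "T"), ("A", "ATTG"), ("CTA", "C")]:
    print(f"{ref}>{alt}:", classify_small_variant(ref, alt))
print(classify_copy_number(0), "|", classify_copy_number(3), "|", classify_copy_number(200))
```

The copy-number classes mirror the text: zero copies corresponds to complete absence of the sequence, and very high counts (the text mentions up to several hundred copies) correspond to gene amplification.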
In addition, the cancer cell may have acquired, from exogenous sources, completely new
DNA sequences, notably those of viruses such as human papilloma virus, Epstein Barr virus,
hepatitis B virus, human T lymphotropic virus 1 and human herpes virus 8, each of which is
known to contribute to the genesis of one or more types of cancer [13].
Compared to the fertilized egg, the cancer genome will also have acquired epigenetic
changes which alter chromatin structure and gene expression, and which manifest at DNA
sequence level by changes in the methylation status of some cytosine residues. Epigenetic
changes can be subject to the same Darwinian natural selection as genetic events, provided
that there is epigenetic variation in the population of competing cells, that the epigenetic
changes are stably heritable from the mother to the daughter cell and that they generate
phenotypic effects for selection to act on.
Finally, it should not be forgotten that another genome is harboured within the cancer cell.
The thousands of mitochondria present each carry a circular genome of approximately 17
kilobases. Somatic mutations in mitochondrial genomes have been reported in many human
cancers, although their role in the development of the disease is not clear [14].
Acquisition of somatic mutations in cancer genomes
The mutations found in a cancer cell genome have accumulated over the lifetime of the
cancer patient. Some were acquired when ancestors of the cancer cell were biologically
normal, showing no phenotypic characteristics of a cancer cell (Fig. 1). DNA in normal cells
is continuously damaged by mutagens of both internal and external origins. Most of this
damage is repaired. However, a small fraction may be converted into fixed mutations, and
DNA replication itself has a low intrinsic error rate. Our understanding of somatic mutation
rates in normal human cells is still relatively rudimentary. However, it is likely that the
mutation rates of each of the various structural classes of somatic mutation differ and that
there are differences among cell types too. Mutation rates increase in the presence of
substantial exogenous mutagenic exposures, for example tobacco smoke carcinogens,
naturally occurring chemicals such as aflatoxins, which are produced by fungi, or various
forms of radiation including ultraviolet light. These exposures are associated with increased
rates of lung, liver and skin cancer, respectively, and somatic mutations within such cancers
often exhibit the distinctive mutational signatures known to be associated with the
mutagen [15]. The rates of the different classes of somatic mutation are also increased in
several rare inherited diseases, for example Fanconi anaemia, ataxia telangiectasia, mosaic
variegated aneuploidy and xeroderma pigmentosum, each of which is also associated with
increased risks of cancer [16,17].
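A mutational signature is often summarized first as a spectrum over the six classes of single-base substitution. Each change is written from the pyrimidine (C or T) of the mutated base pair, so that, for example, G>A and C>T count as the same event. The sketch below computes such a spectrum from a list of (reference, alternate) pairs. The input list is invented for illustration, and real signature analyses usually also take the flanking bases into account.

```python
from collections import Counter

# Complement table used to fold purine-reference changes onto the pyrimidine strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref, alt):
    """Return the pyrimidine-centred class of a single-base substitution, e.g. 'C>T'."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: describe the change on the complementary strand
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def substitution_spectrum(substitutions):
    """Fraction of substitutions in each of the six classes (all zero for empty input)."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0) for cls in CLASSES}


# Invented calls: G>A folds onto C>T and G>T folds onto C>A.
calls = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "T"), ("T", "C")]
print(substitution_spectrum(calls))
```

For the invented input, the spectrum is 0.4 for C>T, 0.4 for C>A and 0.2 for T>C. Comparing spectra of this kind between tumours is one way the mutagen-associated patterns mentioned above become visible.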
The rest of the somatic mutations in a cancer cell genome have been acquired during the
segment of the cell lineage in which predecessors of the cancer cell already show phenotypic
evidence of neoplastic change (Fig. 1). Whether the somatic mutation rate is always higher
during this part of the lineage is controversial [18,19]. For some cancers this is clearly the
case. For example, colorectal and endometrial cancers with defective DNA mismatch repair
due to abnormalities in genes such as MLH1 and MSH2 exhibit increased rates of
acquisition of single nucleotide changes and small insertions/deletions at polynucleotide
tracts [20]. Other classes of such ‘mutator phenotypes’ may exist, for example leading to
abnormalities in chromosome number or increased rates of genomic rearrangement,
although these are generally less well characterized [20]. The merit of an increased somatic
mutation rate with respect to the development of cancer is that it increases the DNA
sequence diversity on which selection can act. However, it has been suggested that the
mutation rates of normal cells may be sufficient to account for the development of some
cancers, without the requirement for a mutator phenotype [18,19].
The course of mutation acquisition need not be smooth and predecessors of the cancer cell
may suddenly acquire a large number of mutations. This is sometimes termed ‘crisis’ [21], and
can occur after attrition of the telomeres that normally cap the ends of chromosomes, with
the cell having to substantially reorganize its genome to survive.
Although complex and potentially cryptic to decipher, the catalogue of somatic mutations
present in a cancer cell therefore represents a cumulative archaeological record of all the
mutational processes the cancer cell has experienced throughout the lifetime of the patient. It
provides a rich, and predominantly unmined, source of information for cancer
epidemiologists and biologists with which to interrogate the development of individual
tumours.
Driver and passenger mutations
Each somatic mutation in a cancer cell genome, whatever its structural nature, may be
classified according to its consequences for cancer development. ‘Driver’ mutations confer
growth advantage on the cells carrying them and have been positively selected during the
evolution of the cancer. They reside, by definition, in the subset of genes known as ‘cancer
genes’. The remainder of mutations are ‘passengers’ that do not confer growth advantage,
but happened to be present in an ancestor of the cancer cell when it acquired one of its
drivers (see Box 1).
The number of driver mutations, and hence the number of abnormal cancer genes, in an
individual cancer is a central conceptual parameter of cancer development, but is not well
established. It is highly likely that most cancers carry more than one driver and that the
number varies between cancer types. On the basis of age–incidence statistics it has been
suggested that common adult epithelial cancers such as breast, colorectal and prostate
require 5–7 rate-limiting events, possibly equating to drivers, whereas cancers of the
haematological system may require fewer [22]. These estimates are supported by experimental
studies which show that engineering changes in the functions of at least five or six genes in
normal primary human cells is necessary to convert them into cancer cells [23]. However,
recent analyses of somatic mutation data from cancers indicate that the number of drivers
might be much higher [24]. Ultimately, direct estimates of the number of drivers in individual
cancers will be provided by identifying all the cancer genes and systematically measuring
the prevalence of mutations in them.
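The age–incidence argument can be sketched with a multistage model. Assume the classic Armitage–Doll form, in which cancer requires k rate-limiting events, each occurring at a roughly constant rate. Incidence at age t then rises approximately as I(t) ≈ c · t^(k - 1), so the slope of log incidence against log age estimates k - 1. The choice of model is an assumption here, since the text does not name it. The incidence values below are synthetic, generated from the model itself with k = 6; they are not registry data.

```python
import math


def loglog_slope(ages, incidence):
    """Ordinary least-squares slope of log(incidence) against log(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(i) for i in incidence]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def estimated_stages(ages, incidence):
    """Under I(t) ~ c * t**(k - 1), the number of rate-limiting stages is slope + 1."""
    return loglog_slope(ages, incidence) + 1


# Synthetic incidence generated from the model itself with k = 6 (hypothetical constant c).
ages = [40, 50, 60, 70, 80]
incidence = [1e-9 * t ** 5 for t in ages]
print(round(estimated_stages(ages, incidence), 3))
```

Real incidence curves are noisy and shaped by screening, exposure and competing risks. A fitted slope is therefore at best a rough guide to the number of rate-limiting steps, which is consistent with the caution expressed above.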
Box 1 | Driver and passenger mutations
All cancers arise as a result of somatically acquired changes in the DNA of cancer cells.
That does not mean, however, that all the somatic abnormalities present in a cancer
genome have been involved in development of the cancer. Indeed, it is likely that some
have made no contribution at all. To embody this concept, the terms ‘driver’ and
‘passenger’ mutation have been coined.
A driver mutation is causally implicated in oncogenesis. It has conferred growth
advantage on the cancer cell and has been positively selected in the microenvironment of
the tissue in which the cancer arises. A driver mutation need not be required for
maintenance of the final cancer (although it often is) but it must have been selected at
some point along the lineage of cancer development shown in Fig. 1.
A passenger mutation has not been selected, has not conferred clonal growth advantage
and has therefore not contributed to cancer development. Passenger mutations are found
within cancer genomes because somatic mutations without functional consequences often
occur during cell division. Thus, a cell that acquires a driver mutation will already have
biologically inert somatic mutations within its genome. These will be carried along in the
clonal expansion that follows and therefore will be present in all cells of the final cancer.
Some somatic mutations may actually impair cell survival. These will usually be subject
to negative selection and hence be absent from the cancer genome. The traces of negative
selection in cancer genomes are currently limited, but it would be surprising if it were not
operative.
A central goal of cancer genome analysis is the identification of cancer genes that, by
definition, carry driver mutations. A key challenge will therefore be to distinguish driver
from passenger mutations. The main strategy generally used exploits a number of
structural signatures associated with mutations that are under positive selection. For
example, driver mutations cluster in the subset of genes that are cancer genes whereas
passenger mutations are more or less randomly distributed. This has been the approach
adopted fruitfully in the past to identify most somatically mutated cancer genes in studies
targeted at small regions of the genome.
Whole-genome sequencing, however, incorporating analysis of more than 20,000
protein-coding genes and unknown numbers of functional elements in intronic and
intergenic DNA, presents a greater challenge, one rendered more daunting by the
likelihood that passenger mutations in most cancer genomes substantially outnumber
drivers. Because many cancer genes seem to contribute to cancer development in only a
small fraction of tumours, large sample sets will have to be analysed to distinguish
infrequently mutated cancer genes from genes with random clusters of passenger
mutations. Furthermore, it is conceivable that some mutational processes are directed at
specific genomic regions and thus generate clusters of passenger mutations that may be
mistaken for drivers.
Therefore, all such signatures of positive selection need to be interpreted with caution. In
practice, however, used in an informed and critical manner they will remain effective and
reliable guides to the identification of cancer genes. Investigation of the biological
consequences of putative driver mutations will often consolidate the evidence implicating
them in oncogenesis and will provide insight into the subverted biological processes by
which they contribute to cancer development.
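The need for large sample sets can be illustrated with a simple recurrence test. The sketch below makes three assumptions: passenger mutations fall on a gene at a uniform, hypothetical background rate per base per tumour; each mutated tumour carries one mutation in the gene; and the expected passenger count across all samples follows a Poisson distribution. The p-value is the chance of seeing at least the observed count from passengers alone. The gene length, rate and counts are invented. Real methods also model variation in mutation rate across genes, samples and sequence contexts, which matters for the clustering caveat raised in the box.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k > lam:
        # Sum the (decreasing) upper tail directly so tiny p-values are not lost
        # to the rounding error of 1 - (a number very close to 1).
        total, i = 0.0, k
        while True:
            term = poisson_pmf(i, lam)
            total += term
            if term == 0.0 or term < 1e-17 * total:
                return total
            i += 1
    return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))


def recurrence_p_value(mutated_tumours, tumours, coding_bases, background_rate):
    """Chance that passengers alone put >= mutated_tumours mutations in one gene.

    background_rate is a hypothetical per-base, per-tumour passenger rate; the
    expected count across all tumours is tumours * coding_bases * background_rate.
    """
    expected = tumours * coding_bases * background_rate
    return poisson_upper_tail(mutated_tumours, expected)


# Invented numbers: a 1,500-base coding sequence and 1 passenger per megabase.
small = recurrence_p_value(2, 20, 1_500, 1e-6)
large = recurrence_p_value(50, 500, 1_500, 1e-6)
print(f"2 of 20 tumours:   p = {small:.2g}")
print(f"50 of 500 tumours: p = {large:.2g}")
```

With these invented numbers, 2 mutated tumours out of 20 give p of about 4.4 × 10^-4. That looks striking for a single gene, but it would be expected by chance for roughly 9 of 20,000 genes tested. The same 10% prevalence in 500 tumours gives a p-value many orders of magnitude smaller, which can survive genome-wide testing.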
One important subclass of driver is a mutation that confers resistance to cancer therapy (Fig.
1). These are typically found in recurrences of cancers that have initially responded to
treatment but that are now resistant. Resistance mutations often confer limited growth
advantage on the cancer cell in the absence of therapy. Some seem to predate initiation of
treatment, existing as passengers in minor subclones of the cancer cell population until the
selective environment is changed by the initiation of therapy [25,26]. The passenger is then
converted into a driver and the resistant subclone preferentially expands, manifesting as the
recurrence.
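The expansion of a pre-existing resistant subclone can be sketched with two exponentially growing populations. All numbers below are hypothetical:

- 10^9 therapy-sensitive cells that decline at a net rate of 0.1 per day under treatment.
- A minor resistant subclone of 1,000 cells that grows slowly, at 0.02 per day.

The model ignores further mutation, spatial structure and drug pharmacology. It only shows how a subclone that is negligible when therapy starts can account for nearly all cells at recurrence.

```python
import math


def clone_sizes(sensitive0, resistant0, sensitive_rate, resistant_rate, day):
    """Sizes of the sensitive and resistant clones after `day` days of therapy.

    Each clone grows (or shrinks) exponentially at its own net rate per day.
    """
    sensitive = sensitive0 * math.exp(sensitive_rate * day)
    resistant = resistant0 * math.exp(resistant_rate * day)
    return sensitive, resistant


def days_to_recurrence(sensitive0, resistant0, sensitive_rate, resistant_rate, max_days=10_000):
    """First whole day on which the tumour regains its pre-treatment size.

    Returns (day, resistant_fraction), or None if that never happens within max_days.
    """
    start_size = sensitive0 + resistant0
    for day in range(1, max_days + 1):
        sensitive, resistant = clone_sizes(
            sensitive0, resistant0, sensitive_rate, resistant_rate, day
        )
        total = sensitive + resistant
        if total >= start_size:
            return day, resistant / total
    return None


# Hypothetical parameters (not taken from the text).
result = days_to_recurrence(1e9, 1e3, sensitive_rate=-0.1, resistant_rate=0.02)
if result is not None:
    day, fraction = result
    print(f"recurrence after ~{day} days; resistant fraction {fraction:.4f}")
```

With these parameters, the tumour regains its original size after roughly 690 days, by which time essentially every cell belongs to the resistant subclone. Setting the resistant clone's rate slightly below the sensitive clone's rate in the absence of therapy would model the limited fitness of such mutations noted above.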
The repertoire of somatically mutated cancer genes
The identification of driver mutations and the cancer genes that they alter has been a central
aim of cancer research for more than a quarter of a century. It has been a remarkably
successful endeavour, with at least 350 (1.6%) of the ~22,000 protein-coding genes in the
human genome reported to show recurrent somatic mutations in cancer with strong evidence
that these contribute to cancer development [27]
(http://www.sanger.ac.uk/genetics/CGP/Census/). Most were identified by first establishing their physical location in the genome
through low-resolution genome-wide screens, in particular cytogenetics for chromosomal
translocations in leukaemias and lymphomas. A few were discovered using biological assays
for transforming activity of whole cancer cell DNA and others through targeted mutational
screens guided by biologically well-informed guesswork. Mutations in ~10% of these genes
are also found in the germ line, where they confer an increased risk of developing cancer,
and these were often initially identified by genetic linkage analysis of affected families. The
size of the full repertoire of human cancer genes is a matter of speculation. However, studies
in mice have suggested that more than 2,000 genes, when appropriately altered, may have
the potential to contribute to cancer development [28].
The known cancer genes run the gamut of tissue specificities and mutation prevalences.
Some, for example TP53 and KRAS, are frequently mutated in diverse types of cancer
whereas others are rare and/or restricted to one cancer type
(http://www.sanger.ac.uk/genetics/CGP/cosmic/). In some cancer types, for example colorectal and pancreatic cancer,
abnormalities in several known cancer genes are common. In contrast, in gastric cancer,
relatively few mutations in known cancer genes have been reported.
Approximately 90% of the known somatically mutated cancer genes are dominantly acting,
that is, mutation of just one allele is sufficient to contribute to cancer development. The
mutation in such cases usually results in activation of the encoded protein. Ten per cent act
in a recessive manner, requiring mutation of both alleles, and the mutations usually result in
abrogation of protein function (these are sometimes known as tumour suppressor genes).
Patterns of mutation differ between dominant and recessive cancer genes. Recessive cancer
genes are characterized by diverse mutation types, ranging from single base substitutions to
whole gene deletions, which have the common outcome of abolishing the function of the
encoded protein. In each dominantly acting cancer gene, however, the repertoire of cancer-
causing somatic mutations is usually more constrained, both with respect to the type of
mutation and its location in the gene. Missense amino acid changes (often restricted to
certain key amino acids), in-frame insertions and deletions, and gene amplification are all
common mutational mechanisms for activating dominantly acting cancer genes. Most,
however, are activated through genomic rearrangement. This may join the sequences of two
different genes to create a fusion gene or it may position the cancer gene adjacent to
regulatory elements from elsewhere in the genome, resulting in abnormal expression
patterns. Most of the known rearranged cancer genes are operative in the relatively rare
subset of cancers constituted by leukaemias, lymphomas and sarcomas. Recently, however,
rearranged cancer fusion genes were discovered in more than half of prostate cancer cases [29]
and in lung adenocarcinomas [30]. Their late discovery probably reflects the difficulty of
identifying them amidst the jumble of passenger rearrangements present in many cancer
genomes and hints that there are many more rearranged cancer genes to be found in common
cancers.
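The contrast between constrained activating mutations and diverse inactivating ones can be quantified very simply. One option is to ask what fraction of a gene's mutations fall at its single most frequently mutated codon. The sketch below does this for two invented genes. The metric is one illustrative choice, an assumption rather than a method described in the text. It also ignores whole-gene deletions and rearrangements, which would need to be counted separately for recessive genes.

```python
from collections import Counter


def top_codon_fraction(codons):
    """Fraction of mutations located at the gene's most frequently mutated codon."""
    if not codons:
        return 0.0
    most_common_count = Counter(codons).most_common(1)[0][1]
    return most_common_count / len(codons)


def distinct_codon_count(codons):
    """Number of different codons hit by at least one mutation."""
    return len(set(codons))


# Invented mutation positions (codon numbers) for two hypothetical genes.
hotspot_like = [12, 12, 12, 13, 12, 61, 12, 12, 13, 12]      # clustered, as for an activated gene
scattered = [5, 48, 130, 175, 213, 248, 273, 306, 342, 390]  # spread out, as for an inactivated gene

for name, codons in [("hotspot_like", hotspot_like), ("scattered", scattered)]:
    print(name, top_codon_fraction(codons), distinct_codon_count(codons))
```

The clustered gene has 70% of its mutations at one codon across only 3 distinct positions. The scattered gene has at most 10% at any codon across 10 positions, mirroring the dominant versus recessive patterns described above.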
Much of what we know about the biological pathways and processes that are subverted in
cancer has originated from experiments exploring the functions of cancer genes. Certain
gene families, notably the protein kinases, feature particularly prominently among cancer
genes. Furthermore, cancer genes cluster on certain signalling pathways. For example, in the
classical MAPK/ERK pathway [31], upstream mutations are found in cell-membrane-bound
receptor tyrosine kinases such as EGFR, ERBB2, FGFR1, FGFR2, FGFR3, PDGFRA and
PDGFRB and also in the downstream cytoplasmic components NF1, PTPN11, HRAS,
KRAS, NRAS and BRAF. Recent exhaustive mutational analyses in gliomas have indicated
that almost all cases have a mutation in one of the genes on these critical signalling
pathways [32].
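Asking whether almost all tumours carry a mutation somewhere on a pathway is a set-membership question. The sketch below uses the MAPK/ERK genes listed above as the pathway gene set and reports the fraction of tumours with at least one mutated member. The tumour names and mutation lists are hypothetical. A real analysis would also need gene sets for the other signalling pathways the text refers to, and a consistent definition of which mutations count as functional.

```python
# Genes named in the text as components of the classical MAPK/ERK pathway.
MAPK_ERK_GENES = frozenset({
    "EGFR", "ERBB2", "FGFR1", "FGFR2", "FGFR3", "PDGFRA", "PDGFRB",  # receptor tyrosine kinases
    "NF1", "PTPN11", "HRAS", "KRAS", "NRAS", "BRAF",                 # downstream cytoplasmic components
})


def pathway_hits(mutated_genes_by_tumour, pathway_genes):
    """Map each tumour to the sorted list of its mutated genes that belong to the pathway."""
    return {
        tumour: sorted(pathway_genes.intersection(genes))
        for tumour, genes in mutated_genes_by_tumour.items()
    }


def fraction_with_pathway_mutation(mutated_genes_by_tumour, pathway_genes):
    """Fraction of tumours with at least one mutated pathway gene."""
    if not mutated_genes_by_tumour:
        return 0.0
    hits = pathway_hits(mutated_genes_by_tumour, pathway_genes)
    return sum(1 for genes in hits.values() if genes) / len(hits)


# Hypothetical mutation calls for four tumours.
tumours = {
    "tumour_1": ["KRAS", "TP53"],
    "tumour_2": ["BRAF"],
    "tumour_3": ["TP53", "PTEN"],
    "tumour_4": ["EGFR", "NF1"],
}
print(pathway_hits(tumours, MAPK_ERK_GENES))
print(fraction_with_pathway_mutation(tumours, MAPK_ERK_GENES))
```

Three of the four hypothetical tumours carry a MAPK/ERK pathway mutation, so the fraction is 0.75.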
For some cancers, classification and treatment protocols are now defined by the presence of
abnormal cancer genes. Acute myeloid leukaemia, for example, is subclassified on the basis
of the presence of abnormalities involving specific cancer genes [33]. Each subtype has a
characteristic gene expression profile, cellular morphology, clinical syndrome, prognosis
and opportunity for targeted therapy. Moreover, because cancer cells are dependent on the
abnormal proteins encoded by mutated cancer genes, these proteins have become targets for the
development of new cancer therapeutics. Flagships for this new generation of treatments
include imatinib, an inhibitor of the proteins encoded by the ABL and KIT genes, which are
mutated and activated in chronic myeloid leukaemia [34] and gastrointestinal stromal
tumours [35], respectively; and trastuzumab, an antibody directed against the protein encoded by
ERBB2 (also known as HER2), which is commonly amplified and overexpressed in breast
cancer [36].
Early systematic sequencing of cancer genomes
Provision of the reference human genome sequence at the turn of the millennium offered
new strategies and opportunities for surveying cancer genomes. Rather than depending on
low-resolution maps, the highest possible resolution map, the DNA sequence itself, became
available and has empowered investigation of cancer genomes in several ways. For example,
much higher-resolution arrays have been developed, allowing finer mapping of copy number
changes in cancer genomes leading to the identification of several new amplified cancer
genes.
The availability of the human genome sequence has also raised the possibility that DNA
sequencing itself could become the primary tool for exploration of cancer genomes. This has
prompted several pilot experiments. So far, most have sequenced large numbers of PCR
products to detect the base substitutions and small insertions and deletions (collectively
termed ‘point’ mutations) present in the coding exons of protein-coding genes [32,37–44].
Typically, such studies have covered several hundred megabases of cancer genome with
designs ranging from hundreds of genes analysed in a few hundred cancers to most of the
~22,000 protein-coding genes in 10–20 examples of a particular cancer class.
Several insights have been provided by these screens. They have brought success in the
identification of point-mutated cancer genes including BRAF [45], PIK3CA [46], EGFR [47],
HER2 [48], JAK2 [49], UTX [50] and IDH1 [41]. Some of these were
unique discoveries, whereas others were simultaneously discovered in targeted mutational
screens. Some were previously known cancer genes, but the discovery of point mutations
highlighted new mechanisms and cancer types in which they are operative. Some were
surprising and highlight the virtue of systematic and comprehensive screens, for example the
discovery of isocitrate dehydrogenase 1 (IDH1) as a cancer gene mutated in glioma [41]; IDH1
belongs to the enzyme family that converts isocitrate to α-ketoglutarate, a step of the Krebs cycle. Because
many of these genes encode kinases that are activated by the mutations found in cancer, they have prompted a
wave of drug discovery to find inhibitors that may serve as anticancer therapeutics [51], some
of which are already in clinical trials.
Exposing the landscape of the cancer genome
Important insights into the general parameters and patterns of somatic mutation in cancer
have also emerged from these early studies. It appears that most somatic point mutations in
cancer genomes are passengers [39]. Although this might have been predicted for mutations in
intergenic and intronic DNA, it applies even in protein-coding exons. There is, however,
statistical evidence in favour of many more driver mutations than can be accounted for by
known cancer genes. These drivers appear to be distributed across a large number of genes,
each of which is mutated infrequently, suggesting that the repertoire of somatically mutated
human cancer genes is much larger than the ~350 currently catalogued [39,44]. Conceivably,
these infrequently mutated cancer genes confer less selective growth advantage on a clone of
cancer cells than more commonly mutated cancer genes, but other explanations can also be
invoked. Some analyses also indicate that there may be as many as 20 driver mutations in
individual cancers, considerably more than the 5–7 previously predicted [24].
Understanding of the prevalence and types of somatic mutation in cancer genomes has been
greatly fostered by these studies. Some cancer genomes carry >100,000 point mutations
whereas others have fewer than 1,000. Some of this variation can be accounted for by
previous heavy mutagenic exposures or the existence of known DNA repair defects.
However, in a subset of breast cancers there are large numbers of C-to-G base substitutions,
almost always occurring at cytosines that follow a thymine, for which there is no obvious
explanation and for which unknown exposures and/or mutator phenotypes are presumably
responsible42,43.
The effects of chemotherapy on the cancer genome have also been revealed by systematic
sequencing experiments. For example, gliomas that recur after treatment with the DNA
alkylating agent temozolomide have been shown to carry huge numbers of mutations with a
signature typical of such agents32,52,53. The fact that the mutations could be detected at all
indicates that these recurrences are clonal. Thus, these studies indicate that, although
temozolomide extends the patient's life only briefly, almost all cells in a glioma respond,
and a single cell that is resistant to the chemotherapy proliferates to form the recurrence.
Additional studies guided by these observations led to the identification of the
underlying mutated resistance gene52,53.
Beyond point mutations, some investigations have begun to explore the features of genomic
rearrangements in common cancers, about which remarkably little is known. Early studies
using conventional Sanger sequencing indicated that there is substantial complexity of
rearrangement in these genomes54,55. The recent advent of massively parallel, second-
generation sequencing technologies has enabled more comprehensive genome-wide screens
revealing that some cancer genomes carry hundreds of somatically acquired rearrangements,
whereas others carry very few. Moreover, the distinctive patterns of rearrangement found
indicate that currently uncharacterized mutational processes may be at work56.
Sequencing of cancer genomes in the future
The large-scale, systematic sequencing studies conducted so far have been constrained by
the relatively low throughput and high cost of sequencing. They have therefore generally
been restricted to components of the cancer genome (for example, coding exons), to small
numbers of cancer samples or to a subset of the mutational classes present. In principle,
however, all the structural classes of somatic mutation can be detected genome-wide by
randomly fragmenting the cancer genome and sequencing large numbers …
Frontiers in Genetics | www.frontiersin.org
Edited by: Tao Huang, Shanghai Institutes for Biological Sciences (CAS), China
Reviewed by: Jing Feng, Tianjin Medical University General Hospital, China; Xiaoying Huang, Wenzhou Medical University, China
*Correspondence: Zhengfu He, [email protected]; Enguo Chen, [email protected]
Specialty section: This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics
Received: 11 October 2019
Accepted: 07 January 2020
Published: 04 February 2020
Citation: Zhang J, Hu H, Xu S, Jiang H, Zhu J, Qin E, He Z and Chen E (2020) The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Front. Genet. 11:17. doi: 10.3389/fgene.2020.00017
ORIGINAL RESEARCH
The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer
Jisong Zhang1, Huihui Hu1, Shan Xu1, Hanliang Jiang1, Jihong Zhu2, E. Qin3,
Zhengfu He4* and Enguo Chen1*
1 Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China,
2 Department of Anesthesiology, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China, 3 Department of
Respiratory Medicine, Shaoxing People’s Hospital (Shaoxing Hospital, Zhejiang University School of Medicine), Shaoxing,
China, 4 Department of Thoracic Surgery, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
Lung cancer is a common malignant cancer. Kirsten rat sarcoma oncogene (KRAS)
mutations have been considered a key driver of lung cancer. KRAS p.G12C is the most
prevalent KRAS mutation in NSCLC, occurring in about 11–16% of lung adenocarcinomas
(p.G12C accounts for 45–50% of mutant KRAS). However, it is still not clear how KRAS
mutation triggers lung cancer. To study the molecular mechanisms of KRAS mutation in
lung cancer, we analyzed the gene expression profiles of 156 KRAS mutation samples and
other negative samples with a two-stage feature selection approach: (1) minimal Redundancy
Maximal Relevance (mRMR) and (2) Incremental Feature Selection (IFS). Finally, 41
predictive genes for KRAS mutation were identified and a KRAS mutation predictor was
constructed. Its leave-one-out cross-validation MCC was 0.879. These results help clarify
the roles of KRAS mutation in lung cancer.
Keywords: Kirsten rat sarcoma oncogene (KRAS), mutation, lung cancer, predictor, gene expression
INTRODUCTION
Lung cancer, a malignancy defined by the uncontrolled overgrowth of cells in lung tissue, is a
leading cause of cancer death. Each year, 1.3 million people die of lung cancer (Jemal et al.,
2006; Jemal et al., 2011). Non-small-cell lung cancer (NSCLC) accounts for more than 85% of
diagnosed lung cancer cases (Morgensztern et al., 2010). NSCLC can be further divided into
adenocarcinoma, squamous cell carcinoma (SCC), and large cell carcinoma (Sandler
et al., 2006; Morgensztern et al., 2010).
At present, the pathogenesis of lung cancer is not fully understood, but it is generally believed
that one of the most important causes is the accumulation of mutations, including
single-nucleotide substitutions, small insertions and deletions, copy-number changes, and
chromosomal rearrangements. Moreover, these mutations are closely associated with cell
proliferation, invasion, metastasis, and apoptosis (Scagliotti et al., 2008; Liu et al., 2012).
Studying mutations in living systems will therefore help clarify how mutations are associated
with lung cancer biological processes.
February 2020 | Volume 11 | Article 17
Zhang et al. Functional Effects of KRAS Mutations
In the last decade, molecular studies have identified Kirsten rat
sarcoma oncogene (KRAS) mutations as one of the important
classes of mutations in lung cancer (Gautschi et al., 2007). KRAS
is the principal isoform of RAS. KRAS p.G12C is the most
prevalent KRAS mutation in NSCLC, occurring in about 11–16%
of lung adenocarcinomas (p.G12C accounts for 45–50% of mutant
KRAS) (Cox et al., 2014). Other common KRAS mutations in
lung cancer are G12V and G12D. KRAS mutations are also
frequent in other cancers, such as pancreatic and colorectal
cancer. Based on TCGA data in cBioPortal (Gao et al., 2013), the
most frequent KRAS mutations are G12D, G12V, and G12R in
pancreatic cancer and G12D, G12V, and G13D in colorectal
cancer. KRAS may therefore be a good therapeutic target in the
search for lung cancer drugs.
As mentioned above, KRAS mutations are among the most
common mutations in lung cancer, especially in NSCLC (Mao
et al., 1994; Mills et al., 1995; Nakamoto et al., 2001). KRAS
mutation is more frequent in Caucasians than in Asians, and
smokers may carry more KRAS mutations than nonsmokers
(Westcott and To, 2013; Ferrer et al., 2018). Single amino acid
substitutions in codon 12 are the most common KRAS
mutations in NSCLC (Graziano et al., 1999). Understanding how
KRAS mutations affect gene expression in lung cancer has
therefore been a long-standing goal in cancer biology.
In this study, to investigate the functional effects of key driver
KRAS mutations on gene expression in lung cancer, we analyzed the
gene expression profiles of 156 lung cancer cell lines with KRAS
mutations and 3,582 other lung cancer cell lines without KRAS
mutations. Forty-one discriminative genes for KRAS mutations
were identified using a two-stage feature selection approach: (1)
minimal Redundancy Maximal Relevance (mRMR) and (2)
Incremental Feature Selection (IFS).
METHODS
The Gene Expression Profiles of Cell Lines With and Without KRAS Mutations
To identify the key genes that distinguish key driver KRAS
mutations from other mutations, we downloaded the gene
expression profiles of 156 lung cancer cell lines with KRAS
mutations (positive samples) and 3,582 other lung cancer cell
lines without KRAS mutations (negative samples) from the publicly
available Gene Expression Omnibus (GEO) database under
accession number GSE83744 (Berger et al., 2016). The
expression levels of 978 representative genes, the Broad
Institute Human L1000 landmark genes, were measured. The L1000
landmark set was derived from the Connectivity Map (CMap)
project (Subramanian et al., 2017). CMap is a large gene-expression
dataset of human cells perturbed with many chemical and genetic
reagents (Lamb et al., 2006). These roughly 1,000 landmark genes
are sensitive to perturbations and can reflect 81% of non-measured
transcripts (Subramanian et al., 2017).
Two-Stage Feature Selection Approach
We applied a two-stage feature selection approach to select the
biomarker genes. First, the genes were ranked with the mRMR
algorithm (Peng et al., 2005) according to both their relevance to
KRAS mutation status and their redundancy with one another.
mRMR has been widely applied for feature selection in
bioinformatics (Chen et al., 2018c; Chen et al., 2019e; Li and Huang,
2018; Li et al., 2019b; Wang and Huang, 2019a). In the equations
below, Ωs, Ωt, and Ω denote the set of m selected genes, the set of n
to-be-selected genes, and the set of all m+n genes, respectively. We
use mutual information (I) to measure the relevance D between the
expression level of a gene g from Ωt and KRAS mutation status
t (Huang and Cai, 2013):
$$ D = I(g, t) \tag{1} $$
Meanwhile, the redundancy R of the gene g with the selected
genes in Ωs can be calculated as:
$$ R = \frac{1}{m} \sum_{g_i \in \Omega_s} I(g, g_i) \tag{2} $$
The optimal gene gj from Ωt, the one with maximal relevance to KRAS
mutation status t and minimal redundancy with the selected genes in
Ωs, is selected by maximizing the mRMR function:
$$ \max_{g_j \in \Omega_t} \left[ I(g_j, t) - \frac{1}{m} \sum_{g_i \in \Omega_s} I(g_j, g_i) \right], \quad j = 1, 2, \ldots, n \tag{3} $$
After N rounds of evaluation, the genes can be ranked as
$$ S = \{ g'_1, g'_2, \ldots, g'_h, \ldots, g'_N \} \tag{4} $$
The top-ranked genes were associated with KRAS mutation
status and had little redundancy with the other genes, which made
them suitable biomarker candidates. The top 200 genes were further
analyzed in the second stage.
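The greedy ranking defined by Equations (1)–(4) can be sketched in a few lines of Python. This is a minimal illustration, not the C/C++ mRMR program by Peng et al. that the authors used. It assumes the expression values have already been discretized (for example, into low/medium/high bins) and estimates mutual information from simple frequency counts. The gene names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in estimate of I(a; b), in nats, for two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


def mrmr_rank(features, target, n_select):
    """Greedy mRMR: relevance I(g, t) minus mean redundancy with the genes already selected.

    features: dict mapping gene name -> list of discretized expression values
    target:   list of class labels (here 1 = KRAS mutation, 0 = no KRAS mutation)
    """
    relevance = {g: mutual_information(v, target) for g, v in features.items()}
    selected, remaining = [], sorted(features)  # sorted for deterministic tie-breaking

    def score(g):
        if not selected:
            return relevance[g]  # first round: relevance only, Equation (1)
        redundancy = sum(mutual_information(features[g], features[s]) for s in selected)
        return relevance[g] - redundancy / len(selected)  # Equation (3)

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)  # max() keeps the first gene among ties
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: g1_copy duplicates g1; g2 is weaker but complementary.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 0, 0, 1, 1, 1],
    "g1_copy": [0, 0, 0, 0, 0, 1, 1, 1],
    "g2": [0, 0, 0, 0, 1, 0, 0, 0],
}
print(mrmr_rank(features, target, n_select=3))  # ['g1', 'g2', 'g1_copy']
```

In the toy example, g1_copy is as relevant as g1. It is still ranked last because, once g1 is selected, its redundancy term cancels its relevance. This is the behaviour that distinguishes mRMR from a purely univariate ranking.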
The second stage determined the number of genes to select
using the IFS method (Chen et al., 2018b; Chen et al., 2019b;
Chen et al., 2019c; Chen et al., 2019d; Chen et al., 2019f; Li
et al., 2019a; Pan et al., 2019a; Pan et al., 2019b). To do so, 200
classifiers were constructed using the top 1, top 2, …, top 200
genes, and the LOOCV (leave-one-out cross-validation) MCC
(Matthews correlation coefficient) of each top-k-gene classifier
was calculated.
We tried several different classifiers: (1) SVM (Support
Vector Machine) (Jiang et al., 2019; Yan et al., 2019; Chen
et al., 2019a; Li et al., 2019a; Pan et al., 2019a; Wang and
Huang, 2019b; Chen et al., 2019d), (2) 1NN (1 Nearest
Neighbor) (Lei et al., 2013; Chen et al., 2016; Wang et al.,
2017a), (3) 3NN (3 Nearest Neighbors), (4) 5NN (5 Nearest
Neighbors), (5) Decision Tree (DT) (Huang et al., 2008;
Huang et al., 2011; Chen et al., 2015), (6) Neural Network
(NN) (Liu et al., 2017; Pan et al., 2018; Chen et al., 2019e). These
classification algorithms were applied with the function svm from
the R package e1071, the function knn from the R package class,
the function rpart from the R package rpart, and the function nnet
from the R package nnet.
In the IFS curve, the x-axis is the number of genes and the y-axis
is the corresponding LOOCV MCC. The peak of the curve marks the
optimal number of genes, and therefore the gene combination to
select.
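The IFS procedure reduces to a loop over growing prefixes of the mRMR-ranked list. In this sketch, the classifier evaluation is abstracted as a callback, `evaluate`. In the study, that step would train one of the six classifiers on the top-k genes and return its LOOCV MCC. The function name and the toy scores in the usage example are hypothetical.

```python
def incremental_feature_selection(ranked_genes, evaluate, max_k=None):
    """Evaluate the top-1, top-2, ..., top-max_k genes; return the IFS curve and its peak.

    ranked_genes: genes in mRMR order, best first
    evaluate:     callable mapping a list of genes to a score (e.g. LOOCV MCC)
    """
    max_k = len(ranked_genes) if max_k is None else min(max_k, len(ranked_genes))
    curve = [(k, evaluate(ranked_genes[:k])) for k in range(1, max_k + 1)]
    # Peak of the curve; on ties, max() keeps the smallest gene set.
    best_k, best_score = max(curve, key=lambda point: point[1])
    return curve, best_k, best_score


# Hypothetical LOOCV MCC values standing in for real classifier runs.
toy_mcc = {1: 0.52, 2: 0.71, 3: 0.80, 4: 0.78, 5: 0.74}
curve, best_k, best_mcc = incremental_feature_selection(
    ["g1", "g2", "g3", "g4", "g5"], lambda genes: toy_mcc[len(genes)]
)
print(best_k, best_mcc)  # 3 0.8
```

In the study this loop corresponds to max_k = 200, repeated once per classifier. Plotting `curve` for each classifier gives the kind of IFS curves shown in Figure 1.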
Prediction Performance Evaluation of the Classifier
As mentioned before, the prediction performance of each
classifier was evaluated with leave-one-out cross-validation
(LOOCV) (Cui et al., 2013; Yang et al., 2014). LOOCV runs N
rounds, and each sample is tested exactly once: in each round, one
sample is tested using the model trained on the other N−1 samples.
In this way, all samples are evaluated objectively (Chou, 2011).
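The following is a minimal pure-Python sketch of LOOCV with a k-nearest-neighbour classifier. It uses k = 3, the setting that performed best in this study. It assumes Euclidean distance and a simple majority vote. The authors used the `knn` function from the R package class, whose tie-breaking and other details may differ. The toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples nearest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loocv_predictions(X, y, k=3):
    """Run N rounds; in round i, sample i is predicted by a model built on the other N-1."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(knn_predict(train_X, train_y, X[i], k))
    return predictions


# Hypothetical toy data: two well-separated groups of samples, two genes each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.15, 0.05],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(loocv_predictions(X, y, k=3))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because k-NN has no separate training step, each LOOCV round simply removes one sample from the reference set. With 3,738 samples, the study's setting amounts to 3,738 such rounds per gene subset.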
The performance metrics, including sensitivity (Sn), specificity
(Sp), accuracy (ACC), and the Matthews correlation coefficient
(MCC), were calculated as:
$$ \mathrm{Sn} = \frac{TP}{TP + FN} \tag{5} $$
$$ \mathrm{Sp} = \frac{TN}{TN + FP} \tag{6} $$
$$ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7} $$
$$ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{8} $$
where TP, TN, FP, and FN stand for the numbers of true positive,
true negative, false positive, and false negative samples,
respectively. Because the numbers of KRAS mutation-positive and
KRAS mutation-negative samples were imbalanced, and MCC trades
off sensitivity against specificity (Chen et al., 2018a; Li et al.,
2018; Pan et al., 2018; Pan et al., 2019a; Pan et al., 2019b), MCC
was used as the main performance metric.
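Equations (5)–(8) translate directly into code. As a check, the sketch below uses the 3NN confusion-matrix counts reported in Table 2 (TP = 131, FN = 25, FP = 10, TN = 3572). With these counts it reproduces the reported LOOCV values of 0.840, 0.997, 0.991, and 0.879. The function name is illustrative.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts (Eqs. 5-8)."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Counts from Table 2 (3NN classifier, 41 genes).
metrics = classification_metrics(tp=131, tn=3572, fp=10, fn=25)
print({name: round(value, 3) for name, value in metrics.items()})
# {'Sn': 0.84, 'Sp': 0.997, 'ACC': 0.991, 'MCC': 0.879}
```

The numbers illustrate why MCC was preferred. With only 156 positives among 3,738 samples, accuracy stays at 0.991 even though 25 of the 156 KRAS-mutant samples (about 16%) are missed. MCC, at 0.879, reflects that shortfall more clearly.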
RESULTS AND DISCUSSION
The Genes That Showed Different Expression Patterns Between KRAS Mutation Samples and Other Samples
The top 200 most informative genes for KRAS mutations were
identified using the mRMR method, which has been widely
used in the bioinformatics field (Zhao et al., 2013; Zhang et al.,
2016). The C/C++ implementation written by Peng et al.
(Peng et al., 2005; Best et al., 2017) (http://home.penglab.com/proj/mRMR/)
was used to apply the mRMR algorithm. Unlike traditional
univariate feature selection methods based on statistical tests,
mRMR considers both the relevance between gene expression and
KRAS mutation status and the redundancy among genes.
The Optimal Biomarkers Identified From the mRMR Gene List With IFS Methods
After the genes were ranked by mRMR, the IFS procedure was
applied to find the optimal number of genes to select. The IFS
curves in Figure 1 show the relationship between the number of
genes and the LOOCV MCC. The peak LOOCV MCCs were 0.858
with 8 genes for SVM, 0.853 with 48 genes for 1NN, 0.879 with 41
genes for 3NN, 0.878 with 59 genes for 5NN, 0.871 with 69 genes
for DT, and 0.842 with 174 genes for NN. 3NN performed best, and
its 41 genes are listed in Table 1.
The Prediction Metrics of the 41 Genes
The 41 genes were chosen with the two-stage feature selection
methods mRMR and IFS. To evaluate their prediction power more
carefully, we examined the confusion matrix of the 3NN classifier,
which compares actual and predicted KRAS mutation status
(Table 2). The LOOCV sensitivity, specificity, accuracy, and MCC
were 0.840, 0.997, 0.991, and 0.879, respectively.
FIGURE 1 | The IFS curves of six different classifiers. The x-axis is the
number of genes and the y-axis is the leave-one-out cross-validation
(LOOCV) MCC. The red, blue, brown, black, orange, and purple curves are
the IFS results of SVM, 1NN, 3NN, 5NN, DT, and NN, respectively. The peak
LOOCV MCCs of SVM, 1NN, 3NN, 5NN, DT, and NN were 0.858 with 8
genes, 0.853 with 48 genes, 0.879 with 41 genes, 0.878 with 59 genes,
0.871 with 69 genes, and 0.842 with 174 genes, respectively. 3NN performed
best; therefore, its 41 genes were finally selected.
The Network Associations Between KRAS and the 41 Genes
We searched KRAS and the 41 selected genes in the STRING database
(version 11.0, https://string-db.org), and Figure 2 shows their
functional association network. Twenty of the 41 genes (CCND3,
CDK19, CEBPA, CEBPD, CSNK1E, CTSL, DUSP6, GRB10, HMGA2,
MMP1, MTHFD2, NR3C1, PAK4, PMAIP1, RAP1GAP, SDHB, STX1A,
TP53, TRIB3, UBE2L6) had direct interactions with KRAS. The
STRING network results thus show that nearly half of the 41 genes
had direct interactions with KRAS.
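Counting "direct interactions with KRAS" amounts to finding the first-degree neighbours of KRAS in the network and intersecting them with the selected genes. The sketch below assumes the STRING network has been exported as a list of undirected gene-pair edges. The edge list and the GENE_* names are hypothetical placeholders, not actual STRING output for these genes.

```python
def direct_partners(edges, hub, candidates):
    """Return the candidate genes that share at least one edge with `hub`.

    edges: iterable of (gene_a, gene_b) pairs; edges are treated as undirected,
           so the hub may appear on either side of a pair.
    """
    partners = set()
    for a, b in edges:
        if a == hub:
            partners.add(b)
        elif b == hub:
            partners.add(a)
    return sorted(partners & set(candidates))


# Hypothetical toy edges; a real analysis would read the exported STRING table.
edges = [("KRAS", "GENE_A"), ("GENE_B", "KRAS"), ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E")]
selected = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
hits = direct_partners(edges, "KRAS", selected)
print(hits, f"{len(hits)} of {len(selected)}")  # ['GENE_A', 'GENE_B'] 2 of 4
```

Note that GENE_C is connected to KRAS only through GENE_B, so it is not counted. The "20 out of 41" figure likewise counts only genes with an edge directly to KRAS.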
The Biological Significance of the Selected Genes in Lung Cancer
As mentioned earlier, we used the mRMR algorithm and the IFS procedure
to screen out 41 genes that may be molecular markers for identifying
KRAS mutations. We then reviewed studies of these genes in lung
cancer and in other cancers with a high frequency of KRAS mutations,
such as colorectal and pancreatic cancer. Zhang et al. showed that the
pseudokinase Tribbles-3 (TRIB3) can activate the β-catenin signaling
pathway, which in turn promotes the proliferation and migration of
NSCLC cells (Zhang et al., 2019). In addition, blocking TRIB3 activity
may be one approach to treating lung cancer (Ding et al., 2018).
Wang et al. found that PAK4 is significantly associated with poor
prognosis in NSCLC (Wang et al., 2016b), and PAK4-mediated LIMK1
phosphorylation regulates the migration and invasion of NSCLC cells.
PAK4 may therefore be an important prognostic indicator and a
potential molecular target for the treatment of NSCLC (Cai et al.,
2015). HMGA2 is highly expressed in metastatic lung adenocarcinoma
(LUAD) and affects apoptosis through caspase-3/9 and Bcl-2; it is
also considered a biomarker and potential therapeutic target for lung
cancer (Kumar et al., 2014; Gao et al., 2017b). A meta-analysis of
lung cancer showed that the matrix metalloproteinase 1 (MMP1) −1607
1G/2G polymorphism is a risk factor for lung cancer in Asians (Li et
al., 2015). In addition, the DUSP6 rs2279574 polymorphism is thought
to predict the survival time of NSCLC patients after chemotherapy
(Wang et al., 2016a). The cyclin D3 gene (CCND3) is a key cell cycle
gene in NSCLC that can promote the growth of LUAD (Zhang et al.,
2017). Genetic variation in casein kinase I epsilon (CSNK1E), a
circadian rhythm gene, is significantly correlated with the risk of
lung cancer (Ortega and Mas-Oliva, 1986). CEBPA can act as a tumor
suppressor, and Lu et al. found through clinical experiments that
up-regulating CEBPA is an effective approach for treating human
NSCLC (Halmos et al., 2002; Lu et al., 2015). In addition, a
comprehensive analysis of lung cancer genes by Lv and Wang suggests
that CEBPD may be involved in the development of lung cancer (Lv and
Wang, 2015). TP53 mutation is very common in NSCLC and is considered
a marker of poor prognosis in lung cancer (Gao et al., 2017a; Labbe
et al., 2017). Methylenetetrahydrofolate dehydrogenase 2 (MTHFD2)
contributes to redox homeostasis and may be exploited in the
treatment of lung cancer (Nishimura et al., 2019). NR3C1 is reported
to be involved in pathways related to lung cancer biology, and as a
gene marker it is significantly correlated with survival in LUAD
(Zhao et al., 2015; Luo et al., 2018). Cathepsin L1, the protein
encoded by the CTSL1 gene, can degrade the extracellular matrix and
activate proteolytic cascades that promote invasion and metastasis
(Duffy, 1996; Turk et al., 2012). Elevated expression of
extracellular cathepsin L is associated with the progression of lung
cancer cells (Okudela et al., 2016). Moreover, cathepsin L is viewed
as a downstream target of oncogenic KRAS mutations.
The genes above have not only been shown to be closely related to
the prognosis, diagnosis, and treatment of lung cancer, but also
interact directly with KRAS. Some of the 41 selected genes do not
interact directly with KRAS but are considered to be involved in the
occurrence and development of lung cancer. The RBM6 gene is located
at 3p21.3, and changes in its expression regulate many of the most
common abnormal splicing events in lung cancer (Sutherland et al.,
2010; Coomer et al., 2019). Double up-regulation of the RGS2 gene is
associated with poor overall survival in patients with lung
adenocarcinoma (Yin et al., 2016). Epigenetic silencing of BAMBI has
been identified as a marker of NSCLC, and BAMBI overexpression may
become a new therapeutic target for this cancer (Marwitz et al.,
2016; Wang et al., 2017b). Overexpression of PAFAH1B1 can lead to the
occurrence and poor prognosis of lung cancer (Lo et al., 2012).
Collagen alpha-1(IV) chain, encoded by the COL4A1 gene, was
previously found to play a crucial role in coordinating alveolar
morphogenesis and forming the epithelial vasculature of lung tissue
(Abe et al., 2017).
The Potential Roles of the Selected Genes in Other Cancers
KRAS-related genes are likely to be diagnostic and prognostic
markers and therapeutic targets in lung cancer. We also looked for
studies of these genes in other cancers with high-frequency KRAS
mutations, mainly colorectal and pancreatic cancer. According to Hua
et al., TRIB3 gene knockout reduces the occurrence of colon tumors in
mice, reduces the migration of colorectal cancer cells, and reduces
their growth in mouse transplanted tumors; blocking TRIB3 activity
could therefore be a strategy for treating colorectal cancer (Hua et
al., 2019). Tyagi et al. found that PAK4 can maintain the stem cell
phenotype of pancreatic cancer cells by activating STAT3 signaling,
making it a potential new therapeutic target (Tyagi et al., 2016).
TP53 mutation is associated with early-stage colorectal cancer
(Laurent et al., 2011), and there is a significant correlation
between MMP1 and colon cancer mortality (Slattery and Lundgreen,
2014).
TABLE 1 | The 41 genes selected by mRMR and IFS.
Rank Gene Rank Gene
1 CTSL1 22 CCDC92
2 GNPDA1 23 BRP44
3 TRIB3 24 CDK19
4 STX1A 25 CD320
5 PHKA1 26 ATP1B1
6 CSNK1E 27 DRAP1
7 COL4A1 28 DUSP6
8 CEBPA 29 RAP1GAP
9 CEBPD 30 GALE
10 NSDHL 31 SSBP2
11 TP53 32 UBE2L6
12 MTHFD2 33 CCND3
13 RGS2 34 PAFAH1B1
14 NR3C1 35 RBM6
15 PPIC 36 C5
16 BAMBI 37 SDHB
17 PAK4 38 GRB10
18 FEZ2 39 UFM1
19 KTN1 40 ARL4C
20 HMGA2 41 PMAIP1
21 MMP1
TABLE 2 | The confusion matrix of actual sample classes and predicted sample classes using 3NN.
                          Predicted KRAS mutation +   Predicted KRAS mutation −
Actual KRAS mutation +    131                         25
Actual KRAS mutation −    10                          3572
MCC = 0.879   Sensitivity = 0.840   Specificity = 0.997
DATA AVAILABILITY STATEMENT
We downloaded the gene expression profiles of 156 samples with KRAS
mutations (positive samples) and 3,582 samples without KRAS
mutations (negative samples) from the publicly available GEO (Gene
Expression Omnibus) database under accession number GSE83744.
AUTHOR CONTRIBUTIONS
JZha conceived and designed the study. HH and SX performed
data analysis. HJ wrote the paper. JZhu, EC and ZH reviewed and
edited the manuscript. JZha approved final version of the
manuscript. All authors read and approved the manuscript.
FIGURE 2 | The functional association network of KRAS and the selected genes based on the STRING database. Twenty of the 41 genes (CCND3, CDK19, CEBPA,
CEBPD, CSNK1E, CTSL, DUSP6, GRB10, HMGA2, MMP1, MTHFD2, NR3C1, PAK4, PMAIP1, RAP1GAP, SDHB, STX1A, TP53, TRIB3, UBE2L6) had direct
interactions with KRAS. Each line represents an interaction supported by a particular type of evidence. Sky-blue, purple, green, red, blue, grass-green, black, and
navy-blue edges are interactions from curated databases, experiments, gene neighborhood, gene fusions, gene co-occurrence, text mining, co-expression, and
protein homology, respectively. For more detailed explanations, please refer to the STRING database (https://string-db.org).
FUNDING
This study was supported by the Funds from Science Technology
Department of Zhejiang Province (LGF19H010010), Medical
and Health Research Foundation of Zhejiang Province
(2016ZDB005, 2017ZD020), China, WU JIEPING MEDICAL
foundation (320.6750.19092-12), Beijing Xisike Clinical
Oncology Research Foundation (Y-HS2017-037) and Medical
Health and Scientific Technology Project of Zhejiang
Province (2019RC182).
REFERENCES
Abe, Y., Matsuduka, A., Okanari, K., Miyahara, H., Kato, M., Miyatake, S., et al. (2017). A severe pulmonary complication in a patient with COL4A1-related disorder: a case report. Eur. J. Med. Genet. 60 (3), 169–171. doi: 10.1016/j.ejmg.2016.12.008
Berger, A. H., Brooks, A. N., Wu, X., Shrestha, Y., Chouinard, C., Piccioni, F., et al. (2016). High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell 30 (2), 214–228. doi: 10.1016/j.ccell.2016.06.022
Best, M. G., Sol, N., In ‘t Veld, S., Vancura, A., Muller, M., Niemeijer, A. N., et al. (2017). Swarm intelligence-enhanced detection of non-small-cell lung cancer using tumor-educated platelets. Cancer Cell 32 (2), 238–252.e239. doi: 10.1016/j.ccell.2017.07.004
Cai, S., Ye, Z., Wang, X., Pan, Y., Weng, Y., Lao, S., et al. (2015). Overexpression of P21-activated kinase 4 is associated with poor prognosis in non-small cell lung cancer and promotes migration and invasion. J. Exp. Clin. Cancer Res. 34, 48. doi: 10.1186/s13046-015-0165-2
Chen, L., Chu, C., Huang, T., Kong, X., and Cai, Y. D. (2015). Prediction and analysis of cell-penetrating peptides using pseudo-amino acid composition and random forest models. Amino Acids 47 (7), 1485–1493. doi: 10.1007/s00726-015-1974-5
Chen, L., Zhang, Y. H., Huang, T., and Cai, Y. D. (2016). Gene expression profiling gut microbiota in different races of humans. Sci. Rep. 6, 23075. doi: 10.1038/srep23075
Chen, L., Li, J., Zhang, Y. H., Feng, K., Wang, S., Zhang, Y., et al. (2018a). Identification of gene expression signatures across different types of neural stem cells with the Monte-Carlo feature selection method. J. Cell Biochem. 119 (4), 3394–3403. doi: 10.1002/jcb.26507
Chen, L., Zhang, Y.-H., Pan, X., Liu, M., Wang, S., Huang, T., et al. (2018b). Tissue Expression difference between mRNAs and lncRNAs. Int. J. Mol. Sci. 19 (11), 3416. doi: 10.3390/ijms19113416
Chen, L., Zhang, Y. H., Huang, G., Pan, X., Wang, S., Huang, T., et al. (2018c). Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection. Mol. Genet. Genomics 293 (1), 137–149. doi: 10.1007/s00438-017-1372-7
Chen, L., Pan, X., Zeng, T., Zhang, Y., Huang, T., and Cai, Y. (2019a). Identifying essential signature genes and expression rules associated with distinctive development stages of early embryonic cells. IEEE Access 7, 128570–128578. doi: 10.1109/ACCESS.2019.2939556
Chen, L., Pan, X., Zhang, Y.-h., Hu, X., Feng, K., Huang, T., et al. (2019b). Primary tumor site specificity is preserved in patient-derived tumor xenograft models. Front. Genet. doi: 10.3389/fgene.2019.00738
Chen, L., Pan, X., Zhang, Y.-H., Huang, T., and Cai, Y.-D. (2019c). Analysis of gene expression differences between different pancreatic cells. ACS Omega 4 (4), 6421–6435. doi: 10.1021/acsomega.8b02171
Chen, L., Pan, X., Zhang, Y.-H., Kong, X., Huang, T., and Cai, Y.-D. (2019d). Tissue differences revealed by gene expression profiles of various cell lines. J. Cell. Biochem. 120 (5), 7068–7081. doi: 10.1002/jcb.27977
Chen, L., Pan, X., Zhang, Y.-H., Liu, M., Huang, T., and Cai, Y.-D. (2019e). Classification of widely and rarely expressed genes with recurrent neural network. Comput. Struct. Biotechnol. J. 17, 49–60. doi: 10.1016/j.csbj.2018.12.002
Chen, L., Zhang, S., Pan, X., Hu, X., Zhang, Y. H., Yuan, F., et al. (2019f). HIV infection alters the human epigenetic landscape. Gene Ther. 26 (1-2), 29–39. doi: 10.1038/s41434-018-0051-6
Chou, K. C. (2011). Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 273 (1), 236–247. doi: 10.1016/j.jtbi.2010.12.024
Coomer, A. O., Black, F., Greystoke, A., Munkley, J., and Elliott, D. J. (2019). Alternative splicing in lung cancer. Biochim. Biophys. Acta Gene Regul. Mech. 1862 (11-12), 194388. doi: 10.1016/j.bbagrm.2019.05.006
Cox, A. D., Fesik, S. W., Kimmelman, A. C., Luo, J., and Der, C. J. (2014). Drugging the undruggable RAS: mission possible? Nat. Rev. Drug Discovery 13 (11), 828–851. doi: 10.1038/nrd4389
Cui, W., Chen, L., Huang, T., Gao, Q., Jiang, M., Zhang, N., et al. (2013). Computationally identifying virulence factors based on KEGG pathways. Mol. Biosyst. 9 (6), 1447–1452. doi: 10.1039/c3mb70024k
Ding, C. Z., Guo, X. F., Wang, G. L., Wang, H. T., Xu, G. H., Liu, Y. Y., et al. (2018). High glucose contributes to the proliferation and migration of non-small cell lung cancer cells via GAS5-TRIB3 axis. Biosci. Rep. 38 (2), BSR20171014. doi: 10.1042/BSR20171014
Duffy, M. J. (1996). PSA as a marker for prostate cancer: a critical review. Ann. Clin. Biochem. 33 (Pt 6), 511–519. doi: 10.1177/000456329603300604
Ferrer, I., Zugazagoitia, J., Herbertz, S., John, W., Paz-Ares, L., and Schmid-Bindert, G. (2018). KRAS-Mutant non-small cell lung cancer: From biology to therapy. Lung Cancer 124, 53–64. doi: 10.1016/j.lungcan.2018.07.013
Gao, J., Aksoy, B. A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S. O., et al. (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal 6 (269), pl1. doi: 10.1126/scisignal.2004088
Gao, W., Jin, J., Yin, J., Land, S., Gaither-Davis, A., Christie, N., et al. (2017a). KRAS and TP53 mutations in bronchoscopy samples from former lung cancer patients. Mol. Carcinog. 56 (2), 381–388. doi: 10.1002/mc.22501
Gao, X., Dai, M., Li, Q., Wang, Z., Lu, Y., and Song, Z. (2017b). HMGA2 regulates lung cancer proliferation and metastasis. Thorac. Cancer 8 (5), 501–510. doi: 10.1111/1759-7714.12476
Gautschi, O., Huegli, B., Ziegler, A., Gugger, M., Heighway, J., Ratschiller, D., et al. (2007). Origin and prognostic value of circulating KRAS mutations in lung cancer patients. Cancer Lett. 254 (2), 265–273. doi: 10.1016/j.canlet.2007.03.008
Graziano, S. L., Gamble, G. P., Newman, N. B., Abbott, L. Z., Rooney, …
ARTICLE
Diversity spectrum analysis identifies mutation-specific effects of cancer driver genes
Xiaobao Dong 1*, Dandan Huang2, Xianfu Yi3, Shijie Zhang4, Zhao Wang4, Bin Yan5,6, Pak Chung Sham 6,
Kexin Chen7 & Mulin Jun Li1,4*
Mutation-specific effects of cancer driver genes influence drug responses and the success of
clinical trials. We reasoned that these effects could unbalance the distribution of each
mutation across different cancer types and that, as a result, cancer-type preference could be
used to distinguish the effects of the causal mutation. Here, we developed a network-based
framework to systematically measure cancer diversity for each driver mutation. We found that
half of the driver genes harbor cancer type-specific and pancancer mutations simultaneously,
suggesting pervasive functional heterogeneity among mutations, even those from the same
driver gene. We further demonstrated that the specificity of the mutations could influence
patient drug responses. Moreover, we observed that diversity was generally increased in
advanced tumors. Finally, we scanned for potentially novel cancer driver genes based on the
diversity spectrum. Diversity spectrum analysis provides a new approach to define driver
mutations and optimize off-label clinical trials.
https://doi.org/10.1038/s42003-019-0736-4
1 Department of Genetics, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and
Hospital, Tianjin Medical University, Tianjin, China. 2 Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical
University, Tianjin, China. 3 School of Biomedical Engineering, Tianjin Medical University, Tianjin, China. 4 Department of Pharmacology, Tianjin Key Laboratory
of Inflammation Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University,
Tianjin, China. 5 School of Biomedical Sciences, Department of Anesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
6 Centre of Genomics Sciences, State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China. 7 Department of
Epidemiology and Biostatistics, Tianjin Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical
University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China. *email: [email protected]; [email protected]
COMMUNICATIONS BIOLOGY | (2020) 3:6 | https://doi.org/10.1038/s42003-019-0736-4 | www.nature.com/commsbio 1
Cancer-promoting genetic events and the genes that carry them (so-called driver mutations and driver genes) have not only been identified in most types of cancer but also linked to new therapeutic opportunities, such as EGFR mutations in lung cancer, BRAF mutations in melanoma, and KIT mutations in gastrointestinal stromal tumors1,2. Off-label targeted therapy trials, such as NCI-MATCH, aim to treat tumors across anatomical sites based on their genomic alterations3. However, cancer type-specific and mutation-specific oncogenic signaling has been observed in a number of recent clinical and preclinical studies4,5. The cancer type preference of driver mutations, and its biological and clinical significance, has not yet been adequately quantified.
Mutation-specific effects of driver mutations have been demonstrated in multiple well-characterized cancer driver genes6–13, which implies that functional heterogeneity among driver mutations in the same cancer gene could be very common. For example, NRAS mutations at codons 12, 13, and 61 have been characterized as driver mutations in many cancers, yet only the NRAS Q61 mutation efficiently promotes melanoma9. Recently, BRAF driver mutations were categorized into at least three classes with different kinase activity, RAS dependency, and dimer dependency6. More importantly, these mutation-specific effects appear closely connected with the clinical features of patients. A multicenter clinical study10 of the HER kinase inhibitor neratinib showed that patient responses were determined by both cancer type and mutation. This is consistent with an earlier clinical study14 in which the BRAF inhibitor vemurafenib was tested in patients who had different cancer types but all harbored BRAF V600 mutations. Thus, to complement the extensive studies at the driver gene level, a unified approach to define the role of each driver mutation will be important to deepen our understanding of cancer genomics and to guide clinical trial design15,16.
Much work has been done to characterize cancer drivers at subgene resolution, including at the levels of the linear protein sequence, protein domains, protein 3D structure, and protein–protein interfaces17. Although these methods provide mutation-level classifications of driver mutations, all of them rely only on molecular information about the gene or protein itself and neglect cancer context, which may lead to misinterpretation of mutation effects. Specifically, the roles of driver genes may vary across cancer types18. Genome-wide screening experiments19 and a pancancer analysis of evolutionary selection on driver mutations20 showed that this phenomenon is widespread. To understand the functions of driver mutations precisely, subgene resolution and cancer-context information need to be integrated.
If mutation-specific effects are functional, they may unbalance the distribution of each driver mutation across cancer types. NRAS Q61R, for example, is observed almost exclusively in melanoma. Given the cancer type distributions of multiple driver mutations from one driver gene, we could therefore distinguish their potential functional differences by comparing their cancer preferences.
In this study, we developed a network-based framework to
quantify and compare the cancer preference of driver mutations.
By projecting mutations onto a cancer diversity spectrum, we can
classify them into three categories: cancer-specific
(SPM), relatively specific (RSM), and pancancer mutations
(PCM). The distribution of these mutations in protein domains,
genes, and cellular pathways as well as their comutation patterns
were systematically characterized. To demonstrate the potential
value of the cancer diversity spectrum for clinical and biological
problems, we leveraged this information to predict patient drug
responses and identify new cancer driver genes. We finally
developed a web portal to visualize the cancer diversity for driver
mutations at http://mulinlab.org/firework.
Results
Network-based measurement of driver mutation specificity.
We first characterized a compendium of driver mutations across 33 TCGA cancer types (see legend of Fig. 1) using more than three million somatic mutations from 10,429 patients. To stay as close as possible to the conventions of the clinical genomics literature and to minimize the influence of biased curation in existing cancer genomics databases, we applied a rule-based approach to identify driver mutations (Supplementary Data 1) in well-characterized cancer driver genes (according to the records of the Cancer Gene Census18), which has been widely used in many clinical cancer studies21,22. For instance, a missense mutation in an oncogene (OG) was considered a driver mutation if it was highly recurrent in cancer patients (recurrence rule). In contrast, a frameshift insertion or damaging missense mutation was selected as a driver only if it occurred in a tumor suppressor gene (TSG) (damaging rule).
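The two selection rules above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the authors' pipeline. The recurrence cutoff (`RECURRENCE_THRESHOLD`), the consequence labels, and the `predicted_damaging` flag (a stand-in for whatever damage predictor is used) are all hypothetical. The full rule set is given in the paper's Supplementary Data 1.

```python
from dataclasses import dataclass
from typing import Collection

# Hypothetical cutoff: the text only says "highly recurrent";
# the real threshold is defined in the paper's Supplementary Data 1.
RECURRENCE_THRESHOLD = 3


@dataclass(frozen=True)
class Mutation:
    gene: str
    protein_change: str       # e.g. "G12V"
    consequence: str          # hypothetical labels: "missense", "frameshift_insertion", ...
    recurrence: int           # number of patients carrying this exact mutation
    predicted_damaging: bool  # assumed output of an external damage predictor


def is_driver(m: Mutation, oncogenes: Collection[str], tsgs: Collection[str]) -> bool:
    """Return True if the mutation satisfies either simplified rule."""
    # Recurrence rule: a highly recurrent missense mutation in an oncogene (OG).
    recurrent_og = (
        m.gene in oncogenes
        and m.consequence == "missense"
        and m.recurrence >= RECURRENCE_THRESHOLD
    )
    # Damaging rule: a frameshift insertion or damaging missense mutation in a TSG.
    damaging_tsg = m.gene in tsgs and (
        m.consequence == "frameshift_insertion"
        or (m.consequence == "missense" and m.predicted_damaging)
    )
    return recurrent_og or damaging_tsg


oncogenes = {"KRAS", "BRAF"}  # toy subset standing in for Cancer Gene Census OGs
tsgs = {"TP53", "APC"}        # toy subset standing in for Cancer Gene Census TSGs
print(is_driver(Mutation("KRAS", "G12V", "missense", 176, False), oncogenes, tsgs))  # True
```

The sketch checks both rules independently, so a gene listed as both an OG and a TSG can qualify through either rule.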
We constructed a bipartite network (Fig. 1a) to summarize the relationships between patients from 33 TCGA cancer types and their driver mutations. Each patient or driver mutation was represented as a node, and a patient and a driver mutation were connected if the mutation was detected in that patient. To improve the reliability of subsequent analyses of the cancer diversity of mutations, mutations occurring fewer than three times in the whole TCGA dataset were removed from the network. The final patient–mutation network (Supplementary Fig. 1, Supplementary Files) contains 1570 mutations, 6286 patients (Fig. 1b), and 12,924 edges between them. These mutations belong to 314 cancer driver genes (Fig. 1c), and the largest contribution (16%) comes from TP53, the most frequently mutated gene in cancers23. Nevertheless, no individual gene or cancer type dominates the network.
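The network construction and the frequency filter described above can be sketched with plain Python sets. This assumes the input is a list of (patient, mutation) pairs. The identifiers are toy values, not TCGA data. Dropping patients who are left with no edges is our reading of why the final network has fewer patients than the full cohort.

```python
from collections import Counter


def build_patient_mutation_network(calls, min_occurrences=3):
    """Build a patient–mutation bipartite network from (patient_id, mutation_id) pairs.

    Mutations carried by fewer than `min_occurrences` patients are dropped,
    and patients left without any edge are dropped with them.
    """
    edges = set(calls)  # one edge per distinct patient–mutation pair
    carriers = Counter(mutation for _, mutation in edges)
    kept = {(p, m) for p, m in edges if carriers[m] >= min_occurrences}
    patients = {p for p, _ in kept}
    mutations = {m for _, m in kept}
    return patients, mutations, kept


calls = [
    ("P1", "KRAS G12V"), ("P2", "KRAS G12V"), ("P3", "KRAS G12V"),
    ("P1", "TP53 R175H"), ("P4", "TP53 R175H"),
    ("P5", "BRAF V600E"),
]
patients, mutations, edges = build_patient_mutation_network(calls)
print(len(patients), len(mutations), len(edges))  # 3 1 3
```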
By compressing all patients from the same cancer type into one node (Fig. 2a), we investigated and visualized the similarity of mutations among all cancer types with a force-directed layout algorithm24. This algorithm is an intuitive way to spatially organize network data, usually within a two-dimensional plane. Nodes repel each other as if they were charged particles, while each edge acts like a spring that pulls a pair of connected nodes together. As a result, cancer types associated with similar driver mutation sets are clustered together and pushed away from cancer types with different mutation profiles in the final network (Fig. 2a), which allows us to observe the similarity among these cancer types in a global and flexible manner. The results showed that 79% (26/33) of cancer types shared at least two driver mutations with other cancer types, and 54% (18/33) of cancer types contained at least two private mutations. Cancer types belonging to the same tissues or organs were clustered together, such as the two squamous cell carcinomas LUSC and HNSC or the two brain cancers GBM and LGG, suggesting that the driver mutation profile can partly reflect the origin of cancers. Relatively rare cancer types, including ACC, CHOL, KICH, PCPG, SARC, THYM, and UVM, shared few driver mutations with others. This might be attributed both to the small size of the patient cohorts and to the distinct molecular characteristics of these cancers, such as KICH compared with other kidney cancers25. Thus, the patient–mutation networks comprised both shared and distinct driver mutations, which motivated us to quantify the tumor preference of each mutation precisely.
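Compressing patients into cancer-type nodes and counting shared versus private mutations, the quantities behind the 79% and 54% figures above, can be sketched as follows. The patient labels and mutation assignments are toy data. Treating a mutation as "shared" when it also appears in at least one other cancer type is our reading of the text.

```python
from collections import Counter, defaultdict


def compress_by_cancer_type(edges, patient_cancer):
    """Collapse patient nodes into cancer-type nodes: cancer type -> set of mutations."""
    type_to_mutations = defaultdict(set)
    for patient, mutation in edges:
        type_to_mutations[patient_cancer[patient]].add(mutation)
    return dict(type_to_mutations)


def shared_and_private(type_to_mutations):
    """Return {cancer type: (n_shared, n_private)} mutation counts."""
    # Number of cancer types in which each mutation occurs.
    n_types = Counter(m for muts in type_to_mutations.values() for m in muts)
    counts = {}
    for ctype, muts in type_to_mutations.items():
        shared = sum(1 for m in muts if n_types[m] > 1)
        counts[ctype] = (shared, len(muts) - shared)
    return counts


patient_cancer = {"p1": "LUSC", "p2": "HNSC", "p3": "HNSC", "p4": "UVM"}
edges = {
    ("p1", "TP53 R248Q"), ("p1", "PIK3CA E545K"),
    ("p2", "TP53 R248Q"), ("p3", "PIK3CA E545K"), ("p3", "HRAS Q61L"),
    ("p4", "GNAQ Q209L"),
}
counts = shared_and_private(compress_by_cancer_type(edges, patient_cancer))
# Fraction of cancer types sharing at least two mutations with other types.
frac_shared_2 = sum(s >= 2 for s, _ in counts.values()) / len(counts)
print(counts["HNSC"], round(frac_shared_2, 2))  # (2, 1) 0.67
```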
Specificity-based classification of driver mutations. We followed a network diversity approach26 to compute the preference
of each mutation (Supplementary Data 2). The network diversity
is an entropy-based index initially proposed to measure the diversity of an individual's relationships in social networks. In our measurement, network diversity values range from 0 to 1, and a higher value indicates that the mutation is observed in patients of multiple cancer types with more similar probabilities. If a mutation occurs in multiple cancer types but one cancer type dominates the composition, the network diversity value will be low. Conversely, if the mutation occurs at similar rates across multiple cancer types, the network diversity value will be high. For example, although both KRAS G12V and KRAS G12R occur in >5 different cancer types, their probability distributions across cancer types differ. There are 37 patients with KRAS G12R in our data, and over 75% of them are PAAD patients. In contrast, among the 176 patients with KRAS G12V, three cancer types account for much of the composition (23% PAAD, 22% LUAD, and 19% COAD). Thus, the network diversity value of G12V (0.40) is higher than that of KRAS G12R (0.28), reflecting a different cancer specificity. Note that network diversity was normalized so that a high-frequency mutation can be compared directly with a rare one, a property required because cancer mutation frequencies follow a long-tailed distribution. The continuum of network diversity values formed a cancer diversity spectrum comprising all driver mutations, allowing us to systematically classify and characterize the biological and clinical implications of these mutations.
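As a rough illustration of an entropy-based diversity index, the sketch below computes the Shannon entropy of a mutation's cancer-type distribution. It then divides by the log of the total number of cancer types so that the value lies between 0 and 1. This normalization is an assumption. The exact definition is in the paper's Methods and may differ, for example in how it accounts for cohort sizes. The carrier lists are toy data chosen only to mimic the shapes described for KRAS G12R and G12V, so the outputs will not reproduce the reported 0.28 and 0.40.

```python
import math
from collections import Counter


def network_diversity(carrier_cancer_types, n_cancer_types=33):
    """Normalized Shannon entropy of a mutation's cancer-type distribution.

    carrier_cancer_types: one cancer-type label per patient carrying the mutation.
    Assumption: entropy is divided by log(n_cancer_types) so the value lies in [0, 1].
    """
    counts = Counter(carrier_cancer_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mutation has no carriers")
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy -= p * math.log(p)
    return entropy / math.log(n_cancer_types)


# Toy carrier lists (not the paper's data): one dominated by PAAD, one spread out.
concentrated = ["PAAD"] * 28 + ["COAD", "LUAD", "READ", "UCEC"] * 2 + ["STAD"]  # 37 patients
spread = ["PAAD"] * 40 + ["LUAD"] * 39 + ["COAD"] * 33 + ["READ"] * 34 + ["UCEC"] * 30  # 176
print(network_diversity(concentrated) < network_diversity(spread))  # True
```

A mutation seen in only one cancer type scores 0. A mutation spread evenly over all 33 types scores 1.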
We found three dominant peaks in the cancer diversity spectrum, located near network diversity values of 0, 0.5, and 1.0. This trimodal distribution suggests that
Fig. 1 Measurement of the cancer distribution of driver mutations with network diversity (ND). a Driver mutations identified from patients of 33 cancer types are used to construct a patient–mutation bipartite network. The 33 cancer types include adrenocortical carcinoma (ACC), bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), lymphoid neoplasm diffuse large B-cell lymphoma (DLBC), esophageal carcinoma (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), mesothelioma (MESO), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), testicular germ cell tumors (TGCT), thyroid carcinoma (THCA), thymoma (THYM), uterine corpus endometrial carcinoma (UCEC), uterine carcinosarcoma (UCS), and uveal melanoma (UVM). Based on this network, the network diversity (ND) value of each mutation is calculated and mapped onto the cancer diversity spectrum. According to the spectrum, driver mutations are classified into specific, relatively specific, and pancancer mutations. b The overall composition of cancer types in the patient–mutation network for the 1570 driver mutations analyzed in this study. c The genes that harbor the 1570 mutations and their relative contributions.
Fig. 2 Classification of driver mutations and corresponding functional analysis. a The compressed patient–mutation network, in which patients from the same cancer type are summarized as a red node. Mutations with the same connection pattern to cancer types are compressed into one blue node, and the number in a blue node is the number of mutations it includes. Only nodes that include at least two mutations are shown. b The distribution of network diversity values on the cancer diversity spectrum and the classification of driver mutations. The mutations above the bar plot are examples from the corresponding categories. Differently colored nodes connected with a mutation represent patients from different cancer types. c The overlap of genes harboring the three types of driver mutations. The GO biological process enrichment results for the SPM (d), RSM (e), and PCM (f) enriched gene networks are shown.
driver mutations could be split into three distinct populations (Fig. 2b). Accordingly, we classified the mutations into three categories using two theoretically estimated network diversity cutoffs, 0.3 and 0.64 (see Methods for details). This yielded 230 specific mutations (SPMs; network diversity < 0.3), 622 RSMs (0.3 ≤ network diversity < 0.64), and 718 PCMs (network diversity ≥ 0.64). Of note, APC, EGFR, PTEN, SPOP, and LRP1B are the most frequent driver genes in the SPM category (Supplementary Fig. 2A). This category also includes many known biomarkers for cancer diagnosis or targeted treatment, such as APC Q1291* (for COAD), EGFR L858R (for LUAD), BRAF V600E (for THCA and SKCM), and DNMT3A R882H and NPM1 W288Cfs*12 (for LAML). RSMs are exemplified by SF3B1 K700E, which was mostly observed in BRCA patients (9/15), with sporadic low-frequency cases in other cancer types (LAML, PRAD, SARC, SKCM, and THYM). In the RSM category, TP53, PIK3CA, APC, and PTEN mutations were the most common (Supplementary Fig. 2B). In contrast to the other two mutation classes, TP53 mutations significantly dominated the PCM spectrum (Fisher's exact test, p value < 0.001). This is consistent with a previous integrative study23 of 12 major cancer types, which found that TP53 was the only gene mutated in nearly half of the tumors (Supplementary Fig. 2C). Driver genes that harbor multiple types of mutations are common: 18% of genes harbor all three types of mutations, and 50% harbor at least two types (Fig. 2c). Apart from TP53 (Fisher's exact test, p value < 0.01, q < 0.01, Benjamini & Hochberg correction), no driver gene was significantly enriched in any specific category after multiple hypothesis correction (Supplementary Data 3). Thus, functional heterogeneity among mutations may be common, even within the same cancer driver gene. More details about the associated cancer types of each mutation can be found in our web portal or in the Supplementary Data.
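The cutoff-based classification above, and the per-gene question of how many categories a gene's mutations span, can be sketched as follows. The cutoffs 0.3 and 0.64 come from the text. The ND values in the toy dictionary are hypothetical, apart from the two KRAS values quoted earlier.

```python
from collections import defaultdict

SPM_CUTOFF = 0.3   # network diversity cutoffs reported in the text
PCM_CUTOFF = 0.64


def classify(nd):
    """Map a network diversity value to SPM, RSM, or PCM."""
    if nd < SPM_CUTOFF:
        return "SPM"
    if nd < PCM_CUTOFF:
        return "RSM"
    return "PCM"


def categories_per_gene(mutation_nd):
    """mutation_nd: {(gene, protein_change): nd} -> {gene: set of categories}."""
    by_gene = defaultdict(set)
    for (gene, _change), nd in mutation_nd.items():
        by_gene[gene].add(classify(nd))
    return dict(by_gene)


# Toy ND values: hypothetical except KRAS G12R (0.28) and G12V (0.40), quoted above.
toy = {
    ("KRAS", "G12R"): 0.28, ("KRAS", "G12V"): 0.40,
    ("TP53", "R175H"): 0.80, ("TP53", "R248Q"): 0.75,
    ("EGFR", "L858R"): 0.10,
}
per_gene = categories_per_gene(toy)
multi = [g for g, cats in per_gene.items() if len(cats) >= 2]
print(sorted(multi))  # ['KRAS']
```

Applied to the full mutation set, the share of genes in `multi` corresponds to the "50% of genes harbor at least two types" statistic.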
To explore biological pathways involved in different categories,
we constructed gene subnetworks by mapping the enriched genes
of each category onto protein functional networks using the
STRING database27 and performed a Gene Ontology (GO)
enrichment analysis to nominate related pathways or biological
processes (Fig. 2d–f, Supplementary Fig. 3, Supplementary
Data 4–6). The functional analysis showed that DNA repair and cell cycle processes were generally observed in all three categories. However, some processes were category-specific. Signal transduction processes, such as the ERK cascade and peptidyl-tyrosine modification, were mainly enriched in the SPM gene network. Immune response genes were enriched only in the RSM gene network, and chromatin remodeling was the most prominent process in the PCM gene network. These results suggest that certain biological pathways could influence tumorigenesis in specific tissues, while others, such as epigenetic processes, might have a wide impact on tumorigenesis across many cancer types.
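The text does not say which statistic the GO enrichment used. Many enrichment tools use a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below computes that test with exact integer arithmetic. The parameter names follow the usual convention and are our own. In practice, a library routine such as `scipy.stats.hypergeom` plus a multiple-testing correction would normally be used.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for over-representation of a GO term.

    N: genes in the background (e.g. all genes in the protein network)
    K: background genes annotated with the GO term
    n: genes in the category-specific subnetwork
    k: subnetwork genes annotated with the GO term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 3 of 4 subnetwork genes carry a term held by 5 of 20 background genes.
print(round(hypergeom_enrichment_p(3, 4, 5, 20), 4))  # 0.032
```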
Cancer diversity spectrum and patients’ drug responses. Presumably, if one driver gene contains multiple driver mutations with different specificities, these mutations should lie in separate protein domains corresponding to their specificity categories. To test this hypothesis, we annotated driver mutations in the functional protein domains of the driver genes using the UniProt database28. Although some domains were enriched with driver mutations, we unexpectedly found that most of them harbored more than two types of mutations in the same region (Fig. 3a). A typical example is the protein kinase domain of BRAF. In this domain, S467L, G469V, V600M, V600G, and V600E are SPMs, whereas K601E, G466E, G466V, G469R, G469A, N581S, and D594N are RSMs or PCMs. One possible explanation is that protein domain annotations are incomplete or inaccurate. However, this explanation cannot account for mutations at the same position belonging to different categories, as exemplified by the BRAF mutations G469V (SPM), G469R (RSM), and G469A (PCM) and the KRAS mutations G12C (SPM), G12R (SPM), G12D (RSM), and G12V (RSM). Previous biochemical studies of BRAF and SPOP mutations showed that driver mutations can induce very different biochemical behaviors of a protein and exhibit opposite pharmacological effects, even when the mutations are very close in linear sequence6,11. Our analysis also revealed that cancer diversity classification could distinguish drug response-related mutation effects within the same protein domain. For example, BRAF mutations sensitive to vemurafenib were classified as SPMs (V600M and V600E), whereas insensitive mutations were classified as RSMs or PCMs (G469A, G469R, G466V, G466E, N581S, D594N, and K601E)6. Similarly, SPOP mutations showed BET inhibitor sensitivities that were consistent with our network diversity-based classifications, but in the reverse direction. Ishikawa cells overexpressing the SPOP SPMs Y87C, W131G, and F133L were resistant to the BET inhibitor JQ1, whereas the RSMs R121Q and D140N were sensitive11.
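The domain-level and position-level tallies behind this argument can be sketched as below. The kinase-domain coordinates are an assumption for illustration only; check the UniProt entry before relying on them. Only BRAF mutations whose category the text states explicitly are included. The others are described only as "RSMs or PCMs".

```python
import re
from collections import defaultdict

# Assumed coordinates for illustration; verify against the UniProt entry for BRAF (P15056).
BRAF_DOMAINS = {"Protein kinase": (457, 717)}

# Categories stated explicitly in the text; mutations described only as
# "RSMs or PCMs" are left out.
BRAF_MUTATIONS = {
    "S467L": "SPM", "G469V": "SPM", "V600M": "SPM", "V600G": "SPM",
    "V600E": "SPM", "G469R": "RSM", "G469A": "PCM",
}


def residue_position(protein_change):
    """Extract the residue number from a change such as 'V600E'."""
    match = re.match(r"[A-Z](\d+)", protein_change)
    if match is None:
        raise ValueError(f"cannot parse {protein_change!r}")
    return int(match.group(1))


def categories_by_domain(mutations, domains):
    """Return {domain name: set of categories of mutations inside that domain}."""
    result = defaultdict(set)
    for change, category in mutations.items():
        pos = residue_position(change)
        for name, (start, end) in domains.items():
            if start <= pos <= end:
                result[name].add(category)
    return dict(result)


def categories_by_position(mutations):
    """Return {residue position: set of categories observed at that residue}."""
    result = defaultdict(set)
    for change, category in mutations.items():
        result[residue_position(change)].add(category)
    return dict(result)


print(sorted(categories_by_domain(BRAF_MUTATIONS, BRAF_DOMAINS)["Protein kinase"]))  # ['PCM', 'RSM', 'SPM']
print(sorted(categories_by_position(BRAF_MUTATIONS)[469]))  # ['PCM', 'RSM', 'SPM']
```

A single residue (G469) carrying all three categories is the case that no refinement of domain boundaries can separate.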
To comprehensively investigate the association between the cancer
diversity of mutations and antineoplastic therapy, we integrated the
cancer diversity spectrum with drug response data predicted by an
imputed drug-wide association study (IDWAS)29. IDWAS learned
statistical models from cell line-based drug response data and gene
expression profiles to predict 138 cancer drug responses for 5548
TCGA patients, which allows us to analyze the relationships of the
cancer diversity of mutations and drug responses in an unbiased
manner. Moreover, IDWAS uses only gene expression data, so its results are independent of gene mutation information.
We evaluated whether drug responses differed among patients harboring SPMs, RSMs, and PCMs in the same drug target (see Methods for details). Note that the IDWAS drug response values are predicted by a gene expression-based statistical model. They therefore have no clearly defined biological meaning and are not directly comparable with traditional drug sensitivity values such as the IC50 (the drug concentration that reduces cell viability by 50%); however, a lower value indicates greater drug sensitivity. In approximately one-third of the tested drug–gene pairs (30/89), drug response appeared to be influenced by the cancer diversity of mutations (ANOVA, p < 0.2, Supplementary Data 7). Examples include temsirolimus–BRAF (ANOVA, p = 0.005), afatinib–EGFR (ANOVA, p = 2.92 × 10^−8), gemcitabine–KRAS (ANOVA, p = 0.0009), and AZD6482–PTEN (ANOVA, p = 0.118) (Fig. 3b). We also observed in multiple cases that drug sensitivity decreased as the cancer diversity of mutations increased. For example, patients with KRAS SPMs were sensitive to gemcitabine, whereas patients with RSMs and PCMs showed resistance. The same trend was observed for erlotinib in EGFR-mutated patients, PLX4720 in BRAF-mutated patients, and AZD6482 in PTEN-mutated patients. One exception is paclitaxel–KRAS, for which drug sensitivity increased with mutation cancer diversity. We also compared each group with the mutation-negative group (i.e., patients without driver mutations in the corresponding drug target). In that comparison, SPMs yielded the largest number of significantly different drug responses (two-sided t-test, p < 0.05), nearly twice or more the number observed for RSMs or PCMs (Fig. 3c). We also overlapped driver mutations with actionable mutations collected from OncoKB30 and found that most actionable mutations were SPMs (Fig. 3d, Supplementary Data 8). Overall, our results suggest that the cancer diversity of mutations,
Fig. 3 Distribution of the three types of driver mutations in functional protein domains, and the association between cancer diversity and drug sensitivity. a The distribution of driver mutations in the functional domains of three representative genes. Functional protein domains are annotated according to UniProt records. The three types of driver mutations are distinguished by the color and height of the dots in the lollipop plots: SPMs (blue, short), RSMs (green, middle height), and PCMs (red, high). b The drug sensitivity of patients harboring SPMs, RSMs, and PCMs. Red stars mark groups that differ significantly from the corresponding negative groups (*p < 0.05, **p < 0.01, two-sided t-test). Drug sensitivity is predicted by IDWAS. c The number of drug–mutation combinations significantly associated with drug response. Drug sensitivity data are from IDWAS. d The composition of OncoKB evidence levels in the three types of mutations. From level 1 to level 4, the strength of evidence for clinical recommendation decreases.
especially SPMs, is more strongly correlated with patient drug responses, and that such effects cannot be readily inferred from the functional domains of mutations.
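The three-group comparison above relies on one-way ANOVA across SPM, RSM, and PCM carriers. The F statistic can be computed directly, as sketched below. The response values are hypothetical. In practice, `scipy.stats.f_oneway` returns both F and its p value, and `scipy.stats.ttest_ind` covers the two-sided comparison with the mutation-negative group.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical predicted responses (lower = more sensitive) for SPM, RSM, and PCM carriers.
spm, rsm, pcm = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
print(one_way_anova_f(spm, rsm, pcm))  # (27.0, 2, 6)
```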
Cancer diversity spectrum and cancer evolution. To understand the impact of cancer evolution on the cancer diversity spectrum, we first examined the correlation between the cancer diversity spectrum and the variant allele frequency (VAF) of driver mutations. VAF, the fraction of sequencing reads that carry a mutation, reflects the burden of that mutation in a tumor sample. It serves as a proxy for the relative size of the tumor clones harboring a given driver mutation. A high VAF in a primary tumor usually implies that the mutation arose in an early/founder clone. To exclude confounders that might distort VAF, we selected tumors with cancer cell purity >70% and mutations in copy number-neutral regions. We computed the Pearson correlation coefficient (PCC) between mutations' VAFs and network diversity values for genes with ten or more mutations (Fig. 4a, Supplementary Fig. 4). Among the significant correlations, cancer diversity correlates negatively with VAF in BRAF, KIT, PREX2, NRAS, and SF3B1 but positively in FBXW7, KMT2D, NF1, and SPOP. After grouping these driver genes by mode of action, we found that OGs showed more negative correlations and TSGs more positive correlations (Fig. 4b). The average PCC values for OGs and TSGs are –0.08 and 0.01, respectively, but the difference between them is not significant (Wilcoxon rank-sum test, p value = 0.079). Because high VAF generally indicates an early tumor clone, our results imply that some OG-related SPMs and TSG-related PCMs tend to occur at an early stage of tumorigenesis.
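The per-gene correlation step can be sketched as a filter followed by a Pearson coefficient. The filters (purity >70%, copy number-neutral region, at least ten mutations per gene) come from the text. The record layout, with one dict per mutation call and a precomputed `copy_neutral` flag, is an assumption. This excerpt does not say whether the PCC is taken over individual calls or per-mutation summaries.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def vaf_nd_correlation(records, min_mutations=10, min_purity=0.7):
    """Per-gene PCC between VAF and network diversity after the filters in the text.

    records: dicts with keys gene, vaf, nd, purity, copy_neutral (assumed layout).
    """
    by_gene = defaultdict(lambda: ([], []))
    for r in records:
        if r["purity"] > min_purity and r["copy_neutral"]:
            vafs, nds = by_gene[r["gene"]]
            vafs.append(r["vaf"])
            nds.append(r["nd"])
    return {
        gene: pearson(vafs, nds)
        for gene, (vafs, nds) in by_gene.items()
        if len(vafs) >= min_mutations
    }


# Toy records: ten BRAF calls with a perfect negative trend, one low-purity call
# that is filtered out, and a gene (KIT) with too few calls to be tested.
records = [{"gene": "BRAF", "vaf": 0.05 + 0.05 * i, "nd": 0.9 - 0.08 * i,
            "purity": 0.8, "copy_neutral": True} for i in range(10)]
records.append({"gene": "BRAF", "vaf": 0.9, "nd": 0.9, "purity": 0.5, "copy_neutral": True})
records += [{"gene": "KIT", "vaf": 0.3, "nd": 0.2 + 0.1 * i,
             "purity": 0.9, "copy_neutral": True} for i in range(3)]
result = vaf_nd_correlation(records)
print(sorted(result), round(result["BRAF"], 6))  # ['BRAF'] -1.0
```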
To explore how the different mutation types behave during long-term cancer evolution, we compared the network diversity values of driver mutations between primary and advanced tumors. We used MSK-IMPACT data31, which include genetic aberrations in approximately 400 cancer-related genes from more than 10,000 patients with advanced tumors and represent the mutation landscape of late-stage tumorigenesis. The mutational frequencies of genes in the TCGA and MSK-IMPACT cohorts are highly consistent31. We calculated and compared the network diversity values of 625 driver mutations common to the TCGA and MSK-IMPACT groups (Fig. 4c, Supplementary Data 9). The cancer diversity classifications of 57% (359/625) of driver mutations were conserved. Nevertheless, 140 RSMs in TCGA increased in cancer diversity and converted to PCMs in MSK-IMPACT (Fig. 4d). Overall, the cancer diversity of mutations was significantly higher in advanced tumors than in primary tumors (Fig. 4e). Interestingly, three mutations, EGFR L861Q, MAP2K4 S184L, and TP53 E285V, were PCMs in TCGA but became SPMs in MSK-IMPACT tumors. This suggests that cancer-specific selection may act on them during the continuous progression of the related tumors. A previous study linked EGFR L861Q to resistance to EGFR-TKI therapy in lung cancer7, which suggests that this increased cancer specificity in advanced tumors might result from selection during targeted cancer therapy. Taken together, the cancer diversity of driver mutations may both influence clonal evolution and be reshaped during cancer progression.
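The cohort comparison above, including the fraction of conserved classifications (reported as 57%, 359/625) and the counts of class transitions such as RSM to PCM, can be sketched as follows. The cutoffs come from the text. The mutation IDs and ND values are hypothetical.

```python
from collections import Counter


def classify(nd, spm_cutoff=0.3, pcm_cutoff=0.64):
    """Map a network diversity value to SPM, RSM, or PCM using the cutoffs in the text."""
    if nd < spm_cutoff:
        return "SPM"
    return "RSM" if nd < pcm_cutoff else "PCM"


def compare_cohorts(nd_primary, nd_advanced):
    """Category transitions for mutations present in both cohorts.

    Returns (Counter of (primary_class, advanced_class) pairs, fraction conserved).
    """
    common = nd_primary.keys() & nd_advanced.keys()
    if not common:
        raise ValueError("no mutations in common")
    transitions = Counter(
        (classify(nd_primary[m]), classify(nd_advanced[m])) for m in common
    )
    conserved = sum(n for (a, b), n in transitions.items() if a == b)
    return transitions, conserved / len(common)


# Hypothetical ND values; "E" appears only in the advanced cohort and is ignored.
tcga = {"A": 0.10, "B": 0.50, "C": 0.70, "D": 0.45}
msk = {"A": 0.20, "B": 0.80, "C": 0.10, "D": 0.55, "E": 0.90}
transitions, conserved = compare_cohorts(tcga, msk)
print(conserved, transitions[("RSM", "PCM")])  # 0.5 1
```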
Comutation patterns between mutations from different classes. Complex dependencies have been demonstrated among driver mutations, and these dependencies are related to the clonal evolution and clinical prognosis of tumors32. We asked whether mutations with different cancer type specificities show unique dependencies. To answer this question, we performed comutation analysis for all driver mutation pairs and constructed …
CATEGORIES
Economics
Nursing
Applied Sciences
Psychology
Science
Management
Computer Science
Human Resource Management
Accounting
Information Systems
English
Anatomy
Operations Management
Sociology
Literature
Education
Business & Finance
Marketing
Engineering
Statistics
Biology
Political Science
Reading
History
Financial markets
Philosophy
Mathematics
Law
Criminal
Architecture and Design
Government
Social Science
World history
Chemistry
Humanities
Business Finance
Writing
Programming
Telecommunications Engineering
Geography
Physics
Spanish
ach
e. Embedded Entrepreneurship
f. Three Social Entrepreneurship Models
g. Social-Founder Identity
h. Micros-enterprise Development
Outcomes
Subset 2. Indigenous Entrepreneurship Approaches (Outside of Canada)
a. Indigenous Australian Entrepreneurs Exami
Calculus
(people influence of
others) processes that you perceived occurs in this specific Institution Select one of the forms of stratification highlighted (focus on inter the intersectionalities
of these three) to reflect and analyze the potential ways these (
American history
Pharmacology
Ancient history
. Also
Numerical analysis
Environmental science
Electrical Engineering
Precalculus
Physiology
Civil Engineering
Electronic Engineering
ness Horizons
Algebra
Geology
Physical chemistry
nt
When considering both O
lassrooms
Civil
Probability
ions
Identify a specific consumer product that you or your family have used for quite some time. This might be a branded smartphone (if you have used several versions over the years)
or the court to consider in its deliberations. Locard’s exchange principle argues that during the commission of a crime
Chemical Engineering
Ecology
aragraphs (meaning 25 sentences or more). Your assignment may be more than 5 paragraphs but not less.
INSTRUCTIONS:
To access the FNU Online Library for journals and articles you can go the FNU library link here:
https://www.fnu.edu/library/
In order to
n that draws upon the theoretical reading to explain and contextualize the design choices. Be sure to directly quote or paraphrase the reading
ce to the vaccine. Your campaign must educate and inform the audience on the benefits but also create for safe and open dialogue. A key metric of your campaign will be the direct increase in numbers.
Key outcomes: The approach that you take must be clear
Mechanical Engineering
Organic chemistry
Geometry
nment
Topic
You will need to pick one topic for your project (5 pts)
Literature search
You will need to perform a literature search for your topic
Geophysics
you been involved with a company doing a redesign of business processes
Communication on Customer Relations. Discuss how two-way communication on social media channels impacts businesses both positively and negatively. Provide any personal examples from your experience
od pressure and hypertension via a community-wide intervention that targets the problem across the lifespan (i.e. includes all ages).
Develop a community-wide intervention to reduce elevated blood pressure and hypertension in the State of Alabama that in
Main body of the report
Conclusions
References (8 References Minimum)
*** Word count = 2000 words.
*** In-Text Citations and References using Harvard style.
*** In the Task section I’ve chosen (Economic issues in overseas contracting).
Electromagnetism
w or quality improvement; it was just all part of good nursing care. The goal for quality improvement is to monitor patient outcomes using statistics for comparison to standards of care for different diseases
e a 1- to 2-slide Microsoft PowerPoint presentation on the different models of case management. Include speaker notes. Describe three different models of case management.
visual representations of information. They can include numbers
SSAY
ame workbook for all 3 milestones. You do not need to download a new copy for Milestones 2 or 3. When you submit Milestone 3
pages):
Provide a description of an existing intervention in Canada
making the appropriate buying decisions in an ethical and professional manner.
Topic: Purchasing and Technology
You read about blockchain ledger technology. Now do some additional research on the Internet and share your URL with the rest of the class.
be aware of which features their competitors are opting to include so the product development teams can design similar or enhanced features to attract more of the market. The more unique
low (The Top Health Industry Trends to Watch in 2015) to assist you with this discussion.
https://youtu.be/fRym_jyuBc0
Next year the $2.8 trillion U.S. healthcare industry will finally begin to look and feel more like the rest of the business wo
evidence-based primary care curriculum. Throughout your nurse practitioner program
Vignette
Understanding Gender Fluidity
Providing Inclusive Quality Care
Affirming Clinical Encounters
Conclusion
References
Nurse Practitioner Knowledge
Mechanics
and word limit is unit as a guide only.
The assessment may be re-attempted on two further occasions (maximum three attempts in total). All assessments must be resubmitted within 3 days of receiving your unsatisfactory grade. You must clearly indicate “Re-su
Trigonometry
Article writing
Other
5. June 29
After the components are sent to the manufacturing house
1. In 1972, the Furman v. Georgia case resulted in a decision that would put action into motion. Furman was originally sentenced to death for a murder he committed in Georgia, but the court debated whether or not this was a violation of his 8th amend
One of the first conflicts that would need to be investigated would be whether the human service professional followed the Responsibility to Client ethical standard. While developing a relationship with a client, it is important to clarify that if danger or
Ethical behavior is a critical topic in the workplace because the impact of it can make or break a business
No matter which type of health care organization
With a direct sale
During the pandemic
Computers are being used to monitor the spread of outbreaks in different areas of the world and with this record
3. Furman v. Georgia is a U.S. Supreme Court case that revolves around the Eighth Amendment’s ban on cruel and unusual punishment in death penalty cases. The Furman v. Georgia case was based on Furman being convicted of murder in Georgia. Furman was caught i
One major ethical conflict that may arise in my investigation is the Responsibility to Client in both Standard 3 and Standard 4 of the Ethical Standards for Human Service Professionals (2015). Making sure we do not disclose information without consent ev
4. Identify two examples of real world problems that you have observed in your personal
Summary & Evaluation: Reference & 188. Academic Search Ultimate
Ethics
We can mention at least one example of how the violation of ethical standards can be prevented. Many organizations promote ethical self-regulation by creating moral codes to help direct their business activities
*DDB is used for the first three years
For example
The inbound logistics for William Instrument refer to purchasing components from various electronic firms. During the purchase process, William needs to consider the quality and price of the components. In this case
4. A U.S. Supreme Court case known as Furman v. Georgia (1972) is a landmark case that involved the Eighth Amendment’s ban on cruel and unusual punishment in death penalty cases (Furman v. Georgia, 1972).
With the arrival of COVID-19
In my opinion
The ability to view ourselves from an unbiased perspective allows us to critically assess our personal strengths and weaknesses. This is an important step in the process of finding the right resources for our personal learning style. Ego and pride can be
· By Day 1 of this week
While you must form your answers to the questions below from our assigned reading material
CliftonLarsonAllen LLP (2013)
5 The family dynamic is awkward at first since the most outgoing and straightforward person in the family is Linda
Urien
The most important benefit of my statistical analysis would be the accuracy with which I interpret the data. The greatest obstacle
From a similar but larger point of view
4 In order to get the entire family to come back for another session, I would suggest coming in on a day the restaurant is not open
When seeking to identify a patient’s health condition
After viewing the YouTube videos on prayer
Your paper must be at least two pages in length (not counting the title and reference pages)
The word “assimilate” is negative to me. I believe everyone should learn about a country that they are going to live in. It doesn’t mean that they have to believe that everything in America is better than where they came from. It means that they care enough
Data collection
Single Subject
Chris is a social worker in a geriatric case management program located in a midsize Northeastern town. She has an MSW and is part of a team of case managers that likes to continuously improve on its practice. The team is currently using an
I would start off with Linda by repeating her options for the child and going over what she is feeling about each option. I would want to find out what she is afraid of. I would avoid asking her any “why” questions because I want her to be in the here an
Summarize the advantages and disadvantages of using an Internet site as a means of collecting data for psychological research (Comp 2.1) 25.0% Summarization of the advantages and disadvantages of using an Internet site as a means of collecting data for psych
Identify the type of research used in a chosen study
Compose a 1
Optics
effect relationship becomes more difficult, as the researcher cannot exert total control over another person even in an experimental environment. Social workers serve clients in highly complex real-world environments. Clients often implement recommended inte
I think knowing more about yourself will allow you to choose the right resources
Be 4 pages in length
One thing you will need to do in college is learn how to find and use references. References support your ideas. College-level work must be supported by research. You are expected to do that for this paper. You will research
Elaborate on any potential confounds or ethical concerns while participating in the psychological study 20.0% Elaboration on any potential confounds or ethical concerns while participating in the psychological study is missing. Elaboration on any potenti
3 The first thing I would do in the family’s first session is develop a genogram of the family to get an idea of all the individuals who play a major role in Linda’s life. After establishing where each member is in relation to the family
A Health in All Policies approach
Note: The requirements outlined below correspond to the grading criteria in the scoring guide. At a minimum
Chen
Read Connecting Communities and Complexity: A Case Study in Creating the Conditions for Transformational Change
Read Reflections on Cultural Humility
Read A Basic Guide to ABCD Community Organizing
Use the bolded black section and sub-section titles below to organize your paper. For each section
Losinski forwarded the article on a priority basis to Mary Scott
Losinski wanted details on the use of the ED at CGH. He asked the administrative resident