Business and Management
Topic:
Data Mining
Type of work:
Term Paper
Level:
College
Number of pages:
4 pages = 250/-
Grade:
High Quality (Normal Charge)
Formatting style:
APA
Language Style:
English (U.S.)
Sources:
4
Website Region:
United States
Instructions:
You are asked to write an essay to discuss the topic of data mining. Please address each of the following:
1. Based on the assigned readings (3 research papers) and textbook, what is data mining? Are the definitions consistent or contradictory?
2. Describe the various applications of data mining based on the following perspectives:
   a. Biomedical Data Mining
   b. Educational Technology Classroom Research
   c. Public Internet Data Mining
   For each perspective, address the benefits, issues, and challenges.
Guidelines:
1. APA style.
2. References: make sure you reference all research papers in your essay. You are also expected to reference the course textbook. When you do, indicate page numbers.
3. All referenced material must be paraphrased, not quoted. If you need more information on how to paraphrase, refer to the "Avoiding Plagiarism" section of the course website.
4. Your essay will have at least 1,000 words excluding references and title page.
5. Your essay will be submitted to Turnitin.com. Since you are not allowed to quote, only paraphrase, your similarity report should be blue or green. A yellow, orange, or red outcome, depending on its severity, may result in a failing grade.
6. You will submit one Word document. No other document types will be allowed.
Rubric:
1. 40% Content: Did you answer the questions?
2. 30% References: How well did you incorporate references in your essay?
3. 30% Writing: Is your document free from spelling and grammatical errors?
Here is the link to the text book https://www.sendspace.com/file/6qzqsi
Recent Advances and Emerging Applications in Text
and Data Mining for Biomedical Discovery
Graciela H. Gonzalez, Tasnia Tahsin, Britton C. Goodale, Anna C. Greene and
Casey S. Greene
Corresponding author. Casey S. Greene, Institute for Translational Medicine and Therapeutics, 10-131 Smilow Center for Translational Research, 3400
Civic Center Boulevard, Building 421, Philadelphia, PA 19104-5158, USA. Tel.: 215-573-2991; Fax: 215-573-9135; E-mail: [email protected]
Abstract
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of
precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease.
We are starting to address this challenge through automatic approaches for information extraction, representation and
analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in
genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data
mining, as well as recent advances and emerging applications toward precision medicine.
Key words: text mining; data mining; biomedical discovery; gene prioritization; pharmacogenomics; toxicology
Introduction
Technologies that resulted in the successful completion of the
Human Genome Project and those that have followed it afford an
unprecedented breadth of data collection avenues (whole-genome
expression data, chip-based comparative genomic hybridization
and proteomics of signal transduction pathways, among many
others) and have resulted in exceptional opportunities to advance
the understanding of the genetic basis of human disease.
However, high-throughput results are usually only the first step in
a long discovery process, with subsequent and much more time-
consuming experiments that, in the best of cases, culminate in the
publication of results in journals and conference proceedings.
Rather than stopping at the publication stage, the challenge for
precision medicine is then to translate all of these research results
into better treatments and improved health. To achieve this goal,
a range of analytic methods and computational approaches have
evolved from other domains and have been applied to an ever-
growing set of specific problem areas. It would be impossible to
enumerate the numerous biological questions targeted by
computational approaches. We will focus here on an overview of
text and data mining methods and their applications to discovery
in a broad range of biomedical areas, including biological pathway
extraction and reasoning, gene prioritization, precision medicine,
pharmacogenomics and toxicology. The advances are plenty and the specific areas of application diverse, but the fundamental motivation is to aid scientists in analyzing available data to suggest a road to discovery and to precise predictions that lead to better health.
Graciela H. Gonzalez is an Associate Professor in the Department of Biomedical Informatics at Arizona State University, Scottsdale, Arizona, United States.
Tasnia Tahsin is a PhD student in the Department of Biomedical Informatics at Arizona State University, Scottsdale, Arizona, United States.
Britton C. Goodale is a postdoctoral fellow in the Department of Microbiology and Immunology at the Geisel School of Medicine at Dartmouth College,
Hanover, New Hampshire, United States.
Anna C. Greene is the Assistant Curriculum Director for the Graduate Program in Quantitative Biomedical Sciences at Dartmouth College, Hanover, New
Hampshire, United States.
Casey S. Greene is an Assistant Professor in the Department of Systems Pharmacology and Translational Therapeutics in the Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pennsylvania, United States.
Submitted: 17 February 2015; Received (in revised form): 26 August 2015
© The Author 2015. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Briefings in Bioinformatics, 17(1), 2016, 33–42. doi: 10.1093/bib/bbv087. Advance Access Publication Date: 29 September 2015.
Background
Data mining
Data mining is the act of computationally extracting new infor-
mation from large amounts of data [1], and the biological sci-
ences are generating enormous quantities of data, ushering in
the era of ‘big data’. Stephens et al. state that sequencing data alone constitutes ~35 petabases/year and will grow to ~1 zettabase/year by 2025 [2]. This creates a large opportunity for the
development and deployment of novel mining algorithms, and
two recent reviews on data and text mining in the era of big
data are found in Che et al. [3] and Herland et al. [4]. A wide
variety of methods for extracting value from different types and
models of data fall under the umbrella of ‘data mining’.
Classification algorithms (decision trees, naïve Bayesian
classification and other classifiers), frequent pattern algorithms
(association rule mining, sequential pattern mining and others),
clustering algorithms (including methods to cluster continuous
and categorical data) and graph and network algorithms have
all evolved to present a diverse landscape for research and an
arsenal to deploy against the toughest data challenges. Most
researchers consider some other areas, including text mining,
as being under the data mining umbrella. For example,
Piatetsky-Shapiro states: ‘Data Mining in my opinion includes:
text mining, image mining, web mining, predictive analytics,
and much of the techniques we use for dealing with massive
data sets, now known as Big Data’ [5]. The methods applied to
text mining, however, are specialized to such a degree that it is
common to view it as a separate area of specialty. Data mining
courses do not usually include any text mining material, but ra-
ther there are separate courses dedicated to it, and the same
applies to textbooks.
A complete coverage of data mining techniques is beyond
the scope of this article though we have included some import-
ant resources that cover this topic. Kernel Methods in
Computational Biology by Schölkopf, Tsuda and Vert [6] covers
methods specific to Computational Biology. Introduction to Data
Mining [7] and Data Mining: Concepts and Techniques, 3rd edn [8] are
two popular textbooks in data mining and give an excellent
overview of the field. A more concise presentation can be found
in the paper by Xindong Wu et al., Top 10 algorithms in data mining [9]. The top 10 algorithms, identified in December 2006, are C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes and CART; together they cover clustering, classification and association analysis, which are among the most important topics in data mining research:
• According to Jain et al. in ‘Data clustering: a review’, ‘Clustering is
the unsupervised classification of patterns (observations, data
items, or feature vectors) into groups (clusters)’ [10].
• Classification is akin to clustering because it segments data into
groups called classes, but unlike clustering, classification
analyses require knowledge and specification of how classes are
defined.
• Statistical learning theory seeks ‘to provide a framework for studying the problem of inference, that is, of gaining knowledge, making predictions, making decisions or constructing models from a set of data’, state Bousquet et al. [11]. A textbook on statistical learning expands on these notions [12].
• Association analysis facilitates the unmasking of hidden
relationships in large data sets. The discovered associations are
then expressed as rules or sets of items that frequently occur to-
gether. Challenges to association analysis methods include that
discovering such patterns can be computationally expensive
given a large input data set and that there could potentially be
many spurious associations ‘discovered’ that simply occur by
chance. A well-known introduction to the topic is found in [13],
and in particular, a seminal paper on mining association rules
from clinical databases is found in Stilou et al. [14].
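(A minimal worked sketch of frequent-itemset mining appears immediately after this list.)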
• Link analysis analyzes hyperlinks and the graph structure of the
Web for the ranking of web search results. PageRank is perhaps
the best-known algorithm for link analysis [15].
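To make the frequent-pattern ideas above concrete, the following minimal sketch enumerates frequent itemsets in the Apriori style. The transactions, item names and support threshold are invented for illustration, and the brute-force candidate enumeration deliberately omits the downward-closure pruning that makes real Apriori implementations scale.

```python
# Minimal sketch of frequent-itemset mining in the Apriori style.
# Transactions and the support threshold are illustrative only.
from itertools import combinations

transactions = [
    {"geneA", "geneB", "drugX"},
    {"geneA", "geneB"},
    {"geneA", "drugX"},
    {"geneB", "drugX"},
    {"geneA", "geneB", "drugX"},
]
min_support = 0.4  # fraction of transactions an itemset must appear in

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

items = sorted({i for t in transactions for i in t})
frequent = {}
for size in (1, 2, 3):
    for combo in combinations(items, size):
        s = support(set(combo))
        if s >= min_support:
            frequent[combo] = s
# Real Apriori would prune candidates whose subsets are infrequent;
# this brute-force loop skips that optimization for clarity.

for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(itemset, round(s, 2))
```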
In a notable transition showing the power of new algorithms
and data, data mining approaches are now being used to learn,
not just the primary features but also context-specific features.
For example, initial data mining approaches that constructed
gene–gene networks built a single network [16]. In contrast,
recent approaches learn multiple context-specific networks,
allowing the construction of process-specific [17] and
tissue-specific networks [18–20]. An individual is made up of a
personalized combination of such context-specific networks, so
we anticipate that continued advances in the context specificity
of data mining approaches will play an important role in the
broad implementation of precision medicine.
Text mining
Text mining is a subfield of data mining that seeks to extract
valuable new information from unstructured (or semi-
structured) sources [21]. Text mining extracts information from
within those documents and aggregates the extracted pieces
over the entire collection of source documents to uncover or de-
rive new information. This is the preferred view of the field that
allows one to distinguish text mining from natural language pro-
cessing (NLP) [22, 23]. Thus, given as input a set of documents,
text mining methods seek to discover novel patterns, relation-
ships and trends contained within the documents. Aiding the
overall goal of discovering new information are NLP programs
that go from the relatively simple text processing tasks at the
lexical or grammatical levels (such as a tokenizing or a part-
of-speech tagger), to relatively complex information extraction
algorithms [like named entity recognition (NER) to find concepts
such as genes or diseases, normalization to map them to their
unique identifiers or relationship extraction and sentiment ana-
lysis systems, among others]. The greater the complexity of the
task, the more likely it is to integrate methods from data mining
(such as classification or statistical learning).
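As a toy illustration of the lower rungs of this ladder, the sketch below performs simple tokenization and dictionary-based NER with character offsets. The mini-lexicon and sentence are invented for illustration and are not drawn from any system cited here.

```python
# Minimal sketch of two NLP steps feeding a text mining pipeline:
# tokenization, then dictionary-based named entity recognition (NER).
import re

# Hypothetical mini-lexicon mapping surface forms to entity types.
LEXICON = {"BRCA1": "gene", "interleukin 6": "protein", "tocilizumab": "drug"}

def tokenize(text):
    """Split text into word tokens (a deliberately simple lexical step)."""
    return re.findall(r"\w+|\S", text)

def find_entities(text):
    """Tag lexicon entries with their character offsets and type."""
    entities = []
    for term, etype in LEXICON.items():
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            entities.append((m.start(), m.end(), term, etype))
    return sorted(entities)  # sorted by start offset

sentence = "Doctors suggested tocilizumab to counter high interleukin 6 levels."
print(tokenize(sentence))
print(find_entities(sentence))
```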
Although there is no current textbook that can be considered
the definitive guide on text mining as defined above, there are a
couple of classic textbooks that cover fundamental NLP
techniques and at least the first covers some of the analytics
required to discover information: Speech and Language Processing
by Jurafsky and Martin [24] and Foundations of Statistical Natural
Language Processing by Manning and Schuetze [25]. The biomed-
ical domain is one of the most interesting application areas for
text mining, given both the potential impact of the information
that can be discovered and the specific characteristics and
volume of information available. The textbook Text mining for
biology and medicine [26] offers an overview of the fundamental
approaches to biomedical NLP, emphasizing different sub-areas
in each chapter, although overall it does not totally adhere to the
definition of text mining as a means for discovery given by
Hearst [23]. A good non-textbook review of the different subareas
is the article ‘Frontiers of biomedical text mining: current
progress’ [27]. For those just starting in the area, the article
‘Getting Started in Text Mining’ [28] is a good starting point. A
more in-depth treatment of automated techniques applied to
the biomedical literature and its contribution to innovative
biomedical research can be found in ‘Text-mining solutions for
biomedical research: enabling integrative biology’ [29].
Text mining sub-areas, briefly summarized, include:
• Information Retrieval (IR) deals with the problem of finding relevant
documents in response to a specific information need (query).
An overview of tools for information retrieval from the biomed-
ical literature can be found in [30].
• NER is at the core of the automatic extraction of information
from text and deals with the problem of finding references to
entities (mentions) such as genes, drugs and diseases present in
natural language text and tagging them with their location and
type. NER is also referred to as ‘entity tagging’ or ‘concept extrac-
tion’. This is a basic building block for almost all other extraction
tasks. NER in the biomedical domain is generally considered to
be more difficult than other domains, such as geography or news
reports. This is owing to inconsistency in how known entities,
such as symptoms or drugs, are named (e.g. nonstandard abbre-
viations and new ways of referring to them). An open-source
NER engine, BANNER [31], with models to recognize genes and
diseases mentioned in biomedical text, is currently available for
gene and disease NER, and LINNAEUS is available for species
[32]. Rebholz-Schuhmann et al. [33] present an overview of the
NER solutions for the second CALBC task, including protein,
disease, chemical (drug) and species entities. Campos et al. [34]
discuss a recent survey of tools for biomedical NER. A system assigning text to a wide range of semantic classes using linguistic rules is presented in [35], illustrating a task slightly different from standard NER because classes may potentially overlap. Verspoor et al.
[36] use the CRAFT corpus to improve the evaluation of gene NER
(and some lower-level tasks like part-of-speech and sentence
segmentation). Recent work in [37] presents an NER system for
extracting gene and protein sequence variants from the
biomedical literature. For locating chemical compounds,
Krallinger et al. [38] summarize the task that was part of
BioCreative IV and give a short overview of some of the
techniques used.
• Named Entity Identification allows the linkage of objects of
interest, such as genes, to information that is not detailed in a
publication (such as their Entrez Gene identifier) [39]. Two open-
source systems using largely dictionary-based approaches to
normalize gene names appear in [39–41]. For normalizing disease
names, [42] introduces DNorm, a new normalization framework
using machine learning, with strong results.
• Association extraction is one of the higher-level tasks still
considered purely an information extraction application. It uses
the output from the prior subtasks to produce a list of (binary or
higher) associations among the different entities of interest.
Catalysts for advances in this area have been the Biocreative and
BioNLP shared tasks, with excellent teams from around the
world putting their systems to the test against carefully
annotated data sets. A survey of submissions to Biocreative III
[43] and BioNLP [44, 45] shows a good overview of approaches re-
sponsive to the respective shared tasks. Putting together associ-
ations into networks of molecular interactions that can explain
complex biological processes is the next logical step, and one
that still is considered the ‘holy grail’ of automatic biomolecular
extraction. Ananiadou et al. [46] and Li et al. [47] discuss
comprehensive surveys of methods for the extraction of network
information from the scientific literature and the evaluation of
extraction methods against reference corpora. Semantic-based
approaches such as [48] will make their mark in the coming
years.
• Event extraction is similar to association extraction but instead
of separately extracting various relations between different
entities in text, this task focuses on identifying specific events
and the various players involved in them (arguments). For instance,
the arguments of a transport event will include the molecule
being transported, the cell to which it is being transported and
the cell from which it is being transported. Event extraction was
a key component of the BioNLP Shared Tasks in both 2011 [45]
and 2013 [49], challenging the biomedical community to expand
and cultivate their approaches in this area and leading to stead-
ily improving results.
• Pathway extraction is a budding branch of biomedical text
mining closely following the footsteps of event extraction. It
involves the automated construction of biological pathways
through the extraction and ordering of pathway-related events
from text. To date, like [50] and [51], the majority of researchers in this domain have focused their efforts on supporting pathway curation through event extraction rather than entirely automating the process. Tari et al., however, were able to achieve promising results for the automated synthesis of pharmacokinetic pathways by applying an automated reasoning-based approach for event ordering [52]. The first shared task on Pathway
Curation was organized by BioNLP in 2013 [49] to establish
the current state-of-the-art performance level for extract-
ing pathway-relevant events such as phosphorylation and
transport.
In the end, a set of the different subtask solutions is used
in a pipeline that allows information to be integrated and
analyzed toward knowledge discovery. However, this multiplies
the effects of errors down the pipeline, leaving systems highly
vulnerable.
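A back-of-the-envelope calculation shows why. If each stage of a hypothetical three-stage pipeline were, say, 90, 92 and 85 percent accurate (figures invented purely for illustration), the end-to-end yield decays multiplicatively:

```python
# Illustration of error propagation in a multi-stage pipeline:
# accuracy decays multiplicatively with pipeline depth.
# The per-stage accuracies below are hypothetical.
stage_accuracy = {"NER": 0.90, "normalization": 0.92, "relation extraction": 0.85}

overall = 1.0
for stage, acc in stage_accuracy.items():
    overall *= acc
    print(f"after {stage:20s}: {overall:.2f}")
# After the final stage, the pipeline retains only ~0.70 of the signal.
```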
An overarching challenge for biomedical text mining is to
incorporate the many knowledge resources that are available to
us into the NLP pipeline. In the biomedical domain, unlike the
general text mining domain, we have access to large numbers
of extensive, well-curated ontologies and knowledge bases.
Biomedical ontologies provide an explicit characterization of a
given domain of interest. The quality of data mining efforts
would likely increase if existing ontologies (e.g. UMLS [53] and
BioPortal [54]) were used as sources of terms in building
lexicons, for figuring out what concept subsumes another, and
as a way of normalizing alternative names to one identifier. For
example, using ontologies as described enabled the use of
unstructured clinical notes for generating practice-based
evidence on the safety of a highly effective, generic drug for
peripheral vascular disease [55].
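A minimal sketch of this normalization idea follows, with invented concept identifiers and synonym lists standing in for a real resource such as UMLS or BioPortal:

```python
# Sketch of ontology-backed normalization: alternative surface names
# are mapped to one canonical concept identifier. The identifiers and
# synonyms below are invented; in practice they would come from a
# curated resource such as UMLS or BioPortal.
ONTOLOGY = {
    "CUI:0001": {"preferred": "myocardial infarction",
                 "synonyms": {"heart attack", "mi", "myocardial infarct"}},
    "CUI:0002": {"preferred": "hypertension",
                 "synonyms": {"high blood pressure", "htn"}},
}

# Invert the ontology into a lookup table of lowercase surface forms.
LOOKUP = {}
for cui, entry in ONTOLOGY.items():
    for name in {entry["preferred"], *entry["synonyms"]}:
        LOOKUP[name.lower()] = cui

def normalize(mention):
    """Map a textual mention to its concept identifier, if known."""
    return LOOKUP.get(mention.lower().strip())

print(normalize("Heart attack"))  # -> CUI:0001
print(normalize("HTN"))           # -> CUI:0002
```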
Today, the data being generated is massive, complex and
increasingly diverse owing to recent technological innovations.
However, the impact of this data revolution on our lives is
hampered by the limited amount of data that has been
analyzed. This necessitates data mining tools and methods that
can match the scale of the data and support timely decision-
making through integration of multiple heterogeneous data
sources.
Finally, another area in which the field has fallen short is
that of making text mining applications that are easily
adaptable by end users. Many researchers have developed
systems that can be adapted by other text mining specialists,
but applications that can be tuned by bench scientists are
mostly lacking.
Application areas
Pathway extraction and reasoning
Analyzing the intricate network of biological pathways is an
essential precursor to understanding the molecular mechan-
isms of complex diseases affecting humans. Without acquiring
a deeper insight into the underlying mechanisms behind such
diseases, we cannot advance in our efforts to design effective
solutions for preventing and treating them. However, given the
vast amount of data currently available on biological pathways
in biomedical publications and databases and the highly inter-
connected nature of these pathways, any attempt to manually
reason over them will invariably prove to be largely ineffective
and inefficient. As a result, there is a growing need for computa-
tional approaches to address this demanding task through
automated pathway analysis. Pathway analysis can be either
quantitative or qualitative and is a key focus of the growing field
of Systems Biology. Quantitative pathway analysis uses dy-
namic mathematical models for simulating pathways and can
be especially useful in drug discovery and the development of
patient-specific dosage guidelines [56]. Some examples of tech-
niques used in this form of analysis include ordinary differen-
tial equations [57], Petri Nets [58], and π-calculus [59].
Qualitative pathway analysis uses static, structural representa-
tions of pathways to answer qualitative questions about them;
for instance it may be used to explain why a certain phenom-
enon occurs in the pathway based on existing pathway know-
ledge. Artificial intelligence paradigms, such as symbolic (i.e.
explicit representations) or connectionist (i.e. massively paral-
lelized) approaches, can greatly inform this type of pathway
analysis [60]. Although some of the techniques principally ad-
dressing quantitative pathway analysis, such as Petri Nets and
π-calculus, may also be used to perform qualitative pathway
analysis, they typically tend to provide limited functionality
[61]. Therefore, richer languages such as Maude [62], BioCham
[63] and action languages [52, 64, 65] are more popular in this
domain. In recent years, hybrid approaches have been applied
for qualitative pathway reasoning. For instance, [66] presents a
qualitative pathway reasoning system that uses Petri net se-
mantics as the pathway specification language and action lan-
guages as the query language. Pathway reasoning, as a
technique, relies on either humans defining the pathway
information needed or the development of new algorithms to
extract, represent and reason over biological pathways, which is
an area of growing interest.
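To illustrate the quantitative side, the sketch below simulates a toy two-step conversion pathway (A to B to C) with mass-action kinetics using ordinary differential equations. The rate constants are arbitrary illustrative values; the model is not taken from any system cited above.

```python
# Sketch of quantitative pathway analysis via an ODE model:
# a two-step conversion A -> B -> C with mass-action kinetics.
import numpy as np
from scipy.integrate import solve_ivp

k1, k2 = 0.8, 0.3  # hypothetical rate constants (1/time)

def pathway(t, y):
    a, b, c = y
    return [-k1 * a, k1 * a - k2 * b, k2 * b]

sol = solve_ivp(pathway, t_span=(0, 20), y0=[1.0, 0.0, 0.0],
                t_eval=np.linspace(0, 20, 5))
for t, (a, b, c) in zip(sol.t, sol.y.T):
    print(f"t={t:5.1f}  A={a:.3f}  B={b:.3f}  C={c:.3f}")
```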
Gene prioritization and gene function prediction
Complex diseases present diverse symptoms because they are
caused by multiple genes and environmental factors that differ
for each individual and can diverge at different stages of the
disease process. This complexity is reflective of epistatic effects
where causative genes have an impact on the expression of
many other genes. Because variant expression levels vary
across the genome, it is difficult to determine true causative
genes or distinguish key sets affected by the disease from high-
throughput experiments. For example, the Affymetrix U133 Plus 2.0 microarray chip from the Repository of Molecular Brain Neoplasia Data (REMBRANDT) shows more than 7500 two-fold differentially expressed
genes in brain cancer tissue when compared with normal brain
tissue [67]. The validation of a single causative gene is a long
and expensive process [68], often taking up to a year and even
longer, which necessitates using gene prioritization to pare
down the list of potential gene targets to a manageable size.
Gene prioritization methods that suggest the most significant
prospects for further validation are critically needed, and
method development in this area would greatly facilitate
discovery.
Many gene prioritization algorithms have been developed to
address this problem, such as GeneWanderer [69], GeneSeeker
[70], GeneProspector [71], SUSPECTS [72], G2D [73] and
Endeavour [74], among others [75, 76]. A comparative review of
these methods can be found in Tranchevent et al. [77]. The gen-
eral premise of these methods is to rank genes based on the
similarity between a set of candidate genes compared with
genes already known to be associated with the disease (usually
called the training set). Similarity is established based on
different parameters (depending on the specific method) and
may include purely biological measures (such as cytogenetic
location, expression patterns, patterns of pathogenic mutations
or DNA sequence similarity), biological measures plus
annotation of the genes using different protein databases (for
example, UniProt [78] and InterPro [79]), or other vocabularies
and ontologies (such as the Gene Ontology [80, 81], eVOC [82],
MeSH [83] and term vectors from the literature). In these
methods, the closer a gene in the candidate list coincides with
the profile of the training genes, the higher it is ranked.
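A minimal sketch of this ranking premise appears below: candidates are scored by cosine similarity to the centroid of the training set, so the candidate closest to the known-gene profile ranks highest. Gene names and feature values are invented, and none of the cited tools is implied to work exactly this way.

```python
# Sketch of similarity-based gene prioritization: rank candidates by
# cosine similarity to the centroid of known disease genes.
import numpy as np

# Hypothetical feature vectors (e.g. expression/annotation-derived).
training = {"GENE1": [0.9, 0.1, 0.8], "GENE2": [0.8, 0.2, 0.9]}
candidates = {"CANDA": [0.85, 0.15, 0.80], "CANDB": [0.10, 0.90, 0.20]}

centroid = np.mean(list(training.values()), axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

ranked = sorted(candidates.items(),
                key=lambda kv: cosine(np.array(kv[1]), centroid),
                reverse=True)
for gene, feats in ranked:
    print(gene, round(cosine(np.array(feats), centroid), 3))
```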
Gene prioritization includes the related area of gene function prediction. The Critical Assessment of protein Function Annotation (CAFA) experiment was the first large community-wide
evaluation of 54 methods that were compared on a core set of
annotations using evaluation metrics to ascertain the top meth-
ods [84]. Earlier computational methods for prioritization were
compared through a large-scale biological assay of yeast
mitochondrial phenotypes and found to be effective [85, 86]. A
related but distinct gene prioritization problem is the identifica-
tion of genes with tissue-specific expression patterns [87].
Existing webservers such as GeneMANIA [88, 89] and IMP [90]
allow biologists to perform gene prioritization by network
connectivity, and servers such as PILGRM allow for prioritiza-
tion directly by gene expression [91]. Predicted functions, in
addition to curated functions, have also shown promise for in-
terpreting the results of genome-wide association studies,
which aim to pair genetic variants with associated genes and
pathways [92].
Precision medicine and drug repositioning
Precision medicine is determining prevention and treatment
strategies based on an individual’s predisposition in an effort to
provide more targeted and therefore effective treatments [93].
This area is poised for intense growth based on the ease of
obtaining patient data and the development of computational
methods with which to analyze this personalized data. While
precision medicine is a nascent field, there have been many ad-
vances in the personalized treatment of cancer. Some hospitals
are already using genetic data to direct treatment options for
cancer patients (e.g. BRCA1 and BRCA2 [94], BRAF [95] testing), though drugs targeted to specific mutations lag behind; this is an area where computational drug repositioning will potentially have a strong impact [96].
On the clinical side of translational research, the demand for
timely and accurate knowledge has the urgency of life itself.
Emily Whitehead was the first child with acute lymphoblastic
leukemia to be treated and cured with an experimental T cell
therapy called CAR T cell therapy at the Children’s Hospital of
Philadelphia [97]. The therapy enables the patient’s T cells to
recognize and attack malignant B cells, but this treatment can
also trigger an intense immune reaction, which Emily experi-
enced. She suffered from a high level of the interleukin 6 pro-
tein, and her doctors suggested trying tocilizumab (Actemra), a
rheumatoid arthritis drug, to combat the extraneous protein
production [97, 98]. This drug returned Emily’s vital signs back
to normal. In this case, rather than relying on the serendipity of
a team member knowing about the right drug, specialized text
mining could have been used to mine the literature for the
relevant drugs. In such a scenario, either the literature would be
mined in advance, with the extracted relationships between drugs and genes or proteins stored in a database, or it could be searched in real time. As an example of this, Essack et al. created
a sickle cell disease knowledgebase by mining 419 612 PubMed
abstracts related to red blood cells, anemia or this disease [99].
Some databases (such as PharmGKB) store such relationships,
but are not the result of automatic extraction. Manual curation
is still the current standard for such databases, with the value
of text mining applications yet to be fully realized. Currently,
despite notable advances in entity mention extraction and
normalization, the use of text mining is mostly limited to aiding
curators to speed up the process.
Data and text mining methods are useful for biomedical
predictions and can be successfully extended to biomedical
discoveries as well. Sirota et al. used publicly available gene
expression data for both drugs and diseases to ascertain if Food
and Drug Administration-approved drugs could be repositioned
for use in new diseases [100]. They discovered and experimen-
tally validated the use of cimetidine, generally used for
heartburn and peptic ulcers, as a treatment …
Computers & Education 113 (2017) 226–242
Data mining in educational technology classroom research:
Can it make a contribution?
Charoula Angeli a, *, Sarah K. Howard b, Jun Ma b, Jie Yang b,
Paul A. Kirschner c, d
a University of Cyprus, Cyprus
b University of Wollongong, Australia
c Open University of the Netherlands, The Netherlands
d University of Oulu, Finland
Article info
Article history:
Received 18 June 2016
Received in revised form 15 March 2017
Accepted 29 May 2017
Available online 30 May 2017
Keywords:
Educational data mining
Educational technology research
Association rules mining
Fuzzy representations
* Corresponding author. 11-13 Dramas street, P.O. Box 20537, Department of Education, University of Cyprus, CY-1678, Nicosia, Cyprus.
E-mail address: [email protected] (C. Angeli).
http://dx.doi.org/10.1016/j.compedu.2017.05.021
0360-1315/© 2017 Elsevier Ltd. All rights reserved.
Abstract
The paper addresses and explains some of the key questions about the use of data mining
in educational technology classroom research. Two examples of the use of data mining techniques, namely association rules mining and fuzzy representations, are presented, one from a study conducted in Europe and the other from a study in Australia. Both of these studies examine student
learning, behaviors, and experiences within computer-supported classroom activities. In
the first study, the technique of association rules mining was used to understand better
how learners with different cognitive types interacted with a simulation to solve a prob-
lem. Association rules mining was found to be a useful method for obtaining reliable data
about learners' use of the simulation and their performance with it. The study illustrates
how data mining can be used to advance educational software evaluation practices in the
field of educational technology. In the second study, the technique of fuzzy representations
was employed to inductively explore questionnaire data. The study provides a good
example of how educational technologists can use data mining for guiding and monitoring
school-based technology integration efforts. Based on the outcomes, the implications of
the study are discussed in terms of the need to develop educational data mining tools that
can display results, information, explanations, comments, and recommendations in
meaningful ways to non-expert users in data mining. Lastly, issues related to data privacy
are addressed.
1. Introduction
Data mining has long been used in marketing, advertising, health, engineering, and information systems. At its core, data
mining is an inductive, analytic, and exploratory approach, which is concerned with knowledge discovery through identi-
fication of patterns within large sets of data. In the last 10 years, the field of Educational Data Mining (EDM) has emerged as a
distinct area of research concerned with using data mining techniques to answer educational questions, such as, “What are
the difficulties students encounter during a learning activity?”, “What sequences of computer interactions lead to successful
problem-solving performance?”, and “What sequences of actions characterize high performers and low performers in
problem-solving activity?” EDM can also provide new insights into “wicked” educational problems, such as, “What are the
differences in the ways students experience learning,” and “How can learning designs account for variations in students'
learning experiences?”
In particular, EDM is concerned with developing methods for analyzing data from an educational system in order to detect
patterns in large datasets that would otherwise be very difficult or even impossible to analyze due to the vast volume of data
within which they exist (Romero & Ventura, 2013). Consequently, results from data mining can be used for deciding about
how to improve the teaching and learning process as well as how to design or redesign a learning environment (Ingram, 1999;
Romero & Ventura, 2007). Data mining techniques have been mostly used within the context of web-based or e-learning
education in order to: (a) suggest activities, resources, learning paths, and tasks for improving learners' performance and
adapting learning experience (Tang & McCalla, 2005); (b) provide feedback to teachers and instructional designers in regards
to learners' difficulties with the content and structure of a course, so that revisions can be made to facilitate students' learning
(Merceron & Yacef, 2010; Zaiane & Luo, 2001); (c) predict learners' performance (Ahmed & Elaraby, 2014); and (d) inform
administrators about the effectiveness of instructional programs, so that better planning and allocation of human and ma-
terial resources can be achieved (Romero & Ventura, 2007).
Based on a number of reviews and meta-analyses published (Baker & Yacef, 2009; Mohamad & Tasir, 2013; Romero &
Ventura, 2007, 2010), the most popular data mining techniques include: (a) clustering (Amershi & Conati, 2009; Beal, Qu,
& Lee, 2006; He, 2013; Perera, Kay, Koprinska, Yacef, & Zaiane, 2009); (b) regression (Buja & Lee, 2001); (c) association
rules mining (Lin, Alvarez, & Ruiz, 2002); and (d) sequential pattern mining (Perera et al., 2009). In clustering, the goal is to
split the data into clusters, such that, there is homogeneity within clusters and heterogeneity between clusters (Baker &
Siemens, 2014). In educational research, clustering procedures have been used to find patterns of effective problem-
solving strategies in exploratory computer-based learning environments (Amershi & Conati, 2009; Angeli & Valanides,
2004; Beal et al., 2006; He, 2013). In regression, the goal is to develop a model that can infer or predict something about a
dataset. In a regression analysis, a variable is identified as the predicted variable and a set of other variables as the predictors
(similar to dependent and independent variables in traditional statistical analyses) (Baker & Siemens, 2014). In association
rules mining, the goal is to extract rules of the form if-then, such that if some set of variable values is found, another variable
will generally have a specific value (Baker & Siemens, 2014). In sequential pattern mining, the aim is to find temporal as-
sociations between events to determine what path of student behaviors leads to a successful group project (Perera et al.,
2009).
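As a toy illustration of sequential pattern mining, the sketch below counts which two-step action sequences (bigrams) are most frequent among successful students. The action logs and outcome labels are invented and do not come from any study cited here.

```python
# Sketch of a simple sequential-pattern count over student action logs:
# which two-step sequences occur most often among successful students?
from collections import Counter

# Hypothetical per-student action sequences and outcomes.
logs = {
    "s1": (["open_sim", "run", "inspect", "run", "submit"], "success"),
    "s2": (["open_sim", "inspect", "run", "submit"], "success"),
    "s3": (["run", "submit", "run", "submit"], "fail"),
}

bigrams = Counter()
for actions, outcome in logs.values():
    if outcome == "success":
        bigrams.update(zip(actions, actions[1:]))  # consecutive pairs

print(bigrams.most_common(3))
```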
Currently, most work on data mining has at its base a computer science perspective rather than an educational
perspective. Within the educational domain, data mining techniques have been mostly used in e-learning or web-based
research, because of the ease of accessing student log data and performing automatic analyses of data. There is, however,
also a need to investigate the uses of EDM in real classrooms in order to understand better students' interactions with
technology as well as the complexities entailed in investigating how students with diverse needs and cognitive characteristics
perform with technology in these settings. The issue then becomes whether EDM can make a contribution to educational
technology classroom research in terms of providing tools and techniques that educational technology researchers can easily
grasp and apply to their own research in order to answer questions that cannot be easily answered by traditional statistical
techniques.
In view of that, in this paper, the authors, within the context of two different studies, describe their efforts in using data
mining procedures in educational technology classroom research, and, identify difficulties in applying data mining tech-
niques and tools in this research context. The first study was carried out in a European country and sought to investigate how
field-dependent and field-independent learners solved a problem using a stand-alone simulation tool. For the purposes of the
first study, the authors used a sequence, association, and link analysis for capturing and analyzing learners' interactions with
the simulation. The analysis provided a detailed and analytic description of the differences in field-dependent and field-
independent learners' problem-solving processes, providing at the same time clear understanding of field-dependent
learners' difficulties to take full advantage of the affordances of the simulation in order to maximize learning benefits. The
study contributes to educational technology research by presenting evidence about the effectiveness of EDM as an approach
for extracting useful process-related knowledge and actual student learning data that can be used for improving the learning
design of educational software and systems (Abdous, He, & Yen, 2012; Romero & Ventura, 2013). In turn, EDM can replace
traditional approaches to software evaluation, which mostly depend on surveys of students' perceptions of the system
(Bayram & Nous, 2004), by providing detailed data about what software features are or are not so successful with learners
that instructional designers can use in order to decide how to go about improving their learning designs. Consequently, data
mining techniques can become extremely useful in terms of providing ideas for implementing personalized learning to meet
students' individual needs (Chen, 2008; Lin, Yeh, Hung, & Chang, 2013). Some preliminary work in this area has been reported
by Hsu (2008) who applied association rules algorithms in the development of a personalized English Learning Recom-
mendation System, as well as by Chen and Duh (2008) who used a fuzzy technique to determine the difficulty parameters of
courseware and decide thereafter the content of courseware for personalized recommendation services.
The second study addresses the use of educational technology in Australian secondary schools. The research considers
variations in student experiences in an integrated learning environment and how this may relate to learning. The aim of the
study was to understand better the range of students' experiences with technology and accordingly to inform teachers' in-
tegrated learning designs. Due to the complexity of the learning environment and the large number of key factors affecting
students' experiences in the classroom, association rules mining and fuzzy representations were used to explore relations
among students' questionnaire responses and national assessment outcomes. The results showed significantly different
patterns of key technology integration factors related to literacy and numeracy outcomes. The findings provide guidance for
learning design in relation to how teachers may provide different experiences in technology-integrated learning to support all
learners. The study contributes to educational technology research by providing evidence of EDM as a useful approach for (a)
understanding school-based technology-related change initiatives, (b) determining where to focus classroom resources and
informing choices of technology tools, and (c) developing a deeper understanding of student technology-related experiences
(Abdous et al., 2012).
In the general discussion section of the paper, the authors discuss the contribution of data mining in educational tech-
nology classroom research, within the context of the two studies, while at the same time they also consider obstacles related
to the intrinsic difficulty associated with learning how to use data mining tools and apply EDM techniques to educational
data. Research directions aiming at making data mining tools and techniques more accessible to educational researchers are
discussed. Lastly, data-privacy issues are also addressed.
2. Study 1
2.1. Theoretical framework and research questions
In the first study, the authors used a data mining technique called sequence, association, and link analysis to understand
and best describe how the cognitive style of field dependence-independence (FD-I) affected undergraduate students' ability
to solve a problem using a glass-box simulation (Clariana & Strobel, 2008; Landriscina, 2013). According to Landriscina (2013),
simulations are distinguished into black-box or model-opaque simulations, and, glass-box or model-transparent simulations.
In black-box or model-opaque simulations, learners explore a system's behavior, but the underlying conceptual and
computational model of the simulation remains hidden. Thus, learners can only observe the results of the causal relationships
between the variables (Landriscina, 2013). Glass-box or model-transparent simulations, on the other hand, make the
structure of the model underlying the simulation visible to the learners in the form of a diagram with nodes and connecting
links between them (Landriscina, 2013).
FD-I is a cognitive style directly related to how humans perceive, organize, and process information (Morgan, 1997; Price,
2004; Witkin, Moore, Goodenough, & Cox, 1977). It is distinguished from learning styles, in that learning styles are subjective
accounts of individuals' instructional preferences across specific domains and tasks (Messick, 1987). FD-I was defined by
Witkin et al. (1977) as "the extent to which a person perceives part of a field as discrete from the surrounding field as a whole, rather than embedded in the field; or the extent to which the organization of the prevailing field determines perception of its components; or, to put it in everyday terminology, the extent to which the person perceives analytically" (pp. 6–7). Witkin et al.
(1977) conceptualized FD-I as a construct with two discrete modes of perception, such that, at the one extreme end
perception is dominated by the prevailing field and is designated as field dependent (FD), and at the other extreme end,
perception is more or less separate from the surrounding field and is designated as field independent (FI).
Contemporary research studies have examined the effects of learning with glass-box (model-transparent) simulations on
FI and FD learners' performance, and, found that FI learners outperformed FD learners during problem solving with this type
of simulation (Angeli, Valanides, & Kirschner, 2009; Burnett, 2010; Dragon, 2009). However, these investigations have pri-
marily focused on identifying quantitative differences in performance between FD and FI learners without providing detailed
information about FD and FI learners' interactions with the simulation, as well as related difficulties that learners encountered
during the problem-solving process with the simulation. While quantitative investigations are in general useful, they do not
provide enough insight about how to help those learners, such as for example FD learners, who usually encounter problems
during problem solving and need to be supported by the teacher so they can also have successful learning experiences with
technology.
Therefore, given the limitations of the existing body of research on FD and FI learners' problem solving with simulations,
the present study applied sequence, association, and link analyses to assess and compare FD and FI learners' interactions with
a glass-box simulation in order to solve a problem about immigration policy. The research purpose of the study was to identify
sequences of interactions with the simulation that were associated with successful performance and whether they differed
between FD and FI learners. Analytically, the research questions were stated as follows:
1. What sequences of interactions with the simulation lead to successful problem-solving performance?
2. How do the sequences of interactions with the simulation differ between FD and FI learners?
3. What are the learning difficulties that FD learners encounter during the problem-solving process with the simulation?
Evidently, traditional statistical techniques cannot provide the means for answering these questions, and, thus, the issue
becomes whether data mining, and in particular the sequence, association, and link analysis that was employed here, can
answer these questions in informative and useful ways for educational technology researchers.
2.2. Method
2.2.1. Participants
One hundred and fifteen freshmen from a teacher education department were recruited to participate in the study.
Students were initially screened based on their scores on the Hidden Figures Test (HFT; French, Ekstrom, & Price, 1963). The
HFT was used for identifying students' FD-I. The highest possible score on the HFT is 32 and the lowest zero. In accordance
with other research studies (Angeli & Valanides, 2004; Chen & Macredie, 2002; Daniels & Moore, 2000; Khine, 1996), the cut-
off points for this study were set to two levels of FD-I, namely FD and FI. Students who scored 18 or lower on the HFT were
classified as FD learners, while students who scored 19 or higher were classified as FI. Of the 115 students, 45 were
found to be FI learners, and the remaining 70 FD. Of the 115 participants, 94 (82%) were females, and 21 (18%) males. The
average age of the participants was 17.86 years (SD = 0.45). All students had basic computing skills, but no prior experience
with problem solving with simulations.
2.2.2. The simulation task
All research participants were asked to interact with a glass-box simulation that was specifically developed for the pur-
poses of this study, in order to solve a problem about immigration policy. The researchers explained to the participants that
nowadays a lot of people move from one country to another in search of a better life for their children and themselves.
Students were given a scenario about people from country A who wanted to move to country B due to a high unemployment
rate in country A. The students had to interact with the simulation in order to test hypotheses, and, decide about whether and
under what conditions country B could accept immigrants from country A.
The underlying model of the glass-box simulation is depicted in Fig. 1. The model shows how an increase in the number of
births in country A will cause an increase in the population of country A. This, in turn, and provided that not enough
employment opportunities are created in the interim to cover the new demands for employment in country A, will eventually
lead to an increase in the unemployment rate of country A. In contrast, an increase in the number of deaths in country A will
eventually cause a decrease in the unemployment rate of country A. In the case of an increase in the unemployment rate of
country A, people from country A will eventually seek employment in another country - country B. A movement of people
from country A to country B will eventually cause an increase in the unemployment rate of country B, if country B does not
create in the meantime enough employment opportunities to cover the increased demand for employment. The model shows
how an increase in the number of businesses in country B will cause a decrease in country's B unemployment rate, while a
movement of businesses from country B to A will cause a decrease in country's A unemployment rate, but in the long run a
possible increase in country's B unemployment rate. In total, the tool simulated the phenomenon of immigration using five
independent variables, namely number of births in country A, number of births in country B, number of deaths in country A,
number of deaths in country B, and movement of businesses from country B to country A. The students had to change the
values of the independent variables one at a time to observe the effects on the dependent variables in order to decide, and,
propose in writing if and under what conditions country B could possibly accept immigrants from country A.

Fig. 1. The underlying model about immigration policy of the glass-box simulation.
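A rough discrete-time rendering of the dynamics just described might look like the following sketch. All coefficients and starting values are invented, since the paper does not publish the simulation's equations; only the causal directions follow the text.

```python
# Hypothetical discrete-time sketch of the immigration model:
# births raise population and unemployment in A, some of A's jobless
# migrate to B, and new jobs lower unemployment in each country.
def step(state, births_a=1000, deaths_a=400, new_jobs_a=300, new_jobs_b=500):
    pop_a, unemployed_a, unemployed_b = state
    pop_a += births_a - deaths_a
    unemployed_a += births_a - deaths_a - new_jobs_a  # labor demand gap
    migrants = max(0, int(0.1 * unemployed_a))        # assume 10% of jobless move
    unemployed_a -= migrants
    pop_a -= migrants
    unemployed_b = max(0, unemployed_b + migrants - new_jobs_b)
    return pop_a, unemployed_a, unemployed_b

state = (100_000, 5_000, 2_000)  # invented starting values
for year in range(5):
    state = step(state)
    print(f"year {year + 1}: pop_A={state[0]}, unemp_A={state[1]}, unemp_B={state[2]}")
```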
When the learners run the model, the simulation opens a meter for each dependent and independent variable. As shown
in Fig. 2, each meter displays the initial value of each variable and the range of values it can take. At each run time, the learner
can change the value of one independent variable at a time and observe how the meters of the affected dependent variables
change.
2.2.3. Research instruments
2.2.3.1. Hidden figures test. The Hidden Figures Test (HFT) was administered to determine research participants' field type
(French et al., 1963). The test consists of two parts, and each part contains 16 questions. The time allotted for answering each
part is 12 min. The scores on the HFT range from zero to 32. Basically, each question on the HFT presents five simple geometric
figures and a more complex one. Students are instructed to discover which one of the five simpler figures is embedded in the
more complex one. According to Rittschof (2010), the HFT is the most reliable and widely used test for measuring FD-I. It is
also highly correlated with the Group Embedded Figures Test (r = 0.67–0.88), another popular test for determining FD-I
(Witkin, Oltman, Raskin, & Karp, 1971).
2.2.3.2. Assessment rubric. A rubric that was inductively constructed was used to assess the quality of learners' written an-
swers to the immigration problem. The scoring rubric assessed three levels of quality ranging from 1 (poor quality) to 3 (high
quality). The specific criteria for each level are shown in Table 1. Two independent raters evaluated students' answers to the
immigration problem, and Cohen's kappa was used to measure interrater reliability. A satisfactory interrater reliability of
κ = 0.87 was computed, while noted discrepancies between the two raters were resolved after discussion.
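For reference, this is how such an interrater check can be computed. The rater scores below are invented, whereas the study reports κ = 0.87 on its real data.

```python
# Sketch of the interrater-reliability computation: Cohen's kappa
# over two raters' rubric scores (1-3). Scores are invented.
from sklearn.metrics import cohen_kappa_score

rater1 = [3, 2, 1, 2, 3, 1, 2, 2, 3, 1]
rater2 = [3, 2, 1, 2, 3, 1, 2, 3, 3, 1]

print(round(cohen_kappa_score(rater1, rater2), 2))
```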
Fig. 2. Simulation run.
Table 1
Rubric for assessing the quality of learners' answers.
3 - High
a. The learner's answer is based on a correct interpretation of the simulated outcomes.
b. The learner's answer takes into consideration pros and cons of different possible answers.
c. The learner's answer takes into consideration possible long-term effects.
2 - Medium
a. The learner's answer is based on a correct interpretation of the simulated outcomes.
b. The learner's answer takes into consideration pros and cons of different possible answers.
c. The learner's answer does not take into consideration possible long-term effects.
1 - Poor
a. The learner's answer is not based on a correct interpretation of the simulated outcomes.
b. The learner's answer does not take into consideration pros and cons of different possible answers.
c. The learner's answer does not take into consideration possible long-term effects.
2.2.4. Research procedures
Research data were collected in three different sessions. During the first 25-min research session, the researchers
administered the HFT in order to determine learners' field type. In a follow-up 60-min session, the researchers demonstrated
a glass-box simulation, different than the one that was used for collecting research data for this study, and, showed how to use
it in order to solve a problem. The students interacted with the simulation individually in order to explore various problem-
solving scenarios and learn how to control variables. The researchers explicitly explained the differences between dependent
and independent variables, and, demonstrated how changes in the independent variables affected the dependent variables.
During the last 60-min session, the researchers collected the data that were used for the analyses of this study. During the
session, the participants interacted with the glass-box simulation, observed, organized, and interpreted the simulated out-
comes of the system for the purpose of solving the problem about immigration policy.
2.2.5. Data structure and analysis
Students' interactions with the simulation were captured into video files with River Past Screen Recorder, a screen
capturing software. Each video file had an average duration of 50 min and a size of about 4GB. A scheme was used for coding
learners' interactions in a log file, which took the form of a table with three columns including Student_ID, Time, and Action.
Student_ID referred to students' research ID number, Time denoted the start/end time of an event, and Action described what
the interaction entailed in terms of a sequence of computer actions. The total number of entries in this table/log file, which
constituted the data for the data mining analysis, was 4570. Regarding the Action field in the data table, the simulation
afforded five computer actions that the students could employ in order to explore the relationships between all dependent
and independent variables, as depicted in Fig. 1, in order to decide if and under what conditions country B could accept
immigrants from country A. The first action was about displaying all variables and the relationships amongst them, as
represented in the model shown in Fig. 1. The second was about using the test tools in order to run the simulation. The third
was about opening the meter of each variable to change the values of the independent variables while observing at the same
time the effects on the dependent variables. The fourth was about using the play button for running the simulation, and,
lastly, the stop button for stopping the simulation. Thus, the following computer interactions were coded: B for viewing all
variables and the relationships between them; T for accessing the test tools needed for a simulation test; M for opening the
meter of each variable; P for running/playing the simulation; and S for terminating/stopping the simulation. Additionally, the
codes IV1, IV2, IV3, IV4, and IV5 were used for denoting the five independent variables.
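To make the data structure concrete, the following Python sketch (with hypothetical entries) shows one way such a three-column log could be represented and flattened into per-student action sequences for mining; the study itself stored the log as a table, not as code.

# A minimal sketch of the log table described above (entries are hypothetical).
# Codes: B = view variables/relationships, T = test tools, M = open a variable
# meter, P = play/run the simulation, S = stop it; IV1-IV5 = independent variables.
from dataclasses import dataclass

@dataclass
class LogEntry:
    student_id: str   # research ID number
    time: str         # start/end time of the event
    action: str       # coded sequence of computer actions

log = [
    LogEntry("S01", "00:00:05-00:00:31", "B"),
    LogEntry("S01", "00:00:31-00:01:02", "M IV1"),
    LogEntry("S01", "00:01:02-00:02:10", "T P S"),
]

# Flatten each student's entries into one action sequence for mining.
sequences = {}
for entry in log:
    sequences.setdefault(entry.student_id, []).extend(entry.action.split())
print(sequences)  # {'S01': ['B', 'M', 'IV1', 'T', 'P', 'S']}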
A sequence, association, and link analysis (Nisbet, Elder, & Miner, 2009) was used in order to identify unique differences
between the FD and FI learners. Specifically, the sequence, association, and link analysis was used for extracting association
rules in order to determine which simulation actions were closely associated together. The technique was also used for
extracting an immediate subsequent action given a previous one, and for mining patterns of interaction between individuals
of different field types and computer actions. In association rules mining, relationships and patterns are expressed in the form
of an association rule:
If A then (likely) C
Each rule includes an antecedent (A) and a consequent (C). This can be understood as “IF A then C.” Rules may contain single
or multiple antecedents and consequents, such as “IF A and B, then C.” The importance of a rule is determined through critical
measurements: support, confidence, and lift (Tan, Kumar, & Srivastava, 2004). The extent to which the antecedent(s) and
consequent(s) occur simultaneously in the dataset is indicated through support. The extent to which the consequent(s) oc-
cur(s) given the antecedent(s) is indicated through confidence. The correlation between the antecedent(s) and consequent(s)
is indicated through lift. For the two sequence, association, and link analyses that were performed, the minimum support was
set to 0.55 and the confidence level to 0.95.
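As an illustration of these three measures, the following Python sketch computes support, confidence, and lift for a toy rule over invented transactions; the study itself used Statistica rather than hand-written code.

# A toy computation of the three rule measures defined above (the transactions
# are invented action sets, not the study's data).
transactions = [
    {"B", "T", "P"},
    {"B", "M"},
    {"T", "P", "S"},
    {"B", "T", "P", "S"},
]

def freq(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(antecedent, consequent):
    support = freq(antecedent | consequent)   # A and C occur together
    confidence = support / freq(antecedent)   # how often C occurs given A
    lift = confidence / freq(consequent)      # correlation between A and C
    return support, confidence, lift

s, c, l = rule_measures({"T"}, {"P"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")  # 0.75, 1.00, 1.33

A rule with lift above 1, as here, indicates that the consequent occurs more often with the antecedent than its base rate would predict, which is why lift is read as a correlation measure.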
The authors employed Statistica Data Miner for conducting the sequence, association, and link analyses. While we
experimented with a number of other data mining tools, we ended up using Statistica, because compared to other tools we
found it easier to use in preparing the data for mining, as well as easier to integrate with the R programming environment.
Statistica Sequence, Association, and Link Analysis is an implementation of several advanced techniques designed for mining
rules from datasets that are generally described as "market baskets". The "market-basket" metaphor assumes that customers buy products either in a single transaction or in a sequence of transactions; a transaction may also relate a subsequent purchase of a product or products to a previous one. For example, a purchase of flashlights usually coincides with a purchase of batteries in the same basket. In education, the "market-basket" metaphor can be applied to situations where individuals engage in different actions during learning with others or with a computer system. The analysis reveals items in a dataset that occur together, extracting patterns and associations between individuals and actions.
2.3. Results and discussion
The mean quality of FD learners' answers to the immigration problem was found to be 1.43 (SD = 0.63), while the mean quality of FI learners' answers was 2.10 (SD = 0.75). The time that FD and FI learners spent with the simulation was also measured, and no significant differences were found between the two groups …
ORIGINAL PAPER
Public Internet Data Mining Methods in Instructional Design,
Educational Technology, and Online Learning Research
Royce Kimmons1 & George Veletsianos2
Published online: 7 June 2018
© Association for Educational Communications & Technology 2018
Abstract
We describe the benefits and challenges of engaging in public data mining methods and situate our discussion in the context of
studies that we have conducted. Practical, methodological, and scholarly benefits include the ability to access large amounts of
data, randomize data, conduct both quantitative and qualitative analyses, connect educational issues with broader issues of
concern, identify subgroups/subpopulations of interest, and avoid many biases. Technical, methodological, professional, and
ethical issues that arise by engaging in public data mining methods include the need for multifaceted expertise and rigor, focused
research questions and determining meaning, and performative and contextual considerations of public data. As the scientific
complexity facing research in instructional design, educational technology, and online learning is expanding, it is necessary to
better prepare students and scholars in our field to engage with emerging research methodologies.
Keywords: Public internet data mining · Innovative methods
Data mining of the public internet has been an emerging
research method for the past two decades as it has been ap-
plied to a variety of fields to help solve persistent problems
like developing webpage recommender systems (Niwa et al.
2006), combating infectious diseases (Brownstein et al.
2008), identifying cybersecurity threats (Maloof 2006), im-
proving network traffic (Wang et al. 2002), and predicting
political orientations (Colleoni et al. 2014), just to name a
few. Previous work has pointed out some of the technical
opportunities and challenges of such methods (Andersen
and Feamster 2006), but public internet data mining has not
yet been widely applied to addressing issues facing the field
of instructional design and technology (IDT), and we do not
fully understand the benefits and challenges of its application
to our field. Furthermore, though some data mining methods
are eagerly being applied in the realms of learning analytics
and data dashboard visualization (Baker and Inventado
2014), we have not as a field begun exploring the potentials
and ramifications of using massive amounts of disorganized,
publicly-available data to address persistent IDT challenges
or determining how we must train new professionals to make
use of the wealth of data available to them via the public
internet. Data mining of the public Internet affords IDT researchers the ability to answer important questions that they have heretofore been either unable to answer or unable to explore using non-invasive methods on a large scale. To illustrate, in Table 1 we provide a list of potential questions of interest to IDT that researchers may be able to address using data mining methods.
Over the last two years, we (the two authors of this paper)
have conducted more than 10 studies using public data mining
methods in IDT. These studies included extracting and ana-
lyzing publicly-available data from Websites (e.g., K-12
websites), social media (e.g., Twitter), and discussion fora
(e.g., YouTube comments). They generated massive datasets
and allowed us to conduct research pertaining to technology
use, social media prevalence, equity, and civility in online
discussions. In this paper, we will describe the benefits and
challenges we encountered while engaging in public data min-
ing and situate our discussion in the context of studies that we
Royce Kimmons and George Veletsianos contributed equally to this work.
* Royce Kimmons
[email protected]
George Veletsianos
[email protected]
1 Brigham Young University, 150J MCKB, BYU, Provo, UT 84602,
USA
2 Royal Roads University, 2005 Sooke Rd, Victoria, BC V9B 5Y2,
Canada
TechTrends (2018) 62:492–500
https://doi.org/10.1007/s11528-018-0307-4
have conducted in order to present authentic examples of the
ways that public data mining can be used in our field.
As we have put processes in place to collect and analyze
public social media data, we have reached out to colleagues
and secured funding for graduate students at other universities
to conduct collaborative work with us. To date, we have col-
laborated with 17 scholars on these projects representing 10
universities in the U.S. and Canada, and our collaborators have
included undergraduate, master’s, and doctoral students as well
as tenure-track faculty. These efforts have allowed us to take on
the role of mentors in public data mining methods to our col-
leagues, to expand the horizons of our own research, and to
train young researchers in these emerging methods. By doing
so, we have identified a curricular need facing our field that we
will also discuss here. While the practice of IDT traditionally
involves multidisciplinary collaboration (e.g., instructional de-
signers, subject matter experts, assessment experts, and faculty
may collaborate to create an educational intervention), the sci-
entific complexity facing IDT research and practice is increas-
ingly expanding. For instance, the infusion of technology in all
aspects of education has provided access to a deluge of digital
data that was previously unfathomable (Selwyn 2015), and in-
structional designers may nowadays collaborate with even
more actors, such as data scientists and learning analytics re-
searchers. Thus, it is necessary for researchers in our field to
explore and understand emerging research methodologies. This
paper will conclude by arguing that doctoral preparation pro-
grams in our field should include interdisciplinary methodolog-
ical training for IDT researchers as a core component.
Some Benefits of Public Internet Data Mining
As interest in data mining takes hold in many industries, from
healthcare to e-commerce, education researchers have started
exploring the ways that both large and public datasets can con-
tribute to making sense of issues facing educational practice and
the science of learning. While substantial literature exists on the
use of learning analytics in education (e.g., in Massive Open
Online Course [MOOC] contexts), much less is written about
the use of public online data. The benefits or opportunities that
mining of public Internet data engenders are numerous. These
opportunities are practical and methodological, as well as schol-
arly. We organize these in the following themes:
- providing large amounts of data and allowing easy randomization;
- empowering both quantitative and qualitative analyses;
- connecting educational issues with larger public issues;
- enabling identification of subgroups/subpopulations for further research;
- and avoiding many biases.
Providing Large Amounts of Data
The data generated by contemporary Internet platforms, and
made available to researchers through various means, are un-
precedented. For instance, the data associated with posting
one single tweet includes information about the person post-
ing the tweet (e.g., username, name, biographic information,
location, account creation date, and various statistics associat-
ed with the account holder such as total tweets posted and total
followers), data associated with the actual tweet (e.g., the text
of the tweet, the hashtags included in the text of the tweet, the
time it was posted, the location associated with the device it
was posted from, the application used to post the tweet, and
various metrics associated with it such as number of times this
particular tweet was retweeted), and similar data for any other
accounts interacting with that particular tweet. In other words,
a single tweet is associated with copious data points that IDT
researchers have rarely seen. This data deluge present in
Twitter is typical of online platforms. A similar situation exists
with a variety of platforms that are used for teaching, training,
and learning purposes (e.g., blogs, YouTube, Reddit, public
websites, etc.). To illustrate the magnitude of the data available, in a recent paper we sought to investigate time patterns in social media use (Veletsianos et al. under review) and were able to identify a sample of academics on Twitter (n = 3996) and retrieve more than 9 million tweets they posted along with associated metadata, yielding more than 100 million raw data points.

Table 1 A selection of typical IDT research questions that may be answered via data mining methods

Research question | Public internet data source
What sorts of IDT skills do employers require? | Job ad postings
What challenges do teachers face in integrating technology in K-12 classrooms? | Public discussion forums
What kinds of peer support do online learners provide to one another? | Discussion forums found in public online courses
In what ways are particular web-based technologies used in K-12 courses? | Blog networks, wiki networks, etc.
How do instructional designers describe the field to others? | Personal portfolios, discussion forums
What motivates individuals to contribute to informal learning communities? | Discussion forums
What sentiments does the public express toward particular educational and technological innovations (e.g., MOOCs, artificial intelligence, online education, adaptive learning, etc.)? | Discussion forums, newspaper comments
What is the relationship between demographic variables (e.g., gender) and achievement in STEM courses? | Secondary data made available in public repositories
Good data enable researchers to answer the research questions
they pose. While abundant data are not synonymous with
good data, large amounts of data provide a number of oppor-
tunities for IDT researchers. Large-scale data allow re-
searchers to examine whether the results generated by
smaller-scale studies (e.g., case studies) hold up to scrutiny,
investigate questions that can only be answered by larger
datasets (e.g., investigations of populations vis-a-vis samples),
and enable investigations of samples drawn at random from
large populations.
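As a concrete illustration of how such per-tweet records can be harvested, the sketch below uses the tweepy library (version 3.x API) with placeholder credentials and a hypothetical account handle; it is not the authors' actual collection pipeline.

# A hedged sketch of per-account tweet collection with tweepy 3.x (placeholder
# credentials; the handle is hypothetical, not from the study's sample).
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

rows = []
for status in tweepy.Cursor(api.user_timeline,
                            screen_name="example_academic",
                            tweet_mode="extended").items(200):
    rows.append({
        "text": status.full_text,                 # tweet text
        "created_at": status.created_at,          # supports time-pattern analyses
        "retweets": status.retweet_count,         # engagement metadata
        "followers": status.user.followers_count, # account-level metadata
    })
print(f"collected {len(rows)} tweet records")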
Empowering Both Quantitative and Qualitative
Analyses
Though data mining is often associated with analyses involv-
ing quantitative data, mining the public internet enables re-
searchers to collect and analyze both quantitative and qualita-
tive data. This method, therefore, accommodates a diverse
range of research questions, data analysis methods, and ap-
proaches. In other words, as part of the IDT researcher’s meth-
odological toolkit, data mining methods may enable the col-
lection and analyses of different kinds of data in relation to the
research questions being asked. Such versatility is important
because it enables IDT researchers to use data mining methods
across research paradigms, enabling the use of qualitative data
to generate detailed and rich descriptions of phenomena, as
well as the use of quantitative data to draw generalizable con-
clusions. For example, in investigating ways to scaffold stu-
dent learning when interacting with a chatbot, data mining
methods may enable IDT researchers to (a) code student
prompts in order to develop a taxonomy of help-seeking ques-
tions, and (b) compute the frequency with which students ask
different types of questions.
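A minimal sketch of that two-step chatbot example, with invented prompts and category labels, might look as follows; the taxonomy shown is illustrative, not one proposed by the authors.

# Hand-coded help-seeking categories tallied into frequencies
# (prompts and taxonomy are invented).
from collections import Counter

coded_prompts = [
    ("How do I cite a source?", "procedural"),
    ("Why is my answer wrong?", "clarification"),
    ("Can you give me a hint?", "hint-seeking"),
    ("Can you show another example?", "hint-seeking"),
]

for category, n in Counter(code for _, code in coded_prompts).most_common():
    print(f"{category}: {n}")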
To illustrate, we were interested in examining the ways
higher education institutions used social media for educational
purposes with students and the broader public (Kimmons et al.
2017b; Veletsianos et al. 2017). In order to explore this topic,
we gathered quantitative data (e.g., number of tweets posted)
and qualitative data (e.g., individual tweets and images) asso-
ciated with the Twitter accounts of Canadian and US univer-
sities. We computed new variables using these data (e.g., num-
ber of replies, replies as a proportion of all tweets, number of
tweets that included audiovisual elements) and also conducted
descriptive, inferential, and qualitative analyses on them.
Using this dataset, quantitative analyses enabled us to identify
that higher education institutions in both countries mostly
used Twitter to broadcast information rather than engage in
dialogue. Qualitative analysis of a sample of tweets enabled us
to discover that those broadcasted messages portrayed an
overwhelmingly positive picture of institutional life. In other
words, quantitative analyses enabled us to discover the fre-
quency and type of Twitter use, while qualitative analyses
allowed us to describe what such participation looked like.
Data mining enabled us to develop a multi-layered under-
standing of institutional social media use, highlighting a find-
ing that is core to IDT, namely that technologies are rarely
neutral in their use (e.g., Twitter prompts users to broadcast
messages) and that they can be appropriated to serve different
needs (e.g., Twitter seemed to be used for promotion rather
than educative purposes).
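To illustrate the kind of derived variables mentioned above (e.g., replies as a proportion of all tweets), here is a small pandas sketch over invented rows; the column names and values are hypothetical, not the study's dataset.

# A sketch of computing derived per-institution variables with pandas.
import pandas as pd

tweets = pd.DataFrame({
    "institution": ["U1", "U1", "U1", "U2", "U2"],
    "is_reply":    [False, True, False, False, False],
    "has_media":   [True, False, True, True, False],
})

per_institution = tweets.groupby("institution").agg(
    total_tweets=("is_reply", "size"),
    replies=("is_reply", "sum"),
    media_tweets=("has_media", "sum"),
)
# Replies as a proportion of all tweets, one of the variables described above.
per_institution["reply_proportion"] = per_institution["replies"] / per_institution["total_tweets"]
print(per_institution)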
Connecting Educational Issues with Larger Public
Issues
One of the pressing challenges facing our field is in pursuing
an understanding of sociocultural and public issues pertaining
to education, teaching, learning, scholarship, and technology
(Veletsianos and Kimmons 2012). Such issues may involve
access, equity, civility, socioeconomic divides, and
sociotechnical issues (e.g., the impact of social media algo-
rithms on opportunities for informal learning). While some of
the field’s research examines issues of broader concern, by
and large the focus is on pedagogical applications of technol-
ogy, with little attention being paid to the social, cultural, and
political aspects and implications of instructional design and
educational technology use. We need to pay close attention to
these issues because of their societal significance and impli-
cations for practice. What is the public concerned about with
regards to teaching and learning? In what ways can IDT re-
imagine teaching and learning on a massive scale? In what
ways are racism and sexism evident in our designs and edu-
cational offerings, and what does the field need to do in order
to alleviate these problems? We believe that these types of
questions (amongst many others) should be central to the field
for they aim toward developing a more just and fair society.
Public Internet data mining methods may provide
opportunities for researchers to examine societal issues of
broad concern, and enable the field to take a more active
role in societal conversations of interest. For instance, in the
same way that Rowe (2015) examined (in)civility in online
political discussions occurring on the Washington Post
Facebook account, IDT researchers might use data mining
methods to investigate (in)civility on public platforms hosting
educational interactions such as CrashCourse and Physics
Girl on YouTube and develop ways to address this problem.
To illustrate how IDT research can be connected to issues
of broader concern via data mining, consider the research we
reported in Authors (2018). In that study, we sought to connect
the educational uses of YouTube to gender issues. While typ-
ical IDT research might examine the pedagogical
implications, opportunities, promises, drawbacks, and
affordances of video-sharing technologies, we were interested in the sentiment that individuals faced when they were asked to go online to share their research or to post their course assignments. We were also interested in examining whether different
people faced different sentiment. By examining the sentiment
expressed in response to TEDx and TED-Ed talks posted on
YouTube we found that videos of male presenters showed
greater neutrality, while videos of female presenters saw sig-
nificantly greater polarity in replies. Such findings have sig-
nificant implications for our field, because they question the
oft-repeated optimistic narratives of contemporary technolo-
gies as necessarily positive for all people.
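As one way to operationalize such a polarity comparison, the sketch below scores invented replies with the open-source VADER analyzer; the groups, texts, and measures are illustrative and not the study's actual method or data.

# Illustrative polarity comparison with the open-source VADER analyzer
# (pip install vaderSentiment); replies and groups are invented.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
replies_by_group = {
    "group_a": ["Great talk!", "Very clear explanation.", "Nice overview."],
    "group_b": ["Brilliant!", "This is terrible.", "Amazing speaker!"],
}

for group, replies in replies_by_group.items():
    scores = [analyzer.polarity_scores(r)["compound"] for r in replies]  # -1..+1
    mean = sum(scores) / len(scores)
    spread = max(scores) - min(scores)  # wider spread = greater polarity
    print(f"{group}: mean={mean:+.2f} spread={spread:.2f}")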
Enabling Identification of Subpopulations for Further
Research
Due to the massive amounts of data available online, public
Internet data mining methods enable researchers to identify
particular subpopulations for further inquiry. Granular ap-
proaches to identifying participants are important, because they
enable researchers to focus on typical, unique, or otherwise
significant subpopulations of interest. For instance, consider-
ing Twitter as a platform of interest, data mining methods
enable researchers to identify and study IDT issues pertaining
to professors who tweet frequently (e.g., Kimmons and
Veletsianos 2016), educators who engage with a particular top-
ic or affinity space (e.g., Paskevicius et al. 2018; Veletsianos
2017b), community members who comment on educational
content (e.g., Veletsianos et al. in press), doctoral students
who have a large number of followers, teachers who reside in
a particular geographic area, faculty members who mention
their teaching evaluations, undergraduate engineering students
who tweet about positive/negative learning experiences, or
IDT faculty who attend both IDT and Learning Sciences con-
ferences. Further, the identification of specific subpopulations
enables comparisons between groups. For instance, one could
examine whether there are differences between science stu-
dents’ perceptions of positive learning experiences and human-
ities students’ perceptions of said experiences.
In one of our research studies, we sought to understand
how the content MOOC participants post on social media
varies with the role they espouse (Veletsianos 2017a). After
identifying a MOOC provider that included hashtags with
every course offering, we examined what messages were
posted to the course hashtags and how those varied by user
role. Following traditional content analysis methods and cat-
egorization according to roles, we identified variations in the
messages posted by different groups of users. For instance, we
found that institutions and the MOOC provider posted more
promotional messages than faculty and learners, while
MOOC-dedicated accounts and instructors posted more in-
structional messages. Such results highlight the need for
looking deeper into participant subpopulations to identify
and examine the differential practices that subpopulations
may employ, especially in the context of open-ended and flex-
ible learning environments.
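A role-by-category cross-tabulation is one simple way to surface such variations between subpopulations; the sketch below uses pandas with invented rows, not the study's coded messages.

# A sketch of a role-by-category cross-tabulation (rows invented).
import pandas as pd

messages = pd.DataFrame(
    [
        ("provider", "promotional"),
        ("institution", "promotional"),
        ("instructor", "instructional"),
        ("mooc_account", "instructional"),
        ("learner", "conversational"),
    ],
    columns=["role", "category"],
)
print(pd.crosstab(messages["role"], messages["category"]))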
Avoiding Many Biases
It is widely recognized and acknowledged that conscious and
unconscious biases have significant impacts in research out-
comes. To mention a few, such biases might include
Hawthorne effects (e.g., a teacher engages in behaviors per-
ceived to be desired by a researcher observing their instruc-
tion), self-reporting biases (e.g., a student provides biased self-
assessed measures of the time they spent studying for an ex-
am), and self-selection biases (e.g., faculty in support of open
access publishing in IDT self-select to participate in a study
examining open access publishing in the field). Such biases
adversely affect our understanding of issues related to IDT,
and, even though researchers are trained to recognize and
account for them, we are not always able to control for them.
Public Internet data mining approaches avoid many such
biases. For instance, researchers are able to unobtrusively ob-
serve behavior in situ, mitigating the potential for Hawthorne
effects, and self-reporting and self-selection biases. As an ex-
ample, our investigation of the types of messages posted by IDT
departments on social media sites (Romero-Hall et al. 2018)
relied on identifying and categorizing the actual messages al-
ready posted by IDT departments online. Thus, IDT department
behavior was not impacted by virtue of the study being con-
ducted, and self-reporting and self-selection biases were
avoided because all available actual messages were collected
and analyzed rather than depending on analyzing IDT depart-
ments’ perceptions about those messages. It is important to note,
however, that it is impossible to account for all potential biases.
For instance, in the aforementioned study, results are based on
the sample of IDT departments identified, and the methods used
to identify the specific departments to include in the study may
have led to some departments being included/excluded.
Some Challenges of Public Internet Data
Mining
Despite these benefits, public internet data mining as a re-
search method presents a variety of noteworthy challenges.
These challenges revolve around technical, methodological,
professional, and ethical issues that arise from using massive
amounts of public observation data from people and organi-
zations. We have organized these challenges into the four
following themes:
- multifaceted expertise and rigor requirements;
- focused questions and determining meaning;
- performative and contextual considerations of public data;
- and emergent ethical dilemmas.
Multifaceted Expertise and Rigor Requirements
The first challenge and largest barrier to entry for most edu-
cation researchers who might have an interest in public inter-
net data mining is that collecting, cleaning, organizing, and
analyzing these data at any scale relies upon various technical
skills that are interdisciplinary (at best) or not taught at all in
most education research programs. This is in part due to the
relative newness and ever-evolving nature of the internet (e.g.,
the emergence of APIs) but is also due to the siloed and spe-
cializing nature of the academy, which requires education re-
searchers to utilize increasingly specialized methods of inqui-
ry in order for their work to be considered valid. For instance,
researchers who have already devoted years to becoming ex-
pert at phenomenological inquiry or structural equation
modeling might understandably be slow to venture into a
new realm of inquiry that might require them to learn equally
specialized technical methods such as website scripting, API
querying, tokenization, and so forth. In the reverse situation,
however, web developers, data scientists, and internet market-
ing professionals might have a variety of skills necessary to do
public internet data mining, but they will equally lack the
content area expertise necessary to ask meaningful questions
of the data and will make various assumptions about educa-
tional phenomena, institutions, and stakeholders that are con-
troversial, unwarranted, or just wrong. Thus, especially in the
case of small-budget projects (such as theses and disserta-
tions), it becomes very difficult for a single researcher or even
a small group of researchers to have all of the expertise nec-
essary to do this kind of work in a way that will be viewed as rigorous by education, web development, and data science communities alike.
To illustrate some of the expertise required, we will briefly
explain some of the data collection steps that we undertook in a
recent study of U.S. university Twitter accounts (see Kimmons
et al. 2017a, 2017b for a complete explanation of all steps
undertaken). After identifying two pre-existing lists of univer-
sity websites, we used keyword identifiers and manual coding
to merge the lists into a relational database to match Carnegie
classifications with university website addresses. We then
wrote a series of scripts that systematically opened and parsed
the contents of all the university website homepages, searching
for embedded Twitter feeds, links, or keyword references to an
institutional Twitter account (e.g., "Follow us @OurUniversity"). The script stored all referenced accounts in the relational database with a unique university identifier. Another script we wrote queried the Twitter REST API, retrieving the Twitter user objects for all university accounts and
storing them in the relational database. Next, we read through
all account information (e.g., screen name, location, descrip-
tion) and manually coded accounts as either the primary insti-
tutional account or other (e.g., athletics department, registrar).
This resulted in a maximum of one primary institutional
Twitter account for each university (n = 2411), and we exclud-
ed other accounts from further analysis. We then wrote another
set of scripts to again query the Twitter REST API for all
available Twitter activity for each account and stored returned
tweet objects in the relational database (n = 5.7 million tweets).
Following these data collection steps, we developed scripts to
clean the data, developed scripts to identify multimedia in
tweets, used an open-source sentiment analyzer, operationalized
items of theoretical interest, identified representative samples,
and conducted descriptive, inferential, and content analyses.
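To give a flavor of just one of these steps, the sketch below scans a hypothetical university homepage for candidate Twitter handles with requests and a regular expression; the authors' actual scripts, database schema, and coding procedures are not reproduced here.

# A hedged sketch of one collection step: scanning a homepage for candidate
# institutional Twitter handles (the URL is hypothetical; matches still need
# the manual coding step described above).
import re
import requests

html = requests.get("https://www.example-university.edu", timeout=30).text

handles = set(re.findall(r"twitter\.com/([A-Za-z0-9_]{1,15})", html))
handles |= set(re.findall(r"@([A-Za-z0-9_]{1,15})", html))  # may also catch emails; review manually

print(sorted(handles))  # candidates for manual coding as primary vs. other accounts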
As this highly abridged narrative of some of the steps taken
suggests, this one study required many technical steps to com-
plete that required web scripting, quantitative analysis, quali-
tative coding, SQL querying, API querying, JSON parsing,
keyword searching, database management, image analysis,
sentiment analysis, and so forth. Furthermore, each study that is undertaken in this way may have many unique elements that prevent the development of a one-size-fits-all approach to data collection and analysis. These challenges may be alle-
viated most readily by building functional teams of re-
searchers (e.g., a web programmer, a quantitative methodolo-
gist, and a qualitative methodologist), but they also introduce
challenges of getting the work published, because just as it is
highly infeasible for one researcher to have all of the expertise
necessary to conduct a study like this, it is equally infeasible
that a single reviewer or editor can meaningfully evaluate a
completed study’s significance and rigor.
This last point is important for any researcher who is ex-
pected to publish their work in certain types of venues, be-
cause all journals have a niche audience and rely upon re-
viewers that have a unique set of beliefs, attitudes, and skills.
When submitting studies like the one described above to the
journals we are most interested in publishing in, we have
found that reviewers and editors typically come at the study
either from an education perspective (and thereby want to see
rich, meaningful results in terms of students’ and educators’
lives) or from a computer science or methodological perspec-
tive (and thereby want to see conformity to expected norms of
data collection and classification as well as methodological
insights). This can require the researcher to essentially serve two masters, wherein one wants more qualitative examples and less technical jargon while the other wants the opposite; the tension is exacerbated by word-limit requirements that essentially force the researcher to choose one over the other. We have
found that this issue must be navigated on a study-by-study
basis wherein the researchers must iteratively work with the
editor and reviewers to determine which elements of the study
should be emphasized and which elements can be effectively
summarized, placed in an online supplement, or ignored.
Focused Questions and Determining Meaning
Second, when working with a pre-existing, massive dataset
like the internet, as researchers it is sometimes difficult to
navigate the relationship between our research questions and
the data. The traditional social science research approach, for
instance, is for the research question to come first and for it to
guide the collection and analysis of our data. However, with a
pre-existing dataset this approach often feels inappropriate,
because the researchers are simultaneously constrained and
empowered by the parameters of the data, which may not
allow them to answer questions that they are interested in
but may also empower them to answer new questions that they
did not know were possible to answer. It has been our experi-
ence that often when embarking on these studies our initial
questions become reshaped or somewhat refined as we im-
merse ourselves in the data and contemplate their possibilities,
but at the same time this often leads to scope creep, wherein
we quickly try to tackle too much because we feel that the data
are so rich, and theoretical drift, wherein we move away from
our theoretically-grounded emphasis to focus on disconnect-
ed, emergent issues that we thought were novel and interest-
ing. Both scope creep and theoretical drift are problematic for
a variety of reasons not least of which is that they lead to
studies that overreach or that can delve into areas far outside
the researcher’s realm of expertise, and discerning audiences
are quick to point this out.
This situation has led us to enter these types of studies with
focused research questions at the outset and to be much more
careful in safeguarding against drastic changes late into the
research process. Though we feel that there should always
be some flexibility to refocus research questions in light of
emergent data issues, those embarking on studies like these
should never approach a massive dataset with a "we'll see
what the data can tell us" attitude, because the data are often
so rich that they can become more of a distraction than a tool
of inquiry.
A related issue is how we think about significance and
meaning and how our qualitative or quantitative traditions
might prepare us to approach massive pre-existing data in
inappropriate ways. For instance, in a traditional education
research study that employs a quasi-experimental design, a
researcher might study as …