Advanced Statistical Learning - Statistics
Solve the following set of problems from An Introduction to Statistical Learning with Applications in R, Second Edition, which is available in the file below:
Chapter 2: #1 (all parts), #6, and #8 (all parts)
Chapter 3: #5, #6, #9 (a), (b), (c), (e), (f), #10 (a)-(g), #11 (a), (b), (c), (f), #13 (all parts), #15 (all parts)
For the applied problems, you must provide R code.
Gareth James • Daniela Witten • Trevor Hastie • Robert Tibshirani
An Introduction to Statistical Learning
with Applications in R
Second Edition
First Printing: August 4, 2021
To our parents:
Alison and Michael James
Chiara Nappi and Edward Witten
Valerie and Patrick Hastie
Vera and Sami Tibshirani
and to our families:
Michael, Daniel, and Catherine
Tessa, Theo, Otto, and Ari
Samantha, Timothy, and Lynda
Charlie, Ryan, Julie, and Cheryl
Preface
Statistical learning refers to a set of tools for making sense of complex
datasets. In recent years, we have seen a staggering increase in the scale and
scope of data collection across virtually all areas of science and industry.
As a result, statistical learning has become a critical toolkit for anyone who
wishes to understand data — and as more and more of today’s jobs involve
data, this means that statistical learning is fast becoming a critical toolkit
for everyone.
One of the first books on statistical learning — The Elements of Statistical Learning (ESL, by Hastie, Tibshirani, and Friedman) — was published in 2001, with a second edition in 2009. ESL has become a popular text not only in statistics but also in related fields. One of the reasons for ESL’s popularity is its relatively accessible style. But ESL is best suited for individuals with advanced training in the mathematical sciences.
An Introduction to Statistical Learning (ISL) arose from the clear need for a broader and less technical treatment of the key topics in statistical learning. The intention behind ISL is to concentrate more on the applications of the methods and less on the mathematical details. Beginning with Chapter 2, each chapter in ISL contains a lab illustrating how to implement the statistical learning methods seen in that chapter using the popular statistical software package R. These labs provide the reader with valuable hands-on experience.
ISL is appropriate for advanced undergraduates or master’s students in
Statistics or related quantitative fields, or for individuals in other disciplines
who wish to use statistical learning tools to analyze their data. It can be
used as a textbook for a course spanning two semesters.
The first edition of ISL covered a number of important topics, including
sparse methods for classification and regression, decision trees, boosting,
support vector machines, and clustering. Since it was published in 2013, it
has become a mainstay of undergraduate and graduate classrooms across
the United States and worldwide, as well as a key reference book for data
scientists.
In this second edition of ISL, we have greatly expanded the set of topics
covered. In particular, the second edition includes new chapters on deep
learning (Chapter 10), survival analysis (Chapter 11), and multiple testing
(Chapter 13). We have also substantially expanded some chapters that were
part of the first edition: among other updates, we now include treatments
of naive Bayes and generalized linear models in Chapter 4, Bayesian additive regression trees in Chapter 8, and matrix completion in Chapter 12.
Furthermore, we have updated the R code throughout the labs to ensure
that the results that they produce agree with recent R releases.
We are grateful to these readers for providing valuable comments on the
first edition of this book: Pallavi Basu, Alexandra Chouldechova, Patrick
Danaher, Will Fithian, Luella Fu, Sam Gross, Max Grazier G’Sell, Courtney Paulson, Xinghao Qiao, Elisa Sheng, Noah Simon, Kean Ming Tan,
Xin Lu Tan. We thank these readers for helpful input on the second edi-
tion of this book: Alan Agresti, Iain Carmichael, Yiqun Chen, Erin Craig,
Daisy Ding, Lucy Gao, Ismael Lemhadri, Bryan Martin, Anna Neufeld, Geoff Tims, Carsten Voelkmann, Steve Yadlowsky, and James Zou. We also
thank Anna Neufeld for her assistance in reformatting the R code throughout this book. We are immensely grateful to Balasubramanian “Naras”
Narasimhan for his assistance on both editions of this textbook.
It has been an honor and a privilege for us to see the considerable impact
that the first edition of ISL has had on the way in which statistical learning
is practiced, both in and out of the academic setting. We hope that this new
edition will continue to give today’s and tomorrow’s applied statisticians
and data scientists the tools they need for success in a data-driven world.
It’s tough to make predictions, especially about the future.
-Yogi Berra
Contents

Preface

1 Introduction

2 Statistical Learning
  2.1 What Is Statistical Learning?
    2.1.1 Why Estimate f?
    2.1.2 How Do We Estimate f?
    2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability
    2.1.4 Supervised Versus Unsupervised Learning
    2.1.5 Regression Versus Classification Problems
  2.2 Assessing Model Accuracy
    2.2.1 Measuring the Quality of Fit
    2.2.2 The Bias-Variance Trade-Off
    2.2.3 The Classification Setting
  2.3 Lab: Introduction to R
    2.3.1 Basic Commands
    2.3.2 Graphics
    2.3.3 Indexing Data
    2.3.4 Loading Data
    2.3.5 Additional Graphical and Numerical Summaries
  2.4 Exercises

3 Linear Regression
  3.1 Simple Linear Regression
    3.1.1 Estimating the Coefficients
    3.1.2 Assessing the Accuracy of the Coefficient Estimates
    3.1.3 Assessing the Accuracy of the Model
  3.2 Multiple Linear Regression
    3.2.1 Estimating the Regression Coefficients
    3.2.2 Some Important Questions
  3.3 Other Considerations in the Regression Model
    3.3.1 Qualitative Predictors
    3.3.2 Extensions of the Linear Model
    3.3.3 Potential Problems
  3.4 The Marketing Plan
  3.5 Comparison of Linear Regression with K-Nearest Neighbors
  3.6 Lab: Linear Regression
    3.6.1 Libraries
    3.6.2 Simple Linear Regression
    3.6.3 Multiple Linear Regression
    3.6.4 Interaction Terms
    3.6.5 Non-linear Transformations of the Predictors
    3.6.6 Qualitative Predictors
    3.6.7 Writing Functions
  3.7 Exercises

4 Classification
  4.1 An Overview of Classification
  4.2 Why Not Linear Regression?
  4.3 Logistic Regression
    4.3.1 The Logistic Model
    4.3.2 Estimating the Regression Coefficients
    4.3.3 Making Predictions
    4.3.4 Multiple Logistic Regression
    4.3.5 Multinomial Logistic Regression
  4.4 Generative Models for Classification
    4.4.1 Linear Discriminant Analysis for p = 1
    4.4.2 Linear Discriminant Analysis for p > 1
    4.4.3 Quadratic Discriminant Analysis
    4.4.4 Naive Bayes
  4.5 A Comparison of Classification Methods
    4.5.1 An Analytical Comparison
    4.5.2 An Empirical Comparison
  4.6 Generalized Linear Models
    4.6.1 Linear Regression on the Bikeshare Data
    4.6.2 Poisson Regression on the Bikeshare Data
    4.6.3 Generalized Linear Models in Greater Generality
  4.7 Lab: Classification Methods
    4.7.1 The Stock Market Data
    4.7.2 Logistic Regression
    4.7.3 Linear Discriminant Analysis
    4.7.4 Quadratic Discriminant Analysis
    4.7.5 Naive Bayes
    4.7.6 K-Nearest Neighbors
    4.7.7 Poisson Regression
  4.8 Exercises

5 Resampling Methods
  5.1 Cross-Validation
    5.1.1 The Validation Set Approach
    5.1.2 Leave-One-Out Cross-Validation
    5.1.3 k-Fold Cross-Validation
    5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation
    5.1.5 Cross-Validation on Classification Problems
  5.2 The Bootstrap
  5.3 Lab: Cross-Validation and the Bootstrap
    5.3.1 The Validation Set Approach
    5.3.2 Leave-One-Out Cross-Validation
    5.3.3 k-Fold Cross-Validation
    5.3.4 The Bootstrap
  5.4 Exercises

6 Linear Model Selection and Regularization
  6.1 Subset Selection
    6.1.1 Best Subset Selection
    6.1.2 Stepwise Selection
    6.1.3 Choosing the Optimal Model
  6.2 Shrinkage Methods
    6.2.1 Ridge Regression
    6.2.2 The Lasso
    6.2.3 Selecting the Tuning Parameter
  6.3 Dimension Reduction Methods
    6.3.1 Principal Components Regression
    6.3.2 Partial Least Squares
  6.4 Considerations in High Dimensions
    6.4.1 High-Dimensional Data
    6.4.2 What Goes Wrong in High Dimensions?
    6.4.3 Regression in High Dimensions
    6.4.4 Interpreting Results in High Dimensions
  6.5 Lab: Linear Models and Regularization Methods
    6.5.1 Subset Selection Methods
    6.5.2 Ridge Regression and the Lasso
    6.5.3 PCR and PLS Regression
  6.6 Exercises

7 Moving Beyond Linearity
  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Basis Functions
  7.4 Regression Splines
    7.4.1 Piecewise Polynomials
    7.4.2 Constraints and Splines
    7.4.3 The Spline Basis Representation
    7.4.4 Choosing the Number and Locations of the Knots
    7.4.5 Comparison to Polynomial Regression
  7.5 Smoothing Splines
    7.5.1 An Overview of Smoothing Splines
    7.5.2 Choosing the Smoothing Parameter λ
  7.6 Local Regression
  7.7 Generalized Additive Models
    7.7.1 GAMs for Regression Problems
    7.7.2 GAMs for Classification Problems
  7.8 Lab: Non-linear Modeling
    7.8.1 Polynomial Regression and Step Functions
    7.8.2 Splines
    7.8.3 GAMs
  7.9 Exercises

8 Tree-Based Methods
  8.1 The Basics of Decision Trees
    8.1.1 Regression Trees
    8.1.2 Classification Trees
    8.1.3 Trees Versus Linear Models
    8.1.4 Advantages and Disadvantages of Trees
  8.2 Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees
    8.2.1 Bagging
    8.2.2 Random Forests
    8.2.3 Boosting
    8.2.4 Bayesian Additive Regression Trees
    8.2.5 Summary of Tree Ensemble Methods
  8.3 Lab: Decision Trees
    8.3.1 Fitting Classification Trees
    8.3.2 Fitting Regression Trees
    8.3.3 Bagging and Random Forests
    8.3.4 Boosting
    8.3.5 Bayesian Additive Regression Trees
  8.4 Exercises

9 Support Vector Machines
  9.1 Maximal Margin Classifier
    9.1.1 What Is a Hyperplane?
    9.1.2 Classification Using a Separating Hyperplane
    9.1.3 The Maximal Margin Classifier
    9.1.4 Construction of the Maximal Margin Classifier
    9.1.5 The Non-separable Case
  9.2 Support Vector Classifiers
    9.2.1 Overview of the Support Vector Classifier
    9.2.2 Details of the Support Vector Classifier
  9.3 Support Vector Machines
    9.3.1 Classification with Non-Linear Decision Boundaries
    9.3.2 The Support Vector Machine
    9.3.3 An Application to the Heart Disease Data
  9.4 SVMs with More than Two Classes
    9.4.1 One-Versus-One Classification
    9.4.2 One-Versus-All Classification
  9.5 Relationship to Logistic Regression
  9.6 Lab: Support Vector Machines
    9.6.1 Support Vector Classifier
    9.6.2 Support Vector Machine
    9.6.3 ROC Curves
    9.6.4 SVM with Multiple Classes
    9.6.5 Application to Gene Expression Data
  9.7 Exercises

10 Deep Learning
  10.1 Single Layer Neural Networks
  10.2 Multilayer Neural Networks
  10.3 Convolutional Neural Networks
    10.3.1 Convolution Layers
    10.3.2 Pooling Layers
    10.3.3 Architecture of a Convolutional Neural Network
    10.3.4 Data Augmentation
    10.3.5 Results Using a Pretrained Classifier
  10.4 Document Classification
  10.5 Recurrent Neural Networks
    10.5.1 Sequential Models for Document Classification
    10.5.2 Time Series Forecasting
    10.5.3 Summary of RNNs
  10.6 When to Use Deep Learning
  10.7 Fitting a Neural Network
    10.7.1 Backpropagation
    10.7.2 Regularization and Stochastic Gradient Descent
    10.7.3 Dropout Learning
    10.7.4 Network Tuning
  10.8 Interpolation and Double Descent
  10.9 Lab: Deep Learning
    10.9.1 A Single Layer Network on the Hitters Data
    10.9.2 A Multilayer Network on the MNIST Digit Data
    10.9.3 Convolutional Neural Networks
    10.9.4 Using Pretrained CNN Models
    10.9.5 IMDb Document Classification
    10.9.6 Recurrent Neural Networks
  10.10 Exercises

11 Survival Analysis and Censored Data
  11.1 Survival and Censoring Times
  11.2 A Closer Look at Censoring
  11.3 The Kaplan-Meier Survival Curve
  11.4 The Log-Rank Test
  11.5 Regression Models With a Survival Response
    11.5.1 The Hazard Function
    11.5.2 Proportional Hazards
    11.5.3 Example: Brain Cancer Data
    11.5.4 Example: Publication Data
  11.6 Shrinkage for the Cox Model
  11.7 Additional Topics
    11.7.1 Area Under the Curve for Survival Analysis
    11.7.2 Choice of Time Scale
    11.7.3 Time-Dependent Covariates
    11.7.4 Checking the Proportional Hazards Assumption
    11.7.5 Survival Trees
  11.8 Lab: Survival Analysis
    11.8.1 Brain Cancer Data
    11.8.2 Publication Data
    11.8.3 Call Center Data
  11.9 Exercises

12 Unsupervised Learning
  12.1 The Challenge of Unsupervised Learning
  12.2 Principal Components Analysis
    12.2.1 What Are Principal Components?
    12.2.2 Another Interpretation of Principal Components
    12.2.3 The Proportion of Variance Explained
    12.2.4 More on PCA
    12.2.5 Other Uses for Principal Components
  12.3 Missing Values and Matrix Completion
  12.4 Clustering Methods
    12.4.1 K-Means Clustering
    12.4.2 Hierarchical Clustering
    12.4.3 Practical Issues in Clustering
  12.5 Lab: Unsupervised Learning
    12.5.1 Principal Components Analysis
    12.5.2 Matrix Completion
    12.5.3 Clustering
    12.5.4 NCI60 Data Example
  12.6 Exercises

13 Multiple Testing
  13.1 A Quick Review of Hypothesis Testing
    13.1.1 Testing a Hypothesis
    13.1.2 Type I and Type II Errors
  13.2 The Challenge of Multiple Testing
  13.3 The Family-Wise Error Rate
    13.3.1 What is the Family-Wise Error Rate?
    13.3.2 Approaches to Control the Family-Wise Error Rate
    13.3.3 Trade-Off Between the FWER and Power
  13.4 The False Discovery Rate
    13.4.1 Intuition for the False Discovery Rate
    13.4.2 The Benjamini-Hochberg Procedure
  13.5 A Re-Sampling Approach to p-Values and False Discovery Rates
    13.5.1 A Re-Sampling Approach to the p-Value
    13.5.2 A Re-Sampling Approach to the False Discovery Rate
    13.5.3 When Are Re-Sampling Approaches Useful?
  13.6 Lab: Multiple Testing
    13.6.1 Review of Hypothesis Tests
    13.6.2 The Family-Wise Error Rate
    13.6.3 The False Discovery Rate
    13.6.4 A Re-Sampling Approach
  13.7 Exercises

Index
1 Introduction
An Overview of Statistical Learning
Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised. Broadly speaking, supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. With unsupervised statistical learning, there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data. To provide an illustration of some applications of statistical learning, we briefly discuss three real-world data sets that are considered in this book.
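The supervised/unsupervised distinction can be made concrete with a short R sketch. This example is not from the book: it uses simulated data and only base R, and every variable name is illustrative.

```r
# Illustrative sketch of supervised vs. unsupervised learning, base R only.
set.seed(1)

# Supervised: inputs x AND an observed output y; fit a model to predict y.
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)                # linear regression (Chapter 3)
coef(fit)                       # estimated intercept and slope

# Unsupervised: inputs only, no supervising output; look for structure.
z <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
km <- kmeans(z, centers = 2)    # K-means clustering (Chapter 12)
table(km$cluster)               # sizes of the two discovered clusters
```

Note that `lm()` requires the observed output `y` to fit against, while `kmeans()` receives only the inputs and must discover the two groups on its own.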
Wage Data
In this application (which we refer to as the Wage data set throughout this book), we examine a number of factors that relate to wages for a group of men from the Atlantic region of the United States. In particular, we wish to understand the association between an employee’s age and education, as well as the calendar year, on his wage. Consider, for example, the left-hand panel of Figure 1.1, which displays wage versus age for each of the individuals in the data set. There is evidence that wage increases with age but then decreases again after approximately age 60. The blue line, which provides an estimate of the average wage for a given age, makes this trend clearer.
© Springer Science+Business Media, LLC, part of Springer Nature 2021
G. James et al., An Introduction to Statistical Learning, Springer Texts in Statistics,
https://doi.org/10.1007/978-1-0716-1418-1_1
FIGURE 1.1. Wage data, which contains income survey information for men
from the central Atlantic region of the United States. Left: wage as a function of
age. On average, wage increases with age until about 60 years of age, at which
point it begins to decline. Center: wage as a function of year. There is a slow
but steady increase of approximately $10,000 in the average wage between 2003
and 2009. Right: Boxplots displaying wage as a function of education, with 1
indicating the lowest level (no high school diploma) and 5 the highest level (an
advanced graduate degree). On average, wage increases with the level of education.
Given an employee’s age, we can use this curve to predict his wage. However, it is also clear from Figure 1.1 that there is a significant amount of variability associated with this average value, and so age alone is unlikely to provide an accurate prediction of a particular man’s wage.
We also have information regarding each employee’s education level and the year in which the wage was earned. The center and right-hand panels of Figure 1.1, which display wage as a function of both year and education, indicate that both of these factors are associated with wage. Wages increase by approximately $10,000, in a roughly linear (or straight-line) fashion, between 2003 and 2009, though this rise is very slight relative to the variability in the data. Wages are also typically greater for individuals with higher education levels: men with the lowest education level (1) tend to have substantially lower wages than those with the highest education level (5). Clearly, the most accurate prediction of a given man’s wage will be obtained by combining his age, his education, and the year. In Chapter 3, we discuss linear regression, which can be used to predict wage from this data set. Ideally, we should predict wage in a way that accounts for the non-linear relationship between wage and age. In Chapter 7, we discuss a class of approaches for addressing this problem.
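As a hedged illustration of these ideas, the following base-R sketch fits a straight-line model and a quadratic-in-age model. It uses simulated data with the same qualitative patterns described above (in the book itself the real data ships with the ISLR2 package as `ISLR2::Wage`); the data-generating numbers here are invented.

```r
# Hedged sketch: predicting wage from age, year, and education on
# simulated data standing in for the Wage data set. Base R only.
set.seed(1)
n <- 500
age  <- sample(18:80, n, replace = TRUE)
year <- sample(2003:2009, n, replace = TRUE)
educ <- sample(1:5, n, replace = TRUE)

# Simulate a hump-shaped wage-age relationship (peak near age 55) plus
# year and education effects, roughly matching the Figure 1.1 patterns.
wage <- 50 + 2.2 * age - 0.02 * age^2 +
        1.5 * (year - 2003) + 8 * educ + rnorm(n, sd = 15)

# A straight-line fit in age misses the decline after about age 60; a
# quadratic term captures it (Chapter 7 covers richer non-linear fits).
fit_lin  <- lm(wage ~ age + year + educ)
fit_poly <- lm(wage ~ poly(age, 2) + year + educ)

# The quadratic model fits these data noticeably better.
c(linear = summary(fit_lin)$r.squared,
  quadratic = summary(fit_poly)$r.squared)
```

Since the linear model is nested inside the quadratic one, the quadratic fit can only improve the R-squared; the interesting question, taken up in Chapter 2, is whether that improvement reflects real structure or overfitting.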
FIGURE 1.2. Left: Boxplots of the previous day’s percentage change in the S&P
index for the days for which the market increased or decreased, obtained from the
Smarket data. Center and Right: Same as left panel, but the percentage changes
for 2 and 3 days previous are shown.
Stock Market Data
The Wage data involves predicting a continuous or quantitative output value.
This is often referred to as a regression problem. However, in certain cases
we may instead wish to predict a non-numerical value—that is, a categorical
or qualitative output. For example, in Chapter 4 we examine a stock market
data set that contains the daily movements in the Standard & Poor’s 500
(S&P) stock index over a 5-year period between 2001 and 2005. We refer
to this as the Smarket data. The goal is to predict whether the index will
increase or decrease on a given day, using the past 5 days’ percentage
changes in the index. Here the statistical learning problem does not involve
predicting a numerical value. Instead it involves predicting whether a given
day’s stock market performance will fall into the Up bucket or the Down
bucket. This is known as a classification problem. A model that could
accurately predict the direction in which the market will move would be
very useful!
The left-hand panel of Figure 1.2 displays two boxplots of the previous day’s percentage changes in the stock index: one for the 648 days for which the market increased on the subsequent day, and one for the 602 days for which the market decreased. The two plots look almost identical, suggesting that there is no simple strategy for using yesterday’s movement in the S&P to predict today’s returns. The remaining panels, which display boxplots for the percentage changes 2 and 3 days previous to today, similarly indicate little association between past and present returns. Of course, this lack of pattern is to be expected: in the presence of strong correlations between successive days’ returns, one could adopt a simple trading strategy
FIGURE 1.3. We fit a quadratic discriminant analysis model to the subset
of the Smarket data corresponding to the 2001–2004 time period, and predicted
the probability of a stock market decrease using the 2005 data. On average, the
predicted probability of decrease is higher for the days in which the market does
decrease. Based on these results, we are able to correctly predict the direction of
movement in the market 60% of the time.
to generate profits from the market. Nevertheless, in Chapter 4, we explore
these data using several different statistical learning methods. Interestingly,
there are hints of some weak trends in the data that suggest that, at least
for this 5-year period, it is possible to correctly predict the direction of
movement in the market approximately 60% of the time (Figure 1.3).
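A hedged sketch of this kind of analysis: the R code below fits quadratic discriminant analysis (the method behind Figure 1.3, covered in Chapter 4) to simulated lag data standing in for Smarket. The real data set ships with the ISLR2 package; the signal strengths used here are invented, and MASS is a recommended package bundled with standard R distributions.

```r
# Hedged sketch: QDA on simulated "market direction" data. The weak
# coefficients below mimic the faint trends described in the text.
library(MASS)
set.seed(1)
n <- 1250
lag1 <- rnorm(n)
lag2 <- rnorm(n)
p_up <- plogis(0.1 * lag1 - 0.1 * lag2)          # weak, invented signal
direction <- factor(ifelse(runif(n) < p_up, "Up", "Down"))
dat <- data.frame(direction, lag1, lag2)
train <- 1:1000                                   # earlier days for training

fit <- qda(direction ~ lag1 + lag2, data = dat, subset = train)
pred <- predict(fit, newdata = dat[-train, ])     # held-out predictions
mean(pred$class == direction[-train])             # held-out accuracy
```

With such a weak simulated signal the held-out accuracy sits close to chance, which is the honest baseline against which the book's roughly 60% figure for the real Smarket data should be judged.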
Gene Expression Data
The previous two applications illustrate data sets with both input and
output variables. However, another important class of problems involves
situations in which we only observe input variables, with no …
low (The Top Health Industry Trends to Watch in 2015) to assist you with this discussion.
https://youtu.be/fRym_jyuBc0
Next year the $2.8 trillion U.S. healthcare industry will finally begin to look and feel more like the rest of the business wo
evidence-based primary care curriculum. Throughout your nurse practitioner program
Vignette
Understanding Gender Fluidity
Providing Inclusive Quality Care
Affirming Clinical Encounters
Conclusion
References
Nurse Practitioner Knowledge
Mechanics
and word limit is unit as a guide only.
The assessment may be re-attempted on two further occasions (maximum three attempts in total). All assessments must be resubmitted 3 days within receiving your unsatisfactory grade. You must clearly indicate “Re-su
Trigonometry
Article writing
Other
5. June 29
After the components sending to the manufacturing house
1. In 1972 the Furman v. Georgia case resulted in a decision that would put action into motion. Furman was originally sentenced to death because of a murder he committed in Georgia but the court debated whether or not this was a violation of his 8th amend
One of the first conflicts that would need to be investigated would be whether the human service professional followed the responsibility to client ethical standard. While developing a relationship with client it is important to clarify that if danger or
Ethical behavior is a critical topic in the workplace because the impact of it can make or break a business
No matter which type of health care organization
With a direct sale
During the pandemic
Computers are being used to monitor the spread of outbreaks in different areas of the world and with this record
3. Furman v. Georgia is a U.S Supreme Court case that resolves around the Eighth Amendments ban on cruel and unsual punishment in death penalty cases. The Furman v. Georgia case was based on Furman being convicted of murder in Georgia. Furman was caught i
One major ethical conflict that may arise in my investigation is the Responsibility to Client in both Standard 3 and Standard 4 of the Ethical Standards for Human Service Professionals (2015). Making sure we do not disclose information without consent ev
4. Identify two examples of real world problems that you have observed in your personal
Summary & Evaluation: Reference & 188. Academic Search Ultimate
Ethics
We can mention at least one example of how the violation of ethical standards can be prevented. Many organizations promote ethical self-regulation by creating moral codes to help direct their business activities
*DDB is used for the first three years
For example
The inbound logistics for William Instrument refer to purchase components from various electronic firms. During the purchase process William need to consider the quality and price of the components. In this case
4. A U.S. Supreme Court case known as Furman v. Georgia (1972) is a landmark case that involved Eighth Amendment’s ban of unusual and cruel punishment in death penalty cases (Furman v. Georgia (1972)
With covid coming into place
In my opinion
with
Not necessarily all home buyers are the same! When you choose to work with we buy ugly houses Baltimore & nationwide USA
The ability to view ourselves from an unbiased perspective allows us to critically assess our personal strengths and weaknesses. This is an important step in the process of finding the right resources for our personal learning style. Ego and pride can be
· By Day 1 of this week
While you must form your answers to the questions below from our assigned reading material
CliftonLarsonAllen LLP (2013)
5 The family dynamic is awkward at first since the most outgoing and straight forward person in the family in Linda
Urien
The most important benefit of my statistical analysis would be the accuracy with which I interpret the data. The greatest obstacle
From a similar but larger point of view
4 In order to get the entire family to come back for another session I would suggest coming in on a day the restaurant is not open
When seeking to identify a patient’s health condition
After viewing the you tube videos on prayer
Your paper must be at least two pages in length (not counting the title and reference pages)
The word assimilate is negative to me. I believe everyone should learn about a country that they are going to live in. It doesnt mean that they have to believe that everything in America is better than where they came from. It means that they care enough
Data collection
Single Subject Chris is a social worker in a geriatric case management program located in a midsize Northeastern town. She has an MSW and is part of a team of case managers that likes to continuously improve on its practice. The team is currently using an
I would start off with Linda on repeating her options for the child and going over what she is feeling with each option. I would want to find out what she is afraid of. I would avoid asking her any “why” questions because I want her to be in the here an
Summarize the advantages and disadvantages of using an Internet site as means of collecting data for psychological research (Comp 2.1) 25.0\% Summarization of the advantages and disadvantages of using an Internet site as means of collecting data for psych
Identify the type of research used in a chosen study
Compose a 1
Optics
effect relationship becomes more difficult—as the researcher cannot enact total control of another person even in an experimental environment. Social workers serve clients in highly complex real-world environments. Clients often implement recommended inte
I think knowing more about you will allow you to be able to choose the right resources
Be 4 pages in length
soft MB-920 dumps review and documentation and high-quality listing pdf MB-920 braindumps also recommended and approved by Microsoft experts. The practical test
g
One thing you will need to do in college is learn how to find and use references. References support your ideas. College-level work must be supported by research. You are expected to do that for this paper. You will research
Elaborate on any potential confounds or ethical concerns while participating in the psychological study 20.0\% Elaboration on any potential confounds or ethical concerns while participating in the psychological study is missing. Elaboration on any potenti
3 The first thing I would do in the family’s first session is develop a genogram of the family to get an idea of all the individuals who play a major role in Linda’s life. After establishing where each member is in relation to the family
A Health in All Policies approach
Note: The requirements outlined below correspond to the grading criteria in the scoring guide. At a minimum
Chen
Read Connecting Communities and Complexity: A Case Study in Creating the Conditions for Transformational Change
Read Reflections on Cultural Humility
Read A Basic Guide to ABCD Community Organizing
Use the bolded black section and sub-section titles below to organize your paper. For each section
Losinski forwarded the article on a priority basis to Mary Scott
Losinksi wanted details on use of the ED at CGH. He asked the administrative resident