Advanced Statistical Learning - Statistics
I need to solve the following set of problems from An Introduction to Statistical Learning with Applications in R, Second Edition, which is available in the file below.

Chapter 2: #1 (all parts), #6, and #8 (all parts)
Chapter 3: #5, #6, #9 (a), (b), (c), (e), (f), #10 (a)-(g), #11 (a), (b), (c), (f), #13 (all parts), #15 (all parts)

For the applied problems, you need to provide R code.
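For the applied problems, a natural starting point is the book's companion R package. The sketch below is mine, not part of the assignment; it assumes the ISLR2 package (the second edition's companion package on CRAN) and its Auto data set, with an mpg response and a qualitative name column, which is the data set Chapter 3 #9 works with.

# Starter sketch for the applied exercises.
# Assumes the companion package is installed: install.packages("ISLR2")
library(ISLR2)

# Chapter 3 #9 style: regress mpg on every other predictor in the Auto
# data, dropping the qualitative 'name' column.
fit <- lm(mpg ~ . - name, data = Auto)
summary(fit)          # coefficient estimates, t-statistics, R^2

par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a grid
plot(fit)             # residuals, Q-Q, scale-location, leverage

The same pattern — load ISLR2, fit a model, then call summary() and plot() — carries over to the other applied exercises; only the data set and the model formula change.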
Gareth James • Daniela Witten • Trevor Hastie • Robert Tibshirani
An Introduction to Statistical Learning with Applications in R
Second Edition
First Printing: August 4, 2021

To our parents:
Alison and Michael James
Chiara Nappi and Edward Witten
Valerie and Patrick Hastie
Vera and Sami Tibshirani

and to our families:
Michael, Daniel, and Catherine
Tessa, Theo, Otto, and Ari
Samantha, Timothy, and Lynda
Charlie, Ryan, Julie, and Cheryl

Preface

Statistical learning refers to a set of tools for making sense of complex datasets. In recent years, we have seen a staggering increase in the scale and scope of data collection across virtually all areas of science and industry. As a result, statistical learning has become a critical toolkit for anyone who wishes to understand data — and as more and more of today's jobs involve data, this means that statistical learning is fast becoming a critical toolkit for everyone.

One of the first books on statistical learning — The Elements of Statistical Learning (ESL, by Hastie, Tibshirani, and Friedman) — was published in 2001, with a second edition in 2009. ESL has become a popular text not only in statistics but also in related fields. One of the reasons for ESL's popularity is its relatively accessible style. But ESL is best-suited for individuals with advanced training in the mathematical sciences.

An Introduction to Statistical Learning (ISL) arose from the clear need for a broader and less technical treatment of the key topics in statistical learning. The intention behind ISL is to concentrate more on the applications of the methods and less on the mathematical details. Beginning with Chapter 2, each chapter in ISL contains a lab illustrating how to implement the statistical learning methods seen in that chapter using the popular statistical software package R. These labs provide the reader with valuable hands-on experience.

ISL is appropriate for advanced undergraduates or master's students in Statistics or related quantitative fields, or for individuals in other disciplines who wish to use statistical learning tools to analyze their data. It can be used as a textbook for a course spanning two semesters.

The first edition of ISL covered a number of important topics, including sparse methods for classification and regression, decision trees, boosting, support vector machines, and clustering. Since it was published in 2013, it has become a mainstay of undergraduate and graduate classrooms across the United States and worldwide, as well as a key reference book for data scientists.

In this second edition of ISL, we have greatly expanded the set of topics covered. In particular, the second edition includes new chapters on deep learning (Chapter 10), survival analysis (Chapter 11), and multiple testing (Chapter 13). We have also substantially expanded some chapters that were part of the first edition: among other updates, we now include treatments of naive Bayes and generalized linear models in Chapter 4, Bayesian additive regression trees in Chapter 8, and matrix completion in Chapter 12. Furthermore, we have updated the R code throughout the labs to ensure that the results that they produce agree with recent R releases.

We are grateful to these readers for providing valuable comments on the first edition of this book: Pallavi Basu, Alexandra Chouldechova, Patrick Danaher, Will Fithian, Luella Fu, Sam Gross, Max Grazier G'Sell, Courtney Paulson, Xinghao Qiao, Elisa Sheng, Noah Simon, Kean Ming Tan, Xin Lu Tan. We thank these readers for helpful input on the second edition of this book: Alan Agresti, Iain Carmichael, Yiqun Chen, Erin Craig, Daisy Ding, Lucy Gao, Ismael Lemhadri, Bryan Martin, Anna Neufeld, Geoff Tims, Carsten Voelkmann, Steve Yadlowsky, and James Zou. We also thank Anna Neufeld for her assistance in reformatting the R code throughout this book. We are immensely grateful to Balasubramanian "Naras" Narasimhan for his assistance on both editions of this textbook.

It has been an honor and a privilege for us to see the considerable impact that the first edition of ISL has had on the way in which statistical learning is practiced, both in and out of the academic setting. We hope that this new edition will continue to give today's and tomorrow's applied statisticians and data scientists the tools they need for success in a data-driven world.

"It's tough to make predictions, especially about the future." - Yogi Berra

Contents

Preface
1 Introduction
2 Statistical Learning
  2.1 What Is Statistical Learning?
    2.1.1 Why Estimate f?
    2.1.2 How Do We Estimate f?
    2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability
    2.1.4 Supervised Versus Unsupervised Learning
    2.1.5 Regression Versus Classification Problems
  2.2 Assessing Model Accuracy
    2.2.1 Measuring the Quality of Fit
    2.2.2 The Bias-Variance Trade-Off
    2.2.3 The Classification Setting
  2.3 Lab: Introduction to R
    2.3.1 Basic Commands
    2.3.2 Graphics
    2.3.3 Indexing Data
    2.3.4 Loading Data
    2.3.5 Additional Graphical and Numerical Summaries
  2.4 Exercises
3 Linear Regression
  3.1 Simple Linear Regression
    3.1.1 Estimating the Coefficients
    3.1.2 Assessing the Accuracy of the Coefficient Estimates
    3.1.3 Assessing the Accuracy of the Model
  3.2 Multiple Linear Regression
    3.2.1 Estimating the Regression Coefficients
    3.2.2 Some Important Questions
  3.3 Other Considerations in the Regression Model
    3.3.1 Qualitative Predictors
    3.3.2 Extensions of the Linear Model
    3.3.3 Potential Problems
  3.4 The Marketing Plan
  3.5 Comparison of Linear Regression with K-Nearest Neighbors
  3.6 Lab: Linear Regression
    3.6.1 Libraries
    3.6.2 Simple Linear Regression
    3.6.3 Multiple Linear Regression
    3.6.4 Interaction Terms
    3.6.5 Non-linear Transformations of the Predictors
    3.6.6 Qualitative Predictors
    3.6.7 Writing Functions
  3.7 Exercises
4 Classification
  4.1 An Overview of Classification
  4.2 Why Not Linear Regression?
  4.3 Logistic Regression
    4.3.1 The Logistic Model
    4.3.2 Estimating the Regression Coefficients
    4.3.3 Making Predictions
    4.3.4 Multiple Logistic Regression
    4.3.5 Multinomial Logistic Regression
  4.4 Generative Models for Classification
    4.4.1 Linear Discriminant Analysis for p = 1
    4.4.2 Linear Discriminant Analysis for p > 1
    4.4.3 Quadratic Discriminant Analysis
    4.4.4 Naive Bayes
  4.5 A Comparison of Classification Methods
    4.5.1 An Analytical Comparison
    4.5.2 An Empirical Comparison
  4.6 Generalized Linear Models
    4.6.1 Linear Regression on the Bikeshare Data
    4.6.2 Poisson Regression on the Bikeshare Data
    4.6.3 Generalized Linear Models in Greater Generality
  4.7 Lab: Classification Methods
    4.7.1 The Stock Market Data
    4.7.2 Logistic Regression
    4.7.3 Linear Discriminant Analysis
    4.7.4 Quadratic Discriminant Analysis
    4.7.5 Naive Bayes
    4.7.6 K-Nearest Neighbors
    4.7.7 Poisson Regression
  4.8 Exercises
5 Resampling Methods
  5.1 Cross-Validation
    5.1.1 The Validation Set Approach
    5.1.2 Leave-One-Out Cross-Validation
    5.1.3 k-Fold Cross-Validation
    5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation
    5.1.5 Cross-Validation on Classification Problems
  5.2 The Bootstrap
  5.3 Lab: Cross-Validation and the Bootstrap
    5.3.1 The Validation Set Approach
    5.3.2 Leave-One-Out Cross-Validation
    5.3.3 k-Fold Cross-Validation
    5.3.4 The Bootstrap
  5.4 Exercises
6 Linear Model Selection and Regularization
  6.1 Subset Selection
    6.1.1 Best Subset Selection
    6.1.2 Stepwise Selection
    6.1.3 Choosing the Optimal Model
  6.2 Shrinkage Methods
    6.2.1 Ridge Regression
    6.2.2 The Lasso
    6.2.3 Selecting the Tuning Parameter
  6.3 Dimension Reduction Methods
    6.3.1 Principal Components Regression
    6.3.2 Partial Least Squares
  6.4 Considerations in High Dimensions
    6.4.1 High-Dimensional Data
    6.4.2 What Goes Wrong in High Dimensions?
    6.4.3 Regression in High Dimensions
    6.4.4 Interpreting Results in High Dimensions
  6.5 Lab: Linear Models and Regularization Methods
    6.5.1 Subset Selection Methods
    6.5.2 Ridge Regression and the Lasso
    6.5.3 PCR and PLS Regression
  6.6 Exercises
7 Moving Beyond Linearity
  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Basis Functions
  7.4 Regression Splines
    7.4.1 Piecewise Polynomials
    7.4.2 Constraints and Splines
    7.4.3 The Spline Basis Representation
    7.4.4 Choosing the Number and Locations of the Knots
    7.4.5 Comparison to Polynomial Regression
  7.5 Smoothing Splines
    7.5.1 An Overview of Smoothing Splines
    7.5.2 Choosing the Smoothing Parameter λ
  7.6 Local Regression
  7.7 Generalized Additive Models
    7.7.1 GAMs for Regression Problems
    7.7.2 GAMs for Classification Problems
  7.8 Lab: Non-linear Modeling
    7.8.1 Polynomial Regression and Step Functions
    7.8.2 Splines
    7.8.3 GAMs
  7.9 Exercises
8 Tree-Based Methods
  8.1 The Basics of Decision Trees
    8.1.1 Regression Trees
    8.1.2 Classification Trees
    8.1.3 Trees Versus Linear Models
    8.1.4 Advantages and Disadvantages of Trees
  8.2 Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees
    8.2.1 Bagging
    8.2.2 Random Forests
    8.2.3 Boosting
    8.2.4 Bayesian Additive Regression Trees
    8.2.5 Summary of Tree Ensemble Methods
  8.3 Lab: Decision Trees
    8.3.1 Fitting Classification Trees
    8.3.2 Fitting Regression Trees
    8.3.3 Bagging and Random Forests
    8.3.4 Boosting
    8.3.5 Bayesian Additive Regression Trees
  8.4 Exercises
9 Support Vector Machines
  9.1 Maximal Margin Classifier
    9.1.1 What Is a Hyperplane?
    9.1.2 Classification Using a Separating Hyperplane
    9.1.3 The Maximal Margin Classifier
    9.1.4 Construction of the Maximal Margin Classifier
    9.1.5 The Non-separable Case
  9.2 Support Vector Classifiers
    9.2.1 Overview of the Support Vector Classifier
    9.2.2 Details of the Support Vector Classifier
  9.3 Support Vector Machines
    9.3.1 Classification with Non-Linear Decision Boundaries
    9.3.2 The Support Vector Machine
    9.3.3 An Application to the Heart Disease Data
  9.4 SVMs with More than Two Classes
    9.4.1 One-Versus-One Classification
    9.4.2 One-Versus-All Classification
  9.5 Relationship to Logistic Regression
  9.6 Lab: Support Vector Machines
    9.6.1 Support Vector Classifier
    9.6.2 Support Vector Machine
    9.6.3 ROC Curves
    9.6.4 SVM with Multiple Classes
    9.6.5 Application to Gene Expression Data
  9.7 Exercises
10 Deep Learning
  10.1 Single Layer Neural Networks
  10.2 Multilayer Neural Networks
  10.3 Convolutional Neural Networks
    10.3.1 Convolution Layers
    10.3.2 Pooling Layers
    10.3.3 Architecture of a Convolutional Neural Network
    10.3.4 Data Augmentation
    10.3.5 Results Using a Pretrained Classifier
  10.4 Document Classification
  10.5 Recurrent Neural Networks
    10.5.1 Sequential Models for Document Classification
    10.5.2 Time Series Forecasting
    10.5.3 Summary of RNNs
  10.6 When to Use Deep Learning
  10.7 Fitting a Neural Network
    10.7.1 Backpropagation
    10.7.2 Regularization and Stochastic Gradient Descent
    10.7.3 Dropout Learning
    10.7.4 Network Tuning
  10.8 Interpolation and Double Descent
  10.9 Lab: Deep Learning
    10.9.1 A Single Layer Network on the Hitters Data
    10.9.2 A Multilayer Network on the MNIST Digit Data
    10.9.3 Convolutional Neural Networks
    10.9.4 Using Pretrained CNN Models
    10.9.5 IMDb Document Classification
    10.9.6 Recurrent Neural Networks
  10.10 Exercises
11 Survival Analysis and Censored Data
  11.1 Survival and Censoring Times
  11.2 A Closer Look at Censoring
  11.3 The Kaplan-Meier Survival Curve
  11.4 The Log-Rank Test
  11.5 Regression Models With a Survival Response
    11.5.1 The Hazard Function
    11.5.2 Proportional Hazards
    11.5.3 Example: Brain Cancer Data
    11.5.4 Example: Publication Data
  11.6 Shrinkage for the Cox Model
  11.7 Additional Topics
    11.7.1 Area Under the Curve for Survival Analysis
    11.7.2 Choice of Time Scale
    11.7.3 Time-Dependent Covariates
    11.7.4 Checking the Proportional Hazards Assumption
    11.7.5 Survival Trees
  11.8 Lab: Survival Analysis
    11.8.1 Brain Cancer Data
    11.8.2 Publication Data
    11.8.3 Call Center Data
  11.9 Exercises
12 Unsupervised Learning
  12.1 The Challenge of Unsupervised Learning
  12.2 Principal Components Analysis
    12.2.1 What Are Principal Components?
    12.2.2 Another Interpretation of Principal Components
    12.2.3 The Proportion of Variance Explained
    12.2.4 More on PCA
    12.2.5 Other Uses for Principal Components
  12.3 Missing Values and Matrix Completion
  12.4 Clustering Methods
    12.4.1 K-Means Clustering
    12.4.2 Hierarchical Clustering
    12.4.3 Practical Issues in Clustering
  12.5 Lab: Unsupervised Learning
    12.5.1 Principal Components Analysis
    12.5.2 Matrix Completion
    12.5.3 Clustering
    12.5.4 NCI60 Data Example
  12.6 Exercises
13 Multiple Testing
  13.1 A Quick Review of Hypothesis Testing
    13.1.1 Testing a Hypothesis
    13.1.2 Type I and Type II Errors
  13.2 The Challenge of Multiple Testing
  13.3 The Family-Wise Error Rate
    13.3.1 What is the Family-Wise Error Rate?
    13.3.2 Approaches to Control the Family-Wise Error Rate
    13.3.3 Trade-Off Between the FWER and Power
  13.4 The False Discovery Rate
    13.4.1 Intuition for the False Discovery Rate
    13.4.2 The Benjamini-Hochberg Procedure
  13.5 A Re-Sampling Approach to p-Values and False Discovery Rates
    13.5.1 A Re-Sampling Approach to the p-Value
    13.5.2 A Re-Sampling Approach to the False Discovery Rate
    13.5.3 When Are Re-Sampling Approaches Useful?
  13.6 Lab: Multiple Testing
    13.6.1 Review of Hypothesis Tests
    13.6.2 The Family-Wise Error Rate
    13.6.3 The False Discovery Rate
    13.6.4 A Re-Sampling Approach
  13.7 Exercises
Index

1 Introduction

An Overview of Statistical Learning

Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised. Broadly speaking, supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. With unsupervised statistical learning, there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data. To provide an illustration of some applications of statistical learning, we briefly discuss three real-world data sets that are considered in this book.

Wage Data

In this application (which we refer to as the Wage data set throughout this book), we examine a number of factors that relate to wages for a group of men from the Atlantic region of the United States. In particular, we wish to understand the association between an employee's age and education, as well as the calendar year, on his wage. Consider, for example, the left-hand panel of Figure 1.1, which displays wage versus age for each of the individuals in the data set. There is evidence that wage increases with age but then decreases again after approximately age 60. The blue line, which provides an estimate of the average wage for a given age, makes this trend clearer.

FIGURE 1.1. Wage data, which contains income survey information for men from the central Atlantic region of the United States. Left: wage as a function of age. On average, wage increases with age until about 60 years of age, at which point it begins to decline. Center: wage as a function of year. There is a slow but steady increase of approximately $10,000 in the average wage between 2003 and 2009. Right: Boxplots displaying wage as a function of education, with 1 indicating the lowest level (no high school diploma) and 5 the highest level (an advanced graduate degree). On average, wage increases with the level of education.

Given an employee's age, we can use this curve to predict his wage. However, it is also clear from Figure 1.1 that there is a significant amount of variability associated with this average value, and so age alone is unlikely to provide an accurate prediction of a particular man's wage.

We also have information regarding each employee's education level and the year in which the wage was earned. The center and right-hand panels of Figure 1.1, which display wage as a function of both year and education, indicate that both of these factors are associated with wage. Wages increase by approximately $10,000, in a roughly linear (or straight-line) fashion, between 2003 and 2009, though this rise is very slight relative to the variability in the data. Wages are also typically greater for individuals with higher education levels: men with the lowest education level (1) tend to have substantially lower wages than those with the highest education level (5). Clearly, the most accurate prediction of a given man's wage will be obtained by combining his age, his education, and the year. In Chapter 3, we discuss linear regression, which can be used to predict wage from this data set. Ideally, we should predict wage in a way that accounts for the non-linear relationship between wage and age. In Chapter 7, we discuss a class of approaches for addressing this problem.
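As a preview of how Chapters 3 and 7 treat this data set, here is a minimal sketch. It assumes the Wage data ships with the ISLR2 companion package with columns named age, year, education, and wage; the column names are assumptions about that package rather than something stated above.

library(ISLR2)
library(splines)   # ns() for natural splines

# Flavor of Figure 1.1, left panel: wage versus age, with a smooth
# estimate of the average wage at each age.
plot(Wage$age, Wage$wage, col = "grey", xlab = "Age", ylab = "Wage")
lines(smooth.spline(Wage$age, Wage$wage), col = "blue", lwd = 2)

# Combine age, education, and year, as the text suggests; a natural
# spline in age allows the non-linear trend that Chapter 7 develops.
fit <- lm(wage ~ ns(age, 4) + year + education, data = Wage)
summary(fit)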
FIGURE 1.2. Left: Boxplots of the previous day's percentage change in the S&P index for the days for which the market increased or decreased, obtained from the Smarket data. Center and Right: Same as left panel, but the percentage changes for 2 and 3 days previous are shown.

Stock Market Data

The Wage data involves predicting a continuous or quantitative output value. This is often referred to as a regression problem. However, in certain cases we may instead wish to predict a non-numerical value — that is, a categorical or qualitative output. For example, in Chapter 4 we examine a stock market data set that contains the daily movements in the Standard & Poor's 500 (S&P) stock index over a 5-year period between 2001 and 2005. We refer to this as the Smarket data. The goal is to predict whether the index will increase or decrease on a given day, using the past 5 days' percentage changes in the index. Here the statistical learning problem does not involve predicting a numerical value. Instead it involves predicting whether a given day's stock market performance will fall into the Up bucket or the Down bucket. This is known as a classification problem. A model that could accurately predict the direction in which the market will move would be very useful!

The left-hand panel of Figure 1.2 displays two boxplots of the previous day's percentage changes in the stock index: one for the 648 days for which the market increased on the subsequent day, and one for the 602 days for which the market decreased. The two plots look almost identical, suggesting that there is no simple strategy for using yesterday's movement in the S&P to predict today's returns. The remaining panels, which display boxplots for the percentage changes 2 and 3 days previous to today, similarly indicate little association between past and present returns. Of course, this lack of pattern is to be expected: in the presence of strong correlations between successive days' returns, one could adopt a simple trading strategy to generate profits from the market. Nevertheless, in Chapter 4, we explore these data using several different statistical learning methods. Interestingly, there are hints of some weak trends in the data that suggest that, at least for this 5-year period, it is possible to correctly predict the direction of movement in the market approximately 60% of the time (Figure 1.3).

FIGURE 1.3. We fit a quadratic discriminant analysis model to the subset of the Smarket data corresponding to the 2001–2004 time period, and predicted the probability of a stock market decrease using the 2005 data. On average, the predicted probability of decrease is higher for the days in which the market does decrease. Based on these results, we are able to correctly predict the direction of movement in the market 60% of the time.
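The model behind Figure 1.3 can be sketched in a few lines. This assumes the Smarket data from the ISLR2 package, with Year, Lag1, Lag2, and Direction columns, and uses two lagged returns as predictors, mirroring the book's Chapter 4 lab; the exact predictors behind the figure are an assumption.

library(ISLR2)
library(MASS)    # qda() lives in MASS

# Fit QDA on 2001-2004, then predict the direction of each 2005 day.
train <- Smarket$Year < 2005
qda_fit <- qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)

pred <- predict(qda_fit, Smarket[!train, ])
table(pred$class, Smarket$Direction[!train])    # confusion matrix
mean(pred$class == Smarket$Direction[!train])   # fraction predicted correctly

An accuracy near 0.60 from this fit would match the roughly 60% success rate quoted in the text.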
Gene Expression Data

The previous two applications illustrate data sets with both input and output variables. However, another important class of problems involves situations in which we only observe input variables, with no …