BIOSTAT 7 - Nursing
1. The test on statistical analysis techniques is worth 200 points and will be graded using the designated rubric. Grading criteria include quality of content, appropriate citations, use of Standard English grammar, and overall organization and readability.
2. Create your assignment in Microsoft Word. Save the document in .doc or .docx format.
3. There is no required length, but your responses should be specific enough to address all requirements.
4. The following test sections should be included in the Word document you submit, along with the corresponding SPSS output:
· Descriptive statistics
· Checking the reliability of a scale
· Correlation
· Partial correlation
· Non-parametric tests
· T-tests
See below for the assignment, and see the other attachments for the sample data sets. Read each question carefully to determine which sample data set to use.

Question 1: Descriptive Statistics
The first step in the analysis of any data file is to obtain descriptive statistics on each of your variables. These can be used to check for out-of-range cases, to explore the distribution of the scores, and to describe your sample in the Method section of a report. Use the instructions in Chapter 6 and Chapter 7 of the SPSS Survival Manual to answer the following questions concerning the variables included in the survey.sav data file.
(a) What is the mean age of the sample? What is the age range of the sample (minimum and maximum values)?
(b) What is the percentage of males and females in the sample? Did any of the sample fail to indicate their gender?
(c) What percentage of the sample were smokers?
(d) Inspect the distribution of scores on the Total Negative Affect scale. How normal is the distribution? Are there any cases that you would consider outliers?

Question 2: Checking the Reliability of a Scale
· If you use scales or standardized measures in your research (this is common in psychological research), it is important to assess the reliability (internal consistency) of the scores on the scale in your sample. The following exercise gives you some practice in this process.
· Follow the procedure in Chapter 9 of the SPSS Survival Manual to assess the reliability of the following scales. You will need to refer to the codebook in the appendix to identify the items that make up each of the scales (survey.sav, page 340).
· (a) Optimism scale (op1 to op6)
· (b) Perceived Control of Internal States scale (pc1 to pc18)
· (c) Self-esteem scale (sest1 to sest10)
· You will need to manipulate certain items prior to calculating the scale reliability: the negatively worded items must be reversed. Please refer to pages 87-88 of Pallant, 7th ed. These items need to be reversed because some items are worded positively while others are worded negatively (inversely).
· Read through steps 1-7. Once you recode items 2, 4, and 6, you will be able to move on to calculating reliability (pages 102-103). The items that require reversing are listed on page 340 under the Coding Instructions column.

Question 3: Correlation
Using the data file staffsurvey.sav, follow the instructions in Chapter 11 to explore the relationship between total satisfaction and age. Present the results in a brief report. Then use the instructions in Chapter 11 to generate a full correlation matrix to check the intercorrelations among the following variables:
age, city, service, employment status

Question 4: Partial Correlation
Follow the procedures detailed in Chapter 12 of the SPSS Survival Manual to calculate the partial correlation between total satisfaction and city while controlling for the effects of age. Compare the zero-order correlations with the partial correlation coefficients to see whether controlling for age had any effect.

Question 5: Non-Parametric Tests
Using the depress.sav data file, choose which statistical test(s) you should use to compare each of the variables. Explain your thought process in choosing the appropriate statistical test.

Question 6: T-Tests
Using the sleep.sav data file, run the appropriate t-test(s) to determine how mean scores on the total variable (totsas) compare across groups. In this section of the test you will use the sleep.sav data file to compare mean scores on totsas (the Sleepiness and Associated Sensations Scale) across the groups defined by each of the following variables. Please review Pallant Appendix Part E for the codebook (p. 339).
1) Sex
2) Do you smoke?
3) Trouble falling asleep
4) Trouble staying asleep
5) Wake up during night?

Sample output (Independent Samples Test, sleepy & assoc sensations scale):
                              Levene's Test              t-test for Equality of Means
                              F      Sig.     t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .511   .476     3.196   249       .002              4.214             1.319                   1.617          6.811
Equal variances not assumed                   3.237   238.202   .001              4.214             1.302                   1.649          6.779
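The chapters cited above demonstrate each procedure through the menus, but it can help to see the corresponding syntax in one place. The sketch below is illustrative only: the variable names are taken from the questions and codebooks referenced above, the reversed-item names (Rop2, Rop4, Rop6) are labels you would create yourself, and the response range for the optimism items and the value codes for the sex/gender variable are assumptions that must be checked against the codebook (pp. 339-340) before running anything.

* Question 2 (survey.sav): reverse the negatively worded Optimism items, then check reliability.
* Assumes a 1-5 response format; confirm against the Coding Instructions column on page 340.
RECODE op2 op4 op6 (1=5) (2=4) (3=3) (4=2) (5=1) INTO Rop2 Rop4 Rop6.
EXECUTE.
RELIABILITY
  /VARIABLES=op1 Rop2 op3 Rop4 op5 Rop6
  /SCALE('Optimism') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.

* Question 3 (staffsurvey.sav): correlation matrix for the variables listed above.
CORRELATIONS
  /VARIABLES=totsatis age city service employstatus
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

* Question 4 (staffsurvey.sav): partial correlation controlling for age.
* /STATISTICS=CORR also prints the zero-order correlations for comparison.
PARTIAL CORR
  /VARIABLES=totsatis city BY age
  /SIGNIFICANCE=TWOTAIL
  /STATISTICS=CORR
  /MISSING=ANALYSIS.

* Question 6 (sleep.sav): independent-samples t-test on totsas.
* The sleep.sav exercises later in this document call the sex variable "gender";
* check its name and value codes in the codebook (the codes 0 and 1 are assumed here).
T-TEST GROUPS=gender(0 1)
  /VARIABLES=totsas
  /CRITERIA=CI(.95).

Repeat the Question 2 commands for the other scales, and the Question 6 command for each of the remaining grouping variables, substituting the item and variable names from the codebook.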
6 Descriptive statistics

Once you are sure there are no errors in the data file (or at least no out-of-range values on any of the variables), you can begin the descriptive phase of your data analysis. Descriptive statistics are used to:
· describe the characteristics of your sample in the Method section of your report
· check your variables for any violation of the assumptions underlying the statistical techniques that you will use to address your research questions
· address specific research questions.
The two procedures outlined in Chapter 5 for checking the data will also give you information for describing your sample in the Method section of your report. In studies involving human participants, it is useful to collect information on the number of people or cases in the sample, the number and percentage of males and females in the sample, the range and mean of ages, education level, and any other relevant background information. Prior to doing many of the statistical analyses (e.g. t-test, ANOVA, correlation), it is important to check that you are not violating any of the assumptions made by the individual tests. (These are covered in detail in Part Four and Part Five of this book.) Testing of assumptions usually involves obtaining descriptive statistics on your variables. These descriptive statistics include the mean, standard deviation, range of scores, skewness and kurtosis. In IBM SPSS Statistics there are several ways to obtain descriptive statistics. If all you want is a quick summary of the characteristics of the variables in your data file, you can use Codebook. To follow along with the examples in this chapter, open the survey.sav file.
Procedure for obtaining Codebook
1. Click on Analyze, go to Reports and choose Codebook.
2. Select the variables you want (e.g. sex, age) and move them into the Codebook Variables box.
3. Click on the Output tab and untick (by clicking on the box with a tick) all the Options except Label, Value Labels and Missing Values.
4. Click on the Statistics tab and make sure that all the options in both sections are ticked.
5. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
CODEBOOK sex [n] age [s]
  /VARINFO LABEL VALUELABELS MISSING
  /OPTIONS VARORDER=VARLIST SORT=ASCENDING MAXCATS=200
  /STATISTICS COUNT PERCENT MEAN STDDEV QUARTILES.
The output is shown below.

sex
                        Value       Count   Percent
Standard Attributes     Label: sex
Valid Values            1 MALES     185     42.1%
                        2 FEMALES   254     57.9%

age
Standard Attributes                Label                 <none>
N                                  Valid                 439
                                   Missing               0
Central Tendency and Dispersion    Mean                  37.44
                                   Standard Deviation    13.202
                                   Percentile 25         26.00
                                   Percentile 50         36.00
                                   Percentile 75         47.00

The output from the procedure shown above gives you a quick summary of the cases in your data file. If you need more detailed information this can be obtained using the Frequencies, Descriptives or Explore procedures. These are all procedures listed under the Analyze, Descriptive Statistics drop-down menu. There are, however, different procedures depending on whether you have a categorical or continuous variable. Some of the statistics (e.g. mean, standard deviation) are not appropriate if you have a categorical variable. The different approaches to be used with categorical and continuous variables are presented in the following two sections. If you would like to follow along with the examples in this chapter, open the survey.sav file.

CATEGORICAL VARIABLES
To obtain descriptive statistics for categorical variables, you should use Frequencies. This will tell you how many people gave each response (e.g. how many males, how many females). It doesn’t make any sense asking for means, standard deviations and so on for categorical variables, such as sex or marital status.
Procedure for obtaining descriptive statistics for categorical variables
1. From the menu click on Analyze, then click on Descriptive Statistics, then Frequencies.
2. Choose and highlight the categorical variables you are interested in (e.g. sex). Move these into the Variables box.
3. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
FREQUENCIES
  VARIABLES=sex
  /ORDER=ANALYSIS.
The output is shown below.

sex
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1 MALES     185         42.1      42.1            42.1
        2 FEMALES   254         57.9      57.9            100.0
        Total       439         100.0     100.0

Interpretation of output from Frequencies
From the output shown above, we know that there are 185 males (42.1%) and 254 females (57.9%) in the sample, giving a total of 439 respondents. It is important to take note of the number of respondents you have in different subgroups in your sample. If you have very unequal group sizes, particularly if the group sizes are small, it may be inappropriate to run some of the parametric analyses (e.g. ANOVA).

CONTINUOUS VARIABLES
For continuous variables (e.g. age) it is easier to use Descriptives, which will provide you with the basic summary statistics such as mean, median and standard deviation. In some disciplines (e.g. medicine) you may be asked to provide a confidence interval around the mean. If you need this you should use Explore (this is explained later in this chapter). You can collect the descriptive information on all your continuous variables in one go using Descriptives; it is not necessary to do it variable by variable. Just transfer all the variables you are interested in into the box labelled Variables. If you have a lot of variables, however, your output will be extremely long.
Sometimes, it is easier to do them in chunks and tick off each group of variables as you do them.
Procedure for obtaining descriptive statistics for continuous variables
1. From the menu click on Analyze, then select Descriptive Statistics, then Descriptives.
2. Click on all the continuous variables that you wish to obtain descriptive statistics for. Click on the arrow button to move them into the Variables box (e.g. age, Total perceived stress: tpstress).
3. Click on the Options button. Make sure mean, standard deviation, minimum, maximum are ticked and then click on skewness, kurtosis.
4. Click on Continue, and then OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
DESCRIPTIVES
  VARIABLES=age tpstress
  /STATISTICS=MEAN STDDEV MIN MAX KURTOSIS SKEWNESS.
The output generated from this procedure is shown below.

Descriptive Statistics
                                   N     Minimum   Maximum   Mean    Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
age                                439   18        82        37.44   13.202           .606 (.117)             -.203 (.233)
tpstress Total perceived stress    433   12        46        26.73   5.848            .245 (.117)             .182 (.234)
Valid N (listwise)                 433

Interpretation of output from Descriptives
In the output presented above, the information we requested for each of the variables is summarised. For the variable age we have information from 439 respondents, ranging in age from 18 to 82 years, with a mean of 37.44 and standard deviation of 13.202. This information may be needed for the Method section of a report to describe the characteristics of the sample. When reported in a thesis or journal article these values are usually rounded to two decimal places. Descriptives also provides some information concerning the distribution of scores on continuous variables (skewness and kurtosis). This information may be needed if these variables are to be used in parametric statistical techniques (e.g. t-tests, analysis of variance). The Skewness value provides an indication of the symmetry of the distribution. Kurtosis, on the other hand, provides information about the ‘peakedness’ of the distribution. If the distribution is perfectly normal, you would obtain a skewness and kurtosis value of 0 (rather an uncommon occurrence in the social sciences). Positive skewness values suggest that scores are clustered to the left at the low values. Negative skewness values indicate a clustering of scores at the high end (right-hand side of a graph). Positive kurtosis values indicate that the distribution is rather peaked (clustered in the centre), with long, thin tails. Kurtosis values below 0 indicate a distribution that is relatively flat (too many cases in the extremes). With reasonably large samples, skewness will not ‘make a substantive difference in the analysis’ (Tabachnick & Fidell 2013, p. 80). Kurtosis can result in an underestimate of the variance, but this risk is also reduced with a large sample (200+ cases; see Tabachnick & Fidell 2013, p. 80). While there are tests that you can use to evaluate skewness and kurtosis values, these are too sensitive with large samples. Tabachnick and Fidell (2013, p. 81) recommend inspecting the shape of the distribution (e.g. using a histogram). The procedure for further assessing the normality of the distribution of scores is provided later in this chapter. When you have skewed data you should report non-parametric descriptive statistics which do not assume a normal distribution (discussed in more detail later in this chapter).
The mean (a parametric statistic) can be distorted when you have very skewed data, and it is generally recommended that you present the median instead (a non-parametric statistic). The median is the value that cuts the distribution of scores in half—50 per cent fall above and below this point. Whenever you present a median value you should also provide an indication of the spread, or dispersion, of your scores. The non-parametric statistic appropriate here is the interquartile range (IQR), which represents the 25th percentile and the 75th percentile values. The easiest way to get these values is to use the Codebook procedure outlined earlier in this chapter. These results show as ‘Percentile 25’, ‘Percentile 50’ (this is actually the median) and ‘Percentile 75’. You can also obtain the same values from the Frequencies procedure by requesting Quartiles under the Statistics button. Using the example of age presented earlier in this chapter, you would present the information shown in the output in a thesis or article as Md = 36 (IQR: 26, 47). MISSING DATA When you are doing research, particularly with human beings, it is rare that you will obtain complete data from every case. It is important that you inspect your data file for missing data. Run Descriptives and find out what percentage of values is missing for each of your variables. If you find a variable with a lot of unexpected missing data, you need to ask yourself why. You should also consider whether your missing values occur randomly, or whether there is some systematic pattern (e.g. lots of women over 30 years of age failing to answer the question about their age!). You also need to consider how you will deal with missing values when you come to do your statistical analyses. The Options button in many of the IBM SPSS Statistics statistical procedures offers you choices for how you want to deal with missing data. It is important that you choose carefully, as it can have dramatic effects on your results. This is particularly important if you are including a list of variables and repeating the same analysis for all variables (e.g. correlations among a group of variables, t-tests for a series of dependent variables). The Exclude cases listwise option will include cases in the analysis only if they have full data on all of the variables listed in your Variables box for that case. A case will be totally excluded from all the analyses if it is missing even one piece of information. This can severely, and unnecessarily, limit your sample size. The Exclude cases pairwise option excludes the case (person) only if they are missing the data required for the specific analysis. They will still be included in any of the analyses for which they have the necessary information. The Replace with mean option, which is available in some IBM SPSS Statistics statistical procedures (e.g. multiple regression), calculates the mean value for the variable and gives every missing case this value. This option should never be used, as it can severely distort the results of your analysis, particularly if you have a lot of missing values. Always press the Options button for any statistical procedure you conduct, and check which of these options is ticked (the default option varies across procedures). I would suggest that you use pairwise exclusion of missing data, unless you have a pressing reason to do otherwise. The only situation where you might need to use listwise exclusion is when you want to refer only to a subset of cases that provided a full set of results. 
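To make the points above concrete, here is a minimal syntax sketch (the variable names come from the survey.sav examples used in this chapter) showing how the median and quartiles can be requested without printing a full frequency table, and how pairwise exclusion of missing data is specified for a correlation. Treat it as an illustration rather than the only way to obtain these values.

* Median and quartiles (for reporting Md and IQR) without the full frequency table.
FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE
  /STATISTICS=MEDIAN QUARTILES.

* Pairwise exclusion keeps each case in every analysis for which it has complete data.
CORRELATIONS
  /VARIABLES=age tpstress
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.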
For more experienced researchers, there are more advanced options available in IBM SPSS Statistics for estimating missing values (e.g. imputation). These are included in the Missing Value Analysis procedure. This can also be used to detect patterns within missing data. I recommend you read Chapter 4 in Tabachnick and Fidell (2013) for more detailed coverage of missing data.

ASSESSING NORMALITY
Many of the statistical techniques presented in Part Four and Part Five of this book assume that the distribution of scores on the dependent variable is normal. Normal is used to describe a symmetrical, bell-shaped curve, which has the greatest frequency of scores in the middle with smaller frequencies towards the extremes. Normality can be assessed to some extent by obtaining skewness and kurtosis values (as described earlier in this chapter). However, other techniques are also available in IBM SPSS Statistics using the Explore option of the Descriptive Statistics menu. This procedure is detailed below. In this example, I assess the normality of the distribution of scores for the Total perceived stress variable. You also have the option of doing this separately for different groups in your sample by specifying an additional categorical variable (e.g. sex) in the Factor List option that is available in the Explore dialogue box.
Procedure for assessing normality using Explore
1. From the menu at the top of the screen click on Analyze, then select Descriptive Statistics, then Explore.
2. Click on all the variables you are interested in (e.g. Total perceived stress: tpstress). Click on the arrow button to move them into the Dependent List box.
3. In the Label Cases by box, put your ID variable.
4. In the Display section, make sure that Both is selected.
5. Click on the Statistics button and click on Descriptives and Outliers. Click on Continue.
6. Click on the Plots button. Under Descriptive, click on Histogram to select it. Click on Stem-and-leaf to unselect it. Click on Normality plots with tests. Click on Continue.
7. Click on the Options button. In the Missing Values section, click on Exclude cases pairwise. Click on Continue and then OK (or on Paste to save to Syntax Editor).
The syntax generated is:
EXAMINE VARIABLES=tpstress
  /ID=id
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING PAIRWISE
  /NOTOTAL.
Selected output generated from this procedure is shown below.

Descriptives: tpstress Total perceived stress
                                      Statistic   Std. Error
Mean                                  26.73       .281
95% Confidence Interval for Mean
  Lower Bound                         26.18
  Upper Bound                         27.28
5% Trimmed Mean                       26.64
Median                                26.00
Variance                              34.194
Std. Deviation                        5.848
Minimum                               12
Maximum                               46
Range                                 34
Interquartile Range                   8
Skewness                              .245        .117
Kurtosis                              .182        .234

Extreme Values: tpstress Total perceived stress
            Case Number   id    Value
Highest 1   7             24    46
        2   262           157   44
        3   216           61    43
        4   190           6     42
        5   257           144   42(a)
Lowest  1   366           404   12
        2   189           5     12
        3   247           127   13
        4   244           119   13
        5   98            301   13
a. Only a partial list of cases with the value 42 are shown in the table of upper extremes.

Tests of Normality
                                   Kolmogorov-Smirnov(a)            Shapiro-Wilk
                                   Statistic   df     Sig.          Statistic   df     Sig.
tpstress Total perceived stress    .069        433    .000          .992        433    .021
a. Lilliefors Significance Correction

Interpretation of output from Explore
Quite a lot of information is generated as part of this output. I take you through it step by step below. In the table labelled Descriptives, you are provided with descriptive statistics and other information concerning your variables.
If you specified a grouping variable in the Factor List, this information will be provided separately for each group, rather than for the sample as a whole. Some of this information you may recognise (mean, median, standard deviation, minimum, maximum etc.). This output also shows the 95 per cent confidence interval surrounding the mean. We can be 95 per cent confident that the true mean value in the population falls within this range.      One statistic you may not know is the 5\% Trimmed Mean. To obtain this value, IBM SPSS Statistics removes the top and bottom 5 per cent of your cases and calculates a new mean value. If you compare the original mean (26.73) and this new trimmed mean (26.64), you can see whether your extreme scores are having a strong influence on the mean. If these two mean values are very different, you may need to investigate these data points further. The ID values of the most extreme cases are shown in the Extreme Values table. Skewness and kurtosis values are also provided as part of this output, giving information about the distribution of scores for the two groups (see discussion of the meaning of these values earlier in this chapter). In the table labelled Tests of Normality, you are given the results of the Kolmogorov-Smirnov statistic. This assesses the normality of the distribution of scores. A non-significant result (Sig. value of more than .05) indicates normality. In this case, the Sig. value is .000, suggesting violation of the assumption of normality. This is quite common in larger samples. Tabachnick and Fidell (2013) recommend using the histograms instead to judge normality. The actual shape of the distribution for each group can be seen in the Histograms. In this example, scores appear to be reasonably normally distributed. This is also supported by an inspection of the normal probability plots (labelled Normal Q-Q Plot). In this plot, the observed value for each score is plotted against the expected value from the normal distribution. A reasonably straight line suggests a normal distribution. The Detrended Normal Q-Q Plots are obtained by plotting the deviation of the scores from the straight line. There should be no real clustering of points, with most collecting around the zero line. The final plot that is provided in the output is a boxplot of the distribution of scores for the two groups. The rectangle represents 50 per cent of the cases, with the whiskers (the lines protruding from the box) going out to the smallest and largest values. Sometimes, you will see additional circles outside this range—these are classified by IBM SPSS Statistics as outliers. The line inside the rectangle is the median value. Boxplots are discussed further in the next section, on detecting outliers. In the example given above, the distribution of scores was reasonably normal. Often, this is not the case. Many scales and measures used in the social sciences have scores that are skewed, either positively or negatively. This does not necessarily indicate a problem with the scale but rather reflects the underlying nature of the construct being measured. Life satisfaction measures, for example, are often negatively skewed, with most people being reasonably happy with their life. Clinical measures of anxiety or depression are often positively skewed in the general population, with most people recording relatively few symptoms of these disorders. Some authors in this area recommend that with skewed data the scores be transformed statistically. 
This issue is discussed further in Chapter 8. CHECKING FOR OUTLIERS Many of the statistical techniques covered in this book are sensitive to outliers (cases with values well above or well below the majority of other cases). The techniques described in the previous section can also be used to check for outliers. Inspect the Histogram. Check the tails of the distribution. Are there data points sitting on their own, out on the extremes? If so, these are potential outliers. If the scores drop away in a reasonably even slope, there is probably not too much to worry about. Inspect the Boxplot. Any scores that IBM SPSS Statistics considers are outliers appear as little circles with a number attached (this is the ID number of the case). IBM SPSS Statistics defines points as outliers if they extend more than 1.5 box-lengths from the edge of the box. Extreme points (indicated with an asterisk) are those that extend more than 3 box-lengths from the edge of the box. In the example above there are no extreme points, but there are two outliers: ID numbers 24 and 157. If you find points like this, you need to decide what to do with them. It is important to check that an outlier’s score is genuine and not an error. Check the score and see whether it is within the range of possible scores for that variable. Check back with the questionnaire or data record to see if there was a mistake in entering the data. If it is an error, correct it, and repeat the boxplot. If it turns out to be a genuine score, you then need to decide what you will do about it. Some statistics writers suggest removing all extreme outliers from the data file. Others suggest changing the value to a less extreme value, thus including the case in the analysis but not allowing the score to distort the statistics (for more advice on this, see Chapter 4 in Tabachnick & Fidell 2013). The information in the Descriptives table can give you an indication of how much of a problem these outlying cases are likely to be. The value you are interested in is the 5\% Trimmed Mean. If the trimmed mean and mean values are very different, you may need to investigate these data points further. In the example above, the two mean values (26.73 and 26.64) are very similar. Given this, and the fact that the values are not too different from the remaining distribution, I would retain these cases in the data file. If you wish to change or remove values in your file, go to the Data Editor window, sort the data file in descending order to find the cases with the highest values or in ascending order if you are concerned about cases with very low values. The cases you need to investigate in more detail are then at the top of the data file. Move across to the column representing that variable and modify or delete the value of concern. Always record changes to your data file in a log book. ADDITIONAL EXERCISES Business Data file: staffsurvey.sav. See Appendix for details of the data file.   1. Follow the procedures covered in this chapter to generate appropriate descriptive statistics to answer the following questions. (a) What percentage of the staff in this organisation are permanent employees? (Use the variable employstatus.) (b) What is the average length of service for staff in the organisation? (Use the variable service.) (c) What percentage of respondents would recommend the organisation to others as a good place to work? (Use the variable recommend.) 2. 
Assess the distribution of scores on the Total Staff Satisfaction Scale (totsatis) for employees who are permanent versus casual (employstatus). (a) Are there any outliers on this scale that you would be concerned about? (b) Are scores normally distributed for each group? Health Data file: sleep.sav. See Appendix for details of the data file.   1. Follow the procedures covered in this chapter to generate appropriate descriptive statistics to answer the following questions. (a) What percentage of respondents are female (gender)? (b) What is the average age of the sample? (c) What percentage of the sample indicated that they had a problem with their sleep (probsleeprec)? (d) What is the median number of hours sleep per weeknight (hourweeknight)? 2. Assess the distribution of scores on the Sleepiness and Associated Sensations Scale (totSAS) for people who feel that they do/don’t have a sleep problem (probsleeprec). (a) Are there any outliers on this scale that you would be concerned about? (b) Are scores normally distributed for each group? 7 Using graphs to describe and explore the data While the numerical values obtained in Chapter 6 provide useful information concerning your sample and your variables, some aspects are better explored visually. IBM SPSS Statistics provides a variety of graphs (also referred to as charts). In this chapter, I cover the basic procedures to obtain histograms, bar graphs, line graphs, scatterplots and boxplots. In IBM SPSS Statistics there are different ways of generating graphs, using the Graph menu option. These include Chart Builder, Graphboard Template Chooser and Legacy Dialogs. In this chapter I demonstrate the graphs using Chart Builder. Spend some time playing with each of the different graphs and exploring their possibilities. In this chapter only a brief overview is given to get you started. To illustrate the various graphs I use the survey.sav data file, which is included on the website accompanying this book (see p. ix and the Appendix for details). If you wish to follow along with the procedures described in this chapter, you will need to start IBM SPSS Statistics and open the file labelled survey.sav. At the end of this chapter, instructions are also given on how to edit a graph to better suit your needs. This may be useful if you intend to use the graph in your research paper or thesis. The procedure for importing graphs directly into Microsoft Word is also detailed. For additional hints and tips on presenting graphs I suggest you see Nicol and Pexman (2010a). Before you begin any of the graphs procedures it is important that you have defined the measurement properties of each of your variables in the Data Editor window (see Chapter 4, in the Defining the Variables section). Each variable needs to be correctly identified as Nominal (categories involving no order), Ordinal (categories which are ordered), and Scale (continuous with lots of values). HISTOGRAMS Histograms are used to display the distribution of a single continuous variable (e.g. age, perceived stress scores). Procedure for creating a histogram 1. From the menu click on Graphs, then select Chart Builder. Click OK. 2. To choose the type of graph that you want, click on the Gallery tab, and choose Histogram. 3. Click on the first image shown (Simple Histogram) and drag it up to the Chart Preview area, holding your left mouse button down. 4. Choose your continuous variable from the list of Variables (e.g. 
tpstress) and drag it across to the area on the Chart preview screen labelled X-Axis holding your left mouse button down. This will only work if you have identified your variable as scale in the Data Editor window (the icon next to the variable should be a ruler). 5. If you would like to generate separate graphs for different groups (e.g. males/females) you can click on the Groups/Point ID tab and choose the Column Panels variable option. This will produce separate graphs next to each other; if you would prefer them to be on top of one another choose the Rows panel variable. 6. Choose your categorical grouping variable (e.g. sex) and drag it across to the section labelled Panel in the Chart Preview area. 7. Click on the Options tab on the right-hand side of the screen and select Exclude variable-by-variable. 8. Click on OK (or on Paste to save to Syntax Editor). The syntax generated from this procedure is: GRAPH   /GRAPHDATASET NAME=”graphdataset” VARIABLES=tpstress sex MISSING=VARIABLEWISE REPORTMISSING=NO   /GRAPHSPEC SOURCE=INLINE. BEGIN GPL   SOURCE: s=userSource(id(“graphdataset”))   DATA: tpstress=col(source(s), name(“tpstress”))   DATA: sex=col(source(s), name(“sex”), unit.category())   GUIDE: axis(dim(1), label(“Total perceived stress”))   GUIDE: axis(dim(2), label(“Frequency”))   GUIDE: axis(dim(3), label(“sex”), opposite())   GUIDE: text.title(label(“Simple Histogram of Total perceived stress by sex”))   SCALE: cat(dim(3), include(“1”, “2”))   ELEMENT: interval(position(summary.count(bin.rect(tpstress*1*sex))), shape.interior(shape.square)) END GPL. The output generated from this procedure is shown below. Interpretation of output from Histogram Inspection of the shape of the histogram provides information about the distribution of scores on the continuous variable. Many of the statistics discussed in this manual assume that the scores on each of the variables are normally distributed (i.e. follow the shape of the normal curve). In this example the scores are reasonably normally distributed, with most scores occurring in the centre and the rest tapering out towards the extremes. It is quite common in the social sciences, however, to find that variables are not normally distributed. Scores may be skewed to the left or right or, alternatively, arranged in a rectangular shape. For further discussion of the assessment of the normality of variables see Chapter 6. BAR GRAPHS Bar graphs can be simple or very complex, depending on how many variables you wish to include. A bar graph can show the number of cases in specific categories, or it can show the score on a continuous variable for different categories. Basically, you need two main variables—one categorical and one continuous. You can also break this down further with another categorical variable if you wish. Procedure for creating a bar graph 1. From the menu at the top of the screen, click on Graphs, then select Chart Builder and click OK. Click on the Gallery tab and select Bar from the bottom left-hand menu. Click on the second graph displayed (Clustered Bar). Holding your left mouse button down, drag this graph to the Chart Preview area. 2. Select the Element Properties tab from the right-hand side of the screen. Click on Display error bars. 3. From the list of Variables drag one of your grouping variables (e.g. sex) to the section on the Chart Preview screen labelled Cluster on X: set colour. Click and drag your other categorical variable (e.g. agegp3) to the section labelled X-Axis at the bottom of the graph. 
Click and drag your continuous variable (Total Perceived Stress: tpstress) to the remaining blue section, the Y-axis. 4. Click on OK (or on Paste to save to Syntax Editor). The syntax generated from this procedure is: GGRAPH   /GRAPHDATASET NAME=”graphdataset” VARIABLES=agegp3 MEANCI(tpstress, 95)[name=”MEAN_tpstress”     LOW=”MEAN_tpstress_LOW” HIGH=”MEAN_tpstress_HIGH”] sex     MISSING=VARIABLEWISE REPORTMISSING=NO   /GRAPHSPEC SOURCE=INLINE. BEGIN GPL   SOURCE: s=userSource(id(“graphdataset”))   DATA: agegp3=col(source(s), name(“agegp3”), unit.category())   DATA: MEAN_tpstress=col(source(s), name(“MEAN_tpstress”))   DATA: sex=col(source(s), name(“sex”), unit.category())   DATA: LOW=col(source(s), name(“MEAN_tpstress_LOW”))   DATA: HIGH=col(source(s), name(“MEAN_tpstress_HIGH”))   COORD: rect(dim(1,2), cluster(3,0))   GUIDE: axis(dim(3), label(“age 3 groups”)) GUIDE: axis(dim(2), label(“Mean Total perceived stress”)) GUIDE: legend(aesthetic(aesthetic.color.interior), label(“sex”)) GUIDE: text.title(label(“Clustered Bar Mean of Total perceived stress by age 3 groups by sex”)) GUIDE: text.footnote(label(“Error Bars: 95\% CI”))   SCALE: cat(dim(3), include(“1”, “2”, “3”))   SCALE: linear(dim(2), include(0))   SCALE: cat(aesthetic(aesthetic.color.interior), include(“1”, “2”), aestheticMissing(color.black))   SCALE: cat(dim(1), include(“1”, “2”))   ELEMENT: interval(position(sex*MEAN_tpstress*agegp3), color.interior(sex),     shape.interior(shape.square))   ELEMENT: interval(position(region.spread.range(sex*(LOW+HIGH)*agegp3)),     shape.interior(shape.ibeam)) END GPL. The output generated from this procedure is shown below. Interpretation of output from Bar Graph The output from this procedure gives you a quick summary of the distribution of scores for the groups that you have requested (in this case, males and females from the different age groups). The graph presented above suggests that females had higher perceived stress scores than males, and that this difference was more pronounced among the two older age groups. Among the 18 to 29 age group, the difference in scores between males and females is very small. Care should be taken when interpreting the output from Bar Graph. You should always check the scale used on the Y (vertical) axis. Sometimes, what appears to be a dramatic difference is really only a few scale points and, therefore, probably of little importance. This is clearly evident in the bar graph displayed above. You will see that the difference between the groups is quite small when you consider the scale used to display the graph. The difference between the smallest score (males aged 45 or more) and the highest score (females aged 18 to 29) is only about 3 points. To assess the significance of any difference you might find between groups, it is necessary to conduct further statistical analyses. In this case, a two-way between-groups analysis of variance (see Chapter 19) would be conducted to find out if the differences are statistically significant. LINE GRAPHS A line graph allows you to inspect the mean scores of a continuous variable across different values of a categorical variable (e.g. age groups: 18–29, 30–44, 45+). They are also useful for graphically exploring the results of a one- or two-way analysis of variance. Line graphs are provided as an optional extra in the output of analysis of variance (see Chapters 18 and 19). Procedure for creating a line graph 1. From the menu at the top of the screen, select Graphs, then Chart Builder, and then OK. 2. 
Click on the Gallery tab and select Line from the bottom left-hand list. Click on the second graph shown (Multiple Line) and drag this to the Chart preview area holding your left mouse button down. 3. From the Variables list drag your continuous variable (Total perceived stress: tpstress) to the Y-axis. Drag one of your categorical variables (e.g. sex) to the section labelled Set color and drag the other categorical variable (agegp5) to the X-Axis. 4. Click on OK (or on Paste to save to Syntax Editor). The syntax generated from this procedure is: GRAPH   /GRAPHDATASET NAME=”graphdataset” VARIABLES=agegp5 MEAN(tpstress)[name=”MEAN_tpstress”] sex     MISSING=VARIABLEWISE REPORTMISSING=NO   /GRAPHSPEC SOURCE=INLINE. BEGIN GPL   SOURCE: s=userSource(id(“graphdataset”))   DATA: agegp5=col(source(s), name(“agegp5”), unit.category())   DATA: MEAN_tpstress=col(source(s), name(“MEAN_tpstress”))   DATA: sex=col(source(s), name(“sex”), unit.category())   GUIDE: axis(dim(1), label(“age 5 groups”))   GUIDE: axis(dim(2), label(“Mean Total perceived stress”))   GUIDE: legend(aesthetic(aesthetic.color.interior), label(“sex”))   GUIDE: text.title(label(“Multiple Line Mean of Total perceived stress by age 5 groups by sex”))   GUIDE: text.footnote(label(“Error Bars: 95\% CI”))   SCALE: cat(dim(1), include(“1”, “2”, “3”, “4”, “5”))   SCALE: linear(dim(2), include(0))   SCALE: cat(aesthetic(aesthetic.color.interior), include(“1”, “2”), aestheticMissing(color.black))   ELEMENT: line(position(agegp5*MEAN_tpstress), color.interior(sex), missing.wings()) END GPL. The output from the procedure is shown below. For display purposes I have modified the output graph so that the line for females is shown as dashed, and I have also reduced the scale of the Y-axis to start at a score of 24. The procedure for modifying graphs is provided later in this chapter. Interpretation of output from Line Graph First, you can examine the impact of age on perceived stress for each of the sexes separately. Younger males appear to have higher levels of perceived stress than either middle-aged or older males. For females, the difference across the age groups is not quite so pronounced. The older females are only slightly less stressed than the younger group. You can also consider the differences between males and females. Overall, males appear to have lower levels of perceived stress than females. Although the difference for the younger group is only small, there appears to be a discrepancy for the older age groups. Whether or not these differences reach statistical significance can be determined only by performing a two-way analysis of variance (see Chapter 19). The results presented above suggest that, to understand the impact of age on perceived stress, you must consider the respondents’ gender. This sort of relationship is referred to as an ‘interaction effect’. While the use of a line graph does not tell you whether this relationship is statistically significant, it certainly gives you a lot of information and raises a lot of additional questions. Sometimes, in interpreting the output, it is useful to consider other research questions. In this case, the results suggest that it may be worthwhile to explore in more depth the relationship between age and perceived stress for the two groups (males and females) separately, rather than assuming that the impact of age is similar for both groups. SCATTERPLOTS Scatterplots are typically used to explore the relationship between two continuous variables (e.g. age and self-esteem). 
It is a good idea to generate a scatterplot before calculating correlations (see Chapter 11). The scatterplot will give you an indication of whether your variables are related in a linear (straight-line) or curvilinear fashion. Only linear relationships are suitable for the correlation analyses described in this book. The scatterplot will also indicate whether your variables are positively related (high scores on one variable are associated with high scores on the other) or negatively related (high scores on one are associated with low scores on the other). For positive correlations, the points form a line pointing upwards to the right (i.e. they start low on the left-hand side and move higher on the right). For negative correlations, the line starts high on the left and moves down on the right (see an example of this in the output below). The scatterplot also provides a general indication of the strength of the relationship between your two variables. If the relationship is weak the points will be all over the place, in a blob-type arrangement. For a strong relationship the points will form a vague cigar shape, with a definite clumping of scores around an imaginary straight line. In the example that follows, I request a scatterplot of scores on two of the scales in the survey: the Total perceived stress and the Total Perceived Control of Internal States Scale (PCOISS). I ask for two groups in my sample (males and females) to be represented separately on the one scatterplot (using different symbols). This not only provides me with information concerning my sample as a whole but also gives additional information on the distribution of scores for males and females. Procedure for creating a scatterplot 1. From the menu at the top of the screen, click on Graphs, then Chart Builder, and then OK. 2. Click on the Gallery tab and select Scatter/Dot. Click on the third graph (Grouped Scatter) and drag this to the Chart Preview area by holding your left mouse button down. 3. Click and drag your continuous, independent variable (Total PCOISS: tpcoiss) to the X-Axis, and click and drag your dependent variable (Total perceived stress: tpstress) to the Y-Axis. Both variables need to be nominated as Scale variables. If you want to show groups (e.g. males, females) separately choose your categorical grouping variable (e.g. sex) and drag to the Set Colour box. 4. Choose the Groups/Point ID tab. Tick Point ID label. Click on the ID variable and drag to the Point ID box on the graph. 5. Click on OK (or on Paste to save to Syntax Editor). The syntax generated from this procedure is: GRAPH   /GRAPHDATASET NAME=”graphdataset” VARIABLES=tpcoiss tpstress sex MISSING=VARIABLEWISE     REPORTMISSING=NO   /GRAPHSPEC SOURCE=INLINE   /FITLINE TOTAL=NO SUBGROUP=NO. BEGIN GPL   SOURCE: s=userSource(id(“graphdataset”))   DATA: tpcoiss=col(source(s), name(“tpcoiss”))   DATA: tpstress=col(source(s), name(“tpstress”))   DATA: sex=col(source(s), name(“sex”), unit.category())   GUIDE: axis(dim(1), label(“Total PCOISS”))   GUIDE: axis(dim(2), label(“Total perceived stress”)) GUIDE: legend(aesthetic(aesthetic.color.interior), label(“sex”))   UIDE: text.title(label(“Grouped Scatter of Total perceived stress by Total PCOISS by sex”))   SCALE: cat(aesthetic(aesthetic.color.interior), include(“1”, “2”), aestheticMissing(color.black))   ELEMENT: point(position(tpcoiss*tpstress), color.interior(sex)) END GPL. The output generated from this procedure, modified slightly for display purposes, is shown below. 
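As an aside, if you just want a quick plot, the Legacy Dialogs route mentioned at the start of this chapter produces a much shorter command than the Chart Builder GPL shown above. A rough sketch of an equivalent grouped scatterplot (same variables as in this example) is:

* Grouped scatterplot of perceived stress against PCOISS, with points marked by sex.
GRAPH
  /SCATTERPLOT(BIVAR)=tpcoiss WITH tpstress BY sex
  /MISSING=LISTWISE.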
Instructions for modifying graphs are provided later in this chapter. Interpretation of output from Scatterplot From the output above, there appears to be a moderate negative correlation between the two variables (Perceived Stress and PCOISS). Respondents with high levels of perceived control (shown on the X, or horizontal, axis) experience lower levels of perceived stress (shown on the Y, or vertical, axis). On the other hand, people with low levels of perceived control have much greater perceived stress. Remember, the scatterplot does not give you definitive answers; you need to follow it up with the calculation of the appropriate statistic. There is no indication of a curvilinear relationship, so it would be appropriate to calculate a Pearson product-moment correlation for these two variables (see Chapter 11) if the distributions are roughly normal (check the histograms for these two variables). In the example above, I explored the relationship between only two variables. It is also possible to generate a matrix of scatterplots between a whole group of variables. This is useful as preliminary assumption testing for analyses such as MANOVA. Procedure to generate a matrix of scatterplots 1. From the menu at the top of the screen, click on Graphs, then Chart Builder, then OK. 2. Click on the Gallery tab and choose Scatter/Dot and then select the eighth option (Scatterplot Matrix). Drag this to the Chart Preview area holding your left mouse button down. 3. From the Variables list choose the first of the continuous variables that you wish to display (e.g. tposaff) and drag this to the Scattermatrix box. Choose and drag each of the other variables in turn (tnegaff, tpstress). 4. Click on the Options button on the top right of the screen and choose how you would like to deal with missing data. In this example I have chosen Exclude variable-by-variable to maximise the use of data. Click on OK (or on Paste to save to Syntax Editor). The syntax generated from this procedure is: GGRAPH   /GRAPHDATASET NAME=”graphdataset” VARIABLES=tposaff tnegaff tpstress MISSING=VARIABLEWISE     REPORTMISSING=NO   /GRAPHSPEC SOURCE=INLINE   /FITLINE TOTAL=NO. BEGIN GPL   SOURCE: s=userSource(id(“graphdataset”))   DATA: tposaff=col(source(s), name(“tposaff”))   DATA: tnegaff=col(source(s), name(“tnegaff”))   DATA: tpstress=col(source(s), name(“tpstress”))   GUIDE: axis(dim(1.1), ticks(null()))   GUIDE: axis(dim(2.1), ticks(null()))   GUIDE: axis(dim(1), gap(0px))   GUIDE: axis(dim(2), gap(0px))   GUIDE: text.title(label(“Scatterplot Matrix Total positive affect,Total negative affect,Total “,     “perceived stress”))   TRANS: tposaff_label = eval(“Total positive affect”)   TRANS: tnegaff_label = eval(“Total negative affect”)   TRANS: tpstress_label = eval(“Total perceived stress”)   ELEMENT: point(position((tposaff/tposaff_label+tnegaff/tnegaff_label+tpstress/tpstress_label)*     (tposaff/tposaff_label+tnegaff/tnegaff_label+tpstress/tpstress_label))) END GPL. The output generated from this procedure is shown below. BOXPLOTS Boxplots are useful when you wish to compare the distribution of scores on variables. You can use them to explore the distribution of one continuous variable for the whole sample or, alternatively, you can ask for scores to be broken down for different groups. In the example below, I explore the distribution of scores on the Positive Affect Scale for males and females. Procedure for creating a boxplot 1. 
From the menu at the top of the screen, click on Graphs, then select Chart Builder, and click OK. 2. Click on the Gallery tab and choose Boxplot. Click on the first option (Simple Boxplot) and drag it up to the Chart Preview area, holding your left mouse button down. 3. From the Variables box choose your categorical variable (e.g. sex) and drag it to the X-axis box on the Chart Preview area. Drag your continuous variable (Total Positive Affect: tposaff) to the Y-axis. 4. Click on the Groups/Point ID tab and select Point ID label. 5. Select the ID variable from the list and drag it to the Point ID box on the graph. 6. Click on OK (or on Paste to save to Syntax Editor). The syntax generated from this procedure is: GGRAPH   /GRAPHDATASET NAME=”graphdataset” VARIABLES=sex tposaff id MISSING=VARIABLEWISE REPORTMISSING=NO   /GRAPHSPEC SOURCE=INLINE. BEGIN GPL   SOURCE: s=userSource(id(“graphdataset”))   DATA: sex=col(source(s), name(“sex”), unit.category())   DATA: tposaff=col(source(s), name(“tposaff”))   DATA: id=col(source(s), name(“id”))   GUIDE: axis(dim(1), label(“sex”))   GUIDE: axis(dim(2), label(“Total positive affect”))   GUIDE: text.title(label(“Simple Boxplot of Total positive affect by sex”))   SCALE: cat(dim(1), include(“1”, “2”))   SCALE: linear(dim(2), include(0))   ELEMENT: schema(position(bin.quantile.letter(sex*tposaff)), label(id)) END GPL. The output generated from this procedure is shown as follows. Interpretation of output from Boxplot The output from Boxplot gives you a lot of information about the distribution of your continuous variable and the possible influence of your other, categorical, variable (and cluster variable if used). Each distribution of scores is represented by a box and protruding lines (called whiskers). The length of the box is the variable’s interquartile range and contains 50 per cent of cases. The line across the inside of the box represents the median value. The whiskers protruding from the box go out to the variable’s smallest and largest values. Any scores that IBM SPSS Statistics considers to be outliers appear as little circles with a number attached (this is the ID number of the case). Outliers are cases with scores that are quite different from the remainder of the sample, either much higher or much lower. IBM SPSS Statistics defines points as outliers if they extend more than 1.5 box-lengths from the edge of the box. Extreme points (indicated with an asterisk) are those that extend more than 3 box-lengths from the edge of the box. For more information on outliers, see Chapter 6. In the example above, there are several outliers at the low values for Positive Affect for both males and females. In addition to providing information on outliers, a boxplot allows you to inspect the pattern of scores for your various groups. It provides an indication of the variability in scores within each group and allows a visual inspection of the differences between groups. In the example presented above, the distribution of scores on Positive Affect for males and females is very similar. EDITING A GRAPH Sometimes, modifications need to be made to the titles, labels, markers and so on of a graph before you can print it or use it in your report. I have edited some of the graphs displayed in this chapter to make them clearer (e.g. changing the patterns in the bar graph, thickening the lines used in the line graph). To edit a chart or graph, you need to open the Chart Editor window. To do this, place your cursor on the graph that you wish to modify. 
Double-click and a new window will appear showing your graph, complete with additional menu options and icons (see Figure 7.1). You should see a smaller Properties window pop up, which allows you to make changes to your graphs. If this does not appear, click on the Edit menu and select Properties. There are various changes you can make while in Chart Editor: To change the words used in a label, click once on the label to highlight it (a gold-coloured box should appear around the text). Click once again to edit the text (a red cursor should appear). Modify the text and then press Enter on your keyboard when you have finished. To change the position of the X and Y axis labels (e.g. to centre them), double-click on the title you wish to change. In the Properties box, click on the Text Layout tab. In the section labelled Justify, choose the position you want (the dot means centred, the left arrow moves it to the left, and the right arrow moves it to the right). To change the characteristics of the text, lines, markers, colours, patterns and scale used in the chart, click once on the aspect of the graph that you wish to change. The Properties window will adjust its options depending on the aspect you click on. The various tabs in this box will allow you to change aspects of the graph. If you want to change one of the lines of a multiple-line graph (or markers for a group), you will need to highlight the specific category in the legend (rather than on the graph itself). This is useful for changing one of the lines to dashes so that it is more clearly distinguishable when printed out in black and white. Figure 7.1 Example of a Chart Editor menu bar The best way to learn how to use these options is to experiment—so go ahead and play! IMPORTING GRAPHS INTO WORD DOCUMENTS IBM SPSS Statistics allows you to copy charts directly into your word processor (e.g. Microsoft Word). This is useful when you are preparing the final version of your report and want to present some of your results in the form of a graph. Sometimes, a graph will present your results more simply and clearly than numbers in a box. Don’t go overboard—use only for special effect. Make sure you modify the graph in IBM SPSS Statistics to make it as clear as possible before transferring it to Word. Procedure for importing a chart into a Word document 1. Start Microsoft Word and open the file in which you would like the graph to appear. Click on the IBM SPSS Statistics icon on the taskbar at the bottom of your screen to return to IBM SPSS Statistics. 2. In IBM SPSS Statistics make sure you have the Output (Viewer) window on the screen in front of you. 3. Click once on the graph that you would like to copy. A border should appear around the graph. 4. Click on Edit (from the menu at the top of the page) and then choose Copy. This saves the graph to the clipboard (you won’t be able to see it, however). Alternatively, you can right click on the graph and select Copy from the pop-up menu. 5. From the list of minimised programs at the bottom of your screen, click on your Word document. 6. In the Word document, place your cursor where you wish to insert the graph. 7. Click on Edit from the Word menu and choose Paste. Or just click on the Paste icon on the top menu bar (it looks like a clipboard). The keyboard shortcut, pressing Ctrl and V, can also be used. 8. Click on File and then Save to save your Word document, or use the keyboard shortcut Ctrl and S. 9. 
To move back to IBM SPSS Statistics to continue with your analyses, click on the IBM SPSS Statistics icon, which should be listed at the bottom of your screen. With both programs open you can just jump backwards and forwards between the two programs, copying graphs, tables etc. There is no need to close either of the programs until you have finished completely. Just remember to save as you go along. ADDITIONAL EXERCISES Business Data file: staffsurvey.sav. See Appendix for details of the data file.   1. Generate a histogram to explore the distribution of scores on the Staff Satisfaction Scale (totsatis). 2. Generate a bar graph to assess the staff satisfaction levels for permanent versus casual staff employed for less than or equal to 2 years, 3 to 5 years and 6 or more years. The variables you will need are totsatis, employstatus and servicegp3. 3. Generate a scatterplot to explore the relationship between years of service and staff satisfaction. Try first using the service variable (which is very skewed) and then try again with the variable towards the bottom of the list of variables (logservice). This new variable is a mathematical transformation (log 10) of the original variable (service), designed to adjust for the severe skewness. This procedure is covered in Chapter 8. 4. Generate a boxplot to explore the distribution of scores on the Staff Satisfaction Scale (totsatis) for the different age groups (age). 5. Generate a line graph to compare staff satisfaction for the different age groups (use the agerecode variable) for permanent and casual staff. Health Data file: sleep.sav. See Appendix for details of the data file.   1. Generate a histogram to explore the distribution of scores on the Epworth Sleepiness Scale (ess). 2. Generate a bar graph to compare scores on the Sleepiness and Associated Sensations Scale (totSAS) across three age groups (agegp3) for males and females (gender). 3. Generate a scatterplot to explore the relationship between scores on the Epworth Sleepiness Scale (ess) and the Sleepiness and Associated Sensations Scale (totSAS). Ask for different markers for males and females (gender). 4. Generate a boxplot to explore the distribution of scores on the Sleepiness and Associated Sensations Scale (totSAS) for people who report that they do/don’t have a problem with their sleep (probsleeprec). 5. Generate a line graph to compare scores on the Sleepiness and Associated Sensations Scale (totSAS) across the different age groups (use the agegp3 variable) for males and females (gender). 11 Correlation Correlation analysis is used to describe the strength and direction of the linear relationship between two variables. There are several different statistics available from IBM SPSS Statistics, depending on the level of measurement and the nature of your data. In this chapter, the procedure for obtaining and interpreting a Pearson product-moment correlation coefficient (r) is presented, along with Spearman Rank Order Correlation (rho). Pearson r is designed for interval level (continuous) variables. It can also be used if you have one continuous variable (e.g. scores on a measure of self-esteem) and one dichotomous variable (e.g. sex: male/female). Spearman rho is designed for use with ordinal level, or ranked, data and is particularly useful when your data do not meet the criteria for Pearson correlation. IBM SPSS Statistics can calculate two types of correlation for you. 
First, it can give you a simple bivariate correlation (which just means between two variables), also known as ‘zero-order correlation’. It will also allow you to explore the relationship between two variables while controlling for another variable. This is known as ‘partial correlation’. The procedure to obtain a bivariate Pearson r and non-parametric Spearman rho is presented here in  Chapter 11 . Partial correlation is covered in  Chapter 12 . Pearson correlation coefficients (r) can only take on values from –1 to +1. The sign out the front indicates whether there is a positive correlation (as one variable increases, so too does the other) or a negative correlation (as one variable increases, the other decreases). The size of the absolute value (ignoring the sign) provides an indication of the strength of the relationship. A perfect correlation of 1 or –1 indicates that the value of one variable can be determined exactly by knowing the value on the other variable. A scatterplot of this relationship would show a straight line. On the other hand, a correlation of 0 indicates no relationship between the two variables. Knowing the value on one of the variables provides no assistance in predicting the value on the second variable. A scatterplot would show a circle of points, with no pattern evident. There are several issues associated with the use of correlation that you need to consider. These include the effect of non-linear relationships, outliers, restriction of range, correlation versus causality and statistical versus practical significance. These topics are discussed in the introduction to Part Four of this book. I would strongly recommend that you read through that material before proceeding with the remainder of this chapter. DETAILS OF EXAMPLE To demonstrate the use of correlation, I explore the interrelationships among some of the variables included in the survey.sav data file provided on the website accompanying this book. The survey was designed to explore the factors that affect respondents’ psychological adjustment and wellbeing (see the Appendix for a full description of the study). In this example, I am interested in assessing the correlation between respondents’ feelings of control and their level of perceived stress. If you wish to follow along with this example, you should start IBM SPSS Statistics and open the survey.sav file. Example of research question: Is there a relationship between the amount of control people have over their internal states and their levels of perceived stress? Do people with high levels of perceived control experience lower levels of perceived stress? What you need: Two variables: both continuous, or one continuous and the other dichotomous (two values). What it does: Correlation describes the relationship between two continuous variables, in terms of both the strength of the relationship and the direction. Assumptions: See the introduction to Part Four. Non-parametric alternative: Spearman Rank Order Correlation (rho). PRELIMINARY ANALYSES FOR CORRELATION Before performing a correlation analysis, it is a good idea to generate a scatterplot. This enables you to check for outliers and for violation of the assumptions of linearity (see introduction to Part Four). Inspection of the scatterplots also gives you a better idea of the nature of the relationship between your variables. To generate a scatterplot between your independent variable (Total PCOISS) and dependent variable (Total perceived stress) follow the instructions detailed in  Chapter 7 . 
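If you are also checking your data outside SPSS, an equivalent preliminary scatterplot can be drawn in Python with pandas and matplotlib. The sketch below is only an illustration and is not part of the manual's procedure; it assumes survey.sav has been exported to a CSV file (the filename survey.csv is hypothetical) containing the tpcoiss and tpstress variables.

# A minimal sketch of the preliminary scatterplot, drawn outside SPSS.
# Assumes survey.sav has been exported to "survey.csv" (hypothetical filename).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("survey.csv")
pair = df[["tpcoiss", "tpstress"]].dropna()   # keep cases with scores on both scales

plt.scatter(pair["tpcoiss"], pair["tpstress"], s=12)
plt.xlabel("Total PCOISS")
plt.ylabel("Total perceived stress")
plt.title("Perceived stress by perceived control of internal states")
plt.show()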
The output generated from this procedure is shown below. Interpretation of output from Scatterplot The scatterplot can be used to check several aspects of the distribution of these two variables. Step 1: Check for outliers Check your scatterplot for outliers—that is, data points that are out on their own, either very high or very low, or away from the main cluster of points. Extreme outliers are worth checking: Was the information entered correctly? Could these values be errors? Outliers can seriously influence some analyses, so this is worth investigating. Some statistical texts recommend removing extreme outliers from the data set. Others suggest recoding them down to a value that is not so extreme (see  Chapter 6 ). If you identify an outlier and want to find out the ID number of the case, you can use the Data Label Mode icon in the Chart Editor. Double-click on the chart to activate the Chart Editor window. Click on the icon that looks a bit like a bullseye (or choose Data Label Mode from the Elements menu) and move your cursor to the point on the graph you wish to identify. Click on it once and a number will appear—this is the ID number if you selected ID in Step 4 of the Scatterplot instructions in  Chapter 7 ; otherwise, the case number assigned by IBM SPSS Statistics will be displayed. To turn the numbering off, just click on the icon again. Step 2: Inspect the distribution of data points The distribution of data points can tell you various things about your data: Are the data points spread all over the place? This suggests a very low correlation. Are all the points neatly arranged in a narrow cigar shape? This suggests quite a strong correlation. Could you draw a straight line through the main cluster of points, or would a curved line better represent the points? If a curved line is evident (suggesting a curvilinear relationship) Pearson correlation should not be used, as it assumes a linear relationship. Step 3: Determine the direction of the relationship between the variables The scatterplot can tell you whether the relationship between your two variables is positive or negative. If a line were drawn through the points, what direction would it point—from left to right, upward or downward? An upward trend indicates a positive relationship; high scores on the X axis are associated with high scores on the Y axis. A downward line suggests a negative correlation; low scores on the X axis are associated with high scores on the Y axis. In this example, we appear to have a negative correlation of moderate strength. Once you have explored the distribution of scores on the scatterplot and established that the relationship between the variables is roughly linear and that the scores are evenly spread in a cigar shape, you can proceed with calculating Pearson or Spearman correlation coefficients. Before you start the following procedure, choose Edit from the menu, select Options, and on the General tab make sure there is a tick in the box No scientific notation for small numbers in tables in the Output section. Procedure for requesting Pearson r or Spearman rho 1. From the menu at the top of the screen, click on Analyze, then select Correlate, then Bivariate. 2. Select your two variables and move them into the box marked Variables (e.g. Total perceived stress: tpstress, Total PCOISS: tpcoiss). If you wish you can list a whole range of variables here, not just two. In the resulting matrix, the correlation between all possible pairs of variables will be listed. 
This can be quite large if you list more than just a few variables.
3. In the Correlation Coefficients section, the Pearson box is the default option. If you wish to request Spearman rho (the non-parametric alternative), tick the Spearman box instead (or as well).
4. Click on the Options button. For Missing Values, click on the Exclude cases pairwise box. Under Options, you can also obtain means and standard deviations if you wish.
5. Click on Continue and then on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
CORRELATIONS
  /VARIABLES=tpstress tpcoiss
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
NONPAR CORR
  /VARIABLES=tpstress tpcoiss
  /PRINT=SPEARMAN TWOTAIL NOSIG
  /MISSING=PAIRWISE.
The output generated from this procedure (showing both Pearson and Spearman results) is presented below.
Correlations
                                                         Total perceived stress   Total PCOISS
                                                         (tpstress)               (tpcoiss)
Total perceived stress (tpstress)  Pearson Correlation         1                    -.581**
                                   Sig. (2-tailed)                                   .000
                                   N                           433                   426
Total PCOISS (tpcoiss)             Pearson Correlation        -.581**                 1
                                   Sig. (2-tailed)             .000
                                   N                           426                   430
**. Correlation is significant at the 0.01 level (2-tailed).
Nonparametric Correlations (Spearman's rho)
                                                         Total perceived stress   Total PCOISS
                                                         (tpstress)               (tpcoiss)
Total perceived stress (tpstress)  Correlation Coefficient     1.000                -.556**
                                   Sig. (2-tailed)                                   .000
                                   N                            433                  426
Total PCOISS (tpcoiss)             Correlation Coefficient    -.556**                1.000
                                   Sig. (2-tailed)              .000
                                   N                            426                  430
**. Correlation is significant at the 0.01 level (2-tailed).
INTERPRETATION OF OUTPUT FROM CORRELATION
For both Pearson and Spearman results, IBM SPSS Statistics provides you with a table giving the correlation coefficients between each pair of variables listed, the significance level and the number of cases. The results for Pearson correlation are shown in the section headed Correlations. If you requested Spearman rho, these results are shown in the section labelled Nonparametric Correlations. You interpret the output from the parametric and non-parametric approaches in the same way.
Step 1: Check the information about the sample
The first thing to inspect in the table labelled Correlations is the N (number of cases). Is this correct? If there are a lot of missing data, you need to find out why. Did you forget to tick the Exclude cases pairwise box in the missing data option? Using listwise deletion (the other option) means any case with missing data on any of the variables will be removed from the analysis. This can sometimes severely restrict your N. In the above example we have 426 cases that had scores on both of the scales used in this analysis. If a case was missing information on either of these variables, it would have been excluded from the analysis.
Step 2: Determine the direction of the relationship
The second thing to consider is the direction of the relationship between the variables. Is there a negative sign in front of the correlation coefficient value? This would suggest a negative (inverse) correlation between the two variables (i.e. high scores on one are associated with low scores on the other). The interpretation of this depends on the way the variables are scored. Always check with your questionnaire, and remember that for many scales some items are negatively worded and therefore are reversed before scoring. What do high values really mean? This is one of the major areas of confusion for students, so make sure you get this clear in your mind before you interpret the correlation output.
In the example given here, the Pearson correlation coefficient (r = –.58) and Spearman value (rho = –.56) are negative, indicating a negative correlation between perceived control and stress. The more control people feel they have, the less stress they experience. Step 3: Determine the strength of the relationship The third thing to consider in the output is the size of the correlation coefficient. This can range from –1 to +1. This value will indicate the strength of the relationship between your two variables. A correlation of 0 indicates no relationship at all, a correlation of 1 indicates a perfect positive correlation, and a value of –1 indicates a perfect negative correlation. How do you interpret values between 0 and 1? Different authors suggest different interpretations; however, Cohen (1988, pp. 79–81) suggests the following guidelines: small r = .10 to .29 medium r = .30 to .49 large r = .50 to 1.00 These guidelines apply whether or not there is a negative sign out the front of your r value. Remember, the negative sign refers only to the direction of the relationship, not the strength. The strength of correlation of r = .5 and r = –.5 is the same. It is only in a different direction. In the example presented above, there is a large correlation between the two variables (above .5), suggesting quite a strong relationship between perceived control and stress. Step 4: Calculate the coefficient of determination To get an idea of how much variance your two variables share, you can also calculate what is referred to as the ‘coefficient of determination’. Sounds impressive, but all you need to do is square your r value (multiply it by itself). To convert this to percentage of variance, just multiply by 100 (shift the decimal place two columns to the right). For example, two variables that correlate r = .2 share only .2 × .2 = .04 = 4\% of their variance. There is not much overlap between the two variables. A correlation of r = .5, however, means 25 per cent shared variance (.5 × .5 = .25). In our example the Pearson correlation is .581, which, when squared, indicates 33.76 per cent shared variance. Perceived control helps to explain nearly 34 per cent of the variance in respondents’ scores on the Perceived Stress Scale. This is quite a respectable amount of variance explained when compared with a lot of the research conducted in the social sciences. Step 5: Assess the significance level The next thing to consider is the significance level (listed as Sig. 2 tailed). This is a frequently misinterpreted area, so care should be exercised here. The level of statistical significance does not indicate how strongly the two variables are associated (this is given by r or rho), but instead it indicates how much confidence we should have in the results obtained. The significance of r or rho is strongly influenced by the size of the sample. In a small sample (e.g. n = 30), you may have moderate correlations that do not reach statistical significance at the traditional p < .05 level. In large samples (N = 100+), however, very small correlations (e.g. r = .2) may reach statistical significance. While you need to report statistical significance, you should focus on the strength of the relationship and the amount of shared variance (see Step 4). 
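The quantities discussed in Steps 1 to 5 can also be checked outside SPSS. The following Python sketch is an illustration rather than part of the manual: it assumes survey.sav has been exported to a CSV file (the filename survey.csv is hypothetical) and uses pandas and scipy. On the 426 cases with scores on both scales it should closely reproduce the values reported above (r of about -.58, rho of about -.56, and roughly 34 per cent shared variance).

# Pearson r, Spearman rho, their p values and the coefficient of determination,
# computed with scipy as a cross-check of the SPSS output above.
import pandas as pd
from scipy import stats

df = pd.read_csv("survey.csv")                      # hypothetical export of survey.sav
pair = df[["tpstress", "tpcoiss"]].dropna()         # cases with scores on both scales

r, p_r = stats.pearsonr(pair["tpstress"], pair["tpcoiss"])
rho, p_rho = stats.spearmanr(pair["tpstress"], pair["tpcoiss"])

print(f"Pearson r = {r:.2f}, p = {p_r:.3f}, n = {len(pair)}")
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3f}")
print(f"Coefficient of determination = {r**2:.2f} ({r**2 * 100:.1f}% shared variance)")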
When publishing in some literature areas (particularly health and medical) you may be asked to provide confidence intervals for your correlation coefficients—that is, the range of values in which we are 95 per cent confident the true value lies (if we actually could measure it!). Unfortunately, IBM SPSS Statistics does not provide these; however, there are some websites that provide online calculators for you. If you need to obtain these values I suggest that you go to the website http://vassarstats.net/rho.html. All you need to provide is the r and n values that are available in your SPSS output. You might like to check out the whole VassarStats website, which provides a range of tools for performing statistical computation (http://vassarstats.net/).
PRESENTING THE RESULTS FROM CORRELATION
The results of the above example using Pearson correlation could be presented in a research report as follows. If you need to report the results for Spearman's Correlation, just replace the r value with the rho value shown in the output. Typically, the value of r is presented using two decimal places (rather than the three decimal places provided in the SPSS output), but check with the conventions used in the journals in your literature area. For correct APA style the statistics are presented in italics (r, n, p).
The relationship between perceived control of internal states (as measured by the PCOISS) and perceived stress (as measured by the Perceived Stress Scale) was investigated using a Pearson product-moment correlation coefficient. Preliminary analyses were performed to ensure no violation of the assumptions of normality and linearity. There was a strong negative correlation between the two variables, r = –.58, n = 426, p < .001, with high levels of perceived control associated with lower levels of perceived stress.
Correlation is often used to explore the relationship among a group of variables, rather than just two as described above. In this case, it would be awkward to report all the individual correlation coefficients in a paragraph; it would be better to present them in a table. One way this could be done is shown below.
Table 1
Pearson Product-Moment Correlations Between Measures of Perceived Control and Wellbeing
Scale                          1        2        3        4       5
1. Total PCOISS                –
2. Total perceived stress    –.58**     –
3. Total negative affect     –.48**    .67**     –
4. Total positive affect      .46**   –.44**   –.29**     –
5. Total life satisfaction    .37**   –.49**   –.32**    .42**    –
Note. PCOISS = Perceived Control of Internal States Scale. ** p < .001 (2-tailed).
For other examples of how to present the results of correlation see Chapter 7 in Nicol and Pexman (2010b).
OBTAINING CORRELATION COEFFICIENTS BETWEEN GROUPS OF VARIABLES
In the previous procedures section, I showed you how to obtain correlation coefficients between two continuous variables. If you have a group of variables and you wish to explore the interrelationships among all of them, you can ask IBM SPSS Statistics to do this in one procedure. Just include all the variables in the Variables box. This can, however, result in an enormous correlation matrix that can be difficult to read and interpret. Sometimes, you want to explore only a subset of all these possible relationships. For example, you might want to assess the relationship between control measures (Mastery, PCOISS) and a set of adjustment and wellbeing measures (positive affect, negative affect, life satisfaction).
You don’t want a full correlation matrix, because this would give you correlation coefficients among all the variables, including between each of the various pairs of adjustment measures. There is a way that you can limit the correlation coefficients that are displayed. This involves using Syntax Editor (described in Chapter 3) to limit the correlation coefficients that are produced by IBM SPSS Statistics.
Procedure for obtaining correlation coefficients between two groups of variables
1. From the menu at the top of the screen, click on Analyze, then select Correlate, then Bivariate.
2. Move the variables of interest into the Variables box. Select the first group of variables (e.g. Total Positive Affect: tposaff, total negative affect: tnegaff, total life satisfaction: tlifesat), followed by the second group (e.g. Total PCOISS: tpcoiss, Total Mastery: tmast). In the output that is generated, the first group of variables will appear down the side of the table as rows and the second group will appear across the table as columns. Put your longer list first; this stops your table being too wide to appear on one page.
3. Click on Paste. This opens the Syntax Editor window.
4. Put your cursor between the first group of variables (e.g. tposaff, tnegaff, tlifesat) and the other variables (e.g. tpcoiss and tmast). Type in the word WITH (tposaff tnegaff tlifesat with tpcoiss tmast). This will ask IBM SPSS Statistics to calculate correlation coefficients between tmast and tpcoiss and each of the other variables listed. The final syntax should be:
CORRELATIONS
  /VARIABLES=tposaff tnegaff tlifesat with tpcoiss tmast
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
5. To run this new syntax, you need to highlight the text from CORRELATIONS down to and including the full stop at the end. It is very important that you include the full stop in the highlighted section.
6. With this text highlighted, click on the green triangle or arrow-shaped icon, or, alternatively, click on Run from the Menu, and then Selection from the drop-down menu that appears.
The output generated from this procedure is shown as follows.
                                                            Total PCOISS   Total Mastery
                                                            (tpcoiss)      (tmast)
Total positive affect (tposaff)     Pearson Correlation        .456**         .432**
                                    Sig. (2-tailed)            .000           .000
                                    N                          429            436
Total negative affect (tnegaff)     Pearson Correlation       -.484**        -.464**
                                    Sig. (2-tailed)            .000           .000
                                    N                          428            435
Total life satisfaction (tlifesat)  Pearson Correlation        .373**         .444**
                                    Sig. (2-tailed)            .000           .000
                                    N                          429            436
**. Correlation is significant at the 0.01 level (2-tailed).
Presented in this manner, it is easy to compare the relative strength of the correlations for my two control scales (Total PCOISS, Total Mastery) with each of the adjustment measures.
COMPARING THE CORRELATION COEFFICIENTS FOR TWO GROUPS
Sometimes, when doing correlational research, you may want to compare the strength of the correlation coefficients for two separate groups. For example, you may want to explore the relationship between optimism and negative affect for males and females separately. One way that you can do this is described below.
Procedure for comparing correlation coefficients for two groups of participants
Step 1: Split the sample
1. From the menu at the top of the screen, click on Data, then select Split File.
2. Click on Compare Groups.
3. Move the grouping variable (e.g. sex) into the box labelled Groups based on. Click on OK (or on Paste to save to Syntax Editor). If you use Syntax, remember to run the procedure by highlighting the command and clicking on Run.
4.
This will split the sample by sex and repeat any analyses that follow for these two groups separately. The syntax for this command is: SORT CASES BY sex. SPLIT FILE LAYERED BY sex. Step 2: Run correlation Follow the steps in the earlier section of this chapter to request the correlation between your two variables of interest (e.g. Total optimism: toptim, Total negative affect: tnegaff). The results will be reported separately for the two groups. The syntax for this command is: CORRELATIONS   /VARIABLES=toptim tnegaff   /PRINT=TWOTAIL NOSIG   /MISSING=PAIRWISE . Important:  The Split File operation stays in place until you turn it off. Therefore, when you have finished examining males and females separately you will need to turn the Split File option off. To do this, click on Data, Split File and select the first button: Analyze all cases, do not create groups. The output generated from the correlation procedure is shown below. sex sex toptim Total Optimism tnegaff Total negative affect 1 MALES toptim Total Optimism Pearson Correlation 1 -.220** Sig. (2-tailed)   .003 N 184 184 tnegaff Total negative affect Pearson Correlation -.220** 1 Sig. (2-tailed) .003   N 184 185 2 FEMALES toptim Total Optimism Pearson Correlation 1 -.394** Sig. (2-tailed)   .000 N 251 250 tnegaff Total negative affect Pearson Correlation -.394** 1 Sig. (2-tailed) .000   N 250 250 Correlation is significant at the 0.01 level (2-tailed). Interpretation of output from correlation for two groups From the output given above, the correlation between Total optimism and Total negative affect for males was r = –.22, while for females it was slightly higher, r = –.39. Although these two values seem different, is this difference big enough to be considered significant? Detailed in the next section is one way that you can test the statistical significance of the difference between these two correlation coefficients. It is important to note that this process is different from testing the statistical significance of the correlation coefficients reported in the output table above. The significance levels reported above (for males: p = .003, for females: p = .000) provide a test of the null hypothesis that the correlation coefficient in the population is 0. The significance test described below, however, assesses the probability that the difference in the correlations observed for the two groups (males and females) would occur as a function of a sampling error, when in fact there was no real difference in the strength of the relationship for males and females. TESTING THE STATISTICAL SIGNIFICANCE OF THE DIFFERENCE BETWEEN CORRELATION COEFFICIENTS In this section, I describe a procedure that can be used to find out whether the correlations for the two groups (males/females) are significantly different. IBM SPSS Statistics does not provide this information. The quickest way to check this is to use an online calculator. One of the easiest ones I have found to use is available at  http://vassarstats.net/rdiff.html . Assumptions As always, there are assumptions to check first. It is assumed that the r values for the two groups were obtained from random samples and that the two groups of cases are independent (not the same participants tested twice). The distribution of scores for the two groups is assumed to be normal (see histograms for the two groups). It is also necessary to have at least 20 cases in each of the groups. 
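The usual test is based on Fisher's r-to-z transformation, which is what calculators such as the one described next typically compute for you. If you prefer to calculate it directly, the following Python sketch is one way to do so (an illustration, not part of the manual), using the r and n values from the split-file output above and ignoring the shared negative sign:

# Comparing two independent correlations (males vs females) with Fisher's
# r-to-z transformation; values are taken from the split-file output above.
import math
from scipy.stats import norm

r_males, n_males = 0.22, 184
r_females, n_females = 0.394, 250

z1, z2 = math.atanh(r_males), math.atanh(r_females)      # Fisher transformation
se = math.sqrt(1 / (n_males - 3) + 1 / (n_females - 3))
z = (z1 - z2) / se
p_two_tailed = 2 * norm.sf(abs(z))

print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}")  # z of about -1.97, p of about .049

This matches the result reported for the online calculator in the next section.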
Procedure for using the online calculator From the IBM SPSS Statistics Correlation output, find the r value and n for Group 1 (males) and Group 2 (females). Males:    ra = .22 na = 184 Females:    rb = .394    nb = 250 In the online calculator ( http://vassarstats.net/rdiff.html ) enter this information into the boxes provided for Sample A (males) and Sample B (females) and press the Calculate button. The result of the procedure will appear in the boxes labelled z, p (one-tailed) and p (two-tailed). In this example the z value is –1.97 and the p (two-tailed) is .0488. Given that the p value is less than .05 the result is statistically significant. We can conclude that there is a statistically significant difference in the strength of the correlation between optimism and negative affect for males and females. Optimism explains significantly more of the variance in negative affect for females than for males. ADDITIONAL EXERCISES Health Data file: sleep.sav. See Appendix for details of the data file.   1. Check the strength of the correlation between scores on the Sleepiness and Associated Sensations Scale (totSAS) and the Epworth Sleepiness Scale (ess). 2. Use Syntax to assess the correlations between the Epworth Sleepiness Scale (ess) and each of the individual items that make up the Sleepiness and Associated Sensations Scale (fatigue, lethargy, tired, sleepy, energy). 12 Partial correlation Partial correlation is similar to Pearson product-moment correlation (described in  Chapter 11 ), except that it allows you to control for an additional variable. This is usually a variable that you suspect might be influencing your two variables of interest. By statistically removing the influence of this confounding variable, you can get a clearer and more accurate indication of the relationship between your two variables. In the introduction to Part Four, the influence of contaminating or confounding variables was discussed (see the section on correlation versus causality). This occurs when the relationship between two variables (A and B) is influenced, at least to some extent, by a third variable (C). This can serve to artificially inflate the size of the correlation coefficient obtained. This relationship can be represented graphically as: In this case, A and B may appear to be related, but in fact their apparent relationship is due to the influence of C. If you were to statistically control for the variable C then the correlation between A and B is likely to be reduced, resulting in a smaller correlation coefficient. DETAILS OF EXAMPLE To illustrate the use of partial correlation, I use the same example as described in  Chapter 11  but extend the analysis further to control for an additional variable. This time I am interested in exploring the relationship between scores on the Perceived Control of Internal States Scale (PCOISS) and scores on the Perceived Stress Scale, while controlling for what is known as ‘socially desirable responding bias’. This variable refers to people’s tendency to present themselves in a positive, or socially desirable, way (also known as ‘faking good’) when completing questionnaires. This tendency is measured by the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe 1960). A short version of this scale (Strahan & Gerbasi 1972) was included in the questionnaire used to measure the other two variables. 
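In notation, with A and B the two variables of interest and C the control variable, the quantity the partial correlation procedure reports is the standard first-order partial correlation (the formula itself is not given in the manual):

\[ r_{AB \cdot C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{(1 - r_{AC}^{2})\,(1 - r_{BC}^{2})}} \]

That is, it is the correlation between A and B after the part of each that can be predicted from C has been removed.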
If you would like to follow along with the example presented below, you should start IBM SPSS Statistics and open the file labelled survey.sav, which is included on the website accompanying this book. Example of research question: After controlling for participants’ tendency to present themselves in a positive light on self-report scales, is there still a significant relationship between perceived control of internal states (PCOISS) and levels of perceived stress? What you need: two continuous variables that you wish to explore the relationship between (e.g. Total PCOISS, Total perceived stress) one continuous variable that you wish to control for (e.g. total social desirability: tmarlow). What it does: Partial correlation allows you to examine the relationship between two variables while statistically controlling for (getting rid of) the effect of another variable that you think might be contaminating or influencing the relationship. Assumptions: For full details of the assumptions for correlation, see the introduction to Part Four. Before you start the following procedure, choose Edit from the menu, select Options, and make sure there is a tick in the box No scientific notation for small numbers in tables. Procedure for partial correlation 1. From the menu at the top of the screen, click on Analyze, then select Correlate, then Partial. 2. Click on the two continuous variables that you want to correlate (e.g. Total PCOISS: tpcoiss, Total perceived stress: tpstress). Click on the arrow to move these into the Variables box. 3. Click on the variable that you wish to control for (e.g. Total social desirability: tmarlow) and move it into the Controlling for box. 4. Click on Options. In the Missing Values section, click on Exclude cases pairwise. In the Statistics section, click on Zero order correlations. 5. Click on Continue and then OK (or on Paste to save to Syntax Editor). The syntax from this procedure is: PARTIAL CORR   /VARIABLES= tpcoiss tpstress BY tmarlow   /SIGNIFICANCE=TWOTAIL   /STATISTICS=CORR   /MISSING=ANALYSIS . The output generated from this procedure is shown below. Control Variables tpcoiss Total PCOISS tpstress Total perceived stress tmarlow Total social desirability -none-a tpcoiss Total PCOISS Correlation 1.000 -.581 .295 Significance (2-tailed) . .000 .000 df 0 424 425 tpstress Total perceived stress Correlation -.581 1.000 -.228 Significance (2-tailed) .000 . .000 df 424 0 426 tmarlow Total social desirability Correlation .295 -.228 1.000 Significance (2-tailed) .000 .000 . df 425 426 0 tmarlow Total social desirability tpcoiss Total PCOISS Correlation 1.000 -.552   Significance (2-tailed) . .000   df 0 423   tpstress Total perceived stress Correlation -.552 1.000   Significance (2-tailed) .000 .   df 423 0   a. Cells contain zero-order (Pearson) correlations. INTERPRETATION OF OUTPUT FROM PARTIAL CORRELATION The output provides you with a table made up of two sections:   1. In the top half of the table is the Pearson product-moment correlation matrix between your two variables of interest (e.g. perceived control and perceived stress), not controlling for your other variable. In this case, the correlation is –.581. The word ‘none’ in the left-hand column indicates that no control variable is in operation. This is often referred to as the ‘zero-order correlation coefficient’. 2. The bottom half of the table repeats the same set of correlation analyses, but this time controlling for (removing) the effects of your control variable (e.g. social desirability). 
In this case, the new partial correlation is –.552. You should compare these two sets of correlation coefficients to see whether controlling for the additional variable had any impact on the relationship between your two variables of interest. In this example, there was only a small decrease in the strength of the correlation (from –.581 to –.552). This suggests that the observed relationship between perceived control and perceived stress is not due merely to the influence of socially desirable responding.
PRESENTING THE RESULTS FROM PARTIAL CORRELATION
Although IBM SPSS Statistics provides the correlation coefficients using three decimal places, they are usually reported in journal articles as two decimals (see APA Publication Manual for details). The results of this analysis could be presented as:
Partial correlation was used to explore the relationship between perceived control of internal states (as measured by the PCOISS) and perceived stress (measured by the Perceived Stress Scale) while controlling for scores on the Marlowe-Crowne Social Desirability Scale. Preliminary assessments were performed to ensure no violation of the assumptions of normality and linearity. There was a strong, negative partial correlation between perceived control of internal states and perceived stress, controlling for social desirability, r = –.55, n = 425, p < .001, with high levels of perceived control being associated with lower levels of perceived stress. An inspection of the zero-order correlation coefficient (r = –.58) suggested that controlling for socially desirable responding had very little effect on the strength of the relationship between these two variables.
Codebook (survey.sav, page 339)
SPSS variable name | Full variable name | Coding instructions | Measurement level
id | Identification number | Identification number | Scale
sex | Sex | 1=males, 2=females | Nominal
age | Age | In years | Scale
marital | Marital status | 1=single, 2=steady relationship, 3=living with a partner, 4=married for the first time, 5=remarried, 6=separated, 7=divorced, 8=widowed | Nominal
child | Children | 1=yes, 2=no | Nominal
educ | Highest level of education | 1=primary, 2=some secondary, 3=completed high school, 4=some additional training, 5=completed undergraduate, 6=completed postgraduate | Ordinal
source | Major source of stress | 1=work, 2=spouse or partner, 3=relationships, 4=children, 5=family, 6=health/illness, 7=life in general, 8=finances, 9=time (lack of, too much to do) | Nominal
smoke | Do you smoke? | 1=yes, 2=no | Nominal
smokenum | Cigarettes smoked per week | Number of cigarettes smoked per week | Scale
op1 to op6 | Optimism Scale | 1=strongly disagree, 5=strongly agree | Scale
mast1 to mast7 | Mastery Scale | 1=strongly disagree, 4=strongly agree | Scale
pn1 to pn20 | PANAS | 1=very slightly, 5=extremely | Scale
lifsat1 to lifsat5 | Life Satisfaction Scale | 1=strongly disagree, 7=strongly agree | Scale
pss1 to pss10 | Perceived Stress Scale | 1=never, 5=very often | Scale
sest1 to sest10 | Self-Esteem Scale | 1=strongly disagree, 4=strongly agree | Scale
m1 to m10 | Marlowe-Crowne Social Desirability Scale | 1=true, 2=false | Nominal
pc1 to pc18 | Perceived Control of Internal States Scale (PCOISS) | 1=strongly disagree, 5=strongly agree | Scale
Total scale scores
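As a closing check on the partial correlation example above: the value reported in Chapter 12 can be recovered by hand from the zero-order correlations shown in the SPSS output (–.581 between PCOISS and perceived stress, .295 between PCOISS and social desirability, and –.228 between perceived stress and social desirability) using the first-order partial correlation formula given earlier. A minimal Python sketch (an illustration, not part of the manual):

# Recomputing the partial correlation reported in Chapter 12 from the
# zero-order correlations shown in the SPSS output.
import math

r_ab = -0.581   # Total PCOISS with Total perceived stress
r_ac = 0.295    # Total PCOISS with Total social desirability
r_bc = -0.228   # Total perceived stress with Total social desirability

partial = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))
print(f"Partial correlation controlling for social desirability: {partial:.3f}")  # about -.552

This agrees with the value of –.552 shown in the bottom half of the partial correlation output.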