## Stats Modeling the World 3rd Edition By David E. Bock – Test Bank

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1) Suppose that after the study described in #5 we want to see if there’s evidence that the exercise program’s effectiveness in lowering blood pressure depends on how high the person’s initial blood pressure was. We should do a

A) $\chi^2$ goodness-of-fit test
B) linear regression t-test
C) matched pairs t-test
D) $\chi^2$ test of independence
E) 2-sample t-test

Answer: B

2) Several volunteers engage in a special exercise program intended to lower their blood pressure. We measure each person’s initial blood pressure, lead them through the exercises daily for a month, then check blood pressures again. To see if the program lowered blood pressure significantly we should do a

A) 2-sample t-test
B) $\chi^2$ test of homogeneity
C) matched pairs t-test
D) $\chi^2$ goodness-of-fit test
E) linear regression t-test

Answer: C


SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. A high school counselor was interested in finding out how well student grade point averages (GPA) predict ACT scores. A sample of the senior class data was reviewed to obtain GPA and ACT scores. The data are shown in the table, and regression output is given below.


3) Create and interpret a 95% confidence interval for the slope of the regression line.

Answer: Degrees of freedom = 15 – 2 = 13. The 95% confidence interval for $\beta_1$ is

$$b_1 \pm t^*_{13} \times SE(b_1) = 7.397 \pm (2.16)(1.087), \quad \text{or } (5.05,\ 9.74).$$

We are 95% confident that the ACT score increases an average of between 5.05 and 9.74 points for every additional GPA point.
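The interval arithmetic in this answer can be checked with a few lines of Python. This is a minimal sketch that uses only the values quoted in the answer; the function name `slope_ci` is an illustrative choice, and the critical value 2.16 is taken from the answer rather than computed.

```python
def slope_ci(b1, se_b1, t_star):
    """Return (lower, upper) for the interval b1 ± t* × SE(b1)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

# Values quoted in the answer to question 3.
# t_star = 2.16 is the critical value for 13 degrees of freedom given there.
lower, upper = slope_ci(b1=7.397, se_b1=1.087, t_star=2.16)
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

Keeping the critical value as an argument lets the same helper be reused for other degrees of freedom or confidence levels.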


Here are the data about the average January temperature in cities in the United States, and factors that might allow us to predict temperature. The data, available for 55 cities, include:

- Jantemp: Mean January temperature (degrees F)
- Lat: Latitude (degrees of latitude)
- Long: Longitude (degrees of longitude)
- Altitude: Altitude at the airport (where the temperatures have been recorded), in feet above sea level
- Coast: Approximate distance from the nearest seacoast, in kilometers

We will attempt to make a regression model to help account for mean January temperature and to understand the effects of the various predictors. At each step of the analysis you may assume that things learned earlier in the process are known.

Units Note: The “degrees” of temperature, given here on the Fahrenheit scale, have only a coincidental language relationship to the “degrees” of longitude and latitude. The geographic “degrees” are based on modeling the Earth as a sphere and dividing it up into 360 degrees for a full circle. Thus 180 degrees of longitude is halfway around the world from Greenwich, England (0°), and latitude increases from 0 degrees at the Equator to 90 degrees (North) at the North Pole.

Here are the data:


4) First, we consider the relationship between temperature and latitude. This seems to be the obvious first choice; everybody knows that northern (high latitude) cities tend to be colder in January than southern (lower latitude) cities. Here is the scatterplot:

Describe what you see in this scatterplot in a sentence or two. Which of the regression assumptions for the regression of Jantemp on Lat can you check with this plot? State them and indicate whether you think they seem to be satisfied.

Answer: The plot shows a negative direction, showing that it tends to be colder at higher latitudes. It shows a clearly linear shape. It has moderate to small scatter. There seems to be some heteroskedasticity, with larger variation at higher latitudes. Possibly there are just 3 cities that are a bit warm in January for their latitude.

5) It is possible that the distance that a city is from the ocean could affect its January temperature. Coast gives an approximate distance of each city from the nearest coast. Including it in the regression yields the following regression table:

Dependent variable is: JanTemp

R squared = 87.6%, R squared (adjusted) = 86.9%

s = 4.878 with 55 – 4 = 51 degrees of freedom

| Source | Sum of Squares | df | Mean Square | F-ratio |
|---|---|---|---|---|
| Regression | 8611.86 | 3 | 2870.62 | 121 |
| Residual | 1213.67 | 51 | 23.7974 | |

| Variable | Coefficient | SE(Coeff) | t-ratio | P-value |
|---|---|---|---|---|
| Intercept | 111.878 | 6.167 | 18.1 | 0.0001 |
| Lat | -2.47722 | 0.1307 | -19.0 | 0.0001 |
| Long | 0.221997 | 0.0462 | 4.81 | 0.0001 |
| Coast | -0.674929 | 0.0901 | -7.49 | 0.0001 |

And here is a scatterplot of the residuals:

Write a report on this regression. Interpret the coefficients and the $R^2$. Are the conditions met?

Answer: This regression model seems to be an improvement. The model accounts for 87.6% of the variability of January temperature and all the standard hypothesis tests on the coefficients are highly significant. The residuals plot shows no particular pattern of concern.
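The summary statistics reported with the regression in question 5 follow directly from the sums of squares in its ANOVA table. The sketch below recomputes them as a consistency check; the variable names are illustrative, and only numbers printed in the table are used.

```python
# Sums of squares and degrees of freedom from the ANOVA table in question 5.
ss_regression, df_regression = 8611.86, 3
ss_residual, df_residual = 1213.67, 51

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual  # n - 1 = 54 for 55 cities

# R squared: share of total variability accounted for by the model.
r_squared = ss_regression / ss_total

# Adjusted R squared compares mean squares rather than raw sums of squares.
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

# F-ratio: regression mean square over residual mean square.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

# s: standard deviation of the residuals.
s = (ss_residual / df_residual) ** 0.5

print(f"R^2 = {r_squared:.1%}, adjusted R^2 = {adj_r_squared:.1%}")
print(f"F = {f_ratio:.0f}, s = {s:.3f}")
```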

6) In a local school, vending machines offer a range of drinks from juices to sports drinks. The purchasing agent thinks each type of drink is equally favored among the students buying drinks from the machines. The recent purchasing choices from the vending machines are shown in the table.

a. Test an appropriate hypothesis to decide if the purchasing agent is correct. Give statistical evidence to support your conclusion.

b. Which type of drink impacted your decision the most? Explain what this means in the context of the problem.

Answer:

a. We want to know if the types of drinks are uniformly distributed (equally favored) among the students buying drinks from the machines.

H0: The types of drinks are uniformly distributed (equally favored) among the students buying drinks.

HA: The types of drinks are not uniformly distributed (equally favored) among the students buying drinks.

Conditions:

* Counted data: We have the counts from a sample of purchasing data.
* Randomization: We don’t want to make inferences about other high schools, so we don’t need to check this condition.
* Expected cell frequency: The null hypothesis expects 25% of the 680 drinks, or 170, should occur in each flavor. These expected values are all greater than 5, so the condition is satisfied.

Under these conditions, the sampling distribution of the test statistic is $\chi^2$ with 4 – 1 = 3 degrees of freedom. We will perform a chi-square goodness-of-fit test.

$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp} = \frac{(159 - 170)^2}{170} + \frac{(198 - 170)^2}{170} + \frac{(174 - 170)^2}{170} + \frac{(149 - 170)^2}{170} = 8.012$$

The P-value is the area in the upper tail of the $\chi^2$ model for 3 degrees of freedom above the computed $\chi^2$ value: $P = P(\chi^2 > 8.012) = 0.0458$.

The P-value of 0.0458 is low, so we reject the null hypothesis. These data show evidence that the drink flavors are not uniformly chosen (equally favored) by the students.

b. Kiwi Strawberry; its standardized residual is the largest, $\frac{Obs - Exp}{\sqrt{Exp}} = \frac{198 - 170}{\sqrt{170}} = 2.15$, and its $\chi^2$ component was 4.612. More students preferred the Kiwi Strawberry compared to what was expected by the purchasing agent. The purchasing agent should order more Kiwi Strawberry than other flavors.
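The goodness-of-fit calculation for the drink counts can be reproduced with the Python standard library. The sketch below is one way to do it, not the source's method: `chi2_sf` is a hypothetical helper that evaluates the upper-tail chi-square probability from its closed-form series for integer degrees of freedom, and the observed counts are listed in the order they appear in the worked answer.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum of (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Observed counts from the worked answer; 198 is Kiwi Strawberry.
observed = [159, 198, 174, 149]
expected = [sum(observed) / len(observed)] * len(observed)  # uniform null: 170 each

components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
std_residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

chi2 = sum(components)
p_value = chi2_sf(chi2, df=len(observed) - 1)

print(f"chi-square = {chi2:.3f}, P-value = {p_value:.4f}")
print("largest standardized residual:", round(max(std_residuals), 2))
```

Listing the components and standardized residuals separately makes it easy to see which category drives the result, which is what part b asks about.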


7) Could eye color be a warning signal for hearing loss in patients suffering from meningitis? British researcher Helen Cullington recorded the eye color of 130 deaf patients, and noted whether the patient’s deafness had developed following treatment for meningitis. Her data are summarized in the table below. Test an appropriate hypothesis and state your conclusion.

Answer:

H0: Deafness and eye color are independent.

HA: There is an association between deafness and eye color.

These are counts of categorical data, assumed to be representative of deaf patients in Britain, with expected counts (25.1, 76.9, 6.9, and 21.1) all at least 5. OK to do a $\chi^2$ test for independence, with df = 1.

$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp} = \frac{(30 - 25.1)^2}{25.1} + \frac{(72 - 76.9)^2}{76.9} + \frac{(2 - 6.9)^2}{6.9} + \frac{(26 - 21.1)^2}{21.1} = 5.87, \quad P = 0.015$$

Since P = 0.015 is low, I reject the null hypothesis. There is strong evidence that hearing loss is associated with eye color. It appears that people with dark-colored eyes are at less risk of suffering deafness from meningitis.

8) Height and weight Last fall, as our first example of correlation, we looked at the heights and weights of some AP* Statistics students. Here are the scatterplot, the residuals plot, a histogram of the residuals, and the regression analysis for the data we collected from the males. Use this information to analyze the association between heights and weights of teenage boys.

a. Is there an association? Write appropriate hypotheses.

b. Are the assumptions for regression satisfied? Explain.

c. What do you conclude?

d. Create a 95% confidence interval for the true slope.

e. Explain in context what your interval means.

Answer:

a. H0: There is no association between height and weight. HA: There is an association between height and weight.

b. The scatterplot looks straight enough, the residuals are random and display consistent spread, and the histogram of residuals looks roughly unimodal and symmetric.

c. Reject H0 because of the small P-value; there is strong evidence of an association between height and weight.

d. 7.30 ± 2.75

e. We are 95% confident that teenage boys gain an average of between 4.55 and 10.05 pounds per inch of height.

9) As part of a survey, students in a large statistics class were asked whether or not they ate breakfast that morning. The data appear in the following table:

Is there evidence that eating breakfast is independent of the student’s sex? Test an appropriate hypothesis. Give statistical evidence to support your conclusion.

Answer:

We want to know whether the categorical variables “eating breakfast” and “student’s sex” are statistically independent.

H0: Eating breakfast and student’s sex are independent.

HA: There is an association between eating breakfast and student’s sex.

Conditions:

* Counted data: We have the counts of individuals in categories of two categorical variables.
* Randomization: We have a convenience sample of students, but no reason to suspect bias.
* Expected cell frequency: The expected values (shown in parentheses in the table) are all greater than 5, so the condition is satisfied.

Under these conditions, the sampling distribution of the test statistic is $\chi^2$ with (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1 degree of freedom, and we will perform a chi-square test of independence.

$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp} = \frac{(66 - 76.169)^2}{76.169} + \frac{(66 - 55.831)^2}{55.831} + \frac{(125 - 114.83)^2}{114.83} + \frac{(74 - 84.169)^2}{84.169} = 5.339$$

The P-value is $P(\chi^2 > 5.339) = 0.0209$.

The P-value of 0.0209 is pretty small, so we reject the null hypothesis. There is evidence of an association between student’s sex and whether or not breakfast is eaten. It appears that females may be more likely to eat breakfast.
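The expected counts in this answer come from the row and column totals of the two-way table. The following sketch rebuilds them and the test statistic in plain Python. The 2×2 arrangement of the observed counts is an assumption inferred from the order of the terms in the worked answer, since the original table is not reproduced here, so rows and columns are left unlabeled.

```python
import math

# Observed counts in the order used in the worked answer.
# Which variable sits on the rows is an assumption; the statistic
# is the same either way.
observed = [[66, 66],
            [125, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total × column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, P-value = {p_value:.4f}")
```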

10) A manufacturing plant for recreational vehicles receives shipments from three different parts vendors. There has been a defect issue with some of the electrical wiring in the recreational vehicles manufactured at the plant. The plant manager wonders if all of the vendors might be contributing equally to the defect issue. The plant manager reviews a sample of quality assurance inspections from the last six months. The data are shown in the table below.

Test an appropriate hypothesis to decide if the plant manager is correct. Give statistical evidence to support your conclusion.

Answer:

We want to know whether all of the vendors might be contributing equally to the defect issue.

H0: The type of defects in vehicles made by the three vendors has the same distribution (is homogeneous).

HA: The type of defects in vehicles made by the three vendors does not have the same distribution (is not homogeneous).

Conditions:

* Counted data: We have the counts from a sample of quality assurance inspections from the last six months.
* Randomization: The data are from a sample of quality assurance inspections from the last six months.
* Expected cell frequency: The expected values (shown in parentheses in the table) are all greater than five.

Under these conditions, the sampling distribution of the test statistic is $\chi^2$ with (3 – 1)(3 – 1) = 4 degrees of freedom. We will perform a chi-square test of homogeneity.

$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp} = \frac{(53 - 57.69)^2}{57.69} + \frac{(48 - 51.51)^2}{51.51} + \dots = 7.40, \quad P = P(\chi^2 > 7.40) = 0.1161$$

The P-value of 0.1161 is rather high, so we fail to reject the null hypothesis. There is little statistical evidence to indicate that the types of defects vary by vendor.


11) Voter registration A random sample of 337 college students was asked whether or not they were registered to vote. We wonder if there is an association between a student’s sex and whether the student is registered to vote. The data are provided in the table below (expected counts are in parentheses). (All the conditions are satisfied – don’t worry about checking them.)

The calculated statistic is $\chi^2 = 0.249$.

a. Write appropriate hypotheses.

b. Suppose the expected values had not been given. Show exactly how to calculate the expected number of men who are registered to vote.

c. Show how to calculate the component of $\chi^2$ for the first cell.

d. How many degrees of freedom are there?

e. Find the P-value for this test.

f. State your complete conclusion in context.

Answer:

a. H0: Voter registration is independent of a student’s sex. HA: There is an association between voter registration and a student’s sex.

b. $\frac{251}{337} \times 137 = 102$

c. $\frac{(104 - 102)^2}{102}$

d. df = (2 – 1)(2 – 1) = 1

e. 0.618

f. Since the P-value of 0.618 is high, we fail to reject the null hypothesis. There is no evidence of an association between a student’s sex and whether the student is registered to vote.
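Parts b, c, and e of this answer can be checked numerically. The sketch below uses only the figures quoted in the answer; the names `margin_a` and `margin_b` are illustrative labels for the two marginal totals (251 and 137), since the table itself is not reproduced here.

```python
import math

n = 337                          # students in the sample
margin_a, margin_b = 251, 137    # the two marginal totals used in part b
observed_first_cell = 104        # observed count used in part c

# Part b: expected count = (row total × column total) / grand total.
expected = margin_a * margin_b / n

# Part c: the answer rounds the expected count to 102 before computing the component.
expected_rounded = round(expected)
component = (observed_first_cell - expected_rounded) ** 2 / expected_rounded

# Part e: with 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(0.249 / 2))

print(f"expected = {expected:.2f}, component = {component:.4f}, P-value = {p_value:.3f}")
```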

Of the 23 first year male students at State U. admitted from Jim Thorpe High School, 8 were offered baseball scholarships and 7 were offered football scholarships. The University admissions committee looked at the students’ composite ACT scores (shown in table), wondering if the University was lowering their standards for athletes. Assuming that this group of students is representative of all admitted students, what do you think?

Boxplots:


Normal Probability Plot:

12) Are the two sports teams’ mean ACT scores different?

Answer: To get a 95% confidence interval for the difference between the baseball and football players, we replace the t* critical value at $\alpha = 0.05$ with a t** value at $\alpha = 0.05/3 = 0.01667$. For 20 degrees of freedom, t** ≈ 2.61. The pooled standard deviation is $s_p = 2.79$ points. The mean ACT score is 23.375 for baseball players and 21.857 for football players, so the Bonferroni confidence interval for the difference is:

$$(23.375 - 21.857) \pm t^{**} \times s_p \sqrt{\frac{1}{n_B} + \frac{1}{n_F}} = 1.518 \pm 2.61 \times 2.79 \sqrt{\frac{1}{8} + \frac{1}{7}} \approx (-2.26,\ 5.28) \text{ points.}$$

Because the interval contains 0, we conclude that there is not sufficient evidence of a difference between the mean ACT scores of the two teams.


13) College admissions According to information from a college admissions office, 62% of the students there attended public high schools, 26% attended private high schools, 2% were home schooled, and the remaining students attended schools in other countries. Among this college’s Honors Graduates last year there were 47 who came from public schools, 29 from private schools, 4 who had been home schooled, and 4 students from abroad. Is there any evidence that one type of high school might better equip students to attain high academic honors at this college? Test an appropriate hypothesis and state your conclusion.

Answer:

H0: Distribution of school type among honors grads is the same as for the whole college.

HA: Distribution of school type among honors grads is different.

These are counts; we assume this group is representative of other years; after combining home schoolers and students from abroad as “other”, expected counts of 52.08, 21.84, and 10.08 are all at least 5. OK to do a chi-square goodness-of-fit test with 2 df.

$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp} = \frac{(47 - 52.08)^2}{52.08} + \frac{(29 - 21.84)^2}{21.84} + \frac{(8 - 10.08)^2}{10.08} = 3.27, \quad P = 0.195$$

With such a large P-value we do not reject the null hypothesis. There is no evidence that students who graduate with honors came from different high school backgrounds than others.
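The step of combining the two small categories before testing can be sketched in Python as follows. The dictionary keys are illustrative labels; the percentages and counts come from the question, and the P-value uses the closed form that holds for exactly 2 degrees of freedom.

```python
import math

# College-wide proportions from the question; "other" pools home-schooled
# students and students from abroad, as the answer does.
proportions = {"public": 0.62, "private": 0.26}
proportions["other"] = 1 - sum(proportions.values())

observed = {"public": 47, "private": 29, "other": 4 + 4}
n = sum(observed.values())

expected = {group: n * p for group, p in proportions.items()}

chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# With 2 degrees of freedom, P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

for g in observed:
    print(f"{g:8s} observed = {observed[g]:3d}  expected = {expected[g]:.2f}")
print(f"chi-square = {chi2:.2f}, P-value = {p_value:.3f}")
```

Pooling is what brings the smallest expected count up to 10.08; tested separately, the home-schooled category would expect only 2% of 84, or about 1.7 students, which fails the expected cell frequency condition.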

Of the 23 first year male students at State U. admitted from Jim Thorpe High School, 8 were offered baseball scholarships and 7 were offered football scholarships. The University admissions committee looked at the students’ composite ACT scores (shown in table), wondering if the University was lowering their standards for athletes. Assuming that this group of students is representative of all admitted students, what do you think?

Boxplots:


Normal Probability Plot:

14) Test an appropriate hypothesis and state your conclusion.

Answer: H0: $\mu_F = \mu_B = \mu_{NA}$ vs. HA: not all the means are equal. We assume these students are representative of all admissions. Scores for the groups are independent. Boxplots of the three groups show similar variance and no outliers. The Nearly Normal condition appears to be met from the Normal probability plot. With a P-value this low we reject the null hypothesis (even with this small sample size!). There is evidence that average composite ACT scores for the three groups are not the same.


Here are the data about the average January temperature in cities in the United States, and factors that might allow us to predict temperature. The data, available for 55 cities, include:

- Jantemp: Mean January temperature (degrees F)
- Lat: Latitude (degrees of latitude)
- Long: Longitude (degrees of longitude)
- Altitude: Altitude at the airport (where the temperatures have been recorded), in feet above sea level
- Coast: Approximate distance from the nearest seacoast, in kilometers

We will attempt to make a regression model to help account for mean January temperature and to understand the effects of the various predictors. At each step of the analysis you may assume that things learned earlier in the process are known.

Units Note: The “degrees” of temperature, given here on the Fahrenheit scale, have only a coincidental language relationship to the “degrees” of longitude and latitude. The geographic “degrees” are based on modeling the Earth as a sphere and dividing it up into 360 degrees for a full circle. Thus 180 degrees of longitude is halfway around the world from Greenwich, England (0°), and latitude increases from 0 degrees at the Equator to 90 degrees (North) at the North Pole.

Here are the data:


15) Here is the corresponding regression table:

Dependent variable is: JanTemp

R squared = 71.9%, R squared (adjusted) = 71.3%

s = 7.222 with 55 – 2 = 53 degrees of freedom

| Source | Sum of Squares | df | Mean Square | F-ratio |
|---|---|---|---|---|
| Regression | 7061.32 | 1 | 7061.32 | 135 |
| Residual | 2764.21 | 53 | 52.1549 | |

| Variable | Coefficient | SE(Coeff) | t-ratio | P-value |
|---|---|---|---|---|
| Intercept | 108.805 | 7.146 | 15.2 | 0.0001 |
| Lat | -2.11114 | 0.1814 | -11.6 | 0.0001 |

Write a brief report based on this regression. Explain in words and numbers what this equation says about the relationship between January temperature and latitude. Discuss the $R^2$ value and t-ratios.

Answer: January temperature falls about 2.11 degrees Fahrenheit per degree of latitude. The intercept of 108.8 is best interpreted as a starting value. 0° latitude is the Equator, where the temperature may get this high, but we cannot extrapolate from the data that far. 71.9% of the variability in January temperature is accounted for by a least squares linear regression on latitude.
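To see what the fitted equation in question 15 says in numbers, the sketch below evaluates it at a couple of latitudes. The coefficients are those in the regression table; the function name `predict_jantemp` and the example latitudes (30 and 40 degrees) are hypothetical choices for illustration, assumed to lie within the range of the 55 cities.

```python
# Coefficients from the regression table in question 15.
INTERCEPT = 108.805
SLOPE_LAT = -2.11114  # degrees F per degree of latitude

def predict_jantemp(latitude_deg):
    """Predicted mean January temperature (degrees F) at a given latitude."""
    return INTERCEPT + SLOPE_LAT * latitude_deg

# Hypothetical latitudes, not taken from the data set.
for lat in (30, 40):
    print(f"Lat {lat}: predicted JanTemp = {predict_jantemp(lat):.1f} F")

# Moving one degree north lowers the prediction by the slope.
one_degree_change = predict_jantemp(41) - predict_jantemp(40)
print(f"Change per degree of latitude: {one_degree_change:.2f} F")
```

Evaluating the function at 0 returns the intercept, which is the extrapolation to the Equator that the answer warns against.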

16) Car reliability A consumer group assigned 62 car models reliability ratings of 1 – 5 based upon repair records. They wondered if more expensive cars might be more reliable. To find out, they created the regression analysis shown. (SHOW WORK. Don’t bother writing hypotheses, and you may assume the assumptions for inference were all satisfied.)

a. df = ______, t =______, P = ______ b. State your conclusion.

Answer:

a. df = 60, t = 1.24, P = 0.11

b. Fail to reject (P > 0.05); we do not have evidence that expensive cars are more reliable.

17) Peanut M&Ms According to the Mars Candy Company, peanut M&M’s are 12% brown, 15% yellow, 12% red, 23% blue, 23% orange, and 15% green. On a Saturday when you have run out of statistics homework, you decide to test this claim. You purchase a medium bag of peanut M&M’s and find 39 browns, 44 yellows, 36 red, 78 blue, 73 orange, and 48 greens. Test an appropriate hypothesis and state your conclusion.

Answer:

We want to know if the distribution of colors in the bag matches the distribution stated by the Mars Candy Company.

H0: The distribution of colors in the bag matches the distribution stated by the Mars Candy Company.

HA: The distribution of colors in the bag does not match the distribution stated by the Mars Candy Company.

Conditions:

* Counted data: We have the counts of the number of peanut M&Ms of each color.
* Randomization: We will assume that each bag of peanut M&Ms represents a random sample of peanut M&Ms.
* Expected cell frequency: There are a total of 318 peanut M&Ms. The smallest percentage of any particular color is 12% (brown and red), and we expect 318(0.12) = 38.16. Since the smallest expected count exceeds 5, all expected counts will exceed 5, so the condition is satisfied.

Under these conditions, the sampling distribution of the test statistic is $\chi^2$ with 6 – 1 = 5 degrees of freedom, and we will perform a chi-square goodness-of-fit test.

$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp} = \frac{(39 - 38.16)^2}{38.16} + \frac{(44 - 47.7)^2}{47.7} + \dots = 0.7528$$

P-value = $P(\chi^2 > 0.7528) = 0.980$

A P-value this large says that if the distribution of colors in the bag matches the distribution stated by the Mars Candy Company, a chi-square value of 0.7528 or larger would happen about 98% of the time. Thus, we fail to reject the null hypothesis. These data do not show evidence that the distribution of colors in the bag differs from the distribution stated by the Mars Candy Company.
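The worked answer writes out only the first two terms of the sum. The sketch below computes all six expected counts and components from the percentages and counts given in the question; the dictionary layout and variable names are illustrative choices.

```python
# Claimed color percentages and observed counts from question 17.
claimed = {"brown": 0.12, "yellow": 0.15, "red": 0.12,
           "blue": 0.23, "orange": 0.23, "green": 0.15}
observed = {"brown": 39, "yellow": 44, "red": 36,
            "blue": 78, "orange": 73, "green": 48}

n = sum(observed.values())
expected = {color: n * p for color, p in claimed.items()}

# Expected cell frequency condition: every expected count should be at least 5.
condition_ok = min(expected.values()) >= 5

components = {color: (observed[color] - expected[color]) ** 2 / expected[color]
              for color in observed}
chi2 = sum(components.values())
df = len(observed) - 1

for color in observed:
    print(f"{color:7s} obs = {observed[color]:3d}  exp = {expected[color]:6.2f}  "
          f"component = {components[color]:.4f}")
print(f"chi-square = {chi2:.4f} with {df} degrees of freedom")
```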


18) Height and weight Is the height of a man related to his weight? The regression analysis from a sample of 26 men is shown. (Show work. Don’t write hypotheses. Assume the assumptions for inference were satisfied.)

a. How many degrees of freedom? b. What is the value of the t statistic? c. What is the P-value? d. State your conclusion in context.

Answer:

a. df = 26 – 2 = 24

b. $t = \frac{8.737}{1.312} = 6.659$

c. The P-value is $2 \times P(t_{24} > 6.659) < 0.0001$.

d. The P-value is very small, so we reject the null hypothesis. There is strong evidence that, on average, men’s weights are about 8.7 pounds higher for each additional inch in height.
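Parts a and b reduce to two small computations, sketched here with the values quoted in the answer. The variable names are illustrative; the slope and its standard error are the figures used in part b.

```python
n = 26              # men in the sample
slope = 8.737       # estimated slope, pounds per inch
se_slope = 1.312    # standard error of the slope

# Simple linear regression estimates two parameters (intercept and slope).
df = n - 2

# t statistic for testing a slope of zero.
t_stat = (slope - 0) / se_slope

print(f"df = {df}, t = {t_stat:.3f}")
```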

19) Student progress The Comprehensive Test of Basic Skills (CTBS) is used by school districts to assess student progress. Two of the areas tested are math and reading. A random sample of student results was reviewed to determine if there is an association between math and reading scores on the CTBS. Here are the scatterplot, the residuals plot, a histogram of the residuals, and the regression analysis of the data. Use this information to analyze the association between the math and reading scores on the CTBS.

a. Is there an association? Write appropriate hypotheses.

b. Are the assumptions for regression satisfied? Explain.

c. What do you conclude?

d. Create a 95% confidence interval for the true slope.

e. Explain in context what your interval means.

Answer:

a. H0: There is no association between Math and Reading CTBS scores; $\beta_1 = 0$. HA: There is an association between Math and Reading CTBS scores; $\beta_1 \neq 0$.

b. Conditions:

* Straight Enough Condition: There is no obvious bend in the scatterplot.
* Independence Condition: The residuals show no clear pattern.
* Does the Plot Thicken? Condition: The residual plot shows reasonably consistent spread.
* Nearly Normal Condition: A histogram of the residuals is unimodal and roughly symmetric.

c. The P-value is very small, so we reject the null hypothesis. There is strong evidence of a positive association between CTBS scores in Math and Reading.

d. A 95% confidence interval for $\beta_1$ is: $b_1 \pm t^*_{18} \times SE(b_1) = 0.866 \pm 2.101(0.1045)$, or (0.646, 1.086).

e. We are 95% confident that the Reading CTBS score will be higher, on average, by between 0.646 and 1.086 points for each additional point scored on the Math CTBS test.

20) Suppose you were asked to analyze each of the situations described below. (NOTE: DO NOT DO THESE PROBLEMS!) For each, indicate which inference procedure you would use (from the list), the test statistic (z, t, or $\chi^2$), and, if t or $\chi^2$, the number of degrees of freedom.

1. proportion, 1 sample
2. difference of proportions, 2 samples
3. mean, 1 sample
4. mean of differences, matched pairs
5. difference of means, independent samples
6. goodness of fit
7. homogeneity
8. independence
9. regression, inference for $\beta$

a. Doctors offer small candies to sixty teenagers, recording the number of candies consumed by each. One hour later they test the blood sugar level for each person. Is there any evidence that high blood sugar levels in teenagers are related to the amount of candy eaten?

b. Which takes less time to travel to work, car or train? We select a random sample of 45 businessmen and compare their travel time to work for both types of commute.

c. An orthodontist wonders if soda in the diet may be a factor in loose cement on children’s braces. She checks the cement bonds of 40 randomly selected patients who do not drink soda, and 40 patients who do drink soda.
