BIOSTATISTICS HOMEWORK
1. The following are body mass index (BMI) scores measured in 12 patients who are free of diabetes and participating in a study of risk factors for obesity. Body mass index is measured as the ratio of weight in kilograms to height in meters squared. Generate a 95% confidence interval estimate of the true mean BMI.
25 27 31 33 26 28 38 41 24 32 35 40
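A minimal Python sketch of the Problem 1 interval, assuming the usual small-sample approach: with n = 12, the t distribution with df = 11 is used. The critical value 2.201 is hard-coded from a standard t table rather than computed, to keep the example free of third-party packages.

```python
import math
import statistics

# BMI scores from Problem 1
bmi = [25, 27, 31, 33, 26, 28, 38, 41, 24, 32, 35, 40]

n = len(bmi)
mean = statistics.mean(bmi)   # sample mean
sd = statistics.stdev(bmi)    # sample SD (n - 1 denominator)
se = sd / math.sqrt(n)        # standard error of the mean

T_CRIT = 2.201                # t for 95% confidence, df = n - 1 = 11 (from a t table)
margin = T_CRIT * se
ci = (mean - margin, mean + margin)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```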
2. Consider the data in Problem 1. How many subjects would be needed to ensure that a 95% confidence interval estimate of BMI had a margin of error not exceeding 2 units?
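The standard sample-size formula for estimating one mean is n = (z·σ/E)², rounded up. The sketch below assumes that the sample SD from Problem 1 is an acceptable planning value for σ.

```python
import math
import statistics
from statistics import NormalDist

# BMI scores from Problem 1; their SD serves as the planning value for sigma
bmi = [25, 27, 31, 33, 26, 28, 38, 41, 24, 32, 35, 40]
sigma = statistics.stdev(bmi)

E = 2.0                            # desired margin of error (BMI units)
z = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence

n_required = math.ceil((z * sigma / E) ** 2)   # always round up to a whole subject
print(f"sigma = {sigma:.2f}, n = {n_required}")
```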
3. The mean BMI in patients free of diabetes was reported as 28.2. The investigator conducting the study described in Problem 1 hypothesizes that the mean BMI in patients free of diabetes is higher. Based on the data in Problem 1, is there evidence that the mean BMI is significantly higher than 28.2? Use a 5% level of significance.
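Because the alternative is directional (H1: μ > 28.2), this calls for an upper-tailed one-sample t test. This sketch assumes df = 11 and uses the tabled critical value 1.796 for α = 0.05.

```python
import math
import statistics

bmi = [25, 27, 31, 33, 26, 28, 38, 41, 24, 32, 35, 40]
mu0 = 28.2                      # reported mean BMI (null value)

n = len(bmi)
xbar = statistics.mean(bmi)
s = statistics.stdev(bmi)
t_stat = (xbar - mu0) / (s / math.sqrt(n))

T_CRIT = 1.796                  # upper-tail t, alpha = 0.05, df = 11 (from a t table)
reject_h0 = t_stat >= T_CRIT    # H1: mu > 28.2, so reject only in the upper tail
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```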
4. Peak expiratory flow (PEF) is a measure of a patient's ability to expel air from the lungs. Patients with asthma or other respiratory conditions often have restricted PEF. The mean PEF for children free of asthma is 306. An investigator wants to test whether children with chronic bronchitis have restricted PEF. A sample of 40 children with chronic bronchitis is studied, and their mean PEF is 279 with a standard deviation of 71. Is there statistical evidence of a lower mean PEF in children with chronic bronchitis? Run the appropriate test at α = 0.05.
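A sketch of a lower-tailed one-sample test for Problem 4, assuming the large-sample z approach because n = 40 ≥ 30. A t test with df = 39 (critical value about -1.685) leads to the same decision for these numbers.

```python
import math

mu0 = 306                  # mean PEF in children free of asthma
n, xbar, s = 40, 279, 71   # sample of children with chronic bronchitis

z = (xbar - mu0) / (s / math.sqrt(n))
Z_CRIT = -1.645            # lower-tail critical value, alpha = 0.05
reject_h0 = z <= Z_CRIT    # H1: mu < 306
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```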
5. Consider again the study in Problem 4. A different investigator conducts a second study to investigate whether mean PEF differs between children with chronic bronchitis and those without. Data on PEF are collected and summarized below. Based on the data, is there statistical evidence of a lower mean PEF in children with chronic bronchitis as compared to those without? Run the appropriate test at α = 0.05.
[Summary table not recovered. Surviving headings: Number of Children, Std Dev PEF; surviving row label: No Chronic Bronchitis.]
6. Using the data presented in Problem 5,
a) Construct a 95% confidence interval for the mean PEF in children without chronic bronchitis.
b) How many children would be required to ensure that the margin of error in (a) does not exceed 10 units?
7. A clinical trial is run to compare the effectiveness of an experimental drug in reducing preterm delivery with that of a drug considered standard care and with placebo. Pregnant women are enrolled and randomly assigned to receive either the experimental drug, the standard drug or placebo. Women are followed through delivery and classified as delivering preterm (< 37 weeks) or not. The data are shown below.
Is there a statistically significant difference in the proportions of women delivering preterm among the three treatment groups? Run the test at a 5% level of significance.
8. Using the data in Problem 7, generate a 95% confidence interval for the difference in proportions of women delivering preterm in the experimental and standard drug treatment groups.
9. Consider the data presented in Problem 7. Previous studies have shown that approximately 32% of women deliver prematurely without treatment. Is the proportion of women delivering prematurely significantly higher in the placebo group? Run the test at a 5% level of significance.
10. A study is run comparing HDL cholesterol levels between men who exercise regularly and those who do not. The data are shown below.
Generate a 95% confidence interval for the difference in mean HDL levels between men who exercise regularly and those who do not.
11. A clinical trial is run to assess the effects of different forms of regular exercise on HDL levels in persons between the ages of 18 and 29. Participants in the study are randomly assigned to one of three exercise groups (weight training, aerobic exercise, or stretching/yoga) and instructed to follow the program for 8 weeks. Their HDL levels are measured after 8 weeks and are summarized below.
Is there a significant difference in mean HDL levels among the exercise groups? Run the test at a 5% level of significance. HINT: SSerror = 7286.5.
12. Consider again the data in Problem 11. Suppose that in the aerobic exercise group we also measured the number of hours of aerobic exercise per week and the mean is 5.2 hours with a standard deviation of 2.1 hours. The sample correlation is -0.42.
a) Estimate the equation of the regression line that best describes the relationship between number of hours of exercise per week and HDL cholesterol level (Assume that the dependent variable is HDL level).
b) Estimate the HDL level for a person who exercises 7 hours per week.
c) Estimate the HDL level for a person who does not exercise.
13. The table below summarizes baseline characteristics of patients participating in a clinical trial.
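[Table partially recovered: the group (column) names and the percentage values were lost in extraction. Where two values appear, they are listed in the original column order.]
Mean (± SD) Age: 54 ± 4.5 | 53 ± 4.9
% Less than High School Education: (not recovered)
% Completing High School: (not recovered)
% Completing Some College: (not recovered)
Mean (± SD) Systolic Blood Pressure: 136 ± 13.8 | 134 ± 12.4
Mean (± SD) Total Cholesterol: 214 ± 24.9 | 210 ± 23.1
% Current Smokers: (not recovered)
% with Diabetes: (not recovered)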
a) Are there any statistically significant differences in baseline characteristics between treatment groups? Justify your answer.
b) Write the hypotheses and the test statistic used to compare ages between groups. (No calculations: just H0, H1, and the form of the test statistic.)
c) Write the hypotheses and the test statistic used to compare % females between groups. (No calculations: just H0, H1, and the form of the test statistic.)
d) Write the hypotheses and the test statistic used to compare educational levels between groups. (No calculations: just H0, H1, and the form of the test statistic.)
14. A study is designed to investigate whether there is a difference in response to various treatments in patients with rheumatoid arthritis. The outcome is the patient's self-reported effect of treatment. The data are shown below. Is there a significant difference in effect of treatment? Run the test at a 5% level of significance.
15. Using the data shown in Problem 14, suppose we focus on the proportions of patients who show improvement. Is there a statistically significant difference in the proportions of patients who show improvement between treatments 1 and 2? Run the test at a 5% level of significance.
16. An analysis is conducted to compare mean time to pain relief (measured in minutes) under four competing treatment regimens. Summary statistics on the four treatments are shown below.
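[Summary statistics table not recovered; surviving heading: Mean Time to Relief]
a) Complete the following ANOVA table.
[ANOVA table not recovered; surviving heading: Source of Variation]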
b) Write the hypotheses to be tested.
c) Write the decision rule.
d) What is the conclusion?
17. The following data were collected in a clinical trial to compare a new drug to a placebo for its effectiveness in lowering total serum cholesterol. Generate a 95% confidence interval for the difference in mean total cholesterol levels between treatments.
[Data table not recovered; surviving row headings: Mean (SD) Total Serum Cholesterol; % Patients with Total Cholesterol < 200]
18. Using the data in Problem 17,
a) Generate a 95% confidence interval for the proportion of all patients with total cholesterol < 200.
b) How many patients would be required to ensure that a 95% confidence interval has a margin of error not exceeding 5%?
19. A small pilot study is conducted to investigate the effect of a nutritional supplement on total body weight. Six participants agree to take the nutritional supplement. To assess its effect on body weight, weights are measured before starting the supplementation and then after 6 weeks. The data are shown below. Is there a significant increase in body weight following supplementation? Run the test at a 5% level of significance.
[Data table not recovered; surviving heading: Weight after 6 Weeks]
20. The following table was presented in an article summarizing a study to compare a new drug to a standard drug and to a placebo.
[Table not recovered; surviving row headings: Annual Income, $000s; % with Insurance]
*Table entries are Mean (SD) or %.
a) Are there any statistically significant differences in the characteristics shown among the treatments? Justify your answer.
b) Consider the test for differences in age among treatments. Write the hypotheses and the formula of the test statistic used. (No computations required: formula only.)
c) Consider the test for differences in insurance coverage among treatments. Write the hypotheses and the formula of the test statistic used. (No computations required: formula only.)
d) Consider the test for differences in disease stage among treatments. Write the hypotheses and the formula of the test statistic used. (No computations required: formula only.)
21. A small pilot study is run to compare a new drug for chronic pain to one that is currently available. Participants are randomly assigned to receive either the new drug or the currently available drug and report improvement in pain on a 5-point ordinal scale: 1=Pain is much worse, 2=Pain is slightly worse, 3=No change, 4=Pain improved slightly, 5=Pain much improved. Is there a significant difference in self-reported improvement in pain? Use the Mann-Whitney U test with a 5% level of significance.
New Drug: 4 5 3 3 4 2
Standard Drug: 2 3 4 1 2 3
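A sketch of the Mann-Whitney U computation for Problem 21: pool both samples, rank them with tied values receiving the average of their ranks, then compute U for each group and keep the smaller. The critical value 5 (n1 = n2 = 6, two-sided α = 0.05) is hard-coded from a standard Mann-Whitney table, and the sketch assumes no tie correction is applied.

```python
# Self-reported pain improvement scores from Problem 21
new_drug = [4, 5, 3, 3, 4, 2]
standard = [2, 3, 4, 1, 2, 3]

def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    order = sorted(values)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1          # 1-based position of first occurrence
        last = first + order.count(v) - 1   # 1-based position of last occurrence
        ranks[v] = (first + last) / 2
    return ranks

n1, n2 = len(new_drug), len(standard)
ranks = average_ranks(new_drug + standard)
r1 = sum(ranks[v] for v in new_drug)   # rank sum, new drug
r2 = sum(ranks[v] for v in standard)   # rank sum, standard drug

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

U_CRIT = 5                 # two-sided alpha = 0.05, n1 = n2 = 6 (from a table)
reject_h0 = u <= U_CRIT    # small U is evidence against H0
print(f"R1 = {r1}, R2 = {r2}, U = {u}, reject H0: {reject_h0}")
```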
22. Answer True or False to each of the following
a) The margin of error is always greater than or equal to the standard error.
b) If a test is run and p = 0.0356, then we can reject H0 at α = 0.01.
c) If a 95% CI for the difference in two independent means is (-4.5 to 2.1), then the point estimate is -2.1.
d) If a 95% CI for the difference in two independent means is (2.1 to 4.5), there is no significant difference in means.
e) If a 90% CI for the mean is (75.3 to 80.9), we would reject H0: μ = 70 in favor of H1: μ ≠ 70 at α = 0.05.
23. A randomized controlled trial is run to evaluate the effectiveness of a new drug for asthma in children. A total of 250 children are randomized to either the new drug or placebo (125 per group). The mean age of children assigned to the new drug is 12.4 with a standard deviation of 3.6 years. The mean age of children assigned to the placebo is 13.0 with a standard deviation of 4.0 years. Is there a statistically significant difference in ages of children assigned to the treatments? Run the appropriate test at a 5% level of significance.
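A sketch of the two-sample test for Problem 23, assuming the large-sample z approach (125 per group) and a pooled SD. Pooling seems reasonable here because the ratio of the sample SDs (4.0/3.6) is close to 1.

```python
import math

# Summary statistics from Problem 23
n1, mean1, sd1 = 125, 12.4, 3.6   # new drug
n2, mean2, sd2 = 125, 13.0, 4.0   # placebo

# Pooled standard deviation (assumes roughly equal population variances)
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
z = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

Z_CRIT = 1.96                  # two-sided, alpha = 0.05
reject_h0 = abs(z) >= Z_CRIT
print(f"Sp = {sp:.2f}, z = {z:.2f}, reject H0: {reject_h0}")
```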
24. Consider again the randomized controlled trial described in Problem 23. Suppose that there are 63 boys assigned to the new drug group and 58 boys assigned to the placebo. Is there a statistically significant difference in the proportions of boys assigned to the treatments? Run the appropriate test at a 5% level of significance.
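A sketch of the two-sample z test for proportions, using the 125 children per group stated in the trial description. Under H0: p1 = p2, the standard error uses the pooled proportion.

```python
import math

n1, x1 = 125, 63   # boys assigned to the new drug
n2, x2 = 125, 58   # boys assigned to placebo

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # overall proportion under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

reject_h0 = abs(z) >= 1.96       # two-sided, alpha = 0.05
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```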
25. A clinical trial is run to evaluate the effectiveness of a new drug to prevent preterm delivery. A total of n=250 pregnant women agree to participate and are randomly assigned to receive either the new drug or a placebo and followed through the course of pregnancy. Among 125 women receiving the new drug, 24 deliver preterm and among 125 women receiving the placebo, 38 deliver preterm. Construct a 95% confidence interval for the difference in proportions of women who deliver preterm.
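A sketch of the large-sample confidence interval for a difference in proportions. Unlike the hypothesis test, the interval uses the unpooled standard error. The difference is taken here as new drug minus placebo, which is an arbitrary ordering choice.

```python
import math

n1, x1 = 125, 24   # new drug: preterm deliveries
n2, x2 = 125, 38   # placebo: preterm deliveries

p1, p2 = x1 / n1, x2 / n2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled SE for a CI
diff = p1 - p2
margin = 1.96 * se
ci = (diff - margin, diff + margin)
print(f"difference = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```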
26. “Average adult Americans are about one inch taller, but nearly a whopping 25 pounds heavier than they were in 1960, according to a new report from the Centers for Disease Control and Prevention (CDC). The bad news, says CDC is that average BMI (body mass index, a weight-for-height formula used to measure obesity) has increased among adults from approximately 25 in 1960 to 28 in 2002.” Boston is considered one of America’s healthiest cities. Is the weight gain since 1960 similar in Boston? A sample of n=25 adults suggested a mean increase of 17 pounds with a standard deviation of 8.6 pounds. Is Boston statistically significantly different in terms of weight gain since 1960? Run the appropriate test at a 5% level of significance.
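A sketch of a two-sided one-sample t test for Problem 26. It assumes the national gain of 25 pounds from the CDC quote is the null value μ0 and uses df = 24 with the tabled critical value 2.064.

```python
import math

mu0 = 25                   # assumed null value: national mean weight gain (pounds)
n, xbar, s = 25, 17, 8.6   # Boston sample

t_stat = (xbar - mu0) / (s / math.sqrt(n))
T_CRIT = 2.064             # two-sided t, alpha = 0.05, df = 24 (from a t table)
reject_h0 = abs(t_stat) >= T_CRIT   # H1: mu != 25
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```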
27. In 2007, the CDC reported that approximately 6.6 per 1000 (0.66%) children were affected with autism spectrum disorder. A sample of 900 children from Boston are tested and 7 are diagnosed with autism spectrum disorder. Is the proportion of children affected with autism spectrum disorder higher in Boston as compared to the national estimate? Run the appropriate test at a 5% level of significance.
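A sketch of an upper-tailed one-sample z test for a proportion. The normal approximation is usually considered adequate when n·p0 and n·(1 − p0) are both at least 5. Here n·p0 is about 5.94, which is borderline, so the result is approximate.

```python
import math

p0 = 0.0066     # national prevalence (6.6 per 1000)
n, x = 900, 7   # Boston sample

# Rough check of the normal approximation: both counts should be at least 5
print(f"n*p0 = {n * p0:.2f}, n*(1-p0) = {n * (1 - p0):.2f}")

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
reject_h0 = z >= 1.645   # H1: p > 0.0066, alpha = 0.05
print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, reject H0: {reject_h0}")
```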
28. A clinical trial is being planned to investigate the effect of a new experimental drug designed to reduce total serum cholesterol. Investigators will enroll participants with total cholesterol levels between 200 and 240; participants will be randomized to receive the new drug or a placebo and followed for 2 months, and then their total cholesterol will be measured. Investigators plan to run a test of hypothesis and want 80% power to detect a difference of 10 points in mean total cholesterol levels between groups. They assume that 10% of the participants randomized will be lost over the 2-month follow-up. How many participants must be enrolled in the study? Assume that the standard deviation of total cholesterol is 18.5.
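This sketch applies the common formula for two independent means, n_i = 2((z_{1-α/2} + z_{1-β}) / ES)², where ES = difference / σ, and then inflates each group for expected dropout. It assumes a two-sided test at α = 0.05, because the problem does not state α.

```python
import math
from statistics import NormalDist

sigma = 18.5       # assumed SD of total cholesterol
diff = 10          # difference in means to detect
alpha, power = 0.05, 0.80
dropout = 0.10     # expected loss to follow-up

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_beta = NormalDist().inv_cdf(power)            # about 0.84
effect_size = diff / sigma

# Participants needed per group to complete the study
n_per_group = math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)
# Inflate for dropout; round() guards against floating-point noise before ceil
n_enrolled_per_group = math.ceil(round(n_per_group / (1 - dropout), 6))
total_enrolled = 2 * n_enrolled_per_group
print(f"per group: {n_per_group} completers, {n_enrolled_per_group} enrolled; total {total_enrolled}")
```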
29. An observational study is conducted to investigate the association between age and total serum cholesterol. The correlation is estimated at r = 0.35. The study involves n=125 participants and the mean (std dev) age is 44.3 (10.0) years with an age range of 35 to 55 years, and mean (std dev) total cholesterol is 202.8 (38.4).
a) Estimate the equation of the line that best describes the association between age (as the independent variable) and total serum cholesterol.
b) Estimate the total serum cholesterol for a 50-year-old person.
c) Estimate the total serum cholesterol for a 70-year-old person.
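The least-squares line can be recovered from summary statistics alone: slope b1 = r·(s_y / s_x) and intercept b0 = ȳ − b1·x̄. The sketch below uses the Problem 29 values. The prediction at age 70 is an extrapolation beyond the observed 35-55 age range, so it should be read with caution.

```python
# Summary statistics from Problem 29 (x = age, y = total cholesterol)
r = 0.35
x_mean, x_sd = 44.3, 10.0
y_mean, y_sd = 202.8, 38.4

b1 = r * y_sd / x_sd        # slope
b0 = y_mean - b1 * x_mean   # intercept: the line passes through (x_mean, y_mean)

def predict(age):
    """Predicted total cholesterol at a given age from the fitted line."""
    return b0 + b1 * age

print(f"y-hat = {b0:.2f} + {b1:.3f} * age")
print(f"age 50: {predict(50):.1f}")
print(f"age 70: {predict(70):.1f} (outside the observed 35-55 range)")
```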
30. For each statement below, indicate whether the statement is true or false.
a) In logistic regression, the predictors are dichotomous, and the outcome is a continuous variable.
b) When calculating a correlation coefficient between two continuous variables, the scales on which the variables are measured affect the value of the correlation coefficient.
c) It is more difficult to reject a null hypothesis if we use a 10% level of significance compared with a 5% level of significance.
d) The sample size required to detect an effect size of 0.25 is larger than the sample size required to detect an effect size of 0.50 with 80% power and a 5% level of significance.
31. For each question below, provide a brief (1-2 sentences) response.
a) How is the slope coefficient (b1) in a simple linear regression different from the coefficient (b1) in a multiple linear regression model?
b) When would a survival analysis model be used instead of a logistic regression model?
c) What is the appropriate statistical test to assess whether there is an association between obesity status (normal weight, overweight, obese) and 5-year incident cardiovascular disease (CVD)? Suppose each participant's obesity status (category) is known, as is whether or not they develop CVD over the next 5 years.
32. An observational study is conducted to compare the experiences of men and women aged 50-59 years following coronary artery bypass surgery. Participants undergo the surgery and are followed until death, loss to follow-up, or 30 years, whichever comes first. The table below gives the years of death or years of last contact for the participating men and women.
[Data table not recovered; surviving headings (for men and for women): Year of Death; Year of Last Contact]
a) Estimate the survival functions for men and for women using the Kaplan-Meier approach.
b) Test whether there is a significant difference in survival between men and women using the log-rank test and a 5% level of significance.