Bootstrap Hypothesis Testing in R with Example | R Video Tutorial 4.4 | MarinStatsLectures
TLDR: In this educational video, Mike Marin introduces a bootstrap approach to hypothesis testing for comparing a numeric variable between two groups using the R programming language. The tutorial covers importing a dataset, visualizing the data with box plots, and calculating test statistics such as the difference in means and the difference in medians. The focus is on implementing a bootstrap method with 10,000 resamples to estimate p-values, providing an alternative to the traditional two-sample t-test and to non-parametric tests. The script also touches on the importance of considering both statistical and scientific significance in data analysis.
Takeaways
- The video is a tutorial on implementing a bootstrap approach to hypothesis testing in R, comparing a numeric variable across two groups as an alternative to the two-sample t-test.
- The dataset used in the video compares weight gain for chicks on two different diets, casein and meat meal, with small samples of 12 and 11 observations, respectively.
- The video includes a side-by-side box plot to provide a preliminary comparison of weight gain between the two diets.
- Two test statistics are calculated: the absolute value of the difference in mean weights and the absolute value of the difference in median weights for the two diets.
- The script demonstrates how to calculate these test statistics using R functions such as 'mean', 'median', 'with', and 'tapply'.
- Bootstrapping involves resampling with replacement to create new datasets and recalculating the test statistics many times to build a sampling distribution.
- The video explains how to set up a bootstrap analysis in R, including setting a seed for reproducibility and defining parameters such as the number of observations and the number of bootstrap resamples.
- Calculating the bootstrap test statistics involves creating a matrix of resampled data and then using a loop to calculate the test statistics for each resample.
- The p-value is discussed as the probability of observing the test statistic, or a more extreme one, if the null hypothesis is true; it is estimated from the bootstrap test statistics.
- The script includes a reminder of the difference between statistical significance and scientific significance, emphasizing the importance of context over a strict p-value cutoff.
- The video concludes with a teaser for a follow-up video that will explore using a bootstrap approach to build confidence intervals for the differences in means and medians.
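The two observed test statistics can be sketched as follows. This is a minimal illustration, not the video's own script: it assumes that R's built-in `chickwts` data (restricted to the casein and meat meal feeds) can stand in for the imported file, and the names `d`, `test_stat1`, and `test_stat2` are chosen here for illustration.

```r
# Assumption: the built-in chickwts data stands in for the imported dataset.
# Keep only the two diets of interest and drop the unused factor levels.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

# Group sizes for the two diets
table(d$feed)

# Group means and medians, computed with with() and tapply()
group_means   <- with(d, tapply(weight, feed, mean))
group_medians <- with(d, tapply(weight, feed, median))

# Observed test statistics: absolute differences between the two groups
test_stat1 <- abs(unname(diff(group_means)))    # difference in means
test_stat2 <- abs(unname(diff(group_medians)))  # difference in medians

test_stat1
test_stat2
```

Taking the absolute value makes both statistics two-sided: only the size of the difference matters, not which diet is larger.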
Q & A
What is the main topic of the video by Mike Marin?
-The video discusses implementing a bootstrap approach to hypothesis testing for comparing a numeric variable across two groups using the R programming language, specifically as an alternative to the two-sample t-test.
Why does Mike Marin suggest watching another video first?
-Mike Marin suggests watching another video first because it explains in more detail the concept and the general approach to the test that is being demonstrated in the current video.
What dataset is used in the video for demonstration purposes?
-The dataset used in the video compares weight gain for chicks on two different diets: casein and meat meal.
How many observations are there in each of the two feed types in the dataset?
-There are 12 observations on the casein diet and 11 on the meat meal diet.
What are the two test statistics calculated in the video?
-The two test statistics calculated are the absolute value of the difference in mean weight for the two diets and the absolute value of the difference in median weights for the two diets.
What is the purpose of setting a seed in the R script as demonstrated in the video?
-Setting a seed in the R script allows for reproducibility of results, ensuring that the same random data is generated each time the code is run.
What is the significance of the number of bootstrap resamples (B) chosen in the video?
-The number of bootstrap resamples (B) determines how many resampled datasets are generated, and therefore how many bootstrap test statistics are available to approximate the sampling distribution. Using 10,000 resamples gives a reasonably stable estimate of that distribution and of the resulting p-values.
How does the video demonstrate the calculation of the bootstrap test statistics?
-The video demonstrates the calculation by using a loop to calculate the test statistic for each of the 10,000 bootstrap resamples and storing the results in vectors.
What is the definition of a p-value as explained in the video?
-The p-value is defined as the probability of getting the observed test statistic or one more extreme, assuming the null hypothesis is true.
How does the video address the difference between statistical significance and scientific significance?
-The video reminds viewers that a p-value should not be rigidly used as the sole criterion for decision-making and that even if a difference is not statistically significant, there may still be evidence worth exploring further.
What additional analysis is suggested in the video for further exploration?
-The video suggests exploring the construction of confidence intervals for the difference in means and medians using a bootstrap approach as a follow-up to the hypothesis testing demonstrated.
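The roles of the seed and of B discussed in the answers above can be illustrated with a small sketch. It assumes the built-in `chickwts` data as a stand-in for the video's dataset; the seed value and the tiny B are arbitrary choices made here so the result is easy to inspect.

```r
# Assumption: weights for the two diets taken from the built-in chickwts data.
weights <- chickwts$weight[chickwts$feed %in% c("casein", "meatmeal")]

n <- length(weights)  # number of observations
B <- 5                # tiny value for illustration; the video uses 10,000

# Same seed before each call, so both calls draw the same resamples
set.seed(2024)        # arbitrary seed, not the one used in the video
first_run <- matrix(sample(weights, size = n * B, replace = TRUE),
                    nrow = n, ncol = B)

set.seed(2024)
second_run <- matrix(sample(weights, size = n * B, replace = TRUE),
                     nrow = n, ncol = B)

identical(first_run, second_run)  # TRUE: results are reproducible
dim(first_run)                    # n rows, B columns (one column per resample)
```

Increasing B adds columns, that is, more resampled datasets; it does not change the size of each resample, which stays at n.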
Outlines
Introduction to Bootstrap Hypothesis Testing in R
In this video, Mike Marin introduces a bootstrap approach to hypothesis testing for comparing a numeric variable across two groups using the R programming language. He suggests watching a previous video for a detailed explanation of the concept and approach. The dataset used involves weight gain in chicks on two different diets: casein and meat meal. The video provides a visual comparison through side-by-side box plots and discusses the calculation of two test statistics: the absolute difference in mean weights and median weights. The script includes an alternative test statistic using the 90th percentile of weights. The video also touches on classical hypothesis testing methods such as the two-sample t-test, Wilcoxon test, and Kolmogorov-Smirnov test.
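As a rough sketch of the exploratory plot, the classical tests, and the percentile-based statistic mentioned above, assuming the built-in `chickwts` data as a stand-in for the imported file (object names such as `t_res` and `test_stat_q90` are illustrative, not taken from the video):

```r
# Assumption: built-in chickwts data, restricted to the two diets.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

# Side-by-side box plots of weight for the two feed types
boxplot(weight ~ feed, data = d, las = 1,
        xlab = "Feed", ylab = "Weight",
        main = "Chick weight by feed type")

# Classical alternatives mentioned in the video
t_res  <- t.test(weight ~ feed, data = d)       # two-sample t-test (Welch by default)
w_res  <- wilcox.test(weight ~ feed, data = d)  # Wilcoxon rank-sum / Mann-Whitney U
ks_res <- ks.test(d$weight[d$feed == "casein"],
                  d$weight[d$feed == "meatmeal"])  # Kolmogorov-Smirnov two-sample

c(t = t_res$p.value, wilcoxon = w_res$p.value, ks = ks_res$p.value)

# Alternative test statistic: absolute difference in 90th percentiles
q90 <- with(d, tapply(weight, feed, quantile, probs = 0.9))
test_stat_q90 <- abs(unname(diff(q90)))
test_stat_q90
```

A statistic such as the difference in 90th percentiles has no standard textbook test, which is one reason a resampling approach is attractive here.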
Setting Up for Bootstrap Resampling in R
The script proceeds with setting up the bootstrap resampling process in R. It starts by setting a seed for reproducibility and defining parameters such as the number of observations and the number of bootstrap resamples. The variable for resampling is generalized to allow for easy adaptation to different datasets. The code generates 10,000 bootstrap resample datasets with replacement, creating a matrix of 23 rows and 10,000 columns, where each column represents a resample. The script then calculates the test statistics for each resample, initially using a loop to calculate the mean for casein and meat meal observations and storing the absolute differences in means and medians in separate vectors.
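The setup described above might look something like the following. This is a sketch under stated assumptions rather than the video's exact code: the built-in `chickwts` data replaces the imported file, the seed is arbitrary, and the rows are sorted so that the first 12 positions are treated as casein and the remaining 11 as meat meal.

```r
# Assumption: built-in chickwts data, sorted so casein rows come first.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
d <- d[order(d$feed), ]

set.seed(112358)          # arbitrary seed, for reproducibility only
n <- nrow(d)              # number of observations (23)
B <- 10000                # number of bootstrap resamples
variable <- d$weight      # the variable to resample; swap in another to reuse the code
n_casein <- sum(d$feed == "casein")

# Resample the pooled weights with replacement: n rows, B columns,
# each column being one bootstrap resample dataset.
boot_samples <- matrix(sample(variable, size = n * B, replace = TRUE),
                       nrow = n, ncol = B)

# Loop over the resamples and store both test statistics
boot_stat1 <- numeric(B)  # absolute difference in means
boot_stat2 <- numeric(B)  # absolute difference in medians
for (i in seq_len(B)) {
  casein_i   <- boot_samples[1:n_casein, i]
  meatmeal_i <- boot_samples[(n_casein + 1):n, i]
  boot_stat1[i] <- abs(mean(casein_i) - mean(meatmeal_i))
  boot_stat2[i] <- abs(median(casein_i) - median(meatmeal_i))
}

dim(boot_samples)
head(round(boot_stat1, 2))
```

Because every resample is drawn from the pooled weights, the group labels carry no information within a resample, which mimics the null hypothesis of no difference between the diets.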
Calculating and Interpreting Bootstrap P-Values
The final part of the script involves calculating the p-values for the two test statistics using the bootstrap resamples. Each p-value is estimated as the proportion of bootstrap test statistics, generated under the null hypothesis, that are at least as extreme as the observed test statistic. The video demonstrates how to calculate this proportion in R and interprets the results: a p-value of approximately 8% for test statistic one indicates that, if there were no difference in mean weights, a difference at least this large would occur by chance about 8% of the time. The video also discusses the p-value for test statistic two, which is around 26.3%. It concludes by emphasizing the difference between statistical and scientific significance and the importance of considering confidence intervals alongside hypothesis tests. The script includes additional code for plotting the sampling distribution of the bootstrap test statistics and for exploring hypothesis tests using different percentiles.
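A compact sketch of the p-value step follows. It again assumes the built-in `chickwts` data and an arbitrary seed, and it uses a vectorized calculation in place of a loop, so the estimates will be close to, but not necessarily identical to, the figures quoted from the video.

```r
# Assumption: built-in chickwts data stands in for the imported dataset.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
is_casein <- d$feed == "casein"

# Observed test statistics
obs_stat1 <- abs(mean(d$weight[is_casein]) - mean(d$weight[!is_casein]))
obs_stat2 <- abs(median(d$weight[is_casein]) - median(d$weight[!is_casein]))

# Bootstrap resamples of the pooled weights (null hypothesis: no difference)
set.seed(1)               # arbitrary seed
n  <- nrow(d)
n1 <- sum(is_casein)
B  <- 10000
boot <- matrix(sample(d$weight, size = n * B, replace = TRUE), nrow = n, ncol = B)

top    <- boot[1:n1, , drop = FALSE]        # treated as the casein group
bottom <- boot[(n1 + 1):n, , drop = FALSE]  # treated as the meat meal group

boot_stat1 <- abs(colMeans(top) - colMeans(bottom))
boot_stat2 <- abs(apply(top, 2, median) - apply(bottom, 2, median))

# p-value: proportion of bootstrap statistics at least as extreme as observed
p_value1 <- mean(boot_stat1 >= obs_stat1)
p_value2 <- mean(boot_stat2 >= obs_stat2)
c(means = p_value1, medians = p_value2)

# Sampling distribution of the first statistic, with the observed value marked
hist(boot_stat1, breaks = 50, main = "Bootstrap distribution of |difference in means|",
     xlab = "Bootstrap test statistic")
abline(v = obs_stat1, lwd = 2, col = "red")
```

Taking `mean()` of a logical vector gives the proportion of `TRUE` values, which is why a single line is enough to estimate each p-value.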
Keywords
Bootstrap
Hypothesis Testing
R Programming Language
Two-Sample T-Test
Wilcoxon or Mann-Whitney U Test
Kolmogorov-Smirnov Two-Sample Test
Test Statistic
P-Value
Confidence Interval
Statistical Significance
Feed Types
Highlights
Introduction to implementing a bootstrap approach for hypothesis testing in the R programming language as an alternative to the two-sample t-test.
Recommendation to watch a separate video for a detailed explanation of the concept and approach to the test.
Importing a dataset to compare weight gain for chicks on two different diets without attaching the data.
Visual comparison of weight gain for two different feed types using side-by-side box plots.
Calculation of the first test statistic: the absolute value of the difference in mean weight for two diets.
Calculation of the second test statistic: the absolute value of the difference in median weights for two diets.
Explanation of more advanced ways of using R to produce the same results with the 'with' command and 'tapply'.
Setting a seed for reproducibility in the bootstrapping approach.
Using n for the number of observations and B for the number of bootstrap resamples.
Resampling with replacement to generate bootstrap resample datasets.
Calculating test statistics for each bootstrap resample using a loop for transparency.
Observed test statistics: absolute difference in means and medians.
Classical approaches to hypothesis testing: two-sample t-test, Wilcoxon, and Kolmogorov-Smirnov tests.
Bootstrapping approach to calculate p-values for test statistics.
Interpretation of p-values in the context of hypothesis testing.
Difference between statistical significance and scientific significance with a reminder not to be rigid with the 5% cutoff.
Importance of exploring further even when a result is not statistically significant, particularly when sample sizes are small.
Inclusion of code for plotting the sampling distribution of bootstrap test statistics.
Upcoming video on using a bootstrap approach to build confidence intervals for the difference in means and medians.