Permutation Hypothesis Test in R with Examples | R Tutorial 4.6 | MarinStatsLectures
TLDR
In this educational video, Mike Marin shows how to implement a permutation test that compares a numeric variable between two groups, as an alternative to the independent two-sample t-test or the Mann-Whitney U test. Working in R, the tutorial covers the idea behind permutation testing, data exploration, and the calculation of test statistics based on the difference in means and the difference in medians. It then shows how to generate permutation datasets, estimate p-values, and interpret the results, stressing that both statistical and clinical significance should be considered in hypothesis testing.
Takeaways
- The video discusses implementing a permutation test to compare a numeric variable between two groups using R.
- The permutation test is an alternative to the independent two-sample t-test and the Mann-Whitney U test (Wilcoxon rank-sum test).
- The video recaps the permutation test concept and applies it to a specific dataset, with links to related videos and resources.
- The dataset compares weight gain after six weeks for chicks on two different diets and has two variables: weight and feed type.
- Two test statistics are used for demonstration: the absolute difference in mean weight and the absolute difference in median weight between the two feed types.
- The video calculates these test statistics with R commands, starting from the mean and median weight for each feed type.
- The permutation test involves setting a seed for reproducibility, initializing a matrix to hold the permutation samples, and generating 100,000 permutation datasets.
- Both test statistics are recalculated for each permutation sample.
- The p-value is estimated as the proportion of permutation test statistics that are at least as extreme as the observed test statistic.
- The video emphasizes the difference between statistical significance and scientific or clinical significance, cautioning against relying solely on p-values.
- Additional resources include code for plotting the sampling distribution and for reshuffling labels, as well as a discussion of the limitations of permutation tests for constructing confidence intervals.
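The takeaways mention code that reshuffles the feed-type labels instead of the weights. The two approaches are interchangeable: applying a permutation to the labels yields the same weight-to-diet pairing as applying the inverse permutation to the weights, so the test statistic comes out identical. A minimal Python sketch of that check follows; the video itself works in R, and the function names and toy data here are hypothetical.

```python
import random
import statistics


def abs_mean_diff(weights, labels):
    """Absolute difference in mean weight between the casein and meatmeal groups."""
    casein = [w for w, g in zip(weights, labels) if g == "casein"]
    meatmeal = [w for w, g in zip(weights, labels) if g == "meatmeal"]
    return abs(statistics.mean(casein) - statistics.mean(meatmeal))


def stat_shuffling_labels(weights, labels, perm):
    """Keep the weights fixed; observation i receives label perm[i]."""
    return abs_mean_diff(weights, [labels[k] for k in perm])


def stat_shuffling_weights(weights, labels, perm):
    """Keep the labels fixed; reorder the weights by the inverse permutation."""
    inverse = [0] * len(perm)
    for i, k in enumerate(perm):
        inverse[k] = i
    return abs_mean_diff([weights[k] for k in inverse], labels)


# Hypothetical toy data, not the chick weights from the video.
weights = [10, 12, 20, 22, 30]
labels = ["casein", "casein", "meatmeal", "meatmeal", "meatmeal"]

rng = random.Random(2024)
perm = list(range(len(weights)))
rng.shuffle(perm)
print(stat_shuffling_labels(weights, labels, perm) == stat_shuffling_weights(weights, labels, perm))  # True
```

Because both versions assign each weight to the same diet, either one can be used to build the permutation distribution.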
Q & A
What is the main topic of the video by Mike Marin?
-The video discusses implementing a permutation test approach to compare a numeric variable for two groups using statistical software, as an alternative to the independent two-sample t-test or the Mann-Whitney U test.
What are the two test statistics used in the video for the permutation test?
-The two test statistics are the absolute value of the difference in mean weight between the two feed types and the absolute value of the difference in median weight between the two feed types.
What is the data set used in the video about?
-The data set compares weight gain after six weeks for chicks on two different diets: casein and meat meal.
How many observations are there in the data set used in the video?
-There are a total of 23 observations in the data set, with 12 measurements on casein and 11 on meat meal.
What is the purpose of setting a seed in the permutation test?
-Setting a seed makes the random number generator produce the same sequence of random draws each time the code is run, so the results can be reproduced exactly.
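In R, setting a seed is done with `set.seed()`. The same idea in Python, as a sketch: seeding a dedicated `random.Random` generator makes every shuffle repeatable. The seed value 12345 is an arbitrary assumption, and R and Python use different generators, so the same seed will not reproduce R's permutations.

```python
import random


def seeded_shuffle(values, seed):
    """Return a shuffled copy of values using a generator seeded with `seed`."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


first = seeded_shuffle(range(10), seed=12345)
second = seeded_shuffle(range(10), seed=12345)
print(first == second)  # True: same seed, same shuffle
```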
How many permutation datasets are generated in the video's example?
-100,000 permutation datasets are generated in the video's example.
What is the observed absolute difference in means for the two feed types in the video?
-The observed absolute difference in means is 46.67 grams.
What is the observed absolute difference in medians for the two feed types in the video?
-The observed absolute difference in medians is 79 grams.
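Both observed statistics can be checked with a short Python sketch. The weights below are an assumption: they are the casein and meatmeal rows of R's built-in `chickwts` data set, which appear to match the tutorial's data because they reproduce the 12 + 11 observations, the 46.67 g mean difference, and the 79 g median difference quoted above.

```python
import statistics

# Assumed data: casein and meatmeal weights (grams) from R's chickwts data set.
casein = [368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332]
meatmeal = [325, 257, 303, 315, 380, 153, 263, 242, 206, 344, 258]


def abs_diff_means(a, b):
    """Test statistic 1: absolute difference in mean weight."""
    return abs(statistics.mean(a) - statistics.mean(b))


def abs_diff_medians(a, b):
    """Test statistic 2: absolute difference in median weight."""
    return abs(statistics.median(a) - statistics.median(b))


print(len(casein), len(meatmeal))                    # 12 11
print(round(abs_diff_means(casein, meatmeal), 2))    # 46.67
print(abs_diff_medians(casein, meatmeal))            # 79.0
```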
What is the p-value obtained for test statistic one after running 100,000 permutations?
-The p-value obtained for test statistic one is approximately 9.747% or 0.09747.
What is the p-value obtained for test statistic two after running 100,000 permutations?
-The p-value obtained for test statistic two is approximately 5.42% or 0.0542.
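These p-values are Monte Carlo estimates: the share of permutation statistics at least as large as the observed absolute difference. The sketch below shows that calculation, plus the binomial standard error of such an estimate, which indicates how much the result could move with a different seed. The function names are hypothetical, and the toy statistics are made up for illustration.

```python
import math


def permutation_p_value(perm_stats, observed):
    """Proportion of permutation statistics at least as extreme as the observed one.

    For an absolute difference, 'more extreme' means larger, so >= is used.
    """
    perm_stats = list(perm_stats)
    return sum(s >= observed for s in perm_stats) / len(perm_stats)


def monte_carlo_se(p_hat, n_perm):
    """Binomial standard error of a p-value estimated from n_perm permutations."""
    return math.sqrt(p_hat * (1 - p_hat) / n_perm)


print(permutation_p_value([1, 5, 3, 8, 2], observed=5))  # 0.4
print(round(monte_carlo_se(0.09747, 100_000), 4))        # 0.0009
```

With 100,000 permutations the standard error for estimates of this size is under 0.001, so a different seed would typically change the reported p-values only around the third decimal place.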
What is the difference between statistical significance and scientific or clinical significance mentioned in the video?
-Statistical significance refers to whether the observed results are unlikely to have occurred by chance if the null hypothesis were true, often determined by a p-value threshold like 5%. Scientific or clinical significance refers to the practical importance or meaningfulness of the results in a real-world context, which is not strictly determined by p-values.
Why does the video suggest that a permutation test is not the best approach for constructing confidence intervals?
-The video suggests that permutation testing does not directly allow for the construction of confidence intervals, whereas bootstrapping, a closely related concept, does allow for it.
What alternative method to permutation testing is mentioned in the video for constructing confidence intervals?
-Bootstrapping is mentioned as an alternative method to permutation testing for constructing confidence intervals.
What is the relationship between permutation testing and bootstrapping mentioned in the video?
-Permutation testing and bootstrapping are related in that both resample the observed data. However, permutation testing reshuffles the observations between the groups without replacement, which mimics the null hypothesis of no group difference, while bootstrapping resamples with replacement from the data to create many simulated samples.
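To make the contrast concrete, here is a sketch of a percentile bootstrap confidence interval for the difference in mean weight. The percentile interval is one common bootstrap method, chosen here as an assumption; the video does not specify which bootstrap interval it has in mind. Each group is resampled with replacement at its own size, unlike a permutation, which reassigns the pooled observations to groups without replacement.

```python
import random
import statistics


def bootstrap_ci_mean_diff(a, b, n_boot=10_000, level=0.95, seed=1):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        boot_a = rng.choices(a, k=len(a))
        boot_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(boot_a) - statistics.mean(boot_b))
    diffs.sort()
    tail = (1 - level) / 2
    # Nearest-rank percentiles of the sorted bootstrap differences.
    lower = diffs[round(tail * (n_boot - 1))]
    upper = diffs[round((1 - tail) * (n_boot - 1))]
    return lower, upper
```

Calling `bootstrap_ci_mean_diff(casein, meatmeal)` with the two lists of weights returns an interval for the mean difference; the `seed` and `n_boot` defaults are arbitrary choices.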
Outlines
Introduction to Permutation Test for Group Comparison
Mike Marin introduces a video on using permutation tests to compare a numeric variable between two groups, offering an alternative to the t-test or Mann-Whitney U test. He provides a recap of the concept and approach of permutation tests, referencing a previous video for more details. The data set involves weight gain in chicks on different diets, with 23 observations split between casein and meat meal diets. A box plot is used to visualize the data. Two test statistics are defined for demonstration: the absolute difference in mean weight and the absolute difference in median weight. The video includes R script examples for calculating these statistics.
Generating Permutation Samples and Calculating Test Statistics
The script details how to generate 100,000 permutation samples by reshuffling the weight variable to create new datasets. It explains initializing a matrix to store these samples and using a loop to fill it. The video then calculates the test statistics for each permutation sample, comparing the mean or median weights for the two feed types within each permuted dataset. The code is kept explicit for educational purposes, though it could be streamlined with a function. The observed test statistics are compared with the permutation results to estimate the p-value: the probability, under the null hypothesis, of observing a test statistic at least as extreme as the one observed.
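The loop-and-matrix procedure described above, wrapped in a function, might look like this in Python. It is a sketch, not the video's R script: the function names, the default seed, and the use of a list of rows in place of an R matrix are assumptions.

```python
import random
import statistics


def group_stats(weights, labels):
    """Return (|mean diff|, |median diff|) between the casein and meatmeal groups."""
    a = [w for w, g in zip(weights, labels) if g == "casein"]
    b = [w for w, g in zip(weights, labels) if g == "meatmeal"]
    return (abs(statistics.mean(a) - statistics.mean(b)),
            abs(statistics.median(a) - statistics.median(b)))


def permutation_test(weights, labels, n_perm=100_000, seed=12345):
    """Permutation p-values for both test statistics, reshuffling the weights."""
    rng = random.Random(seed)
    observed = group_stats(weights, labels)

    # One row per permutation sample, each a reshuffled copy of the weights
    # (playing the role of the video's matrix of permutation samples).
    perm_samples = []
    for _ in range(n_perm):
        row = list(weights)
        rng.shuffle(row)
        perm_samples.append(row)

    # Recompute both statistics on each reshuffled row with the original labels.
    perm_stats = [group_stats(row, labels) for row in perm_samples]

    # p-value: share of permutation statistics at least as large as the observed one.
    p_mean = sum(s[0] >= observed[0] for s in perm_stats) / n_perm
    p_median = sum(s[1] >= observed[1] for s in perm_stats) / n_perm
    return observed, p_mean, p_median
```

Usage: pass the 23 weights and a matching list of feed labels, for example `permutation_test(casein + meatmeal, ["casein"] * 12 + ["meatmeal"] * 11)`. Because R and Python use different random number generators, the estimates will not reproduce the video's p-values digit for digit.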
Interpreting Permutation Test Results and P-Value Calculation
The final part of the script focuses on calculating and interpreting the p-value by comparing the permutation test statistics with the observed test statistics. The p-value is the proportion of permutation test statistics that are at least as extreme as the observed value, which estimates the probability of observing such a statistic if the null hypothesis is true. The script walks through this calculation for both test statistics; the resulting p-values (about 0.097 and 0.054) indicate that neither observed difference is statistically significant at the 5% level. However, the video emphasizes the importance of considering effect size and power, especially with small sample sizes, and mentions the limitations of permutation tests for constructing confidence intervals, suggesting bootstrapping as an alternative approach.
Keywords
Permutation Test
Independent Two-Sample T-Test
Mann-Whitney U Test
Bootstrapping
Test Statistic
Feed Type
Weight Gain
P-Value
Null Hypothesis
Significance Level (Alpha)
Confidence Interval
Highlights
Introduction to a permutation test as an alternative to the independent two-sample t-test or the Mann-Whitney U test.
Recap of the concept and general approach of a permutation test, which a previous video explains in more detail.
Data set overview with two variables: weight and feed type, comparing weight gain for chicks on two different diets.
Use of a box plot to explore the distribution of weight for the two feed types.
Selection of two test statistics for demonstration: absolute difference in mean and median weight.
Calculation of test statistics for casein and meat meal diets and their absolute differences.
Introduction of permutation test setup, including setting a seed for reproducibility.
Initialization of a matrix to store permutation samples and explanation of the permutation process.
Loop implementation to generate 100,000 permutations of the weight variable.
Calculation of test statistics for each permutation sample without using a function for clarity.
Observation of the first 15 permutation test statistics to understand the distribution.
Definition and calculation of p-values for the permutation test statistics.
Interpretation of p-values and their implications for hypothesis testing.
Discussion on the difference between statistical significance and scientific or clinical significance.
Acknowledgment of the small sample size and its impact on the power to detect differences.
Mention of the inability to construct confidence intervals using permutation tests, unlike bootstrapping.
Inclusion of additional code for plotting the sampling distribution and reshuffling labels in the script.
Encouragement to subscribe to the channel and share the video for further exploration of the topic.