Bootstrap Confidence Interval with R | R Video Tutorial 4.5 | MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
4 Mar 2019 · 11:26
Educational, Learning
32 Likes 10 Comments

TLDR
In this educational video, Mike Marin demonstrates how to use the bootstrap in R to construct confidence intervals for comparing a numeric variable across two groups. He presents the bootstrap as an alternative to traditional large-sample approaches, calculates the observed differences in means and medians, and guides viewers through building confidence intervals with the percentile method. The video also stresses the distinction between statistical and scientific significance: although the findings are not statistically significant, the small sample size means further investigation may be warranted.

Takeaways
  • The video discusses implementing a bootstrap approach in R to build a confidence interval for comparing a numeric variable across two groups.
  • The bootstrap offers an alternative to traditional large-sample methods for constructing confidence intervals for the difference in means.
  • Watching the previous videos is recommended for the concept and general approach behind bootstrap confidence intervals.
  • The dataset records the weight gain of chicks fed one of two feed types, casein or meatmeal, with 23 observations in total.
  • The video demonstrates how to create side-by-side box plots to explore the weight differences between the two feed types.
  • The script calculates the difference in means and in medians between the two groups, giving sample estimates of these differences.
  • The observed difference in means is 46.67 grams and the observed difference in medians is 79 grams, both in favor of casein.
  • The bootstrap approach resamples with replacement from each group separately to create a large number of bootstrap samples.
  • The script calculates bootstrap estimates of the difference in means and in medians using the colMeans and apply functions in R.
  • The percentile method is used to construct the confidence intervals, taking the 2.5th and 97.5th percentiles of the bootstrap estimates as the bounds.
  • The 95% confidence intervals for both the difference in means and the difference in medians include zero, suggesting no statistically significant difference between the groups.
  • The video emphasizes the distinction between statistical and scientific significance, noting that further investigation is warranted despite the non-significant results.
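The sample estimates in the takeaways can be reproduced with a few lines of base R. This is a minimal sketch that assumes the video's data correspond to the casein and meatmeal groups of R's built-in `chickwts` data set (which has 12 and 11 chicks in those groups, matching the counts above); the object names are illustrative and not taken from the video's R-script.

```r
# Assumption: the video's data are the casein and meatmeal groups of the
# built-in chickwts data set (datasets package, attached by default).
casein   <- chickwts$weight[chickwts$feed == "casein"]
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]

# Group sizes: 12 casein chicks and 11 meatmeal chicks
c(casein = length(casein), meatmeal = length(meatmeal))

# Observed differences, casein minus meatmeal, in grams
obs_diff_mean   <- mean(casein) - mean(meatmeal)
obs_diff_median <- median(casein) - median(meatmeal)
round(c(diff_mean = obs_diff_mean, diff_median = obs_diff_median), 2)
```

Under that assumption the last line returns 46.67 and 79, the differences quoted above.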
Q & A
  • What is the main topic of the video by Mike Marin?

    -The main topic of the video is implementing a bootstrap approach to build a confidence interval in the R programming language for comparing a numeric variable between two groups.

  • What is an alternative to building a confidence interval for the difference in means with large-sample approaches?

    -An alternative approach is using the bootstrap method, which the video discusses in detail.

  • What are the two variables in the dataset used in the video?

    -The two variables in the dataset are 'weight' and 'feed type', focusing on the weight gain of chicks on one of two different feed types: casein or meatmeal.

  • How many observations are there in the dataset?

    -There are 23 observations in total in the dataset, with 12 chicks on the casein feed type and 11 on meatmeal.

  • What statistical measures are used to estimate the difference between the two groups in the video?

    -The video discusses estimating the difference in means and the difference in medians between the two groups.

  • What is the observed difference in means and medians for the two feed types?

    -The observed difference in means is 46.67 grams, with casein having a higher mean weight, and the observed difference in medians is 79 grams, with casein having a higher median weight.

  • What is the purpose of setting a seed in the bootstrapping process as shown in the video?

    -Setting a seed ensures that the bootstrap re-samples can be reproduced exactly whenever the code is run, which is useful for consistency and verification purposes.

  • How many bootstrap samples are taken in the video's example?

    -100,000 bootstrap samples are taken in the video's example.

  • What are the four common methods for building a confidence interval using a bootstrap approach mentioned in the video?

    -The four common methods are the percentile method, the basic method, the normal method, and the bias-corrected method.

  • What does the percentile method involve when constructing a confidence interval?

    -The percentile method involves using the 2.5th percentile and the 97.5th percentile of the bootstrap estimates to form the confidence interval, effectively capturing the middle 95% of the distribution.

  • What conclusion can be drawn from the confidence intervals for the difference in means and medians?

    -The conclusion is that the means and medians of the two feed types are not statistically significantly different, as both confidence intervals include zero.

  • Why might further investigation be warranted despite the non-significance of the means and medians?

    -Further investigation may be warranted because, although the differences are not statistically significant, the sample estimates (both favoring casein) suggest the feed types may differ, and with a relatively small sample the study has limited ability to detect a real difference.
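The Q&A lists four ways to turn bootstrap estimates into an interval. As a sketch, the block below computes three of them (percentile, basic, and a plain normal-approximation interval) for the difference in means, again assuming the `chickwts` casein and meatmeal data; the seed value is illustrative. The bias-corrected interval needs extra machinery and is omitted here; the `boot` package's `boot.ci()` offers it through `type = "bca"`.

```r
# Assumption: casein and meatmeal weights from the built-in chickwts data.
casein   <- chickwts$weight[chickwts$feed == "casein"]
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]
obs <- mean(casein) - mean(meatmeal)  # observed difference in means

set.seed(112358)  # illustrative seed; fixes the resamples for reproducibility
B <- 100000
# Each column is one bootstrap resample, drawn with replacement within a group
boot_c <- matrix(sample(casein,   B * length(casein),   replace = TRUE), ncol = B)
boot_m <- matrix(sample(meatmeal, B * length(meatmeal), replace = TRUE), ncol = B)
boot_diff <- colMeans(boot_c) - colMeans(boot_m)

q <- quantile(boot_diff, probs = c(0.025, 0.975), names = FALSE)

# Percentile method: 2.5th and 97.5th percentiles of the bootstrap estimates
ci_percentile <- q
# Basic method: reflect those percentiles around the observed estimate
ci_basic <- c(2 * obs - q[2], 2 * obs - q[1])
# Normal method (no bias adjustment): estimate +/- 1.96 bootstrap SEs
ci_normal <- obs + c(-1, 1) * qnorm(0.975) * sd(boot_diff)

round(rbind(percentile = ci_percentile, basic = ci_basic, normal = ci_normal), 1)
```

The basic interval always has the same width as the percentile interval; the two coincide only when the percentiles sit symmetrically around the observed estimate.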

Outlines
00:00
Introduction to Bootstrap Confidence Intervals in R

In this video, Mike Marin introduces a bootstrap approach for constructing confidence intervals in R to compare a numeric variable across two groups, offering an alternative to traditional large-sample methods. The video continues earlier content in which the concepts of bootstrap confidence intervals and hypothesis testing were explained. The dataset involves the weight gain of chicks on two feed types, casein and meatmeal, with 23 observations in total. The video emphasizes understanding the data and their context before proceeding with statistical analysis. It also briefly shows how to visualize the data with box plots and calculates the mean and median differences between the two groups as a precursor to building confidence intervals.
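The exploratory step described above (side-by-side box plots, then group means and medians) might look like the following in base R. It is a sketch under the assumption that the data are the casein and meatmeal groups of the built-in `chickwts` data set; the axis labels are illustrative.

```r
# Assumption: data are R's built-in chickwts, limited to two feed types.
chicks <- subset(chickwts, feed %in% c("casein", "meatmeal"))
chicks$feed <- droplevels(chicks$feed)  # drop the four unused feed levels

# Side-by-side box plots of weight by feed type
boxplot(weight ~ feed, data = chicks, ylab = "Weight (g)", xlab = "Feed type")

# Mean and median weight within each feed type
group_means   <- tapply(chicks$weight, chicks$feed, mean)
group_medians <- tapply(chicks$weight, chicks$feed, median)
round(rbind(mean = group_means, median = group_medians), 2)
```

Differencing the two columns of that table gives the observed differences in means and medians that the bootstrap intervals are built around.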

05:01
Implementing Bootstrap Resampling for Statistical Analysis

This section of the script delves into the technical process of implementing a bootstrap approach in R. The process begins with setting a seed for reproducibility and involves resampling with replacement from the observed measurements of both feed types to create bootstrap samples. The script guides viewers through creating matrices for casein and meatmeal bootstrap resamples and checking the dimensions to ensure accuracy. It then demonstrates how to calculate bootstrap estimates of the difference in means and medians using column means and the median function applied to the bootstrap samples. The explanation includes practical R commands and functions, such as 'colMeans' and 'apply', to perform these calculations efficiently.
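The resampling step can be sketched as follows, assuming the `chickwts` casein and meatmeal data and an arbitrary seed (the seed value used in the video is not given here). Each column of a matrix holds one bootstrap resample of a group, so `colMeans()` gives all bootstrap means at once, while `apply(..., 2, median)` applies `median()` column by column.

```r
# Assumption: casein and meatmeal weights from the built-in chickwts data.
casein   <- chickwts$weight[chickwts$feed == "casein"]
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]

set.seed(112358)  # illustrative seed value
B <- 100000       # number of bootstrap resamples, as in the video

# Resample each group separately, with replacement; one resample per column
boot_casein <- matrix(sample(casein, size = B * length(casein), replace = TRUE),
                      nrow = length(casein), ncol = B)
boot_meatmeal <- matrix(sample(meatmeal, size = B * length(meatmeal), replace = TRUE),
                        nrow = length(meatmeal), ncol = B)

# Check the dimensions: 12 x 100000 and 11 x 100000
dim(boot_casein)
dim(boot_meatmeal)

# Bootstrap estimates of the difference in means (vectorized over columns)
boot_diff_mean <- colMeans(boot_casein) - colMeans(boot_meatmeal)
# Bootstrap estimates of the difference in medians (median of each column)
boot_diff_median <- apply(boot_casein, 2, median) - apply(boot_meatmeal, 2, median)
```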

10:03
Constructing Confidence Intervals Using the Percentile Method

The final part of the script discusses the construction of confidence intervals using the percentile method, one of several bootstrap interval methods. The method uses the quantile function in R to find the 2.5th and 97.5th percentiles of the bootstrap estimates, which form the bounds of the 95% confidence interval. The video gives a step-by-step guide to calculating these percentiles for both the difference in means and the difference in medians between the two feed types. The results are interpreted to suggest that there is no statistically significant difference between the means or medians of the two groups, although there is evidence of potential differences that warrants further investigation. The script concludes with a reminder of the distinction between statistical and scientific significance and an invitation to explore additional methods for constructing confidence intervals included in the R-script.
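Given bootstrap estimates like those from the resampling step, the percentile interval is just two calls to `quantile()`. The block regenerates the estimates so it runs on its own, under the same `chickwts` assumption and with an illustrative seed, so its bounds may differ slightly from the video's.

```r
# Assumption: casein and meatmeal weights from the built-in chickwts data.
casein   <- chickwts$weight[chickwts$feed == "casein"]
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]

set.seed(112358)  # illustrative seed value
B <- 100000
boot_casein   <- matrix(sample(casein,   B * length(casein),   replace = TRUE), ncol = B)
boot_meatmeal <- matrix(sample(meatmeal, B * length(meatmeal), replace = TRUE), ncol = B)
boot_diff_mean   <- colMeans(boot_casein) - colMeans(boot_meatmeal)
boot_diff_median <- apply(boot_casein, 2, median) - apply(boot_meatmeal, 2, median)

# Percentile method: the 2.5th and 97.5th percentiles bound the middle 95%
ci_mean   <- quantile(boot_diff_mean,   probs = c(0.025, 0.975))
ci_median <- quantile(boot_diff_median, probs = c(0.025, 0.975))
round(ci_mean, 1)
round(ci_median, 1)

# Does each 95% interval contain 0 (no difference)?
c(mean   = ci_mean[[1]] < 0 && ci_mean[[2]] > 0,
  median = ci_median[[1]] < 0 && ci_median[[2]] > 0)
```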

Keywords
Bootstrap approach
The bootstrap approach is a resampling technique used in statistics to estimate the accuracy of sample estimates. It involves repeatedly sampling from the original data set with replacement to create many 'bootstrap samples'. In the video, this method is used to build a confidence interval for comparing a numeric variable between two groups, offering an alternative to traditional large sample approaches. The script describes implementing this approach in R to calculate confidence intervals for the difference in means and medians of weight gain in chicks on different feed types.
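The general recipe (resample each group with replacement, recompute the statistic, repeat many times) does not depend on which statistic is used. A compact way to express that is a small helper; `boot_ci_diff()` below is a hypothetical function written for illustration, not part of the video's script or any package, and it returns a percentile interval.

```r
# Hypothetical helper, for illustration only: percentile bootstrap CI for
# stat(x) - stat(y), resampling each group separately with replacement.
boot_ci_diff <- function(x, y, stat = mean, B = 10000, level = 0.95) {
  estimates <- replicate(B, {
    # Index-based resampling avoids sample()'s special case for length-1 input
    stat(x[sample.int(length(x), replace = TRUE)]) -
      stat(y[sample.int(length(y), replace = TRUE)])
  })
  alpha <- 1 - level
  quantile(estimates, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Usage, assuming the chickwts casein and meatmeal groups
casein   <- chickwts$weight[chickwts$feed == "casein"]
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]
set.seed(1)  # illustrative seed
boot_ci_diff(casein, meatmeal, stat = mean)
boot_ci_diff(casein, meatmeal, stat = median)
```

Passing the statistic as an argument keeps the resampling logic in one place; the matrix-plus-`colMeans()` form is faster for means, while `replicate()` trades speed for generality.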
Confidence interval
A confidence interval is a range of values, computed from sample data, that is likely to contain the value of an unknown population parameter; for a 95% interval, the procedure that produces it captures the true value in about 95% of repeated samples. It indicates how precise an estimate is. In the context of the video, confidence intervals are constructed for the difference in means and in medians of weight between the two feed types to quantify the uncertainty in these estimates.
R (Programming Language)
R is a programming language and environment commonly used for statistical computing and graphics. The video script discusses using R to implement a bootstrap approach for statistical analysis. The script provides examples of R code for calculating means, medians, and constructing confidence intervals, demonstrating how R can be utilized for complex statistical tasks.
Casein feed type
Casein is a protein derived from milk and is used in the video as one of the two feed types for chicks. The script explores the weight gain of chicks on casein feed compared to another feed type, meatmeal. The analysis of the data for casein feed type is part of the statistical comparison in the video.
Means
In statistics, the mean is the average value of a data set and is calculated by summing all the values and dividing by the number of values. The video script discusses calculating the mean weight for chicks on casein and meatmeal feeds and then finding the difference in means between these two groups.
Medians
The median is the middle value of a data set when it is ordered from least to greatest. If there is an even number of observations, the median is the average of the two middle numbers. The script describes calculating the median weight for each feed type and then determining the difference in medians as part of the bootstrap analysis.
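With the chick data (assumed, as elsewhere, to be the casein and meatmeal groups of R's built-in `chickwts`), the casein group has an even number of observations and the meatmeal group an odd number, so both cases of the median definition show up:

```r
# Assumption: casein and meatmeal weights from the built-in chickwts data.
casein   <- chickwts$weight[chickwts$feed == "casein"]    # 12 values (even)
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]  # 11 values (odd)

# Even n: average the 6th and 7th ordered values
sort(casein)
mean(sort(casein)[6:7])
median(casein)

# Odd n: the single middle (6th) ordered value
sort(meatmeal)
sort(meatmeal)[6]
median(meatmeal)
```

The two medians, 342 g and 263 g, differ by the 79 g reported in the video.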
Percentiles
Percentiles divide a data set into 100 equal parts and are used to understand the distribution of data. In the video, percentiles are used in the percentile method of constructing a confidence interval, where the 2.5th and 97.5th percentiles of the bootstrap estimates define the interval.
Resampling
Resampling is the process of drawing samples repeatedly from a data set, often used in bootstrapping. In the script, resampling with replacement from the casein and meatmeal groups is described to create the bootstrap samples that are used to estimate the distribution of the difference in means and medians.
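A single bootstrap resample, and the role of the seed mentioned in the Q&A, can be illustrated with `sample()`. This sketch assumes the casein weights from `chickwts`; the seed value is arbitrary.

```r
# Assumption: casein weights from the built-in chickwts data.
casein <- chickwts$weight[chickwts$feed == "casein"]

# One bootstrap resample: same size as the data, drawn with replacement,
# so typically some chicks appear more than once and others not at all
set.seed(2019)
first <- sample(casein, replace = TRUE)

# Resetting the same seed reproduces exactly the same resample
set.seed(2019)
second <- sample(casein, replace = TRUE)

identical(first, second)
sum(duplicated(first))  # number of draws that repeat an earlier value
```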
Statistical significance
Statistical significance means that an observed result would be unlikely if there were truly no effect (here, no difference between the feed types). In the video, the confidence intervals for the differences in means and medians include zero, indicating that the observed differences are not statistically significant at the 5% level; they could plausibly be due to random sampling variation.
Quantile
A quantile is a value that divides ordered data into groups of equal size, the most common being quartiles, which split the data into four parts. The script uses the quantile function in R to find the 2.5th and 97.5th percentiles for constructing confidence intervals using the bootstrap percentile method.
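R's `quantile()` implements nine sample-quantile definitions through its `type` argument; the default, `type = 7`, interpolates linearly between order statistics. A small sketch on the integers 1 to 101 makes the percentile-method call concrete:

```r
x <- 1:101

# Quartiles (default probs = 0, 0.25, 0.5, 0.75, 1)
quantile(x)

# The two percentiles used by a 95% percentile-method interval; with the
# default type 7 they fall halfway between neighboring order statistics here
quantile(x, probs = c(0.025, 0.975))
```

Here the quartiles are 1, 26, 51, 76 and 101, and the two percentiles are 3.5 and 98.5. With 100,000 bootstrap estimates, the choice of `type` makes a negligible difference to the interval.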
Feed types
In the context of the video, feed types refer to the different diets given to the chicks, specifically casein and meatmeal. The script discusses analyzing the weight gain data for these two feed types to compare their effects on the chicks' growth.
Highlights

The video discusses implementing a bootstrap approach for building a confidence interval in R to compare a numeric variable for two different groups.

Bootstrap is an alternative to large sample approaches for confidence intervals of the difference in means.

The concept and general approach behind bootstrap confidence intervals are explained in separate videos.

Links to related videos, R-Script, and data are provided in the video description.

The dataset consists of two variables: weight and feed type, with 23 observations in total.

12 chicks are on the casein feed type and 11 on meatmeal.

Side-by-side box plots are used to explore weight across the two feed types.

The video builds confidence intervals for the difference in means and medians of the two groups.

R is used to calculate the mean and median weight for each feed type.

The observed mean weight is 46.67 grams higher for casein.

The sample median weight is 79 grams higher for casein.

A bootstrap approach is introduced to build confidence intervals without relying on external packages.

The number of bootstrap samples (B) is set to 100,000 for the analysis.

Bootstrap resamples are taken with replacement from each feed type separately.

The percentile method is used to construct the confidence intervals from the bootstrap estimates.

The 95% confidence interval for the difference in means ranges from -4 to 96.8 grams.

The 95% confidence interval for the difference in medians ranges from -24.5 to 116 grams.

Both confidence intervals include zero, indicating no statistically significant difference between the means or medians.

The video emphasizes the difference between statistical and scientific significance and suggests further investigation is warranted.

Additional R-script code is provided for constructing confidence intervals using the basic method and for the 80th percentile of weight.
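The highlight does not spell out what the extra R-script code computes. One plausible reading, sketched here as an assumption, is a bootstrap interval for the difference in 80th percentiles of weight between the two feed types, reported with both the percentile and the basic method. The `chickwts` data, the seed, and the smaller `B` (chosen to keep the run quick) are illustrative choices.

```r
# Assumption: casein and meatmeal weights from the built-in chickwts data.
casein   <- chickwts$weight[chickwts$feed == "casein"]
meatmeal <- chickwts$weight[chickwts$feed == "meatmeal"]

# 80th percentile of a sample (R's default type 7 quantile)
p80 <- function(w) quantile(w, probs = 0.8, names = FALSE)
obs_diff_p80 <- p80(casein) - p80(meatmeal)

set.seed(112358)  # illustrative seed value
B <- 10000        # fewer resamples than the video's 100,000, to run quickly
boot_casein   <- matrix(sample(casein,   B * length(casein),   replace = TRUE), ncol = B)
boot_meatmeal <- matrix(sample(meatmeal, B * length(meatmeal), replace = TRUE), ncol = B)
boot_diff_p80 <- apply(boot_casein, 2, p80) - apply(boot_meatmeal, 2, p80)

q <- quantile(boot_diff_p80, probs = c(0.025, 0.975), names = FALSE)
ci_percentile <- q                                                    # percentile method
ci_basic      <- c(2 * obs_diff_p80 - q[2], 2 * obs_diff_p80 - q[1])  # basic method
round(rbind(percentile = ci_percentile, basic = ci_basic), 1)
```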
