Mann Whitney U / Wilcoxon Rank-Sum Test in R | R Tutorial 4.3 | MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
25 Aug 2013 | 04:19
Educational | Learning

TLDR: In this educational video, Mike Marin explains the 'Mann-Whitney U' test, also known as the 'Wilcoxon Rank-Sum' test, a nonparametric method for comparing medians between two independent groups. He demonstrates how to use R statistical software to test the hypothesis that the median lung capacity of smokers is equal to that of non-smokers, using the 'wilcox.test' function. The video covers setting arguments for a two-sided test, calculating confidence intervals, and interpreting the results, including a discussion on handling ties in the data.

Takeaways
  • The video is a tutorial on conducting the Mann-Whitney U test, also known as the Wilcoxon Rank-Sum test, using R statistical software.
  • The Mann-Whitney U test is a nonparametric method used to examine the difference in medians between two independent populations.
  • The test can be used to explore the relationship between a numeric outcome variable and a categorical explanatory variable when the groups are independent.
  • The video uses Lung Capacity Data to demonstrate the test, focusing on the relationship between smoking and lung capacity.
  • Before conducting the test, it's suggested to visualize the data with a boxplot to understand the distribution and relationship between variables.
  • The null hypothesis tested is that the median lung capacity of smokers is equal to that of non-smokers.
  • The 'wilcox.test' command in R is used to perform the nonparametric test, with various arguments to customize the test conditions.
  • The 'mu' argument is set to 0 to test for no difference in medians, and 'alt' specifies a two-sided alternative hypothesis.
  • The 'conf.int' argument, when set to TRUE, provides a nonparametric confidence interval for the difference in medians.
  • The video notes a warning from R about the inability to calculate exact p-values and confidence intervals when there are ties in the data.
  • The 'exact' and 'correct' arguments can be adjusted for exact p-value computation and continuity correction, respectively.
  • The results include the test statistic, p-value, confidence interval, and the difference in medians, with a note on the default settings in R.
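
The arguments listed above can be combined into a single call. The following is a minimal sketch in R, assuming simulated values in place of the video's Lung Capacity Data: the vectors `nonsmokers` and `smokers` and their values are hypothetical, and the two groups are passed as separate vectors, which may differ from the form used in the video.

```r
# Simulated stand-in for the Lung Capacity Data (hypothetical values,
# not the dataset used in the video).
set.seed(1)
nonsmokers <- rnorm(40, mean = 8.0, sd = 2.5)
smokers    <- rnorm(15, mean = 9.0, sd = 1.5)

# Two-sided Wilcoxon rank-sum (Mann-Whitney U) test for independent groups.
res <- wilcox.test(smokers, nonsmokers,
                   mu = 0,                     # hypothesized shift under the null
                   alternative = "two.sided",  # the summary refers to this as 'alt'
                   conf.int = TRUE,            # request a nonparametric interval
                   conf.level = 0.95,          # confidence level for that interval
                   paired = FALSE,             # the groups are independent
                   exact = FALSE,              # use the normal approximation
                   correct = TRUE)             # apply the continuity correction

print(res)
res$statistic  # W, the test statistic
res$p.value    # p-value for the two-sided test
res$conf.int   # confidence interval for the shift in location
res$estimate   # estimated difference in location
```

Because the data are simulated, the printed numbers will not match those reported in the video.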
Q & A
  • What is the Mann-Whitney U test also known as?

    -The Mann-Whitney U test is also known as the Wilcoxon Rank-Sum test.

  • What type of statistical method is the Mann-Whitney U test?

    -The Mann-Whitney U test is a nonparametric method suitable for examining the difference in medians between two independent populations.

  • What is the purpose of the Mann-Whitney U test in the context of the video?

    -In the video, the Mann-Whitney U test is used to examine the relationship between smoking and lung capacity, specifically to test if there is a difference in the median lung capacity between smokers and non-smokers.

  • How can one access help for a command or function in R?

    -To access the Help menu in R, one can type 'help' followed by the name of the command/function in parentheses, for example help(wilcox.test), or simply put a question mark (?) in front of the command's name, for example ?wilcox.test.

  • What is a boxplot and why is it useful before conducting the Mann-Whitney U test?

    -A boxplot is a graphical representation of the distribution of a dataset, which can be useful to visually examine the relationship between variables, such as lung capacity and smoking, before conducting the Mann-Whitney U test.

  • What is the null hypothesis being tested in the video?

    -The null hypothesis being tested is that the median lung capacity of smokers is equal to that of non-smokers.

  • What command in R is used to conduct the Mann-Whitney U test?

    -The 'wilcox.test' command in R is used to conduct the Mann-Whitney U test.

  • What does the 'mu' argument represent in the 'wilcox.test' function?

    -The 'mu' argument in the 'wilcox.test' function represents the median difference under the null hypothesis, which is set to 0 to test for no difference in medians.

  • What does the 'conf.int' argument do in the 'wilcox.test' function?

    -The 'conf.int' argument, when set to TRUE, returns a nonparametric confidence interval for the difference in medians.

  • What is the significance of the 'paired' argument in the 'wilcox.test' function?

    -The 'paired' argument, when set to FALSE or 'F', indicates that the groups being compared are independent and not paired.

  • What does the 'exact' argument do in the 'wilcox.test' function?

    -The 'exact' argument, when set to TRUE, instructs R to compute an exact p-value rather than an approximate one.

  • What is a potential issue with calculating an exact p-value and confidence interval?

    -An exact p-value and confidence interval cannot be calculated when there are ties in the ranks of the observations.

  • What is the default behavior of R regarding the return of a confidence interval in the 'wilcox.test' function?

    -By default, R does not return a confidence interval unless the 'conf.int' argument is explicitly set to TRUE.

  • What is the next topic discussed in the series of videos after the Mann-Whitney U test?

    -The next topic discussed in the series is the paired t-test and how to conduct it using R.
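
Two of the answers above, on getting help and on the default confidence-interval behavior, can be checked directly. This is a small sketch with hypothetical, tie-free sample values (`nonsmokers` and `smokers` are illustrative names, not the video's data).

```r
# Two equivalent ways to open the documentation
# (uncomment in an interactive session):
# help(wilcox.test)
# ?wilcox.test

# Hypothetical sample values with no ties, used only for illustration.
nonsmokers <- c(6.1, 7.4, 8.2, 9.0, 10.3, 5.6)
smokers    <- c(8.8, 9.5, 10.1, 7.9, 11.2)

# Same test twice: once with defaults, once asking for the interval.
default_res <- wilcox.test(smokers, nonsmokers)
with_ci     <- wilcox.test(smokers, nonsmokers, conf.int = TRUE)

is.null(default_res$conf.int)  # TRUE: no interval is returned by default
with_ci$conf.int               # the interval appears once conf.int = TRUE
```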

Outlines
00:00
Introduction to the Mann-Whitney U Test in R

Mike Marin introduces the Mann-Whitney U test, also known as the Wilcoxon Rank-Sum test, and shows how to run it in R statistical software. This nonparametric test is used to examine the difference in medians between two independent populations, or equivalently the relationship between a numeric outcome and a categorical explanatory variable with independent groups. The video uses the Lung Capacity Data to explore the relationship between smoking and lung capacity. The test is run with the 'wilcox.test' command, and viewers are shown how to access help in R, examine the data with a boxplot, and test the null hypothesis of equal median lung capacities for smokers and non-smokers.
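
The boxplot step described here might look like the following sketch. The data frame is simulated, so `LungCapData`, `LungCap`, and `Smoke` are assumed names with hypothetical values, not the actual dataset from the video.

```r
# Simulated stand-in for the Lung Capacity Data (hypothetical values).
set.seed(42)
LungCapData <- data.frame(
  LungCap = c(rnorm(30, mean = 8, sd = 2.5),   # non-smokers
              rnorm(12, mean = 9, sd = 1.5)),  # smokers
  Smoke   = factor(rep(c("no", "yes"), times = c(30, 12)))
)

# Side-by-side boxplots of the numeric outcome, split by group.
boxplot(LungCap ~ Smoke, data = LungCapData,
        xlab = "Smoke", ylab = "Lung capacity",
        main = "Lung capacity by smoking status")

# Group medians, the quantities the null hypothesis compares.
group_medians <- tapply(LungCapData$LungCap, LungCapData$Smoke, median)
group_medians
```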

Keywords
Mann-Whitney U test
The Mann-Whitney U test, also known as the Wilcoxon Rank-Sum test, is a nonparametric statistical test used to determine whether there is a statistically significant difference between the medians of two independent groups. In the context of the video, it is used to examine the difference in lung capacity between smokers and non-smokers, which is the central theme of the video.
Nonparametric method
A nonparametric method is a statistical approach that does not assume a specific distribution for the data. It is used when the data does not meet the assumptions required for parametric tests. In the video, the Mann-Whitney U test is introduced as a nonparametric alternative to the t-test for comparing medians of two independent groups.
Wilcoxon Rank-Sum test
The Wilcoxon Rank-Sum test is another name for the Mann-Whitney U test and is used for comparing the distributions of two groups. It is mentioned in the script as synonymous with the Mann-Whitney U test, emphasizing its role in analyzing the relationship between smoking and lung capacity.
R statistical software
R is a programming language and environment commonly used for statistical computing and graphics. In the video, R is the tool used to demonstrate how to conduct the Mann-Whitney U test.
Lung Capacity Data
The Lung Capacity Data is the dataset used in the video to illustrate the application of the Mann-Whitney U test. It is a specific example used to examine the relationship between smoking and lung capacity, providing a practical context for the statistical method discussed.
wilcox.test command
The 'wilcox.test' command in R is used to perform the Mann-Whitney U test. The video script explains how to use this command to test the null hypothesis regarding the median lung capacity of smokers and non-smokers, demonstrating its practical application in R.
Null hypothesis
The null hypothesis is a statement of no effect or no difference, which is tested in a statistical study. In the video, the null hypothesis is that the median lung capacity of smokers is equal to that of non-smokers, which is what the Mann-Whitney U test aims to evaluate.
Two-sided test
A two-sided test is a statistical test that considers the possibility of differences in either direction (greater or less than). The video describes conducting a two-sided test using the 'wilcox.test' command in R to explore whether the median lung capacity differs between the two groups.
Confidence interval
A confidence interval provides a range of values that is likely to contain the true population parameter with a certain level of confidence. In the video, the 'conf.int' argument is set to TRUE in the 'wilcox.test' command to obtain a nonparametric confidence interval for the difference in medians.
P-value
The p-value is the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. The video script mentions a p-value of 0.0055, indicating strong evidence against the null hypothesis of no difference in median lung capacity between smokers and non-smokers.
Continuity correction
Continuity correction is an adjustment applied when a discrete distribution is approximated by a continuous one; here it refines the normal approximation used to compute the p-value. The 'correct' argument in the 'wilcox.test' command is mentioned in the script, where setting it to TRUE tells R to apply this correction.
Exact p-value
An exact p-value is a probability calculated from the exact distribution of the test statistic rather than from an approximation, which is especially useful when sample sizes are small. The script discusses setting the 'exact' argument to TRUE in the 'wilcox.test' command to compute an exact p-value, although it notes that this is not possible when there are ties in the ranks.
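
The effect of ties on the exact p-value can be seen with a pair of tiny samples. This is a hedged sketch: the vectors below are hypothetical, and it assumes an R version that behaves as the video describes, raising a warning when an exact p-value is requested for tied data.

```r
# Hypothetical samples: 7.5 appears in both groups, so some ranks are tied.
x_ties <- c(5.2, 6.8, 7.5, 7.5, 9.1)
y_ties <- c(7.5, 8.4, 9.9, 10.6, 11.3)

# Capture a warning, if one is raised, so its text can be inspected.
# If no warning occurs, the test result itself is returned instead.
tie_outcome <- tryCatch(
  wilcox.test(x_ties, y_ties, exact = TRUE),
  warning = function(w) conditionMessage(w)
)
tie_outcome

# Without ties, an exact p-value can be computed for small samples.
x_clean <- c(5.2, 6.8, 7.4, 7.6, 9.1)
y_clean <- c(7.5, 8.4, 9.9, 10.6, 11.3)
exact_res <- wilcox.test(x_clean, y_clean, exact = TRUE)
exact_res$method   # name of the test that was actually run
exact_res$p.value
```

When the warning is raised, R falls back to the normal approximation, which is where the 'correct' argument for the continuity correction applies.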
Highlights

Introduction to the Mann-Whitney U test, also known as the Wilcoxon Rank-Sum test, a nonparametric method for examining differences in medians between two independent populations.

Explanation of the test's applicability for examining the relationship between a numeric outcome and a categorical explanatory variable with independent groups.

Use of the Lung Capacity Data to explore the relationship between Smoking and Lung Capacity.

Demonstration of the 'wilcox.test' command in R for conducting the nonparametric test.

Guidance on accessing help in R for specific commands or functions.

Suggestion to examine a boxplot of the data to visualize the relationship between Lung Capacity and Smoking.

Null hypothesis testing that the median Lung Capacity of Smokers is equal to that of Non-Smokers.

Conducting a two-sided test in R using the 'wilcox.test' command.

Setting the 'mu' argument to test for a difference in medians of zero.

Use of the 'alt' argument to specify a two-sided alternative hypothesis.

Inclusion of a nonparametric confidence interval using the 'conf.int' argument set to TRUE.

Setting the 'conf.level' argument to define the level of confidence for the interval.

Indicating independence of groups with the 'paired' argument set to FALSE.

Option to compute an exact p-value using the 'exact' argument.

Use of the 'correct' argument for applying a continuity correction in R.

Presentation of test results including the test statistic, p-value, and confidence interval.

Discussion of the implications of ties in the ranks of observations on the calculation of exact p-values and confidence intervals.

Clarification on the default values of arguments in the 'wilcox.test' command and the importance of specifying 'conf.int' to receive a confidence interval.

Preview of the next video, which covers the paired t-test and how to conduct it in R.

Encouragement to subscribe for more statistical tutorials and lectures.
