Chi-Square Test, Fisher’s Exact Test, & Cross Tabulations in R | R Tutorial 4.10 | MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
29 Aug 2013 | 03:44
Educational, Learning
32 Likes 10 Comments

TLDR: In this educational video, Mike Marin teaches viewers how to perform the chi-square test of independence and Fisher's exact test using the R programming language. He uses lung capacity data to illustrate the relationship between gender and smoking, demonstrating how to create a contingency table, visualize the data with bar plots, and conduct the statistical tests. Marin also covers the 'chisq.test' and 'fisher.test' functions in R, including the application of Yates' continuity correction and confidence intervals for the odds ratio. The video concludes with a teaser for upcoming content on calculating relative risks and odds ratios.

Takeaways
  • 👋 Introduction: Mike Marin presents a tutorial on performing the chi-square test and Fisher's exact test in the R programming language.
  • 📊 Chi-Square Test: The chi-square test of independence is a statistical method for testing whether two categorical variables are independent.
  • 🔍 Data Exploration: The tutorial uses lung capacity data to explore the relationship between gender and smoking habits.
  • 📁 Importing Data: The data has been imported and attached in R for analysis.
  • 📋 Contingency Table: The 'table' function in R is used to create a contingency table for the analysis.
  • 📈 Visual Representation: A bar plot is generated to visually examine the relationship between the variables, using the 'barplot' command with the 'beside' and 'legend' arguments.
  • 🧐 chisq.test: The chi-square test is conducted using the 'chisq.test' function, with the option to apply Yates' continuity correction.
  • 📊 Test Results: The test statistic and p-value are reported, and the results can be stored in an object for further analysis.
  • 🔍 Attributes: The 'attributes' function shows what the test results object contains, and specific components can then be extracted from it.
  • 🤔 Fisher's Exact Test: When the chi-square test assumptions are not met, Fisher's exact test is a nonparametric alternative.
  • 🔒 fisher.test: This test is performed using the 'fisher.test' function, with options to include a confidence interval and set the confidence level.
  • 📚 Future Content: The next video will discuss packages for calculating relative risks and odds ratios.
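The takeaways mention importing and attaching the data. The following is a minimal sketch of that step. It assumes a tab-separated file with a header row, which is replaced here by inline text through the `text` argument of `read.table()` so that the example runs anywhere. The data frame name `lung`, the column names `Gender` and `Smoke`, and all eight rows are hypothetical, not the video's actual data.

```r
# Hypothetical tab-separated data standing in for the lung capacity file.
# Column names Gender and Smoke and all rows are invented for illustration.
lung_txt <- "Gender\tSmoke
female\tno
female\tyes
male\tno
male\tno
female\tno
male\tyes
female\tno
male\tno"

# With a real file you would pass its path instead of text = lung_txt
lung <- read.table(text = lung_txt, header = TRUE, sep = "\t")

# attach() puts the columns on the search path, so Gender and Smoke
# can be used directly instead of lung$Gender and lung$Smoke
attach(lung)
table(Gender, Smoke)
detach(lung)  # remove the data frame from the search path when done
```

`attach()` is convenient in a short interactive session. In longer scripts, `with(lung, table(Gender, Smoke))` gives the same table without putting names on the search path.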
Q & A
  • What is the main topic of the video by Mike Marin?

    -The video is about conducting the chi-square test of independence and Fisher's exact test using the R programming language.

  • What is the chi-square test of independence used for?

    -The chi-square test of independence is a parametric method used for testing the independence between two categorical variables.

  • What data set is used in the video for the example?

    -The lung capacity data set, which was introduced earlier in the series, is used for the example in the video.

  • What variables' relationship is explored in the video?

    -The video explores the relationship between gender and smoking using the lung capacity data.

  • How does one produce a contingency table in R?

    -A contingency table can be produced in R using the 'table' command or function.

  • What is the purpose of the 'chisq.test' command/function in R?

    -The 'chisq.test' command/function in R is used to perform the chi-square test on a contingency table.

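  • What is Yates' continuity correction and when is it used in the chi-square test?

    -Yates' continuity correction subtracts 0.5 from the absolute difference between each observed and expected count before squaring. This compensates for approximating discrete counts with the continuous chi-square distribution, and it matters most when expected frequencies are small. In 'chisq.test' the correction is controlled by the 'correct' argument, which defaults to TRUE and affects only 2×2 tables.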

  • What does the 'fisher.test' command in R do?

    -The 'fisher.test' command in R performs Fisher's exact test, which is a nonparametric alternative to the chi-square test.

  • What is the purpose of the 'conf.int' and 'conf.level' arguments in the 'fisher.test' function?

    -The 'conf.int' argument is used to request a confidence interval for the odds ratio, and 'conf.level' is used to set the desired level of confidence for the interval.

  • How can one visualize the relationship between variables before the chi-square test?

    -A bar plot can be used to visualize the relationship between variables, which can be produced using the 'barplot' command in R.

  • What is the significance of the p-value in the chi-square test?

    -The p-value in the chi-square test is the probability of obtaining data at least as extreme as the data actually observed, assuming the null hypothesis of independence is true. A p-value above the chosen significance level (commonly 0.05) means the null hypothesis cannot be rejected.

  • What should one do if the assumptions for the chi-square test are not met?

    -If the assumptions for the chi-square test are not met, such as having small expected frequencies, one may consider using Fisher's exact test as an alternative.

  • What are the additional statistical measures that will be discussed in the next video?

    -The next video will discuss a package for calculating relative risks, odds ratios, and other statistical measures.
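The answer above about unmet assumptions suggests switching to Fisher's exact test when expected frequencies are small. The following sketches that decision using the common rule of thumb that every expected count should be at least 5. The table counts are hypothetical, and the if/else is only an illustration, not a procedure taken from the video.

```r
# Hypothetical 2x2 table with small counts (invented for illustration)
TAB <- as.table(matrix(c(8, 2, 3, 7), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(TAB), colSums(TAB)) / sum(TAB)
expected

# Rule of thumb: if any expected count is below 5,
# prefer Fisher's exact test over the chi-square approximation
if (any(expected < 5)) {
  result <- fisher.test(TAB)
} else {
  result <- chisq.test(TAB)
}
result$method
```

For these counts, two expected counts are 4.5, so the sketch falls through to `fisher.test()`. `chisq.test()` itself also warns that the approximation may be incorrect when expected counts are small.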

Outlines
00:00
📊 Introduction to Chi-Square and Fisher's Tests in R

In this video, Mike Marin introduces statistical tests for independence using the R programming language. He explains the chi-square test of independence and Fisher's exact test, focusing on their application to categorical variables. The example lung capacity data is used to explore the relationship between gender and smoking habits. Marin demonstrates how to import the data, create a contingency table using the 'table' function, and visualize the data with a bar plot. He also walks through performing the chi-square test with the 'chisq.test' command, including the use of Yates' continuity correction, and storing the results for further analysis.

Keywords
💡Chi-square test of independence
The chi-square test of independence is a statistical method used to determine whether there is a significant association between two categorical variables. In the video, it is used to examine the relationship between gender and smoking. The test is appropriate when the data are in the form of a contingency table, which is a cross-tabulation of the variables being analyzed. The script mentions using the 'chisq.test' command in R to perform this test and setting the 'correct' argument to apply Yates' continuity correction.
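The following sketch runs `chisq.test()` with and without Yates' correction on a hypothetical 2×2 table. It shows what the 'correct' argument changes. The counts and the level names ("female"/"male", "no"/"yes") are invented for illustration and are not the video's data. `CHI` mirrors the object name used in the video.

```r
# Hypothetical 2x2 table of counts (invented for illustration)
TAB <- as.table(matrix(c(40, 35, 10, 15), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# correct = TRUE applies Yates' continuity correction (the default for 2x2 tables)
CHI <- chisq.test(TAB, correct = TRUE)
CHI

# Without the correction the statistic is never smaller
CHI_raw <- chisq.test(TAB, correct = FALSE)
CHI_raw$statistic  # X-squared is 4/3 (about 1.333) for these counts
CHI$statistic      # about 0.853 after Yates' correction
```

Every cell in this table differs from its expected count by 2.5. The correction shrinks each difference to 2.0 before squaring, which explains the drop from about 1.333 to about 0.853.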
💡R programming language
R is a programming language and environment commonly used for statistical computing and graphics. In the context of the video, R is the tool used to conduct statistical tests such as the chi-square test of independence and Fisher's exact test. The script provides instructions on how to use specific R commands and functions to perform these tests, demonstrating the practical application of R in statistical analysis.
💡Contingency table
A contingency table is a type of table in a matrix form that displays the frequency distribution of two or more categorical variables. In the video, the contingency table is generated using the 'table' command in R to explore the relationship between gender and smoking. The table is saved in an object called 'TAB' for further use, which is a common practice in R for data manipulation and analysis.
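The following is a minimal sketch of building 'TAB' with `table()` from raw per-person vectors. The six observations and the vector names `Gender` and `Smoke` are hypothetical stand-ins for columns of the lung capacity data.

```r
# Hypothetical raw data: one entry per person (invented for illustration)
Gender <- c("female", "male", "female", "male", "female", "male")
Smoke  <- c("no", "no", "yes", "no", "no", "yes")

# Cross-tabulate: levels of the first argument become rows,
# levels of the second argument become columns
TAB <- table(Gender, Smoke)
TAB

# Row and column totals appended to the table
addmargins(TAB)
```

Passing the vectors by name means the table's dimnames are labeled `Gender` and `Smoke`, which later functions such as `chisq.test()` and `barplot()` carry through to their output.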
💡Bar plot
A bar plot is a graphical representation of data that uses bars to show comparisons among categories. In the video, a bar plot is produced to visually examine the relationship between gender and smoking. The 'barplot' command in R is used, with the 'beside' argument set to TRUE for clustered (side-by-side) bars and the 'legend' argument set to TRUE to include a legend. This visual representation helps in understanding the distribution of, and relationship between, the variables.
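The following is a sketch of the clustered bar plot described above, drawn from a hypothetical 2×2 table. The counts and axis labels are invented for illustration. The full argument name in `barplot()` is `legend.text`. The shorter `legend = TRUE` mentioned here works because R partially matches it to `legend.text`.

```r
# Hypothetical 2x2 table of counts (invented for illustration)
TAB <- as.table(matrix(c(40, 35, 10, 15), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# Columns of TAB (Smoke) form the groups on the x-axis; rows (Gender) are
# the bars within each group. beside = TRUE places those bars side by side
# instead of stacking them, and legend.text = TRUE labels them with the
# row names of TAB. barplot() invisibly returns the bar midpoints.
mids <- barplot(TAB, beside = TRUE, legend.text = TRUE,
                xlab = "Smoke", ylab = "Count")
```

Passing `t(TAB)` instead would swap the roles, grouping by gender with one bar per smoking status.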
💡Test statistic
A test statistic is a value calculated from a sample of data that is used to make inferences about a population parameter. In the video, the chi-square test statistic is 1.744. It measures how far the observed counts in the contingency table depart from the counts expected under the null hypothesis of independence. The p-value then expresses how likely a value at least this large would be if that hypothesis were true.
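To make the statistic concrete, the following sketch computes the Pearson chi-square statistic by hand and checks it against `chisq.test()` with the correction switched off. The table counts are hypothetical. The variable names `E`, `X2`, and `dof` are illustrative only.

```r
# Hypothetical 2x2 table of counts (invented for illustration)
TAB <- as.table(matrix(c(40, 35, 10, 15), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# Expected counts under independence: (row total * column total) / grand total
E <- outer(rowSums(TAB), colSums(TAB)) / sum(TAB)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
X2 <- sum((TAB - E)^2 / E)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof <- (nrow(TAB) - 1) * (ncol(TAB) - 1)

# Matches chisq.test() when Yates' correction is switched off
all.equal(X2, unname(chisq.test(TAB, correct = FALSE)$statistic))
```

A 2×2 table always has one degree of freedom, because once the margins are fixed, choosing one cell determines the other three.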
💡P-value
The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. It is not the probability that the null hypothesis is true. In the video, the chi-square test gives a p-value of 0.1866. Because this is well above common significance levels such as 0.05, there is not enough evidence to reject the null hypothesis, so the data show no significant association between gender and smoking.
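The p-value reported by `chisq.test()` is the upper-tail area of the chi-square distribution at the observed statistic. The following sketch reproduces it with `pchisq()`. The table counts are hypothetical, and the 0.05 cutoff is the conventional level, not a value from the video.

```r
# Hypothetical 2x2 table of counts (invented for illustration)
TAB <- as.table(matrix(c(40, 35, 10, 15), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))
CHI <- chisq.test(TAB, correct = TRUE)

# Upper-tail probability of the chi-square distribution at the observed statistic
p_manual <- pchisq(unname(CHI$statistic), df = unname(CHI$parameter),
                   lower.tail = FALSE)
all.equal(p_manual, CHI$p.value)

# Decision at the conventional 5% level
if (CHI$p.value > 0.05) "fail to reject independence" else "reject independence"
```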
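💡Yates' continuity correction
Yates' continuity correction is an adjustment to the chi-square statistic. It compensates for approximating discrete counts with the continuous chi-square distribution, an approximation that is poorest when expected frequencies are small. In the script, the 'correct' argument of the 'chisq.test' function is set to TRUE to apply this correction. TRUE is also the default, and the correction only affects 2×2 tables. The correction makes the test more conservative: it lowers the test statistic and therefore raises the p-value, which guards against overstating significance in small samples.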
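For reference, the statistic without and with the correction can be written as follows. Here $O_{ij}$ and $E_{ij}$ are the observed and expected counts, $R_i$ and $C_j$ the row and column totals, and $n$ the grand total. This is the textbook form. R's `chisq.test` caps the subtracted amount at the smallest $|O_{ij} - E_{ij}|$ rather than always subtracting 0.5 (in a 2×2 table all four differences are equal), so the correction never overshoots.

```latex
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\chi^2_{\text{Yates}} = \sum_{i,j} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{n}
```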
💡Attributes
In R, attributes provide additional information about an object, such as its dimensions or class. The script mentions using the 'attributes' command to explore what R stored in the object 'CHI', which contains the results of the chi-square test. For a test result, this lists the names of the stored components and the object's class. Individual components, such as the expected table, can then be extracted using the '$' sign.
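The following sketch shows what `attributes()` reveals about a stored chi-square result and how components are pulled out with `$`. The table counts are hypothetical. `CHI` mirrors the object name in the video.

```r
# Hypothetical 2x2 table of counts (invented for illustration)
TAB <- as.table(matrix(c(40, 35, 10, 15), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))
CHI <- chisq.test(TAB, correct = TRUE)

# $names lists the stored components; $class is "htest"
attributes(CHI)

CHI$expected    # expected counts under independence
CHI$observed    # the table that was tested
CHI$residuals   # Pearson residuals: (observed - expected) / sqrt(expected)
CHI$p.value     # p-value as a plain number
```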
💡Fisher's exact test
Fisher's exact test is a nonparametric statistical test used to determine whether there are nonrandom associations between two categorical variables. It is an alternative to the chi-square test, especially when the chi-square test's assumptions are not met, such as with small sample sizes or expected frequencies below 5. The script mentions using the 'fisher.test' command in R to perform this test and discusses setting the 'conf.int' and 'conf.level' arguments for confidence intervals.
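The following sketch runs `fisher.test()` on a hypothetical small-count table, setting 'conf.int' and 'conf.level' as described above. The counts are invented for illustration. The object name `FISH` is hypothetical.

```r
# Hypothetical 2x2 table with small counts (invented for illustration)
TAB <- as.table(matrix(c(8, 2, 3, 7), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# conf.int = TRUE (already the default) requests a confidence interval for the
# odds ratio; conf.level = 0.99 widens it from the default 95% to 99%
FISH <- fisher.test(TAB, conf.int = TRUE, conf.level = 0.99)
FISH

FISH$p.value    # exact two-sided p-value
FISH$estimate   # conditional maximum-likelihood estimate of the odds ratio
FISH$conf.int   # 99% confidence interval for the odds ratio
```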
💡Confidence interval
A confidence interval is a range of values, derived from the data, that is likely to contain the value of an unknown parameter. In the context of Fisher's exact test, the 'conf.int' argument of the 'fisher.test' function is set to TRUE to obtain a confidence interval for the odds ratio, and 'conf.level' sets its confidence level (0.95 by default). The interval indicates how precisely the odds ratio is estimated and is important for judging the reliability of the results.
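The following sketch shows the effect of 'conf.level'. It computes the odds-ratio interval at 95% and 99% from the same hypothetical table and checks that the higher level gives a wider interval. The counts and the names `ci95` and `ci99` are illustrative.

```r
# Hypothetical 2x2 table with small counts (invented for illustration)
TAB <- as.table(matrix(c(8, 2, 3, 7), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

ci95 <- fisher.test(TAB, conf.level = 0.95)$conf.int
ci99 <- fisher.test(TAB, conf.level = 0.99)$conf.int
ci95
ci99

# Higher confidence gives a wider interval: the lower bound moves down
# and the upper bound moves up
c(lower_moved_down = ci99[1] < ci95[1], upper_moved_up = ci99[2] > ci95[2])
```

Whether the interval excludes 1 is the interval-based counterpart of the test's conclusion. An interval containing 1 is consistent with no association.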
💡Odds ratio
The odds ratio is a measure of association between two categorical variables: the odds of an event occurring in one group divided by the odds of it occurring in another group. An odds ratio of 1 corresponds to no association. The video does not dwell on the odds ratio itself. However, the confidence interval it requests from Fisher's exact test is an interval for the odds ratio, so it indicates the strength and direction of the association between the variables.
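The following sketch computes the sample odds ratio by hand and compares it with the estimate from `fisher.test()`. The latter is a conditional maximum-likelihood estimate, so it generally differs from the simple cross-product ratio. The counts and the cell names `n11` to `n22` are hypothetical.

```r
# Hypothetical 2x2 table with small counts (invented for illustration)
TAB <- as.table(matrix(c(8, 2, 3, 7), nrow = 2,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

n11 <- TAB["female", "no"];  n12 <- TAB["female", "yes"]
n21 <- TAB["male", "no"];    n22 <- TAB["male", "yes"]

# Sample odds ratio: odds of Smoke = "yes" among males divided by the
# odds of Smoke = "yes" among females, i.e. (n11 * n22) / (n12 * n21)
or_sample <- (n11 * n22) / (n12 * n21)
or_sample   # 56 / 6, about 9.33, for these counts

# fisher.test() reports the conditional MLE instead, which generally differs
or_fisher <- unname(fisher.test(TAB)$estimate)
or_fisher
```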
Highlights

Introduction to the chi-square test of independence and Fisher's exact test using the R programming language.

The chi-square test is a parametric method for testing independence between two categorical variables.

Using lung capacity data to explore the relationship between gender and smoking.

Importing and attaching data in R for analysis.

Using the 'chisq.test' function to perform the chi-square test in R.

Accessing help in R for specific commands or functions.

Creating a contingency table using the 'table' command/function in R.

Saving the contingency table in an object called 'TAB' for later use.

Visual examination of the relationship using a bar plot with the 'barplot' command.

Setting the 'beside' argument to TRUE for clustered bar charts.

Producing a default legend with the 'legend' argument set to TRUE.

Conducting the chi-square test with the 'correct' argument for Yates' continuity correction.

Storing the test results in an object named 'CHI'.

Using the 'attributes' function to explore what R stored in the 'CHI' object.

Extracting certain attributes from the 'CHI' object using the '$' sign.

Considering Fisher's exact test when chi-square test assumptions are not met.

Using the 'fisher.test' command as a nonparametric alternative to the chi-square test.

Setting 'conf.int' to TRUE for a confidence interval of the odds ratio.

Adjusting the 'conf.level' argument for the desired level of confidence.

Upcoming discussion on a package for calculating relative risks and odds ratios in the next video.

Encouragement to subscribe for more R programming and statistics videos.
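The highlights mention accessing help in R for specific commands. The following is a minimal sketch of the usual ways to do that. Help pages only open in an interactive session, which is why they are wrapped in `interactive()` here. `args()` and `formals()` print the argument defaults (for example, `correct` in `chisq.test` and `conf.level` in `fisher.test`) without opening a page.

```r
# Open the help pages (only in an interactive session)
if (interactive()) {
  ?chisq.test
  help("fisher.test")
}

# Print argument lists and defaults directly in the console
args(chisq.test)
formals(chisq.test)$correct     # default for Yates' correction
formals(fisher.test)$conf.level # default confidence level
```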
