Odds Ratio, Relative Risk & Risk Difference with R | R Tutorial 4.11 | MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
25 Sept 2013, 06:51
Educational, Learning

TLDR: In this video, Mike Marin introduces an R package for calculating 'relative risk', 'odds ratio', and 'attributable risk' (also called 'risk difference'). The tutorial analyzes the relationship between gender and smoking using the lung capacity data. Marin demonstrates how to create a 2-way table and a bar plot, notes what the chi-square test of independence can and cannot show, and uses the epiR package to compute the measures of association. He explains how to interpret these measures, including the odds ratio and its confidence interval, and how to reorganize the table into the standard a, b, c, d format so the output matches the usual interpretation. The video is a useful resource for those interested in R programming and statistical analysis.

Takeaways
  • The video introduces a package for calculating 'relative risk', 'odds ratio', and 'attributable risk' or 'risk difference' using R software.
  • The focus is on analyzing the relationship between two categorical variables, specifically Gender and Smoking, using lung capacity data.
  • A 2-way table is created to summarize the relationship between Gender and Smoking, and a bar plot is suggested for visual examination.
  • The chi-square test of independence is mentioned, but it's noted that it doesn't indicate the strength or direction of the association.
  • The 'epiR' package in R is recommended for calculating 'relative risk', 'odds ratio', and 'attributable risk'.
  • The 'epi.2by2' command from the 'epiR' package is used to calculate the statistical measures, with options to specify the study type and confidence level.
  • The video explains how to interpret the 'odds ratio', providing an example of how to switch reference groups for different interpretations.
  • The standard a, b, c, d table format is introduced for organizing data in a way that aligns with common statistical interpretations.
  • Two methods are demonstrated for reorganizing the data into the a, b, c, d format: using the 'matrix' command and the 'cbind' command.
  • The 'colnames' command is shown to add column names to the reorganized table for clarity.
  • The video concludes by recalculating the statistical summaries using the reorganized table and interpreting the results, including what the confidence interval says about significance.
Q & A
  • What statistical measures are discussed in the video to analyze the association between two categorical variables?

    -The video discusses 'relative risk', 'odds ratio', and 'attributable risk' or 'risk difference' as measures of the direction and strength of the association between two categorical variables.

  • What R package is used in the video to calculate the statistical measures?

    -The 'epiR' package is used in the video to calculate the statistical measures such as 'relative risk', 'odds ratio', and 'attributable risk'.

  • How is the lung capacity data imported and attached in R for analysis?

    -The lung capacity data is first imported into R and attached. A 2-way table of Gender and Smoking is then created with the 'table' command and saved in an object named 'TAB'.

  • What is the purpose of setting the 'beside' argument to TRUE in the bar plot?

    -Setting the 'beside' argument to TRUE in the bar plot places the bar plots side by side, allowing for a visual comparison of the two categorical variables.

  • What does the bar plot suggest about the relationship between gender and smoking based on the video?

    -The bar plot suggests that there may be a relationship between gender and smoking, as the non-smoking group has more males than females, while the smoking group has more females than males.

  • Why is the chi-square test of independence not sufficient to indicate the strength or direction of an association?

    -The chi-square test of independence is not sufficient because it only tests for the presence of an association but does not provide information about the strength or direction of that association.

  • What is the default confidence level used in the 'epi.2by2' command?

    -The default confidence level used in the 'epi.2by2' command is 95 percent.

  • How is the 'relative risk' interpreted in the context of the video?

    -In the video, the 'relative risk', also referred to as the 'incidence risk ratio', is interpreted as the risk of the outcome occurring in the exposed group compared to the unexposed group.

  • What does an odds ratio of 0.71 signify in the context of the video?

    -An odds ratio of 0.71 signifies that the odds of a female not smoking are 0.71 times the odds of a male not smoking. In other words, females in this sample have lower odds of being non-smokers than males, which is the same as saying they have higher odds of smoking.

  • What is the significance of the confidence interval containing the value 1 in the context of the odds ratio?

    -If the confidence interval for the odds ratio contains the value 1, the odds ratio is not statistically significantly different from 1 at the corresponding level (5 percent for a 95 percent interval). The data then do not provide evidence of a difference in the odds of the outcome between the groups being compared.

  • How can the standard a, b, c, d format of a 2x2 table be achieved in R?

    -The standard a, b, c, d format of a 2x2 table can be achieved in R in two ways: by entering the four counts in the required order with the 'matrix' command, or by selecting the columns of the existing table with square brackets and binding them in the new order with the 'cbind' command.
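The two reorganization methods from the last answer can be sketched as follows. This is a minimal sketch, not the video's exact code: the cell counts are illustrative assumptions (the summary does not list them), and the object names TAB2 and TAB3 are hypothetical.

```r
# Illustrative 2-way table with Smoke columns in the order "no", "yes".
# The counts are assumed for illustration; the summary does not list them.
TAB <- as.table(matrix(c(314, 44,
                         334, 33),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# Method 1: type the counts row by row so the outcome ("yes") comes first.
TAB2 <- matrix(c(44, 314,
                 33, 334),
               nrow = 2, byrow = TRUE)

# Method 2: pick the columns with square brackets and bind them in the new order.
TAB3 <- cbind(TAB[, 2], TAB[, 1])

# cbind() keeps the row names but leaves the columns unnamed, so add them.
colnames(TAB3) <- c("yes", "no")
print(TAB3)
```

Method 2 avoids retyping the counts, so it cannot introduce a transcription error, which may make it the safer choice when the table already exists in R.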

Outlines
00:00
Analyzing Risk Measures with R: Introduction and Data Setup

In this introductory section, Mike Marin presents a video focused on calculating key statistical measures such as 'relative risk', 'odds ratio', and 'attributable risk' using R software. He introduces the lung capacity dataset and demonstrates the initial steps of data analysis, including importing the data into R, creating a 2-way table with the 'table' command, and visualizing the relationship between gender and smoking through a bar plot. The video sets the stage for exploring the association between categorical variables and introduces the concept of statistical measures that quantify the strength and direction of such associations.
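The table and bar plot steps described above might look like the sketch below. Because the lung capacity file itself is not part of this summary, the code builds stand-in Gender and Smoke vectors; their counts are assumptions chosen only for illustration, and the axis labels are likewise hypothetical.

```r
# Stand-in for the attached lung capacity variables (counts are assumed).
Gender <- factor(rep(c("female", "male"), times = c(358, 367)))
Smoke  <- factor(c(rep(c("no", "yes"), times = c(314, 44)),   # females
                   rep(c("no", "yes"), times = c(334, 33))))  # males

# 2-way table: rows are Gender, columns are Smoke.
TAB <- table(Gender, Smoke)
print(TAB)

# When run as a script, send the plot to a null device instead of a file.
if (!interactive()) pdf(NULL)

# beside = TRUE draws the bars side by side instead of stacking them.
mids <- barplot(TAB, beside = TRUE, legend.text = TRUE,
                xlab = "Smoke", ylab = "Count")

if (!interactive()) invisible(dev.off())
```

With beside = TRUE, each Smoke category gets one bar per gender, which is what makes the group comparison visible at a glance.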

05:04
Understanding and Calculating Statistical Summaries with epiR Package

This section covers calculating the statistical summaries with the epiR package in R. Mike explains the use of the 'epi.2by2' command to generate summaries for a 2x2 table, including setting the 'method' argument for different study types and adjusting 'conf.level' for the desired confidence level. He provides an example of interpreting the odds ratio and demonstrates how to reorganize the table into the standard a, b, c, d format for consistency with traditional statistical interpretations, either by creating a matrix or by binding columns. The video concludes with an example calculation and interpretation of the odds ratio for smoking among males and females, using the confidence interval to judge statistical significance.
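A call along the lines described above could be sketched as follows. The counts are illustrative assumptions, "cohort.count" is used only as an example of a study type, and the call is guarded so the script still runs where epiR is not installed. Newer epiR versions expect a table object, hence the as.table() wrapper.

```r
# Table in a, b, c, d layout: rows = exposure groups, columns = outcome yes/no.
# The counts are assumed for illustration; the summary does not list them.
tab <- as.table(matrix(c(44, 314,
                         33, 334),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("yes", "no"))))

if (requireNamespace("epiR", quietly = TRUE)) {
  # method names the study design; conf.level defaults to 0.95.
  res <- epiR::epi.2by2(tab, method = "cohort.count", conf.level = 0.95)
  print(res)
} else {
  message("epiR is not installed; run install.packages('epiR') first.")
}
```

The printed summary is where the risk ratio, odds ratio, and attributable risk appear, each with its confidence interval.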

Keywords
Relative Risk
Relative risk is a statistical measure that quantifies the strength of the association between an exposure and an outcome as the ratio of the risk of the outcome in one group to the risk in another. In the context of the video, it's used to examine the relationship between gender and smoking habits. The video mentions calculating the 'incidence risk ratio', which is another term for relative risk, to understand how the risk of being a smoker differs between males and females.
Odds Ratio
The odds ratio is a measure of association between two categorical variables, often used in case-control studies. It is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group. In the video, the odds ratio is calculated to determine if there is a significant difference in the odds of smoking between males and females, with an example given as 'the odds of a female not smoking are 0.71 times the odds of a male not smoking'.
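To make the interpretation concrete, the odds ratio can be computed by hand in base R, and the reference group switched by taking the reciprocal. This is a sketch with assumed counts, chosen so the result lands near the 0.71 quoted above; the variable names are hypothetical.

```r
# Illustrative counts (assumed; not listed in the summary).
TAB <- matrix(c(314, 44,
                334, 33),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("female", "male"),
                              Smoke  = c("no", "yes")))

# Odds of NOT smoking within each gender.
odds_no_female <- TAB["female", "no"] / TAB["female", "yes"]
odds_no_male   <- TAB["male", "no"]   / TAB["male", "yes"]

# Females relative to males, outcome = not smoking.
or_female_vs_male <- odds_no_female / odds_no_male

# Switching the reference group inverts the ratio.
or_male_vs_female <- 1 / or_female_vs_male

# Switching the outcome to "smoking" also inverts it.
or_smoking_female_vs_male <-
  (TAB["female", "yes"] / TAB["female", "no"]) /
  (TAB["male", "yes"]   / TAB["male", "no"])

print(round(c(or_female_vs_male, or_male_vs_female,
              or_smoking_female_vs_male), 2))
```

Swapping either the reference group or the outcome inverts the odds ratio, so the same table can be read as "females have lower odds of not smoking" or "females have higher odds of smoking".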
Attributable Risk
Attributable risk, which the video treats as another name for 'risk difference', is the difference between the risk of the outcome in the exposed group and the risk in the unexposed group. In the video, it is one of the statistical measures calculated using the 'epi.2by2' command in the epiR package to quantify the additional risk of smoking among females compared to males.
Risk Difference
Risk difference is the absolute difference in the risk of an event occurring between two groups. The video uses the term interchangeably with attributable risk, and it is reported alongside relative risk and odds ratio as a measure of the association between gender and smoking habits.
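A short base R sketch shows how the relative risk and the risk difference come from the same two group risks. The counts are illustrative assumptions and the variable names are hypothetical.

```r
# Rows = groups being compared, columns = outcome (smoker yes / no).
# The counts are assumed for illustration; the summary does not list them.
tab <- matrix(c(44, 314,
                33, 334),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("female", "male"),
                              Smoke  = c("yes", "no")))

# Risk of the outcome in each group = outcome count / group total.
risk <- tab[, "yes"] / rowSums(tab)

# Relative risk: ratio of the two risks (females relative to males).
rr <- risk[["female"]] / risk[["male"]]

# Risk difference: absolute difference between the two risks.
rd <- risk[["female"]] - risk[["male"]]

print(round(c(relative_risk = rr, risk_difference = rd), 3))
```

The ratio answers "how many times as likely", while the difference answers "how many extra cases per person at risk", so the two measures can tell quite different stories when the baseline risk is small.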
Categorical Variables
Categorical variables are variables that can take on one of a limited, and usually fixed, number of possible values, assigning them into different categories. In the video, gender and smoking status are used as categorical variables to explore their association using statistical measures.
Chi-Square Test
The chi-square test is a statistical test used to determine if there is a significant association between two categorical variables. The script notes that while the chi-square test can show if there is an association, it does not indicate the strength or direction of that association, which is why other measures like relative risk and odds ratio are used.
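As a hedged illustration of that limitation, the sketch below runs the test on an assumed table. The counts are illustrative, not the video's data, and the continuity correction is simply R's default for 2x2 tables.

```r
# Illustrative 2x2 table (counts assumed; not listed in the summary).
TAB <- as.table(matrix(c(314, 44,
                         334, 33),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Gender = c("female", "male"),
                                       Smoke  = c("no", "yes"))))

# Chi-square test of independence with Yates' continuity correction.
res <- chisq.test(TAB, correct = TRUE)

# The result holds a test statistic, degrees of freedom, and a p-value,
# but nothing that says which group has the higher risk or by how much.
print(res$statistic)
print(res$parameter)
print(res$p.value)
```

The p-value only addresses whether an association is plausible at all, which is why measures such as the odds ratio are computed afterwards.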
epiR Package
The epiR package is an R library specifically designed for epidemiological statistics. The video demonstrates its use to calculate relative risk, odds ratio, and attributable risk, highlighting its utility in analyzing the association between categorical variables such as gender and smoking.
2x2 Table
A 2x2 (two-by-two) table is a contingency table that cross-classifies two categorical variables, each with two levels. In the video, a 2x2 table is created to organize the data on gender and smoking status, which is then passed to the 'epi.2by2' command to calculate the statistical measures.
Bar Plot
A bar plot is a chart that represents the comparison of two or more quantities, with rectangular bars representing the data. The script describes the creation of a bar plot to visually examine the relationship between gender and smoking, with the bars placed side by side to compare the two groups.
Confidence Interval
A confidence interval provides a range of values within which the true population parameter is likely to fall, with a certain level of confidence. In the video, confidence intervals are calculated for the relative risk, odds ratio, and attributable risk to provide a measure of precision for these estimates.
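One common way to obtain such an interval for the odds ratio is a Wald interval on the log scale, sketched below in base R. This is an assumption for illustration: epiR may use a different method, so its reported limits need not match exactly. The counts and variable names are also illustrative.

```r
# Cell counts in a, b, c, d order (assumed for illustration).
n_a <- 44;  n_b <- 314   # exposed:   outcome yes / no
n_c <- 33;  n_d <- 334   # unexposed: outcome yes / no

# Point estimate of the odds ratio.
or <- (n_a * n_d) / (n_b * n_c)

# Standard error of log(OR) under the usual large-sample approximation.
se_log_or <- sqrt(1 / n_a + 1 / n_b + 1 / n_c + 1 / n_d)

# 95% Wald interval: build it on the log scale, then exponentiate.
z  <- qnorm(0.975)
ci <- exp(log(or) + c(-1, 1) * z * se_log_or)

# If the interval covers 1, the data are consistent with "no association".
contains_one <- ci[1] < 1 && ci[2] > 1

print(round(c(estimate = or, lower = ci[1], upper = ci[2]), 2))
print(contains_one)
```

Working on the log scale keeps the interval positive and makes it symmetric around the estimate in ratio terms rather than in absolute terms.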
Standard a, b, c, d Notation
The standard a, b, c, d notation is a way of organizing data in a 2x2 table, where 'a' is the number of individuals with both the exposure and the outcome, 'b' the number with the exposure but without the outcome, 'c' the number without the exposure but with the outcome, and 'd' the number with neither. The video describes reorganizing the data into this format so that the results follow the standard interpretation.
Highlights

Introduction to a package for calculating 'relative risk', 'odds ratio', and 'attributable risk' or 'risk difference' using R statistical software.

Explanation of 'relative risk', 'odds ratio', and 'attributable risk' as measures of the association between two categorical variables.

Use of lung capacity data to explore the relationship between Gender and Smoking.

Creation and saving of a 2-way table using the 'table' command in R.

Visualization of the relationship between Gender and Smoking with a bar plot.

Observation of potential association between non-smoking and gender based on bar plot analysis.

Discussion on the limitations of the chi-square test of independence in indicating the strength or direction of association.

Introduction of the epiR package for calculating 'relative risk', 'odds ratio', and 'attributable risk'.

Guidance on installing and loading the epiR package in R.

Accessing help documentation for the epiR package.

Use of the 'epi.2by2' command to produce summaries of association measures.

Setting the 'method' argument for different study types in the 'epi.2by2' command.

Calculation and interpretation of 'relative risk', 'odds ratio', and 'attributable risk' from the 2by2 table.

Interpretation of odds ratio and its implications for the association between gender and smoking.

Standard a, b, c, d table notation for statistical formulas and interpretations.

Reorganization of the table into the standard a, b, c, d format using matrix and cbind commands.

Adding column names to the reorganized table for clarity.

Re-calculation of association measures using the reorganized table in the epi.2by2 command.

Interpretation of the recalculated odds ratio and its significance.

Upcoming discussion on correlation and linear regression in the next video.

Closing remarks and call to action to subscribe to MarinStatsLectures for more content.
