Stacked and Grouped Bar Charts and Mosaic Plots in R |R Tutorial 2.6| MarinStatsLectures

MarinStatsLectures-R Programming & Statistics
9 Aug 2013 | 03:52

TLDR: In this instructional video, Mike Marin guides viewers on creating 'stacked bar charts', 'clustered bar charts', and 'mosaic plots' using R to analyze the relationship between two categorical variables. The tutorial utilizes lung capacity data and demonstrates how to generate a contingency table with the 'table' command. It also covers customization techniques such as changing chart types, calculating conditional probabilities, and adding legends, titles, and axis labels. The 'mosaicplot' command is introduced for an alternative visual representation, with tips on further customization available in the Help menu.

Takeaways
  • The video is about creating 'stacked bar charts', 'clustered bar charts', and 'mosaic plots' in R for examining relationships between two categorical variables.
  • The example data used is the lung capacity data, focusing on the relationship between 'Gender' and 'Smoking'.
  • The 'barplot' command in R is used to produce bar charts, with the default being 'stacked bar charts'.
  • A 'contingency table' is required for plotting, which can be created using the 'table' command in R.
  • The table is saved as 'Table1' for later use in plotting.
  • 'Clustered bar charts' can be produced by setting the 'beside' argument to TRUE in the 'barplot' command.
  • Conditional probabilities, such as the probability of 'Smoking' given one is female, can be represented by manipulating 'Table1'.
  • A legend can be added to the plot using the 'legend.text' argument, with custom labels if desired.
  • Titles and axis labels can be added using the 'main', 'xlab', and 'ylab' arguments.
  • The color of the bars can be changed using the 'col' argument, with different colors for different categories.
  • A 'mosaic plot' is another visualization option for two categorical variables, created with the 'mosaicplot' command.
Q & A
  • What are the types of plots discussed in the video?

    -The video discusses 'stacked bar charts', 'clustered bar charts', and 'mosaic plots' as types of plots for examining the relationship between two categorical variables.

  • What is the dataset used in the video for demonstration?

    -The lung capacity data is used in the video for demonstrating the creation of various plots.

  • How can one access help in R for a specific command?

    -To access help in R for a specific command, call 'help' with the command name in parentheses, for example help(barplot), or put a question mark '?' in front of the command name, as in ?barplot.

  • What is a contingency table and why is it needed for producing bar plots?

    -A contingency table cross-tabulates two categorical variables, giving the count of observations for each combination of their categories. The 'barplot' command needs these counts as its input: they set the heights of the bars that visualize the relationship between the variables.

  • How can you save a contingency table in R?

    -You can save a contingency table in R by using the 'table' command and assigning the result to an object, for example, 'Table1'.

  • What is the default type of bar chart produced by R's 'barplot' command?

    -The default type of bar chart produced by R's 'barplot' command is a 'stacked bar chart'.

  • How can you change a stacked bar chart to a clustered bar chart in R?

    -You can change a stacked bar chart to a clustered bar chart in R by using the 'beside' argument in the 'barplot' command and setting it equal to TRUE.

  • What are conditional probabilities and how can they be represented in a plot?

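    -Conditional probabilities are the probabilities of an event given that another event has occurred, for example the probability of smoking given that one is female. In a plot, they can be represented by converting the counts in the contingency table to proportions within each group and plotting those proportions instead of the counts.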

  • How can you add a legend to a plot in R?

    -You can add a legend to a bar plot in R by using the 'legend.text' argument of the 'barplot' command, either setting it to TRUE to use the names stored in the table or providing a vector of custom labels.

  • What is the 'mosaic plot' and how is it used to examine relationships between categorical variables?

    -A 'mosaic plot' is a graphical representation of the relationship between two categorical variables in which each combination of categories is drawn as a tile whose area reflects its count. In R it is produced with the 'mosaicplot' command.

  • How can you customize the appearance of plots in R, such as adding titles and changing colors?

    -You can customize the appearance of plots in R by using various arguments such as 'main' for the title, 'xlab' and 'ylab' for axis labels, 'col' for changing colors, and 'las' for rotating axis labels.

  • What is the 'las' argument used for in R plots?

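    -The 'las' argument in R controls the orientation of the axis labels. Setting it to 1 draws all labels horizontally, which rotates the values on the y-axis so they read left to right; a value of 2 makes the labels perpendicular to their axis.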

  • How can viewers learn more about customizing plots in R?

    -Viewers can learn more about customizing plots in R by exploring the Help menu, which provides documentation and examples for various plotting functions and customization options.

Outlines
00:00
Introduction to R Plots: Stacked and Clustered Bar Charts & Mosaic Plots

In this introductory segment, Mike Marin sets the stage for a tutorial on creating various types of bar charts and mosaic plots in R. The focus is on examining the relationship between two categorical variables using the lung capacity dataset. The video will guide viewers through the process of importing data, producing bar charts with the 'barplot' command, and generating a contingency table using the 'table' command. The segment also introduces the concept of conditional probabilities and explains how to customize plots with legends, titles, axis labels, and color schemes.

Keywords
Stacked Bar Charts
Stacked bar charts are a type of chart used to compare the quantities of different categories stacked on top of each other. In the context of the video, they are used to examine the relationship between two categorical variables, such as Gender and Smoking, by stacking the counts of each category to show their proportions relative to each other. The script mentions that R's default is to produce stacked bar charts, which can be changed to clustered bar charts by setting the 'beside' argument to TRUE.
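The default stacked behaviour can be sketched as follows. This is a minimal illustration, not the video's own code: the 'Gender' and 'Smoke' vectors are a small made-up stand-in for the lung capacity data, and 'mids' and 'bar_heights' are names chosen here for convenience.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# Default barplot(): one bar per column (Gender), with segments stacked by row (Smoke)
mids <- barplot(Table1)

# The total height of each stacked bar is the column total of the table
bar_heights <- colSums(Table1)
```

With this toy data there is one bar for each gender, and each bar is as tall as the number of observations for that gender.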
Clustered Bar Charts
Clustered bar charts, also known as grouped bar charts, display the data for each category grouped side by side rather than stacked. This allows for a clear comparison of the quantities between categories. In the video, the script explains how to change the default stacked bar chart to a clustered bar chart by using the 'beside' argument in the 'barplot' command within R.
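A short sketch of the switch from stacked to clustered bars, again using invented 'Gender' and 'Smoke' values in place of the real lung capacity data:

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# beside = TRUE draws the rows of Table1 side by side within each column group
mids <- barplot(Table1, beside = TRUE)
```

Here 'barplot' returns a matrix of bar midpoints with one entry per bar, which is a quick way to confirm that four separate bars were drawn rather than two stacked ones.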
Mosaic Plots
Mosaic plots are a graphical representation used to visualize the relationship between two categorical variables. They are similar to stacked bar charts, but each combination of categories is drawn as a tile whose area reflects how many observations fall in it. The script describes how to produce a mosaic plot using the 'mosaicplot' command in R, providing an alternative way to examine the relationship between Gender and Smoking.
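A mosaic plot of the same kind of table might look like the sketch below. The data are invented, the title and labels are placeholders, and the fill colours are an arbitrary choice; 'color' is the argument name documented for 'mosaicplot', and whether the video uses that exact spelling is not stated in this summary.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# Tile areas are proportional to the counts in Table1;
# the first table dimension (Smoke) runs along the x-axis
mosaicplot(Table1,
           main  = "Gender and Smoking",
           xlab  = "Smoking status",
           ylab  = "Gender",
           color = c("red", "blue"))
```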
Contingency Table
A contingency table is a type of table in statistics that displays the frequency distribution of variables in a matrix format. In the video, the script explains that a contingency table is needed to make the bar plot and mosaic plot, using the 'table' command in R to create a table for the variables Smoke and Gender, which is then saved as 'Table1' for later use.
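As a minimal sketch of this step, the snippet below builds and stores the table. The 'Gender' and 'Smoke' vectors are invented here so the example runs on its own; in the video they come from the attached lung capacity data.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")

# Cross-tabulate the two variables: rows are Smoke, columns are Gender
Table1 <- table(Smoke, Gender)

# Print the counts: female has 3 "no" and 1 "yes", male has 3 "no" and 2 "yes"
print(Table1)
```

Saving the result as an object means the same counts can be passed to 'barplot' and 'mosaicplot' without recomputing them.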
Conditional Probabilities
Conditional probabilities are probabilities of an event given that another event has occurred. In the video script, the concept is used to illustrate how one might wish to express the plot in terms of the probability of 'Smoking' given one is female, and the probability of 'not smoking' given one is female. This involves manipulating the values in 'Table1' to represent these conditional probabilities.
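One way to carry out this manipulation is to divide each column of the table by its column total. The sketch below uses base R's 'prop.table' for that; this is an assumption about how one might do it, not necessarily the method shown in the video, and the data and the name 'CondProb' are invented for illustration.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# margin = 2 divides every column by its own total,
# so each column holds P(Smoke | Gender) and sums to 1
CondProb <- prop.table(Table1, margin = 2)

# Bars now show conditional probabilities instead of raw counts
barplot(CondProb, beside = TRUE)
```

With this toy data, the probability of smoking given female is 1/4 = 0.25 and the probability of smoking given male is 2/5 = 0.4.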
Legend
A legend in a plot is a key that explains the meaning of the symbols or colors used in the chart. The script mentions adding a legend to the plot using the 'legend.text' argument, which can either produce a default legend with the names stored in 'Table1' or allow the user to provide custom labels, such as 'Non-smoker' and 'Smoker'.
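Both legend variants can be sketched like this, using invented data in place of the lung capacity file. The custom labels follow the row order of the table, so 'Non-smoker' pairs with the "no" row and 'Smoker' with the "yes" row.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# Default legend: uses the row names stored in Table1 ("no", "yes")
barplot(Table1, beside = TRUE, legend.text = TRUE)

# Custom legend: one label per row of Table1, in row order
legend_labels <- c("Non-smoker", "Smoker")
barplot(Table1, beside = TRUE, legend.text = legend_labels)
```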
Main Argument
The 'main' argument in a plotting function is used to add a title to the plot. In the script, it is mentioned as a way to provide a title for the bar plot or mosaic plot, helping to clearly communicate the main theme or focus of the visualization.
Xlab and Ylab Arguments
The 'xlab' and 'ylab' arguments in a plotting function are used to label the x-axis and y-axis of the plot, respectively. The script suggests using these arguments to label the axes of the bar plot or mosaic plot, which aids in understanding the data being presented.
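A brief sketch of a titled and labelled bar plot follows. The data are invented and the title and label strings are placeholders, not the wording used in the video.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# main sets the plot title; xlab and ylab label the two axes
mids <- barplot(Table1, beside = TRUE,
                main = "Gender and Smoking",
                xlab = "Gender",
                ylab = "Count")
```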
Las Argument
The 'las' argument in a plotting function determines the orientation of the axis labels. In the script, it is mentioned that setting the 'las' argument to 1 will rotate the values on the y-axis so that they are drawn horizontally, which can make them easier to read.
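The effect of 'las' is easiest to see by drawing the same plot with the argument set. The sketch below uses invented data; the comment lists the four values that base R graphics accept for 'las'.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# las = 0: labels parallel to their axis (the default)
# las = 1: labels always horizontal
# las = 2: labels perpendicular to their axis
# las = 3: labels always vertical
mids <- barplot(Table1, beside = TRUE, las = 1)
```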
Col Argument
The 'col' argument in a plotting function is used to specify the color of the bars or elements in a plot. The script describes changing the color of the bars in the bar chart by using the 'col' argument, with an example of making one set of bars red and another set blue.
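A minimal sketch of the colour change, with invented data. Which group is red and which is blue is an arbitrary choice made here; for a table, 'barplot' applies the colours to the rows, so the first colour goes to the "no" row and the second to the "yes" row.

```r
# Hypothetical stand-in for the lung capacity data (values invented for illustration)
Gender <- rep(c("female", "male"), times = c(4, 5))
Smoke  <- c("no", "no", "no", "yes", "no", "no", "no", "yes", "yes")
Table1 <- table(Smoke, Gender)

# One colour per row of Table1: "no" bars red, "yes" bars blue
bar_cols <- c("red", "blue")
barplot(Table1, beside = TRUE, col = bar_cols,
        legend.text = c("Non-smoker", "Smoker"))
```

Adding the legend in the same call keeps the colour key in step with the bars, since the legend reuses the colours given to 'col'.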
R Programming Language
R is a programming language and software environment for statistical computing and graphics. Throughout the script, R is used to demonstrate how to produce various types of plots, such as stacked and clustered bar charts and mosaic plots, for analyzing the relationship between categorical variables using the lung capacity data.
Highlights

Introduction to producing different types of plots in R for examining relationships between two categorical variables.

Use of lung capacity data for demonstration.

Importing and attaching data in R.

Graphical examination of the relationship between Gender and Smoking variables.

Using the 'barplot' command to produce bar charts.

Accessing the Help menu in R for command assistance.

Creating a contingency table using the 'table' command.

Saving the contingency table as an object for later use.

Default production of stacked bar charts in R.

Conversion to clustered bar charts using the 'beside' argument.

Expressing plots in terms of conditional probabilities.

Adding a legend to the plot using 'legend.text'.

Customizing plot labels and titles with 'main', 'xlab', and 'ylab'.

Rotating y-axis values with the 'las' argument.

Changing bar colors using the 'col' argument.

Introduction to mosaic plots for examining categorical variable relationships.

Producing a mosaic plot using the 'mosaicplot' command.

Customizing mosaic plots with titles, labels, and colors.

Encouragement to explore the Help menu for further customization.

Closing remarks and invitation to watch other instructional videos.
