What is COVARIANCE? What is CORRELATION? Detailed video!

zedstatistics
6 May 2020 · 20:59
Educational, Learning

TLDR
This video concludes a series on descriptive statistics by covering covariance and correlation, two measures of the relationship between two variables. The presenter builds intuition with examples such as the relationship between temperature and ice cream sales, then distinguishes covariance, which indicates the direction of a relationship, from correlation, which also quantifies its strength. The tutorial works through step-by-step calculations for both sample data and a discrete probability distribution, and shows how to carry them out in Excel. The aim is to demystify these measures and show how they are used in practice.

Takeaways
  • The video discusses covariance and correlation, which measure the relationship between two variables.
  • Covariance and correlation are not typically part of a standard suite of descriptive statistics, but they are important for understanding how variables relate.
  • Positive covariance or correlation indicates that two variables move in the same direction; negative indicates that they move in opposite directions.
  • Covariance indicates only the direction of the relationship, whereas correlation also measures its strength.
  • To calculate covariance from a sample, find the mean of each variable, then each observation's deviations from those means, and finally the products of the paired deviations.
  • The covariance formula sums the products of the deviations and divides by the number of observations minus one (n-1).
  • The division by n-1 reflects degrees of freedom, the number of independent pieces of information in a dataset.
  • The correlation formula is the covariance divided by the product of the two variables' standard deviations, which gives a value between -1 and 1.
  • The video gives a step-by-step guide to calculating covariance and correlation in Excel, from both sample data and a discrete probability distribution.
  • Excel functions such as COVARIANCE.S for sample covariance and CORREL for correlation give the same results without the manual steps.
  • The video concludes by explaining why degrees of freedom do not apply to theoretical distributions in the way they do to sample data.
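
The sample calculation described in the takeaways above can be sketched in a few lines of Python. This is only an illustration: the data are invented (they are not the figures used in the video), and the names `xs`, `ys`, `sample_covariance` and `sample_correlation` are assumptions made for this example.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    # The sample variance of a variable is its covariance with itself.
    sd_x = sqrt(sample_covariance(xs, xs))
    sd_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data: daily temperature (degrees C) and ice cream sales (units).
xs = [20, 22, 25, 27, 31]
ys = [110, 120, 150, 145, 175]

cov_xy = sample_covariance(xs, ys)
corr_xy = sample_correlation(xs, ys)
print(f"covariance  = {cov_xy:.2f}")
print(f"correlation = {corr_xy:.3f}")
```

With these invented numbers the covariance is positive, which gives the direction, while the correlation lies between -1 and 1 and indicates how strong the relationship is.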
Q & A
  • What is the main focus of the final video in the descriptive statistics series?

    -The main focus of the final video is on covariance and correlation, which describe the relationship between two numerical variables.

  • Why are covariance and correlation not typically part of a standard suite of descriptive statistics measures?

    -Covariance and correlation are not typically part of a standard suite of descriptive statistics measures because they deal with the relationship between two variables, rather than characteristics of a single variable.

  • What is the difference between positive covariance and negative covariance?

    -Positive covariance indicates that two variables move in the same direction, while negative covariance indicates that they move in opposite directions.

  • Why is the stock market movement considered to have little or no correlation with temperature?

    -The stock market movement is considered to have little or no correlation with temperature because temperature does not significantly affect the general stock market movement.

  • What is the difference between covariance and correlation?

    -Covariance measures the direction of the relationship between two variables, while correlation measures both the direction and the strength of the relationship, with values ranging from -1 to 1.

  • Why do we need to find the mean of variables when calculating covariance or correlation from a sample?

    -We need to find the mean of variables to assess whether the values are higher or lower than the mean, which helps in determining the relationship between the variables.

  • What is the purpose of dividing by n-1 instead of n when calculating covariance or correlation from a sample?

    -Dividing by n-1 instead of n accounts for the degrees of freedom in the sample: the deviations are measured from the sample mean, which is itself estimated from the data, so only n-1 of them are free to vary.

  • How does the formula for calculating covariance from a sample compare to the formula for calculating variance?

    -The formula for calculating covariance from a sample is similar to the formula for calculating variance, with the main difference being that covariance involves two different variables, while variance involves a single variable.

  • What is the significance of the correlation coefficient value of 0.82 in the example provided?

    -A correlation coefficient value of 0.82 indicates a strong positive relationship between the two variables, with values closer to 1 or -1 representing stronger relationships.

  • Why is it necessary to find the standard deviation when calculating correlation?

    -Finding the standard deviation is necessary when calculating correlation because it normalizes the covariance, allowing for a measure of the strength of the relationship that is independent of the scale of the variables.

  • How does the calculation of covariance and correlation differ when working with a discrete probability distribution compared to a sample?

    -With a discrete probability distribution, the product of each pair of deviations is weighted by the probability of that outcome, and the weighted products are summed. With a sample, the products of deviations are summed and divided by n-1, with no probabilities involved.

  • Why is the correlation between X and Y in the discrete probability distribution example close to -1?

    -The correlation between X and Y in the discrete probability distribution example is close to -1 because the variables are very strongly negatively related, moving in opposite directions with a high degree of consistency.

  • What is the reason for not using n-1 in the denominator when calculating covariance or correlation from a theoretical distribution?

    -When working with a theoretical distribution, there is no need to adjust for degrees of freedom as the expected values are not estimated from a sample but are given directly, eliminating the additional uncertainty that requires the n-1 adjustment.
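
The point made above about standard deviations normalizing the covariance can be checked with a small experiment: changing the units of one variable changes the covariance but leaves the correlation alone. The sketch below is an illustration under assumptions; the height and weight figures and the helper names `cov` and `corr` are hypothetical and do not come from the video.

```python
from math import sqrt, isclose

def cov(xs, ys):
    """Sample covariance (divisor n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def corr(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))

# Hypothetical measurements: heights in metres and weights in kilograms.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55.0, 68.0, 70.0, 80.0, 88.0]

# The same heights expressed in centimetres.
heights_cm = [h * 100 for h in heights_m]

cov_m, cov_cm = cov(heights_m, weights_kg), cov(heights_cm, weights_kg)
corr_m, corr_cm = corr(heights_m, weights_kg), corr(heights_cm, weights_kg)

print(cov_m, cov_cm)             # the covariance scales with the units
print(isclose(corr_m, corr_cm))  # the correlation does not
```

This is why a covariance on its own is hard to interpret as "strong" or "weak", while a correlation can be compared across datasets.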

Outlines
00:00
Introduction to Covariance and Correlation

This paragraph introduces covariance and correlation as the final topic in a series on descriptive statistics. The speaker explains that, because these measures describe the relationship between two variables rather than a single variable, they are not usually counted among the standard descriptive statistics, but they are important nonetheless. The video promises to build intuition for both concepts and to explore two scenarios in which they are used: analyzing a sample and working with a discrete probability distribution. The speaker also mentions a demonstration of Excel techniques for calculating these measures.

05:01
Understanding Covariance and Correlation Calculations

The speaker delves into the calculation of covariance and correlation from a sample, using a hypothetical example involving stock prices. The process involves finding the mean of the variables, calculating deviations from the mean, and then multiplying these deviations to find a numerical measure of their relationship. Covariance is presented as an average of these products, with a positive or negative sum indicating the direction of the relationship. The paragraph also touches on the concept of degrees of freedom and variance, highlighting the similarity between the formulas for covariance and variance.
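
Two of the ideas in this section, degrees of freedom and the similarity between the covariance and variance formulas, can be illustrated with a short sketch. The prices below are hypothetical and the function name `sample_covariance` is an assumption for this example; `mean` and `variance` come from Python's standard `statistics` module.

```python
from statistics import mean, variance

def sample_covariance(xs, ys):
    """Sum of products of deviations from the mean, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical prices for a single stock over five days.
prices = [10, 12, 11, 15, 17]

deviations = [p - mean(prices) for p in prices]
# Deviations from the sample mean always sum to zero, so once n - 1 of them
# are known the last one is determined: only n - 1 are free to vary.
print(sum(deviations))

# The covariance of a variable with itself is its sample variance.
print(sample_covariance(prices, prices), variance(prices))
```

The second print shows the same number twice, which is the sense in which variance is a special case of covariance with one variable used in both positions.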

10:03
Calculating Covariance and Correlation from a Sample

This section continues the discussion on calculating covariance and correlation but focuses on the practical application using Excel. The speaker explains the Excel functions for calculating covariance and correlation from a sample, emphasizing ease of use and the importance of understanding the underlying concepts. The paragraph also discusses the difference between covariance and correlation, with the latter providing a measure of the strength of the relationship between variables, normalized by their standard deviations.
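
In a worksheet, the functions named in this summary would be entered as something like `=COVARIANCE.S(A2:A6, B2:B6)` and `=CORREL(A2:A6, B2:B6)`, where the cell ranges are hypothetical. Excel also provides COVARIANCE.P, which divides by n rather than n-1. The Python sketch below mirrors that choice of divisor; the `ddof` parameter name and the data are assumptions made for illustration.

```python
def covariance(xs, ys, ddof=1):
    """ddof=1 divides by n - 1 (sample); ddof=0 divides by n (population)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)

# Hypothetical paired observations.
xs = [2, 4, 6, 8, 10]
ys = [1, 3, 2, 5, 4]

sample_cov = covariance(xs, ys, ddof=1)      # same divisor as COVARIANCE.S
population_cov = covariance(xs, ys, ddof=0)  # same divisor as COVARIANCE.P
print(sample_cov, population_cov)
```

The two results differ only by the factor (n-1)/n, so the gap matters most for small samples.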

15:06
Analyzing Covariance and Correlation with a Discrete Probability Distribution

The speaker shifts the focus to calculating covariance and correlation from a discrete probability distribution, rather than a sample. Using a detailed example with hypothetical stock outcomes and their probabilities, the process involves finding expected values, deviations from these expected values, and then multiplying these deviations to assess the relationship between the variables. The speaker demonstrates how to calculate covariance and standard deviations in this context, leading to the calculation of a strong negative correlation, indicating a strong inverse relationship between the variables.
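
The same steps can be sketched for a discrete probability distribution. The three scenarios below are invented for illustration and are not the distribution used in the video; the variable names are likewise assumptions. Note that the probabilities do the averaging, so there is no n-1 divisor.

```python
from math import sqrt

# Hypothetical joint distribution: (probability, x outcome, y outcome).
# The probabilities must sum to 1.
scenarios = [
    (0.2, -10, 8),
    (0.5, 5, 2),
    (0.3, 15, -4),
]

# Expected values: each outcome weighted by its probability.
e_x = sum(p * x for p, x, _ in scenarios)
e_y = sum(p * y for p, _, y in scenarios)

# Covariance: probability-weighted products of deviations from expected values.
cov_xy = sum(p * (x - e_x) * (y - e_y) for p, x, y in scenarios)

# Variances use the same weighting with a single variable.
var_x = sum(p * (x - e_x) ** 2 for p, x, _ in scenarios)
var_y = sum(p * (y - e_y) ** 2 for p, _, y in scenarios)

corr_xy = cov_xy / sqrt(var_x * var_y)
print(f"E[X] = {e_x:.2f}, E[Y] = {e_y:.2f}")
print(f"covariance = {cov_xy:.2f}, correlation = {corr_xy:.3f}")
```

In this made-up distribution X is high when Y is low and vice versa, so the covariance is negative and the correlation comes out close to -1.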

20:06
Conclusion and Next Steps in Descriptive Statistics

In the final paragraph, the speaker concludes the discussion of covariance and correlation and, with it, the entire series on descriptive statistics. He encourages viewers to subscribe to the channel, like the video, and share it with friends to support the channel's growth. The speaker, Justin Zeltzer, invites viewers to find more of his content on his website as a starting point for further exploration of statistics.

Keywords
Descriptive Statistics
Descriptive statistics are numerical measures that summarize and describe the features of a set of data. In the video, descriptive statistics serve as the overarching theme, with the focus on covariance and correlation as specific types of descriptive measures that describe the relationship between two variables. The script explains how these statistics can be used to understand the relationship between variables such as temperature and ice cream sales or temperature and pneumonia presentations.
Covariance
Covariance is a measure that quantifies the joint variability of two variables. A positive covariance, as mentioned in the script with temperature and ice cream sales, indicates that the variables move in the same direction, while a negative covariance, like the example of temperature and pneumonia presentations, indicates they move in opposite directions. The script provides a detailed calculation of covariance from a sample, illustrating how it can indicate the direction of the relationship between two variables.
Correlation
Correlation is a statistical measure that expresses the extent to which two variables are linearly related. The script differentiates correlation from covariance by explaining that correlation normalizes the covariance by the product of the standard deviations of the two variables, resulting in a value between -1 and 1. An example given in the script is the correlation between stock market movement and temperature, which is suggested to be close to zero, indicating no linear relationship.
Positive Covariance/Correlation
Positive covariance or correlation indicates that as one variable increases, the other variable also tends to increase. In the video script, this is exemplified through the relationship between temperature and ice cream sales, where warmer temperatures are associated with higher sales, suggesting a positive relationship.
Negative Covariance/Correlation
Negative covariance or correlation implies that as one variable increases, the other variable tends to decrease. The script uses the example of temperature and pneumonia presentations to illustrate a negative relationship, where warmer temperatures might lead to fewer pneumonia cases.
Degrees of Freedom
Degrees of freedom refers to the number of values in a calculation that are free to vary independently. In the script, when covariance or standard deviation is calculated from a sample, the divisor is n-1 rather than n, because the deviations are measured from the sample mean, which uses up one degree of freedom. The script touches on this only briefly and provides a link to a fuller explanation.
Variance
Variance is a measure of the dispersion or spread of a set of data points. In the script, it is shown that the formula for variance is similar to that of covariance, but applied to a single variable. The script explains that covariance can be thought of as a version of variance that applies to two different variables.
Standard Deviation
Standard deviation is a measure that indicates the amount of variation or dispersion of a set of values. In the script, standard deviation is calculated as part of finding the correlation, which requires dividing the covariance by the product of the standard deviations of the two variables involved.
Discrete Probability Distribution
A discrete probability distribution is a listing of all possible outcomes of a random variable with their associated probabilities. In the script, an example of a discrete probability distribution is given with different stock market outcomes and their probabilities, which is used to calculate covariance and correlation.
Expected Value
The expected value is the average value that a random variable may be expected to take on over many trials. In the context of the script, the expected values of X and Y are calculated by multiplying each outcome by its probability and summing these products, which is used as a basis for calculating covariance and correlation from a theoretical distribution.
Excel Formula
Excel formulas are used in the script to demonstrate how to calculate covariance and correlation using spreadsheet software. The script provides examples of Excel functions such as 'COVARIANCE.S' for sample covariance and 'CORREL' for sample correlation, showing how these can simplify the calculation process.
Highlights

Introduction to the final video in the descriptive statistics series focusing on covariance and correlation.

Covariance and correlation are measures of the relationship between two variables, not typically part of standard descriptive statistics.

Positive covariance or correlation indicates variables moving in the same direction, such as temperature and ice cream sales.

Negative covariance or correlation suggests variables moving in opposite directions, like temperature and pneumonia presentations.

Zero covariance or correlation, as with temperature and stock market movement, indicates no linear relationship between variables.

Covariance and correlation are calculated differently, with correlation providing a measure of strength of the relationship.

Explanation of calculating covariance from a sample, including finding the mean of variables and deviations from the mean.

Covariance formula involves multiplying deviations and averaging them, providing a measure of the relationship's direction.

Degrees of freedom and the rationale behind dividing by n-1 in covariance calculations are discussed.

Variance formula is compared to covariance formula, highlighting the similarity and the variance's role in correlation calculation.

Correlation is calculated as covariance divided by the product of standard deviations, resulting in a value between -1 and 1.

Excel formulas for calculating covariance and correlation from a sample are demonstrated.

Transition to calculating covariance and correlation from a discrete probability distribution in Excel.

Finding the expected value (mean) of variables in a probability distribution by incorporating probabilities.

Weighting the products of deviations from the expected values by their probabilities to calculate covariance in a theoretical distribution.

Calculating variance and standard deviation for a probability distribution to find the correlation.

Differences in calculating covariance and correlation from a sample versus a theoretical distribution are explained.

Final thoughts on the importance of understanding the underlying concepts of covariance and correlation calculations.

Conclusion of the descriptive statistics video series and an invitation to subscribe and engage with the channel.
