How To Calculate Variance

The Organic Chemistry Tutor
20 May 2020 · 10:24
Educational, Learning

TLDR: This video explains how to calculate the sample variance, which measures how far data points are spread out from the mean. Using an example data set, it walks through finding the mean, subtracting the mean from each data point, squaring those differences, summing them, and dividing by n - 1. It then compares two data sets with the same mean to show that a higher variance indicates data that is more spread out, while a lower variance indicates data concentrated around the average.

Takeaways
  • 😀 Variance (s²) measures how spread out data is from the mean
  • 😃 To find the variance, take the sum of squared differences between the data points and the mean, then divide by n - 1
  • 📈 A higher variance indicates the data is more spread out from the mean
  • 📉 A lower variance indicates the data is clustered closer to the mean
  • 📊 Find the sample mean by summing all data points and dividing by n
  • 🔢 Subtract the sample mean from each data point, then square the differences
  • ⚖️ Sum the squared differences and divide by n - 1 to get the variance
  • 📈 In the video's comparison, the data set with the wider range has the higher variance
  • 📊 The calculation relies on squaring the difference between each point and the mean
  • 🎓 Understanding variance helps interpret the spread of data
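
The steps in the takeaways can be sketched in Python. This is a minimal illustration rather than code from the video: the function name `sample_variance` is an assumption made here, and the data set is the one used in the video's example.

```python
def sample_variance(data):
    """Return the sample variance s^2 of a list of numbers."""
    n = len(data)
    if n < 2:
        # Dividing by n - 1 is undefined for fewer than two points.
        raise ValueError("sample variance needs at least two data points")
    mean = sum(data) / n                      # sample mean (x bar)
    deviations = [x - mean for x in data]     # x - x bar for each point
    squared = [d ** 2 for d in deviations]    # squared differences
    return sum(squared) / (n - 1)             # sum, then divide by n - 1


data = [6, 9, 14, 10, 5, 8, 11]
print(round(sample_variance(data), 1))  # 9.3
```
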
Q & A
  • What does the symbol s squared represent?

    -s squared represents the variance of a sample.

  • What is the formula used to calculate sample variance?

    -The formula is s squared = Σ(x - x bar)² / (n - 1), where x bar is the sample mean, x is each data point, and n is the sample size.

  • What were the numbers used as a data set in the example to demonstrate variance calculation?

    -The numbers used were 6, 9, 14, 10, 5, 8 and 11.

  • What was the sample mean calculated in the example?

    -The sample mean calculated was 9.

  • What was the final variance calculated for the data set in the example?

    -The final variance calculated was approximately 9.3 (56 / 6).

  • How do you determine which data set has a higher variance?

    -The data set with the data points more spread out from the mean has a higher variance.

  • What were the two sample data sets used to demonstrate high and low variance?

    -Data set 1 was 6, 7, 8, 9, 10 and data set 2 was 4, 6, 8, 10, 12.

  • What was the calculated variance for data set 1?

    -The calculated variance for data set 1 was 2.5.

  • What was the calculated variance for data set 2?

    -The calculated variance for data set 2 was 10.

  • What does a higher variance value indicate about the data set?

    -A higher variance value indicates the data is more spread out from the mean.
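
The arithmetic behind the answers above can be written out as a worked calculation. The numbers are the ones quoted in the Q & A; the subscripts on s² for the two comparison data sets are notation introduced here for illustration, not taken from the video.

```latex
\begin{align*}
\bar{x} &= \frac{6 + 9 + 14 + 10 + 5 + 8 + 11}{7} = \frac{63}{7} = 9 \\
\sum (x - \bar{x})^2 &= (-3)^2 + 0^2 + 5^2 + 1^2 + (-4)^2 + (-1)^2 + 2^2 = 56 \\
s^2 &= \frac{56}{7 - 1} \approx 9.3 \\[1ex]
s_1^2 &= \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5 - 1} = \frac{10}{4} = 2.5 \\
s_2^2 &= \frac{(-4)^2 + (-2)^2 + 0^2 + 2^2 + 4^2}{5 - 1} = \frac{40}{4} = 10
\end{align*}
```

Both comparison data sets have a mean of 8, so the difference between 2.5 and 10 comes entirely from how far the points sit from that shared mean.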

Outlines
00:00
😀 Introducing How to Calculate Variance

This paragraph introduces the concept of variance, represented by s squared. It provides the formula for calculating variance as the sum of squared differences between each data point and the sample mean, divided by n minus 1. An example dataset is provided to demonstrate how to calculate variance step-by-step.

05:02
😊 Comparing Dataset Variances

This paragraph provides an intuitive understanding of variance as a measure of spread of data from the mean. Two sample datasets with the same mean but different spreads are compared. Their variances are calculated to show the dataset with greater spread has higher variance.
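
As a quick check of this comparison, the sketch below uses Python's standard `statistics` module, whose `variance` function computes the sample variance (dividing by n - 1). The variable names are assumptions made here; the two data sets are the ones listed in the Q & A.

```python
from statistics import mean, variance

data_set_1 = [6, 7, 8, 9, 10]     # narrower spread
data_set_2 = [4, 6, 8, 10, 12]    # wider spread

for name, data in (("data set 1", data_set_1), ("data set 2", data_set_2)):
    # Both sets share the same mean, so only the spread differs.
    print(name, "mean:", mean(data), "variance:", variance(data))
```

Both means come out to 8, while the variances are 2.5 and 10, matching the values reported for the video.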

10:02
👍 Summary of Key Points on Variance

This concluding section recaps the key takeaways: how to calculate variance, what variance represents conceptually, and why a higher variance indicates that the data is more spread out from the mean.

Keywords
💡Variance
Variance measures how far the data points are spread out from the mean. It indicates the dispersion or spread in a data set. A higher variance means the data points are more spread out. In the video, variance is represented by s squared and is calculated by summing the squared differences between each data point and the sample mean, then dividing by n-1.
💡Sample Mean
The sample mean, represented by x bar, is the average value of the sample data set. It is calculated by summing all the observations and dividing by the sample size n. The video shows step-by-step how to calculate the sample mean.
💡Sample Standard Deviation
Sample standard deviation, represented by s, is the square root of the variance. It measures how spread out the data is from the mean. The video explains that variance is s squared since variance is the square of standard deviation.
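
The relationship between s and s² can be illustrated with a short sketch. It assumes Python's standard `statistics` module and reuses the video's example data set; the variable names are chosen here for illustration.

```python
import math
from statistics import stdev, variance

data = [6, 9, 14, 10, 5, 8, 11]

s_squared = variance(data)   # sample variance, s^2
s = stdev(data)              # sample standard deviation, s

# The standard deviation is the square root of the variance.
print(round(s_squared, 2), round(s, 2))  # 9.33 3.06
```
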
💡Data Spread
Data spread refers to how closely or widely the data points are distributed around the center. A small spread indicates points clustered near the mean, while a large spread indicates scattered points. Variance quantifies data spread.
💡Sum of Squares
The video calculates variance by taking the sum of squared differences between the data points and the mean. Squaring the differences makes them non-negative and gives more weight to large deviations.
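
One reason the squaring step matters is that the raw differences from the mean always cancel out to zero, so they cannot measure spread on their own. The sketch below shows this with the video's example data set; the variable names are assumptions made here.

```python
data = [6, 9, 14, 10, 5, 8, 11]
mean = sum(data) / len(data)             # 9.0

deviations = [x - mean for x in data]    # positive and negative values
squared = [d ** 2 for d in deviations]   # all non-negative

print(sum(deviations))  # 0.0, the deviations cancel
print(sum(squared))     # 56.0, the sum of squares
```
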
💡Sample vs Population
The video focuses on the sample variance and sample mean rather than population parameters. A sample is a subset of observations drawn from a population, while the population includes every observation of interest.
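
The summary does not show the population formula, so the following is a supplementary sketch rather than something taken from the video. It assumes Python's standard `statistics` module, where `variance` divides by n - 1 (sample) and `pvariance` divides by n (population).

```python
from statistics import pvariance, variance

data = [6, 7, 8, 9, 10]   # sum of squared differences from the mean is 10

sample_var = variance(data)        # 10 / (5 - 1) = 2.5
population_var = pvariance(data)   # 10 / 5 = 2

print(sample_var, population_var)
```

For the same numbers, the sample variance is always the larger of the two, because it divides the same sum of squares by a smaller number.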
💡Number Line
A number line plot is used to visually show and compare the data spreads of two sample data sets, centered around the same mean.
💡Data Set
A data set refers to a collection of related observations or data points. The video analyzes and calculates variance for two hypothetical data sets to illustrate interpretation.
💡N
n represents the sample size, or the number of observations in a data set. It is used in the variance calculation formula (sum of squared differences divided by n-1).
💡Central Tendency
The sample mean indicates the central tendency of a data set, the value around which the data points are centered. Variance indicates the spread about this central value.