Statistics-Left Skewed And Right Skewed Distribution And Relation With Mean, Median And Mode

Krish Naik
8 Apr 2021 06:51
Educational Learning
32 Likes 10 Comments

TLDR In this YouTube video, Krish Naik answers a common statistics interview question about right-skewed and left-skewed distributions. He illustrates right skew with wealth distribution, where a few extremely wealthy individuals stretch the tail on the right side of the curve. He also covers symmetrical distributions, with age and height as examples, and notes that many machine learning algorithms prefer this shape. Left skew, or negative skew, is illustrated with human lifespan: most deaths cluster at older ages, and the relatively few early deaths form the long tail on the left. He then relates mean, median, and mode to each shape: in a right-skewed distribution the mean is greater than the median, which is greater than the mode; in a symmetrical distribution the three are approximately equal; and in a left-skewed distribution the order reverses. He encourages viewers to pair each concept with a practical example so it is easier to remember and explain.

Takeaways
  • 📚 The video discusses interview questions related to statistical distributions, specifically right skew and left skew distributions.
  • 📈 Right skew distribution is characterized by a longer tail on the right side, with examples including wealth distribution and length of comments on videos.
  • 💬 Length of comments on a YouTube channel is given as a real-world example of right skew distribution, with a few very long comments and many shorter ones.
  • 📊 Symmetric distribution, exemplified by the normal distribution, is where the mean, median, and mode are approximately equal, and is often preferred in machine learning algorithms.
  • 🌐 Examples of symmetric distribution include age, weight, height, and features from the iris dataset like petal length and sepal width.
  • 🔍 Left skew or negative skew distribution has a longer tail on the left side, with the lifespan of humans given as an example: most people die at older ages, and the few who die young form the left tail.
  • 📉 In a left skew distribution, the mode is the highest, followed by the median, and then the mean, which is the smallest.
  • 🔑 The relationship between mean, median, and mode in right skew distribution is mean > median > mode, which is crucial for understanding the shape of the distribution.
  • 🧐 The video emphasizes the importance of knowing examples of different distributions to explain theoretical concepts in practical terms during interviews.
  • 👨‍🏫 Krish Naik, the presenter, encourages viewers to subscribe to the channel for more interview questions and explanations.
  • 👋 The video ends with a sign-off, wishing viewers a great day ahead and inviting them to the next video.
Q & A
  • What is a right skew distribution?

    -A right skew distribution, also known as a positively skewed distribution, is a type of data distribution where the tail on the right side of the histogram is longer or fatter than the left side. This indicates that the data has more extreme values on the right side, with a longer tail extending towards the higher values.

  • Can you provide an example of a right skew distribution?

    -Wealth distribution is a classic example of a right skew distribution. A small number of individuals, such as Elon Musk or Jeff Bezos, have extremely high wealth, while the majority of people have more modest amounts of wealth.

  • What is the relationship between mean, median, and mode in a right skew distribution?

    -In a right skew distribution, the mean is greater than the median, and the median is greater than the mode. This is because the presence of extreme high values shifts the mean to the right, while the mode, representing the most common value, tends to be lower.

  • What is a symmetrical distribution?

    -A symmetrical distribution, such as the normal distribution, is characterized by a bell-shaped curve where the right and left sides are mirror images of each other. In such a distribution, the mean, median, and mode are approximately equal.

  • What are some examples of symmetrical distributions?

    -Age, weight, and height are given as common examples of approximately symmetrical distributions. Distributions of this shape are often seen in natural phenomena and are preferred in many statistical analyses and machine learning algorithms.

  • What is a left skew distribution, and what is another name for it?

    -A left skew distribution, also known as a negatively skewed distribution, is one where the tail on the left side of the histogram is longer or fatter than the right side. This indicates that there are more extreme values on the lower end of the scale.

  • Can you give an example of a left skew distribution?

    -The lifespan of human beings is an example of a left skew distribution. Most people die within a fairly narrow range of older ages, while a smaller number die at a very young age, and those early deaths form the long tail on the left.

  • What is the relationship between mean, median, and mode in a left skew distribution?

    -In a left skew distribution, the mode is the highest, followed by the median, and then the mean. The presence of extreme low values pulls the mean to the left, making it the smallest of the three measures.

  • Why are examples important when explaining statistical distributions?

    -Examples are important because they provide concrete instances that help illustrate abstract statistical concepts. They make it easier for individuals to understand and remember the characteristics of different distributions by relating them to real-world scenarios.

  • How does the length of comments on a YouTube video relate to a right skew distribution?

    -The length of comments on a YouTube video can relate to a right skew distribution if most comments are short (e.g., one-liners), but a small number of comments are significantly longer, creating a tail of longer comments that extends to the right on a histogram.

  • Why do many machine learning algorithms prefer data that follows a normal distribution?

    -Many machine learning algorithms prefer data that follows a normal distribution because it simplifies the statistical analysis and often leads to better model performance. The symmetry and bell-shaped curve of the normal distribution make it easier to apply various statistical assumptions and techniques.

  • What is the importance of understanding the theoretical aspects of statistical distributions along with practical examples?

    -Understanding the theoretical aspects of statistical distributions along with practical examples is crucial because it allows for a deeper comprehension of the underlying principles. This knowledge enables individuals to not only explain the distributions but also to apply them effectively in real-world scenarios and problem-solving.
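
The names "positively skewed" and "negatively skewed" used in the answers above match the sign of the moment coefficient of skewness. The following is a minimal sketch, not taken from the video. The lognormal sample is only an assumed stand-in for wealth-like data, and the helper name `skewness` is hypothetical.

```python
import random
from statistics import fmean, pstdev


def skewness(data):
    """Moment coefficient of skewness: the average of the cubed z-scores."""
    m = fmean(data)
    s = pstdev(data)
    return fmean([((x - m) / s) ** 3 for x in data])


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed stand-in for wealth-like data: many small values, a few huge ones.
right = [rng.lognormvariate(0, 1) for _ in range(10_000)]
# A bell-shaped sample for comparison.
symmetric = [rng.gauss(0, 1) for _ in range(10_000)]
# Mirroring the right-skewed sample moves the long tail to the left.
left = [-x for x in right]

for name, sample in [("right", right), ("symmetric", symmetric), ("left", left)]:
    print(f"{name:>9}: skewness = {skewness(sample):+.2f}")
```

The right-skewed sample should give a clearly positive value, the mirrored sample a clearly negative one, and the bell-shaped sample a value near zero.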

Outlines
00:00
📊 Introduction to Skewness in Data Distribution

In this opening segment, Krish Naik, the host of the YouTube channel, presents a statistics question that was asked in an interview. He explains right-skewed and left-skewed distributions, using a histogram and a kernel density estimator as visual aids. He gives two examples of right-skewed data: wealth distribution, where a few extremely wealthy individuals such as Elon Musk, Jeff Bezos, Mark Zuckerberg, and Bill Gates form the long tail, and the length of comments on his videos, where a few users write much longer comments than most. The segment sets the stage for a deeper discussion of skewness and its implications in data analysis.

05:02
📚 Understanding Mean, Median, and Mode in Skewed Distributions

This segment covers the relationship between mean, median, and mode in right-skewed and left-skewed distributions. Krish Naik explains that in a right-skewed distribution, the mean is greater than the median, which in turn is greater than the mode. He contrasts this with a symmetrical distribution, such as the normal distribution, where the mean, median, and mode are approximately equal. For left-skewed (negatively skewed) distributions, the mode is the highest, followed by the median and then the mean. He emphasizes the importance of understanding these relationships and provides a clear visual representation to aid comprehension. He concludes by advising viewers to remember examples so they can explain these concepts effectively in interviews or discussions.
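
The ordering described above can be checked on a small dataset. The sketch below uses Python's standard `statistics` module. The comment-length values are made up for illustration and are not data from the video.

```python
from statistics import mean, median, mode

# Hypothetical comment lengths in words: many short comments, one very long one.
right_skewed = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 40]
# Mirror image of the data above, so the long tail now points to the left.
left_skewed = [42 - x for x in right_skewed]


def summarize(data):
    """Return the three measures of central tendency as (mean, median, mode)."""
    return mean(data), median(data), mode(data)


r_mean, r_median, r_mode = summarize(right_skewed)
l_mean, l_median, l_mode = summarize(left_skewed)

# Right skew: mean (about 8.18) > median (4) > mode (3)
print(f"right: mean={r_mean:.2f} median={r_median} mode={r_mode}")
# Left skew: mean (about 33.82) < median (38) < mode (39)
print(f"left:  mean={l_mean:.2f} median={l_median} mode={l_mode}")
```

With one dominant peak and one long tail, as here, the three measures fall in the stated order. Real datasets with several peaks or many ties may not follow the rule exactly.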

Keywords
💡Right Skew Distribution
Right skew distribution, also known as positively skewed distribution, is a type of data distribution where the tail on the right side is longer or fatter than the left side. This indicates that the majority of the data points are concentrated on the lower end of the scale, with fewer data points extending to the higher end. In the video, wealth distribution is given as a classical example where a small number of extremely wealthy individuals like Elon Musk and Jeff Bezos represent the long tail on the right, while the majority of people have less wealth.
💡Left Skew Distribution
A left skew distribution, or negatively skewed distribution, is the opposite of a right skew: the tail on the left side is longer. The data points are mostly concentrated on the higher end of the scale, with fewer points extending to the lower end. The lifespan of human beings is used as an example in the video: most people die at older ages, while the relatively few who die young create the longer tail on the left side.
💡Symmetric Distribution
A symmetric distribution is a type of data distribution where the data is evenly distributed around a central value, and the shape of the distribution is mirror-imaged on both sides. The normal distribution, also known as the Gaussian distribution, is a prime example of a symmetric distribution. In the context of the video, features like age, weight, and height are mentioned as examples that often follow a normal distribution, which is also preferred by many machine learning algorithms.
💡Mean
The mean, often referred to as the average, is a measure of central tendency that is calculated by adding up all the values in a data set and then dividing by the number of values. In the video, the mean is discussed in the context of skewed distributions, where it is noted that in a right skew distribution, the mean is greater than the median and mode, indicating the influence of extreme values on the higher end.
💡Median
The median is another measure of central tendency and is the middle value of a data set when it is ordered from least to greatest. If there is an even number of observations, the median is the average of the two middle numbers. The video explains that in a right skew distribution, the median is less than the mean, reflecting the central tendency without being affected by the extreme values on the higher end.
💡Mode
The mode is the value that appears most frequently in a data set. It is a measure of central tendency that can be used for both numerical and categorical data. In the video, the mode is mentioned as being the smallest of the three measures of central tendency in a right skew distribution, and the largest in a left skew distribution, providing insight into the most common data point.
💡Histogram
A histogram is a graphical representation of the distribution of data, where data is grouped into bins or intervals, and the height of each bin represents the frequency of data points within that range. In the video, histograms are mentioned as a way to visualize data and identify the skewness of the distribution, such as right skew or left skew.
💡Kernel Density Estimator
A kernel density estimator (KDE) is a way to visualize the distribution of data points. Unlike histograms, KDEs create a smooth curve that estimates the probability density function of a random variable. In the script, KDE is mentioned as another method to plot and observe the skewness in data distributions.
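
To make the contrast between the two plots concrete, here is a minimal pure-Python sketch of both ideas. It is an illustration under assumptions, not the plotting code used in the video: the exponential sample, the bin range, the bandwidth of 0.3, and the helper names `histogram` and `make_kde` are all hypothetical choices.

```python
import math
import random


def histogram(data, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width intervals on [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), bins - 1)  # x == hi goes in the last bin
            counts[idx] += 1
    return counts


def make_kde(data, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


rng = random.Random(1)  # fixed seed so the run is repeatable
# Assumed right-skewed sample: most values are small, a few are large.
sample = [rng.expovariate(1.0) for _ in range(2000)]

counts = histogram(sample, bins=8, lo=0.0, hi=8.0)
density = make_kde(sample, bandwidth=0.3)

print("bin counts:", counts)  # step-like view: one count per bin
for x in (0.5, 3.0, 6.0):
    print(f"density({x}) = {density(x):.4f}")  # smooth view: a value at any x
```

Both views show the same right skew: the counts and the estimated density are highest near zero and fall away along the long right tail. The histogram depends on the choice of bins, while the smooth curve depends on the choice of bandwidth.
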
💡Interview Question
The term 'interview question' refers to the queries posed to candidates during job interviews to assess their knowledge, skills, and suitability for the position. In the context of the video, the presenter discusses a specific interview question related to statistical distributions and their characteristics, which is a common topic in data science and analytics interviews.
💡YouTube Channel
A YouTube channel is a platform where content creators can upload and manage their videos. In the script, the presenter introduces themselves as the host of a YouTube channel and discusses their practice of sharing interview questions and answers related to statistical concepts, aiming to educate and prepare their audience for potential interviews.
Highlights

Introduction to the video and the statistical question posed in an interview.

Explanation of right skew distribution and its characteristics.

Wealth distribution as a classical example of right skew distribution.

Length of comments on videos as another example of right skew distribution.

Introduction to symmetrical distribution and its properties.

Examples of symmetrical distribution in age, weight, and height.

Importance of normal distribution for machine learning algorithms.

Definition and characteristics of left skew or negative skew distribution.

Lifespan of humans as an example of left skew distribution.

Explanation of the relationship between mean, median, and mode in right skew distribution.

Mean is greater than median, which is greater than mode in right skew distribution.

Mean, median, and mode are approximately equal in symmetrical distribution.

In left skew distribution, mode is the highest followed by median and then mean.

Importance of knowing examples to explain theoretical concepts in interviews.

Encouragement to subscribe to the channel for more interview question videos.

Conclusion of the video and sign-off.
