Biased and unbiased estimators from sampling distributions examples

Khan Academy
28 Nov 2017, 05:56

TLDR: The video explains biased and unbiased estimators, using the sample median as an estimator of the population median. An unbiased estimator has a sampling distribution centered on the true population parameter. To illustrate, ping pong balls numbered 0 to 32 are sampled five at a time; the medians of 50 such samples form an approximate sampling distribution that is roughly balanced around the population median of 16, which suggests the sample median is unbiased. A second example compares three estimators of the same population parameter and shows that the most desirable estimator combines low bias with low variability.

Takeaways
  • 📊 The concept of an unbiased estimator was discussed: an estimator is unbiased when its sampling distribution is centered on the true population parameter.
  • 🎱 The example of estimating the population median using ping pong balls numbered from zero to 32 was provided, with the true population median being 16.
  • 🔄 A random sample of five balls was drawn, its median recorded, and the balls replaced; this was repeated for a total of 50 trials to simulate the sampling process.
  • 📈 A dotplot was used to visualize the approximate sampling distribution of the sample median, with 50 dots, one for the median of each sample.
  • 🔢 The sampling distribution of the sample median was analyzed, noting that it should be balanced around the population median for the estimator to be considered unbiased.
  • 🤔 The question of whether the sample median is a biased or unbiased estimator was posed, prompting viewers to consider the characteristics of the sampling distribution.
  • 🏁 Two scenarios were described where the sampling distribution would indicate bias: if it consistently underestimated (sat to the left of) or overestimated (sat to the right of) the true parameter.
  • 📊 Another example was given with three dotplots, each approximating the sampling distribution of a different estimator of a population parameter equal to five, with the aim of identifying which estimator had both low bias and low variability.
  • 🔎 The analysis of the three estimators revealed that Statistic C was clearly biased, as its distribution sat consistently to the left of the true parameter.
  • 🏆 It was concluded that Statistic A, with a relatively low spread and reasonable balance around the true parameter, was the estimator with both low bias and low variability.
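
The takeaways describe a biased estimator as one whose sampling distribution sits consistently on one side of the true parameter. As an illustration that is not taken from the video, the sketch below assumes the same balls numbered 0 to 32 and uses the maximum of a five-ball sample as a hypothetical estimator of the population maximum, 32. The function name `sample_maxima`, the seed, and the trial count are assumptions chosen for the example.

```python
import random
import statistics

POPULATION = range(33)      # balls numbered 0..32, as in the video
TRUE_MAX = max(POPULATION)  # 32; the parameter this made-up estimator targets


def sample_maxima(trials, sample_size=5, seed=1):
    """Draw `trials` samples without replacement and record each sample's maximum."""
    rng = random.Random(seed)
    return [max(rng.sample(POPULATION, sample_size)) for _ in range(trials)]


maxima = sample_maxima(trials=1000)
print("largest estimate:", max(maxima))
print("mean estimate:", statistics.mean(maxima), "vs true value", TRUE_MAX)
```

No estimate can exceed 32 and most fall below it, so a dotplot of these values would sit to the left of the true value, which is the underestimation pattern described above.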
Q & A
  • What is an unbiased estimator in statistics?

    -An unbiased estimator is a statistic that has a sampling distribution with a mean equal to the true value of the parameter being estimated. Essentially, an unbiased estimator does not systematically overestimate or underestimate the true parameter of the population.

  • What was the population median in the given example?

    -The population median in the given example is 16. This is the value that separates the higher half from the lower half of the data when the population is ordered from least to greatest.

  • How many balls did Alejandro sample in his experiment?

    -Alejandro sampled five balls in each trial of his experiment.

  • How many trials did Alejandro conduct in total?

    -Alejandro conducted a total of 50 trials in his experiment.

  • What is the sampling distribution of a statistic?

    -The sampling distribution of a statistic is the probability distribution of a given statistic based on a random sample of data. It shows how the statistic would vary if you took many samples of the same size from the same population and calculated the statistic for each sample.

  • How can you determine if a sampling distribution is balanced around the true parameter?

    -A sampling distribution is balanced around the true parameter if the estimates fall about equally to the left and right of the true value, so that their average is close to it. If the estimates consistently fall on one side, the distribution is not balanced, which indicates bias in the estimator.

  • What is the population parameter that the three different estimators in the dotplots are trying to approximate?

    -The population parameter that the three different estimators are trying to approximate is the value five.

  • How can you identify a biased estimator from a dotplot of a sampling distribution?

    -A biased estimator can be identified from a dotplot if the distribution consistently leans to one side of the true parameter value, indicating that the estimator is either consistently underestimating or overestimating the parameter.

  • Which estimator from the dotplots has both low bias and low variability?

    -Based on the dotplots, estimator A has both low bias and low variability. It is reasonably balanced around the true parameter value and has a lower spread compared to estimator B, indicating that it consistently provides estimates closer to the true parameter value.

  • What does the term 'variability' refer to in the context of statistics?

    -In the context of statistics, 'variability' refers to the degree of spread or dispersion in a set of data points or a sampling distribution. Lower variability indicates that the estimator produces values that are more consistently close to the true parameter, while higher variability suggests greater fluctuation and less consistency.

  • How can you ensure that an estimator is unbiased and has low variability?

    -Simulation can check these properties but cannot guarantee them: take a large number of samples, calculate the statistic for each sample, and examine the resulting sampling distribution. The estimator appears unbiased if the distribution is centered on the true parameter value, and it has low variability if the estimates cluster tightly with little spread. Variability can typically be reduced by increasing the sample size.
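
The definitions used in these answers can be stated compactly. The notation below is an assumption added for illustration and does not appear in the video: $\theta$ is the population parameter and $\hat{\theta}$ is the estimator computed from a random sample.

```latex
\[
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta,
\qquad
\operatorname{Var}(\hat{\theta}) = E\!\left[\big(\hat{\theta} - E[\hat{\theta}]\big)^{2}\right]
\]
```

An estimator is unbiased when its bias is zero, and it has low variability when its variance is small. The two properties are separate: a statistic can be centered on the parameter and still vary widely from sample to sample.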

Outlines
00:00
📊 Analysis of Sample Median as an Unbiased Estimator

This section describes an experiment by Alejandro to determine whether the sample median is an unbiased estimator of the population median. He numbered ping pong balls from 0 to 32, mixed them, drew a random sample of five balls, and calculated the sample median. He replaced the balls and repeated the process for 50 trials, summarizing the results in a dotplot that approximates the sampling distribution of the sample median. The known population median is 16, and the sample medians appear balanced around this value, suggesting that the sample median is an unbiased estimator. For an estimator to be unbiased, its sampling distribution should be centered on the true population parameter rather than shifted to one side.
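
The experiment can be imitated in a few lines. This is a minimal sketch, assuming each sample of five is drawn without replacement and the balls are returned before the next trial; the function name `sample_medians` and the seed are illustrative choices, not details from the video.

```python
import random
import statistics

POPULATION = range(33)                             # balls numbered 0..32
POPULATION_MEDIAN = statistics.median(POPULATION)  # 16


def sample_medians(trials=50, sample_size=5, seed=None):
    """Repeat the draw-five-balls experiment and return one median per trial."""
    rng = random.Random(seed)
    return [
        statistics.median(rng.sample(POPULATION, sample_size))
        for _ in range(trials)
    ]


medians = sample_medians(trials=50, seed=2017)

# Count how many sample medians land on each side of the population median.
below = sum(m < POPULATION_MEDIAN for m in medians)
above = sum(m > POPULATION_MEDIAN for m in medians)
print("below:", below, "above:", above, "mean:", statistics.mean(medians))
```

With only 50 trials the two counts will rarely match exactly; roughly similar counts and a mean near 16 are what "balanced" looks like in a dotplot of this size.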

05:01
📈 Comparing Estimators for Bias and Variability

This section compares three estimators, each represented by a dotplot approximating its sampling distribution, to determine which has both low bias and low variability. The population parameter is five. Estimator C is clearly biased: its sampling distribution sits consistently to the left of the true parameter value, so it tends to underestimate. Estimators A and B both appear reasonably unbiased, so the choice between them comes down to variability. Estimator A has the narrower spread around the true parameter, so it is identified as having both low bias and low variability and is the preferred choice among the three.
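
The same comparison can be made numerically. The sketch below does not use the video's data: the three sets of estimates are hypothetical stand-ins generated from normal distributions whose centers and spreads were chosen to resemble the description (A centered and narrow, B centered and wide, C shifted left). The tolerance `BIAS_TOLERANCE` is likewise an assumption.

```python
import random
import statistics

TRUE_VALUE = 5        # population parameter from the example
BIAS_TOLERANCE = 0.5  # assumed cutoff for "reasonably unbiased"


def summarize(estimates, true_value):
    """Return (estimated bias, spread) for one dotplot's worth of estimates."""
    bias = statistics.mean(estimates) - true_value
    spread = statistics.stdev(estimates)
    return bias, spread


rng = random.Random(0)
# Hypothetical stand-ins for the three dotplots; centers and spreads are made up.
dotplots = {
    "A": [rng.gauss(5.0, 0.5) for _ in range(200)],
    "B": [rng.gauss(5.0, 2.0) for _ in range(200)],
    "C": [rng.gauss(3.0, 0.5) for _ in range(200)],
}
summary = {name: summarize(vals, TRUE_VALUE) for name, vals in dotplots.items()}

# Keep the estimators with small bias, then prefer the one with the least spread.
low_bias = [name for name, (bias, _) in summary.items() if abs(bias) < BIAS_TOLERANCE]
best = min(low_bias, key=lambda name: summary[name][1])
print(summary)
print("preferred estimator:", best)
```

Screening on bias first and spread second mirrors the order of reasoning in the section: C is ruled out for bias even though its spread is small, and A beats B on spread.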

Keywords
💡unbiased estimator
An unbiased estimator is a statistic whose sampling distribution is centered on the population parameter it estimates, so that on average it neither overestimates nor underestimates. In this video, the question is whether the sample median is an unbiased estimator of the population median, which here is 16.
💡sampling distribution
The sampling distribution is the probability distribution of a statistic over all possible random samples of a given size. It shows how the statistic varies from sample to sample. In the video, the sampling distribution of the sample median is approximated by a dotplot, with each dot representing the sample median from one of the 50 trials.
💡population median
The population median refers to the middle value of a population's data set when it is ordered from least to greatest. In the video, the known population median is 16, and this value serves as the benchmark to evaluate the accuracy of the sample median as an estimator.
💡sample median
The sample median is the middle value of a sample's data set, similar to the population median but derived from a subset of the data rather than the entire population. In the video, the sample median is calculated from a random sample of five ping pong balls, and this process is repeated 50 times to create the sampling distribution of sample medians.
💡dotplot
A dotplot is a simple graph used to display the distribution of a set of data. Each dot represents a single observation or data point. In the video, the dotplot is used to visualize the sampling distribution of the sample medians, with each dot representing the median of one sample of five balls.
💡random sample
A random sample is a subset of a population in which every member of the population has an equal chance of being included. It is used to make inferences about the population based on the sample data. In the video, Alejandro takes a random sample of five ping pong balls to calculate the sample median.
💡parameter
In statistics, a parameter is a numerical value that describes a characteristic of a population. The video focuses on the population median as the parameter of interest. Parameters are often estimated using sample statistics.
💡statistic
A statistic is a numerical value calculated from a sample of data and is used to estimate the corresponding parameter of the population. In the video, the sample median is the statistic used to estimate the population median.
💡bias
Bias in statistics refers to the tendency of an estimator to consistently overestimate or underestimate the true parameter value. An estimator with low bias is desirable because its estimates are centered close to the true value, although individual estimates can still miss widely if variability is high.
💡variability
Variability refers to the degree of spread or dispersion in a set of data values. In the context of the video, it describes how much an estimator's values vary from sample to sample. For an estimator with low bias, low variability means its values are consistently close to the true parameter, which is a desirable characteristic.
💡low bias and low variability
An estimator with both low bias and low variability is considered highly accurate and reliable. It not only provides estimates that are close to the true population parameter but also has consistent results across different samples.
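
The dotplot in the video only approximates the sampling distribution. For this small population the exact distribution can be enumerated. The sketch below assumes samples of five distinct balls drawn without replacement from 0 to 32, lists every possible sample, and computes the exact mean of the sample median; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# combinations() yields each 5-ball sample in sorted order,
# so the middle element (index 2) is that sample's median.
counts = Counter(combo[2] for combo in combinations(range(33), 5))

total = sum(counts.values())  # number of possible samples, C(33, 5)
expected_median = Fraction(sum(m * c for m, c in counts.items()), total)
print(expected_median)  # prints 16
```

The exact mean equals the population median of 16, which agrees with the conclusion drawn from the dotplot. This follows from symmetry: replacing every ball number x with 32 - x maps each sample with median m to one with median 32 - m.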
Highlights

Alejandro's inquiry about the unbiased nature of the sample median as an estimator for the population median.

The use of ping pong balls numbered from zero to 32 to simulate a population.

The population median is established as 16.

A random sample of five balls is taken and the median of the sample is calculated.

The process of sampling, calculating the median, and replacing the balls is repeated 50 times.

The sampling distribution of the sample median is visualized through a dotplot.

For an estimator to be unbiased, its sampling distribution must be evenly distributed about the true population parameter.

The sampling distribution is balanced to the left and right of the true median of 16, suggesting the sample median is an unbiased estimator.

An example of a biased estimator is one where the sampling distribution is consistently to the left of the true parameter, underestimating it.

Three different estimators are compared using dotplots to show their sampling distributions.

The population parameter's actual value is given as five.

Statistic C is identified as biased due to its consistent underestimation of the true parameter.

Statistics A and B are considered reasonably unbiased based on their sampling distributions.

Statistic A is determined to have both low bias and low variability, whereas statistic B has low bias but more variability.

The spread of the estimators around the true parameter is used to judge variability, with statistic A showing less spread.

The approximation of the sampling distribution is emphasized as a tool for understanding estimator performance.

The practical application of these concepts is to guide the selection of estimators that are both unbiased and have low variability.
