Pearson's chi square test (goodness of fit) | Probability and Statistics | Khan Academy

Khan Academy
10 Nov 2010, 11:48

TLDR: The script details a potential restaurant buyer's approach to validate the owner's claim about customer distribution throughout the week. Suspicious of the provided percentages, the buyer conducts a hypothesis test using a chi-square statistic to determine if the distribution is accurate. With a significance level of 5%, they calculate expected customer numbers, compare them with observed data, and find a chi-square statistic of 11.44. This value exceeds the critical value of 11.07 for 5 degrees of freedom, leading to the rejection of the owner's distribution hypothesis. The buyer concludes that the distribution is not a good fit, suggesting the owner's data may be unreliable.

Takeaways
  • 🍽️ The individual is considering buying a restaurant and seeks to validate the owner's claim about customer distribution throughout the week.
  • ⚠️ The owner provides a distribution claiming 10% of customers visit on each of Monday and Tuesday, 15% on Wednesday, 20% on Thursday, 30% on Friday, and 15% on Saturday, with the restaurant closed on Sunday.
  • 📊 The individual decides to conduct an observational study to gather data on the actual number of customers visiting each day of the week.
  • 🤔 Suspicion arises regarding the accuracy of the owner's distribution, prompting the need for a hypothesis test to validate the claim.
  • 🧐 The null hypothesis (H0) is that the owner's distribution is correct, while the alternative hypothesis (H1) is that it is incorrect.
  • 📉 A chi-square test is chosen to determine if the observed customer distribution significantly differs from the owner's claimed distribution.
  • 📚 The chi-square statistic is calculated using the formula: (observed - expected)^2 / expected for each day, summed across all days.
  • 🔢 The expected number of customers for each day is determined based on the total number of customers observed for the week, proportionate to the owner's claimed percentages.
  • 📈 The chi-square statistic is calculated to be 11.44, which is then compared to a critical value from the chi-square distribution table.
  • 📊 The degrees of freedom for the test are determined to be 5 (number of days minus one), which is used to find the critical chi-square value from a statistical table.
  • 📉 The critical chi-square value at a 5% significance level with 5 degrees of freedom is found to be 11.07.
  • 🚫 Since the calculated chi-square statistic (11.44) is greater than the critical value (11.07), the null hypothesis is rejected, indicating the owner's distribution does not fit the observed data.
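
The quantities in the takeaways above can be written compactly as follows. The symbols are notation introduced here for illustration and are not taken from the video: O_i is the observed count for day i, E_i the expected count, p_i the owner's claimed share for that day, n the total number of customers observed, and k the number of days the restaurant is open.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n \, p_i,
\qquad \mathrm{df} = k - 1
```

With n = 200 and k = 6, a claimed share of 10% gives E_i = 200 × 0.10 = 20 expected customers, and the test uses 6 - 1 = 5 degrees of freedom.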
Q & A
  • What is the purpose of the chi-square test mentioned in the script?

    -The chi-square test is used to determine whether a given distribution of observed data fits a theoretical distribution. In this case, it is used to test if the owner's distribution of customer visits throughout the week is accurate.

  • What is the null hypothesis in the context of this script?

    -The null hypothesis is that the owner's distribution of customer visits is correct, meaning that the observed data should align with the expected percentages provided by the owner.

  • What is the alternative hypothesis in this scenario?

    -The alternative hypothesis is that the owner's distribution is not correct, suggesting that the observed data does not align with the expected percentages and that the distribution should be rejected.

  • What is the significance level used in this chi-square test?

    -The significance level used in this test is 5%: the null hypothesis is rejected if, assuming it is true, the probability of obtaining a chi-square statistic at least as extreme as the one observed is less than 5%.

  • How is the expected number of customers calculated for each day of the week?

    -The expected number of customers for each day is calculated by taking the total number of customers for the week (200 in this case) and multiplying it by the expected percentage for that day (e.g., 10% for Monday, 15% for Wednesday, etc.).

  • What is the chi-square statistic and how is it calculated?

    -The chi-square statistic is a measure used in the chi-square test to quantify the difference between observed and expected data. It is calculated by summing the squared differences between observed and expected values, each divided by the expected value.

  • What is the result of the chi-square statistic calculated in the script?

    -The calculated chi-square statistic in the script is 11.44.

  • What are degrees of freedom in the context of a chi-square test?

    -Degrees of freedom in a chi-square test are the number of category counts that are free to vary once the constraints on the data are fixed. Here there are six days of data and the counts must add up to the weekly total, so once five counts are known the sixth is determined. The degrees of freedom are therefore 5 (n - 1, where n is the number of categories).

  • How is the critical chi-square value determined?

    -The critical chi-square value is determined by looking at a chi-square distribution table or using statistical software, using the degrees of freedom and the significance level (alpha) to find the value that corresponds to the desired probability.

  • What does the critical chi-square value of 11.07 mean in this context?

    -The critical chi-square value of 11.07 means that there is a 5% chance of obtaining a chi-square statistic of 11.07 or higher if the null hypothesis is true. Since the calculated chi-square statistic is higher, it suggests that the observed data is significantly different from the expected distribution.

  • What conclusion is drawn from the chi-square test in the script?

    -The conclusion drawn from the chi-square test is that the owner's distribution is not a good fit for the observed data, as the calculated chi-square statistic (11.44) is more extreme than the critical value (11.07), leading to the rejection of the null hypothesis.
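
The table lookup described in the answers above can also be reproduced numerically. The following is a minimal sketch using only the Python standard library; the helper names `chi2_sf` and `chi2_critical` are hypothetical, and the closed-form tail formula used here applies only to whole-number degrees of freedom. It is one possible way to obtain the values, not the method used in the video, which reads them from a table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range((df - 1) // 2):
        tail += term
        term *= x / (2 * j + 3)
    return tail


def chi2_critical(alpha, df, lo=0.0, hi=200.0, iterations=100):
    """Value c with P(X >= c) = alpha, found by bisection (the tail shrinks as x grows)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid  # tail small enough: move left
    return (lo + hi) / 2


critical = chi2_critical(0.05, 5)
p_value = chi2_sf(11.44, 5)

print(round(critical, 2))  # 11.07
print(p_value)             # tail probability for the reported statistic
```

The computed critical value agrees with the 11.07 read from the table, and the tail probability for the reported statistic of 11.44 comes out a little under 0.05, which is consistent with rejecting the null hypothesis at the 5% level.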

Outlines
00:00
🤔 Evaluating Restaurant Customer Distribution

The speaker is considering purchasing a restaurant and begins by asking the current owner about the daily customer distribution. The owner claims that 10% of customers visit on each of Monday and Tuesday, 15% on Wednesday, 20% on Thursday, 30% on Friday, and 15% on Saturday, with the restaurant closed on Sunday. To verify the accuracy of this distribution, the speaker decides to conduct a hypothesis test using observed customer data collected throughout the week. The null hypothesis is that the owner's distribution is correct, while the alternative hypothesis is that it is not. The test will be conducted at a 5% significance level, and a chi-square statistic will be calculated to determine if the observed data fits the owner's claimed distribution.

05:01
📊 Calculating the Chi-Square Statistic

The speaker proceeds to calculate the expected number of customers for each day based on the owner's distribution, using a total of 200 customers observed over the week. The expected numbers are calculated as percentages of the total: 20 customers expected on Monday and Tuesday, 30 on Wednesday, 40 on Thursday, 60 on Friday, and another 30 on Saturday. The chi-square statistic is then computed by taking the difference between observed and expected numbers for each day, squaring these differences, and dividing by the expected numbers. The sum of these values gives the chi-square statistic, which is found to be 11.44 after performing the calculations.
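
The calculation described above can be sketched in a few lines of Python. The claimed percentages follow from the expected counts given in this summary. The observed counts, however, are not listed here, so the values below are an assumption: illustrative counts chosen to total 200 and to reproduce the reported statistic of 11.44. The variable names are likewise illustrative.

```python
# Owner's claimed percentage of weekly customers per day (closed on Sunday).
claimed_percent = {"Mon": 10, "Tue": 10, "Wed": 15, "Thu": 20, "Fri": 30, "Sat": 15}

# ASSUMPTION: illustrative observed counts, not given in this summary.
observed = {"Mon": 30, "Tue": 14, "Wed": 34, "Thu": 45, "Fri": 57, "Sat": 20}

total = sum(observed.values())  # 200 customers over the week

# Expected count per day = total customers * claimed share for that day.
expected = {day: total * pct / 100 for day, pct in claimed_percent.items()}

# Chi-square statistic: sum over days of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[day] - expected[day]) ** 2 / expected[day] for day in observed
)

print(expected)
print(round(chi_square, 2))  # 11.44
```

Dividing each squared difference by the expected count scales the deviations, so a miss of 10 customers on a day expecting 20 counts for more than the same miss on a day expecting 60.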

10:04
📉 Interpreting the Chi-Square Test Results

With the chi-square statistic calculated, the speaker looks up the critical chi-square value at a 5% significance level for 5 degrees of freedom, which is 11.07. The degrees of freedom are the number of categories minus one: the restaurant is open six days a week (it is closed on Sunday), so 6 - 1 = 5. The calculated chi-square statistic of 11.44 is compared to this critical value. Since 11.44 is greater than 11.07, there is less than a 5% chance of observing a result this extreme if the owner's distribution were true. The speaker therefore rejects the null hypothesis and concludes that the owner's distribution does not fit the observed data.
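
The claim that a statistic above 11.07 occurs less than 5% of the time when the owner's distribution is true can be checked by simulation. The sketch below makes assumptions that are not in the video: each of the 200 customers picks a day independently according to the claimed percentages, and the function names, number of simulated weeks, and random seed are arbitrary illustrative choices.

```python
import random

CLAIMED_PERCENT = [10, 10, 15, 20, 30, 15]  # Monday to Saturday
N_CUSTOMERS = 200
CRITICAL_VALUE = 11.07


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_rejection_rate(n_weeks, seed=0):
    """Fraction of simulated weeks whose statistic exceeds the critical value."""
    rng = random.Random(seed)
    days = list(range(len(CLAIMED_PERCENT)))
    expected = [N_CUSTOMERS * p / 100 for p in CLAIMED_PERCENT]
    rejections = 0
    for _ in range(n_weeks):
        # Draw one week of visits under the null hypothesis.
        visits = rng.choices(days, weights=CLAIMED_PERCENT, k=N_CUSTOMERS)
        observed = [visits.count(day) for day in days]
        if chi_square_statistic(observed, expected) > CRITICAL_VALUE:
            rejections += 1
    return rejections / n_weeks


rate = simulate_rejection_rate(5000)
print(rate)
```

Under these assumptions the printed rate should land close to 0.05, which is the probability of wrongly rejecting a true null hypothesis at this significance level.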

Keywords
💡Restaurant
A restaurant is a business that prepares and serves food and drinks to customers. In the video's context, the restaurant is the subject of potential purchase, and the script revolves around analyzing the distribution of customer visits to ensure the business's viability. The script mentions the restaurant's weekly customer distribution and its closure on Sundays.
💡Distribution
In statistics, distribution refers to the way a certain variable is spread across a population. In the script, the owner provides a distribution of customer visits throughout the week, with percentages allocated to each day. The concept is central to the video's theme as it is the basis for the hypothesis test conducted by the potential buyer.
💡Hypothesis Test
A hypothesis test is a statistical method used to make decisions about a population based on a sample. In the script, the potential buyer conducts a hypothesis test to determine if the owner's claimed customer distribution is accurate. The test is a key part of the video's narrative as it helps decide whether to reject the owner's claim about the business.
💡Null Hypothesis
The null hypothesis is a statement of no effect or no difference that is tested with a statistical significance test. In the video, the null hypothesis is that the owner's distribution of customer visits is correct. Based on the observed data, the potential buyer either rejects this hypothesis or fails to reject it.
💡Alternative Hypothesis
The alternative hypothesis is what you might believe if the null hypothesis is not true. In the script, the alternative hypothesis is that the owner's distribution is incorrect, suggesting that the potential buyer should not rely on the provided distribution for decision-making.
💡Significance Level
The significance level, often denoted by alpha, is the probability of rejecting the null hypothesis when it is true. In the video, a significance level of 5% is chosen, meaning there is a 5% chance of incorrectly rejecting the null hypothesis if it is actually true.
💡Chi-Square Statistic
The chi-square statistic is a measure used in hypothesis testing to quantify how far an observed distribution departs from an expected distribution. In the script, the chi-square statistic is calculated to test the owner's claim about the customer distribution, with the result used to decide whether to reject the null hypothesis.
💡Expected and Observed Counts
Expected counts are the numbers of occurrences expected in a sample if the null hypothesis were true; observed counts are the numbers actually recorded. In the video, the potential buyer calculates the expected number of customers for each day based on the owner's distribution and compares it to the actual observed data to determine if the owner's claim is credible.
💡Degrees of Freedom
Degrees of freedom in statistics refer to the number of values that are free to vary in a calculation. In the script, the degrees of freedom for the chi-square test are determined to be 5 (n-1), where n is the number of categories (the six days the restaurant is open), and this affects the chi-square distribution used to evaluate the test.
💡Critical Chi-Square Value
The critical chi-square value is the value from the chi-square distribution that corresponds to the significance level. In the video, the critical value of 11.07 is used to determine if the observed chi-square statistic is extreme enough to reject the null hypothesis. If the calculated statistic exceeds this value, the null hypothesis is rejected.
Highlights

The potential buyer is considering purchasing a restaurant and seeks to verify the owner's claim about customer distribution.

The restaurant owner provides a weekly customer distribution with percentages for each day, except Sunday when it's closed.

The buyer decides to conduct an observation to test the accuracy of the owner's customer distribution claim.

A hypothesis test is planned with a null hypothesis that the owner's distribution is correct and an alternative hypothesis that it is not.

The significance level for the hypothesis test is set at 5%.

The buyer will use a chi-square statistic to evaluate the fit of the observed data to the owner's distribution.

The chi-square statistic is calculated based on the difference between observed and expected customer numbers, normalized by the expected values.

The total number of customers observed in the restaurant for the week is 200.

Expected customer numbers are calculated based on the owner's distribution percentages applied to the total customer count.

The chi-square statistic is the sum of squared differences between observed and expected values, each divided by the expected number.

The calculated chi-square statistic for the observed data is 11.44.

The degrees of freedom for the chi-square test are determined to be 5: there are six categories, and one degree of freedom is lost because the counts are constrained to sum to the observed total.

The critical chi-square value at a 5% significance level with 5 degrees of freedom is 11.07.

The observed chi-square statistic of 11.44 is more extreme than the critical value, indicating that the owner's distribution is unlikely to be accurate.

Based on the hypothesis test, the buyer decides to reject the owner's claim about the customer distribution, as it does not fit the observed data.
