A scenario where the measures of central tendency are skewed as a result of outliers

Provide an example of a scenario where the measures of central tendency are skewed as a result of outliers. How can such a situation be identified and addressed? Next, provide an example of what information SS provides us about a set of data. Under what circumstances will the value of SS equal 0 and is it possible for SS to be negative?

 

Sample Answer

Outliers can significantly distort measures of central tendency, particularly the mean. A common scenario is with income data.

 

📈 Skewed Measures of Central Tendency

 

 

Scenario

 

Imagine a company with ten employees. Nine of the employees are junior staff earning an annual salary of $50,000, while one employee is the CEO earning $5,000,000 per year.

Mean: To find the mean, you'd sum all salaries and divide by ten.

  • Mean = ((9 × $50,000) + $5,000,000) / 10 = ($450,000 + $5,000,000) / 10 = $5,450,000 / 10 = $545,000

The mean salary is $545,000, which is not representative of what a "typical" employee earns. It's heavily skewed by the CEO's salary.

Median: The median is the middle value when the data is ordered. In this case, there are 10 salaries, so the median is the average of the 5th and 6th values, which are both $50,000.

  • Median=$50,000

The median of $50,000 is a much better representation of the central tendency for this group.

Mode: The mode is the most frequently occurring value.

  • Mode=$50,000

The mode also accurately reflects the most common salary.

In this scenario, the extreme outlier ($5,000,000) "pulls" the mean significantly to the right (positive skew), making it a misleading measure of the center. The median and mode are much more robust against the influence of outliers.
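
The three figures above can be reproduced with a minimal Python sketch. It assumes only the standard-library `statistics` module; the variable names are illustrative and not part of the original answer.

```python
from statistics import mean, median, mode

# Nine junior salaries plus one CEO salary, as in the scenario above.
salaries = [50_000] * 9 + [5_000_000]

mean_salary = mean(salaries)      # pulled upward by the single outlier
median_salary = median(salaries)  # average of the 5th and 6th ordered values
mode_salary = mode(salaries)      # most frequently occurring value

print(f"Mean:   ${mean_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
print(f"Mode:   ${mode_salary:,.0f}")
```

The gap between the mean and the other two measures is the signal worth noticing: the median and mode agree with each other, while the mean sits more than ten times higher.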

 

Identification and Addressing

 

Identification: One of the simplest ways to identify such situations is to calculate all three measures of central tendency (mean, median, mode) and compare them. If the mean is substantially different from the median and mode, it's a strong indicator of outliers or a skewed distribution. Additionally, a box plot can visually reveal outliers as individual points beyond the "whiskers." You can also use statistical rules to formally flag outliers, such as the Interquartile Range (IQR) rule, which commonly flags values more than 1.5 × IQR below the first quartile or above the third quartile, or z-scores, which flag values whose distance from the mean exceeds a chosen number of standard deviations (often 3).
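
As a rough illustration of the IQR and z-score checks applied to the salary data, here is a short Python sketch. The 1.5 × IQR fences and the use of the sample standard deviation are common conventions assumed here, not something the scenario prescribes, and quartile definitions vary slightly between tools.

```python
from statistics import mean, quantiles, stdev

salaries = [50_000] * 9 + [5_000_000]

# Quartiles via the standard library's default interpolation method;
# other quartile conventions exist and can give slightly different fences.
q1, _, q3 = quantiles(salaries, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # assumed 1.5 x IQR convention
upper_fence = q3 + 1.5 * iqr
iqr_outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

# z-score of the largest salary, using the sample standard deviation.
z_max = (max(salaries) - mean(salaries)) / stdev(salaries)

print("IQR outliers:", iqr_outliers)
print(f"z-score of largest salary: {z_max:.2f}")
```

In this particular data set the IQR rule flags the CEO's salary, but its z-score comes out a little under 3, because the outlier inflates the very standard deviation it is measured against. With small samples, a fixed z-score cutoff can therefore miss an extreme value that the IQR rule or a box plot would catch.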

Addressing: When outliers skew the data, the median is usually the better primary measure of central tendency. If the mean is reported, it should appear alongside the median so the skew is visible. Depending on the analysis, you might also remove outliers when they are due to data entry errors (a legitimate extreme value, like the CEO's salary, should not be dropped just because it is inconvenient), or apply a data transformation (e.g., a logarithmic scale) to reduce the effect of the extreme values.
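
To see what a logarithmic transformation does to the salary example, the sketch below averages the base-10 logs of the salaries and converts the result back to dollars, which is equivalent to the geometric mean. The choice of base 10 and the variable names are assumptions made for illustration.

```python
import math
from statistics import geometric_mean, mean, median

salaries = [50_000] * 9 + [5_000_000]

# Average on the log scale, then transform back to dollars.
log_salaries = [math.log10(x) for x in salaries]
back_transformed = 10 ** mean(log_salaries)

print(f"Arithmetic mean:       ${mean(salaries):,.0f}")
print(f"Back-transformed mean: ${back_transformed:,.0f}")
print(f"Geometric mean:        ${geometric_mean(salaries):,.0f}")
print(f"Median:                ${median(salaries):,.0f}")
```

The back-transformed value lands far closer to the median than the arithmetic mean does, although it is still pulled upward somewhat by the CEO's salary. The transformation dampens the outlier's influence rather than removing it.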

 

📊 Sum of Squares (SS)

 

The Sum of Squares (SS) is a fundamental concept in statistics that measures the total variability or dispersion within a dataset. It quantifies how much the individual data points deviate from the mean of the dataset.

What it provides: SS provides a raw, unscaled measure of the total variation. A larger SS value indicates greater variability, meaning the data points are more spread out from the mean. A smaller SS value means the data points are more tightly clustered around the mean. Because SS is a sum, it also grows as more data points are added, so it is best compared between datasets of the same size. SS is a foundational component for calculating other key statistics like variance and standard deviation. For instance, the population variance is the SS divided by the number of data points (N), the sample variance is the SS divided by the degrees of freedom (n − 1), and the standard deviation is the square root of the variance.
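
The relationship between SS, variance, and standard deviation can be sketched in a few lines of Python. The `scores` list is a hypothetical dataset chosen so the arithmetic is easy to follow; SS is computed as the sum of squared deviations from the mean.

```python
from statistics import mean, pstdev, pvariance, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, for illustration only

m = mean(scores)
ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations from the mean

population_variance = ss / len(scores)    # SS / N
sample_variance = ss / (len(scores) - 1)  # SS / (n - 1)
population_sd = population_variance ** 0.5

print("Mean:", m)
print("SS:", ss)
print("Population variance:", population_variance, "| library:", pvariance(scores))
print("Sample variance:", sample_variance, "| library:", variance(scores))
print("Population SD:", population_sd, "| library:", pstdev(scores))
```

Every term in the sum is a squared quantity, so each one is zero or positive before the terms are added together.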

When SS equals 0: The value of SS will equal 0 if and only if all data points in the dataset are identical. This is because the formula for SS involves squaring the difference between each data point (