What does skewed data look like?

Data distributions can take many different shapes, and one of the most common is a skewed distribution. Skewed data is data that is not symmetrically distributed: the values cluster towards one side, and a longer tail stretches out on the other. There are two main types of skewed distributions – positively skewed and negatively skewed.
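
One common way to put a number on this asymmetry is the moment coefficient of skewness: the third central moment divided by the second central moment raised to the power 1.5. The following is a minimal sketch using only the Python standard library. The function name and the small data sets are made up for illustration.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (moments divided by n)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Made-up data for illustration only.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # long right tail
left_tailed = [-x for x in right_tailed]         # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed))  # positive value
print(sample_skewness(left_tailed))   # negative value
print(sample_skewness(symmetric))     # 0.0
```

A positive result points to a right tail, a negative result to a left tail, and a value near zero to a roughly symmetric shape.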

What is a positively skewed distribution?

A positively skewed distribution is characterized by a long right tail. The bulk of the data points cluster on the left side of the distribution, with fewer and fewer data points trailing off towards the right. The mean and median typically differ in a positively skewed distribution: the small number of large values in the right tail pulls the mean to the right of the median.

Some examples of data sets that often have a positive skew include:

  • Incomes – there are a small number of people with very high incomes that pull the mean income up.
  • Scores on a difficult test – most people cluster towards low or moderate scores, but there are a few very high scores.
  • House prices – most homes cluster around an “average” price, but there are some very expensive luxury homes.
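
The pull of a few large values on the mean can be seen with a handful of numbers. This is an illustrative sketch only: the income figures below are hypothetical, chosen so that a single very high earner sits far out in the right tail.

```python
from statistics import mean, median

# Hypothetical annual incomes (in thousands); the last value is one very high earner.
incomes = [28, 31, 35, 38, 40, 42, 45, 50, 55, 400]

print(mean(incomes))    # 76.4
print(median(incomes))  # 41.0
```

Nine of the ten incomes fall below the mean, which is why the median is often the more representative summary for data like this.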

Visualizing positively skewed data

Here is an example of what a positively skewed data distribution looks like in a histogram:

    #
    # #
    # #
  # # # #
  # # # # #
  # # # # # # #
  # # # # # # # # # #

The long right tail is clearly visible. The peak of the data is on the left side and declines slowly as you move to the right.
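
A rough text histogram can also be produced directly from raw values. The sketch below is one simple way to do it, not a standard routine. The function name, the bin width, and the wait-time data are assumptions made for illustration. The bars are drawn horizontally, so the right tail shows up as the bars shrinking from the top row to the bottom row.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge and a bar of '#' marks."""
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        # A Counter returns 0 for bins with no values, giving an empty bar.
        lines.append(f"{b * bin_width:>4} | {'#' * counts[b]}")
    return lines

# Made-up wait times in minutes, bunched at low values with a few long waits.
waits = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 8, 11, 14, 19]

for line in text_histogram(waits, bin_width=5):
    print(line)
```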

What is a negatively skewed distribution?

A negatively skewed distribution is the mirror image of a positively skewed distribution. It has a long left tail, with the bulk of the data clustered on the right side. In this case the mean is typically pulled to the left of the median.

Some examples of negatively skewed data sets include:

  • Scores on an easy test – most people score near the top, but a few score much lower
  • Age at death in developed countries – most deaths occur at older ages, with a tail of earlier deaths
  • Retirement age – most people retire within a narrow range of older ages, but a few retire much earlier

Visualizing negatively skewed data

Here is an example histogram of a negatively skewed distribution:

                  #
                # #
                # #
              # # # #
            # # # # #
        # # # # # # #
  # # # # # # # # # #

The long left tail is visible, with the peak on the right side. The data declines as you move left across the distribution.

Comparing positively and negatively skewed distributions

The key difference between positive and negative skew is which side the tail is on. Positively skewed has a right tail, while negatively skewed has a left tail. This leads to differences in where the mean and median lie:

Distribution         Mean vs Median
Positively skewed    Mean > Median
Negatively skewed    Mean < Median

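The long tail pulls the mean in the direction of the skew. The median depends only on the middle-ranked values, not on how extreme the tail values are, so it moves far less. These mean and median relationships are typical rather than guaranteed; they can fail for some discrete or multi-peaked distributions.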
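
The mean-versus-median comparison can be turned into a quick first check on a data set. The sketch below is a heuristic under that assumption, not a formal test. The function name and the exam scores are hypothetical.

```python
from statistics import mean, median

def tail_hint(values):
    """Guess the tail direction by comparing mean and median (heuristic only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"  # suggests positive skew
    if m < med:
        return "left"   # suggests negative skew
    return "none"       # no hint of skew from this check

# Hypothetical scores on an easy exam: most are high, a few are much lower.
easy_exam_scores = [55, 70, 82, 88, 90, 92, 94, 95, 97, 98]

print(mean(easy_exam_scores))       # 86.1
print(median(easy_exam_scores))     # 91.0
print(tail_hint(easy_exam_scores))  # left
```

Because the rule of thumb can fail, it is best paired with a histogram or a skewness coefficient before drawing conclusions.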

What causes data to be skewed?

There are a few potential reasons why data may have a skewed distribution:

  • Bounds – Data that has a natural lower or upper limit can lead to skew. For example, incomes cannot be less than zero, so the distribution has room for a long tail only on the right.
  • Processes – The process that generates the data may make large or small values less likely. For example, a well-controlled manufacturing process makes defects rare, so defect counts pile up near zero with a tail to the right, causing right skew.
  • Outliers – A few extreme outliers can pull the distribution in their direction. The bulk of the data remains unchanged.

Handling skewed data in analysis

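Skewed data presents challenges for analysis because many common statistical techniques, such as t-tests and ordinary least squares regression, assume that the data or the model errors are approximately normally distributed. Here are some ways to handle skewed data: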

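  • Consider non-parametric tests instead of parametric tests – These do not assume a particular distribution such as the normal, although they still carry assumptions of their own.
  • Try transforming the data – Log or power transforms (such as a square root) can make right-skewed data more symmetric. A log transform requires strictly positive values.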
  • Use robust measures instead of the mean – Median and trimmed means are less influenced by skew.
  • Use regression models suited for skewed data – Gamma regression, for example.
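
Two of the options above, transforming the data and using robust measures, can be sketched with the standard library. This is an illustration under assumptions, not a recommended pipeline. The helper names, the 10% trimming proportion, and the data are all made up.

```python
import math
from statistics import fmean, median

def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (moments divided by n)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    k = int(len(values) * proportion)
    ordered = sorted(values)
    return fmean(ordered[k:len(ordered) - k])

# Made-up, strictly positive, right-skewed values (log needs values > 0).
values = [1, 2, 2, 4, 4, 4, 8, 8, 16, 32]
logged = [math.log(x) for x in values]

# The log transform brings the skewness much closer to zero for this data.
print(sample_skewness(values))
print(sample_skewness(logged))

# Robust summaries sit closer to the bulk of the data than the plain mean.
print(fmean(values))              # 8.1
print(median(values))             # 4.0
print(trimmed_mean(values, 0.1))  # 6.0
```

Results on transformed data are on the transformed scale, so they need to be interpreted (or converted back) accordingly.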

Conclusion

In summary, skewed distributions are non-symmetric with a long tail on one side. Positively skewed data has a right tail and typically mean > median, while negatively skewed data has a left tail and typically mean < median. Bounds, processes, and outliers can cause skewness. For analysis, non-parametric tests, data transformations, robust measures, and skew-tolerant models can help overcome the challenges of skewed data.