Add Maple

How to remove the long tail

In survey data, you might have a column with many small categories that only have a few responses each. For example, a global survey with a country question could include many countries with very few respondents. Filtering out these smaller categories can be helpful because:

• Small categories can make statistical tests, like Chi-Square, unreliable, because these tests need enough expected responses in each category (a common rule of thumb is at least 5).

• Charts and tables can become too large and difficult to read.

You can filter out categories with a small number of responses as follows:

This is an example of a country column with a long tail of responses. The stats box shows 186 categories with a median count per category of only 39.5, even though the average (mean) is 351. A median this far below the mean indicates a long tail: a few large categories account for most responses, while most categories are small.

1
Country column example showing a long tail with many small categories.
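If you want to run the same check outside the app, the sketch below compares the mean and median count per category. It is a minimal illustration using only the Python standard library. The `responses` list is made-up sample data, not the survey shown above, and the variable names are assumptions.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical responses: two large countries plus several tiny ones.
responses = (
    ["US"] * 500 + ["UK"] * 300 + ["FR"] * 4 + ["NZ"] * 2 + ["IS"] + ["MT"]
)

# Count responses per category, then summarize the counts.
counts = Counter(responses)
per_category = list(counts.values())

summary = {
    "categories": len(per_category),
    "mean": mean(per_category),
    "median": median(per_category),
}
print(summary)

# A median far below the mean suggests a long tail of small categories.
has_long_tail = summary["median"] < summary["mean"]
print("Long tail likely:", has_long_tail)
```

Comparing the two figures is only a quick heuristic. How large a gap counts as a long tail depends on your data.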

The first step is to add a filter, which you can do from the More menu in the top right.

2
Open Add Filter and choose the column to filter by minimum counts.

Select the column that you would like to filter (country in this case). Then select "has more than" for the filter type and enter the minimum number of responses per category. In this example we chose 200.

3
Set "has more than" to a threshold (e.g., 200) to remove the long tail.
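For reference, the logic of a "has more than" filter can be sketched in a few lines of Python. This is not how the app implements it. The function name `filter_long_tail` and the sample data are hypothetical, and the sketch assumes "has more than" means strictly greater than the threshold.

```python
from collections import Counter

def filter_long_tail(values, min_count):
    """Keep only values whose category has more than min_count responses."""
    counts = Counter(values)
    # Assumption: "has more than" is a strict comparison (count > min_count).
    return [v for v in values if counts[v] > min_count]

# Hypothetical responses: two large countries and two small ones.
responses = ["US"] * 500 + ["UK"] * 300 + ["FR"] * 4 + ["NZ"] * 2

kept = filter_long_tail(responses, 200)
remaining = sorted(set(kept))
print(remaining)   # categories that survive the filter
print(len(kept))   # number of responses that remain
```

With a threshold of 200, only the two large categories remain and the responses from the small categories are dropped. A lower threshold keeps more of the tail.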