Avoid data sampling in Google Analytics

Almost everyone who works with Google Analytics has fallen into the sampling trap at some point. You focus on the data in the report and overlook the small yellow sampling icon in the top left of the report, which indicates that the calculations are based on only a subset of the collected data.

It is particularly annoying if you do not notice this immediately and only later, when comparing different reports, wonder why the numbers do not match. We’ll show you how to deal with sampling problems.

What does data sampling actually mean?

Sampling in Google Analytics means that a report is calculated from a random sample of the data rather than from all of it. Sampling is widespread in statistics because analyzing a subset is much faster than analyzing the full data set and usually yields similar results. The results for the subset are then simply extrapolated to the full data set. Whether the results are reliable, however, depends heavily on how the sample is selected.
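The extrapolation step can be illustrated with a minimal sketch. It does not reproduce how Google Analytics selects its sample; the data, the helper name `estimate_total` and the 10% rate are assumptions made purely for illustration.

```python
import random

def estimate_total(values, sample_rate, seed=0):
    """Estimate sum(values) from a random sample of the given rate."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(values) * sample_rate))
    sample = rng.sample(values, sample_size)
    # Scale the sample sum up by the inverse of the share actually sampled.
    return sum(sample) * len(values) / sample_size

# Hypothetical data: pageviews per session for 100,000 sessions.
rng = random.Random(42)
pageviews = [rng.randint(1, 10) for _ in range(100_000)]

exact = sum(pageviews)
estimate = estimate_total(pageviews, sample_rate=0.10)
print(exact, round(estimate))
```

The estimate lands close to the exact total here because the sample is large and drawn at random. With a small or unrepresentative sample the gap grows.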

When is data sampled in Google Analytics?

Data is sampled as soon as a query in the Google Analytics reports becomes too complex. According to Google, sampling kicks in automatically as soon as a report covers more than 500,000 sessions.

How do I know if the data is being sampled?

Unsampled data is always shown in the standard reports.

When using

  • custom reports
  • filters
  • segments
  • secondary dimensions
  • the user or behavior flow
  • or when looking at long periods of time

the calculations may become too extensive and the data will be sampled.

You can see whether a report is being sampled in the upper left corner, to the right of the report name. A small yellow icon appears there as soon as sampled data is used. If the calculation is based on the complete data, the icon is green. If you move the mouse over the icon, a small window opens showing the percentage of total sessions the calculation is based on.

The yellow icon next to the report name indicates that the report is based on sampled data

When does sampling become a problem?

For monitoring trends, a sample covering 80 or 90% of the total sessions is completely sufficient.

The lower the sampling rate, i.e. the less data the calculation is based on, the greater the inaccuracies. In addition, each new query can return different results, which makes it difficult to compare data from different reports.
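A small simulation makes this effect visible. It is only a sketch under assumed numbers (50,000 hypothetical sessions, about 2% of which hit one URL) and says nothing about the exact algorithm Google Analytics uses. Each rate is queried 30 times and the spread of the extrapolated counts is recorded.

```python
import random
import statistics

def sampled_estimate(flags, rate, rng):
    """Estimate the number of True flags from a random sample of the given rate."""
    k = max(1, round(len(flags) * rate))
    sample = rng.sample(flags, k)
    return sum(sample) * len(flags) / k

rng = random.Random(1)
# Hypothetical: 50,000 sessions, about 2% of them hit one particular URL.
sessions = [rng.random() < 0.02 for _ in range(50_000)]

spread = {}
for rate in (0.9, 0.5, 0.1, 0.01):
    # Repeat the same "query" 30 times with a fresh random sample each time.
    estimates = [sampled_estimate(sessions, rate, rng) for _ in range(30)]
    spread[rate] = statistics.pstdev(estimates)
    print(f"rate {rate:>4}: spread of estimates = {spread[rate]:.1f}")
```

At a 90% sample the repeated estimates barely differ. At 1% they scatter widely, which is exactly what makes two sampled reports hard to compare.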

When comparing total traffic and SEO traffic, for example, the SEO traffic of a URL can appear higher than its total traffic, because the total traffic report contains unsampled data while the SEO report is based on a sample. This can be fatal in a content audit and other evaluations. The same can happen when comparing monthly and annual reports.

Sampled data is therefore useless if you need to work with exact numbers, especially for comparisons.

What options are there to bypass sampling?

Fortunately, there are several ways to stop sampling.

1. Set higher accuracy

If the sample already covers a large share of the sessions, it may be enough to increase the accuracy. To do this, move the mouse over the yellow sampling icon. In the window that opens, you have the option of setting a higher level of accuracy.

You can set a higher accuracy to bypass the sampling

The higher accuracy comes at the expense of response time, which increases.

At best, this will already switch off sampling. You can recognize this by the fact that the icon turns green.

2. Use standard reports

The standard reports always use the entire amount of data and are not sampled.

Sometimes you can get the same results from a standard report and avoid segments or secondary dimensions altogether.

Example: If you want to count the sessions for the top landing pages from organic search, you can open the report Behavior – Site Content – Landing Pages and apply the segment “Organic Traffic”. The report is then already sampled 🙁

Segments like “Organic Traffic” make it more likely that the data is sampled when you look at a longer period of time.

You get the same data if you open the report Acquisition – All Traffic – Channels and click Organic Search. Then select Landing Page as the primary dimension. And the report is not sampled 🙂

The same evaluations can be obtained unsampled by using the corresponding standard reports.

3. Shorten the observation period

Another option is to shorten the observation period, which reduces the number of sessions the report has to process. For example, if you want to create an annual report and the data is sampled, try pulling the data quarterly or monthly instead. This gives you a smaller amount of data per query. You can then put the numbers back together, for example in Excel.
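As a sketch of this approach, the following splits a year into monthly ranges and adds the results back together. `fetch_sessions` is a hypothetical stand-in for whatever export you use (a manual download or an API call) and simply returns made-up numbers here.

```python
from datetime import date, timedelta

def monthly_ranges(start, end):
    """Split the inclusive range [start, end] into per-month (first, last) pairs."""
    ranges = []
    current = start
    while current <= end:
        # First day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        last = min(next_month - timedelta(days=1), end)
        ranges.append((current, last))
        current = next_month
    return ranges

def fetch_sessions(first, last):
    """Hypothetical stand-in for a report export; returns 100 sessions per day."""
    return ((last - first).days + 1) * 100

chunks = monthly_ranges(date(2019, 1, 1), date(2019, 12, 31))
total_sessions = sum(fetch_sessions(first, last) for first, last in chunks)
print(len(chunks), total_sessions)
```

Adding up the pieces works for additive metrics such as sessions or pageviews. Metrics like users cannot simply be summed across periods, because the same user may appear in several months.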

4. Use several filtered data views

If you look at a certain area frequently and end up with sampled data simply because you apply a segment, consider creating a separate data view for that area. The standard reports in this view are not sampled.

5. Divide data into different properties

Sampling is applied automatically at the property level. For example, if you have many different country websites, you could create a separate property for each country website.

6. Work with tools that use the Google Analytics API

With the help of the API, report requests can be broken up into smaller queries so that sampling is avoided. The amount of data for each individual query is kept small, and the individual results are then reassembled. So far we’ve had good experiences with NextAnalytics, for example, but AnalyticsEdge is also recommended.
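If you query the API yourself, you can check each response for sampling before reassembling the pieces. The sketch below assumes the JSON shape of the Reporting API v4, where a sampled report carries the fields `samplesReadCounts` and `samplingSpaceSizes` in its `data` object. Please verify the field names against the current API documentation. The example responses are hand-written, and the sketch makes no claim about how the tools named above work internally.

```python
def sampling_rates(response):
    """Return, per report, the share of sessions read, or None if unsampled."""
    rates = []
    for report in response.get("reports", []):
        data = report.get("data", {})
        read = data.get("samplesReadCounts")
        space = data.get("samplingSpaceSizes")
        if read and space:
            # Assumption: the API returns these counts as strings, one per date range.
            rates.append(int(read[0]) / int(space[0]))
        else:
            rates.append(None)
    return rates

# Hand-written example responses (not real data).
sampled = {"reports": [{"data": {"samplesReadCounts": ["499630"],
                                   "samplingSpaceSizes": ["15328013"]}}]}
unsampled = {"reports": [{"data": {"rows": []}}]}

print(sampling_rates(sampled))
print(sampling_rates(unsampled))
```

A result other than `None` tells you that the query was still too large, so the date range for that piece should be split further.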

7. Google Analytics 360 Suite

If you work with such large amounts of data that your reports are sampled very quickly or very often, you should consider using Google Analytics Premium or the Google Analytics 360 Suite. The premium version has many advantages, including the ability to create unsampled reports. However, this version of the web analytics tool is not free.

Conclusion

The sampling function in Google Analytics creates reports from subsets of the collected data and kicks in as soon as calculating over the full data set becomes too expensive. Reports based on sampled data are only partially meaningful and cannot be used for comparison with other reports. However, there are various ways of avoiding sampling problems and obtaining reliable reports even for large amounts of data.
