Not all traffic that arrives in Google Analytics is real traffic. Fake traffic from referrer spam distorts your visit statistics. To get clean statistics from Google Analytics, it is important to get rid of these uninvited visits as soon as possible. We show which filters can be used to stop spam in Google Analytics!
What is fake traffic through referrer spam?
Referrer spam is not a real website visit. It usually shows up in the Analytics reports as a referrer, e.g. 4webmasters.org, but it can also appear as a search term, a page or a direct visit.
Requests are sent repeatedly (usually by an automated script) so that the spammer's link shows up in the reports and logs of the targeted websites.
How is a referrer passed?
A referrer is a value that the browser transmits in an HTTP header (the Referer header) when a user moves from one website to another; it indicates where the user came from.
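As a rough illustration, the sketch below builds an HTTP request that carries a referrer, without sending it. The header is named `Referer` in HTTP; both URLs are made-up placeholders.

```python
from urllib.request import Request

# Hypothetical URLs, used only for illustration.
previous_page = "https://blog.example.org/some-article"
target_page = "https://www.example.com/landing-page"

# A browser following a link adds the page it came from as the "Referer" header.
request = Request(target_page, headers={"Referer": previous_page})

# Nothing is sent here; we only inspect the header the server would receive.
referrer = request.get_header("Referer")
print(referrer)
```

Because the header is simply a text value supplied by the client, any script can set it to whatever it likes, which is what referrer spam exploits.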
What is the goal of the creator?
The goal of referrer spam is usually to generate traffic. People are naturally curious and want to know what is happening on their website. When website owners look at their reports, the unfamiliar entry makes them curious and lures them to the referral URL. Sometimes the aim is to advertise another website, sometimes to redirect the user to an online store, and sometimes to install malware or Trojans.
These spammers hit thousands of Google Analytics accounts, so you can imagine the total number of visits generated with this method.
How is the spam generated?
Spammers usually use two methods: ghost spam or crawler spam.
Ghost spam uses the Measurement Protocol, which enables developers to send data directly to the Google Analytics servers. All the spammer needs is a GA tracking ID; the rest is largely automatic. The tracking IDs are usually generated randomly, and an automated script then sends fake data to the reports.
e.g. semalt.com, buttons-for-website.com, best-seo-solution.com
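To see why a hostname filter is effective against this kind of hit, it helps to look at what a Measurement Protocol pageview looks like. The following is a minimal sketch that only builds the URL-encoded body and sends nothing; the function name and all values are hypothetical, and the parameter names (`v`, `tid`, `cid`, `t`, `dh`, `dp`, `dr`) are those of the Universal Analytics version of the protocol as far as we know.

```python
from urllib.parse import urlencode, parse_qs

def build_pageview_hit(tracking_id, client_id, hostname, path, referrer):
    """Return the URL-encoded body of a pageview hit (illustrative helper)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID of the property
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname, chosen freely by the sender
        "dp": path,          # document path
        "dr": referrer,      # document referrer that appears in the reports
    }
    return urlencode(params)

# Placeholder values only; no request is made.
body = build_pageview_hit(
    "UA-XXXXX-Y", "555", "not-your-site.example", "/", "http://spam.example/"
)
fields = parse_qs(body)
print(fields["dh"], fields["dr"])
```

The sender has to supply the hostname itself. A script that guesses tracking IDs cannot know which website belongs to which ID, so the hostname in such hits is typically wrong or missing.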
A web crawler is an internet bot that browses web pages, usually for web indexing. Googlebot, for example, reads and indexes pages so that they can be found in organic search; crawlers like this are useful.
A spam crawler also visits websites, but for the purpose described above. It ignores rules such as those in robots.txt, which ask crawlers to stay out of certain areas.
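For comparison, this is roughly what a well-behaved crawler does before fetching a page. The sketch uses Python's standard `urllib.robotparser`; the robots.txt content, the bot name and the URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a website.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite crawler asks before fetching; a spam crawler skips this check.
allowed_public = parser.can_fetch("ExampleBot", "https://www.example.com/blog/")
allowed_private = parser.can_fetch("ExampleBot", "https://www.example.com/private/report")
print(allowed_public, allowed_private)
```

Since robots.txt is only a request and not an access control, a spam crawler that never performs this check is not stopped by it.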
What is the difference between Ghost and Crawler Spam?
Crawler spam actually pays your website a visit. Ghost spam never does; it uses a fake hostname because the spammer does not know in advance whose website will be hit.
How can I recognize referrer spam?
Check referral hostname
Ghost spam has a hostname that does not belong to your own website environment, because the spammer does not know whose property is being hit with this method.
The actual host name cannot be seen on the screenshot as it is not even listed in the top 8.
How do I stop Google Analytics spam?
With the following 2 filters you can stop almost all spam access in Google Analytics:
- A host name filter that filters all ghost spam in Google Analytics (referral, organic or fake direct access)
- A campaign source filter with a regular expression that filters all known crawler spam.
How do I create the hostname filter?
It is important to make a list of all valid hostnames so that no legitimate traffic is lost.
In case of doubt there is still the raw data view without any filtering.
Our tip: Always create a raw data view without any filters for each property. Once you apply filters to a data view, the excluded data is lost forever.
- Go to Audience / Technology / Network in Google Analytics and select the “Hostname” tab
- There you will see a list of hosts like in this example:
- Find all the hostnames belonging to your website.
In this example it’s only the 7th entry (it’s a single blog page).
There may also be other pages on which the same tracking ID has also been integrated, e.g. on third-party pages for the shopping cart or the payment process.
If you have many visitors from different countries who use a translation service, then these will also appear as valid host names.
These are, for example, translate.googleusercontent.com (Google Translate), webcache.googleusercontent.com (Google’s cached version of a page), translateservice.com, and web.archive.org (the Internet Archive).
All other hostnames that you do not recognize from your own website environment are invalid. Beware of familiar names such as google.com or amazon.com (spammers use these names to deceive users). A hostname of “(not set)” is not valid either.
- Using a regular expression, create a hostname filter for all valid hostnames belonging to the website.
- Go to Admin / Filters in Google Analytics
- New filter
- Name: valid hostname
- Filter type: Custom
- Include, filter field Hostname: regex (xxx)
Tip: Be sure to click Test to see whether the regular expression works.
When you are sure that all valid data is included, click on Save.
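Before entering the expression in Google Analytics, it can be checked offline. The following is a minimal sketch, assuming that the filter matches the pattern anywhere in the hostname and ignores case; the hostnames and the helper `is_valid_hostname` are hypothetical.

```python
import re

# Hypothetical list of valid hostnames; replace with the ones from your own report.
valid_hostnames = [
    "www.example.com",
    "shop.example.com",
    "translate.googleusercontent.com",
]

# Escape each name so that dots are matched literally, then join with "|".
include_pattern = "|".join(re.escape(name) for name in valid_hostnames)

def is_valid_hostname(hostname):
    """Approximate an include filter: keep the hit if the pattern matches."""
    return re.search(include_pattern, hostname, re.IGNORECASE) is not None

sample_hosts = ["www.example.com", "google.com", "(not set)", "shop.example.com"]
kept = [host for host in sample_hosts if is_valid_hostname(host)]
print(include_pattern)
print(kept)
```

Escaping the dots is a deliberate choice: an unescaped `.` matches any character, which makes the include filter slightly more permissive than intended.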
Please note: If a new hostname is added to your website environment, the hostname filter must be adapted accordingly.
Please note: This filter applies from the day it is set up and protects you from future ghost spam. To view historical data without spam, you have to create a segment with the same condition.
For us as an agency, it is very difficult to set up the appropriate include filter for customers whose web presence is very extensive and hard for us to survey. In those cases, we use an exclude hostname filter as an alternative:
Not.set|excite|webmaster|(google.ru)|hulfington|lumb|hide|pandashield|anonym|burble|pr.xy|speedsurfing|ymig|miradis|bing|fanyi|redir
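The effect of such an exclude filter can be previewed on an exported hostname report. This is only a sketch: the pattern is the one given here with the `|` alternatives written without surrounding spaces, case-insensitive matching is an assumption, and the report data is invented.

```python
import re

# Exclude pattern from the text, split over two lines for readability.
exclude_pattern = re.compile(
    r"Not.set|excite|webmaster|(google.ru)|hulfington|lumb|hide|pandashield|"
    r"anonym|burble|pr.xy|speedsurfing|ymig|miradis|bing|fanyi|redir",
    re.IGNORECASE,
)

# Hypothetical hostname report: hostname -> number of sessions.
hostname_report = {
    "www.example.com": 1200,
    "(not set)": 340,
    "free-webmaster-tools.example": 95,
    "translate.googleusercontent.com": 12,
}

# Hits whose hostname matches the pattern would be dropped by the filter.
excluded = {h: n for h, n in hostname_report.items() if exclude_pattern.search(h)}
remaining = {h: n for h, n in hostname_report.items() if h not in excluded}
print(excluded)
print(remaining)
```

Short terms such as `bing` or `hide` match any hostname that contains them, so it is worth comparing the result against the raw data view before relying on the filter.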
How do I create the campaign source filter?
The campaign source filter is used to exclude crawler spam.
Please note: This filter list must be updated regularly.
- Go to Admin / Filters in Google Analytics
- Click on Create New Filter
- Name: Spam Filter
- Filter type: Custom
- Exclude, filter field: Campaign Source
A lot of Trump-related spam has been generated in the past few months:
- secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!
- oo-8-oo.com search shell is much better than google!
Here is a list of current spammers for creating the filter; it also excludes the Trump spam (updated on December 28, 2016):
redd|moth|0481|100doll|error|7×9|abcd|cand|playto|best-seo|biglist|booh|button|chatan|cheap|check|dailyrank|detail|error|event|fix|forum69|free|girlsgo|guardlink|love|monet|semalt|seo.united|serienjun|valu|success|test|tvgrin|for.your.busi|zum.de
Tip: Be sure to click Test to see if the regular expression works.
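Because the list has to be updated regularly, it tends to grow. As far as we know, a filter pattern in Google Analytics is limited to 255 characters; treat that number as an assumption and check it for your account. The sketch below, with a hypothetical helper `split_into_patterns`, splits a list of terms into several patterns that each stay under a given length, so that each can go into its own filter.

```python
def split_into_patterns(terms, max_length=255):
    """Group terms into "|"-joined patterns no longer than max_length.

    A single term longer than max_length ends up in a pattern of its own.
    """
    patterns = []
    current = ""
    for term in terms:
        candidate = term if not current else current + "|" + term
        if len(candidate) <= max_length:
            current = candidate
        else:
            if current:
                patterns.append(current)
            current = term
    if current:
        patterns.append(current)
    return patterns

# A few terms from the list above, de-duplicated while keeping their order.
terms = list(dict.fromkeys(["semalt", "best-seo", "button", "error", "dailyrank", "error"]))

# A small limit is used here only to make the splitting visible.
patterns = split_into_patterns(terms, max_length=20)
print(patterns)
```

Each resulting pattern would be entered as a separate exclude filter on the campaign source.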
Exclude useful bots and spiders
Not all crawlers are bad, so you should not block them across the board. Doing so would hurt the visibility and findability of your website (Google, for example, regularly crawls websites).
But these bots and spiders also generate hits that are of no interest for your analysis.
You should therefore filter out these hits as well. This is very easy with the bot filtering option in Google Analytics: in the view settings, tick “Exclude all hits from known bots and spiders”.
Referrer spam should be filtered out if possible.
Two filters are helpful for this:
- Hostname filter (to remove the ghost spam)
- Campaign source filter (to remove the crawler referrer spam)
In order to see historical data without spam, you have to create segments with these filter settings.
You can find more tips on setting up your Analytics account correctly in the article “Setting up your Google Analytics account correctly”.