How Bloggers Can Avoid Spam in Their Google Analytics Reports

All of the best tools have their limitations, and even Google, with its near-infinite wisdom, has fallen victim to abuse. At the time of writing, Google Analytics will inevitably report some form of fake traffic, which I will refer to as spam. If you have Google Analytics set up on your blog, you may have already seen some of this spam in your reports. The infamous “Trump Spam”, which seemed to originate from a certain Eastern European country, plagued many Analytics users throughout the 2016 US election campaign, urging them to “vote for Trump!”. The spam inflated Google Analytics traffic data for many bloggers, e-commerce stores and government organisations, and skewed some important metrics for many websites.

Unfortunately, spam of an even more sordid (and sometimes quite distressing) nature can occasionally make its way through to your Google Analytics reports. As an internet user in general, and as a blogger who likely has to moderate their own spam comments, you have probably seen this sort of thing before. Google has confirmed it is working hard on a solution to avoid all spam, but until then I’m afraid we will have to fix the problem ourselves.

If you haven’t yet created a Master View for your blogging account, then I highly recommend you do that first.

What Exactly Is Analytics Spam?

Without going into too much detail, there are different forms of Google Analytics spam. Some spam consists of harmless, automated bots that arrive on your website to collect data – a practice which even Google carries out from time to time. Other forms of spam intend to shock the Analytics user, or sell services to them. The good news is that there is nothing to fear from seeing spam in Google Analytics, and any spam that you do see in your reports is not a result of your website being hacked. In other words, the spam that you see in Google Analytics has little to do with your website, and is mostly due to exploits in the Google Analytics program itself. There is nothing to worry about – the spam simply needs removing from your Google Analytics view.

How To Block Spam From Your Analytics View

The most effective way of blocking out spam is to only include traffic from trusted hostnames. The hostname is the domain on which your Google Analytics tracking code was fired. For most bloggers, the hostname will predominantly be their own domain. As such, any traffic recorded against a hostname that does not match your own website domain is a candidate for spam.

Assuming you have correctly installed the Google Analytics tracking code and have some genuine traffic data to hand, we can quickly check which hostnames are reporting traffic to your website. The Hostname report can be found by navigating to Audience > Technology > Network (which shows Service Provider by default), and then clicking on the Hostname tab.

[Screenshot: the Hostname tab]

If you’ve just set up your Master View, then you are unlikely to have much hostname data available. If this is the case, navigate to your raw, Unfiltered View so that you have some data to examine.

Examine the list of hostnames that are displaying in the report. You should hopefully see your domain name listed within the results, which may appear with or without the www prefix. If you don’t see any hostname data in the report, try changing the date range to a longer period of time where you may have received traffic.

In the example below, the blogger’s domain name is www.pushingthemoon.com. We can clearly see that the vast majority of traffic to the blog is genuine, because the hostname matches the blogger’s domain name:

[Screenshot: Hostname report showing genuine and spam hostnames]

Any hostname that does not match your own website’s domain name should be treated as suspected spam for now. This includes seemingly harmless domains, including those that occasionally say “Google” in an attempt to trick Analytics users. Some spam bots don’t even have a hostname attributed to them, in which case they will be marked as (not set).

There are some genuine reasons why a hostname won’t contain your own domain name. These include:

  • Other websites or domains you own.
  • Translation services.
  • Other variations of your domain name, such as .com or .net variations.
  • Online tools you use that make use of your Google Analytics code.

From personal experience, I know that the following hostnames in the above screenshot are genuine Google services. If you see any of these in your hostname report, they are more than likely genuine:

  • translate.googleusercontent.com
  • webcache.googleusercontent.com
  • {yourwebdomainname}.googleweblight.com

Examining the above screenshot, I can see that there are a few strange domain names that make no sense – why would the Google Analytics code be fired from them? One of them also uses Facebook in the hostname, which does not make use of the Google Analytics tracking code, so it would not be firing it. These other domains are all spam-related and need to be excluded.
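The triage described above can be sketched in a few lines of Python. This is only an illustration and not part of Google Analytics: the function name `triage_hostnames`, the `TRUSTED` list and the made-up spam hostname are all assumptions, and the input is simply a list of hostnames copied by hand from the Hostname report.

```python
import re

# Hypothetical list of trusted hostname fragments; replace with your own.
TRUSTED = ["pushingthemoon", "googleusercontent", "googleweblight"]


def triage_hostnames(hostnames, trusted=TRUSTED):
    """Split hostnames into (genuine, suspect) lists."""
    # Escape each fragment so it is matched literally, then join with pipes.
    pattern = re.compile("|".join(re.escape(t) for t in trusted), re.IGNORECASE)
    genuine, suspect = [], []
    for host in hostnames:
        # "(not set)" and empty values never count as genuine.
        if host and host.lower() != "(not set)" and pattern.search(host):
            genuine.append(host)
        else:
            suspect.append(host)
    return genuine, suspect


report = [
    "www.pushingthemoon.com",
    "translate.googleusercontent.com",
    "(not set)",
    "free-traffic.example",  # made-up spam hostname for illustration
]
genuine, suspect = triage_hostnames(report)
print("Genuine:", genuine)
print("Suspect:", suspect)
```

Anything that lands in the suspect list still deserves a quick manual look, since it may be one of the genuine exceptions listed earlier.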

First make a note of any hostname showing in the report which you believe to be genuine, including of course your own website domain. We’ll need this list of genuine hostnames so that we can tell Google Analytics that these are genuine sources of traffic. What we are going to do next is tell Google Analytics to only include traffic data from hostnames that we trust. Even if you don’t have any traffic data available at this point, you can still follow the below instructions – this will ensure that your Google Analytics data is mostly spam-free once traffic data starts being collected:

  1. Click on admin (the little cog icon in the bottom left-hand side of the screen).
  2. Under the View column, make sure that your Master View is selected.
  3. Click on Filters.
  4. Click on the red Add Filter button.
  5. Ensure Create New Filter is selected.
  6. Give the filter a name. In this instance, we should call it something like “Include Whitelist Hostnames”
  7. Click on Filter Type and select Custom.
  8. Click on the Include option – ensure that Exclude is NOT selected.
  9. A drop-down box that says “Select Field” should appear underneath. Click on the Hostname option.
  10. In the input box labelled Filter Pattern, type in your domain name without the www prefix or the domain suffix (such as .com or .org). For example, if your domain name is www.awesomeblog.com, just type in “awesomeblog”.
  11. If you have multiple hostnames to add, add these to the list and separate them, without spaces, using the pipe symbol (normally typed by pressing SHIFT and the backslash key together). Do not include the www prefix (or any other subdomain) or the domain suffix (such as .com). For example, we will also want to include googleusercontent.com and googleweblight.com; to do so, we simply enter googleusercontent|googleweblight to whitelist both of these domains.
  12. Continue until you have added all of your safe, genuine hostnames. Do not add a pipe at the end of the list, as this will cause issues with your filter.
  13. Click save.

Below is an example of how pushingthemoon.com’s hostname filter setup will appear:

[Screenshot: Analytics include filter for spam]

Make sure that you have set the filter to Include; any traffic that does not match the pattern will then be left out of the view. Once you have saved your filter, the vast majority of spam will be prevented from appearing in your Master View reports.
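To see how a pipe-separated filter pattern behaves, here is a minimal Python sketch. It assumes the Filter Pattern acts like an unanchored, case-insensitive regular expression search; the helper names `build_include_pattern` and `passes_include_filter` are made up for this example and are not part of Google Analytics.

```python
import re


def build_include_pattern(names):
    """Join trusted hostname fragments with pipes, as typed into Filter Pattern."""
    # Drop blanks and stray spaces so there is no leading or trailing pipe.
    cleaned = [n.strip() for n in names if n.strip()]
    return "|".join(cleaned)


def passes_include_filter(hostname, pattern):
    # Assumption: the filter is an unanchored, case-insensitive regex search.
    return re.search(pattern, hostname, re.IGNORECASE) is not None


pattern = build_include_pattern(["awesomeblog", "googleusercontent", "googleweblight"])
print(pattern)

for host in ["www.awesomeblog.com", "translate.googleusercontent.com", "spam.example"]:
    print(host, "->", "kept" if passes_include_filter(host, pattern) else "dropped")

# A trailing pipe adds an empty alternative, which matches every hostname.
print(passes_include_filter("spam.example", pattern + "|"))
```

In Python's regex engine, the final line shows why a trailing pipe is a problem: the empty alternative matches anything, so the filter would let every hostname through. Note also that a short fragment such as “awesomeblog” will match any hostname containing that text, so choose fragments that are specific to your own domain.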

Please bear in mind that when you add a filter to your Google Analytics view, historic spam is not retroactively removed from your reports. Spam will only stop showing in your reports from the point at which you added the filter.

Trickier Spam

Yes, there are other forms of spam that will cleverly match your hostname and thus avoid your hostname filtering. If you have spotted odd landing pages or seen some suspect traffic hikes from various locations, then your hunch is probably correct and the culprit is likely to be spam. Huge traffic spikes from Romania, China and The Philippines should be considered suspect. If you have the know-how to identify such spam in the first place, then the following example and fix will hopefully make sense to you.

For example, the screenshot below shows a Landing Page entry which appears in my blog’s reports. The page leads to Netpay MerPayB2C, yet no such landing page exists on my website:

[Screenshot: Netpay MerPayB2C spam landing page]

Adding Country as a Secondary Dimension to the Landing Pages report reveals that this particular session originated from China. Unfortunately, this territory appears on my spam watch list, as it is one of the locations that consistently generates spam in Google Analytics accounts. Combined with the bank-related landing page, which has nothing to do with what I blog about, this gave me enough reason to exclude any further traffic from this particular internet service provider from my Master View. To find the ISP of the culprit, I added Service Provider as a Secondary Dimension, and uncovered the following information:

[Screenshot: Service Provider of the spam session from China]

I can therefore exclude this particular ISP from my Master View by adding it as an exclusion filter, using ISP Organisation as the Filter Field:

[Screenshot: exclude filter for the spam ISP]
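The effect of such an exclusion filter can be sketched in Python as follows. Everything here is hypothetical: the session rows, the landing page paths and the ISP names are invented placeholders, and the matching is assumed to be a case-insensitive pattern search on the chosen field.

```python
import re

# Hypothetical rows, as if exported from the Landing Pages report with
# Country and Service Provider added as secondary dimensions.
sessions = [
    {"landing_page": "/blog/first-post", "country": "United Kingdom",
     "isp": "example broadband ltd"},
    {"landing_page": "/netpay-merpayb2c", "country": "China",
     "isp": "example spam network"},
]

# Placeholder pattern, not a real ISP name; use the one from your own report.
EXCLUDE_ISP = r"example spam network"


def apply_exclude_filter(rows, field, pattern):
    """Return only the rows whose field does NOT match the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if not rx.search(row[field])]


kept = apply_exclude_filter(sessions, "isp", EXCLUDE_ISP)
for row in kept:
    print(row["landing_page"], row["country"], row["isp"])
```

An exclude filter is the mirror image of the hostname include filter: matching rows are dropped and everything else is kept, so make the pattern specific enough that genuine visitors on other providers are not caught by it.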

 
