It’s a real pain without a good solution.
I’ve been fighting a battle with referral spam over the past few weeks.
Referral spam is when a website sends lots of bogus traffic to your site in order to show up in your site analytics. They hope that you are excited and want to see the site that is sending you tons of new traffic, so you click through to the referrer website.
Also called referrer spam, this concept started with websites that sold services to website owners. It was a way to get visits from ideal prospects.
This type of spam is a major nuisance for a couple of reasons. First, it messes up your analytics so that it becomes difficult to tell how well you’re really doing. Second, it uses hosting and service resources. I hit my traffic limit on Woopra (a live stats system) during the last billing cycle because of the spam.
On a typical business day, Domain Name Wire gets about 250-500 visitors per hour during working hours. Sometimes now I’m getting a couple thousand visits per hour for a couple hours at a time.
As best I can tell, this latest surge in referrer spam is actually caused by malware. I went to a few of the sites (with my firewall and antivirus software on) and they triggered warnings. I was even receiving bogus traffic from Australian domain registrar Netregistry. It seems that the site owners are unwitting participants in this scheme; the perpetrators are hacking their sites and then triggering the fake traffic.
One way to counteract the problem is to use a WordPress plugin called Block Referrer Spam. It adds a rewrite to your .htaccess file to deny the traffic. But this is a never-ending battle, as the sites that send the fake traffic change daily, if not hourly. It’s like using a simple email blacklist tool.
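The exact rules the plugin writes are not shown here, but as a rough sketch of the general approach, the Python snippet below turns a list of referrer domains into mod_rewrite directives that answer matching requests with a 403. The function name and the `.example` domains are hypothetical, and the rule layout is an assumption about what such a rewrite could look like, not the plugin's actual output.

```python
def build_htaccess_rules(domains):
    """Return mod_rewrite directives that refuse requests whose
    Referer header points at one of the given domains.

    Hypothetical helper for illustration only.
    """
    # Normalize and de-duplicate so the output is stable.
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    if not cleaned:
        return ""

    lines = ["RewriteEngine On"]
    for i, domain in enumerate(cleaned):
        # Escape dots so they match literally in the regex.
        escaped = domain.replace(".", r"\.")
        # Match the domain itself or any subdomain of it.
        pattern = r"^https?://([^/]+\.)?" + escaped + r"(/|$)"
        # Every condition except the last is OR-ed with the next one.
        flags = "[NC]" if i == len(cleaned) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    # "-" means no substitution; F sends 403 Forbidden.
    lines.append("RewriteRule ^ - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_htaccess_rules(["spam-one.example", "spam-two.example"]))
```

The weakness described above still applies: the list has to be regenerated every time the spamming domains change.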
Ideally, I’d like to see analytics services (including Google Analytics) create an easy way to delete a referrer and all of its data (including page views, time on site, etc.) from your analytics. A simple (-) link next to each referrer would be a big help.
Josh says
This is a good idea. Seems like a service that would be in GoDaddy’s wheelhouse. I hope they are listening.
What hosting plan are you on now BTW? I remember you giving a review of a lower level GD plan a few years ago but interested to hear if you’ve upgraded since.
Eric Lyon says
This is an uphill battle both ways. Even if something was implemented to block bulk traffic senders, how would they tell the difference between a fake and a legitimate source? That would suck if someone’s blog got a linkback in an article on Forbes or on the New York Times’ site and the traffic was blocked because too many people clicked to visit.
I suppose most of the referral spam comes from bots, and there are ways to detect bots and block them. However, there are so many different bots out there that it’s still a never-ending battle trying to block them all. So back to square one again.
Thomas says
Hi Eric,
I agree with your last comment that trying to block all types of BAD bots is a battle, and I will say that this is a continued and ongoing arms race. However, I would strongly recommend that people facing these issues seek advice from vendors who are pioneers in offering a comprehensive solution to help protect web/API channels against these types of threats.
Thomas Graham – London
https://uk.linkedin.com/in/thomas-graham-0955617
Jothan Frakes says
Perhaps this is an opportunity in disguise to create a service that allows for people to submit their blocking rules and receive the collaborative blocking rules from a centralized subscription service (akin to how sorbs or rbl work?) in a plugin, similar to how wordfence works in their subscription model.
John says
This is a real crime against people and society.
Drewbert says
Hardly. The real crime is all those webmasters using Google Analytics (and Facebook Like Buttons, etc.) giving up their visitors’ privacy rights to Google etc.
John says
I don’t use Google Analytics and referrer spam is a serious problem.
Are you one who does that?
What do you expect people to do – not use social share buttons or Google Analytics if they choose? Google and Facebook are the bad guys when they are the bad guys, and they are not the bad guys this time. The referrer spammers are.
Drewbert says
Does what?
I would knock referral spamming on the head using that powerful do-anything tool, apache mod_rewrite.
Method 2 listed at http://www.loneshooter.com/how-to-block-referrer-spam-bots/ would be a good starting point.
http://www.thern.org/system-administration/apache-system-administration/dealing-with-apache-referrer-spam/ has a good script for pinpointing potential referral spammers from your log file.
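The linked script is not reproduced here. As a separate, minimal sketch of the same idea, the Python below counts referring hosts in an Apache access log so that unusually busy referrers stand out. It assumes the log uses the combined log format; the function name, the sample lines, and the `.example` hosts are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def count_referrer_hosts(lines, own_host):
    """Count external referring hosts in combined-format log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # not a combined-format line
        # A missing referrer is logged as "-", which has no hostname.
        host = (urlparse(match.group("referer")).hostname or "").lower()
        if not host:
            continue
        # Skip internal navigation on our own site and its subdomains.
        if host == own_host or host.endswith("." + own_host):
            continue
        counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "http://spam.example/offer" "Mozilla/5.0"',
        '203.0.113.9 - - [10/Oct/2016:13:55:40 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    ]
    for host, hits in count_referrer_hosts(sample, "mysite.example").most_common(10):
        print(hits, host)
```

A high count alone does not prove spam (a link from a popular article looks the same in the log), so the output is a list of candidates to review rather than a blocklist.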
John says
I was asking if you were a referrer spammer.
I have searched out and tried various .htaccess editing techniques and it’s still a big problem.
Okay, so perhaps I’ll check out and try what you have posted, and thanks for posting them. But in all honesty, I’m not optimistic about such techniques any more.
These spammers are costing people a lot of time and energy.
Drewbert says
OK John. Fail2ban is probably a good way to go, as it is “set it and forget it”.
https://www.digitalocean.com/community/tutorials/how-to-protect-an-apache-server-with-fail2ban-on-ubuntu-14-04
Drewbert says
And here’s a live updated list of referer spammers that you can d/l to your apache server as often as you like…
https://github.com/desbma/referer-spam-domains-blacklist
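As a hedged sketch of how a downloaded list like this could be used, the Python below loads a blacklist and checks a referrer against it. It assumes the list is a plain-text file with one domain per line (the actual file format of the repository above is not confirmed here), and the function names and `.example` domains are hypothetical.

```python
from urllib.parse import urlparse


def parse_blacklist(text):
    """Parse blacklist text, assumed to hold one domain per line."""
    domains = set()
    for raw in text.splitlines():
        entry = raw.strip().lower()
        # Ignore blank lines and comment lines.
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def is_spam_referrer(referer, blacklist):
    """Return True if the referrer's host, or any parent domain of it,
    appears in the blacklist."""
    host = (urlparse(referer).hostname or "").lower()
    parts = host.split(".")
    # Check "www.spam.example", then "spam.example", then "example".
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


if __name__ == "__main__":
    blacklist = parse_blacklist("# sample list\nspam.example\njunk.example\n")
    print(is_spam_referrer("http://www.spam.example/landing", blacklist))
    print(is_spam_referrer("https://news.example/article", blacklist))
```

Matching on parent domains means a listed domain also covers its subdomains, which is a design choice of this sketch rather than something the list itself specifies.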
John says
Thanks, I’ll check them out.
Danny says
There’s a fix that used to work fairly well for this. If you’re a bit adventurous, you can try setting up a filter in your Analytics account. See here:
distilled.net/resources/quick-fix-for-referral-spam-in-google-analytics/
Doesn’t block them, but at least you don’t need to see it in your Analytics data.
Julio Maysonet says
Last month I saw 3 spam referral links in my awstats. I added the domain names to cPanel’s IP blocker, which converted the domain names into IP addresses and added them to the .htaccess file.
Jacob says
Please try Custom Referral Spam Blocker for WordPress.
It doesn’t require .htaccess changes and allows you to customize blocked domains, with a huge list of known spammers.
https://wordpress.org/plugins-wp/custom-referral-spam-blocker/
Drewbert says
Wordpress is a data/resource/energy/code pig. Leave this job to underlying Apache.
csmicfool says
I understand the sentiment, but that “pig” powers something like 27% of the web and the majority of those users have no clue what Apache even is, they just want a plugin to install.
If you’re fighting off a DoS, then Apache is the clear winner. However, we’re talking about junk analytics, not server-crashing performance issues.
Drewbert says
You are correct, unfortunately. So maybe the hosting providers should install a lot of this sort of stuff by default in their apache installs to help prevent the need for all these resource-hogging WP plugins?
As for WP itself, the guys running it are clearly making too much money to give a shit about cleaning up the mess their pig has produced. Powers 27% of the web, uses up 75% of the datacentre power budget and 75% of the data pipe bandwidth sending down crap that’s never actually used when it reaches the browser.
These people are complaining about comment spam filling up their logs, they should take a look at the # of file and SQL calls made by your average WP theme/plugin nightmare. I took a look at one simple WP install on an apache server – Just serving up the home page took over 80 mySQL queries and over 70 css and js files sent to the browser! Sheesh. No wonder computers keep running out of ram and cpu cycles.
csmicfool says
Again, this comes down to server configs. WordPress is a PHP app, not a native server application itself. It’s more a limitation of PHP than of WordPress.
If you’re running WP at scale, use object and page caching with redis or memcached and watch 99% of your mysql queries disappear.
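To illustrate the page-caching idea in general terms, here is a minimal Python sketch of the cache-aside pattern. An in-memory dict stands in for redis or memcached, and the class and method names are hypothetical; this is not how any particular WordPress caching plugin is implemented.

```python
import time


class PageCache:
    """Tiny time-limited page cache; a dict stands in for redis/memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self.store = {}     # key -> (expires_at, html)

    def get_or_render(self, key, render):
        """Return cached HTML for key, calling render() only on a miss."""
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: no database work needed
        html = render()    # the expensive part (database queries, templating)
        self.store[key] = (now + self.ttl, html)
        return html


if __name__ == "__main__":
    cache = PageCache(ttl_seconds=60)
    calls = []

    def render_home():
        calls.append(1)  # stands in for the queries behind a page build
        return "<html>home</html>"

    cache.get_or_render("/", render_home)
    cache.get_or_render("/", render_home)
    print(len(calls))  # the second request is served from the cache
```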
Again, I do not disagree with your sentiment but do consider the full environmental constraints. 75% of DC budget is overreaching. Sure, WP is going to be heftier than serving HTML or (most) small individual PHP scripts/pages, but that’s not based on the reality of the web today: everything is dynamic. Most of my WP installs are undetectable on my DC budget, nowhere near as costly as our beefed up .NET applications which require 10x the memory and far more environmental support to run at scale. Not to mention Windows licensing per-host and SQL licensing per-core.
John J. Peterson says
Yeah, it can be annoying. Using .htaccess to block the site worked for me! Hitting a billing limit with analytics, that’s new to me. I use gostats and GA and have no limits on traffic.