How to Deal With Blog Content Scraping


A few weeks ago a friend posted a question about content scraping in a Facebook group for running bloggers that I started (you can join here if you’re interested; bloggers of any type are welcome). He had noticed that some of his posts were popping up on another site and was wondering how to deal with it.

Content scraping is basically the practice of taking content from another site and posting it in its entirety on a site you manage (you can read Google’s definition here). Sometimes this is done with attribution in the form of a link back to the source, but in its worst form it involves stealing entire posts without giving credit to the source at all.

Most typically, content scraping sites are littered with very prominent advertising banners. The tactic taken by these sites is to re-post content in order to draw people in so they will view or click on ads. There is usually no original content, and these sites often pull posts in from multiple different sources.

Over the years I’ve come across a number of sites that scrape content from my Runblogger blog. The first time I noticed this I got really pissed. A site had grabbed my RSS feed (as well as those of several other running blogs) and was republishing entire posts without any attribution (I refuse to link to any scraping sites here). There were a ton of ads, and no info provided on how to contact the webmaster (these sites are usually run anonymously). I wasn’t sure what to do, so I did a bit of reading and figured out that it would be easier to report the site to AdSense and get their ads taken down than to get the content removed or the site dropped from Google search (via a DMCA complaint). I reported the site to AdSense, and was pleased when I eventually noticed that the ads on the site were gone. This was quite a long time ago, and since then Google has produced a form for reporting scraping sites. I’m not sure how effective it is, but you can find it here.

My attitude toward scraping has evolved a bit since that initial experience. I’ve come to realize that it’s not worth my time in most cases to bother with scraping sites as they don’t typically impact what I do (especially if they include a link back to the source). It’s been years since I’ve seen a scraping site outrank Runblogger on a running-related topic (it did use to happen on occasion several years ago). This might be due in part to the fact that Runblogger’s authority in Google has improved, but it also might be due in part to major Google search algorithm changes (e.g., Panda and Penguin) over the past few years that have hammered sites with thin content.

The other factor that has eased my concern a bit is that I’ve found ways to ensure that in most cases my content contains a link to the original when it gets scraped (even if the owner of the scraping site doesn’t include the link on their own). It simply involves adding a link to the source in the footer of your RSS feed (this only applies if you publish full posts to your feed, which I have always done since I like my readers to have options for how to consume my content). Most scraping sites operate by auto-posting from RSS feeds, and scrapers don’t typically take the time to edit out content from scraped posts.
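If you want to check whether a scraped copy of one of your posts actually carried the footer link along, a small script can do the looking for you. The following is a minimal Python sketch, not part of any tool mentioned in this post. The names LinkCollector and links_back_to are made up for illustration, and it assumes you have already saved the HTML of the scraped page as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(page_html, source_domain):
    """Return True if any link in page_html points at source_domain.

    source_domain is a bare domain such as "runblogger.com"; subdomains
    like "www.runblogger.com" also count as a match.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    for href in collector.hrefs:
        host = urlparse(href).netloc.lower()
        if host == source_domain or host.endswith("." + source_domain):
            return True
    return False
```

Calling links_back_to(scraped_html, "runblogger.com") on a copy that kept the feed footer should return True. A False result only means no link was found in the HTML you supplied; it doesn't tell you why.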

Adding links to your RSS footer can be done in both Blogger and WordPress. Below I show you how:

In Blogger

To add content to your RSS Feed Footer, go to your Blogger dashboard and click Settings:

Blogger Dashboard Settings

Then, under Settings click Other. In the Other page you will see a box labeled Post Feed Footer.

Blogger Post Feed Footer

This is where you place the text you want to appear in the footer of each post in your RSS feed if you have “Allow Blog Feed” set to Full. When I was using Blogger I had the following line in there:

This article was originally posted by Peter Larson on <a href="http://www.runblogger.com/">Runblogger.com.</a>

You’d want to change the URL and blog name in the above text to that of your own site. Each post in your feed will now have a link back to your homepage. You could also add a personal message, a text ad, or whatever else you might want to put in there.
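One thing to watch when copying a footer line like this from a web page: the quotes around the URL need to be plain straight quotes, or the link may not work. If you'd rather generate the line than edit it by hand, here is a minimal Python sketch. The function name build_feed_footer is hypothetical, and the wording of the footer simply mirrors the example above; you would still paste the result into the Post Feed Footer box yourself.

```python
from html import escape


def build_feed_footer(author, blog_name, blog_url):
    """Return one line of HTML crediting the original blog.

    All three values are HTML-escaped so that characters such as
    & or " in a name or URL cannot break the markup.
    """
    return (
        "This article was originally posted by "
        f"{escape(author)} on "
        f'<a href="{escape(blog_url)}">{escape(blog_name)}</a>.'
    )


if __name__ == "__main__":
    # Swap in your own name, blog name, and URL.
    print(build_feed_footer("Peter Larson", "Runblogger.com",
                            "http://www.runblogger.com/"))
```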

In WordPress

After migrating from Blogger to WordPress, I added similar functionality to my feed by installing Tentblogger’s RSS Add Footer plugin. Installing the plugin automatically adds a link back to the original post to the footer of each RSS feed entry. You don’t need to add any links or code, though you do have the option of adding additional text or links if you wish in the plugin dashboard:

Tentblogger RSS Add Footer plugin

Simple!
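If you're curious what "a link back to the original post in each feed entry" amounts to, the idea can be sketched in a few lines of Python. To be clear, this is not the plugin's code (WordPress plugins are written in PHP, and I haven't reproduced it here). The function name add_post_links is made up, and the sketch assumes a simple RSS 2.0 feed where the post body sits in each item's description element; a real WordPress full-text feed may keep the body in a different element.

```python
import xml.etree.ElementTree as ET
from html import escape


def add_post_links(rss_xml):
    """Append a link back to each item's own URL to its description.

    rss_xml is an RSS 2.0 document as a string. Items that lack a
    <link> or a <description> are left untouched.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        description = item.find("description")
        if not link or description is None:
            continue  # nothing to link to, or nowhere to put the footer
        footer = (
            f'<p>Originally posted at <a href="{escape(link)}">'
            f"{escape(link)}</a></p>"
        )
        # ElementTree re-escapes the text when the feed is serialized.
        description.text = (description.text or "") + footer
    return ET.tostring(root, encoding="unicode")
```

Because the footer is built from each item's own link, a scraper that republishes the entry unedited ends up pointing at the specific post rather than just the homepage.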

There you have it – if you have any good tips on how you deal with content scraping, let us know in the comments!




About Peter Larson

The Blogologist is authored by Peter Larson. I'm a former biology professor and pretty much an all-around geek who turned a blogging hobby (at Runblogger.com) into a full-time job. I tend to obsess about things that I'm interested in, and right now blogging, social media, science, and running are my passions. In my non-blogging life I work a few days a week at a sports injury clinic and chase around my three active little kids. You can find me on Twitter, Facebook, Pinterest, Instagram, and Google+.

Comments

  1. thanks for this peter – i’ve added a post feed footer now!
