October 26, 2014

Site Scrapers And RSS Bandits: Automated Plagiarism

Following on from his very helpful post last week, Plagiarised Text Or Images? What To Do, guest author Paul Ward explains how site scrapers and RSS bandits can hurt us, even when they duplicate only a few lines of our content.

Site Scraping

Along with the amateur who steals a page at a time, there are the more professional crooks who use programs called site scrapers. Give these programs a page URL and they follow links, grabbing every file they can: pages, images, scripts, style files. The better scrapers can be set up to automatically change things like affiliate ids. They can mimic browsers so webservers can’t detect and block them.

Add on a bit more software and the scraper can transfer all the files to a new host. Kick one off and go away for an hour or two – when you return you have a complete copied site, links, addresses and affiliate codes all changed to suit you.

RSS Bandits

The poor relation of a scraper is the RSS bandit. RSS feeds are a good thing in many ways: they aid search engines (Google likes them) and they can help you propagate your site to others (many of us have blog feeds and accept other people’s feeds).

An RSS feed typically contains a set of information for each page on the feed: in lens terms that’s URL, title, intro and any images associated with the intro. Grab a feed, add a bit of programmed code and you have a self-updating site.
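
To see how little a feed reader has to do, here is a minimal sketch in Python that pulls the title, URL and intro out of each item. The sample feed, its example.com addresses and the function name feed_items are made up for illustration, and the sketch assumes a plain RSS 2.0 feed; real feeds vary.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Lenses</title>
    <item>
      <title>Bald Eagles</title>
      <link>http://www.example.com/bald-eagles</link>
      <description>An intro paragraph about bald eagles.</description>
    </item>
    <item>
      <title>Golden Eagles</title>
      <link>http://www.example.com/golden-eagles</link>
      <description>An intro paragraph about golden eagles.</description>
    </item>
  </channel>
</rss>"""


def feed_items(feed_xml):
    """Return the title, URL and intro of every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "intro": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in feed_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["url"])
```

Everything a bandit needs to rebuild your intro on another site is sitting in those three fields, which is why the intro text matters so much.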

Why Do They Hurt Us?

They steal traffic by beating the originals in search results. To a search engine they typically look like a frequently updated site. By pulling in content from many places they impress more than each individual source. By stealing from Squidoo, where many people understand the basics of SEO, they further impress search engines.

In addition, the crooks may carry spam or porn ads. Nobody wants good original content associated with acne cures or worse, especially if their name is transferred with the content.

Don’t We Get Links From RSS Bandits?

Many such thieves will leave a link back to the original. This fools some people into thinking that there’s no real harm. There is: apart from the possible dubious ad associations, a full intro may contain enough text to convince Google that the original is the duplicate – so the original suffers in SERPs. There’s also a danger that your affiliate id may be copied unchanged – so your id is on a site that probably breaks the terms of your affiliate agreement. If you don’t do anything about it you run the risk of losing your accounts with AdSense, Amazon and other reputable companies.

Why Do The Crooks Do It?

Several reasons: the obvious one is to make a quick buck from changed affiliate ids and other advertising. They may just be seeding the ground to sell the site or domain – get the money from a gullible idiot and disappear.

The amounts involved don’t have to be great. The costs are low and the profit can be a tidy sum in some countries, if not in US or UK eyes.

Should I Remove My RSS Feeds?

No: there are other ways to steal content anyway and you benefit from the feeds. Better to get the copies removed.

What Else Should I Do?

This is plagiarism. It’s illegal and there are quite straightforward measures to take. These are described in Plagiarised Text Or Images? What To Do.

There are forums and Facebook groups where you can get help and support. You can use sites like Web Of Trust to record bad sites. Spread the word, kill off the crooks.

StudioPress Premium WordPress Themes Writing Online runs on the Genesis Framework
The Genesis Framework empowers you to quickly and easily build incredible websites with WordPress. Genesis provides the secure and search-engine-optimized foundation that takes WordPress to places you never thought it could go.

Check out the incredible features and the selection of designs. It's that simple - start using Genesis now!

Plagiarised Text Or Images? What To Do

Unauthorised use of text and images is rife on the Web and the situation seems to be getting worse. Finding your lovingly crafted article on another site is upsetting and fighting back seems daunting. Well, it needn’t be, if you follow the steps outlined below.

I’m not a lawyer and the steps below don’t argue over angels dancing on legal pins but they’re what I do when I find my property stolen and they do work.

Know Your Rights

Pretty simple: if you created it, it’s your property. That only changes if you explicitly say so. You don’t need to put a copyright notice on anything. The only reason for doing so is to deter casual thieves – the professionals won’t even look at it.

Fair Use

There’s a defence often put forward by copiers: “it’s fair use”. This confuses the lawyers so I go by a simple rule: if the copying affects your subsequent sale of the property, or if people would not need to visit your original after seeing a copied portion, it’s theft. A quick example might be a slightly reduced size copy of a photograph: once such an image is out there you’ll never be able to sell the original to a respected purchaser such as a newspaper or magazine site.

For text, a small quotation is legal. A large chunk isn’t.

I Gave You A Credit/Link

That doesn’t lessen the offence in the slightest. Yes, you may allow someone to use your content if they attribute with a link but linking doesn’t magically bestow permission.

How Do I Know I’ve Been Ripped Off?

Watch for a drop in traffic. If you see a drop over a couple of days, copy a couple of sentences as a block into Google – look through the results. You can use Google’s image search to find your images on other sites – click on the magnifying glass in the search box and enter the URL of the image.

It’s best not to search for the opening sentences as that will often give false positives: if you have an RSS feed then those sentences may show up on lots of pages. Take text from lower down your page.
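
If you check several pages regularly, picking the block of text can be automated. The following Python is a rough sketch, not a polished tool: the function names search_phrase and google_query_url are my own, the sentence splitting is deliberately naive, and starting from the middle of the page is just one assumed way of taking text from lower down.

```python
import re
from urllib.parse import quote_plus


def search_phrase(article_text, sentences=2):
    """Pick a block of consecutive sentences from the middle of an article."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article_text) if s.strip()]
    if len(parts) <= sentences:
        return " ".join(parts)
    # Start halfway down so the opening sentences (often repeated in RSS
    # feeds) are skipped, but never run past the end of the article.
    start = min(len(parts) // 2, len(parts) - sentences)
    return " ".join(parts[start:start + sentences])


def google_query_url(phrase):
    """Build a search URL that looks for the phrase as one quoted block."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')


if __name__ == "__main__":
    text = "One. Two. Three. Four. Five. Six."
    print(google_query_url(search_phrase(text)))
```

Wrapping the phrase in quotation marks asks for the sentences as a block, which is the same thing you do when pasting them into the search box by hand.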

Found A Thief: Can I Kill Him?

No, sadly. Nor can you attack him any way other than legally. Tempting though it is, don’t ask a hacker you know to clobber his site – you’re the one who could end up being prosecuted.

Identify The Thief

Identify his domain and host. If your article on Bald Eagles is appearing on www.xxx.com then go to Google and enter “whois xxx.com”; “whois” has a particular meaning on the web and will take you to his domain registration details. You may need to try a couple of results as some sites offering info do vary, but you’ll possibly see the thief’s personal contact details, including email.

If he’s keeping details private, look for “name server”. That will identify where he’s hosting his site. For one of my domains, you would see:

Name Servers:
ns1163.hostgator.com
ns1164.hostgator.com

From that you see that HostGator is my host for that site.
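
If you have saved the whois output as text, the same reading can be done with a few lines of Python. This is only a sketch under assumptions: whois layouts differ between registries, the function names name_servers and likely_host are hypothetical, and taking the last two parts of the server name is a naive guess that will be wrong for addresses such as .co.uk.

```python
def name_servers(whois_text):
    """Collect the host names listed under a 'Name Server(s)' heading."""
    servers = []
    in_block = False
    for line in whois_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("name server"):
            _, _, rest = stripped.partition(":")
            if rest.strip():
                # Layout 'Name Server: ns1.example.com' on a single line.
                servers.append(rest.strip().lower())
                in_block = False
            else:
                # Layout with a heading followed by one server per line.
                in_block = True
            continue
        if in_block:
            if stripped:
                servers.append(stripped.lower())
            else:
                in_block = False  # a blank line ends the list
    return servers


def likely_host(server):
    """Guess the hosting company's domain from the last two labels (naive)."""
    labels = server.rstrip(".").split(".")
    return ".".join(labels[-2:])


if __name__ == "__main__":
    sample = "Name Servers:\nns1163.hostgator.com\nns1164.hostgator.com\n"
    for server in name_servers(sample):
        print(server, "->", likely_host(server))
```

Treat the result as a pointer to where to look next, not as proof: some sites sit behind a third-party name server that is not the actual host.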

DMCA Notice

The DMCA notice (letters pronounced separately) is the legal silver bullet – it states your ownership of the property and demands removal. Even if the thief is in a country that might not respect the notice, send it anyway. Emailed notices are legal. See a good example notice.

If you can’t find an email for the crook, go to the next step.

Tell The Hosting Company

If the thief doesn’t comply with the notice or you can’t track him down to send it, try his hosting company next. Look on their site for a contact method – many have a complaints form that serves as notice. Many hosting companies will take a site down quickly if they get enough complaints.

Hit The Thief In His Wallet

If the thief is carrying advertising, complain to the advertisers. AdSense is the biggest ad server and they take complaints very seriously (they’re owned by Google which doesn’t like its own revenue being threatened). See Google’s online form for reporting if the thief is using AdSense. (Mouse over a big ad on the page and look at the bottom of your browser: if “google” appears in the URL then it’s AdSense.)

Amazon is another common source of revenue for the crooks: they are very good at responding to this sort of problem. Go to Amazon’s online reporting system for Amazon US. If he’s using another Amazon, have a look around their Help pages.

Note: check that the thief hasn’t just copied your affiliate ids unchanged, though most will change them.

Other, smaller companies’ ads may be on stolen pages. Check their sites for contact details – you may have to go via Terms Of Service pages. Make a note for the next time you need the forms …

And That’s It

Hopefully, until the next time. Remember: don’t get mad, get even. And get other people involved if you see their content stolen. The more complaints, the faster the removal usually.

Further Reading

Copyright is Just the Tip of the Iceberg: What About Fair Use?

How To Credit an Image

Copyright Infringements

Plagiarism is an Ongoing Problem for All Writers and Publishers

I just finished reading a very good article by Adam Penenberg on Fast Company about Amazon’s Plagiarism Problem.  Penenberg makes excellent points in his article, exposing just a few blatant plagiarists who are earning money on Amazon through their so-called “published works”, which are really out-and-out copied-and-pasted books stolen from other authors.

The article focused on the erotica section, which has a very large internet audience.  But those of us who have been writing online for a while can tell you that plagiarism is rampant in all topic areas, including the driest subjects.  Plagiarism used to be much harder when books were published through the stringent auspices of large publishing houses, but now that there are so many self-published authors, it is difficult to tell how much of what is published is new and how much is stolen.

While even the best publishers and book sellers can miss some authors who fall through the cracks, with the easy self-publishing available to anyone online, the cracks have grown much, much larger, and plagiarism has become the new focus for spammers.  People who are just out to make a quick buck have found an easy way to make money from Amazon and other online book sellers by selling other people’s work under their own name.

One of the interesting issues to come out of all this is the question of whose burden it is to look for and take down the plagiarists.  Most authors do not have the time or the resources to become plagiarism police and guard their own work from being stolen.  Filtering out every plagiarist is a costly and labor-intensive problem for publishers of self-published books.  At this time, Amazon most likely only removes plagiarism that is pointed out by complaints from readers or authors.

What is the solution to all this?  I really don’t know, but somehow the technology must be upgraded so that the author needs at minimum to prove their identity when they self-publish.  That might not take care of all the plagiarists, but at least it would take care of the people with multiple accounts under different names.

Plagiarism – What to do if your Content is Stolen

In my last post What is Plagiarism?, I outlined what constitutes content and image theft and how not to plagiarise. In this post I will give you some tips on what to do if you discover your content has been stolen from you, used without permission or presented as someone else’s thoughts and ideas.

Sometimes we remain in blissful ignorance that our content has been stolen. However, we can get a nasty wake-up call when we check our backlinks and find our content is posted on someone else’s site. OK, so we may have a backlink, but if the site is unrelated to the topic about which we are writing and/or the site is of poor quality and even “spammy”, then we may like to get that backlink removed along with the content.

Another way you can check to see if your content has been stolen is to copy and paste a chunk of text from your content into a Google Search Box. You may get one heck of a surprise :(

Your content has been stolen – what next?

There are various things you can try, but be prepared to get “heavy” very quickly if an initial approach to the site owner is ignored. Frequently you will find that not only is there no way to contact the site owner, but comments are disabled too: a clear sign that they know exactly what they are doing.

So, if you cannot contact the Content Thief, or if they ignore you, the next thing to check is exactly where the Blog/Site is hosted. If it is Blogger or WordPress.Com, then you are in luck.

Blogger is owned by Google and Google is clear on its policy with regard to copyright infringement:

It is Google’s policy to respond to notices of alleged copyright infringement that comply with applicable international intellectual property law (including, in the United States, the Digital Millennium Copyright Act) and to terminating the accounts of repeat infringers. Details of Google’s policy can be found at http://www.google.com/dmca.html.

The link gives very comprehensive instructions as to how you can file a complaint with Google about anyone who has stolen your content and I can confirm, from personal experience, that they follow things up very quickly.

If the blog/site is hosted on WordPress.Com (note this is the free WordPress Blogging Platform and does NOT relate to sites using a WordPress.org template) then their Terms of Service leave users in no doubt that:

By making Content available, you represent and warrant that:

the downloading, copying and use of the Content will not infringe the proprietary rights, including but not limited to the copyright, patent, trademark or trade secret rights, of any third party;

WordPress.com manages its complaints process via Automattic, and this can be found at http://automattic.com/dmca/, where there is a very handy form that ensures you give them all the information they need to check your allegation of copyright violation. Again, I have been very successful with complaints I have made using this process, and in one case the whole blog got taken down – heh heh!

Just remember that if you use a form to submit a complaint, then this may not be saved in your email folder – keep a note of the date the complaint was submitted, together with a copy of what you say – this can be handy if you need to follow up due to a lack of response.
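
If you prefer to keep those notes in a structured way rather than on paper, a small record like the one below would do. It is only a sketch: the class name Complaint, its fields and the 14-day follow-up period are my own assumptions, not anything a host or ad network requires.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Complaint:
    sent_to: str          # who received the form, e.g. a host or ad network
    infringing_url: str   # the page carrying the copied content
    submitted: date       # the day the form was submitted
    text: str             # an exact copy of what was typed into the form
    resolved: bool = False

    def needs_follow_up(self, today, wait_days=14):
        """True if the complaint is unresolved after wait_days (assumed 14)."""
        return not self.resolved and today - self.submitted >= timedelta(days=wait_days)


if __name__ == "__main__":
    complaint = Complaint(
        sent_to="Example Host",
        infringing_url="http://www.xxx.com/bald-eagles",
        submitted=date(2014, 10, 1),
        text="Copy of the complaint as submitted.",
    )
    print(complaint.needs_follow_up(date(2014, 10, 20)))
```

The point is simply that the date and the wording are kept together, so a follow-up can quote the original complaint exactly.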

If the offending blog is not hosted by Blogger or WordPress.com, then you can still take action, particularly if the site has Google AdSense on it. Google does not take prisoners and anyone stealing content is violating their Terms and Conditions. Violators will often find their AdSense account cancelled.

If none of the above applies, your final recourse is to find out which Company is Hosting the Blog. I have had mixed success here because some Companies just ignore your emails or they may say they will not take action without a Court Order. Others have been very helpful.

To find out how to go about finding the host of a website, Plagiarism Today has a post Finding the Host, which tells you what you need to know.

You will then need to check the Host’s TOS regarding content and follow their complaints’ process.

If you have any specific questions about Plagiarism, then head over to the Writing Online Forum where a thread has been started.

What is Plagiarism?

Dictionary Definition of plagiarise: “Publish borrowed thoughts as original; steal from thus”.

Before the days of the internet, when we were at school or college we were taught that when we researched topics we had to write our essays in our own words and that it was essential to cite our sources. It was easy for our teachers to catch us out if we did not cite our sources, because let’s face it, at age 12, not many of us would have had much original knowledge about Einstein’s Theory of Relativity until we had done some research on it!

However, with the development of the internet and so much information now at our fingertips, plagiarism has taken on a whole new meaning. Often referred to as “copy and paste content”, it is just so easy to plagiarise. And it is rife!

The act of Plagiarism is often carried out in pure ignorance. Many online writers seem to have the genuine belief that once something is published on the world wide web, it is in the public domain and therefore it is permissible to copy content and publish it on their own sites.

However, what these people don’t stop to think about is, even if the content WERE free to use elsewhere, is it ethical to present it as one’s own thoughts? Isn’t that deception?

Others think that copying content is OK providing they link to the original content. But it is NOT OK to reproduce others’ work, other than a small excerpt (providing it is properly credited and linked to), unless you expressly have the original Author’s permission.

But sadly in many cases, the act of Plagiarism is a deliberate act carried out by people who know exactly what they are doing. They copy the content, present it as their own and nowhere do they credit the original author.

Here’s an example, using Star Wars.

My family loves the Star Wars films. My son can look at any still from the three original Star Wars movies and tell you what dialogue is being spoken. I could write an intro to a page and it would be so personal that you would then believe the rest of the page is my own work, because I can come across as being passionate about the topic.

But what I have seen done (many times) is an intro that is clearly personal followed by film reviews that are lifted from the IMDb (the Internet Movie Database) and presented as the page author’s own film reviews. And of course there’s no credit to the original source, because the plagiariser does not want to get caught out.

I have even seen a review done on a classic car and one of the sections is headed “My Review of…..” but the original author was someone in another country. The plagiariser even copied copyrighted pictures of the car.

But why is Plagiarism so bad?

Think about it. You have spent months building original content in a niche. It may be a niche that started as the result of a personal hobby or interest or it may be a niche that you came across in which you subsequently developed an interest and knowledge. It does not matter how the niche started, but you have worked hard to develop your content and it is getting traffic and making sales.

Your content could be on your own niche blog or part of a series of niche topics published on platforms like Wizzley, Squidoo or Hubpages.

Then along comes a plagiariser who copies your content and blatantly presents it as their own! To add insult to injury, they have harvested the keywords that you so painstakingly researched. And to rub salt into the wound, their version of YOUR content starts ranking higher for those keywords than your page and starts to get what was originally YOUR traffic.

The whole thing gets even worse when plagiarised pages on Publishing Platforms gain valuable page rank on those platforms, which in turn pushes down pages with honest, original content and affects the potential income you can earn on those sites.

How to NOT Plagiarise

It is so easy to avoid plagiarising other people’s content. All you have to do is:

research facts and information from various sources
rewrite in your own words
ideally add your own thoughts and opinions
credit the original source, preferably with a properly formatted link to that source

By linking to the sources, you are acknowledging the valuable work done by the original authors AND you are giving them a backlink – it’s the least you can do!

Plagiarists are cheats. They do it purely for financial gain and because they are so hell bent on making pages and content quickly. They churn out page after page, while the rest of us are still stuck on one page as we make sure the content is original and that we give our sources the credit they are due.

In my next article I will explain how you can check to see if your content is being stolen and what you can do about it.

FURTHER READING:
Copyright Infringements – all about various forms of Copyright Infringement and how to avoid it

Copyright Infringement

Recently, in my monthly slot on the Giant Squid Open Mike programme on Blog Talk Radio, we discussed Copyright. If there’s one issue that seems to confuse people it is where to get copyright-free images and what images are legal to use.

Most people seem to learn very quickly that copying huge chunks of other people’s content and posting it on your own sites is not a good idea. This is because:

It is illegal

It is dishonest, as well as illegal, to present someone else’s work as your own

Google and the other Search Engines penalise duplicated content

It is against the Terms and Conditions of every reputable website and you could find your pages deleted without warning

It is against the Web Hosting Companies’ policies and you could find your blog or website taken down without warning

It is not only text that is illegal to copy: the same applies to images, although not everyone seems to realise it. Basically, if you cannot find anything on a site that expressly gives you permission to use text or images from that site, then you CANNOT use it.

The fact that the content is publicly viewable does not mean that you can help yourself to it. You cannot duplicate that content in your own site without permission.

Think about it this way. Creating content, whether text or images, takes time. Time equals money. Why would anyone in their right mind want to give anyone else the opportunity to earn money as the result of their hard work?

And here’s another thought – newspapers, news websites. The photos are lovely. It’s oh so tempting. But…the majority buy the rights to those photos from sites like Associated Press or Getty Images or they pay a staff photographer to take them. It is a costly business getting those precious first pictures of Kate’s wedding dress or The Kiss on the Balcony.

And here’s something else that people don’t seem to realise. Taking that content, whether text or images, and then giving the original creator a link, does not let you off the hook. It is still content theft, it is still plagiarism, it is still not legal.

But if you put in a little time and effort, there are plenty of places where you can get free images, even celebrity images. You will be required to link back to the original site in return for using these images, but it is a small price to pay.

Sites that offer free to use images (but check out the conditions of use) include flickr, AllPosters and Zazzle.

You can find more information about Plagiarism and where to get free images at the page I wrote from the notes I made for the Radio broadcast: Copyright Infringements.
