11 December 2008

Web spam: How Google and other search engines catch spammers

What is web spam?

Put simply, web spam is content designed to trick search engines into directing traffic towards it. The idea is to make money: spammers direct traffic to pages that are loaded with sponsored links, either through pay-per-click campaigns or through thin affiliates (i.e. sponsors who pay when a transaction is made through a click-through).

Web spam damages the relevance of search results. Search engines have been doing battle with spammers since the internet began, and it is well known that Google review search results regularly to identify web spam.

Most SEO activity is about gaining an advantage in search engine results, and most of its techniques are plain old marketing, i.e. relevance and promotion. However, it is worth knowing what kind of page behavior Google are looking for in order to identify spam.

What are the main types of spam?

Google have identified eight main spam categories based around techniques that are designed to trick the user.

1. Pay per click landing pages

Many pages are set up to collect pay-per-click revenue without providing any content of their own. They may contain fake content that is geared to look like a blog, forum or set of search results, but the key point is that they add nothing new except a bunch of sponsored links.

Spammers use a variety of techniques here to generate content that can fool search engines. Page scraping and wholesale copying from “free to use” sources such as Wikipedia and DMOZ are common techniques, as are sites that exploit RSS feeds. Template-based content is also frowned upon – i.e. pages based on templates that use an information source to automatically generate vast amounts of content that follow a generic format or pattern.
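
As an illustration of how copied or template-generated text might be spotted, here is a minimal Python sketch that compares two pages by the word sequences they share. This is a generic near-duplicate check (word "shingles" plus Jaccard similarity), not a description of what Google actually runs; the function names and the shingle size of 3 are assumptions made for the example.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score close to 1.0 suggests one page is largely a copy of the other; what counts as "too similar" is a judgment call that this sketch does not try to make.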

2. “Parked” domains

Domains that fall out of current use are often “parked” on a web page that displays a set of sponsored links or directories. Generic parked domains are generally marked as spam.

3. Thin affiliates

A thin affiliate page exists solely to deliver users to another website with a different owner. Keywords deliver the user to the thin affiliate page, which in turn delivers the user to the merchant’s site via links. This is normally a commission-based arrangement where the merchant pays for any sales activity that takes place on their site.

This is not necessarily spam by any means, as price comparison and review sites often use these arrangements. Thin affiliate pages are regarded as spam when they offer no added-value content, i.e. they contain just a logo and some blurb copied from the merchant site, with no other facilities such as reviews, price comparison or the ability to register and log in.

4. Hidden text and hidden links

Hidden text is website copy that is visible to search engines but not to users. This includes text that is the same color as the background, text that is in a hidden area of the page and minute text. These techniques have long been considered a no-no by search engines and are always regarded as spam.
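
To make the three cases above concrete, here is a rough Python sketch that inspects an element's inline style for each of them. It is only an illustration: the function names, the default white page background and the 4px cut-off for "minute" text are assumptions, and a real check would also need to resolve external stylesheets and colors written in different notations.

```python
import re

def parse_style(style):
    """Parse an inline CSS style string into a dict of lowercase properties."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props

def hidden_text_reasons(style, page_background="#ffffff", min_font_px=4):
    """List the reasons (if any) why text with this style may be hidden."""
    props = parse_style(style)
    reasons = []
    # Fall back to the (assumed) page background if the element sets none.
    background = props.get("background-color", page_background.lower())
    if props.get("color") == background:
        reasons.append("text color matches background")
    if props.get("display") == "none" or props.get("visibility") == "hidden":
        reasons.append("element is hidden")
    size = re.fullmatch(r"(\d+(?:\.\d+)?)px", props.get("font-size", ""))
    if size and float(size.group(1)) < min_font_px:
        reasons.append("font size is minute")
    return reasons
```

An empty list means none of the three checks fired, which is not the same as proving the text is visible.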

5. JavaScript redirects

Search engine spiders generally do not execute JavaScript, and many spammers have exploited this to display different versions of content to search engines on the one hand and users on the other. The search engine must see the true copy of content, or it’s spam.
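
Because the redirect lives in script that a spider may never run, one crude way to look for it is to scan the script source as plain text. The Python sketch below does just that; the patterns and the function name are assumptions for illustration, it does not execute any JavaScript, and obfuscated redirects would slip straight past it.

```python
import re

# Contents of inline <script> blocks.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>",
                          re.IGNORECASE | re.DOTALL)

REDIRECT_PATTERNS = [
    # Assignment such as location = "..." or location.href = "..."
    # (the lookahead skips comparisons written with ==).
    re.compile(r"\blocation(?:\.href)?\s*=(?!=)"),
    # Calls such as location.replace(...) or location.assign(...).
    re.compile(r"\blocation\.(?:replace|assign)\s*\("),
]

def has_script_redirect(html):
    """Return True if any inline script appears to change the page location."""
    for script in SCRIPT_BLOCK.findall(html):
        if any(pattern.search(script) for pattern in REDIRECT_PATTERNS):
            return True
    return False
```

A match only shows that a redirect is present. Whether it is spam depends on whether users end up seeing different content from the page the search engine indexed.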

6. Keyword stuffing

Exploiting the structure of copy to try to raise keyword density unnaturally was a common trick ten years ago. These days it will be treated as spam. Using excessive keywords, using off-topic keywords and stuffing URLs with keywords are all practices that should be avoided.
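
Keyword density is simply the share of words on a page that are a given keyword, so it is easy to measure. Here is a small Python sketch; the 10% threshold in `looks_stuffed` is an arbitrary assumption for the example, not a figure published by any search engine.

```python
import re

def keyword_density(text, keyword):
    """Fraction of the words in text that are exactly the given keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def looks_stuffed(text, keyword, threshold=0.10):
    """Flag copy whose keyword density exceeds an (assumed) threshold."""
    return keyword_density(text, keyword) > threshold
```

For example, in "cheap flights cheap hotels cheap cars" the word "cheap" accounts for three of the six words, a density of 0.5, which is far higher than you would expect in naturally written copy.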

7. 100% frames

Some web publishers mask their content by using a frames page in which only one frame is visible to the user, while the search engine sees both. This is a misleading technique, so it is regarded as spam.
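
A frames page like this typically gives one frame 100% of the window and leaves the others no room. The following Python sketch looks for that layout using the standard library's `html.parser`; the class and function names are assumptions, and treating any multi-frame layout containing "100%" as suspicious is a simplification made for the example.

```python
from html.parser import HTMLParser

class FramesetParser(HTMLParser):
    """Collect the rows/cols layouts declared on <frameset> tags."""

    def __init__(self):
        super().__init__()
        self.layouts = []

    def handle_starttag(self, tag, attrs):
        if tag != "frameset":
            return
        attributes = dict(attrs)
        for name in ("rows", "cols"):
            value = attributes.get(name)
            if value:
                self.layouts.append([part.strip() for part in value.split(",")])

def is_full_page_frameset(html):
    """True if a frameset with several frames gives one of them 100%."""
    parser = FramesetParser()
    parser.feed(html)
    return any(len(parts) > 1 and "100%" in parts for parts in parser.layouts)
```

A layout such as `rows="100%,*"` would be flagged, while an ordinary split such as `cols="50%,50%"` would not.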

8. Sneaky redirects

A sneaky redirect is where a page redirects a user to a different domain – the search engine indexes the first page, but the user only sees the redirected content. This technique often involves rotating domains, so that landing on the first page can give you a different domain every time.

The importance of adding value

The use of most of these techniques is a sure sign of spam, though there are some grey areas, particularly with affiliates and landing pages.

Generally, pages are not regarded as spam if they add value in some respect. Certain types of page that may appear “spammy” at first glance can be perfectly reasonable if they offer content in an added-value context.

Examples of sites that add value despite their apparent use of spam techniques include:

  • Price comparison and product review sites: these are often littered with affiliate links and sponsored content, but are still regarded as adding new information.
  • Amalgamations of recipes, lyrics and quotes.
  • Contact information pages.
  • Affiliate pages that provide promotion codes and coupons, which can be regarded as adding value to the user.
