Search Engine Spiders

All search engines use automated programs, commonly referred to as “bots” or “spiders”. They are called spiders because of the way they “crawl” over the web, performing repetitive tasks at great speed to analyse websites and return data to the search engines.

Because most web pages contain links to other pages, a spider can start almost anywhere on the web and then follow the links to other pages and other websites.
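To illustrate how following links lets a spider travel from page to page and site to site, here is a minimal Python sketch of a breadth-first crawl. It assumes a small, hypothetical in-memory link graph (`LINKS`, with made-up `example` addresses) in place of real HTTP fetching and HTML parsing, so it shows only the link-following logic.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# A real spider would fetch each URL over HTTP and parse links out of the HTML.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["b.example/blog", "c.example/"],
    "b.example/blog": [],
    "c.example/": ["a.example/"],
}


def crawl(start: str, links: dict[str, list[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit a page, queue its unseen links, repeat."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("a.example/", LINKS))
```

Starting from `c.example/` instead still reaches every page in this small graph, because the links connect all of them; the `max_pages` cap is a simple safeguard so the crawl always stops.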

Information that spiders gather from those pages is stored in search engine databases. The data is later retrieved and assessed for relevance when results are displayed in the Search Engine Results Pages (SERPs).
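As a rough illustration of storing spider data and later retrieving it by relevance, the sketch below builds a simple inverted index (a map from each word to the pages containing it) and ranks pages by how many query words they contain. The page texts are hypothetical, and real search engines weigh far more signals than simple word matches; this is only a simplified stand-in.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank pages by how many query words they contain (a crude relevance score)."""
    scores: dict[str, int] = defaultdict(int)
    for word in tokenize(query):
        for url in index.get(word, set()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages standing in for data gathered by a spider.
PAGES = {
    "/spiders": "How search engine spiders crawl the web",
    "/flash": "Why spiders cannot read Flash",
    "/contact": "Contact us",
}
index = build_index(PAGES)
print(search(index, "spiders crawl"))  # ['/spiders', '/flash']
```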

Lost In Translation

Part of the art of Search Engine Optimisation (SEO) is understanding the language of search engines and knowing what information we need to give the “spiders” when they visit our sites. If we build our sites in a way that prevents them from communicating with the spiders, then we have, in effect, built a “mute” website.


Spider Roadblocks

There are a number of things that could cause problems for spiders when gathering information about your website:

Who's Been Framed?

Unfortunately, frames confuse spiders, and they can seriously hinder or prevent proper indexing of a page by the search engines.
This is because the main page that the frames are displayed in generally contains no content of its own. The content is pulled in from separate pages, so spiders reading the main page cannot see it.
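The Python sketch below, using only the standard library's `html.parser`, shows what a simple spider sees when it reads a frameset page: no text of its own, just the addresses of the frames. The page and the file names `menu.html` and `content.html` are hypothetical.

```python
from html.parser import HTMLParser


class FramesetScanner(HTMLParser):
    """Records a page's own visible text and the URLs its frames load."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.frame_sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Frame content lives in separate documents named by the src attribute.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.frame_sources.append(src)

    def handle_data(self, data):
        # Keep only non-whitespace text that belongs to this page itself.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical frameset page: the layout is here, the content is elsewhere.
FRAMESET_PAGE = """
<html>
<head></head>
<frameset cols="20%,80%">
  <frame src="menu.html">
  <frame src="content.html">
</frameset>
</html>
"""

scanner = FramesetScanner()
scanner.feed(FRAMESET_PAGE)
scanner.close()
print(scanner.text)           # []
print(scanner.frame_sources)  # ['menu.html', 'content.html']
```

A spider could fetch `menu.html` and `content.html` separately, but that text belongs to those pages, not to the frameset page that visitors actually land on.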

Not so Flash!

Macromedia's Flash software is favoured for producing high-quality, graphics-based websites that are extremely appealing visually. However, it has a drawback compared with HyperText Markup Language (HTML) websites: spiders cannot currently read Flash. If you build a page entirely in Flash, the page might as well not exist as far as search engines are concerned.

That said, there is nothing wrong with using Flash for a banner or an image on a page. Although spiders cannot read the Flash element itself, as long as the page still contains normal written text, there is still data for spiders to return to the search engine database.

JavaScript Navigation

Links within JavaScript code are currently not followed by search engines. If you want to use JavaScript for navigation within your site, it's a good idea to also include traditional HTML links for the spiders to follow.
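The sketch below models a simple spider that reads HTML without running any JavaScript, as described above. It collects `href` values from `<a>` tags only, so navigation that relies on script (an `onclick` handler or a `javascript:` link) is never discovered. The navigation markup and file names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, roughly as a non-JavaScript spider would."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # A javascript: pseudo-URL is script, not a crawlable address.
            if href and not href.startswith("javascript:"):
                self.links.append(href)


# Hypothetical menu: only the last item is a traditional HTML link.
NAV = """
<nav>
  <span onclick="location.href='/products.html'">Products</span>
  <a href="javascript:go('/services.html')">Services</a>
  <a href="/about.html">About</a>
</nav>
"""

collector = LinkCollector()
collector.feed(NAV)
collector.close()
print(collector.links)  # ['/about.html']
```

Adding plain `<a href="...">` links for Products and Services would let this kind of spider find those pages too.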

Dynamic URLs

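Dynamically delivered pages that use an ID= parameter in their address are no longer being indexed by Google. So if we build dynamically driven content pages whose addresses contain ID=, Google tells us it will now ignore these Uniform Resource Locators (URLs). This is because spiders can fall into what is commonly known as a “spider trap”: URL parameters (session IDs, for example) can generate a practically endless number of addresses for the same or similar content, and a spider can keep crawling them without ever finishing. URLs containing question marks, which mark the start of a query string, can also cause problems for spiders.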
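One quick way to audit your own addresses for the patterns described above is to check each one for a query string and an `id` parameter. The function below, `crawl_risks`, is a hypothetical helper (not a search engine API) built on Python's `urllib.parse`; the example URLs are made up.

```python
from urllib.parse import parse_qs, urlsplit


def crawl_risks(url: str) -> list[str]:
    """Return a list of URL features that may cause problems for spiders."""
    parts = urlsplit(url)
    risks = []
    if parts.query:  # anything after the "?"
        risks.append("query string")
        # Parameter names, lower-cased so ID= and id= are treated alike.
        names = {name.lower() for name in parse_qs(parts.query, keep_blank_values=True)}
        if "id" in names:
            risks.append("id= parameter")
    return risks


print(crawl_risks("https://www.example.com/page.php?id=42&sort=asc"))
print(crawl_risks("https://www.example.com/products/42"))
```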

To avoid the problems that dynamic URLs can present, use workarounds such as URL rewriting (for example, Apache's mod_rewrite module), which lets a dynamic page be served from a static-looking address.
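The idea behind a rewrite is that visitors and spiders see a static-looking address, while the server maps it internally to the dynamic script. With Apache's mod_rewrite, a rule in `.htaccess` might look something like `RewriteRule ^products/([0-9]+)/?$ product.php?id=$1 [L]`. The Python sketch below reproduces only that mapping logic, for illustration; the `/products/` path and the `product.php` script are hypothetical.

```python
import re

# Hypothetical rule: the public URL /products/<number> is served internally
# by product.php?id=<number>, mirroring a RewriteRule pattern and substitution.
RULES = [
    (re.compile(r"^/products/(\d+)/?$"), r"/product.php?id=\1"),
]


def rewrite(path: str) -> str:
    """Map a clean public path to the internal dynamic URL, if a rule matches."""
    for pattern, target in RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path  # no rule matched: serve the path unchanged


print(rewrite("/products/42"))  # /product.php?id=42
```

Spiders only ever see `/products/42`, so the question mark and `id=` parameter stay out of the addresses they crawl.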

Splash Pages

Many site owners use a splash page as the main entry page to their site. These pages can be visually appealing; however, they are of no benefit to a site in terms of search engine presence. Often these pages redirect to another page on the site once the animation or Flash presentation has finished.

The purpose of an index page should be to act as the site's table of contents. The index page should not be used as a giant ad page, and a splash page is essentially a giant advertisement.

Problems presented by splash pages include:

Splash pages lack keyword-rich text

The most important text on a web page is the title tag and the visible body text, meaning text that can be copied and pasted into a text editor. A splash page contains no visible body text, except possibly “skip intro”.
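To see a page roughly the way the paragraph above describes, the sketch below pulls out the `<title>` and the visible body text from a hypothetical splash page, skipping the contents of `<script>` and `<style>` elements, which visitors never see as text. It uses Python's standard `html.parser`; the markup, file names and timed redirect are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the <title> text and the visible body text of a page."""

    HIDDEN = {"script", "style"}  # element contents that are not visible text

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.body_text: list[str] = []
        self._in_title = False
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.HIDDEN and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._hidden_depth:
            self.body_text.append(text)


# Hypothetical splash page: a Flash movie, a timed redirect and one link.
SPLASH = """
<html><head><title>Welcome</title>
<script>setTimeout(function () { location.href = 'home.html'; }, 8000);</script>
</head><body>
<object data="intro.swf" type="application/x-shockwave-flash"></object>
<a href="home.html">Skip intro</a>
</body></html>
"""

extractor = TextExtractor()
extractor.feed(SPLASH)
extractor.close()
print(extractor.title)      # Welcome
print(extractor.body_text)  # ['Skip intro']
```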

Splash pages have only one link and are rarely cross-linked

Most splash pages link to a single page within the site (usually the real home page). This structure tells search engines that only one page on your site is important.

To help your website get “spidered” regularly, we recommend creating an XML sitemap and submitting it to the search engines on a regular basis. Contact us for more details.
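For reference, an XML sitemap is a list of `<url>` entries inside a `<urlset>` element, following the sitemaps.org protocol, where each entry has a `<loc>` address and optionally a `<lastmod>` date. The minimal Python sketch below generates one with the standard library's `xml.etree.ElementTree`; the URLs and dates are hypothetical, and how you submit the file depends on the search engine.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (loc, lastmod) pairs; lastmod as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages and last-modified dates.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/about.html", "2024-01-10"),
])
print(sitemap_xml)
```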