Cristina - web design         Cristina's®       Capt. Peter - web design

  "No-Index" Tags

If you have one or several pages you don't want search engines to show,
then you need a "no-index" tag or something equivalent.
This page is part of the Website Design Handbook.

Site Goldaward - Pakistani Maritime  International Association of Webmasters and Designers
Site Gold Awards for Excellence on the Web in 2004
Classification: Maritime, Marine, and Boating

Last updated: Aug. 31, 2010

Why "No-Index" Tags
Password Protection
Robots File

Why "No-Index" Tags

There can be several reasons for not wanting a page indexed by search engines. The "no-index" tag goes in the <HEAD> section; I place it second to last, just before the <title> tag:
<meta name="robots" content="NOINDEX,NOFOLLOW">

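This works with most search engines, but not all. Some disregard it, so if you want to be sure, you have to add something more. (NOINDEX keeps the page out of the search results; NOFOLLOW also tells the spider not to follow the links on the page.)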
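Seen from the spider's side, the tag is just a comma-separated list of directives in the page's <HEAD>. Below is a minimal Python sketch, using only the standard html.parser module, of how a crawler that obeys the tag might read it. The class and function names and the sample page are hypothetical; the search engines do not publish their own parsers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive,
        # e.g. "NOINDEX,NOFOLLOW" or "noindex, noarchive".
        for part in (attributes.get("content") or "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """<HTML><HEAD>
<meta name="robots" content="NOINDEX,NOFOLLOW">
<title>My CV</title>
</HEAD><BODY>...</BODY></HTML>"""

directives = robots_directives(page)
print(sorted(directives))                         # ['nofollow', 'noindex']
print("may be indexed:", "noindex" not in directives)  # may be indexed: False
```

Because the directives are collected into a set, a combined value such as content="noindex,noarchive" is read the same way as two separate tags.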

The usual reasons for wanting some pages to stay "secret" are:
  1. The pages are your CV (résumé). In this case, the best solution is to put those pages in frames, because many search engine spiders cannot read frames. See Don Pedro's HTML Code for Frames.

  2. The pages are "printer-friendly" copies of your "text pages". It is always possible that some search engines pick up these "copies" as duplicate content and then drop or even ban your whole site. To prevent this, use encrypted reference links to your print versions. The search engines then cannot read your links and, hopefully, will not find your printer-friendly pages.

  3. Then there are some search engines that "harvest" all files in your site folder. To stop these, you need a special "robots file" telling them which pages they are not allowed to fetch. Because they harvest all files, they will also get the robots file.

  4. Or maybe you have some experimental pages you are developing or preparing for release on a certain date. You don't want anybody to see these before the time is right. The solution is obviously the same as in point 3 above.

  5. Starting in December 2007, both Google and Yahoo recognize a new value for the robots meta tag: content="noarchive", which means that the search engine will not show a cached copy of the page. This matters especially if you have changed a page a lot but kept the same file name: you don't want the old page content to show up from the search engine cache.

    You can combine this with the "noindex" tag and write content="noindex,noarchive" instead of having two robots meta tags.
Matt Cutts from Google ran an experiment (Sept. 2006) with different search engines. After finding a webpage with a "noindex" tag, he checked whether he could find that page in the major search engines. The results:
  • Google and Ask don't show it at all
  • Yahoo shows a URL reference and a cached link, which leads to the cached page, but no text snippet
  • MSN shows a URL reference but no text snippet

Password Protection

Because spiders cannot enter passwords, you can use this technique instead of an encrypted reference link. First go to: about.com/, where you get a link to the JavaScript to use. You need to make some changes to the script first; the about.com webpage explains them. It doesn't matter if you give the password to your visitors directly: they can enter it, while the search engine spiders will skip the whole script. The password script is shorter than the encrypted reference link code. Keep in mind that a JavaScript password only keeps out spiders and casual visitors, since anyone who reads the page source can usually find the password or the protected address.

Because this is JavaScript, you may run into problems if you use it on a free server. The script can collide with the host's banner script, and if you are not allowed to add any script to the <HEAD> section, your script may not work at all. Anyway, I prefer the encrypted reference link, as it is easier for visitors to use.

Robots File

Create a new file and upload it to your server together with all your other page files; unlike the pages above, this is a file you want the spiders to read. See Don Pedro's Search Engine Marketing. Upload the file to the root of your site, i.e.:
http://www.example.com/robots.txt

Write the file as follows:

User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Disallow: /searchPrint.html
Disallow: /promoPrint.html
# ...and one Disallow line for each further file you want to keep out

The asterisk (*) indicates that the rules concern all search engines. "searchPrint.html" and "promoPrint.html" are given here as examples only. Each path starts with a slash (/), counted from the root of your site, and the Sitemap line takes the full URL of your sitemap. After "User-agent:" and after "Disallow:" you leave one space. The name of this file is: robots.txt.

It is not certain which search engines follow which rules because, given the potential for misuse, the search engine operators keep much of this secret. Note: leave no blank space at the start of any line, as that could render the robots file meaningless to some spiders.
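A quick way to check that a robots file does what you intend is Python's standard urllib.robotparser module. The sketch below feeds it the example file, written with each path starting with "/", as the current standard (RFC 9309) requires. www.example.com and the page names are placeholders, and Python's parser is only one implementation; the big search engines may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt, with every path starting at the site root ("/").
# www.example.com and the two print pages are placeholders.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Disallow: /searchPrint.html
Disallow: /promoPrint.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes the file as a list of lines

for page in ("/searchPrint.html", "/promoPrint.html", "/no-index.html"):
    url = "http://www.example.com" + page
    # Asking as "*" falls through to the "User-agent: *" group.
    print(page, "allowed:", parser.can_fetch("*", url))

# The Sitemap line is collected separately from the Disallow rules (Python 3.8+).
print(parser.site_maps())
```

Running it prints "allowed: False" for both print pages, "allowed: True" for /no-index.html, and then ['http://www.example.com/sitemap.xml']. To test a live file instead, point the parser at it with set_url() and call read().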

You should include your frame files too. Even if search engines do not read frames, you need protection against the spiders that harvest all your files.

You must place the robots.txt file at the top level (root) of your server, like this:
http://www.d-pwebdesign.com/robots.txt

To be sure the important search engines find your robots.txt file, I suggest you include it in your sitemap.xml, if you use one.

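Since April 2007, you can include the location of your XML sitemap in the robots file, for example immediately after the "User-agent" line (as above). The Sitemap line does not belong to any User-agent group, so it can in fact stand anywhere in the file. This helps ensure that the search engines find your sitemap, as they are supposed to always read the robots file.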

In June 2008, Microsoft, Yahoo, and Google agreed on more detailed documentation of how they implement the robots exclusion rules; details are, for instance, in the Live Search Blog. For example, using the "noarchive" tag does not mean the search engine won't cache the page. It only means that the cached copy of your page will not be shown to any user.







As this file is not meant to be read by a browser, you don't need to include the DOCTYPE tag; robots.txt is a plain text file with no HTML in it at all.

There is an example of a robots.txt file by Abakus Internet Marketing in English and another one in German. The example file includes a long list of unwanted spiders that are blocked from accessing any file at all. For information on "User-agent" names, go to robotstxt.org.
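A blocklist like the Abakus example works by giving each unwanted spider its own group with "Disallow: /", which covers every file, while all other spiders fall back to the "User-agent: *" group. The sketch below checks such a file with Python's urllib.robotparser. "BadBot" and "EmailHarvester" are hypothetical names, not taken from the Abakus list, and only well-behaved spiders obey robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical spider names; a real list would use the user-agent
# names of the crawlers you actually want to keep out.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: EmailHarvester
Disallow: /

User-agent: *
Disallow: /searchPrint.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home = "http://www.example.com/index.html"
print(parser.can_fetch("BadBot", home))     # False: "Disallow: /" blocks every file
print(parser.can_fetch("Googlebot", home))  # True: falls back to the * group
```

Each blocked spider needs its own "User-agent:" line followed by "Disallow: /"; a blank line separates one group from the next.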

See also "Robots Organization" (the Robots' site index) for more details on spiders and crawlers.

© by Cristina and Peter Forsberg.
You are allowed to print out the text for your personal needs.
You are also allowed to copy and distribute the printout for educational purposes when free of charge,
as long as you give the source: www.donpedrowebdesign.com/no-index.html.

