Why "No-Index" Tags

There can be several reasons for not wanting a page indexed by search engines, that is, for wanting some pages to stay "secret". The "no-index" tag goes in the <HEAD> section, second to last, just before the <title> tag. This works with most search engines, but not with all: some disregard it, so if you want to be sure, you have to add something more.
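The Python sketch below shows how a compliant spider might check a page for this instruction before indexing it. It assumes the common form of the tag, `<meta name="robots" content="noindex">`, and uses the standard `html.parser` module. The class and the `has_noindex` helper are hypothetical names for illustration, not part of any search engine's code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # "noindex, nofollow" -> {"noindex", "nofollow"}
            for directive in content.split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Hypothetical helper: True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


page = """<html><head>
<meta name="robots" content="noindex">
<title>Print version</title>
</head><body>...</body></html>"""

print(has_noindex(page))  # True
```

Because some search engines ignore the tag, a check like this only describes what well-behaved robots do; it does not stop the others.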

Password Protection

Because spiders cannot enter passwords, you can use this technique instead of an encrypted reference link. First go to about.com/, where you get a link to the JavaScript to use. You need to make some changes to the script first; the about.com page tells you which. It does not matter if you give the password to your visitors directly: they can enter it, while the search engine spiders skip the whole script. The password script is also shorter than the encrypted reference link code.

Because this is JavaScript, you may run into problems if you use it on a free server. The script can collide with the host's banner script, and if you are not allowed to add any script to the <HEAD> section, your script may not work at all. In any case, I prefer the encrypted reference link, as it is easier for visitors to use.
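To illustrate why the spiders skip the script, here is a minimal Python sketch of a naive spider's link extraction. It assumes a spider that only follows the `href` targets of `<a>` tags and never executes JavaScript; modern search engines can run scripts, so this illustrates the idea rather than any particular crawler. The page names and the inline password script are hypothetical and are not the about.com script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Naive spider: keeps the href of each <a> tag and never runs scripts."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # A "javascript:" link only works in a browser that runs the script,
        # so a spider that does not execute JavaScript has nothing to follow.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(href)

    # Text inside <script> is delivered to handle_data(), which this class
    # does not override, so URLs that appear only in the script are ignored.


# Hypothetical page: the members' area is reachable only through the script.
page = """<html><head>
<script>
function enter() {
  var pw = prompt("Password:");
  if (pw == "secret") { location.href = "members.html"; }
}
</script>
<title>Welcome</title>
</head><body>
<a href="index.html">Home</a>
<a href="javascript:enter()">Members</a>
</body></html>"""

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['index.html']
```

The sketch also shows the limits of the approach: the password sits in the page source in plain view, so a check like this keeps spiders out rather than determined visitors.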
Robots File

Create a new file and upload it to your server together with all your other page files; this is a file you do want the spiders to read. See Don Pedro's Search Engine Marketing. Upload the file to the root directory of your site. The file consists of a "User-agent:" line followed by one "Disallow:" line for each file you want to keep out of the search engines. The asterisk (*) after "User-agent:" means the rules apply to all search engines; "searchPrint.html" and "promoPrint.html" are given here only as examples of files to disallow. Leave one space after "User-agent:" and after "Disallow:". The file must be named robots.txt. It is not entirely certain which search engines follow which rules, because the search engine operators keep much of this secret to prevent misuse. Note: do not put any spaces to the left of a line, as that could render the robots file meaningless.
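The rules described above can be tried out with Python's standard `urllib.robotparser` module, which answers the question a compliant spider asks: may this user agent fetch this URL? This is a sketch under stated assumptions: the file disallows the two example pages for all user agents, each path starts with a leading slash, and the domain `www.example.com` and the crawler name `AnyBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one "User-agent:" line, then one "Disallow:" line
# per file to keep out of the search engines, with one space after each colon.
ROBOTS_TXT = """\
User-agent: *
Disallow: /searchPrint.html
Disallow: /promoPrint.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.example.com/searchPrint.html",
    "http://www.example.com/promoPrint.html",
    "http://www.example.com/index.html",
):
    allowed = parser.can_fetch("AnyBot", url)
    print(url, "allowed" if allowed else "disallowed")
```

A "Disallow:" value is matched as a path prefix, so the two print pages are reported as disallowed while every other page remains allowed.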
You should list your frame files too. Even if the search engines do not read frames, you need protection against spiders that harvest all your files. The robots.txt file must be placed at the first level on your server, directly under your domain name (/robots.txt, not inside a subdirectory). To be sure the important search engines find your robots.txt file, I suggest you include it in your sitemap.xml, if you use one. Since April 2007 you can also give the location of your XML sitemap in the robots file itself, with a "Sitemap:" line immediately after the "User-agent:" line. This helps ensure the search engines find the sitemap, as they are supposed to always read the robots file. In June 2008, Microsoft, Yahoo, and Google agreed on more detailed, common documentation of how they implement the robots.txt file; details can be found, for instance, in the Live Search Blog. For example, using the "noarchive" tag does not mean the search engine won't cache the page; it only means that the cached copy of your page will not be shown to users.
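A sketch of a robots file that combines these points, again checked with Python's `urllib.robotparser`. It assumes the "Sitemap:" directive from the sitemaps.org protocol, placed right after the "User-agent:" line as described above (the protocol treats the directive as independent of the user-agent line, so a compliant parser accepts it anywhere). `RobotFileParser.site_maps()` requires Python 3.8 or later. The frame file names `frameLeft.html` and `frameMain.html` and the domain are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: sitemap location right after the "User-agent:"
# line, then the example pages and the frame files that should not be crawled.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Disallow: /searchPrint.html
Disallow: /promoPrint.html
Disallow: /frameLeft.html
Disallow: /frameMain.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.site_maps())  # ['http://www.example.com/sitemap.xml']
print(parser.can_fetch("AnyBot", "http://www.example.com/frameMain.html"))  # False
print(parser.can_fetch("AnyBot", "http://www.example.com/index.html"))  # True
```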
Because this file is not meant to be read by a browser, you do not need to include a DOCTYPE declaration. Abakus Internet Marketing provides an example robots.txt file in English and another one in German. The example file includes a long list of undesired spiders, which are blocked from accessing any file at all. For information on the "User-agent" line, see robotstxt.org. See also "Robots Organization" for more details on spiders and crawlers. (Robots' site index)
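A list of undesired spiders like the one in the Abakus example typically works by giving each spider its own "User-agent:" group containing "Disallow: /", which blocks every file, while all other robots fall through to the "*" group. The minimal sketch below checks this with Python's `urllib.robotparser`; the spider names `BadBot` and `EmailHarvester` are hypothetical and are not taken from the Abakus file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two unwanted spiders are shut out completely,
# every other robot is only kept away from one example page.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: EmailHarvester
Disallow: /

User-agent: *
Disallow: /searchPrint.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home = "http://www.example.com/index.html"
for agent in ("BadBot", "EmailHarvester", "FriendlyBot"):
    # "Disallow: /" matches every path, so the listed spiders get False.
    print(agent, parser.can_fetch(agent, home))
```

The blank line between groups matters: it ends one spider's group before the next "User-agent:" line starts a new one. As with all robots.txt rules, this only stops spiders that choose to obey the file.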
|
|
Since June 04, 2004, according to www.digits.com/ |
|
|