Robots.txt: What Is It All About?

A robots.txt file is used by websites to give instructions about the site to web robots; this is called the Robots Exclusion Protocol.

How does this thing work?

It starts when a robot wants to visit a website URL, such as http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The “User-agent: *” means this section applies to all robots. The “Disallow: /” tells the robot that it should not visit any pages on that website.
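As a rough illustration of that check, the sketch below uses Python's standard urllib.robotparser module to read the two rules just described and ask whether a robot may fetch the welcome page. The robot name "ExampleBot" and the helper may_fetch are made up for this example, and the rules are supplied as text rather than downloaded from a server.

```python
from urllib.robotparser import RobotFileParser

# The two rules described above, supplied as text instead of being fetched.
RULES = """\
User-agent: *
Disallow: /
"""


def may_fetch(robot_name, url, rules_text=RULES):
    """Return True if the given rules allow robot_name to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(robot_name, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical robot name.
    print(may_fetch("ExampleBot", "http://www.example.com/welcome.html"))  # False
```

A well-behaved robot would run a check like this before every request and skip any URL for which the answer is False.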

There are two important considerations when using /robots.txt:

1. Robots can ignore your /robots.txt.

Malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, are especially likely to pay no attention to it.

2. The /robots.txt file is publicly available.

This means anyone can see which sections of your server you don’t want robots to visit. So don’t try to use /robots.txt to hide information.

How to create a /robots.txt file and where to place it

You can place the file in the top-level directory of your web server. When a robot looks for the “/robots.txt” file for a URL, it strips the path component from the URL (everything from the first single slash) and puts “/robots.txt” in its place.

For example, for “http://www.example.com/sell/index.html”, it will remove “/sell/index.html”, replace it with “/robots.txt”, and end up with “http://www.example.com/robots.txt”.
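The path-stripping step can be sketched in a few lines of Python with the standard urllib.parse module. The function name robots_url is only an illustrative choice, and the sketch assumes the page URL is absolute (it has a scheme and a host).

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the path, query and fragment,
    # and put "/robots.txt" in place of the path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/sell/index.html"))
    # http://www.example.com/robots.txt
```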

So, as a website owner, you need to put the file in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your website’s main “index.html” welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

One more thing to remember: use all lower case for the filename, “robots.txt”, not “Robots.TXT”.

What should I include in the file?

The “/robots.txt” file is a plain text file containing one or more records. Usually it contains a single record, as in the image below.

[Image: a robots.txt record with three Disallow lines]

Here, three directories are excluded. You need a separate “Disallow” line for every URL prefix you want to exclude; you cannot say “Disallow: /cgi-bin/ /tmp/” on a single line. You also may not have blank lines within a record, as they are used to delimit multiple records.
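To see why each prefix needs its own line, here is a small sketch that feeds two rule sets to Python's standard urllib.robotparser module. The third directory “/private/”, the robot name "ExampleBot" and the helper allowed are assumptions made for this example, and the result for the one-line version reflects how this particular parser reads it; other robots may behave differently.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules_text, url, robot_name="ExampleBot"):
    """Return True if rules_text lets robot_name fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(robot_name, url)


# One Disallow line per prefix. "/private/" is a hypothetical directory.
SEPARATE_LINES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
"""

# Two prefixes squeezed onto one line: read as one odd prefix, not two.
ONE_LINE = """\
User-agent: *
Disallow: /cgi-bin/ /tmp/
"""

if __name__ == "__main__":
    page = "http://www.example.com/tmp/cache.html"
    print(allowed(SEPARATE_LINES, page))  # False: /tmp/ is excluded
    print(allowed(ONE_LINE, page))        # True: the combined line does not match
    print(allowed(SEPARATE_LINES, "http://www.example.com/index.html"))  # True
```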

Note:

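Under the original standard, globbing and regular expressions are not supported in either the User-agent or Disallow lines. The ‘*’ in the User-agent field is a special value meaning “any robot”. Specifically, you cannot rely on lines like “User-agent: *bot*”, “Disallow: /tmp/*” or “Disallow: *.gif”. Some major crawlers, such as Googlebot, accept wildcard extensions in paths, but not every robot understands them.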

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve.

Here are some examples to follow…

[Images: example robots.txt files]
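Two patterns commonly seen in robots.txt files can be sketched as follows, again using Python's standard urllib.robotparser module to check the outcome. The robot names "BadBot" and "GoodBot" and the helper allowed are hypothetical and only serve the illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules_text, robot_name, url):
    """Return True if rules_text lets robot_name fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(robot_name, url)


# An empty Disallow value excludes nothing, so all robots may visit everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Two records separated by a blank line: one robot is shut out, the rest are not.
BLOCK_ONE_ROBOT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    page = "http://www.example.com/welcome.html"
    print(allowed(ALLOW_ALL, "GoodBot", page))        # True
    print(allowed(BLOCK_ONE_ROBOT, "BadBot", page))   # False
    print(allowed(BLOCK_ONE_ROBOT, "GoodBot", page))  # True
```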


I hope this was of great help. I dearly thank all those who contributed to this important topic.

Sincerely yours

Sam Ammouri

