Website owners use the /robots.txt file to give instructions about their site to web robots; this is known as the Robots Exclusion Protocol.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt.
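For example, the robot might find a file like this (the classic example from the Robots Exclusion Protocol documentation, in which all robots are asked to stay away from the entire site):

```text
User-agent: *
Disallow: /
```

Here "User-agent: *" means the record applies to all robots, and "Disallow: /" tells them not to visit any pages on the site.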
A robots.txt file lives at the root of your site. So, for the site www.example.com, the robots.txt file lives at www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. A robots.txt file consists of one or more rules. Each rule blocks (or allows) access for a given crawler to a specified file path on that site.
Here is a basic robots.txt file with two rules:
- First rule:
  User-agent: Googlebot
- Second rule:
  User-agent: *
There are two important considerations when using /robots.txt:
robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay it no attention.
the /robots.txt file is a publicly available file. Anyone can see which sections of your server you don't want robots to use.
So don't attempt to use /robots.txt to hide information.
- Can I block just bad robots?
- Why did this robot ignore my /robots.txt?
- What are the security implications of /robots.txt?
How to create a /robots.txt file
Where to put it – in the top-level directory of your web server.
- What program should I use to create /robots.txt?
- How do I use /robots.txt on a virtual host?
- How do I use /robots.txt on a shared host?
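The placement rule above can be checked locally before deploying. The sketch below (directory names are hypothetical) writes a robots.txt into a document root, so that a web server serving that root would expose it at exactly /robots.txt:

```python
from pathlib import Path

# Sketch: robots.txt must sit in the top-level (document root) directory of
# the web server, so that it is served at exactly /robots.txt.
# The directory name "site" is a hypothetical document root.
docroot = Path("site")
docroot.mkdir(exist_ok=True)

robots = docroot / "robots.txt"
robots.write_text("User-agent: *\nDisallow: /tmp/\n")

# Served with e.g. `python -m http.server --directory site`, this file
# would then be reachable as http://<host>:8000/robots.txt.
print(robots.read_text())
```

Placing the file anywhere else (say, in a subdirectory) will not work, because robots only ever request the fixed path /robots.txt.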
What to put in it
The "/robots.txt" file is a text file, with one or more records. It usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.
Note that you need a separate "Disallow:" line for every URL prefix you want to exclude – you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines inside a record, as blank lines are used to delimit multiple records.
Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
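A record like the one above can be exercised with Python's standard-library robots.txt parser, urllib.robotparser. This is just a sketch to illustrate that each Disallow line excludes exactly one URL prefix, and that the rules apply to every robot because the User-agent is '*':

```python
from urllib.robotparser import RobotFileParser

# The example record: one User-agent line, then one Disallow line
# per URL prefix to exclude.
record = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

rp = RobotFileParser()
rp.parse(record.splitlines())

# URLs under an excluded prefix are blocked for any robot name...
print(rp.can_fetch("AnyBot", "http://www.example.com/cgi-bin/script"))  # False
print(rp.can_fetch("AnyBot", "http://www.example.com/tmp/scratch"))     # False
# ...while paths outside the listed prefixes remain fetchable.
print(rp.can_fetch("AnyBot", "http://www.example.com/welcome.html"))    # True
```

In real use you would call `rp.set_url("http://www.example.com/robots.txt")` followed by `rp.read()` instead of `parse()`, so the parser fetches the live file itself.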