Here we are talking about a very small part of a website, but it is one of the most important for your social or online presence.
Every search engine has its own crawlers, also called search engine robots or bots. A crawler is an automated script that fetches data from your website so the search engine can show it to users. When users search for information you have published on your website, they find your website's name in the search results along with your published content.
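At its core, a crawler repeats one loop: download a page, pull out the links it contains, and queue them to visit next. The sketch below shows only the link-extraction step, using Python's standard `html.parser`. The class and function names are hypothetical, and a real search engine crawler does much more (scheduling, deduplication, obeying robots.txt).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links like "/about/" against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Hypothetical helper: return every link a crawler would follow next."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<a href="/about/">About</a> <a href="https://other.example/">Elsewhere</a>'
print(extract_links(page, "https://example.com/blog/"))
```

Note that the script follows every link it finds; nothing in it knows which pages you consider private.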
Now comes the most important part: deciding which parts of your site you want users to see when they search the internet.
A crawler first fetches your robots.txt file, and then it crawls your sitemap. Your sitemap lists all the links on your website that you have decided to have indexed by search engines.
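The two steps above (read robots.txt, then read the sitemap it points to) can be sketched with Python's standard library. This is a minimal illustration, assuming a hypothetical site at example.com with inline file contents instead of real HTTP downloads; `RobotFileParser.site_maps()` requires Python 3.8 or later, and the sketch only handles a plain `<urlset>` sitemap, not a sitemap index.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical files for example.com; a real crawler would download both.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://example.com/sitemap.xml
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/hello-world/</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls_from_robots(robots_txt: str) -> List[str]:
    """Step 1: parse robots.txt and collect its Sitemap: entries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None if there are none


def page_urls_from_sitemap(sitemap_xml: str) -> List[str]:
    """Step 2: parse a <urlset> sitemap and collect every <loc> URL."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


print(sitemap_urls_from_robots(ROBOTS_TXT))
print(page_urls_from_sitemap(SITEMAP_XML))
```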
But what does a crawler do with the other sensitive directories and links that are available on your server?
A crawler is just a script; it doesn't know which of your data is sensitive and which is not. It simply does the work it was designed to do.
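You need to block your important directories and links so that crawlers don't crawl that data. Keep in mind that robots.txt is only a request: well-behaved crawlers such as Googlebot follow it, but it does not protect anything on its own, so truly private data still needs a password or other server-side access control.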
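From the crawler's side, "blocking" means a polite crawler asks robots.txt before every request. Here is a minimal sketch using Python's `urllib.robotparser`, assuming a hypothetical robots.txt and a hypothetical bot name `ExampleBot`. Be aware that this parser uses simple prefix matching with the first matching rule winning, so it does not understand Google-style wildcards such as `/*?*`.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks two WordPress-style directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, user_agent: str = "ExampleBot") -> bool:
    """Ask the parsed robots.txt whether `user_agent` may fetch `url`."""
    return parser.can_fetch(user_agent, url)


for url in ["https://example.com/hello-world/",
            "https://example.com/wp-admin/options.php"]:
    print(url, "->", "crawl" if may_crawl(url) else "skip")
```

A crawler that skips this check can still fetch `/wp-admin/`, which is why robots.txt is not a substitute for access control.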
Here is an example robots.txt that stops crawlers from crawling important WordPress directories, along with other data or scrap pages that you don't want published on the internet.
```
# The Disallow paths below are important to block from search engines
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Disallow: /*?replytocom
Disallow: /comments/feed/
Disallow: /readme.html
Disallow: /xmlrpc.php
# If you don't load any CSS or JS from wp-content/plugins, you can also include:
Disallow: /wp-content/plugins/

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: http://www.kevinmuldoon.com/sitemap.xml
Sitemap: http://www.kevinmuldoon.com/post-sitemap.xml
Sitemap: http://www.kevinmuldoon.com/page-sitemap.xml
Sitemap: http://www.kevinmuldoon.com/category-sitemap.xml
Sitemap: http://www.kevinmuldoon.com/author-sitemap.xml
```
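Rules like `/*?*` rely on wildcard matching as Google documents it: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and when several rules match, the longest (most specific) one wins, with Allow winning a tie. Python's `urllib.robotparser` does not implement this, so the sketch below is a hand-rolled approximation for checking the `User-agent: *` group of the example. The names `pattern_to_regex`, `is_allowed`, and `RULES` are hypothetical, and percent-encoding normalization is ignored.

```python
import re

# Disallow rules from the "User-agent: *" group of the example robots.txt.
RULES = [
    ("disallow", "/cgi-bin/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/archives/"),
    ("disallow", "/*?*"),
    ("disallow", "/*?replytocom"),
    ("disallow", "/comments/feed/"),
    ("disallow", "/readme.html"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/wp-content/plugins/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Return True if `path` (path plus query string) may be crawled.

    The longest matching pattern wins; on a tie, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


for path in ["/hello-world/", "/wp-admin/options.php", "/hello-world/?replytocom=12"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

With these rules, `/hello-world/` is allowed while `/wp-admin/options.php` and `/hello-world/?replytocom=12` are blocked. Note that `/*?*` already blocks every URL with a query string, so it also covers the `replytocom` rule.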