Bots, spiders, and crawlers are all nicknames for the little programs that constantly scour the Internet, indexing every file they find. You can see which bots have visited your website in your site’s stats/logs.
For example, Googlebot is Google’s indexing bot. If you do not see Googlebot in your stats, Google is not visiting your site.
These bots are how search engines become aware of your site and any new pages, so your site can be ranked accordingly. Understanding how they work matters to every site owner, because you can control how often they visit (in some cases throttling their visits so they do not eat up your site’s resources) as well as what they index.
If you don’t want these critters indexing certain areas of your site (images, or member or private directories, for example), it is important to set up a robots.txt file. These little programs, at least the well-behaved ones, will check for that file before they proceed, in case you have designated areas of your site/server that you do not want indexed.
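A robots.txt file is just a plain-text file at the root of your site. As a sketch (the directory names here are made up; swap in your own), blocking images and a couple of private areas for all bots might look like:

```
User-agent: *
Disallow: /images/
Disallow: /members/
Disallow: /private/
```

The `*` means the rules apply to every bot, and each `Disallow` line names a path you would rather not see indexed.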
Well-behaved ones? Yep, some bots simply ignore your robots.txt requests, and there is not much you can do about that. In general, though, most bots do honor that file, which matters if crawlers are causing resource or memory problems on your server. Hence the need to control access and frequency.
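If you are curious what a well-behaved bot actually does, Python’s standard library ships a robots.txt parser that performs the same check. This sketch uses a made-up set of rules (the `example.com` URLs are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt blocking one private directory for all bots
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite bot runs exactly this check before fetching a page
print(parser.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/blog/post.html"))     # True
```

Bots that skip this check are the “badly behaved” ones you cannot do much about.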
Twitterbot has been a problem on some of my sites recently, forcing me to block and/or throttle it. Google does let you change Googlebot’s visit frequency in Webmaster Tools, but cautions that you should only do so if you know for sure Googlebot is causing bandwidth issues.
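Blocking a single misbehaving bot in robots.txt comes down to giving it its own User-agent section. A sketch (note that Crawl-delay is a non-standard directive: some bots honor it, but Googlebot does not, which is why Google handles frequency in Webmaster Tools instead):

```
User-agent: Twitterbot
Disallow: /

User-agent: *
Crawl-delay: 10
```

This shuts Twitterbot out entirely while asking everyone else to wait ten seconds between requests, if they support the directive.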
Read up on what web crawlers, web robots, bots, crawlers, and spiders are all about @ http://en.wikipedia.org/wiki/Web_crawler. Then use that information to create your robots.txt file, control what information on your site is indexed, and preserve resources if need be.
At your service,