The Vortex Web Indexing Robot
Note: Vortex ceased operation in 2006; it no longer crawls the web.
It seems that some other bot on the internet is using the same name, but it is completely unrelated to this project.
Vortex is a friendly and considerate web crawling robot that is part of a study I am conducting on internet link distribution. It acts very respectfully towards web servers, and unless you carefully analyze your logfiles you'll never know it was there.
Frequently Asked Questions
What does Vortex do?
Vortex crawls the web, scans the documents it finds, and evaluates their content against research criteria.
How often will Vortex access my site?
Short answer: Very rarely. You'll probably never know it was there unless you very carefully study your logfiles (pat yourself on the back if you do!). Long answer: Vortex uses an internal loading algorithm to spread requests to the same domain over a period of time, usually no more than one hit per minute.
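The spreading of requests described above could be sketched as a small per-domain scheduler. This is not Vortex's actual loading algorithm, which is not documented here; the class name `DomainThrottle`, the fixed 60-second interval, and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlparse


class DomainThrottle:
    """Hypothetical scheduler: at most one request per domain per interval."""

    def __init__(self, min_interval=60.0):
        # Assumed interval, taken from "no more than one hit per minute".
        self.min_interval = min_interval
        self.next_allowed = {}  # domain -> earliest time of the next request

    def schedule(self, url, now):
        """Return the time at which `url` may be fetched and reserve that slot."""
        domain = urlparse(url).netloc.lower()
        fetch_at = max(now, self.next_allowed.get(domain, now))
        self.next_allowed[domain] = fetch_at + self.min_interval
        return fetch_at


throttle = DomainThrottle()
first = throttle.schedule("http://example.com/a.html", now=0.0)
second = throttle.schedule("http://example.com/b.html", now=1.0)
other = throttle.schedule("http://example.org/", now=1.0)
```

In this sketch the second request to example.com is pushed back to the 60-second mark, while the request to example.org is not delayed, because each domain is tracked separately.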
How do I stop Vortex from crawling parts or all of my site?
Simply add a robots.txt file to the root of your website. The robots.txt file is a simple set of instructions that tells crawlers which parts of a website (if any) they are allowed to crawl. You can find more information about the robots.txt exclusion standard on this website. Remember that changes to your server's robots.txt file won't take effect immediately; they will be picked up the next time your site is crawled.
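As a rough illustration, the rules in a robots.txt file can be checked with the parser in Python's standard library before you publish them. The file contents and URLs below are made up, and the sketch assumes a crawler that matches the `vortex` user agent token the way `urllib.robotparser` does; it says nothing about how Vortex's own parser is written.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the crawler named "vortex" out of /private/,
# and keep every other crawler out of /tmp/.
ROBOTS_TXT = """\
User-agent: vortex
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether the "vortex" user agent may fetch two example URLs.
blocked = parser.can_fetch("vortex", "http://example.com/private/report.html")
allowed = parser.can_fetch("vortex", "http://example.com/index.html")
```

Here `blocked` is `False` and `allowed` is `True`: the page under /private/ is denied to the crawler, and everything else remains open to it.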
Why isn't Vortex obeying my robots.txt file?
To save bandwidth, Vortex downloads your robots.txt file at most once per day. If you have recently changed your robots.txt file, it may take a little while before the crawler gets a new copy.
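The once-per-day behaviour can be pictured as a cache with a 24-hour lifetime. The sketch below is an assumption about how such a cache might look, not Vortex's code; `RobotsCache`, `fake_fetch`, and the fake clock are hypothetical names used so the example runs without touching the network.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds; assumed from "at most once per day"


class RobotsCache:
    """Hypothetical per-host cache that refetches robots.txt after a TTL."""

    def __init__(self, fetch, ttl=ONE_DAY, clock=time.time):
        self.fetch = fetch      # callable: host -> robots.txt text
        self.ttl = ttl
        self.clock = clock
        self.entries = {}       # host -> (time fetched, robots.txt text)

    def get(self, host):
        now = self.clock()
        cached = self.entries.get(host)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]    # still fresh, so it may predate recent edits
        text = self.fetch(host)
        self.entries[host] = (now, text)
        return text


# Demonstration with a fake clock and a fake fetcher instead of the network.
current_time = [0.0]
downloads = []


def fake_fetch(host):
    downloads.append(host)
    return "User-agent: *\nDisallow:"


cache = RobotsCache(fake_fetch, clock=lambda: current_time[0])
cache.get("example.com")          # first request: downloads the file
current_time[0] = 3600.0
cache.get("example.com")          # one hour later: served from the cache
current_time[0] = ONE_DAY + 1.0
cache.get("example.com")          # more than a day later: downloaded again
```

Only the first and third calls reach the fetcher, which is why a rule added shortly after a crawl may go unnoticed for up to a day.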
Another possibility is that your robots.txt file is incorrect or malformed. I recommend that you verify your robots.txt file against the standard at http://www.robotstxt.org/wc/exclusion.html#robotstxt. Finally, make sure that your robots.txt file is in the top-level directory of your domain; if the robots.txt file is in a subdirectory, it won't have any effect.
For more information, please see the Robots FAQ.
Update 02-Jan-05: I have identified a small bug in the robots.txt parser that caused a few rules to be ignored. I'm fairly certain that this issue is fixed now and that all future crawls will fully obey the robots.txt rules. If the crawler has downloaded a link on your website that should have been denied, please let me know as soon as possible so that I can look into the problem and post-filter the data set to remove any URLs that weren't properly caught by the robots.txt parser.
Can you tell me the IP addresses from which Vortex crawls so that I can filter my logs?
The IP addresses used by Vortex change often. The best way to identify Vortex is by the user agent (vortex).
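Filtering by user agent might look like the following sketch. It assumes your server writes the common Combined Log Format, where the user agent is the last double-quoted field, and it matches the token `vortex` case-insensitively; the exact full user agent string is not given here, and the sample log lines are made up rather than taken from a real server.

```python
import re

# In Combined Log Format the user agent is the last double-quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_vortex(log_line):
    """True if the line's user agent field contains "vortex" (case-insensitive)."""
    match = USER_AGENT_FIELD.search(log_line)
    return bool(match) and "vortex" in match.group(1).lower()


# Made-up sample lines, not taken from a real server log.
sample = [
    '192.0.2.1 - - [02/Jan/2005:10:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "vortex"',
    '192.0.2.7 - - [02/Jan/2005:10:00:05 +0000] "GET /vortex.html HTTP/1.0" 200 99 "-" "Mozilla/4.0"',
]
without_vortex = [line for line in sample if not is_vortex(line)]
```

Matching on the user agent field rather than the whole line keeps ordinary visitors who request a page such as /vortex.html from being filtered out by mistake.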
Can I submit my site for indexing?
No. The only way to be indexed is to be linked to from other websites; Vortex will eventually follow one of those links to your site.
I have a question that isn't answered here.
Send me an email! I look forward to hearing from anyone who has any questions or comments about Vortex.