
Vortex

The Vortex Web Indexing Robot

Note: Vortex ceased operation in 2006; it no longer crawls the web.

It seems that some other bot on the internet is using the same name, but it is completely unrelated to this project.

Vortex is a friendly and considerate web crawling robot that is part of a study I am conducting on internet link distribution. It acts very respectfully towards web servers, and unless you carefully analyze your logfiles you'll never know it was there.

Frequently Asked Questions

What does Vortex do?

Vortex crawls the web, scanning documents and evaluating their content against research criteria.

How often will Vortex access my site?

Short answer: Very rarely. You'll probably never know it was there unless you study your logfiles very carefully (pat yourself on the back if you do!). Long answer: Vortex uses an internal loading algorithm to spread requests to the same domain over a period of time, so a single domain usually sees no more than one hit per minute.
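
As a rough illustration of how a crawler can space out requests to one domain, here is a minimal Python sketch. It is not Vortex's actual loading algorithm, which isn't described here; the class name, the fixed 60-second interval, and the per-hostname bookkeeping are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumed interval, taken from "no more than one hit per minute".
MIN_INTERVAL = 60.0


class PoliteScheduler:
    """Hypothetical helper: tracks when each host may next be requested."""

    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self.next_allowed = {}  # hostname -> earliest permitted fetch time

    def schedule(self, url, now):
        """Return the time at which `url` may be fetched and reserve that slot."""
        host = urlsplit(url).hostname
        fetch_at = max(now, self.next_allowed.get(host, now))
        # The next request to this host must wait a full interval.
        self.next_allowed[host] = fetch_at + self.min_interval
        return fetch_at
```

With this sketch, two URLs on example.com queued at time 0 are scheduled for 0 and 60 seconds, while a URL on a different host queued at the same moment is scheduled immediately.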

How do I stop Vortex from crawling parts or all of my site?

Simply add a robots.txt file to the root of your website. The robots.txt file is a simple set of instructions that tells crawlers which parts of a website (if any) they are allowed to crawl. You can find more information about the robots.txt exclusion standard on this website. Remember that changes to your server's robots.txt file won't take effect immediately; they will be reflected the next time your site is crawled.
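
To see what such a file looks like and how a crawler would read it, here is a small Python sketch using the standard library's urllib.robotparser. The rules, the /private/ path, and the example.com host are made up for illustration; "vortex" is the user agent name given further down this page. This is not Vortex's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep vortex out of /private/, allow everyone else.
# To block vortex from the whole site, use "Disallow: /" instead.
ROBOTS_TXT = """\
User-agent: vortex
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="vortex"):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "http://example.com" + path)
```

Here allowed("/private/report.html") is False and allowed("/index.html") is True, while a crawler with a different name is not affected by the vortex rule.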

Why isn't Vortex obeying my robots.txt file?

To save bandwidth, Vortex downloads your robots.txt file at most once per day. If you have recently changed your robots.txt file, it may take a little while before the crawler fetches a new copy.
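
The once-per-day behaviour amounts to a cache with a 24-hour lifetime. The Python sketch below shows the idea and why a fresh edit can go unnoticed for up to a day. The class, the fetch callback, and the injectable clock are assumptions for the example, not a description of Vortex's internals.

```python
import time

ROBOTS_TTL = 24 * 60 * 60  # seconds; assumed from "at most once per day"


class RobotsCache:
    """Hypothetical cache that re-fetches a host's robots.txt once per TTL."""

    def __init__(self, fetch, ttl=ROBOTS_TTL, clock=time.time):
        self.fetch = fetch    # callable: hostname -> robots.txt text
        self.ttl = ttl
        self.clock = clock    # injectable so the cache can be tested
        self.entries = {}     # hostname -> (fetched_at, text)

    def get(self, host):
        now = self.clock()
        cached = self.entries.get(host)
        if cached is not None and now - cached[0] < self.ttl:
            # Still fresh: this copy may lag the live file by up to one TTL.
            return cached[1]
        text = self.fetch(host)
        self.entries[host] = (now, text)
        return text
```

Any change made to robots.txt after a fetch is only picked up by the first lookup that happens after the cached copy expires.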

Another possibility is that your robots.txt file is incorrect or malformed. We recommend that you verify your robots.txt file against the standard at http://www.robotstxt.org/wc/exclusion.html#robotstxt. Finally, make sure that your robots.txt file is in the top-level directory of your domain; a robots.txt file in a subdirectory has no effect.
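
The top-level rule follows from how crawlers locate the file: they take the scheme and host of a page and request /robots.txt there, ignoring the rest of the path. A short Python sketch (the function name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the only robots.txt location a crawler checks for this page."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page's path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For a page at http://example.com/blog/post.html this gives http://example.com/robots.txt, so a file at /blog/robots.txt is never requested.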

For more information, please see the Robots FAQ.

Update 02-Jan-05: I identified a small bug in the robots.txt parser that caused a few rules to be ignored. I'm fairly certain that this issue is now fixed and that all future crawls will fully obey robots.txt rules. If the crawler has downloaded a link on your website that should have been denied, please let me know as soon as possible so that I can look into the problem and post-filter the data set to remove any URLs that the robots.txt parser didn't properly catch.

Can you tell me the IP addresses from which Vortex crawls so that I can filter my logs?

The IP addresses used by Vortex change often. The best way to identify Vortex is by the user agent (vortex).
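
If you want to pick these hits out of your logs, something like the following Python sketch would work. It assumes your server writes the common "combined" log format, where the user agent is the last quoted field on each line; the function name is hypothetical. Since the exact user agent string isn't given here beyond the name vortex, the sketch does a case-insensitive substring match, which would also catch the unrelated bot that uses the same name.

```python
import re

# In the combined log format, the user agent is the final quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_vortex(log_line, token="vortex"):
    """Return True if the line's user agent field contains `token`."""
    match = UA_FIELD.search(log_line)
    return bool(match) and token in match.group(1).lower()
```

Filtering a logfile is then a matter of keeping or dropping the lines for which is_vortex returns True.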

Can I submit my site for indexing?

No. The only way to get indexed is to be linked to by other websites; Vortex will eventually follow a link to your site.

I have a question that isn't answered here.

Send me an email! I look forward to hearing from anyone who has any questions or comments about Vortex.





Constructed entirely by hand using only TextPad and PhotoShop
Modified Tuesday June 9, 2009 - 18:45 UTC

(C) Copyright 2000-2010 Marty Anstey ~~ I didn't rip you off, so don't rip me off.