|
One of my longest-running projects, and certainly one of the most expensive and time-consuming, has been
the search engine. This is a project that I began in the late 1990s and that was transformed into a corporation
during the dot-com era that eventually went public before being sold. Unfortunately, I received very little
from the run-up and sale of the company mostly due to the economic factors of the time and the business
practices of the companies I worked with. A hard lesson learned, but it allowed me to take a break and
re-focus on the search engine itself and the technology surrounding it.
In 2002 and 2003, I made great strides towards building a brand-new search engine, crawler, software
suite and hardware from scratch, and without even taking a glance at the old code. The search engine
environment had changed completely since 1998, and success meant adapting to the new environment. New
search engines (primarily Google, but also others including Teoma, WiseNut and GigaBlast) had upped
the ante considerably, and adapting to this new world required a large amount of planning and insight.
I planned the project in several phases, spanning several years, to accommodate my available time and budget.
In late 2003, I completed the first revision of the new engine, and in early 2004 I began work on the
new crawler. Some issues surfaced in this department, and I ended up spending nearly six months arranging
internet connectivity to accommodate the new crawler and its insatiable bandwidth needs.
In late 2004, I resumed the task of programming the crawler and moving the search engine into Phase II.
The previous prototypes of the engine and crawler had allowed me to identify significant areas where
improvement was needed. At the time, I had been really stretching myself thin with too many projects, and
so I decided to take a new strategy with time management and focus on one project at a time. In August of
2004 I decided to focus on the WeatherCity project (www.weathercity.com),
a global weather forecasting system and website, which was completed in January 2005. In February, I resumed
the Search Engine project beginning with the new crawler. I had assembled a new crawl platform which had cost
a considerable amount, and with bandwidth no longer an issue, testing and development went ahead at a much
quicker pace than ever before.
That brings us to the present; development is going ahead quite well. Substantial work during 2005 led to the
development of my latest crawler, Vortex. During a crawl over the 2005 Christmas holidays (a logical choice since total internet
traffic is generally slightly lower during that period), I succeeded in downloading about 100 million documents,
a set that was then pared down considerably by a procedure to remove junk webpages. I am expecting deployment of the latest
engine at some point this year. My journal
details day-to-day accounts of my progress - among other things - so check it out frequently. If you have
any questions about the engine, I'm always glad to answer them. Visit the contact
page to send me your questions or comments.
|