As you might be able to tell, I've had WWW indexing on the brain lately.
I recently downloaded and installed the University of Colorado Harvest
software from http://harvest.cs.colorado.edu/:
"Harvest is an integrated set of tools to gather, extract, organize,
search, cache, and replicate relevant information across the Internet. With
modest effort users can tailor Harvest to digest information in many
different formats, and offer custom search services on the Internet.
Moreover, Harvest makes very efficient use of network traffic, remote
servers, and disk space."
Harvest acts as a local web-crawling robot.
I have created an initial index of UVM webspace. In summary:
There are 599 objects in the database from 7 Internet server(s), including:
http://mole.uvm.edu/ (305 objects)
http://salus.med.uvm.edu/ (138 objects)
http://uvmce.uvm.edu:443/ (83 objects)
http://salus.uvm.edu/ (68 objects)
http://dna.med.uvm.edu/ (3 objects)
http://natrium.med.uvm.edu/ (1 objects)
http://220.127.116.11/ (1 objects)
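
For anyone who wants to double-check that the per-server counts add up to the reported total, here is a minimal Python sketch. It is not part of Harvest; it assumes the summary lines have exactly the "URL (N objects)" form shown above, and the names SUMMARY, LINE_RE and tally_objects are made up for illustration.

```python
import re

# The per-server lines copied from the summary above.
SUMMARY = """\
http://mole.uvm.edu/ (305 objects)
http://salus.med.uvm.edu/ (138 objects)
http://uvmce.uvm.edu:443/ (83 objects)
http://salus.uvm.edu/ (68 objects)
http://dna.med.uvm.edu/ (3 objects)
http://natrium.med.uvm.edu/ (1 objects)
http://220.127.116.11/ (1 objects)
"""

# Assumed line format: a server URL, then "(N objects)".
LINE_RE = re.compile(r"^(\S+)\s+\((\d+) objects?\)$")


def tally_objects(summary):
    """Return a dict mapping each server URL to its object count."""
    counts = {}
    for line in summary.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = tally_objects(SUMMARY)
print(len(counts), "servers,", sum(counts.values()), "objects")
# prints "7 servers, 599 objects"
```

The totals agree with the "599 objects ... from 7 Internet server(s)" line, so nothing was dropped from the list.
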
The database may be searched via the interface found at:
Of course, this needs a bit of tuning, too, but it's still pretty exciting.
Comments are encouraged.
| Wesley Alan Wright; http://mole.uvm.edu/~waw/ |
| Academic Computing Services * * __0__ * |
| Room 238 Waterman Building / \ | \ * |
| University of Vermont * * \77 * |
| Burlington, Vermont 05405-0160 * \\ * |
| U.S.A. Voice: (802) 656-1254 * * vv * |
| FAX: (802) 656-8148 This message copyright 1996 WAW & UVM |