Isite/Isearch Quick Start Guide
or, Installing and Using Isite/Isearch for the Hopelessly Impatient
with Slightly More Thorough Instructions for Those who Care.
Isearch is a text search system developed by CNIDR. CNIDR is the Center for Networked Information Discovery and Retrieval. They are part of a non-profit corporation known as MCNC. MCNC used to stand for "Microelectronics Center of North Carolina", but they dropped the long form and now they're legally just MCNC. Anyway, the lawyers thought you'd like to know that.
Isearch pretty much does one thing: it lets the user search through a bunch of text by hunting for words. It doesn't "understand" your text collection, it doesn't try to parse it, it just uses slightly-less-than-brute-force statistical methods to hunt for words in documents. It does this by building indexes. An Isearch index is a just like the index in a book: if you look for a word in the index of a book, it tells you a page number to look at. Likewise, if you search for a word with Isearch, it looks in the index for a filename associated with that word and then shows that file to you. It can do more complicated things, but that's the basic game plan.
Installing Isite requires five steps:
Really, installation usually isn't tricky at all. Start to finish should take around thirty minutes the first time, and about ten when you download new versions.
STEP 0: GET GCC and GZIP
Okay, we lied: There's an extra step. You really should have gcc and g++ on your machine, and you must have gzip (and its companion gunzip) so you can unpack the archive that Isearch comes in. You can get gzip and gcc via anonymous ftp from prep.ai.mit.edu in the directory /pub/gnu. Gzip installation is fairly simple. Unpack the tar file and follow the instructions in the file "README". Installing gcc and g++ is a lot more complicated. In principle, you could use a vendor-supplied C++ compiler. The Solaris SparcWorks C++ compiler is known to work, and the CenterLine C++ compiler has also compiled Isearch successfully. The bad news is that there are several subtly different versions of the C++ specification and each compiler views it a little differently. Getting Isearch to work with a different compiler is about like porting Isearch to another machine. Isearch is also fairly dependent on compiler versions, especially with gcc. Gcc users should make sure they are using at least 2.7.X. Note that the old, pre-compiled gcc for Solaris is not meant to be a general-purpose compiler: it's just enough of a compiler to compile a newer version of gcc.
Incidentally: You're also going to have to install libg++ if you haven't already. SAme ftp site, same directory. Install it after you install g++.
STEP 1: GET ISITE
You should always work from the latest version of Isite available. Bugs are being corrected and new features are being added daily (literally). You can always get the newest version from ftp.cnidr.org. Here's a sample session where we will download today's version of Isearch from ftp.cnidr.org by logging in as "anonymous", sending our email address as our password, changing the current directory to "/pub/software/Isite", making sure we're in binary mode for ftp, and getting the file:
sti-gw% ftp ftp.cnidr.org Connected to kudzu.cnidr.org. 220 kudzu FTP server (Version wu-2.4(1) Sun Jan 1 17:43:49 EST 1995) ready. Name (ftp.cnidr.org:escott): anonymous 331 Guest login ok, send your complete e-mail address as password. Password: 230- Welcome to kudzu.cnidr.org 230- 230- ftp logged in from sti-gw.sti-ext.com at Mon Mar 25 11:14:59 1996 230- 230- There are currently 7 users of the maximum 25 logged on. 230- 230- If you have trouble using this experimental FTP client, try including 230- a dash in front of your username (as your login password). This will 230- disable informational messages such as these. 230- 230- Comments??? Send mail to `bmv@k12.cnidr.org` 230- 230- 230 Guest login ok, access restrictions apply. ftp> cd /pub/software/Isite 250-Please read the file README 250- it was last modified on Fri Mar 22 15:41:06 1996 - 3 days ago 250 CWD command successful. ftp> ls 200 PORT command successful. 150 Opening ASCII mode data connection for /bin/ls. total 4995 Isite-2.00-linux.tar.gz Isite-2.00-osf1.tar.gz Isite-2.00-solaris.tar.gz Isite-2.00-sunos.tar.gz Isite-2.00.tar.gz README archive untested 226 Transfer complete. 772 bytes received in 0.031 seconds (24 Kbytes/s) ftp> binary 200 Type set to I. ftp> get Isite-2.00.tar.gz 200 PORT command successful. 150 Opening BINARY mode data connection for Isite-2.00.tar.gz (1225536 bytes). 226 Transfer complete. local: Isite-2.00.tar.gz remote: Isite-2.00.tar.gz 1225536 bytes received in 1.12 seconds (9.9e+02 Kbytes/s) ftp> quit sti-gw%
Notice a few things:
STEP 2: UNCOMPRESS AND UN-TAR THE DISTRIBUTION
You should now have a copy of Isearch as a gzipped tar tar. The ".gz" suffix indicates a gzipped file, and the ".tar" suffix indicates it's tarred. The first thing to do is to uncompress the file:
sti-gw% gunzip Isite-2.00.tar.gz
You now have a (much larger) file called "Isite-2.00.tar".
The next step is to un-tar the file we just uncompressed. In our example, we're going to want to install Isearch so it has the path name "/local/project/Isite-2.00". To do this, copy or move the file to "/local/project":
sti-gw% mv Isite-2.00 /local/project
Finally, we'll un-tar the distribution:
sti-gw$ cd /local/project
sti-gw% tar xf Isite-2.00.tar
There should now be a directory named /local/project/Isite-2.00, and it should contain the Isearch distribution:
sti-gw% ls /local/project/Isite-2.00 CHANGES Makefile TUTORIAL doctype COPYRIGHT README bin src
If you see all of that, then you probably got it right.
STEP 3: EDIT THE MAKEFILE
This is the part that makes people nervous, but it really isn't that bad at all. You need to edit the Makefile with your favorite editor, and essentially just follow the instructions. Edit /local/project/Isearch-1.10/Makefile in this example (you won't have to edit the makefiles in the subdirectories, they automatically inherit what they need).
The first choice is for compiler. Probably 99% of the world should leave the default "g++". Isearch is developed by people who use g++, so that's your best bet. It's also hard to beat the price.
The second choice is for "CFLAGS", which are the options to pass to the compiler. There are canned selections for you to choose from. If you don't know what to select, then "CFLAGS=-g -DUNIX -Wall" is a good starting guess.
The third choice is for the location to install the finished programs. The default "/usr/local/bin" is a good guess. Note that if you don't elect to actually "make install" in step 4 then you'll never have to set this.
The rest of the choices probably should be left alone.
While on the subject of editing files, it's worth noting that the file /local/project/Isite-2.00/Isearch/doctype/dtconf.inf describe how many "doctypes" your copy of Isearch will be aware of. A doctype is essentially a file type handler; there are doctypes for simple ASCII files, files of separate paragraphs, and so forth. You can almost always just leave this file alone and take the default, which is "Give me all of 'em!".
STEP 4: COMPILE
At this point, you should be able to
sti-gw% cd /local/project/Isite-2.00 sti-gw% make
And all the right things will happen. Note that many compilers will print a few warning lines along the way, but they're warning about pretty harmless stuff. If there are any errors, then the compilation will grind to a halt. Otherwise, when you're done there will be new files in the "bin" subdirectory:
sti-gw% ls bin Iindex Isearch Iutil libIsearch.a libz3950.a libsapi.a izclient zclient zping zserver zbatch zcon zgate
If you got that far, then you can (optionally) :
sti-gw% make install
and the compiled executables will be copied to /usr/local/bin.
STEP 5: INDEX SOME TEXT AND SEE HOW IT WORKS
Now let's index a little text and see if Isearch really works. First, let's pick a couple of files to index. In this example, we'll index the files "CHANGES" and "COPYRIGHT" since we know everyone has them.
sti-gw% cd /local/project/Isite-2.00 sti-gw% mkdir testIndex sti-gw% cd testIndex sti-gw% /local/project/Isite-2.00/bin/Iindex -d tester ../CHANGES ../COPYRIGHT Iindex 1.20 Building document list ... Building database tester: Parsing files ... Parsing /local/project/Isite-2.00/CHANGES ... Parsing /local/project/Isite-2.00/COPYRIGHT ... Indexing 1870 words ... Merging index ... Database files saved to disk. sti-gw% ls tester.dbi tester.inx tester.mdg tester.mdk tester.mdt
That created five index files to describe the two files we indexed. Don't worry, we could have indexed every file on your system and still only had five index files. Now let's do a little searching and see how well we did:
sti-gw% /local/project/Isite-2.00/bin/Isearch -d tester RSET table ../bin/Isearch -d tester RSET table Isearch 1.20 Searching database tester: 1 document(s) matched your query, 1 document(s) displayed. Score File 1. 100 /local/project/Isearch-1.09.09/CHANGES Select file #:
At this point, if you enter "1" and press return, it will display the contents of "CHANGES". If you press return without a number, Isearch will exit. This is sort of a trivial example, but if you index thousands of files then the above search will list the ones that contain either "RSET" or "table". If you run Isearch without any arguments then it will give a list of all of its options, what they mean, and some examples of their use. For more information, see the Isearch Tutorial included with the distribution.