User Manual for the Search Program
for the Serendipity Website-on-CD
The Serendipity Website-on-CD contains a search program that searches all HTML files on the CD for all or any of the words in a set of words, or for an exact phrase. Note that this search program runs under Windows (any version from Windows 98 to Windows 7). It can be used on a Macintosh only if the Mac has a Windows emulator or is a dual-processor Mac which runs Windows.
The first two sections of the user manual on the CD-ROM, namely, "Installation" and "Specifying the index file", have been omitted from this page.
At right is a screenshot of the program showing specification of the index file (on the CD-ROM), a set of search words and a list of files found containing all the search words.
Searching a single file
To search a single file, rather than all files on the CD, select the ‘Search a single indexed file’ option and specify the file in the usual way (allow a few seconds for the file names to be read from the CD-ROM).
Of course, the CD-ROM may be in a drive other than E:.
Search words and search type
You can search the files on the Serendipity CD-ROM for a single word, any word in a set of words, all words in a set of words, or an exact phrase.
There is no limit on the number of search words which can be given for a search. Searches are not case-sensitive (that is, no distinction is made between upper and lower case).
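The three kinds of search can be illustrated with a minimal Python sketch. This is not the program's own code: the function name `file_matches` and the mode names "any", "all" and "phrase" are hypothetical, and both word lists are assumed to be already split into lower-case words (so that the comparison is not case-sensitive). Hyphenated search words are not handled here.

```python
def file_matches(file_words, search_words, mode):
    """Decide whether one file matches the search words.

    file_words and search_words are lists of lower-cased words.
    mode is "any", "all" or "phrase" (hypothetical names).
    """
    if mode == "any":
        present = set(file_words)
        return any(w in present for w in search_words)
    if mode == "all":
        present = set(file_words)
        return all(w in present for w in search_words)
    if mode == "phrase":
        # The search words must occur consecutively and in the same order.
        n = len(search_words)
        return n > 0 and any(
            file_words[i:i + n] == search_words
            for i in range(len(file_words) - n + 1)
        )
    raise ValueError(f"unknown mode: {mode!r}")


words_in_file = ["jack", "and", "jill", "went", "up", "the", "hill"]
print(file_matches(words_in_file, ["jill", "moon"], "any"))
print(file_matches(words_in_file, ["jill", "moon"], "all"))
print(file_matches(words_in_file, ["jill", "went"], "phrase"))
```

An any-word search succeeds as soon as one search word is present, an all-words search needs every one of them, and an exact-phrase search also needs them to be adjacent and in order.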
Characters other than letters are ignored, and the words in the searchable files and in the ‘Search words’ textbox are delimited by non-letters. Thus search words may not contain numerals, apostrophes or other non-letter characters (except hyphens, as noted below). For example, "Jack and Jill" is three words, "Jack/Jill" is two words (Jack, Jill) and "Jack1 likes Jill2" is three words (Jack, likes, Jill).
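The word-splitting rule can be sketched in Python as follows. This is only an approximation of the rule as described here, not the program's actual code: the name `split_words` is hypothetical, a "letter" is assumed to mean any Unicode alphabetic character (so that German letters are kept), and the special treatment of hyphens in exact-phrase searches is left out.

```python
import re

# Assumption: [^\W\d_] matches word characters that are neither digits
# nor underscores, which is roughly "any letter", including ä, ö, ü, ß.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def split_words(text):
    """Return the lower-cased runs of letters in text, in order."""
    return [w.lower() for w in _LETTER_RUN.findall(text)]


print(split_words("Jack and Jill"))      # ['jack', 'and', 'jill']
print(split_words("Jack/Jill"))          # ['jack', 'jill']
print(split_words("Jack1 likes Jill2"))  # ['jack', 'likes', 'jill']
```

Lower-casing every word at this stage is one simple way to make later comparisons ignore case.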
Hyphenated search words can be used, but only in exact-phrase searches, such as a search for Over-The-Counter derivatives.
There are files on the CD in five different languages, but only files in English and German have been indexed, so only files in these two languages are searchable. Search words may contain German letters, such as the ä in läuft.
Stem searches
A stem is a sequence of letters which may be the initial segment of some word. Stems are marked by a terminating asterisk, e.g., comput*. This software allows searching for multiple words by using stems as search words. For example, if you search on terroris* the program will find all files (on this CD-ROM there are 432 of them) which contain terrorise, terrorised, terrorising, terrorism, terrorist, terroristic and terrorists, plus all files which contain German words which begin with this stem.
The asterisk can only be used at the end of a search word. You cannot search for, e.g., *like.
The search words may include more than one stem, e.g., religio* fundamentalis*.
A stem is equivalent to the set of all words which occur in some file and which begin with that stem, so searching on a stem is equivalent to doing an any-word search on all the words which begin with that stem. Thus a stem search must be an any-word search; stems may not be used in an all-words search or an exact-phrase search.
When a report is generated, if the search words include one or more stems then the actual words searched for will be displayed (labelled as ‘Expanded [search words]’). If no word in any file matches a search word (whether or not it is a stem) then that search word (or its expansion) will not appear in this list of actual words searched for. So, for example, if you search on bird gibbon dog* emu*, and bird, dog and dogs occur in some file (not necessarily the same file) but gibbon does not, nor does any word beginning with emu (such as emulate), then the list of expanded words will be bird, dog, dogs.
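The expansion of stems into actual search words can be sketched in Python as below. This is an illustration under stated assumptions rather than the program's own method: the name `expand_search_words` is hypothetical, the set of indexed words is passed in as a plain list, and the alphabetical ordering of the result is an assumption.

```python
def expand_search_words(search_words, indexed_words):
    """Expand stems (terms ending in '*') against the indexed words.

    Returns the sorted list of words actually searched for. Search
    words and stems that match nothing in the index are dropped.
    """
    vocabulary = {w.lower() for w in indexed_words}
    expanded = set()
    for term in search_words:
        term = term.lower()
        if term.endswith("*"):
            stem = term[:-1]
            expanded.update(w for w in vocabulary if w.startswith(stem))
        elif term in vocabulary:
            expanded.add(term)
    return sorted(expanded)


indexed = ["bird", "cat", "dog", "dogs"]
print(expand_search_words(["bird", "gibbon", "dog*", "emu*"], indexed))
# prints ['bird', 'dog', 'dogs']
```

Because every stem is replaced by a set of alternatives, the expanded list only makes sense as the input to an any-word search.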
Searching on multiple stems is equivalent to doing an any-word search on all words which begin with any of those stems, so such a search may find a large number of files. For example, a search on religio* fundamentalis* will return all files containing any of the words fundamentalism, fundamentalist, fundamentalists, religion, religionists, religions, religiosity, religious and religiously, perhaps resulting in a superabundance of files found.
Output options
The results of a search can be displayed either in a textbox within the software or as a web page in your default web browser. The display of results in the web page is preferable, since the search words are then displayed in boldface and there are links to the files found. The textbox option is provided in case there is some problem with displaying results in the default web browser.
If you have checked the ‘Generate report’ checkbox then the software will display the results automatically either by opening a textbox or by displaying a web page in your default web browser (you may have to switch to the browser manually). If you have not checked this checkbox then you can generate the report by clicking on the ‘Report’ button. The results can be preserved either by copying from the textbox to the clipboard (and from there to some text editor program such as Notepad) or by saving the web page to disk.
You can control whether the filepaths of the files found are displayed in the report by checking or unchecking the corresponding checkbox. The description tag will be displayed in the report if the corresponding checkbox is checked.
You can also control the maximum number of extracts displayed in the report and the size of these extracts (see more on both of these points below).
If the ‘Sort files found by search word occurrence’ checkbox is not checked then the files will be displayed in their physical order, that is, the order in which the index module found them when creating the index file.
If this checkbox is checked then the program will take note of the relative frequencies of the search words found in the files. Files are then displayed in descending order of the sum of these frequencies for all search words. Thus files in which the relative frequencies of the search words are higher will be displayed earlier in the output.
If this checkbox is checked then the search will take a little longer.
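The sorting described above might look like the following Python sketch. The manual does not define "relative frequency" precisely, so this sketch assumes it means the number of occurrences of a search word divided by the total number of words in the file. The name `rank_files` and the sample file names are hypothetical.

```python
from collections import Counter


def rank_files(files, search_words):
    """Order matching files by summed relative frequency, highest first.

    files maps a file name to its list of lower-cased words. Files
    containing none of the search words are left out.
    """
    scores = {}
    for name, words in files.items():
        if not words:
            continue
        counts = Counter(words)
        hits = sum(counts[w] for w in search_words)
        if hits:
            # Sum over search words of (occurrences / words in file).
            scores[name] = hits / len(words)
    return sorted(scores, key=scores.get, reverse=True)


files = {
    "b.htm": ["oil", "and", "water", "and", "sand", "too"],
    "a.htm": ["oil", "war", "oil", "peace"],
    "c.htm": ["nothing", "here"],
}
print(rank_files(files, ["oil", "war"]))
# prints ['a.htm', 'b.htm']
```

Counting every word of every matching file is extra work, which is consistent with the search taking a little longer when this option is checked.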
There is no limit on the number of files which can be searched but there is a limit on the number of matches (files found containing one or more of the search words). At most 300 files can be returned in a search.
Number and size of extracts
When the report is generated (but not during the search itself) every occurrence of a search word is extracted together with several words before and after each search word, making a phrase called an extract. If the textbox next to ‘Maximum number of extracts’ is left blank then all these extracts will be displayed in the report. In an any-word search, or in a stem search, for each file in which at least one search word is found there could be over a hundred extracts. If you don’t wish to see them all, but only, say, the first ten, then you can limit the output by specifying the maximum number of extracts to be displayed.
The number of words before and after the occurrence of a search word is determined by the value selected for ‘Size of extract’. For example, here are extracts (following a search for the exact phrase Marvin Bush) with this value set to 1, 3, 5, 7 and 9 respectively:
... that Marvin Bush (W's ...
... reserves?/ Did you know that Marvin Bush (W's youngest brother) was in ...
... aggression to secure oil reserves?/ Did you know that Marvin Bush (W's youngest brother) was in charge of security at ...
... board for a global war of aggression to secure oil reserves?/ Did you know that Marvin Bush (W's youngest brother) was in charge of security at WTC and Dulles airport, ...
... the American public on board for a global war of aggression to secure oil reserves?/ Did you know that Marvin Bush (W's youngest brother) was in charge of security at WTC and Dulles airport, and his contract expired ...
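The way extracts are built can be sketched in Python as below. This is a simplified illustration, not the program's code: the names `make_extracts` and `letters_only` are hypothetical, words are split on whitespace, and the program may count words differently, so its extracts need not match this sketch word for word.

```python
def letters_only(word):
    """Lower-case a word and drop its non-letter characters."""
    return "".join(ch for ch in word if ch.isalpha()).lower()


def make_extracts(text, phrase, size, max_extracts=None):
    """Return extracts of `size` words either side of each occurrence.

    max_extracts=None corresponds to leaving the 'Maximum number of
    extracts' textbox blank: all extracts are returned.
    """
    words = text.split()
    target = [letters_only(w) for w in phrase.split()]
    n = len(target)
    extracts = []
    if n == 0:
        return extracts
    for i in range(len(words) - n + 1):
        if max_extracts is not None and len(extracts) >= max_extracts:
            break
        if [letters_only(w) for w in words[i:i + n]] == target:
            start = max(0, i - size)
            end = min(len(words), i + n + size)
            extracts.append("... " + " ".join(words[start:end]) + " ...")
    return extracts


sample = ("Did you know that Marvin Bush (W's youngest brother) "
          "was in charge of security")
print(make_extracts(sample, "Marvin Bush", 1))
# prints ["... that Marvin Bush (W's ..."]
```

A larger `size` widens the window around each occurrence, and `max_extracts` simply stops collecting once enough extracts have been found.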