Edith Hamilton Library

 

Internet Search Guide

INTRODUCTION

In September of 2000, there were an estimated 2,000,000,000 addresses on the Internet (Cyveillance, July 10, 2000).  Perhaps 500 times that many elude indexing by present search engine technology (Search Engine Report, Aug. 2, 2000).  There is as yet very little standardization within the Web, and the non-hierarchical linking of sites does not lead to efficient information retrieval.  No search engine or subject directory comes close to including all that is on the Web.  And if these were not enough difficulties, the flux of technology creates additional problems.  Using the Internet effectively has been compared with nailing jello to the wall or herding cats.  It is hard to do, but learning a few definitions, becoming familiar with common search practices, and using the best sources all help.  The following short introduction is a start; fuller explanations are available in the Webliography.

Definitions: World Wide Web, URL, MIME File Types, Helpers, and Plug-Ins, Domains, the Invisible Web
Boolean Searching:
Subject Directories:
Search Services (a.k.a. Search Engines):
Evaluation of Web Sites:
Webliography

DEFINITIONS

A very good dictionary is FOLDOC - Free On-Line Dictionary of Computing.

A few of the most important concepts are touched on below:

 A splendid succinct graphic-enhanced explanation of the Internet is offered in "How the Web Works."

The Uniform Resource Locator, or URL, is the address of a file stored on a host computer connected to the Internet.  The host name in a URL is translated into a numeric address using the Internet Domain Name System (DNS).  The numeric address is the actual address, but strings of numbers are difficult for people to remember, so alphanumeric names are employed.  These, however, have become quite complex and hard to remember in themselves, so always try to copy and paste the address or bookmark it for later.  The URL's standard format is Protocol, Host computer name, Second-level domain, Top-level domain, Directory name, File name.  Example:
                         http://www.yorku.ca/dept/psych/classics/
http=protocol; www=host computer name; yorku=second-level domain name; ca=top-level domain name; dept=directory; psych=subdirectory; classics=file (here a directory name, as the trailing slash shows, whose default page is displayed).  The name tells us that this is a site from York University in Canada, from its Psych. Dept., and that it concerns classics, probably in the field of psychology.
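The breakdown above can be reproduced with a few lines of Python.  This is only a sketch: the function name describe_url is invented for illustration, and it assumes a host name of at least three parts (host, second-level domain, top-level domain), which is true of the example but not of every URL.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the parts named in the text (hypothetical helper)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")            # e.g. ["www", "yorku", "ca"]
    # Drop the empty strings produced by leading and trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "protocol": parts.scheme,
        # Assumption: the first label is the host computer name.
        "host": labels[0] if len(labels) > 2 else "",
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "path_segments": segments,
    }

info = describe_url("http://www.yorku.ca/dept/psych/classics/")
for name, value in info.items():
    print(name, "=", value)
```

The path segments come back in order (dept, psych, classics), so the directory, subdirectory, and final element can be read off directly.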

Domains you should be able to recognize:

.com = commercial

.edu = educational

.gov = governmental

.mil = military

.org = non-profit organization

.net = network

.ca = Canada

.de = Germany

.uk = United Kingdom
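As a small illustration, the list above can be turned into a lookup table that labels any address by its top-level domain.  The names DOMAIN_MEANINGS and domain_type are invented for this sketch, and the table covers only the domains listed in this guide.

```python
from urllib.parse import urlsplit

# Only the domains listed in this guide; many others exist.
DOMAIN_MEANINGS = {
    "com": "commercial",
    "edu": "educational",
    "gov": "governmental",
    "mil": "military",
    "org": "non-profit organization",
    "net": "network",
    "ca": "Canada",
    "de": "Germany",
    "uk": "United Kingdom",
}

def domain_type(url):
    """Return the meaning of a URL's top-level domain, or 'unknown'."""
    hostname = urlsplit(url).hostname or ""
    tld = hostname.rsplit(".", 1)[-1]     # text after the last dot
    return DOMAIN_MEANINGS.get(tld, "unknown")

print(domain_type("http://www.yorku.ca/dept/psych/classics/"))
```

Remember that the domain only hints at who runs a site; it says nothing about the quality of what is found there.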

The Invisible Web consists of dynamically changing databases that can be searched on an individual basis to produce a practically unlimited number of reports.  An example is the Nobel Prize Database, which one can access online, but which is not listed as a source of information about individual prize winners when searching via a search engine.  Search engines presently reject database websites because they do not and cannot endlessly duplicate entries that are very similar.  One gets around this difficulty by using Subject Directories (see below) and by using some special search engines (see "Invisible Web and Database Search Engines," Search Engine Watch, Feb. 2002).

BOOLEAN SEARCHING

When searching the Internet, certain ways of stating what is sought can mean the difference between an almost random list of several thousand 'hits', or a few sites that are highly relevant.  Since the point-and-click techniques for searching are mindlessly simple, one is tempted to believe that by just throwing a word or two into the search window, one will miraculously emerge with the best of all possible sites.  But it isn't so.   One can waste hours reading descriptions of endless numbers of possible sites that might help, or one can use one's time investigating meaty sites that have answers.   Superior students choose efficiency.

Computers and their robotic arachnoid buddies who search the Internet have been programmed to recognize three (sometimes more) logical operators:  AND, OR, NOT.

This doesn't seem like much at first glance, but one can do quite a lot of narrowing and specifying with just these three words (operations).  (Don't forget what computers can do with just '0' and '1'.)

When two terms are connected with OR, the results are never fewer than with a single term.  Example: dogs OR cats will produce items that contain 'dogs' and items that contain 'cats'.  If we added monkeys, the resulting hits would be greater still.  OR is most often used to search synonymous things.  The default for most search engines is OR.  To be sure of an AND search, use + before each term.

AND narrows a search--contrary to the intuitive feel of 'and'.  It does so because it requires that two (or more) items be present in a single hit.  Thus, dogs AND cats produces a result only when both are contained in a single place: article, title, URL, etc.  To assure that AND is being used when you want this effect, always use + before each term.

NEAR is a choice in some searching services such as Alta Vista and LycosPro.  It functions as a more restrictive AND, requiring that the terms appear close to one another.

NOT is used to eliminate unwanted term(s).  It translates into looking for one term to the exclusion of another.  Example: raisins NOT grapes finds items that mention raisins but leaves out any that also mention grapes.
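The operators described above can be pictured as operations on sets of pages.  The following is a toy sketch, not how any particular search service works: the six "pages" are invented, and the three-word window used for NEAR is an assumption, since each service sets its own distance.

```python
# Six made-up pages, numbered for reference.
PAGES = {
    1: "dogs make loyal pets",
    2: "cats are independent pets",
    3: "dogs and cats can share a home",
    4: "raisins are dried grapes",
    5: "raisins in oatmeal cookies",
    6: "dogs bark loudly at night while the neighborhood cats sleep",
}

def pages_with(term):
    """Return the set of page numbers whose text contains the word."""
    return {n for n, text in PAGES.items() if term in text.split()}

def near(a, b, max_gap=3):
    """Pages where a and b occur within max_gap words of each other."""
    hits = set()
    for n, text in PAGES.items():
        words = text.split()
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= max_gap for i in pos_a for j in pos_b):
            hits.add(n)
    return hits

dogs, cats = pages_with("dogs"), pages_with("cats")
either = dogs | cats                                    # dogs OR cats
both = dogs & cats                                      # dogs AND cats
close = near("dogs", "cats")                            # dogs NEAR cats
raisins_only = pages_with("raisins") - pages_with("grapes")  # raisins NOT grapes

print(sorted(either), sorted(both), sorted(close), sorted(raisins_only))
```

OR gives the largest set, AND a smaller one, and NEAR a smaller one still, because page 6 mentions both animals but too far apart.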

NOTE: Sometimes, particularly in using the advanced search techniques offered by most searching services, a template is offered, and the AND, OR, NOT, and NEAR become phrases.  OR = can/should contain these words; match any term;  AND = all of these words/must contain these words, match all terms; NOT = must not/should not contain these words.

Also, one can combine terms (use AND, OR, NOT in a single search), nest terms (ask that one operation be done before any other), and truncate terms (to search for all forms of a word at the same time) for additional specificity.  Example: (Navajo* OR Navaho*) AND Hopi* NOT "pueblo Indian*".  Not all services support all these methods.  Note that double quotation marks should always surround words that should be treated as a unit.
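To see how nesting, truncation, and quoted phrases work together, the example query can be evaluated by hand against a few invented titles.  This is a sketch under stated assumptions: the four documents and the function name starts_with are made up, and truncation is modeled simply as "a word beginning with these letters."

```python
import re

# Four made-up document titles.
DOCS = {
    "a": "Navajo and Hopi weaving traditions",
    "b": "Navaho rugs compared with Hopi pottery made by Pueblo Indians",
    "c": "Hopi kachina dolls",
    "d": "Navajos and Hopis in Arizona",
}

def starts_with(stem):
    """Ids of documents containing a word (or phrase) that begins with stem.

    Models the truncated form stem*; the match ignores upper/lower case.
    """
    pattern = re.compile(r"\b" + re.escape(stem), re.IGNORECASE)
    return {doc_id for doc_id, text in DOCS.items() if pattern.search(text)}

# (Navajo* OR Navaho*) AND Hopi* NOT "pueblo Indian*"
navajo_any_spelling = starts_with("Navajo") | starts_with("Navaho")  # nested OR first
with_hopi = navajo_any_spelling & starts_with("Hopi")                # then AND
result = with_hopi - starts_with("pueblo Indian")                    # then NOT

print(sorted(result))
```

The parenthesized OR is done first, so both spellings are gathered before the AND and NOT are applied; document "b" drops out because it contains the quoted phrase.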

A more detailed tutorial (upon which much of the above was based) is Laura Cohen's Boolean Searching on the Internet.

SUBJECT DIRECTORIES

A subject directory is a service that offers a collection of links to Internet resources that have been selectively evaluated by human beings and sorted into subject categories.  There are two main types: academic/professional and commercial.   The most famous subject directory is Yahoo!  It is a commercial service, and very large, but it is not academically very selective. 

Use a subject directory when you want to see what professionals and experts say are the most important or useful sites, and when you want to avoid viewing lots of useless pages containing little data.  When you are beginning your research with a broad topic and want to examine the breadth of sources available, a subject directory is the best place to start.

Recommended Subject Directories

A good up-to-date guide, "How to Choose a Search Engine or Research Database" is available from Laura Cohen, SUNY Albany.

SEARCH SERVICES, A.K.A. SEARCH ENGINES

A search engine is a searchable database of Internet files that has been collected by a computer program called variously a spider, crawler, worm, etc.  Indexes, also computer generated, are created from this web crawl, and search results are presented in some order.  No selection criteria are employed in collecting the files.  Search engines are not human-centered.  This is in stark contrast to subject directories, which are human-centered collections.  Search engines actually contain the content of the web pages they retrieve; subject directories contain only links to pages.  Search engines search millions of pages; directory queries are much more limited, but more likely to supply relevant information.
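The index a search engine builds from its crawl can be pictured with a very small example.  What follows is a simplified sketch, not the method of any actual service: the three pages and their addresses are invented, and ranking by how often a word appears is just one assumed way of presenting results "in some order."

```python
from collections import defaultdict

# Stand-in for pages gathered by a crawler (invented addresses and text).
CRAWLED = {
    "http://example.org/dogs": "Dogs are loyal. Dogs need walks.",
    "http://example.org/cats": "Cats nap often.",
    "http://example.org/pets": "Dogs and cats are popular pets.",
}

def build_index(pages):
    """Map each word to the pages containing it, with a count per page."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?")          # drop surrounding punctuation
            if word:
                index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, term):
    """Return matching pages, most occurrences first (ties by address)."""
    counts = index.get(term.lower(), {})
    return sorted(counts, key=lambda url: (-counts[url], url))

index = build_index(CRAWLED)
print(search(index, "dogs"))
print(search(index, "cats"))
```

Every word on every page goes into the index with no human judgment applied, which is why a search engine finds obscure terms that a subject directory would miss.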

Use a search engine when you have a narrow or obscure topic to research; when you are looking for a particular site; when you don't mind retrieving thousands of pages; when you want to search for certain types of documents--either by file type, source, language, dates, etc., or when you want to take advantage of new retrieval technologies such as concept clustering, ranking by popularity, link ranking, etc.

Types of search engines, with examples:

Individual (single) search engines: Alta Vista, Excite, HotBot, Infoseek, Lycos
Meta-search (searches multiple engines simultaneously): MetaCrawler, ProFusion
Innovators (primarily because they sort results differently): Ask Jeeves, Direct Hit, Google!, Inference Find, MetaFind, Northern Light

Individuals, metas, and the new innovators all permit some kind of  power searching, though the command that brings forth the usual form is mighty small and hard to locate on some busy pages.  Power searching means you can use full Boolean syntax. Some search engines permit results to be listed in user-defined ways. 
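The meta-search idea listed among the types above can be sketched in a few lines.  Everything here is hypothetical: engine_a and engine_b are stand-ins that return fixed lists rather than real services, and taking results from each engine in turn is only one assumed way of combining them.

```python
from itertools import zip_longest

# Hypothetical stand-ins for real engines; each returns ranked addresses.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2", "http://example.org/3"]

def engine_b(query):
    return ["http://example.org/2", "http://example.org/4"]

def meta_search(query, engines):
    """Take results from each engine in turn, dropping duplicates."""
    ranked_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    # zip_longest pads the shorter lists with None once they run out.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

results = meta_search("hopi pottery", [engine_a, engine_b])
print(results)
```

A page found by both engines appears only once in the merged list, at the position where it was first encountered.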

EVALUATION OF WEB SITES

Examine a site for its content, the authority of the creator, its organization, its searchability, the stability of the information and site, the appropriateness of its format (an art site with no color visuals, for example, would be inappropriate), and equipment requirements.

In doing scholarly research, be careful that the content is reliable and up-to-date. Ask yourself for whom the site is written.  Who or what group mounted this page, and is it being maintained?  Can you find what you are looking for easily?  Does it have a site search engine or a good index?  Is it attractive?  Are the illustrations it uses necessary and appropriate?  Does it charge fees?  Does it have such sophisticated software and hardware requirements that you cannot utilize it properly?

Remember, .edu sites are not always scholarly; .org sites and .net sites may be sponsored by hate groups; .com sites are sometimes wonderful and scholarly and free; some .gov sites are badly organized and full of bad connections while others are models of efficiency.  Train your judgement by practicing and thinking!

The very best guide the Library has found on Web evaluation is IC YouSee: T Is for Thinking, by John R. Henderson of Ithaca College.

BIB- AND WEBLIOGRAPHY

Cohen, Laura.  "Boolean Searching on the Internet."  University at Albany Libraries, c1999.  Accessed 01/31/02  (Approx. 9 pages)

Henderson, John R.  "IC YouSee: T is for Thinking."  Ithaca College Library, c1999.  Accessed 01/31/02.

"Index of Topics," from Learn the Net, copyright Michael Lerner Productions.  Accessed 01/31/02.  (Approx. 1 page)  [A good overall guide to almost everything.]

"Internet Exceeds 2 Billion Pages," Cyveillance Press Release, August 2, 2000. Accessed 01/31/02.  (Approx. 2 pages with graph of growth)

Lutans, John.  "When Students Hit the Surf: What Kids Really Do on the Internet and What They Want from Librarians."  School Library Journal, Sept. 1999, pp. 144-147.

Silverstein, Alan.  "Under the Hood of the World Wide Web," from Learn the Net, copyright Michael Lerner Productions.   Accessed 01/31/02.  (Approx. 8 p.)

____________. "Search Assistance Features"  Copyright Search Engine Watch,  Internet.com Corp., 1999   Accessed 01/31/02 (about 12 p.)

____________. "Search Links : Specialty Search Engines : Invisible Web," Search Engine Watch,   Internet.com Corp., 2000   Accessed 01/31/02 (about 5 pages)

 

December, 2007