Using Meta Search Engines for Chemistry

Part of The Alchemist's Lair Web Site
Maintained by Harry E. Pence, Professor of Chemistry, SUNY Oneonta, for the use of his students. Any opinions are totally coincidental and have no official endorsement, including from the people who sign my paychecks. Comments and suggestions are welcome (pencehe@oneonta.edu).

Last Revised Mar. 3, 2002


(This article appeared in the Spring 2002 issue of the Computers in Chemical Education Newsletter.)


Meta Search Engines for the WWW, Harry E. Pence, SUNY Oneonta, Oneonta, NY, pencehe@oneonta.edu


INTRODUCTION

The basic idea of a meta-search engine seems very attractive. Meta-engines do not search their own database of web pages. Instead, they simultaneously submit the keyword(s) entered to several different traditional search engines, compile the results from the various engines, and present them to the user. If it is true that few search engines cover the entire searchable web (not to speak of the portions of the WWW that cannot be accessed by spiders), then combining the results from several different engines seems like a very good strategy for obtaining greater coverage and a correspondingly greater chance of finding the best reference to the topic being searched. In some cases this approach is actually advantageous, but it is essential to understand that the basic assumption (that combining engines gives greater coverage) is probably no longer accurate, and there are also some fundamental problems with meta-engines.
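
The submit-and-compile idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two engine functions and their example.org URLs are hypothetical stand-ins, not real search engines, and the engines are queried one after another rather than simultaneously to keep the sketch short.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real search engines. Each takes a query and
# returns a ranked list of URLs. A real meta-engine would send the query
# over the network instead of returning canned results.
def engine_a(query: str) -> List[str]:
    return ["http://example.org/enediyne-review", "http://example.org/synthesis"]

def engine_b(query: str) -> List[str]:
    return ["http://example.org/synthesis", "http://example.org/lecture-notes"]

def meta_search(query: str, engines: Dict[str, Callable[[str], List[str]]]) -> List[str]:
    """Send the same query to every engine and merge the hits, dropping duplicates."""
    merged: List[str] = []
    seen = set()
    for search in engines.values():
        for url in search(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

results = meta_search("enediynes", {"A": engine_a, "B": engine_b})
print(results)
```

Note that the merged list can never contain more than the underlying engines return, which is why the choice of engines matters so much.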

Comparison of Meta-search Engines

As noted in earlier articles in this series, the most comprehensive search engines at present are Google, FAST, and Northern Light. Any meta-search engine that doesn't access at least one of these engines is not very comprehensive, no matter how many other engines it may cover. In fact, few meta-engines include these choices, because the big three either do not permit meta-engines to submit queries or charge more for the service than the meta-engine will pay. A second problem is the failure of most meta-search engines to translate Booleans into the format appropriate for the engine that is being used. For example, if you query with ionic + solvents, will the meta-engine convert this into the format ionic AND solvents for engines that don't recognize plus and minus signs? This also contributes to another problem with meta-engines, namely, that it is often more difficult to refine a search. Finally, the results from some meta-engines fail to include anything from some of the engines that the meta-engine claims to search.
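
The Boolean translation problem can be made concrete with a short Python sketch. The function name, the two style labels ("plus" and "words"), and the rule that bare terms are joined with AND are all assumptions made for illustration; they do not describe how any particular meta-engine actually works.

```python
def translate_query(query: str, style: str) -> str:
    """Rewrite a plus/minus style query for an engine with a different syntax.

    style "plus":  the engine understands + and - signs, so the query is unchanged.
    style "words": the engine expects the operators written out as AND / NOT.
    """
    if style == "plus":
        return query
    if style != "words":
        raise ValueError(f"unknown style: {style}")

    out = []
    pending = "AND"  # operator that applies to the next term (assumed default)
    for token in query.split():
        if token == "+":        # free-standing plus sign, as in: ionic + solvents
            pending = "AND"
            continue
        if token == "-":        # free-standing minus sign
            pending = "NOT"
            continue
        op = pending
        if token.startswith("+"):    # attached sign, as in: +ionic
            token, op = token[1:], "AND"
        elif token.startswith("-"):  # attached sign, as in: -water
            token, op = token[1:], "NOT"
        if out:
            out.append("AND NOT" if op == "NOT" else "AND")
        elif op == "NOT":
            out.append("NOT")
        out.append(token)
        pending = "AND"
    return " ".join(out)

print(translate_query("ionic + solvents", "words"))  # ionic AND solvents
```

A meta-engine that skips this step passes the plus sign through unchanged, and an engine that does not recognize it may ignore it or treat it as an ordinary search term.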

The table below compares several of the most popular meta-search engines, listed in alphabetical order. In each case, the list of search engines used is based on the claims of the meta-engine. In some cases an engine was claimed, but no hits from it were returned, even though the search term should have produced hits on the engine in question. One possible explanation for this failure is that some meta-engines only produce a preset maximum number of returns, but it seems unlikely that the omitted returns should always be from the same major search engine.

Meta-engine | Search engines used | Comments
Dogpile | About, Ah-ha, AltaVista, Direct Hit, Dogpile Web Catalog, ePilot, FindWhat, Kanoodle, LookSmart, Open Directory, Overture, RealNames, SearchHippo, Sprinks, WiseNut, and Yahoo! | Dogpile returns up to 10 hits per engine, which are not combined into a single list.
Ixquick | AOL, AltaVista, Excite, FindWhat, LookSmart, Lycos, MSN, Open Directory, Overture, Sprinks, Yahoo, and alltheweb. | Ixquick returns a single list, with the quality of the sites indicated by awarding one star for each search engine that placed the site in its top ten. Ixquick also translates searches that include wild cards and Booleans to match the different engines.
Mamma | Ask, FindWhat, Lycos, MSN, and Overture. (I could not find a listing of the engines used. This is a list of all the returns that were noted for several searches.) | It is possible to specify the Boolean terms And or Or, or to search for the exact match of a phrase.
MetaCrawler | AltaVista, Direct Hit, FindWhat, Google (?), Internet Keywords, Kanoodle, LookSmart, MetaCatalog, Open Directory, Overture, and Sprinks by About. | Even though Google is listed, I didn't get any results from that engine despite doing several different one-word searches. There were also no hits from AltaVista.
ProFusion | About, Adobe PDF Online, All the Web, AltaVista, AOL, Britannica, LookSmart, Lycos, MSN, Netscape, Raging Search, and Yahoo! | Can be customized to search the best 3, fastest 3, or all the available engines. You may select three engines based on speed, accuracy, or personal choice. ProFusion returns a single list, ranked by relevance. It is said to modify Boolean searches to work on different engines. (Boolean terms must be capitalized!)
Surfwax | Yahoo, AllTheWeb, YahooNews, WiseNut, AOL, MSN, OpenDirectory, Encarta, SearchEdu, Lycos, FirstGov, Excite, Thunderstone, and HotBot. | It includes engines that search both government sources (i.e., FirstGov) and current news sources, both of which are highly desirable. It does use FAST, but not Google, Northern Light, or AltaVista. It returns a consolidated list, with the URLs listed in order of relevance, and it also has several useful features, the best of which is called site snap. This allows you to click on the magnifying glass next to a site to obtain a summary of the site, which may help to produce a better focused search.
Vivisimo | Yahoo, FAST, AOL, MSN, OpenDirectory, Direct Hit, LookSmart, AskJeeves, Lycos, and HotBot. This engine also provides a number of useful options, including CNN, NYTimes, Britannica, and PubMed@NIH. | This search engine will automatically cluster the search results into folders. Yes! Even in chemistry it knows enough to organize enediynes into research, vita, chemistry departments, etc. A really great feature!
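
The star rating that Ixquick is described as using (one star for each engine that places a site in its top ten) is simple enough to sketch in Python. This is a guess at the general idea, not Ixquick's actual algorithm: the sample engines and URLs are hypothetical, and breaking ties alphabetically is an assumption made only to give a stable order.

```python
from collections import Counter
from typing import Dict, List, Tuple

def star_rank(results_by_engine: Dict[str, List[str]], top_n: int = 10) -> List[Tuple[str, int]]:
    """Award a URL one star for every engine that lists it in its top_n hits."""
    stars: Counter = Counter()
    for ranked_urls in results_by_engine.values():
        # set() so that a URL repeated by one engine earns only one star from it
        for url in set(ranked_urls[:top_n]):
            stars[url] += 1
    # Most stars first; ties broken alphabetically (an assumption)
    return sorted(stars.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical ranked results from three engines
sample = {
    "engine_a": ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    "engine_b": ["http://example.org/b", "http://example.org/c"],
    "engine_c": ["http://example.org/c"],
}
for url, count in star_rank(sample):
    print("*" * count, url)
```

A scheme like this rewards agreement among engines, so a site found by only one engine, however relevant, sinks to the bottom of the combined list.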

Comparing Meta-search Engines

This comparison was based on the logic in the previous article in this series, using search terms that have been found to give relatively few returns. The two terms selected were enediynes and "ionic solvents". To establish a benchmark, these two terms were searched on Google to serve as a basis of comparison for both the number of hits and the relevance of the hits returned. The meta-engines are listed in the table below roughly in order of preference, based on the number of sites returned and the apparent relevance of those sites.

Engine used | enediyne hits | "ionic solvents" hits (see note) | Relevance (compared to Google)
Google (the standard) | claimed 515, actual 328 | 142 | Standard
Vivisimo | 120 | 66 | excellent
Surfwax | 107 | 134 | very good
Ixquick | 37 ("best from 369"); see text below | 20 ("best from 82"); see text below | very good
ProFusion | 20 | 19 | very good
Mamma | 20 | 30 | good
Dogpile | 10 | 8 (+108 poor ones on other engines); see text below | good
MetaCrawler | 10 | 10 | fair

Note: A Google search on ionic + solvents gave about 17,500 hits

The first thing that must be noted is that even on search terms where the number of hits returned by a meta-engine would not be expected to be limited by the parameters of the engine, none of these engines gave as broad a search as Google alone. (Notice that Ixquick suggests that the hits returned are selected from a larger number found on the engines searched, but there did not seem to be any way to access this larger number of hits.) That suggests that the argument that using a meta-engine will allow one to cover more of the web is incorrect.

In this list, the Vivisimo engine seemed to be in a class by itself. The ability to organize hits into folders is an excellent feature by itself, and when this is combined with a comprehensive engine like FAST, it is a powerful combination. This meta-engine has recently received an award from Search Engine Watch, and it deserves it. Three of the meta-engines (Surfwax, Ixquick, and ProFusion) returned results that were, in the opinion of this reviewer, comparable to those returned by Google alone, and so were rated as very good on relevance. Mamma returned fewer hits than the first three, and the relevance seemed significantly poorer than Google, while Dogpile returned even fewer hits, but those returned were considered more relevant. For the "ionic solvents" search, Dogpile listed several secondary engines that gave 108 more hits, but they seemed so little related to what a chemist would want that they were considered not worth counting. Both Dogpile and Mamma seemed to be less useful than the three engines rated very good, so they were placed in a second, lower category. MetaCrawler gave few hits and those did not appear to be very relevant, and so it was placed in the lowest category.

One reason for choosing the compound term "ionic solvents" was to test whether the various meta-engines would carry the quotation marks through to the searches they performed. As may be seen in the note, eliminating the quotation marks or replacing them with a Boolean AND produced over 17,000 hits on Google, and so the relatively low number of hits on all the engines tested indicates that the quotation marks were, indeed, accepted by each of the search engines used.

Based on this small study, with only a few search terms, it would appear that Vivisimo is in the top class because of its ability to do a comprehensive search and group related hits into folders. Three of the meta-search engines, Surfwax, Ixquick, and ProFusion, are roughly comparable in usefulness for chemical searches; Mamma and Dogpile are less satisfactory but still might be of some use; and MetaCrawler seems to be a poor choice. The paid placements in MetaCrawler seemed to be especially distracting. None of these engines seems to really use Google, but the three rated very good, Surfwax, Ixquick, and ProFusion, do include alltheweb, which is powered by FAST, one of the most comprehensive engines. Surfwax had two features that might make it of special interest: the site snap feature and the inclusion of FirstGov, which searches government sites. Dogpile, Mamma, and MetaCrawler do not combine the hits from the various engines into a single list, nor do they attempt to correlate the relevance ratings of the different engines to provide an overall listing. A combined, ranked list is convenient and provides another reason for preferring Surfwax, Ixquick, or ProFusion.

Vivisimo alone may present a good answer to the question, "Why use a meta-engine?" Vivisimo was added to the list of meta-engines for this article relatively late, and so there has not been much time to explore its capabilities, but it appears to have a lot to recommend it. The other meta-search engines, however, did not seem to do as well as a good single engine, like Google, and since it is also harder to craft a really focused search phrase on a meta-engine, there seems to be no compelling reason to use one of them. The conservative advice is still to choose one comprehensive search engine, like Google, FAST, or Northern Light, and learn it well; how strong a case can be made for Vivisimo will be examined in later articles in this series.

