What's New in Online Search for Chemists - 2007

Part of The Alchemist's Lair Web Site
Maintained by Harry E. Pence, Professor of Chemistry, SUNY Oneonta, for the use of his students. Any opinions are totally coincidental and have no official endorsement, including from the people who sign my paychecks. Comments and suggestions are welcome (pencehe@oneonta.edu).

Last Revised September 25, 2007


(Return to the contents page of the Fall 2007 issue of the Computers in Chemical Education Newsletter.)

What's New in Online Search for Chemists 2007, Harry E. Pence, SUNY Oneonta, Oneonta, NY

Introduction
When the previous article in this series was published last fall, the world of search engines appeared to be moving towards a battle of epic proportions. Yahoo! and Microsoft both seemed to be maneuvering to challenge the dominance of Google. Yahoo! had purchased several major search engines and was expected to leverage the technical knowledge it had obtained to produce a Google-killing engine, and Microsoft was reported to be planning to spend whatever it took to achieve dominance in the search arena. Instead of a cliff-hanging battle, however, the competition has moved with the predictability of a WWE SmackDown wrestling match.

According to a recent web report that summarized the results from three major firms that attempt to measure Internet traffic, Google has continued to expand its share of Internet search to between 50 and 70%, depending on the rating service. Meanwhile, Yahoo! has managed to hold its own at 15-28%, and Microsoft is falling further behind with ten percent or less. The remaining competitors, like Ask and AOL, have also lost market share. In fact, the failure of Yahoo! and Microsoft to gain ground seems to have contributed to the resignation of the CEO of Yahoo! and to rumors that these two firms might form some sort of business partnership to better oppose the dominance of Google.

By no means does this suggest that these companies have just been marking time. Google continues to purchase small companies to add to its menu of search options, and has recently announced its own presentation software. In April 2006, Microsoft introduced a new search tool for online scholarly articles called Windows Live Academic Search. The initial reaction seems to be that Microsoft's answer to Google Scholar is too little and too late. A cursory examination by this author suggested that there was little here for chemists, but this is to be expected since Windows Live Academic Search is focused on papers in physics, electrical engineering, and computer science. Microsoft may eventually catch up, but past experience makes one skeptical. The various search engines are discussing a number of other changes, but since these will probably not have a direct impact on chemistry searchers for some time, the remainder of this article will focus on topics that may be of more immediate interest.

Personalized Search
Personalized search is receiving a great deal of attention because it may represent a new, improved way to determine the relevance of search results. At present, the results returned by a search engine depend only on the search term or phrase used, not on the individual user making the inquiry. A single search word or even a phrase is often too ambiguous to serve as a basis for determining the most relevant web sites. Some engines, like Google, already suggest a list of similar searches that have been performed when the searcher types the search term into the query box, but this is not usually considered to be true personalized search. In personalized search, the engine “learns” from the past behavior of the searcher (or of a group that includes the individual searcher) what sites are most likely to be relevant.

There are many roadblocks to developing personalized search. Most people do not create narrowly focused search phrases, and it may take several hundred searches before an individual has clicked on enough sites to allow an algorithm to make a significant improvement in the relevance of the results. If someone else shares the computer, it may become impossible to ever arrive at a more focused search. With these serious problems, why is personalized search such a hot topic? The answer is that web advertisers are very interested in using personalized search to develop a customer profile that allows for more accurate targeting of advertisements. Some customers may not be enthusiastic about allowing a business to know this much about them, but, judging from the popularity of Amazon, there are many who are willing to exchange decreased privacy for improved service.

What advantages does personalized search offer for chemists? Chemists often search for very specialized topics. The sites that are most relevant to the general population are not necessarily the most relevant to chemists. Of course, even chemists sometimes search for Britney Spears, but it would be helpful if a search engine recognized that when a chemist uses a search term like radical, it does not refer to the followers of Marx and Lenin.

The most promising proposal is that a group of individuals with similar information needs (like a group of chemists) might work together to create a composite user profile that would become the basis for evaluating sites when determining relevance. Ideally, a chemist might form a group consisting of only inorganic chemists or only medicinal chemists. Each time anyone in the group clicked on a web site listed in a search, this preference would be added to the database shared by the group. Over time, this should mean that searches by anyone in the group would become more focused on topics of interest to its members.
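
The group-profile idea can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how any existing service works; the class name, the method names, and the example.com-style site names are all invented for the example, and click counting is just one simple way a composite profile might be scored.

```python
from collections import Counter


class GroupProfile:
    """Hypothetical composite click history shared by a group of searchers."""

    def __init__(self):
        # site -> number of times any member of the group clicked on it
        self.clicks = Counter()

    def record_click(self, site):
        self.clicks[site] += 1

    def rerank(self, results):
        # sorted() is stable, so sites the group has never clicked
        # keep the order the search engine originally gave them.
        return sorted(results, key=lambda site: -self.clicks[site])


# A group of chemists clicks on chemistry pages after searching for "radical".
profile = GroupProfile()
for site in [
    "radicals-in-synthesis.example",
    "radicals-in-synthesis.example",
    "epr-spectroscopy.example",
]:
    profile.record_click(site)

# The general-purpose engine ranks the political meaning first.
engine_results = [
    "political-radicals.example",
    "epr-spectroscopy.example",
    "radicals-in-synthesis.example",
]
reranked = profile.rerank(engine_results)
print(reranked)
```

With only three recorded clicks the chemistry pages already move ahead of the political one, but a real profile would need far more data, which is exactly the roadblock described earlier.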

There are already two companies that claim to offer personalized search: Eurekster and Findory. Both offer personalized search results based on the past search behavior of everyone who is identified as being part of a social network. At present, neither seems to have attracted the attention of enough chemists to determine whether or not the idea will really be useful. (Added note: Findory will shut down on November 1, 2007.) Eurekster encourages users to create a customized search profile, called a Swicki, to focus the results of the search engine based on the behavior of a defined group of users (called an information nation) who have similar information needs. The good news is that this service is free and at this writing almost 900 Swickis have been created; the bad news is that only a few of these Swickis are related to chemistry, and the ones that do exist do not seem to be particularly useful. Whether or not Eurekster in particular or personalized search in general will evolve into something that is useful to chemists remains to be seen, but the concept does seem to be worth following.

Federated Search

On the other hand, federated search is a topic that is arousing considerable interest among librarians, and may well have some immediate use for chemists. The basic idea of federated search is easy to explain, but, as is often the case, the devil is in the details. Most college libraries subscribe to a number of proprietary databases, ranging from AnthroSource to Women’s Wear Daily. In general, these resources are provided by a variety of companies that seem unable to agree on a common set of search procedures, and sometimes seem more concerned with protecting their digital rights than with providing a useful service to the user. There may be a large number of these databases, depending on the size of the library. Their contents are not normally open to the web crawlers used by ordinary Internet search engines and so are sometimes called the invisible or deep web. The library user is expected to know which database or databases will be most useful and also to know the search syntax specific to each one.

Federated search software takes a single user query and sends it to a large number of these databases, then removes duplicate references, and presents the results in a unified format. As might be expected, the names of these programs run the spectrum from the simple, like Central Search, to the creative, like Agent Search or Deep Web. Two of the popular programs are called Web Feet and Web Feat, just to magnify the confusion. (For more about federated search and Web Feat, see an article in Information Today.) There is also an implementation of federated search that specializes in government documents, called Science.gov.
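
The three steps just described (send one query everywhere, remove duplicates, present a unified format) can be sketched as follows. This is a toy illustration and not the way any of the products named above is implemented: the two "databases" are small in-memory stand-ins with invented records, the function names are hypothetical, and matching duplicates by lower-cased title is an assumption made to keep the example short.

```python
def search_db_a(query):
    # Stand-in for one proprietary database; returns (title, year) records.
    records = [("Acid rain and lake chemistry", 2005), ("Sulfate aerosols", 2003)]
    return [{"title": t, "year": y} for t, y in records if query.lower() in t.lower()]


def search_db_b(query):
    # Stand-in for a second database with a different raw format: "year|TITLE".
    records = ["2005|ACID RAIN AND LAKE CHEMISTRY", "2006|Acid rain trends in the Northeast"]
    hits = []
    for line in records:
        year, title = line.split("|")
        if query.lower() in title.lower():
            hits.append({"title": title, "year": int(year)})
    return hits


def federated_search(query, databases):
    """Send one query to every database, drop duplicates, unify the format."""
    seen = set()
    merged = []
    for name, search in databases.items():
        for record in search(query):
            key = record["title"].strip().lower()  # crude duplicate test
            if key in seen:
                continue
            seen.add(key)
            merged.append({"title": record["title"].strip(),
                           "year": record["year"],
                           "source": name})
    return merged


results = federated_search(
    "acid rain",
    {"Database A": search_db_a, "Database B": search_db_b},
)
for r in results:
    print(r["year"], r["title"], "(" + r["source"] + ")")
```

Even in this toy version the details matter: the second database's copy of the 2005 article is discarded only because the titles happen to match after lower-casing, and the user never learns that Database B also held it.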

As anyone who has used these proprietary databases knows, they can be very helpful but also very confusing. Despite this, some librarians seem ambivalent about recommending federated search to library patrons, even when it is available. The main concern seems to be that by using a federated search facility instead of going to the individual databases, users will not recognize which database is best for a given topic and will not create searches that best fit the syntax required for each different database. In simple terms, some librarians want users to understand the array of available databases, while many users just want to get answers to their questions. For anyone who does use these campus databases, it may be well worth the effort to ask if a federated search engine is available (since the existence of this service may not be well publicized).

RSS for Chemists

A search-related topic that chemists should find immediately useful is a feature called RSS (an abbreviation for Really Simple Syndication). Although it is not strictly speaking a search function, RSS allows one to automatically aggregate information from a variety of web sites into a single web page. Many different web sites, ranging from blogs to scientific journals, offer an RSS feed, usually indicated by an orange rectangle that contains the letters XML or RSS, or by an orange rectangle with a dot and two white arcs. Even though RSS is relatively simple to set up and can be very useful, it does not appear to have been broadly adopted by chemists. Of course, the RSS journal feeds are most useful for those who have access to the online version of the journal and simply need to click through to the actual article.

In order to create an RSS desktop feed it is necessary to choose an aggregator, and then connect the desired feeds to the aggregator. The RSS aggregator allows an individual to subscribe to web sites, such as journals, blogs, newspapers, or magazines, then automatically monitors each web site and periodically reports what is new on the site. I would recommend NetVibes, but there are many other excellent aggregators available (http://www.newsonfeeds.com/faq/aggregators). In addition, some search engines, such as Yahoo! and Google, have added aggregator features. NetVibes works with both Mac and PC computers, is free, and creates a personal page with not only RSS feeds but also a number of other features that create a very attractive, customized information space. To install NetVibes, go to the homepage (www.netvibes.com) and follow the instructions.
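
For readers curious about what an aggregator does behind the scenes, here is a minimal sketch in Python. It is not how NetVibes or any other product is written; the feed text, the journal name, and the Aggregator class are invented for illustration. Only the element names (channel, item, title, link) come from the RSS 2.0 format itself. A real aggregator would download the feed from its URL at intervals; the sample is embedded here so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, standing in for what a journal's feed URL returns.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Chemistry Journal</title>
    <item>
      <title>New route to substituted pyridines</title>
      <link>http://journal.example/articles/1</link>
    </item>
    <item>
      <title>Kinetics of acid rain formation</title>
      <link>http://journal.example/articles/2</link>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Turn the feed's XML into a list of simple records."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title")
    return [
        {"source": source,
         "title": item.findtext("title"),
         "link": item.findtext("link")}
        for item in channel.findall("item")
    ]


class Aggregator:
    """Hypothetical aggregator that reports each item only once."""

    def __init__(self):
        self.seen_links = set()

    def new_items(self, xml_text):
        fresh = [i for i in read_feed(xml_text) if i["link"] not in self.seen_links]
        self.seen_links.update(i["link"] for i in fresh)
        return fresh


aggregator = Aggregator()
first_check = aggregator.new_items(SAMPLE_FEED)   # both items are new
second_check = aggregator.new_items(SAMPLE_FEED)  # nothing has changed
for item in first_check:
    print(item["source"] + ":", item["title"], item["link"])
```

The "page of complicated-looking code" that appears when one clicks an RSS button is simply text of this kind, which is why copying its URL into an aggregator is all that is required.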

Once the aggregator is installed, there are two main ways to set up a feed. In many cases, clicking on the orange RSS or XML button will lead to a set of aggregator choices, and it is only necessary to select NetVibes. For example, this procedure works with the list of ACS journals found on the web. In some cases, it is more difficult to set up an RSS feed. One such example is Science, which demonstrates how to use the “cut and paste” command to establish a feed. Clicking on the orange button on the Science page leads to a page of complicated-looking code instead of friendly buttons. The entire page is the actual feed. It is not necessary to worry about this code; simply highlight the URL for the entire page and copy it to the clipboard. Open the NetVibes page and click on the text “add a feed” in the upper left hand corner. This will open a window that says “add a URL.” Paste the URL that was just copied into this window, click on the ADD button, and the selected feed will appear on the NetVibes page.

Google News and some other search engines make it possible to set up ongoing automatic searches with RSS feeds. It is easy to set up an RSS feed that will continually update the search on a term such as acid rain. Go to the Google News site and click on the “advanced news search.” Insert the phrase in the advanced search window labeled “exact phrase” and click Google search. Click on the usual RSS button, then cut and paste the URL for the page of complicated code that appears. This RSS feed will now continually offer updates on the latest news articles about acid rain.

The Semantic Web

When Tim Berners-Lee originally created the World Wide Web, he envisioned that it would be accessible not just to humans but also to computers. He believed, and still does, that web sites should not just be written in normal language, but also in a format that computers could readily manipulate. Berners-Lee believes that we are moving towards a semantic web, a situation where all the data on the web would be written in a form that was readily accessible to software agents, so that computers could manipulate and analyze that data. If this vision comes to pass, it would affect many aspects of our lives, especially scientific publishing.

How would scientific publishing change if all scientific articles were available online as part of a semantic web? It would almost certainly provide a better integration between the traditional articles and the large scientific databases that are becoming so important in many disciplines. Even more importantly, it would create journals that changed with time, rather than the fixed publications of today. As new work appears, references and comments could be added to articles on that topic which had been published previously. Thus, the scientific literature would become a constantly evolving narrative representing many different voices instead of the unchanging format that currently exists. There is not enough space in this brief report to even scratch the surface of what this might mean, but Declan Butler has written at some length about this topic (including references) and the interested reader should refer to his article, which is available online at http://www.nature.com/nature/debates/e-access/Articles/opinion2.html
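
To give a flavor of what "a form readily accessible to software agents" might look like, the sketch below stores a few statements about articles as subject-predicate-object triples, the general shape used by RDF, the data model on which the semantic web is built. It is only a toy: the identifiers, the predicate names, and the statements themselves are invented for illustration, and a real system would use a proper RDF vocabulary and store rather than a Python list.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers and predicate names here are hypothetical.
triples = [
    ("article:1", "hasTitle", "Kinetics of acid rain formation"),
    ("article:1", "mentionsCompound", "compound:SO2"),
    ("article:2", "mentionsCompound", "compound:SO2"),
    ("article:2", "cites", "article:1"),
    ("compound:SO2", "hasName", "sulfur dioxide"),
]


def subjects(predicate, obj):
    """Everything that stands in the given relation to obj."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def objects(subject, predicate):
    """Everything the subject is related to by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Which articles discuss sulfur dioxide, whatever words their text uses?
about_so2 = subjects("mentionsCompound", "compound:SO2")

# Which later articles refer back to article 1?
citing = subjects("cites", "article:1")

print(about_so2, citing, objects("compound:SO2", "hasName"))
```

Because the "cites" statement can be added long after article 1 was published, a program reading the data would find the newer work attached to the older article, which is one small version of the evolving literature described above.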

Closing Thought

One of the problems with writing only one column a year is that the web changes so fast that it is difficult to select the few topics that are potentially most interesting to chemists. Hopefully, this article has provided a little grist for everyone’s mill, ranging from the immediately practical (RSS) to possible future developments. I would like to close with a question inspired by danah boyd, one of my favorite web commentators. Boyd says that traditional publishing models assume that knowledge can be congealed for future consumption in physical artifacts, like books, but this is no longer the case. Web sources make information fluid and flexible; that which is there today may be gone or changed tomorrow. The semantic web promises to make information sources even more mutable. This will surely change the way that we access and use information. How will this change the way we do research, and most importantly, does this change the way we teach?
