The Reader's Helper: A Personalized Document Reading Environment

THE READER'S HELPER

The Reader's Helper (RH) application integrates existing technology (a WWW browser, keyword highlighting, and probabilistic reasoning) with a unique information visualization tool to support readers who read both online and paper documents. The RH uses information about each reader when evaluating the content of a document, and it calculates a relevance value to determine whether the document is applicable to that reader. The current prototype is composed of a specialized WWW browser and an annotation agent responsible for recognizing the reader's topics of interest in a document. The annotation agent learns what is important to the reader from a reader profile, which contains personal information about the reader. The following sections describe the electronic document browsing environment in terms of its user interface, followed by a description of the paper-based version of the RH.

The User Interface

One of the most important aspects of an electronic document reading environment is the user interface. The difficulty of reading electronic documents is well known [18]. Readers therefore often rely on the printed document because of the high resolution and flexibility offered by paper. A design goal for the RH was to provide the reader with an easy-to-use interface that emphasizes the relevant content of the document and, as a consequence, personalizes the document. This should increase not only the appeal of reading electronic documents but also the efficiency with which they can be consumed.

Three main methods work in concert to improve the readability of an electronic document. First, a relevance score is computed for each topic that is important to the reader with respect to a document. Each topic score is an estimate of the relevance of the document to the reader, offering a first appraisal of the document. This is directly associated with the horizontal reading trend mentioned earlier: most readers do not have time to completely analyze all of the documents they must process, so it is desirable to have a quantitative measure of each document's relevance. Second, a new information visualization tool, called the Thumbar, shows an overview of both the document and the results of annotating the document. The reader can therefore quickly navigate to the more relevant parts of a document based on the visual information present in a thumb-nail. Third, after the RH has completed its analysis, the document is automatically annotated to depict its most relevant portions. As the reader reads the document, key phrases are highlighted to help guide the reader through the text.

The Thumbar

Figure 1 shows the RH document browser displaying an HTML source document (from the CHI '97 Online Proceedings [19]). On the left-hand side of the display is the Thumbar. The Thumbar is a unique information visualization tool whose name is derived from the concept of a thumb-nail image combined with the functionality of a scrollbar. The reader may drag a lens up and down to reposition the view of the document in the document area, in the manner of a scrollbar. The Thumbar allows readers to look ahead in the text to see more of the document's structure and content. By using a Thumbar, a reader can quickly scan the document for a desired image, chart, or even text structure and scroll accordingly. The work by [22] supports the notion that humans are very good at recognizing small images. Recent work by [20] further exploits this concept in a system where icons of documents are used as the retrieval cues to a large document database. It was found that users could recognize the thumb-nail images of documents (based on the text structure and the images in the text) to retrieve several documents which "looked" like the document they were seeking.

The Thumbar is a dynamic representation of the document contained in the browser. Its contents and shape change as the document itself is personalized. For instance, after a document has been annotated, the Thumbar presents relevant keyword phrases as red lines (instead of the typical gray or black lines). This clearly indicates to the reader where the relevant information is located in the document, similar to the way attribute-mapped scroll-bars [24], TileBars [10] and the Mural [12] depict relevant information in a document. This visual information, however, can be changed. For example, when the reader "turns off" a concept that has scored well in the document, the corresponding red lines in the Thumbar change back to the normal gray or black lines, as if there were no annotation at that location at all. Thus, the reader can create a representation of the document based on any combination of concepts turned on or off. Another way of altering the annotation is the sensitivity threshold meter, which controls which concepts are active in the document. The reader sets a threshold, and only concepts whose similarity scores equal or surpass this threshold are visible as annotations.
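The combined effect of the concept toggles and the sensitivity threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RH implementation: the names Annotation, visible_annotations, and thumbar_colors are hypothetical, as is the use of similarity scores in the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A highlighted key phrase (all field names are hypothetical)."""
    concept: str  # topic of interest the phrase belongs to
    line: int     # index of the Thumbar line that contains the phrase
    phrase: str


def visible_annotations(annotations, scores, enabled, threshold):
    """Keep annotations whose concept is turned on and whose
    similarity score equals or surpasses the threshold."""
    return [
        a for a in annotations
        if a.concept in enabled and scores.get(a.concept, 0.0) >= threshold
    ]


def thumbar_colors(num_lines, shown):
    """Color a Thumbar line red if it carries a visible annotation,
    otherwise leave it in the normal gray."""
    red_lines = {a.line for a in shown}
    return ["red" if i in red_lines else "gray" for i in range(num_lines)]
```

In this sketch, turning a concept off simply removes it from the enabled set, and raising the threshold hides every concept that scores below it; in both cases the affected lines revert to gray.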

As with any HTML document, there is no formal pagination as long as the document is contained in the browser (the pagination is set when the document is sent to the printer). Instead, there is simply the concept of "a screenful" of the document, which can change if the user resizes the browser window. The Thumbar represents a reduced version of the original document based on a user-defined reduction ratio (e.g., in Figure 1 the reduction ratio is 6, which means that the Thumbar is 1/6th the size of the document area). All text in the Thumbar is drawn using a proportionally reduced line for each word. This method of portraying a navigable thumb-nail of text is similar to [4], used for software visualization.
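The role of the reduction ratio can be illustrated with a short Python sketch. It assumes that each word is described by a bounding box in document-area pixels and that the lens behaves like a linearly scaled scrollbar; the names WordBox, reduce_word, and lens_to_scroll are hypothetical and are not taken from the RH.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Position of a word in the document area, in pixels (hypothetical)."""
    x: float
    y: float
    width: float


def reduce_word(box, ratio):
    """Map a word to a horizontal line segment in the Thumbar.

    With a reduction ratio of 6, positions and lengths shrink to 1/6.
    Returns (x_start, x_end, y) in Thumbar pixels.
    """
    if ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    x_start = box.x / ratio
    return (x_start, x_start + box.width / ratio, box.y / ratio)


def lens_to_scroll(lens_top, ratio):
    """Convert the lens position in the Thumbar into the scroll
    offset of the document area (assumes a linear mapping)."""
    return lens_top * ratio
```

For example, with a ratio of 6 a word that is 30 pixels wide at position (60, 120) becomes a 5-pixel line starting at (10, 20) in the Thumbar, and dragging the lens to 50 pixels from the top scrolls the document area to an offset of 300 pixels.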