System Description

The overall system architecture is depicted in Figure 1. Capture and processing modules run on a stand-alone OBlog server so that they do not consume resources on the user's computer. The server controls the capture, compression, classification, linking, post-processing, and distribution of the multimedia information.

Figure 1. Overview of the OBlog system architecture.

The OBlog system stores and indexes four kinds of data: digital images uploaded from a digital camera's memory, audio/video captured in the office, images of the user's PC screen, and the user's text annotations. This multimedia data is stored in a database on the OBlog server and is browsable via a web interface. In addition, our system supports easy incorporation of printed documents and captured meetings, which reside in outside databases, into the user's blogs.

OBlog supports classification and indexing of a user's digital images. Digital images are playing an increasingly important role in the capture and sharing of visual information in the workplace. For example, an office worker may use a digital camera to capture the contents of a whiteboard, a document, information about a set of slides, a business card, or a scene with other people. Workplace studies have shown that people would use a digital camera in the office to capture these kinds of images if a camera were available [4].

However, organizing these images remains a challenge. In order to solve this problem, OBlog integrates a novel business image classification algorithm, described in Section 3, that identifies document images, whiteboard images, business card images, slide images, and regular images. Using the OBlog interface, users can search or browse images in different categories and easily incorporate them in their blogs. Moreover, based on the classification results, these images are automatically cross-linked to printed documents and meeting recordings.

Printed documents are automatically captured and stored on the user's personal computer as depicted in Figure 1. The OBlog server automatically links captured multimedia to the printed documents by matching each document's time-stamps (i.e., print time and capture time) to the times when audio or visual recordings and PC screen images were captured. PDFs of printed documents are also linked to the document images captured by a digital camera via content matching. This is achieved by OCR'ing the document images, forming horizontal and vertical n-grams (n=2) from the recognized words, and matching these n-grams to those of the electronic documents in the database. This matching method identifies the original electronic document with high accuracy if the document image contains sufficient OCR'able text.
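The content matching step can be illustrated with a minimal sketch. It assumes that the OCR engine returns, for each word, its text, a text-line index, and the left and right edges of its bounding box. The `Word` type, the overlap rule used for vertical bigrams, the scoring function, and the `min_score` threshold are all hypothetical choices made for illustration; they are not taken from the OBlog implementation.

```python
from collections import Counter
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    line: int    # text-line index from the OCR engine (assumed available)
    x0: float    # left edge of the word's bounding box
    x1: float    # right edge of the word's bounding box


def _group_by_line(words):
    by_line = {}
    for w in words:
        by_line.setdefault(w.line, []).append(w)
    return by_line


def horizontal_bigrams(words):
    """Count pairs of adjacent words on the same text line."""
    grams = Counter()
    for line_words in _group_by_line(words).values():
        ordered = sorted(line_words, key=lambda w: w.x0)
        for a, b in zip(ordered, ordered[1:]):
            grams[("H", a.text.lower(), b.text.lower())] += 1
    return grams


def vertical_bigrams(words):
    """Count pairs of a word and any horizontally overlapping word on the next line."""
    by_line = _group_by_line(words)
    grams = Counter()
    for line, line_words in by_line.items():
        for a in line_words:
            for b in by_line.get(line + 1, []):
                if a.x0 < b.x1 and b.x0 < a.x1:  # bounding boxes overlap in x
                    grams[("V", a.text.lower(), b.text.lower())] += 1
    return grams


def ngram_profile(words):
    return horizontal_bigrams(words) + vertical_bigrams(words)


def match_score(image_profile, doc_profile):
    """Fraction of the image's bigrams that also occur in the document."""
    total = sum(image_profile.values())
    if total == 0:
        return 0.0
    shared = sum((image_profile & doc_profile).values())  # multiset intersection
    return shared / total


def best_match(image_words, documents, min_score=0.5):
    """Return the id of the best-scoring document, or None if no score reaches min_score."""
    image_profile = ngram_profile(image_words)
    best_id, best = None, 0.0
    for doc_id, doc_words in documents.items():
        score = match_score(image_profile, ngram_profile(doc_words))
        if score > best:
            best_id, best = doc_id, score
    return best_id if best >= min_score else None
```

Because each bigram pairs two neighbouring words, a single OCR error corrupts only the few bigrams that contain the misread word, which is one plausible reason such a scheme tolerates imperfect recognition when enough text is present.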

In the OBlog system, automatic linking of document images to electronic documents enables many useful features. A document image captured during a meeting can be used to access the corresponding PDF document simply by clicking on the image. Similarly, the document images or the electronic document can easily be added to a blog. Moreover, when a user prints a document image from OBlog, a print dialog box is displayed that gives the user the option of printing either the document image or the entire PDF document that it is linked to, making access to the content seamless.

Meetings and presentations are captured by a system that resides in our conference room [5]. The audiovisual recordings and slides are stored in a database that is accessible by OBlog servers as shown in Figure 1. When a user captures slide images with a digital camera during a presentation and uploads them to the OBlog server, the slide images are automatically linked to the presentation recordings by matching the contents of each image to the contents of the slides captured during the presentation [6]. This way, a user can easily access presentation recordings by just clicking on the pictures she captured during that presentation.

Office events are captured by a USB camera and microphone system that is mounted on the wall of the office and connected to the OBlog server. The OBlog server is started manually and then captures audiovisual data continuously in real time. In order to detect real office conversations, we perform a simple audiovisual analysis. We detect motion in the room by computing the differences between consecutive frames, and conversations in the room by measuring the average amplitude of the audio in 30-second time frames. If either of these measures exceeds a threshold, an office event is detected and the first frame of the event is saved as a key frame. Note that as people move around, a single office conversation may generate many events and key frames. The parts of the video recording that do not contain events are spliced out of the video stream and discarded. In the blogging interface, key frames are presented to the user on the timeline. The user can either include individual key frames in their blog, which are linked to the start time of the video recording, or specify a range to incorporate into the blog. Currently, a range is selected by specifying a start and a stop key frame.
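The event detection rule can be sketched as follows. The sketch assumes that the recording has already been divided into 30-second windows, each holding a list of grayscale frames (as nested lists of pixel values) and a list of audio samples. The function names, the use of the mean absolute pixel difference as the motion measure, and the merging of consecutive active windows into one event are illustrative assumptions; the text above specifies only that a threshold is applied to each measure.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total, count = 0, 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def average_amplitude(samples):
    """Average absolute amplitude of the audio samples in one window."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def window_is_active(frames, samples, motion_threshold, audio_threshold):
    """A window is active if EITHER the motion or the audio measure exceeds its threshold."""
    motion = max(
        (frame_difference(a, b) for a, b in zip(frames, frames[1:])),
        default=0.0,
    )
    return motion > motion_threshold or average_amplitude(samples) > audio_threshold


def detect_events(windows, motion_threshold, audio_threshold):
    """Group consecutive active windows into events.

    windows: list of (frames, samples) tuples, one per 30-second window.
    Returns a list of (first_window, last_window, key_frame) tuples, where
    key_frame is the first frame of the event (None if the window has no frames).
    Differences across window boundaries are ignored in this simplified version.
    """
    def key_frame(index):
        frames = windows[index][0]
        return frames[0] if frames else None

    events, start = [], None
    for i, (frames, samples) in enumerate(windows):
        active = window_is_active(frames, samples, motion_threshold, audio_threshold)
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start, i - 1, key_frame(start)))
            start = None
    if start is not None:
        events.append((start, len(windows) - 1, key_frame(start)))
    return events
```

Windows that fall outside every returned event correspond to the portion of the recording that would be spliced out. A pause in both motion and speech ends the current event, so one conversation can yield several events and key frames, as noted above.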

A user's computer screen is recorded by tapping into the VGA output of the user's computer and capturing it with a video capture card on the OBlog server. A screen image is saved whenever there is a significant change on the screen. The captured screen images are OCR'ed and indexed by keywords.
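A minimal sketch of this save-and-index step is given below. The text does not state how a significant change is measured, so the sketch assumes a frame is saved when the fraction of noticeably changed pixels reaches a threshold. The `ScreenLog` class, the `pixel_delta` and `min_changed` parameters, the stop-word list, and the `ocr` callable are all hypothetical and stand in for whatever the actual system uses.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}  # illustrative list only


def changed_fraction(prev, curr, pixel_delta=16):
    """Fraction of pixels whose grayscale value differs by more than pixel_delta."""
    pairs = [(p, c) for rp, rc in zip(prev, curr) for p, c in zip(rp, rc)]
    if not pairs:
        return 0.0
    return sum(abs(p - c) > pixel_delta for p, c in pairs) / len(pairs)


class ScreenLog:
    def __init__(self, min_changed=0.1):
        self.min_changed = min_changed
        self.last_frame = None
        self.saved = []                # (timestamp, ocr_text) for each saved frame
        self.index = defaultdict(set)  # keyword -> positions in self.saved

    def offer(self, timestamp, frame, ocr):
        """Save and index the frame only if it differs enough from the last saved frame.

        ocr is a callable mapping a frame to its recognized text; it is invoked
        only for frames that are actually saved.
        """
        if (self.last_frame is not None
                and changed_fraction(self.last_frame, frame) < self.min_changed):
            return False
        self.last_frame = frame
        text = ocr(frame)
        position = len(self.saved)
        self.saved.append((timestamp, text))
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOPWORDS:
                self.index[word].add(position)
        return True

    def search(self, keyword):
        """Return the timestamps of saved screens whose text contains the keyword."""
        positions = self.index.get(keyword.lower(), ())
        return [self.saved[i][0] for i in sorted(positions)]
```

Comparing each new frame against the last saved frame, rather than the immediately preceding one, is a design choice in this sketch: it keeps a slow sequence of small changes from going unrecorded.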