2. System Description
The system architecture for the meeting recorder is shown in Figure 2. The hardware consists of a special capture device, a touch screen monitor (shown in Figure 3), and a PC. The capture device is composed of an omni-directional camera in the center and four microphones positioned at the corners. The camera uses a parabolic mirror to capture a panoramic view of the meeting in a single donut-shaped video stream. Audio signals are fed into a multi-channel sound card and processed in real time to determine the direction of speakers. The results are post-processed to produce a meta file that controls playback. The video, along with digitally mixed stereo audio, is sent to a video capture card and recorded as an MPEG-2 file. Encoding is done at a resolution of 640x480 and 30 fps.
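The text does not specify how speaker direction is computed from the four microphones. As an illustration only, the sketch below uses a generic time-delay approach, not necessarily the method used in this system: it estimates the delay across each of two orthogonal microphone pairs by brute-force cross-correlation and converts the two delays into an azimuth under a far-field assumption. The function names, the pairing of microphones along two perpendicular axes, and the `max_lag` parameter are all hypothetical.

```python
import math

def best_lag(ref, other, max_lag):
    """Lag (in samples) by which `other` trails `ref`,
    found by brute-force cross-correlation over [-max_lag, max_lag]."""
    def score(lag):
        return sum(ref[i] * other[i + lag]
                   for i in range(len(ref)) if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=score)

def azimuth_degrees(lag_x, lag_y):
    """Far-field azimuth in [0, 360) from delays along two orthogonal pairs.
    Assumption: both pairs have the same spacing, so the delays are
    proportional to cos(azimuth) and sin(azimuth).
    If both lags are zero the direction is undefined and 0.0 is returned."""
    return math.degrees(math.atan2(lag_y, lag_x)) % 360.0

def locate(pos_x, neg_x, pos_y, neg_y, max_lag=8):
    """Azimuth of the dominant source from four microphone signals.
    pos_*/neg_* are the microphones at the positive and negative end of
    each axis (hypothetical layout, e.g. the two diagonals of a square)."""
    lag_x = best_lag(pos_x, neg_x, max_lag)  # > 0 when the source is on the +x side
    lag_y = best_lag(pos_y, neg_y, max_lag)  # > 0 when the source is on the +y side
    return azimuth_degrees(lag_x, lag_y)
```

A practical implementation would work on short windowed frames, ignore frames without speech, and use a more robust correlation measure; this sketch only shows the geometry.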
Recording is controlled via a simple VCR-like interface on a 6.5-inch color touch screen panel (see Figure 3). Once recording has started, the interface shows the elapsed recording time, the recording time remaining on the hard drive, and a video preview window. Every recording session is automatically assigned an ID number. When recording is stopped, the results of sound localization and the video are post-processed to produce a meta file.
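The remaining-time readout can be approximated from the free disk space and the encoder bitrate. The following is a minimal sketch under the assumption of a constant total bitrate; the paper does not state the MPEG-2 bitrate, so the value used in the usage note is purely illustrative, and both function names are hypothetical.

```python
def remaining_seconds(free_bytes, total_kbps):
    """Whole seconds of recording that fit in `free_bytes`
    at a constant combined audio/video bitrate of `total_kbps` kbit/s."""
    return (free_bytes * 8) // (total_kbps * 1000)

def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS for display."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, with 9 GB free and an assumed bitrate of 6000 kbit/s, `format_hms(remaining_seconds(9_000_000_000, 6000))` gives `"03:20:00"`. In Python, the free space could be obtained from `shutil.disk_usage(path).free`.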
The results of sound localization are processed to produce viewing parameters for a virtual camera. A special viewer, shown in Figure 4, uses these parameters to automatically center on the speaker during playback. Users can also override the automatic view manually using pan, tilt, and zoom operations. A compass at the bottom of the display shows the orientation of the current view with respect to the entire panorama.
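One plausible way to turn localization results into virtual camera parameters is to emit a new pan keyframe only when the speaker direction moves far enough from the current view, so that small fluctuations do not make the camera jitter. The sketch below illustrates this idea; the keyframe representation, the function names, and the 20-degree threshold are assumptions and are not taken from the system described here.

```python
def angular_difference(a_deg, b_deg):
    """Smallest signed rotation from a_deg to b_deg, in (-180, 180].
    Handles wrap-around at the 0/360 seam of the panorama."""
    d = (b_deg - a_deg) % 360.0
    return d - 360.0 if d > 180.0 else d

def pan_keyframes(detections, min_change_deg=20.0):
    """Reduce (time_s, azimuth_deg) detections to pan keyframes.
    A keyframe is added only when the new direction differs from the
    last keyframe by at least `min_change_deg` (hypothetical threshold)."""
    keyframes = []
    for time_s, azimuth in detections:
        if (not keyframes or
                abs(angular_difference(keyframes[-1][1], azimuth)) >= min_change_deg):
            keyframes.append((time_s, azimuth % 360.0))
    return keyframes
```

The same `angular_difference` helper could drive a compass display, since it gives the offset of the current view from any reference direction without discontinuities at the seam.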

Figure 2. An overview of the meeting recorder system.

Figure 3. A touch screen controlled meeting recorder.
The data on speaker directions is also used in combination with skin detection to extract face images of meeting participants. Background images are extracted from the video to identify the meeting location. This information is displayed in a meeting description document in HTML format, along with user-added annotations. The audio is further analyzed to detect significant events. Based on motion analysis performed on the compressed data stream, events involving a large amount of spatial activity are identified. All the information associated with the meeting is written to a meta file. The video and meta file are archived and made available on a database server.
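To make the combination of speaker direction and skin detection concrete, the sketch below restricts the search to the panorama columns around the localized direction and measures how many pixels there look like skin. The skin test is a commonly used RGB threshold heuristic substituted for illustration; the paper does not say which skin model is used. The panorama representation (rows of RGB tuples), the window size, and all function names are assumptions.

```python
def is_skin(r, g, b):
    """Simple RGB threshold heuristic for skin under ordinary lighting
    (a stand-in, not the detector used in the system)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def speaker_columns(azimuth_deg, width, half_window_deg=15.0):
    """Panorama columns within +/- half_window_deg of the speaker
    direction, wrapping around the 0/360 seam."""
    center = int((azimuth_deg % 360.0) * width / 360.0)
    half = int(half_window_deg * width / 360.0)
    return [(center + dx) % width for dx in range(-half, half + 1)]

def skin_fraction(panorama, azimuth_deg, half_window_deg=15.0):
    """Fraction of skin-colored pixels in the window around the speaker.
    `panorama` is a list of rows, each a list of (r, g, b) tuples."""
    cols = speaker_columns(azimuth_deg, len(panorama[0]), half_window_deg)
    hits = sum(is_skin(*row[c]) for row in panorama for c in cols)
    return hits / (len(panorama) * len(cols))
```

A window with a high skin fraction would then be a candidate region from which to crop a face image; a real system would also need to group skin pixels into connected regions and check their size and shape.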

Figure 4. Meeting viewer interface.






