5. Conclusions
We presented the design and implementation of a portable meeting recorder. Although the current prototype requires a small PC and is not easily moved, it is an excellent test bed for developing the algorithms that will be needed once suitably small devices become available. We also described novel algorithms for meta data extraction, including a four-channel sound localization technique, a view selection method, and a meeting location recognition technique. Finally, we described a meeting viewer interface (MuVIE) that displays the meta data and the transcript, as well as views of audio and video activity in a meeting. MuVIE lets users easily find information in a recorded meeting. It also helps overcome people's natural reluctance to search for information in a medium that is difficult to navigate.
The prototype system has been in regular use in our lab for nearly five months. The reliable capture system, coupled with a web-based retrieval interface, has produced data that is easy to use in common office applications.