Currently, our system requires that the document be scanned using the scanner's automatic feeder rather than manually, page by page. This ensures that all scanned pages of a presentation handout are placed in a single PDF file. As a side benefit, the scanned pages have minimal skew. If page-by-page scanning is desired, skew correction should be performed prior to segmentation.

3.2 Template Matching

In this step, segmented slide candidate regions are compared against 6 commonly used presentation handout layouts: 1, 2x1, 3x1, 2x2, 2x3, and 3x3 slides per page. Each scanned page is represented by a feature vector S(Pn), where Pn is page n and mn is the number of slide candidate regions on Pn. Each candidate slide region on page n contributes its normalized width and height, together with its x and y coordinates relative to the coordinates of the first region in raster scan order. The same feature vector, S(Tx), is also computed for each handout template Tx. Then, a directed distance between S(Pn) and S(Tx) is computed.

Some scanned pages, particularly the last pages, may contain fewer slides than the template Tx. Using the directed distance measure rather than, for example, a correlation-based measure ensures robust matching of scanned pages to the correct template regardless of the number of slide regions on a page. The matching handout template is the template that has the smallest directed distance to the scanned page. For a given input document, if at least two-thirds of the pages match the same template, the scanned document is identified as a valid presentation handout. In that case, each slide in the handout is segmented out and numbered based on the page number and the raster scan order. This results in a collection of scanned slide images, {S1, S2, ..., Sn}. These images are used for presentation and slide matching as described in the following sections. If the input document does not match any of the handout templates, no further processing is applied and the server e-mails the scanned document to the user without modifying its contents.
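A minimal sketch of this matching step is given here for illustration. The exact directed distance is not reproduced in this text, so the sketch assumes a Hausdorff-style form: the sum, over the regions of a page, of the Euclidean distance to the nearest template region. The helper names (`grid_features`, `directed_distance`, `best_template`, `classify_document`), the 0.8 fill factor, and the reading of a layout such as 2x3 as rows by columns are all assumptions made for the example.

```python
import math
from collections import Counter

def grid_features(rows, cols, fill=0.8):
    """Hypothetical template: one (width, height, dx, dy) tuple per slide,
    in raster scan order, with offsets relative to the first region."""
    w, h = fill / cols, fill / rows
    return [(w, h, c / cols, r / rows) for r in range(rows) for c in range(cols)]

# The six layouts named in the text; (rows, cols) orientation is assumed.
TEMPLATES = {
    "1": grid_features(1, 1),
    "2x1": grid_features(2, 1),
    "3x1": grid_features(3, 1),
    "2x2": grid_features(2, 2),
    "2x3": grid_features(2, 3),
    "3x3": grid_features(3, 3),
}

def directed_distance(page, template):
    """Assumed form: each page region is charged the distance to its
    nearest template region, so missing slides add no penalty."""
    return sum(min(math.dist(r, t) for t in template) for r in page)

def best_template(page):
    """Name of the template with the smallest directed distance to the page."""
    return min(TEMPLATES, key=lambda name: directed_distance(page, TEMPLATES[name]))

def classify_document(pages):
    """Return the template name if at least two-thirds of the pages agree,
    otherwise None (the document is not treated as a handout)."""
    if not pages:
        return None
    name, count = Counter(best_template(p) for p in pages).most_common(1)[0]
    return name if 3 * count >= 2 * len(pages) else None
```

Because the distance is directed from the page to the template, a last page holding a single slide of a 2x2 handout still matches the 2x2 template, which is the behavior the text attributes to the directed measure.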

3.3 Presentation Linking

In order to retrieve the presentation recording session where the scanned handouts were presented, we employ n-gram matching. Scanned slides are OCR'd and n-grams are formed with words that contain at least 4 characters. The n-grams are compared to n-grams of the text extracted from each recording session in the database. More details of this algorithm can be found in [8]. If there is more than one matching presentation recording, which may occur if the same slides were presented in more than one presentation session, then the recording with the most recent date and time is selected as the matching presentation.
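The retrieval step can be sketched as follows. The actual algorithm is described in [8] and is not reproduced here, so this sketch assumes word bigrams, a score equal to the number of shared n-grams, and a tie on the best score as the meaning of "more than one matching presentation recording". The function names and the `(session_id, recorded_at, text)` tuple layout are hypothetical.

```python
import re
from datetime import datetime

def word_ngrams(text, n=2, min_len=4):
    """Set of word n-grams built only from words of at least min_len characters."""
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text) if len(w) >= min_len]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def link_presentation(scanned_text, sessions, n=2):
    """sessions: list of (session_id, recorded_at, text) tuples.
    Returns the id of the best-scoring session, preferring the most
    recent recording on ties, or None when nothing overlaps."""
    query = word_ngrams(scanned_text, n)
    scored = [(len(query & word_ngrams(text, n)), recorded_at, sid)
              for sid, recorded_at, text in sessions]
    best = max((score for score, _, _ in scored), default=0)
    if best == 0:
        return None
    # Among equally good matches, keep the latest date and time.
    return max((t, sid) for score, t, sid in scored if score == best)[1]
```

For example, if the same slides were presented in two sessions, both sessions score equally against the OCR text and the later one is returned.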

3.4 Mapping of Slides with Handwriting

The segmented slide images from the scanned handouts, {S1, S2, ..., Sn}, are mapped to the slide images captured by the Presentation Recorder (PR) in the matched recording session. Since each PR image is time-stamped with tm at capture time, we can determine when each slide on the scanned handout was presented. Using this information, each slide is linked to the video stream.

We employ an edge histogram-based slide matching algorithm to find the PR images that match each Sn. In [8], we employed a similar slide matching algorithm to map PowerPoint slides to PR images, which yielded high accuracy. Here, we map scanned slide images to PR images. This case is more difficult because of the image degradation caused by printing and scanning and the presence of handwritten annotations.

To improve the slide matching accuracy, we identify regions that potentially contain handwriting and exclude them from the matching process. Given a slide image, text-like regions are identified by finding strong edges with the Canny edge detector, smearing the edges with a 64x2 smearing filter, thresholding, and performing connected component analysis. Connected components that do not have a specified height-to-width ratio are filtered out as non-text regions. Text regions with handwriting are usually less horizontal than machine-printed text regions. Furthermore, because letters are connected more often in handwriting than in machine print, the average height-to-width ratio of the connected components in a handwritten text region is much smaller than in machine print. Motivated by these observations, we compute a fitted line Li in the direction of the text region spread. We also compute height-to-width ratio statistics from N, the number of text boxes, and from hi, wi, and nci, the height, the width, and the number of connected components (corresponding to letters) in text box i, respectively. Finally, the text boxes that do not have a horizontal spread, or that have a low height-to-width ratio, are marked as handwriting text regions. An example of handwriting detection in a slide image is given in Figure 4.
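The two cues can be illustrated with a short sketch. The formulas and thresholds are not recoverable from this text, so everything specific here is an assumption: the line is fitted by least squares through the connected-component centers, the ratio is taken as hi / (wi / nci) (box height over average component width), and `max_angle` and `min_ratio` are hypothetical thresholds chosen only for the example.

```python
import math

def fitted_angle_degrees(centers):
    """Absolute angle, in degrees from the horizontal, of the least-squares
    line through the component centers (x, y) of one text box."""
    n = len(centers)
    if n < 2:
        return 0.0  # a single component gives no direction; treat as horizontal
    mx = sum(x for x, _ in centers) / n
    my = sum(y for _, y in centers) / n
    sxx = sum((x - mx) ** 2 for x, _ in centers)
    sxy = sum((x - mx) * (y - my) for x, y in centers)
    if sxx == 0:
        return 90.0  # all centers share one x: a vertical spread
    return abs(math.degrees(math.atan2(sxy, sxx)))

def is_handwriting(box, max_angle=5.0, min_ratio=0.5):
    """box: dict with 'width', 'height', and 'centers' (one per component).
    Flags boxes that are not horizontal or whose components are wide
    relative to their height, as connected handwriting tends to be."""
    angle = fitted_angle_degrees(box["centers"])
    ratio = box["height"] * len(box["centers"]) / box["width"]  # assumed h / (w / nc)
    return angle > max_angle or ratio < min_ratio
```

A machine-printed line with many narrow components on one baseline passes both checks, while a slanted or heavily connected scribble fails at least one of them.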

Figure 4

The detected handwriting regions are ignored during slide matching, yielding a significant improvement in the matching accuracy. Note that our method cannot be used to detect other user markings, such as arrows and lines. Nevertheless, since slide matching is based on edge histograms, these markings do not affect the matching accuracy as much as the edge-dense handwriting segments.
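To make the role of the excluded regions concrete, the following sketch shows one possible edge-histogram matcher with an ignore mask. It is not the algorithm of [8]: the 2x2 cell grid, the two orientation bins per cell, the simple pixel differences used as gradients, the edge threshold, and the L1 histogram distance are all assumptions, and the images are assumed to be equally sized 2D lists of grayscale values.

```python
def edge_histogram(img, ignore=None, grid=2, thresh=32):
    """Normalized histogram of strong edges: for each grid cell, bin 0 counts
    edges dominated by the x-difference and bin 1 those dominated by the
    y-difference. Pixels flagged in `ignore` are skipped."""
    rows, cols = len(img), len(img[0])
    hist = [0.0] * (grid * grid * 2)
    for y in range(rows - 1):
        for x in range(cols - 1):
            if ignore and ignore[y][x]:
                continue  # e.g. a detected handwriting region
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if max(abs(gx), abs(gy)) < thresh:
                continue  # not a strong edge
            cell = (y * grid // rows) * grid + (x * grid // cols)
            hist[2 * cell + (0 if abs(gx) >= abs(gy) else 1)] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def histogram_distance(a, b):
    """L1 distance between two edge histograms."""
    return sum(abs(p - q) for p, q in zip(a, b))

def best_match(scanned, pr_images, ignore=None):
    """Index of the PR image closest to the scanned slide; the same mask
    is applied to both sides so that equal areas are compared."""
    target = edge_histogram(scanned, ignore)
    return min(range(len(pr_images)),
               key=lambda i: histogram_distance(target, edge_histogram(pr_images[i], ignore)))
```

With the mask in place, a scribble confined to the masked area leaves the scanned slide's histogram identical to that of the clean slide, which is the effect the exclusion step is meant to achieve.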

3.5 Smart Handout Composition

Once each slide on a scanned handout is mapped to one or more images captured by the Presentation Recorder, the final PDF document is composed by including video key frames and media links. Recall that the PR images are time-stamped and synchronized with the video. First, video frames corresponding to these times are extracted from the video stream. Then, template matching results are used to determine the optimum positions for inserting video key frames in the scanned handouts. For each handout template, the preferred locations for video key frames are designated in advance. Before a key frame is inserted in the PDF file, luminance variance analysis is applied to the target region to detect the presence of user markings. If the luminance variance is lower than 16, the key frame is inserted as an opaque image. If it is higher than 16, the key frame is inserted with 50% transparency so that the user's markings remain visible. The Adobe Acrobat SDK is used for inserting key frames and links in the PDF file [9].
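The opacity decision can be sketched in a few lines. The threshold of 16 comes from the text; the use of population variance, 8-bit luminance values, and the treatment of a variance of exactly 16 as "marked" are assumptions, and `key_frame_opacity` is a hypothetical helper name.

```python
from statistics import pvariance

def key_frame_opacity(region_luma, threshold=16.0):
    """region_luma: 2D list of luminance values for the target region.
    Returns 1.0 (opaque) for a clean region and 0.5 (50% transparency)
    when the variance suggests that user markings are present."""
    flat = [v for row in region_luma for v in row]
    # A variance of exactly `threshold` is treated as marked (assumption).
    return 1.0 if pvariance(flat) < threshold else 0.5
```

A blank area of scanned paper has nearly uniform luminance and receives an opaque key frame, whereas an area containing pen strokes mixes dark and light pixels and receives a semi-transparent one.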