Currently, our system requires that the document scan be performed using the scanner's automatic feeder rather than a manual page-by-page scan. This ensures that all scanned pages of a presentation handout are placed in a single PDF file. As a side benefit, the scanned pages have minimal skew. Nevertheless, if a page-by-page scan is desired, skew correction should be performed prior to segmentation.
3.2 Template Matching
In this step, segmented slide candidate regions are compared
against 6 commonly used presentation handout layouts: 1, 2x1,
3x1, 2x2, 2x3, and 3x3 slides per page. Each scanned page is
represented by a feature vector, S(Pn), where Pn is page n and
mn is the number of slide candidate regions on Pn. For each of
the mn regions, S(Pn) holds the normalized width and height of
the candidate slide region and its x and y coordinates relative
to the coordinates of the first region in the raster scan order.
The same feature vector, S(Tx), is also computed for each handout
template Tx. Then, a directed distance between S(Pn) and S(Tx)
is computed.

Some scanned pages, particularly the last pages, may contain
fewer slides than the template Tx. Using the above
distance measure, instead of a correlation based measure for
example, ensures robust matching of scanned pages to the
correct template regardless of the number of slide regions on a
page. The matching handout template is found as the template
that has the smallest directed distance to the scanned page. For a
given input document, if at least two thirds of the pages match the same
template, the scanned document is identified as a valid
presentation handout. In that case, each slide in the handout is
segmented out and numbered based on the page number and the
raster scan order. This results in a collection of scanned slide
images, {S1, S2, ..., Sn}. These images are used for
presentation and slide matching as described in the following
sections. If the input document does not match any of the
handout templates, then further processing is not applied and the
server e-mails the scanned document to the user, without
modifying its contents.
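The matching rule can be illustrated with a minimal Python sketch. The directed distance used here (the mean, over the regions on a page, of the Euclidean distance to the nearest template region) is an assumed form chosen for illustration. The helper grid_template, its fill parameter, the row and column orientation of each layout, and all function names are hypothetical.

```python
from collections import Counter
from math import dist


def grid_template(rows, cols, fill=0.8):
    """Hypothetical template: a rows x cols grid of equal slide regions.

    Each region is (width, height, x_offset, y_offset), normalized to the
    page size, with offsets measured from the first region in raster order.
    """
    w, h = fill / cols, fill / rows
    return [(w, h, c / cols, r / rows) for r in range(rows) for c in range(cols)]


# The (rows, cols) orientation of each layout name is an assumption.
TEMPLATES = {
    "1": grid_template(1, 1),
    "2x1": grid_template(2, 1),
    "3x1": grid_template(3, 1),
    "2x2": grid_template(2, 2),
    "2x3": grid_template(2, 3),
    "3x3": grid_template(3, 3),
}


def directed_distance(page, template):
    """Assumed form: mean distance from each page region to its nearest
    template region. Template regions with no counterpart on the page
    add no penalty."""
    if not page:
        return float("inf")
    return sum(min(dist(r, t) for t in template) for r in page) / len(page)


def match_page(page, templates=TEMPLATES):
    """Return the name of the template closest to the page, or None if the
    page has no candidate regions."""
    if not page:
        return None
    return min(templates, key=lambda name: directed_distance(page, templates[name]))


def identify_handout(pages, templates=TEMPLATES):
    """Return the layout shared by at least two thirds of the pages, else None."""
    if not pages:
        return None
    votes = Counter(match_page(p, templates) for p in pages)
    name, count = votes.most_common(1)[0]
    return name if 3 * count >= 2 * len(pages) else None
```

Because the distance runs only from page regions to template regions, a last page that holds two of the four slides of a 2x2 layout still has zero distance to the 2x2 template in this sketch.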
3.3 Presentation Linking
In order to retrieve the presentation recording session where the scanned handouts were presented, we employ n-gram matching. Scanned slides are OCR'd and n-grams are formed with words that contain at least 4 characters. The n-grams are compared to n-grams of the text extracted from each recording session in the database. More details of this algorithm can be found in [8]. If there is more than one matching presentation recording, which may occur if the same slides were presented in more than one presentation session, then the recording with the most recent date and time is selected as the matching presentation.
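As a rough illustration of this linking step, the following Python sketch forms word n-grams from words of at least 4 characters and picks the most recent matching session. The algorithm itself is described in [8]. The n-gram length, the overlap score, the min_overlap threshold, and the (session_id, recorded_at, text) record layout are assumptions made for this example only.

```python
import re
from datetime import datetime


def word_ngrams(text, n=3, min_len=4):
    """Set of word n-grams built from words with at least min_len characters."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= min_len]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def link_presentation(scanned_text, sessions, n=3, min_overlap=0.5):
    """Return the id of the most recent session whose text shares enough
    n-grams with the OCR text of the scanned slides, or None.

    sessions: iterable of (session_id, recorded_at, text) tuples (assumed layout).
    """
    query = word_ngrams(scanned_text, n)
    if not query:
        return None
    matches = []
    for session_id, recorded_at, text in sessions:
        # Fraction of the scanned n-grams that also occur in the session text.
        overlap = len(query & word_ngrams(text, n)) / len(query)
        if overlap >= min_overlap:
            matches.append((recorded_at, session_id))
    # Several sessions may match; keep the one with the latest date and time.
    return max(matches)[1] if matches else None
```

Sorting the matches by recording time implements the tie-breaking rule stated above: when the same slides were shown in several sessions, the latest one is returned.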
3.4 Mapping of Slides with Handwriting
The segmented slide images from scanned handouts, {S1, S2, ..., Sn}, are mapped to the slide images captured by the
Presentation Recorder (PR) in the matching recording session. Since
each PR image is time-stamped with tm at capture time, we
can determine when each slide on the scanned handout was
presented. Using this information, each slide is linked to the
video stream.
We employ an edge histogram-based slide matching algorithm
for finding the PR images that match Sn. In [8], we employed a similar
slide matching algorithm to map PowerPoint slides to PR
images, which yielded a high accuracy. Here, we map scanned
slide images to PR images. In this case the difficulty is increased
because of image degradation caused by printing and scanning
and the existence of handwritten annotations.
To improve the slide matching accuracy, we identify regions
that potentially contain handwriting and exclude them from the
matching process. Given a slide image, text-like regions are
identified by finding strong edges with the Canny edge detector,
smearing the edges with a 64x2 smearing filter, thresholding,
and performing connected component analysis. The connected
components that do not possess a specified height-to-width
ratio are filtered out as non-text regions. Usually, text regions
with handwriting are less horizontal than machine-printed text
regions. Furthermore, because letters are connected in
handwriting more often than in machine print, the average
height-to-width ratio of connected components in a handwritten
text region is much smaller than that of machine print.
Motivated by these observations, we compute a fitted line Li in
the direction of the text region spread. We also compute two
measures from the text boxes, where N is the number of text
boxes, hi is the height, wi is the width, and nci is the number
of connected components (corresponding to letters) in text box i,
respectively. Finally, the text boxes that do not have horizontal
spread, or that have a low height-to-width ratio,
are marked as handwriting text regions. An example of
handwriting detection in a slide image is given in Figure 4.

The detected handwriting regions are ignored during slide matching, yielding a significant improvement in the matching accuracy. Note that our method cannot be used to detect other user markings such as arrows, lines, etc. Nevertheless, since our method is based on edge histograms, these markings do not affect matching accuracy as much as the edge-dense handwriting segments.
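The effect of ignoring handwriting regions can be sketched in Python as follows. This is not the edge histogram of [8]. It assumes a simple histogram of gradient orientations computed with central differences, an L1 distance between histograms, images of equal size given as 2-D lists of grayscale values, and the same ignore boxes applied to both the scanned slide and the PR images. All names and parameters are hypothetical.

```python
from math import atan2, hypot, pi


def edge_histogram(image, ignore=(), bins=8, min_strength=32):
    """Normalized histogram of edge orientations in a grayscale image.

    image:  2-D list of rows of gray values.
    ignore: boxes (x0, y0, x1, y1), end-exclusive, whose pixels are skipped,
            for example detected handwriting regions.
    """
    hist = [0] * bins
    height, width = len(image), len(image[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in ignore):
                continue
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if hypot(gx, gy) < min_strength:
                continue  # keep strong edges only
            angle = atan2(gy, gx) % pi  # orientation folded into [0, pi)
            hist[min(int(angle / pi * bins), bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * bins


def histogram_distance(a, b):
    """L1 distance between two histograms (an assumed choice)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def best_match(scanned, pr_images, ignore=()):
    """Index of the PR image whose edge histogram is closest to the scanned slide."""
    query = edge_histogram(scanned, ignore)
    return min(
        range(len(pr_images)),
        key=lambda m: histogram_distance(query, edge_histogram(pr_images[m], ignore)),
    )
```

In this sketch, a stroke drawn across a slide adds edges with new orientations to the histogram; passing the stroke's bounding box in ignore removes that contribution before the histograms are compared.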
3.5 Smart Handout Composition
Once each slide on a scanned handout is mapped to one or more images captured by the Presentation Recorder, the final PDF document is composed by including video key frames and media links. Recall that the PR images are time-stamped and synchronized with the video. First, video frames corresponding to these times are extracted from the video stream. Then, template matching results are used to determine the optimum positions for inserting video key frames in the scanned handouts. For each handout template, the preferred locations for video key frames are designated in advance. Before a key frame is inserted in the PDF file, luminance variance analysis is applied to the target region to detect the presence of the user's markings. If the luminance variance is lower than 16, the key frame is inserted as an opaque image; if it is higher than 16, the key frame is inserted with 50% transparency so that the user's markings remain visible. The Adobe Acrobat SDK is used for inserting key frames and links in the PDF file [9].
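The opacity decision can be written as a short Python sketch. The threshold of 16 comes from the text. The use of population variance, the luminance scale (0 to 255 is assumed), the handling of a variance of exactly 16, and the function name are assumptions of this example.

```python
from statistics import pvariance

VARIANCE_THRESHOLD = 16  # threshold stated in the text


def key_frame_opacity(region_luminance):
    """Opacity for a key frame placed over the given page region.

    region_luminance: non-empty 2-D list of luminance values for the region
    where the key frame would be inserted.
    Returns 1.0 (opaque) for a region that looks blank, and 0.5 (50%
    transparency) when the variance suggests user markings. A variance of
    exactly 16 is treated as marked, which is an assumption.
    """
    values = [v for row in region_luminance for v in row]
    return 1.0 if pvariance(values) < VARIANCE_THRESHOLD else 0.5
```

A blank margin with only slight scanner noise has a variance close to zero and receives an opaque key frame, while a few dark pen strokes raise the variance well above the threshold.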






