3. System Description
Many types of time-based media objects, such as video, audio, face, text, and other animations, can be converted into a static representation. In this section, we describe the encoding and decoding process for the case where the media object is a video sequence.
Figure 2 shows an overview of the process for generating a static representation of a video. Our implementation is based on an MPEG-4 codec; however, any compression scheme that uses reference frames and predictive coding can be employed. First, the video sequence is MPEG-4 coded such that there is only one reference (I-) frame. Because the reference frame is already a 2D representation, it can be printed directly. The MPEG-4 stream is then processed to remove the bits representing the I-frame; we refer to the remaining stream as the MPEG-4 bitstream*.
To represent the MPEG-4 bitstream*, we employ QR codes [8], each of which can encode up to 2,953 bytes of binary data. Depending on the barcode size and the maximum number of barcodes that can be printed, only a limited number of bits, BBITS, is available for binary data. Besides the MPEG-4 bits, the barcode contains HEADER information. The HEADER, of length HBITS, includes an identifier recognized by our application and other information, such as a chroma key for segmentation and the relative location of the reference frame. If the sum of the MPEG-4 bitstream* length and HBITS exceeds BBITS, the MPEG-4 encoder parameters are adjusted to use a larger quantization value for P-frames and a reduced frame rate, and the video clip is re-encoded until the desired MPEG-4 bitstream* length is achieved. Finally, the HEADER and the MPEG-4 bitstream* are encoded in the QR code [8].
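The re-encoding loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `encode` callback stands in for an MPEG-4 encoder that returns the bitstream* bytes for a given P-frame quantizer and frame rate, and the policy of raising the quantizer first and then halving the frame rate, as well as the default limits, are hypothetical choices.

```python
from typing import Callable, Tuple

QR_MAX_BYTES = 2953  # binary capacity of a single QR code, as stated in the text


def fit_to_budget(
    encode: Callable[[int, float], bytes],  # hypothetical encoder callback
    hbits: int,
    bbits: int,
    quant: int = 4,        # assumed starting P-frame quantizer
    fps: float = 15.0,     # assumed starting frame rate
    max_quant: int = 31,   # assumed upper limit for the quantizer
    min_fps: float = 1.0,  # assumed lowest acceptable frame rate
) -> Tuple[bytes, int, float]:
    """Re-encode with coarser settings until HBITS + bitstream* bits <= BBITS."""
    while True:
        payload = encode(quant, fps)
        if hbits + 8 * len(payload) <= bbits:
            return payload, quant, fps
        if quant < max_quant:
            quant += 1                    # coarser quantization of P-frames
        elif fps > min_fps:
            fps = max(min_fps, fps / 2)   # then reduce the frame rate
        else:
            raise ValueError("clip does not fit in the barcode budget")


if __name__ == "__main__":
    # Toy stand-in for an encoder: size shrinks with quantizer, grows with fps.
    toy_encoder = lambda q, f: bytes(int(1000 * f / q))
    budget = 1 * QR_MAX_BYTES * 8  # BBITS for one printed barcode
    data, q, f = fit_to_budget(toy_encoder, hbits=64, bbits=budget)
    print(len(data), q, f)
```

With several printed barcodes, BBITS would be the per-barcode capacity multiplied by the number of barcodes, minus any space reserved for error correction at the chosen QR level.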

Figure 2. Creation of a stand alone image-based representation from video.
An overview of the decoding and reconstruction process at the client device is presented in Figure 3. First, an image containing the QR codes and the reference frame is captured. The QR codes are then located and decoded. If a QR code has an invalid header, decoding is terminated; otherwise, the MPEG-4 bitstream* headers are decoded to obtain the video resolution.
Next, the reference frame is segmented from the captured image. Since all the motion and prediction error information is computed relative to the reference frame, correct registration of the reference frame is critical for good playback accuracy. Several methods can be employed for segmenting the reference frame. One method is to print distinct markers at the corners of the key frame, as shown in Figure 4.a. Another method is to perform chroma keying by printing a unique color around the reference frame, as illustrated in Figure 4.b. Alternatively, as shown in Figure 4.c, QR codes can be placed around the reference frame, and the reference frame location is obtained from the barcode decoding software. The HEADER indicates the method of segmentation as well as the chroma key value, the shape of the markers, and other information required for segmentation.
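As an illustration of the chroma keying option, the sketch below locates the reference frame as the non-key region enclosed by the key-colored border. It is a simplified example under strong assumptions: the image is a nested list of RGB tuples, the capture is axis-aligned (a real capture would also need dewarping), and the function name `locate_frame` and the tolerance value are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def locate_frame(
    image: List[List[Pixel]], key: Pixel, tol: int = 40
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1), inclusive, of the area framed by the chroma key."""

    def is_key(p: Pixel) -> bool:
        return all(abs(a - b) <= tol for a, b in zip(p, key))

    # Bounding box of the key-colored border.
    border = [(x, y) for y, row in enumerate(image)
              for x, p in enumerate(row) if is_key(p)]
    if not border:
        return None
    bx0, bx1 = min(x for x, _ in border), max(x for x, _ in border)
    by0, by1 = min(y for _, y in border), max(y for _, y in border)

    # Non-key pixels inside the border belong to the reference frame.
    inner = [(x, y) for y in range(by0, by1 + 1)
             for x in range(bx0, bx1 + 1) if not is_key(image[y][x])]
    if not inner:
        return None
    return (min(x for x, _ in inner), min(y for _, y in inner),
            max(x for x, _ in inner), max(y for _, y in inner))


if __name__ == "__main__":
    WHITE, GREEN, GRAY = (255, 255, 255), (0, 255, 0), (128, 128, 128)
    img = [[WHITE] * 8 for _ in range(8)]
    for y in range(1, 7):
        for x in range(1, 7):
            img[y][x] = GREEN          # printed chroma-key border
    for y in range(2, 6):
        for x in range(2, 6):
            img[y][x] = GRAY           # reference frame content
    print(locate_frame(img, GREEN))
```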
The reference key frame is segmented based on the method indicated in the HEADER, scaled to the video resolution, and dewarped in order to obtain the best representation of the reference (I-) frame. The I-frame and the MPEG-4 bitstream* are then passed to a modified MPEG-4 decoder. The modified decoder follows the MPEG-4 specification, except that the bits encoding the first reference frame are not decoded from the bitstream but obtained from another source. As a result, motion vectors and prediction errors are applied to the image captured from paper to obtain full-motion video.
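The idea of applying motion vectors and prediction errors to the captured reference frame can be illustrated with a toy block-based reconstruction. This is a deliberately simplified sketch, not an MPEG-4 decoder: it assumes a grayscale frame stored as nested lists, integer-pixel motion vectors with one vector per block, and a residual that has already been decoded to the pixel domain. The function name `reconstruct_p_frame` is hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def reconstruct_p_frame(
    ref: Frame,
    motion: List[List[Tuple[int, int]]],  # (dx, dy) per block
    residual: Frame,                      # prediction error per pixel
    block: int = 8,
) -> Frame:
    """Predict each pixel from the reference frame, then add the residual."""
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = motion[y // block][x // block]
            sy = min(max(y + dy, 0), h - 1)   # clamp to the frame borders
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = min(max(ref[sy][sx] + residual[y][x], 0), 255)
    return out


if __name__ == "__main__":
    # The captured and dewarped key frame plays the role of `ref`.
    ref = [[10 * y + x for x in range(4)] for y in range(4)]
    motion = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]  # content shifts by one pixel
    residual = [[0] * 4 for _ in range(4)]
    residual[0][0] = 5
    for row in reconstruct_p_frame(ref, motion, residual, block=2):
        print(row)
```

Because every later frame is predicted from this reference, any registration error in the captured key frame propagates into the reconstructed frames, which is why accurate segmentation matters.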

Figure 3. Decoding video via rendering of printed key frame.

Figure 4. Several techniques for printing key frames that help register the reference frame accurately: (a) placing markers at the corners of the printed frame, (b) framing the image with a chroma key color, and (c) positioning QR codes at the edges of the frame.






