3. Image Classification

In this section, we describe the key technology component of the OBlog system: the automatic image classification algorithm, which allows users to easily browse business images and enables automatic linking of images to the right content. The algorithm organizes business images captured with a digital camera into five groups: document images, whiteboard images, business card images, slide images, and regular images.

3.1. Feature Extraction and Classification

First, text regions are identified, skew correction is performed, and the text regions are binarized by adaptive thresholding. A connected component height histogram is then computed from the text regions; this histogram is useful for separating machine print from handwriting. From the component height histogram, we compute the 2nd and 3rd order X moments and the spread of the histogram bins. These features help differentiate whiteboard images from document images.
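The histogram features can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: "X moments" is read here as the central moments of the height distribution over histogram bins, the bin width of 2 pixels is arbitrary, "spread" is taken to be the number of occupied bins, and the function name is hypothetical.

```python
from collections import Counter


def height_histogram_features(heights, bin_width=2):
    """Hypothetical sketch: moment and spread features from connected
    component heights (in pixels) measured in binarized text regions."""
    if not heights:
        raise ValueError("need at least one connected component")
    # Histogram of component heights; each bin covers `bin_width` pixels.
    hist = Counter(h // bin_width for h in heights)
    total = sum(hist.values())
    # Treat the normalized histogram as a distribution over bin indices
    # (assumed interpretation of the 2nd and 3rd order moments).
    mean = sum(b * n for b, n in hist.items()) / total
    m2 = sum(((b - mean) ** 2) * n for b, n in hist.items()) / total
    m3 = sum(((b - mean) ** 3) * n for b, n in hist.items()) / total
    # Assumed spread measure: how many distinct height bins are occupied.
    spread = len(hist)
    return {"m2": m2, "m3": m3, "spread": spread}
```

Uniform machine print, e.g. heights `[10, 10, 11, 10, 11, 10]`, falls into a single bin with zero variance, whereas irregular handwriting heights such as `[8, 15, 22, 11, 30, 9]` occupy several bins and give a larger second moment.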

Because letters are connected more often in handwriting than in machine print, we also compute the average height-to-width ratio of connected components. Joined letters form wide components, so whiteboard images typically have a low connected component height-to-width ratio in text regions, whereas document, slide, and business card images have a high ratio.
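A minimal sketch of the height-to-width feature is shown below, assuming the binarized text region is given as a list of rows of 0/1 pixels and that components are 8-connected; both the input format and the function name are illustrative assumptions, not the system's actual interface.

```python
from collections import deque


def mean_height_to_width_ratio(binary):
    """Hypothetical sketch: label 8-connected foreground components in a
    binarized text region (list of rows of 0/1) and return the average
    bounding-box height-to-width ratio of the components."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    ratios = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill of one component, tracking its bounding box.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            height = bottom - top + 1
            width = right - left + 1
            ratios.append(height / width)
    # Average over all components; 0.0 if the region has no foreground pixels.
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Two separate vertical strokes (as in printed letters) each have ratio 3 on a 3-row region, while the same strokes joined at the bottom (as in cursive writing) form one component with ratio 1, which is the effect the feature is meant to capture.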

Text is extracted using a commercial OCR package. A number of text features are then computed, such as the text confidence, the number of characters, the ratio of words starting with capital letters to the total number of words, the ratio of words starting with numerical characters to the total number of words, and the ratio of text lines starting with a bullet point to the total number of text lines. These features are useful for discriminating among document, business card, and slide images. Layout features, obtained from horizontal and vertical projections of the text lines, as well as text and background colors, are also computed and used as features.

For classification, all image features are normalized and classified with Support Vector Machines (SVMs) [7]. Because an SVM is a binary classifier, we build a multi-class classifier by training one SVM for each pair of semantic classes, for example business card images vs. document images, regular images vs. document images, and so on. With five classes, this results in 10 SVM classifiers. Each class takes part in 4 of these pairwise classifiers, and an input image is assigned to a semantic class if all 4 of them agree on that class.
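Several of the OCR-derived text features can be sketched as below. This is an illustration under stated assumptions: the OCR output is taken to be a plain string with one text line per newline, the set of bullet glyphs is a guess, and OCR confidence is omitted because it depends on the specific OCR package. The function and constant names are hypothetical.

```python
# Assumed set of bullet glyphs; a real system would match its OCR engine's output.
BULLETS = ("\u2022", "\u25cf", "\u25aa", "-", "*")


def text_features(ocr_text):
    """Hypothetical sketch of a few OCR-derived features; `ocr_text` is the
    recognized text with one text line per newline."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    # Words are whitespace-separated tokens; lone bullet glyphs are not words.
    words = [w for w in ocr_text.split() if w not in BULLETS]
    n_words = len(words) or 1  # avoid division by zero on empty text
    n_lines = len(lines) or 1
    return {
        # Number of non-whitespace characters in the recognized words.
        "num_chars": sum(len(w) for w in words),
        "capitalized_ratio": sum(w[0].isupper() for w in words) / n_words,
        "numeric_start_ratio": sum(w[0].isdigit() for w in words) / n_words,
        "bullet_line_ratio": sum(ln.startswith(BULLETS) for ln in lines) / n_lines,
    }
```

For a slide-like text such as `"Agenda\n\u2022 Market Overview\n\u2022 Q3 Results\n\u2022 Next Steps"`, three of the four lines start with a bullet and every word is capitalized, while a business card with phone numbers yields a high ratio of words starting with digits.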

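The pairwise voting rule for the 10 SVM classifiers can be sketched as follows. The trained SVMs are replaced by a hypothetical interface: a dictionary mapping each class pair to a callable that returns the winning class. The text does not say what happens when no class receives all 4 votes, so returning `None` in that case is an assumption.

```python
from itertools import combinations

CLASSES = ["document", "whiteboard", "business_card", "slide", "regular"]


def classify_one_vs_one(features, pairwise):
    """Sketch of pairwise (one-vs-one) voting with a unanimity requirement.

    `pairwise` maps each class pair (a, b), in CLASSES order, to a callable
    returning either a or b for a feature vector; in the described system
    these would be the 10 trained SVMs (hypothetical interface here).
    """
    votes = {c: 0 for c in CLASSES}
    for a, b in combinations(CLASSES, 2):  # C(5, 2) = 10 pairs
        winner = pairwise[(a, b)](features)
        votes[winner] += 1
    # Each class takes part in 4 pairs; accept it only if it wins all 4.
    needed = len(CLASSES) - 1
    for c, n in votes.items():
        if n == needed:
            return c
    return None  # assumption: no unanimous class, image left unlabeled
```

At most one class can win all 4 of its pairwise decisions, because every other class has lost at least the decision against that class, so the rule never produces a tie.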
3.2. Performance

We evaluated the performance of our classifier on a database of 1025 business images. Our database contained 151 documents, 132 business cards, 278 slides, 80 whiteboards, and 384 regular images captured during conference or tradeshow trips. Regular images include both indoor and outdoor (scenery) images.

Ten SVM classifiers were trained with 30 images from each class. The classification results are presented in Table 1. Here, precision is the ratio of correctly labeled images in a given category to all images labeled as being in that category, and recall is the ratio of correctly labeled images in a given category to all images in that category. As can be seen from the table, regular, slide, and whiteboard images were identified with a recall of 95% or higher, while 94% of document images and 92% of business card images were correctly labeled. For document images, we observed that black-and-white documents were classified very accurately; misclassifications occurred mostly on images of colored magazine pages containing large fonts and many photos. Moreover, some regular images taken at conferences with posters in the background were misclassified as whiteboard images, which lowers the precision for the whiteboard category. In the future, a separate class may be considered for images with posters. The overall correct classification rate of our business image classifier is above 95%.
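The precision and recall definitions above can be written directly as a small function. The labels in the usage example are hypothetical toy data, not results from the evaluation, and returning 0.0 when a ratio is undefined is an assumption of this sketch.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Precision and recall for one category, following the definitions above."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(t == category and p == category for t, p in pairs)
    labeled_as = sum(p == category for _, p in pairs)   # all images labeled `category`
    in_category = sum(t == category for t, _ in pairs)  # all images truly in `category`
    # Assumption: report 0.0 when the denominator is zero.
    precision = correct / labeled_as if labeled_as else 0.0
    recall = correct / in_category if in_category else 0.0
    return precision, recall
```

With toy labels `true = ["whiteboard", "whiteboard", "regular", "regular"]` and `pred = ["whiteboard", "whiteboard", "whiteboard", "regular"]`, the whiteboard category gets recall 1.0 but precision 2/3, mirroring how regular images with posters misclassified as whiteboards lower whiteboard precision without affecting its recall.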

Table 1. Business image classification results
Image class    Number of images in database   Precision   Recall
Document       151                            100%        94%
Business card  132                            87%         92%
Slide          278                            100%        96%
Whiteboard     80                             80%         95%
Regular        384                            97%         99%
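As a rough arithmetic check of the overall rate stated above, the number of correctly labeled images per class can be estimated as the class count times its recall from Table 1. Because the recall values are rounded to whole percentages, this is only an approximation of the true overall rate.

```python
# (class, number of images, recall) copied from Table 1
TABLE_1 = [
    ("Document", 151, 0.94),
    ("Business card", 132, 0.92),
    ("Slide", 278, 0.96),
    ("Whiteboard", 80, 0.95),
    ("Regular", 384, 0.99),
]

total = sum(n for _, n, _ in TABLE_1)
# Estimated number of correctly labeled images per class is n * recall.
correct = sum(n * recall for _, n, recall in TABLE_1)
overall_rate = correct / total
print(f"{total} images, overall correct rate ~ {overall_rate:.1%}")
```

This gives roughly 986 of 1025 images, about 96.2%, which is consistent with the reported overall rate above 95%.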