Project 1

This is the page for Project 1. Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection.

Introduction

Each input image consists of three subimages, each holding the pixel values of one color component (R, G, or B). The task is to align the three subimages so that the color image can be recovered from them with as little misalignment as possible.

Method Overview

Metric: I used normalized cross-correlation (NCC), a dot product between two normalized image vectors, as a measure of alignment.
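As a rough illustration of the metric, the sketch below computes NCC with NumPy by flattening both images, scaling each to unit length, and taking the dot product. The function name `ncc` and the guard for all-zero images are assumptions made for illustration, not the project's actual code.

```python
import numpy as np

def ncc(a, b):
    """Dot product of two images after flattening and scaling to unit norm."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: treat an all-zero image as "no correlation".
        return 0.0
    return float(np.dot(a, b) / denom)
```

Because both vectors are scaled to unit length, the score does not change if one image is uniformly brightened by a constant factor.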

Optimization: For small images, I used a brute-force search to find the best alignment. The search space has three dimensions: given a reference image and a target image, how much x-translation, y-translation, and rotation to apply to the target image to maximize the alignment metric. The search space is discretized uniformly: translations in steps of 1 pixel and rotation in steps of 0.1 degrees. However, I found empirically that rotation does not significantly affect the quality of the result, so I fixed the rotation at 0.0 degrees.
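A minimal sketch of such a brute-force search over integer translations is given here. It leaves out rotation, since rotation was fixed at 0.0 degrees. The window radius of 15 pixels, the use of circular shifts via `np.roll`, and the function names are all assumptions for illustration; the write-up does not state the actual search range.

```python
import numpy as np

def ncc(a, b):
    """Dot product of two images after flattening and scaling to unit norm."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def brute_force_align(ref, target, radius=15):
    """Try every integer (dx, dy) in a square window; keep the best NCC score.

    `radius` is a hypothetical window size, not a value from the write-up.
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll shifts circularly: rows by dy, columns by dx.
            shifted = np.roll(target, (dy, dx), axis=(0, 1))
            score = ncc(ref, shifted)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift  # (x, y) shift to apply to the target
```

The cost grows with the square of the radius times the number of pixels, which is why this approach is only practical for the small images.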

For large images, I used a hierarchical approach based on a Gaussian pyramid. The idea is to downsample the images and find the best alignment at the coarsest level. The alignment is then refined at each finer level, using the alignment from the coarser level as an initial guess. This can be viewed as a coarse-to-fine search of the parameter space, similar in spirit to binary search, under the assumption that the alignment metric is well behaved, with a single peak, in the alignment parameter space.
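The coarse-to-fine idea can be sketched as a short recursion. This is only an illustration under several assumptions: a 5-tap binomial kernel stands in for the Gaussian blur, the padding wraps around to stay consistent with the circular shifts of `np.roll`, and the names and parameters (`min_size`, `coarse_radius`, `refine_radius`) are hypothetical rather than taken from the project.

```python
import numpy as np

def ncc(a, b):
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def local_search(ref, target, center, radius):
    """Exhaustive search over integer (dx, dy) within `radius` of `center`."""
    cx, cy = center
    best_score, best_shift = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(ref, np.roll(target, (dy, dx), axis=(0, 1)))
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def downsample(img):
    """Blur with a separable binomial kernel, then keep every second pixel."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    h, w = img.shape
    p = np.pad(img, 2, mode="wrap")
    horiz = sum(k[i] * p[:, i:i + w] for i in range(5))      # shape (h + 4, w)
    blurred = sum(k[i] * horiz[i:i + h, :] for i in range(5))  # shape (h, w)
    return blurred[::2, ::2]

def pyramid_align(ref, target, min_size=64, coarse_radius=8, refine_radius=2):
    """Align at the coarsest level first, then refine level by level."""
    if min(ref.shape) <= min_size:
        return local_search(ref, target, (0, 0), coarse_radius)
    dx, dy = pyramid_align(downsample(ref), downsample(target),
                           min_size, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels here.
    return local_search(ref, target, (2 * dx, 2 * dy), refine_radius)
```

Each level only searches a small window around the doubled estimate from the level below, so the total work stays close to that of a single small search at full resolution.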

Optimization detail: Some of the images provided in the Results section exhibit some misalignment when aligned on raw pixel values. To improve the result, I aligned edge images instead of the raw pixel values. The edge image is obtained by applying a Sobel filter to the raw image. Edge images are more robust than raw images because they are less sensitive to absolute pixel values, which can differ between color channels, and therefore serve as a better indicator of alignment. In the Results section, the Sobel operator is used for large images unless explicitly mentioned otherwise.
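For reference, a Sobel edge image can be computed with plain NumPy slicing as sketched here. Using the gradient magnitude and edge-replicating padding are assumptions; the write-up only says that a Sobel filter was applied, and the function name is hypothetical.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated borders)."""
    img = np.asarray(img, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

Adding a constant brightness offset to the input leaves the output unchanged, which is one way to see why edges are less sensitive to absolute pixel values.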

Also, I cropped 5% from each of the four sides when calculating the alignment metric, as the image borders often contain boundary artifacts that reduce the faithfulness of the alignment metric.
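The cropping step might look like the following sketch, which trims a fixed fraction from every side before the metric is evaluated. The function name and the rounding toward zero are assumptions for illustration.

```python
import numpy as np

def crop_border(img, fraction=0.05):
    """Drop `fraction` of the height and width from each of the four sides."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]
```

With the default fraction, a 100 x 200 image is reduced to its central 90 x 180 region, and only that region would contribute to the score.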

Results

Here are the alignment results for the small (.jpg) images.

cathedral.jpg

Cathedral Green shift: (x,y,rotation) = (2,5,0.0), Red shift: (3,12,0.0)

monastery.jpg

Monastery Green shift: (x,y,rotation) = (2,-3,0.0), Red shift: (2,3,0.0)

tobolsk.jpg

Tobolsk Green shift: (x,y,rotation) = (2,3,0.0), Red shift: (3,6,0.0)

Here are the alignment results for the large (.tif) images. Typical runtime per image is around 5 seconds on my laptop. I could not find a case of significant misalignment.

church_use_edge.jpg

Church Green shift: (x,y,rotation) = (4,25,0.0), Red shift: (-4,58,0.0)

emir_use_edge.jpg

Emir Green shift: (x,y,rotation) = (23,49,0.0), Red shift: (40,107,0.0)

harvesters_use_edge.jpg

Harvesters Green shift: (x,y,rotation) = (17,65,0.0), Red shift: (13,123,0.0)

icon_use_edge.jpg

Icon Green shift: (x,y,rotation) = (17,41,0.0), Red shift: (23,90,0.0)

lady_use_edge.jpg

Lady Green shift: (x,y,rotation) = (9,56,0.0), Red shift: (13,119,0.0)

melons_use_edge.jpg

Melons Green shift: (x,y,rotation) = (10,80,0.0), Red shift: (12,177,0.0)

onion_church_use_edge.jpg

Onion Church Green shift: (x,y,rotation) = (25,52,0.0), Red shift: (36,107,0.0)

sculpture_use_edge.jpg

Sculpture Green shift: (x,y,rotation) = (-11,33,0.0), Red shift: (-26,140,0.0)

self_portrait_use_edge.jpg

Self Portrait Green shift: (x,y,rotation) = (28,77,0.0), Red shift: (37,176,0.0)

three_generations_use_edge.jpg

Three Generations Green shift: (x,y,rotation) = (12,54,0.0), Red shift: (9,111,0.0)

train_use_edge.jpg

Train Green shift: (x,y,rotation) = (2,41,0.0), Red shift: (29,85,0.0)

Here are the results for the images I selected.

choice_1.jpg

Choice 1 Green shift: (x,y,rotation) = (10,39,0.0), Red shift: (7,89,0.0)

choice_2.jpg

Choice 2 Green shift: (x,y,rotation) = (33,56,0.0), Red shift: (60,125,0.0)

choice_3.jpg

Choice 3 Green shift: (x,y,rotation) = (19,37,0.0), Red shift: (25,65,0.0)

Appendix

Here are the alignment results for large (.tif) images without the edge image. Comparing them with the results that use the edge image shows that using edges leads to slightly better results.

emir_without_edge.jpg

Emir Without Edge Green shift: (x,y,rotation) = (24,49,0.0), Red shift: (-1,99,0.0)

lady_without_edge.jpg

Lady Without Edge Green shift: (x,y,rotation) = (8,55,0.0), Red shift: (12,110,0.0)