This is the page for Project 1. Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection.
A single image consists of three subimages, each containing the pixel values for the R, G, or B component. The task is to align the three subimages so that the color image can be recovered with as little misalignment as possible.
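The setup can be sketched as follows. This is a minimal illustration, not the exact code used here: it assumes the three subimages are stacked vertically in the order B, G, R within one grayscale array, and the function names `split_plate` and `compose` are hypothetical. Shifts are given as (x, y) and applied with wrap-around for simplicity.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into three equal-height subimages.

    Assumes the order from top to bottom is B, G, R.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def compose(b, g, r, g_shift=(0, 0), r_shift=(0, 0)):
    """Shift G and R by (x, y) relative to B and stack into an RGB image."""
    g = np.roll(g, (g_shift[1], g_shift[0]), axis=(0, 1))
    r = np.roll(r, (r_shift[1], r_shift[0]), axis=(0, 1))
    return np.dstack([r, g, b])
```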
Metric: I used normalized cross-correlation (NCC), a dot product between two normalized image vectors, as a measure of alignment.
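A minimal NumPy sketch of such a metric is shown below. Subtracting the mean before normalizing is an assumption of this sketch (a common variant), and the function name `ncc` is illustrative.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images.

    Each image is flattened, zero-centered (an assumption of this sketch),
    and scaled to unit length; the score is the dot product of the results.
    """
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a constant image carries no alignment information
    return float(a @ b / denom)
```

With this definition the score lies in [-1, 1], and it is unchanged when one image is brightened or rescaled by a positive factor.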
Optimization: For small-sized images, I used a brute-force search to find the best alignment. The search space has three dimensions: given a reference image and a target image, how much x-translation, y-translation, and rotation should be applied to the target image to maximize the alignment metric. The search space is discretized uniformly: x- and y-translation in steps of 1 pixel, and rotation in steps of 0.1 degrees. However, I found empirically that rotation does not affect the quality of the result significantly, so I fixed the rotation at 0.0 degrees.
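With rotation fixed at zero, the search reduces to a double loop over integer translations. The sketch below is illustrative only: the search radius of 15 pixels, the wrap-around shifting via `np.roll`, the zero-mean NCC, and the name `brute_force_align` are all assumptions rather than details taken from this project.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation (assumed variant)."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def brute_force_align(ref, target, radius=15):
    """Return the (x, y) shift of `target` that maximizes NCC against `ref`."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border; cropping the borders
            # before scoring reduces the effect of the wrapped pixels.
            shifted = np.roll(target, (dy, dx), axis=(0, 1))
            score = ncc(ref, shifted)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

The cost grows with the square of the radius times the image size, which is why this approach is only practical for the small images.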
For large-sized images, I used a hierarchical approach based on a Gaussian pyramid. The idea is to downsample the images and find the best alignment at the coarsest level. The alignment is then refined at each finer level, using the alignment from the coarser level as an initial guess. It can be interpreted as a binary search of the parameter space, under the assumption that the alignment metric has a single peak (is concave) in the alignment parameter space.
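The coarse-to-fine idea can be sketched recursively. This is a simplified illustration under several assumptions: a 3-tap binomial blur stands in for the Gaussian filter, shifts wrap around via `np.roll`, and the parameters `min_size`, `coarse_radius`, and `fine_radius` as well as all function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation (assumed variant)."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def search(ref, target, center=(0, 0), radius=2):
    """Best (x, y) shift within +/- radius of `center`."""
    cx, cy = center
    best_score, best = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(ref, np.roll(target, (dy, dx), axis=(0, 1)))
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

def downsample(img):
    """Blur with a separable [1, 2, 1] / 4 kernel, then keep every 2nd pixel."""
    img = img.astype(np.float64)
    for axis in (0, 1):
        img = (np.roll(img, 1, axis) + 2 * img + np.roll(img, -1, axis)) / 4
    return img[::2, ::2]

def pyramid_align(ref, target, min_size=32, coarse_radius=8, fine_radius=2):
    """Estimate the shift at the coarsest level, then refine level by level."""
    if min(ref.shape) <= min_size:
        return search(ref, target, radius=coarse_radius)
    dx, dy = pyramid_align(downsample(ref), downsample(target),
                           min_size, coarse_radius, fine_radius)
    # A shift of 1 pixel at the coarser level is 2 pixels at this level.
    return search(ref, target, center=(2 * dx, 2 * dy), radius=fine_radius)
```

Each level only searches a small window around the doubled coarse estimate, so the total work is dominated by a few small searches rather than one large search at full resolution.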
Optimization detail: Some images exhibit misalignment when aligned on raw pixel values (see the Results section). To improve the result, I used an edge image instead of the raw pixel values. The edge image is obtained by applying a Sobel filter to the raw image. The edge image is less sensitive to absolute pixel values than the raw image and thereby serves as a better indicator for alignment. In the Results section, the Sobel operator is used for large-sized images unless explicitly mentioned otherwise.
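A Sobel edge image can be computed with plain NumPy slicing, as in the sketch below. Using the gradient magnitude (rather than a single direction) and replicating border pixels are assumptions of this sketch, and `sobel_magnitude` is an illustrative name.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (borders replicated)."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weights 1, 2, 1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1, 2, 1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

Adding a constant brightness offset to the input leaves the output unchanged, which illustrates why edge images depend less on absolute pixel values.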
Also, I cropped 5% from each of the four sides when calculating the alignment metric, as the outer region often contains plate borders that reduce the faithfulness of the alignment metric.
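The cropping step amounts to a single slice. In this sketch the fraction is rounded down to whole pixels, and the name `crop_border` is illustrative.

```python
import numpy as np

def crop_border(img, frac=0.05):
    """Drop `frac` of the height and width from each side of the image."""
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```

The cropped views of the reference and the shifted target would then be passed to the metric, so that only the interior of the plates contributes to the score.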
Here are the results of the alignment for small-sized (.jpg) images.
Cathedral Green shift: (x,y,rotation) = (2,5,0.0), Red shift: (3,12,0.0)
Monastery Green shift: (x,y,rotation) = (2,-3,0.0), Red shift: (2,3,0.0)
Tobolsk Green shift: (x,y,rotation) = (2,3,0.0), Red shift: (3,6,0.0)
Here are the results of the alignment for large-sized (.tif) images. Typical runtime per image is around 5 seconds on my laptop. I could not find a case of significant misalignment.
Church Green shift: (x,y,rotation) = (4,25,0.0), Red shift: (-4,58,0.0)
Emir Green shift: (x,y,rotation) = (23,49,0.0), Red shift: (40,107,0.0)
Harvesters Green shift: (x,y,rotation) = (17,65,0.0), Red shift: (13,123,0.0)
Icon Green shift: (x,y,rotation) = (17,41,0.0), Red shift: (23,90,0.0)
Lady Green shift: (x,y,rotation) = (9,56,0.0), Red shift: (13,119,0.0)
Melons Green shift: (x,y,rotation) = (10,80,0.0), Red shift: (12,177,0.0)
Onion Church Green shift: (x,y,rotation) = (25,52,0.0), Red shift: (36,107,0.0)
Sculpture Green shift: (x,y,rotation) = (-11,33,0.0), Red shift: (-26,140,0.0)
Self Portrait Green shift: (x,y,rotation) = (28,77,0.0), Red shift: (37,176,0.0)
Three Generations Green shift: (x,y,rotation) = (12,54,0.0), Red shift: (9,111,0.0)
Train Green shift: (x,y,rotation) = (2,41,0.0), Red shift: (29,85,0.0)
Here are the results for the images I selected.
Choice 1 Green shift: (x,y,rotation) = (10,39,0.0), Red shift: (7,89,0.0)
Choice 2 Green shift: (x,y,rotation) = (33,56,0.0), Red shift: (60,125,0.0)
Choice 3 Green shift: (x,y,rotation) = (19,37,0.0), Red shift: (25,65,0.0)
Here are the results of the alignment for large-sized (.tif) images without using the edge image. Comparing them with the results that use the edge image shows that using edges leads to a slightly better result.
Emir Without Edge Green shift: (x,y,rotation) = (24,49,0.0), Red shift: (-1,99,0.0)
Lady Without Edge Green shift: (x,y,rotation) = (8,55,0.0), Red shift: (12,110,0.0)