Image Alignment
I. Overview
From 1907 to 1915, Sergei Mikhailovich Prokudin-Gorskii, a Russian chemist and photographer, travelled around Russia documenting the Russian Empire in colour photography. He was able to do this on black and white film by taking three exposures of the same scene through red, green, and blue filters. The three photographs can then be projected through red, green, and blue filters, which, when aligned correctly, superimpose into a single, fully coloured photograph.

The Library of Congress purchased the glass plates from Prokudin-Gorskii's heirs, scanned them, and uploaded the collection online. The goal of this project is to write a program that takes these RGB scans, aligns the three images, and combines them to display the photographs in full colour the way they were meant to be viewed.
II. Image Alignment
To recreate these photographs, the three separate red, green, and blue filtered images (referred to from now on as the R, G, B channels) must be aligned correctly. Since the original images are stacked on top of one another and have artifacts, some preprocessing must be done.
This is my step by step approach (a minimal sketch of these steps in Python follows the list):
- Crop the original image into three separate channel images, using height / 3 as each channel's height, ch_height. e.g. the red channel has corners (0, ch_height * 2), (width, ch_height * 2), (width, ch_height * 3), and (0, ch_height * 3).
- Crop each channel by some arbitrary percentage on each side. This is done so that the meat of the image is considered in the alignment, without the borders and other blemishes affecting any calculations.
- Align the cropped red and green channels to the blue channel to get (x, y) displacements for both channels. There are many algorithms that can be used to achieve this, which will be discussed below.
- Apply the displacements to the uncropped versions of their corresponding channels. The channels without their borders cropped are used instead of the cropped versions to preserve as much of the image as possible.
- Merge the three channels together into one final image and save.
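A minimal sketch of the splitting and cropping steps, assuming NumPy arrays and skimage for file I/O; the crop fraction and the file name 'plate.jpg' are illustrative placeholders, not the project's exact values:

import numpy as np
from skimage import io

def split_channels(plate):
    # The plate is stacked top to bottom as B, G, R; each channel is height / 3 tall.
    ch_height = plate.shape[0] // 3
    b = plate[0:ch_height]
    g = plate[ch_height:2 * ch_height]
    r = plate[2 * ch_height:3 * ch_height]
    return r, g, b

def crop_borders(channel, frac=0.1):
    # Crop an arbitrary fraction off each side so that borders and blemishes
    # do not affect the alignment calculations.
    h, w = channel.shape
    dh, dw = int(h * frac), int(w * frac)
    return channel[dh:h - dh, dw:w - dw]

r, g, b = split_channels(io.imread('plate.jpg'))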



Naïve Algorithms
There are many image alignment methods, but the simplest is to exhaustively compare the R, G, and B channels to find the best match in intensity. To align the images, the R and G channels are each compared against the B channel over some arbitrary x, y shift range. I chose to use a 31 by 31 search range (shifts of up to 15 pixels left, right, up, and down).
I explored two heuristics to determine the shift amount for the best alignment:
Sum of Squared Differences (SSD)
ssd = sum( sum( (image1 - image2) ^ 2 ) )
The SSD measures the difference in colour intensities between the two images. The smaller this value, the smaller the difference, and thus, the better the match.
Normalized Cross Correlation (NCC)
ncc = innerproduct( image1 / ||image1||, image2 / ||image2|| )
NCC normalizes both images as vectors, then takes their inner product. The larger this value, the closer the two images are, and thus, the better the match.
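A minimal sketch of both heuristics and the exhaustive search in NumPy; the channels are assumed to be equal-sized float arrays, and the function names are illustrative:

import numpy as np

def ssd(image1, image2):
    # Sum of squared differences: smaller is a better match.
    return np.sum((image1 - image2) ** 2)

def ncc(image1, image2):
    # Normalized cross correlation: flatten both images into unit vectors
    # and take their inner product. Larger is a better match.
    v1 = image1.ravel() / np.linalg.norm(image1)
    v2 = image2.ravel() / np.linalg.norm(image2)
    return v1 @ v2

def align(channel, reference, radius=15, metric=ssd, better=min):
    # Exhaustively try every (dy, dx) shift in a (2 * radius + 1) square window,
    # keeping the shift whose score the metric considers best.
    best_shift, best_score = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = metric(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if best_score is None or better(score, best_score) == score:
                best_shift, best_score = (dy, dx), score
    return best_shift  # (y, x), matching the displacement convention below

For NCC, the call becomes align(channel, reference, metric=ncc, better=max), since a larger score means a better match.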
Both heuristics were used to align the images below, and the resulting displacements (using SSD and NCC) were the same for each image. The resulting full colour images are displayed below, along with their corresponding file names and full sized images. Note that the displacement values are actually (y, x); e.g. (5, -2) represents a shift of 5 down and 2 left.

R: (12, 3)
G: (5, 2)

R: (3, 2)
G: (-3, 2)

R: (7, 1)
G: (3, 1)

R: (14, -1)
G: (7, 1)

R: (6, 3)
G: (3, 3)
Although both methods gave the same displacement results, SSD is a slightly faster calculation than NCC, with SSD completing the alignments in around 3 seconds and NCC in around 6 seconds.
Image Pyramids
Exhaustively searching through the entire image works for the previous JPEG files, since they were all relatively low resolution. With larger files (the collection has TIFF files that are around 4000 by 3000 pixels), the naïve method can no longer get the job done in a reasonable amount of time.
This is where image pyramids come in. The driving concept is that a smaller image is faster to search through than the original. An image pyramid is a series of images resized smaller and smaller going up the levels of the pyramid. Even though a smaller image has a lower resolution, it can provide an approximate shift for alignment, which further calculations can then refine into the finer-scale shifts.
A couple of values must be set before the process can be used. First, we have to decide how much smaller each resized image will be than the image below it; I chose to decrease by a factor of 2. We then have to set the number of levels in the pyramid; I chose to expand the pyramid until the image's width or height is under 200 pixels. For these TIFF files of around 4000 by 3000 pixels, that is 5 levels, including the base (the original, unresized image). SSD or NCC is used as before, though this time the search at each level is within a 15 by 15 range.
This is my step by step approach (a sketch of the pyramid search follows the list):
- Crop the original image into its R, G, B channels, then crop off an arbitrary amount so that only the centre of the image is used for comparisons.
- Create an image pyramid for each channel, halving the width and height at each level until the highest level is under 200 pixels wide.
- Search a cropped section of each level of the red and green pyramids against the blue pyramid with SSD or NCC to get displacement values. The image has to be cropped smaller or there is no decrease in computation time; after testing, the best crop seems to be a 200 x 200 pixel patch in the centre. The displacement has to be doubled each time we move down a level to preserve the relative displacement.
- Apply the displacements to the uncropped versions of their corresponding channels.
- Merge the three channels together into one final image and save.
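A minimal sketch of the pyramid search, reusing the align() helper sketched earlier and assuming skimage for resizing; the recursion shrinks the image until it is under 200 pixels, then refines the shift on the way back down:

import numpy as np
from skimage import transform

def center_patch(image, size=200):
    # Compare only a size x size patch at the centre, to keep the search fast.
    h, w = image.shape
    cy, cx = h // 2, w // 2
    return image[cy - size // 2:cy + size // 2, cx - size // 2:cx + size // 2]

def pyramid_align(channel, reference, min_size=200):
    # Base case: the image is small enough to search exhaustively.
    if min(channel.shape) < min_size:
        return align(channel, reference, radius=7)  # 15 by 15 search range
    # Recurse on half-size copies to get a coarse estimate of the shift.
    dy, dx = pyramid_align(transform.rescale(channel, 0.5, anti_aliasing=True),
                           transform.rescale(reference, 0.5, anti_aliasing=True),
                           min_size)
    # Double the coarse displacement to bring it up to this level's scale,
    # then refine it with a small search over the centre patch.
    dy, dx = dy * 2, dx * 2
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fy, fx = align(center_patch(shifted), center_patch(reference), radius=7)
    return dy + fy, dx + fx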

R: (69, 8)
G: (13, -4)

R: (124, 17)
G: (60, 19)

R: (112, 10)
G: (49, 7)

R: (180, 12)
G: (77, 4)
In most cases, SSD and NCC gave me the same results. SSD is once again slightly faster than NCC, with SSD at around 15-17 seconds and NCC at around 20 seconds.
However, I ran into an issue with one of the images, emir.tif. Unlike the other images so far, the centre of this image is predominantly blue. As you can observe in the channels, the gown is "dark" in the red channel, as there is very little red in the full colour photo. Since the gown sits at opposite ends of the intensity spectrum in the red and blue channels, the algorithm does not align the channels as expected. This prompted me to look into other alignment methods.
III. Edge Detection
Comparing colour intensities has its limitations and does not work in some cases. Edge detection instead compares the changes in intensity. There are many approaches to edge detection, such as Sobel edge detection and Canny edge detection.
Sobel Edge Detection
The Sobel operator uses a horizontal (x) kernel and a vertical (y) kernel, which are convolved with the image to calculate derivatives. I implemented my own convolution, but it was slow, taking around 20 seconds to apply the Sobel operator to one of the smaller JPEG images. So instead, I used SciPy's signal.convolve2d().
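A minimal sketch of the Sobel step with the standard 3 x 3 kernels; combining the two derivatives into a gradient magnitude is my assumption of the usual formulation:

import numpy as np
from scipy import signal

# Standard Sobel kernels: horizontal (x) and vertical (y) derivatives.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def sobel_edges(image):
    # Convolve with both kernels and combine into the gradient magnitude.
    gx = signal.convolve2d(image, SOBEL_X, mode='same', boundary='symm')
    gy = signal.convolve2d(image, SOBEL_Y, mode='same', boundary='symm')
    return np.sqrt(gx ** 2 + gy ** 2)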



The process of aligning images with edge detection isn't much different, except for one added step: the Sobel operator is applied to each image before comparison using SSD or NCC.
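Concretely, with the helpers sketched above (names assumed), the only change is to pre-filter each channel before the search:

r_shift = pyramid_align(sobel_edges(r), sobel_edges(b))
g_shift = pyramid_align(sobel_edges(g), sobel_edges(b))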


Edge detection is also helpful in aligning images that have a lot of colour warping.


IV. Contrast Adjustment
In some images, certain features may not be as distinguishable as you would like. Increasing the contrast of an image creates a larger difference in luminance.
The idea behind contrast adjustment is that the minimum and maximum intensities of an image are often not 0 and 255 respectively, so the intensity values can be rescaled to put them at the true minimum and maximum. I ran into a problem implementing the filter this way: all the images I tested already have 0 and 255 as their minimum and maximum values, so I had to think of another way to find a scaling factor.
I decided to simply create a contrast filter that takes a scaling factor as user input. The filter scales the luminance by some value 1 + n, with n being the inputted scaling factor.
Ideally, each image would have a different scaling factor applied, determined using a histogram and some luminance threshold. However, this was not implemented for this project, and the value is left as user input.
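A minimal sketch of the filter described, reading "scales the luminance by 1 + n" as straight multiplication on 8-bit intensities with clipping to keep values in range; this is one interpretation, not necessarily the project's exact code:

import numpy as np

def adjust_contrast(image, n):
    # Scale every intensity by (1 + n) and clip back into the 8-bit range.
    return np.clip(image.astype(float) * (1 + n), 0, 255).astype(np.uint8)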





R: (12, 2)
G: (5, 1)



V. Automatic White Balancing
In some images, the "white-est" colour in the image is not pure white and is instead tinted another colour. White balancing eliminates this tint. This is accomplished by finding the "white-est" RGB value, some (r, g, b), and scaling the R, G, B channels by 255/r, 255/g, and 255/b respectively.
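A minimal sketch of this white-patch scaling, assuming an RGB uint8 image; taking the brightest pixel (largest channel sum) as the "white-est" value is my assumption:

import numpy as np

def white_balance(image):
    # Take the pixel with the largest channel sum as the image's white point.
    luminance = image.astype(float).sum(axis=2)
    y, x = np.unravel_index(np.argmax(luminance), luminance.shape)
    r, g, b = image[y, x].astype(float)
    # Scale each channel so that the white point maps to pure white.
    scaled = image.astype(float) * [255 / r, 255 / g, 255 / b]
    return np.clip(scaled, 0, 255).astype(np.uint8)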




Although the images above show a significant difference that improved the overall quality, white balancing did not change most of the images very much. This is because the white-est values in these images were already close to pure white. Some examples are shown below.




VI. Hubble Space Telescope Imagery

Merging several single-channel images has other applications as well. Images taken in astronomy are not necessarily in the visible spectrum, but we can assign these channels some arbitrary colours to create beautifully coloured images. In fact, more than three channels can be used to create a more colourful image.
Unlike the images in the Prokudin-Gorskii collection, these images do not need to be aligned, as the telescopes used are likely quite stable and quick. I simply applied an arbitrary colour to each image and superimposed them on top of one another. The colours I chose were based on personal preference, and any combination of colours can be used to achieve different results.
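A minimal sketch of the tint-and-superimpose step, assuming each exposure is a 2D float array in [0, 1] and the tints are arbitrary RGB triples; the channel names in the usage comment are hypothetical:

import numpy as np

def colourize(channels, tints):
    # Multiply each grayscale exposure by its RGB tint and sum the results.
    out = np.zeros(channels[0].shape + (3,))
    for channel, tint in zip(channels, tints):
        out += channel[..., np.newaxis] * np.asarray(tint, dtype=float)
    return np.clip(out, 0.0, 1.0)

# e.g. three exposures mapped to arbitrary colours of personal preference:
# colour = colourize([ch1, ch2, ch3], [(0.0, 0.4, 1.0), (0.0, 1.0, 0.2), (1.0, 0.2, 0.0)])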
The next steps would be to implement image rotation and cropping to eliminate the outer grey portion.





