Image Warping and Mosaicing
I. Introduction
The objective of this project is to stitch a mosaic from a series of images using manually selected corresponding points. The process can be described in four steps:
- Shoot images so that the transforms between them are projective. These images must be shot from the same point of view, so that aligning them is a matter of unprojecting, rotating, and reprojecting (the resulting transformation is called a homography).
- Recover the homography between the images. Given a series of (at least four) pairs of corresponding points on two images, we can estimate the homography matrix (a sketch of this estimation follows this list). I referenced slides 25-30 from Robert Collins' CSE486 Lecture Slides.
- Warp images using the homography. Now that we have the parameters of the homography matrix, H, it is a simple process of inverse warping the image by H.
- Blend the warped images together. Finally, blend the two images together to create a (hopefully) seamless mosaic.
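As a rough illustration of the estimation step, here is a minimal sketch in NumPy. It fixes the bottom-right entry of H to 1 and solves the remaining eight unknowns by least squares; the function name compute_homography is mine, not from the write-up.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate the 3x3 homography H mapping src points to dst points.

    src, dst: (n, 2) arrays of corresponding (x, y) points, n >= 4.
    Each correspondence contributes two rows to a linear system
    A h = b in the eight unknowns of H (H[2, 2] is fixed to 1);
    extra correspondences are averaged in by least squares.
    """
    n = len(src)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        A[2 * i]     = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i:2 * i + 2] = xp, yp
    h = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.append(h, 1).reshape(3, 3)
```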
To improve upon this, automatic mosaicing is also implemented. This entails automatically selecting likely points, refining them, and finding the correspondences. A more detailed, step-by-step explanation is available below.
II. Image Rectification
After recovering a homography matrix, we can check whether the implemented warping works correctly by using it to rectify an image. This simply means taking a photo of an object from an angle, where we know the object's true shape, and morphing the image into that shape.
I did this on a photo of the portrait I was working on. The picture was taken at an angle, so the drawing board is trapezoidal in the photo even though we know it to be rectangular. Image rectification recovers the rectangular drawing board as if it were viewed head on. The original points were chosen manually, and the new points were chosen manually as well, through guess and check to ensure the correct width-to-height ratio.
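Under the hood, the rectification (and all the warping below) is an inverse warp: for every pixel of the output we ask where it came from in the source. A minimal sketch, assuming a NumPy image and nearest-neighbour sampling (bilinear interpolation would look smoother); the names are illustrative:

```python
import numpy as np

def inverse_warp(img, H, out_shape):
    """Warp img by homography H into an (h_out, w_out) canvas.

    Inverse warping: apply H^-1 to every output pixel to find its
    source location, then sample the source image there.
    """
    H_inv = np.linalg.inv(H)
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ coords
    src = src[:2] / src[2]                      # homogeneous divide -> (x, y)
    sx = np.round(src[0]).astype(int)           # nearest-neighbour sampling
    sy = np.round(src[1]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(tuple(out_shape) + img.shape[2:], dtype=img.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = img[sy[valid], sx[valid]]
    return out
```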




I discovered that there are images that don't look quite right when rectified. At first I tried to rectify this image of some bookshelves in Main Stacks. As you can see below, the rectified image looks oddly misshapen, especially around the spinning handles. This is because the handles (as well as the signs) stick out from the flat surface of the bookshelves, and a homography can only correctly rectify points that lie on a single plane. When you look solely at the flat surface of the bookshelves, though, they are correctly rectified.


III. Mosaicing
Now that we know that our homography matrix is working, we can finally stitch some photos together and make mosaics! When I was first working on this project, I was in MLK, so I took some quick pictures with my iPhone for testing. I thought that not having a tripod would produce unappealing ghosting in my mosaics, but as long as I did my best to pivot the phone about the camera itself (and look very weird while doing it, I'm sure), the results were quite good. Unfortunately, since my phone automatically adjusts brightness, contrast, exposure, etc., some of the images don't match each other in colour. I had to go in and manually adjust an image until it matched the others better.




To combine two images, we need at least four pairs of corresponding points to make a homography matrix. I used around 10 points for each pair of images. When mosaicing two images together, I averaged the points and warped both images toward that average. When mosaicing three images together, I warped the left and right images towards the centre image.
To determine the size of the final image, I simply applied the corresponding homography matrices to the left image's left corners and the right image's right corners, and took the min (for the left image) or max (for the right image) of the x and y values. This gives some set of negative and positive numbers such as [-545, -267, 1065, 481]: the min x value, min y value, max x value, and max y value of the final image. The size of the image can be easily found with these values, and all that remains is to offset the pixels being put onto the final image by the min values.
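A sketch of that corner trick, assuming H maps an image of width w and height h into the mosaic frame (mosaic_bounds is my name for it):

```python
import numpy as np

def mosaic_bounds(H, w, h):
    """Warp the four corners of a w x h image by H and return the
    bounding box [min_x, min_y, max_x, max_y] of the result."""
    corners = np.array([[0, 0, 1], [w, 0, 1],
                        [0, h, 1], [w, h, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]             # homogeneous divide
    min_x, min_y = np.floor(warped.min(axis=1)).astype(int)
    max_x, max_y = np.ceil(warped.max(axis=1)).astype(int)
    return min_x, min_y, max_x, max_y           # e.g. [-545, -267, 1065, 481]
```

Pixels drawn onto the final canvas are then shifted by (-min_x, -min_y) so everything lands at non-negative coordinates.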

Although the mosaic looks very nice and aligned in terms of features within the image, there are still obvious harsh edges where one image ends and another begins. Some blending technique must be used to eliminate these edges. I chose to implement linear blending. I simply took the x-axis range where the two images overlap (found with the same process as the final image's width and height) and created a linear gradient from one image into the next. The weight of the first image at column x is simply 1 - (x - left_edge_x) / (right_edge_x - left_edge_x), and the weight of the other image is one minus that, so the two always sum to 1.
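A sketch of the ramp, assuming both images have already been warped onto a common RGB canvas of the same size and that the overlap spans columns x_start to x_end (linear_blend and its parameters are illustrative names):

```python
import numpy as np

def linear_blend(left, right, x_start, x_end):
    """Blend two same-sized canvases with a horizontal linear ramp.

    The left image's weight falls from 1 to 0 across the overlap
    [x_start, x_end]; the right image's weight is its complement,
    so the two always sum to 1.
    """
    x = np.arange(left.shape[1])
    alpha = np.clip(1 - (x - x_start) / (x_end - x_start), 0, 1)
    alpha = alpha[None, :, None]                # broadcast over rows, channels
    return alpha * left + (1 - alpha) * right
```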

This process does still leave some edges along the top and bottom of the images, but cropping can eliminate those edges without taking too much away from the mosaic. The reason linear blending worked so well is that the images and points were taken and chosen with care. If the homography did not correctly align the images, there would be ghosting, and the blending would not have worked as well as it did.

Below is a series of other images that I mosaicked together with the same process.












IV. Automatic Mosaicing
Although the finished mosaics look quite nice, it is very time-consuming to go through and find matching points for every image that you want to stitch. So, the key to automatic mosaic stitching is automatic corresponding point selection.
This process is actually quite simple, although it does require several rounds of pruning to ensure there are no false positives. The steps are outlined below:
- Use the Harris interest point detector to find possible points of interest: it takes in an image and finds all the Harris corners. I used the provided harris.py.
- Use the Adaptive Non-Maximal Suppression strategy on the list of points. We want to restrict the maximum number of points in our list as feature matching in the next step is expensive. This strategy makes sure that we take a certain number of points that are spread evenly across the image, but also weighted according to how strong their respective corners are.
- Extract the feature descriptor for each point. Points are compared by their features, where a feature is essentially the patch around a point. These descriptors are the 40 by 40 patch around the point, downsized to 8 by 8.
- Match features between the two images. Each feature descriptor of the left image is compared against each feature descriptor of the right image using SSD (sum of squared differences) to find the best-fitting right feature for each left feature.
- Use 4-point RANSAC to compute a robust homography estimate. Randomly select four points from the list of possible corresponding points to create a homography and see how many of all the points match. Repeat for a set number of trials. Select the best trial and use the non-discarded points to create the final homography.
I took the previous images of Tilden Park, for which I had created a mosaic with manually selected points, and applied automatic mosaicing to them.


The Harris corners of each image are found with the provided harris.py mentioned earlier.


Adaptive Non-Maximal Suppression, or ANMS, is a strategy that selects only the n most important and spread-out points, so that only a small number of feature descriptors must be calculated and compared. I chose n = 300, as I tend to work with smaller images. To explain this method in the most basic sense: using each point's corner strength (its h value from the Harris detector), we compute a suppression radius for each point, which is the distance to the nearest point that is significantly stronger; this radius ranks the importance of each point. I selected the top n points with the maximum radii to continue on to the next step.
Since I used images that are on the small side, no points were pruned, as there were fewer initial Harris corners than my maximum number of selected points.
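A minimal sketch of ANMS under these assumptions. The write-up does not spell out the suppression criterion, so this uses the usual one from the MOPS paper, with a robustness factor of 0.9:

```python
import numpy as np

def anms(coords, strengths, n_keep=300, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    coords: (n, 2) corner locations; strengths: (n,) Harris strengths.
    Each corner's suppression radius is its distance to the nearest
    corner that is significantly stronger; keeping the n_keep corners
    with the largest radii yields strong, evenly spread points.
    """
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            radii[i] = np.linalg.norm(coords[stronger] - coords[i], axis=1).min()
    keep = np.argsort(radii)[::-1][:n_keep]     # largest radii first
    return coords[keep]
```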


Now that we have a reasonable number of points, the next step is to find corresponding points in the other image. To do this, we must consider the feature descriptor of each point. We want these feature descriptors to contain as much information as possible while remaining small, so that the following step will be less computationally expensive. To prepare them, we need to massage each patch into a format that allows for easy and fast comparison. I took the 40 by 40 patch around each point and applied a Gaussian blur to it before downsizing it to an 8 by 8 patch by selecting every 5th pixel.
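A sketch of the descriptor extraction, assuming a greyscale float image and a point at least 20 pixels from the border; the blur sigma is my guess, not a value from the write-up:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(img, x, y):
    """Build a flattened 8x8 descriptor from the 40x40 patch at (x, y).

    The patch is blurred first so that keeping every 5th pixel does
    not alias; sampling a 40x40 patch every 5 pixels gives 8x8.
    """
    patch = img[y - 20:y + 20, x - 20:x + 20]
    patch = gaussian_filter(patch, sigma=2)     # sigma is an assumption
    return patch[::5, ::5].ravel()              # 40x40 -> 8x8 -> 64-vector
```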


Now that we have our feature descriptors for each image, we can use a basic SSD to see which feature in one image best matches which feature of the other image.
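A sketch of the matching, assuming the descriptors of the two images are stacked into (n, 64) and (m, 64) arrays:

```python
import numpy as np

def match_features(desc_left, desc_right):
    """Pair each left descriptor with its best right descriptor by SSD.

    Returns a list of (left_index, right_index) pairs, one per left
    feature, picking the right feature with the smallest sum of
    squared differences.
    """
    diff = desc_left[:, None, :] - desc_right[None, :, :]   # (n, m, 64)
    ssd = (diff ** 2).sum(axis=2)                            # (n, m)
    return [(i, int(ssd[i].argmin())) for i in range(len(desc_left))]
```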


Although our feature matching algorithm is quite good, there is still a chance of having outlier points. So, we use RANSAC to reject these remaining outliers. The idea is to select four random points to create a homography matrix (transforming the left points to the right points), apply it to all the left image points we have, and see how many of them match the corresponding right image points. The points that land within some epsilon of their expected location are accepted as inliers, and all other points are discarded as outliers. The number of votes, which is the number of inliers, is saved along with the list of inlier points.
This process is repeated for some number of trials, t. I chose t = 500, since I usually end up with only 20-50 pairs of points and it produces consistent results. I chose epsilon = 2 pixels, as suggested by Professor Efros in class. I tried other values, but 2 seems to give the best results.
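A sketch of the RANSAC loop, reusing the compute_homography sketch from the introduction; the trial count and epsilon follow the values above:

```python
import numpy as np

def ransac_homography(left_pts, right_pts, trials=500, eps=2.0):
    """4-point RANSAC over candidate correspondences.

    left_pts, right_pts: (n, 2) arrays of matched points. Each trial
    fits a homography to 4 random pairs and counts how many points
    land within eps pixels of where it predicts; the final homography
    is fit to the largest inlier set.
    """
    ones = np.ones((len(left_pts), 1))
    homog = np.hstack([left_pts, ones]).T        # 3 x n homogeneous points
    best = np.zeros(len(left_pts), dtype=bool)
    for _ in range(trials):
        idx = np.random.choice(len(left_pts), 4, replace=False)
        H = compute_homography(left_pts[idx], right_pts[idx])
        proj = H @ homog
        proj = (proj[:2] / proj[2]).T            # predicted right points
        inliers = np.linalg.norm(proj - right_pts, axis=1) < eps
        if inliers.sum() > best.sum():           # keep the best vote count
            best = inliers
    return compute_homography(left_pts[best], right_pts[best]), best
```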


Finally, the point selection process is complete, and we have a set of corresponding points for the two images. Then we take these points and stitch the images together with the homography matrix just as detailed in the previous section.


The final result looks better than the manually selected version, which is great! Below are some more examples.













Some mosaics have better results when points were manually selected, which is especially noticeable in the Union Square mosaic. This is because some areas of the images that would have given helpful points were never selected, since those points of interest were not detected as Harris corners.
I came across some issues with several sets of images where no matches, or even no corners, could be found in the images. This is because I did not implement any allowances for rotation between the features, so if one image was slightly rotated, no matches would be found. I also had to increase the contrast in some of the images in order for any Harris corners to be detected.
V. Rotation Invariance
Another reason we were not able to find a set of corresponding points in some of the image pairs is that some of the images were rotated not only about the y-axis, but about the z-axis as well. Two identical features will not be matched by our basic SSD if one is rotated, since SSD compares pixels individually. To solve this problem, each feature needs some "correct" orientation. I simply rotated each feature to align with the gradient direction of that feature.
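A sketch of that orientation step, assuming a greyscale image; the smoothing sigma and the 60x60 working patch (cut larger so rotation doesn't pull in empty corners) are my assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def oriented_descriptor(img, x, y):
    """Like the 8x8 descriptor above, but rotated so every feature's
    dominant gradient direction points the same way before sampling."""
    smooth = gaussian_filter(img, sigma=4.5)     # smooth before gradients
    gy, gx = np.gradient(smooth)
    theta = np.degrees(np.arctan2(gy[y, x], gx[y, x]))
    big = img[y - 30:y + 30, x - 30:x + 30]      # 60x60 working patch
    big = rotate(big, -theta, reshape=False)     # cancel the orientation
                                                 # (flip the sign if your axis
                                                 # convention differs)
    patch = gaussian_filter(big[10:50, 10:50], sigma=2)   # central 40x40
    return patch[::5, ::5].ravel()               # -> 8x8 -> 64-vector
```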
Below is an example of one of the pairs of images for which the automatic point selection process did not succeed. It is quite obvious that the z-axis orientation was not preserved from one image to the next. Take the beige building with yellow lights, for example. It is much more rotated in the left image than the right.


The process is exactly the same, except that instead of comparing the original features, we compare the rotated features. Matches that were previously discarded as false negatives are now preserved.




And hooray! A previously unstitchable pair of images can now be stitched.

