This project is all about stitching together many photographs taken from the same vantage point to create larger composite images. In the first part, I manually defined correspondences between photographs, warped them using those correspondences, and then stitched and blended the warps together. In the second part, I automated the correspondence definition by creating image descriptors for interest points and choosing appropriate interest points as the basis for my image warping.
After labeling corresponding points on two images, I needed to create the homography matrix H to warp the first image to the second. This meant solving a system of equations in 8 unknowns; each correspondence contributes two equations, so I needed at least 4 (x, y) correspondences. Since even small errors in correspondence labeling result in drastically different homography matrices, I created an overdetermined system with 6 correspondences and solved for an approximate solution with least squares.
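A minimal sketch of this least-squares setup, assuming the correspondences are given as N x 2 arrays of (x, y) points (the helper name and point format are illustrative, not from my actual code):

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares homography mapping src points to dst points.

    src, dst: (N, 2) arrays of (x, y) correspondences, N >= 4.
    Returns a 3x3 matrix H with the bottom-right entry fixed to 1.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Each correspondence contributes two equations in the 8 unknowns.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, dtype=float), np.array(b, dtype=float), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```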
Similar to how I used affine transformations in my face warping project, I warped images with the homography matrix using an inverse warp. I computed where the image corners land under the warp to determine the output range, then used scipy.interpolate's griddata to linearly interpolate pixel intensities from the original image and fill in the appropriate colors.
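A sketch of the inverse-warp idea; for brevity it assumes a single-channel image and uses scipy's RegularGridInterpolator for the linear interpolation (rather than griddata), since the source pixels lie on a regular grid:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def inverse_warp(img, H, out_shape):
    """Inverse-warp a single-channel image into a canvas of shape out_shape.

    H maps source (x, y) to destination (x, y); each destination pixel is
    pulled back through inv(H) and sampled from img by linear interpolation.
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # homogeneous coords

    src = np.linalg.inv(H) @ dest
    src = src[:2] / src[2]                       # divide out the projective scale

    # Linear interpolation on the source pixel grid; out-of-bounds samples -> 0.
    interp = RegularGridInterpolator(
        (np.arange(img.shape[0]), np.arange(img.shape[1])), img,
        bounds_error=False, fill_value=0)
    samples = interp(np.stack([src[1], src[0]], axis=-1))   # (row, col) order
    return samples.reshape(h_out, w_out)
```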
Now that I could perspective-warp images, I used my function to rectify images. Given an image of tiles taken at an angle, I specified the coordinates the tile corners would have if the photo had been taken straight on from above, and warped the image to those correspondences.
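A hypothetical usage example of rectification, reusing the compute_homography and inverse_warp sketches above (the corner coordinates here are made up for illustration):

```python
import numpy as np

# Clicked (x, y) tile corners in the angled photo, and the axis-aligned
# square they should map to as if viewed head-on.
tile_corners = np.array([[312, 405], [680, 398], [702, 760], [295, 772]])
square = np.array([[0, 0], [300, 0], [300, 300], [0, 300]])

H = compute_homography(tile_corners, square)
# `image` is the grayscale photo of the tiles (loading omitted).
rectified = inverse_warp(image, H, out_shape=(300, 300))
```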
After I had the warped images, I overlaid them into a mosaic. To align the images in their appropriate locations, I used my defined corresponding points to pad and shift the images before overlaying them. For the overlap regions, I used two blending methods: simple alpha blending and two-band blending. For simple alpha blending, an alpha factor determined each image's contribution to the overlap's pixel intensities. For two-band blending, I first separated the images into their high- and low-frequency components; the overlap took its high-frequency component from only one of the images, while the low-frequency components were blended with an alpha mask.
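A rough sketch of the two-band idea, assuming two already-aligned, same-size, single-channel images and a per-pixel alpha mask; the Gaussian sigma and the rule for choosing high frequencies are illustrative choices, not necessarily what my code does:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(im1, im2, alpha, sigma=5):
    """Two-band blend of two aligned, same-size images.

    alpha: per-pixel weight in [0, 1] for im1 (1 - alpha goes to im2).
    Low frequencies are alpha-blended for a smooth transition; high
    frequencies come from whichever image dominates at each pixel,
    which keeps fine detail from being averaged into ghosting.
    """
    low1, low2 = gaussian_filter(im1, sigma), gaussian_filter(im2, sigma)
    high1, high2 = im1 - low1, im2 - low2

    low = alpha * low1 + (1 - alpha) * low2          # smooth low-frequency blend
    high = np.where(alpha >= 0.5, high1, high2)      # crisp detail from one image
    return low + high
```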
Given an image, I wanted to detect potential corners to align on. I used Harris corner detection to do this, which applies smoothing and gradient calculations to score each pixel. In essence, this chose points whose pixel intensities change sharply relative to the surrounding area.
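A small sketch of this step using skimage's Harris implementation (the sigma and min_distance values are assumptions for illustration):

```python
from skimage.feature import corner_harris, corner_peaks

def get_harris_corners(gray, min_distance=10):
    """Harris corner detection on a grayscale image.

    corner_harris smooths the image gradients with a Gaussian window and
    scores each pixel by how strongly intensities change in every direction;
    corner_peaks keeps local maxima of that response map.
    """
    response = corner_harris(gray, sigma=1)
    coords = corner_peaks(response, min_distance=min_distance)  # (row, col) pairs
    return coords, response
```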
Given many corners, I now had to choose a subset -- corner matching is computationally expensive, so I wanted relatively few corners, but I wanted them spread out to sample as much of the image as possible. I used the Adaptive Non-Maximal Suppression (ANMS) algorithm to do this.
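A sketch of ANMS under its usual formulation, where each corner's suppression radius is the distance to the nearest sufficiently stronger corner (the robustness constant 0.9 and n_keep are illustrative values):

```python
import numpy as np

def anms(coords, response, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    For each corner, find the distance to the nearest corner whose scaled
    Harris response is stronger; keep the n_keep corners with the largest
    such suppression radius, giving strong and spatially spread-out points.
    """
    strengths = response[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = strengths * c_robust > strengths[i]
        if stronger.any():
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```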
Now that I had a good set of corners, I needed a metric to compare the corners on -- a descriptor. To do this, I built an 8 x 8 descriptor for each corner by sampling its 40 x 40 neighborhood every 5 pixels. I then normalized each descriptor vector so that its mean was 0 and its standard deviation was 1. Normalization is necessary because I wanted invariance to lighting changes across comparisons.
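A sketch of the descriptor extraction, assuming a grayscale image and (row, col) corner coordinates; the boundary handling is an illustrative choice:

```python
import numpy as np

def extract_descriptors(gray, coords, window=40, spacing=5):
    """Axis-aligned patch descriptors.

    Each corner's 40x40 neighborhood is sampled every 5 pixels to get an
    8x8 patch, which is flattened and normalized to zero mean and unit
    standard deviation for invariance to brightness and contrast changes.
    """
    half = window // 2
    descriptors, kept = [], []
    for r, c in coords:
        if r - half < 0 or c - half < 0 or r + half > gray.shape[0] or c + half > gray.shape[1]:
            continue  # skip corners whose window falls off the image
        patch = gray[r - half:r + half:spacing, c - half:c + half:spacing]
        vec = patch.astype(float).ravel()
        vec = (vec - vec.mean()) / (vec.std() + 1e-8)
        descriptors.append(vec)
        kept.append((r, c))
    return np.array(descriptors), np.array(kept)
```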
To match features, I compared descriptors of the chosen interest points using SSIM as the metric for how similar descriptors were to each other. I then used Lowe's nearest-neighbor ratio test to determine which of the similar descriptors actually represented correspondences.
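A sketch of the ratio-test matching; for simplicity it scores descriptor pairs with squared differences rather than SSIM, and the ratio threshold is an illustrative value:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.6):
    """Match descriptors with Lowe's nearest-neighbor ratio test.

    A match is accepted only when the best candidate is much closer than the
    second-best (dist_1nn / dist_2nn < ratio), which rejects ambiguous matches.
    """
    # Pairwise squared distances between all descriptor pairs.
    d2 = np.sum((desc1[:, None, :] - desc2[None, :, :]) ** 2, axis=2)
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d2[i])
        best, second = d2[i, order[0]], d2[i, order[1]]
        if best / (second + 1e-12) < ratio:
            matches.append((i, order[0]))
    return matches
```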
Finally, I used 4-point RANSAC to choose which correspondences to use in creating the homography and then computed the resulting homography matrix.
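A sketch of the RANSAC loop, assuming a compute_homography helper like the one above, (x, y) point arrays, and an inlier threshold in pixels (the iteration count and threshold are illustrative):

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, eps=2.0):
    """4-point RANSAC for robust homography estimation.

    Repeatedly fits an exact homography to 4 random correspondences, counts
    how many other correspondences it maps to within eps pixels (inliers),
    and finally refits a least-squares homography to the largest inlier set.
    """
    best_inliers = np.zeros(len(src), dtype=bool)
    src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous source points
    for _ in range(n_iters):
        idx = np.random.choice(len(src), 4, replace=False)
        H = compute_homography(src[idx], dst[idx])
        proj = (H @ src_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]              # back to (x, y)
        inliers = np.linalg.norm(proj - dst, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers
```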
As can be seen from my less-than-ideal Matching Features and RANSAC results, I ran into a bug in my code that I wasn't able to find by the project deadline, which kept me from automatically recovering correct homography matrices. That said, I'm certainly leaving this project having learned a lot of cool stuff. I think the coolest thing is honestly the power of perception -- this is the first time I feel I'm writing a program that can perceive relevant points in an image on its own; I just supply the framework to do so. It gets me excited about what's in store for the future and how these ideas can be applied to perception in the robotics domain.