This project is all about stitching together many photographs taken from the same vantage point to create larger composite images. In the first part, I manually defined correspondences between photographs, warped them using those correspondences, and then stitched and blended the warps together. In the second part, I automated the correspondence definition by creating image descriptors for interest points and choosing appropriate interest points as the basis for my image warping.
After labeling corresponding points on two images, I needed to compute the homography matrix H that warps the first image onto the second. Since H is only defined up to scale, this meant solving a system of equations with 8 unknowns, so I needed at least 4 (x, y) correspondences (each correspondence contributes two equations). Because even small errors in correspondence labeling can produce drastically different homography matrices, I built an overdetermined system with 6 correspondences and solved for an approximate solution with least squares.
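A minimal sketch of that least-squares setup might look like the following (the function name and array layout are illustrative rather than the exact code used):

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography mapping src_pts -> dst_pts.

    src_pts, dst_pts: (N, 2) arrays of (x, y) correspondences, N >= 4.
    With more than 4 points the system is overdetermined and solved
    in the least-squares sense.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        # Each correspondence gives two linear equations in the 8 unknowns
        # (the bottom-right entry of H is fixed to 1 to remove the scale).
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```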
Similar to how I used affine transformations in my face warping project, I warped images with the homography matrix using an inverse warp. I forward-mapped the corners of the source image to determine the extent of the warped result, then, for every pixel in that output range, applied the inverse mapping and used scipy.interpolate's griddata to linearly interpolate pixel intensities from the original image to get the appropriate colors.
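Below is a rough sketch of that inverse-warp idea (the function name is illustrative; running griddata over a full pixel grid is slow but matches the description above):

```python
import numpy as np
from scipy.interpolate import griddata

def warp_image(im, H, out_shape):
    """Inverse-warp color image im by homography H into a canvas of out_shape."""
    H_inv = np.linalg.inv(H)
    out_h, out_w = out_shape[:2]

    # Homogeneous coordinates of every output pixel.
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Map output pixels back into the source image and dehomogenize.
    src = H_inv @ out_pts
    src_x, src_y = src[0] / src[2], src[1] / src[2]

    # Known sample locations: the source image's own pixel grid.
    in_h, in_w = im.shape[:2]
    gy, gx = np.mgrid[0:in_h, 0:in_w]
    points = np.stack([gx.ravel(), gy.ravel()], axis=1)

    # Linearly interpolate each color channel at the back-mapped locations.
    out = np.zeros((out_h, out_w, im.shape[2]))
    for c in range(im.shape[2]):
        vals = griddata(points, im[..., c].ravel(),
                        (src_x, src_y), method='linear', fill_value=0)
        out[..., c] = vals.reshape(out_h, out_w)
    return out
```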
Now that I could perspective-warp images, I used my function to rectify images. Given an image of tiles taken at an angle, I supplied the coordinates the tile corners would have if the photo had been taken head-on, and warped the image to those correspondences.
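As a hypothetical usage example, reusing the compute_homography and warp_image sketches above (the corner coordinates and output size here are made up for illustration):

```python
import numpy as np

# tile_image: the angled photo, loaded as an (H, W, 3) float array.
# Tile corners clicked in the angled photo (x, y), invented for illustration...
src_corners = np.array([[412, 310], [655, 298], [668, 540], [420, 556]])
# ...and where those corners should land in a straight-on, square view.
dst_corners = np.array([[0, 0], [300, 0], [300, 300], [0, 300]])

H = compute_homography(src_corners, dst_corners)
rectified = warp_image(tile_image, H, out_shape=(300, 300, 3))
```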
After I had the warped images, I overlaid them into a mosaic. To place each image in its appropriate location, I used the defined corresponding points to pad and align the images before overlaying them. For the overlap regions, I used two blending methods: simple alpha blending and two-band blending. For simple alpha blending, an alpha factor determined each image's contribution to the overlap's pixel intensities. For two-band blending, I first separated the images into their high- and low-frequency components; in the overlap, the high frequencies came from only one of the images, while the low frequencies were blended with an alpha mask.
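A small sketch of the two-band idea, assuming the two images are already padded, aligned, and the same size, with a per-pixel alpha mask defining the overlap (the function name and sigma value are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(im1, im2, alpha, sigma=2.0):
    """Blend two aligned, same-sized color images with a two-band scheme.

    alpha: per-pixel weight for im1 in [0, 1] (e.g. a feathered mask that
    falls from 1 to 0 across the overlap). sigma sets the low/high split.
    """
    alpha = alpha[..., None]  # broadcast the mask over color channels

    # Split each image into low- and high-frequency bands.
    low1 = gaussian_filter(im1, sigma=(sigma, sigma, 0))
    low2 = gaussian_filter(im2, sigma=(sigma, sigma, 0))
    high1, high2 = im1 - low1, im2 - low2

    # Low frequencies: smooth alpha blend across the overlap.
    low = alpha * low1 + (1 - alpha) * low2
    # High frequencies: take them from only one image per pixel,
    # whichever dominates, so fine detail is not ghosted.
    high = np.where(alpha > 0.5, high1, high2)
    return low + high
```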
Given an image, I wanted to detect potential corners on which to align. I used the Harris corner detector to do this, which smooths the image and then examines its gradients. In essence, it chooses points where the pixel intensities change strongly in every direction relative to the surrounding area.
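A sketch of this step using scikit-image's Harris implementation (the parameters here are assumptions, not necessarily the ones used):

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im, min_distance=3):
    """Return the Harris response map and corner coordinates (row, col).

    corner_harris scores each pixel by how strongly intensity changes in
    every direction around it; peak_local_max keeps local maxima of that
    score so nearby detections are not duplicated.
    """
    gray = rgb2gray(im)
    h = corner_harris(gray, method='eps', sigma=1)
    coords = peak_local_max(h, min_distance=min_distance)
    return h, coords
```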
Given many corners, I then had to narrow them down -- corner matching is computationally expensive, so I didn't want too many, but I did want them spread out to sample as much of the image as possible. I used the Adaptive Non-Maximal Suppression (ANMS) algorithm to do this.
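A compact sketch of ANMS, assuming the Harris response map and corner coordinates from the previous step (the constants are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, h, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    coords: (N, 2) corner coordinates (row, col); h: Harris response map.
    For each corner, find the distance to the nearest corner that is
    sufficiently stronger (by the factor c_robust); keep the n_keep corners
    with the largest such suppression radius, which spreads them out.
    """
    strengths = h[coords[:, 0], coords[:, 1]]
    dists = cdist(coords, coords)  # pairwise distances between corners
    # Corner j "suppresses" corner i if j is clearly stronger than i.
    stronger = strengths[None, :] * c_robust > strengths[:, None]
    radii = np.where(stronger, dists, np.inf).min(axis=1)
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```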
Now that I had a good set of corners, I needed a way to compare them -- a descriptor. For each corner, I built an 8 x 8 descriptor by sampling its surrounding 40 x 40 neighborhood every 5 pixels. I then normalized each descriptor so that its mean was 0 and its standard deviation was 1. This bias/gain normalization is necessary because I wanted invariance to lighting changes across comparisons.
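A sketch of that descriptor extraction, assuming a grayscale image and the ANMS-selected coordinates (the blurring sigma and edge handling here are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(gray, coords, spacing=5, size=8):
    """Extract an 8x8 normalized descriptor around each corner.

    Samples a (size * spacing) = 40x40 window every `spacing` pixels from a
    blurred copy of the image, then normalizes to zero mean and unit
    standard deviation for invariance to brightness and contrast changes.
    """
    blurred = gaussian_filter(gray, sigma=1)
    half = size * spacing // 2  # 20-pixel half-window
    descriptors, kept = [], []
    for r, c in coords:
        if (r - half < 0 or c - half < 0 or
                r + half >= gray.shape[0] or c + half >= gray.shape[1]):
            continue  # skip corners whose window falls outside the image
        patch = blurred[r - half:r + half:spacing, c - half:c + half:spacing]
        vec = patch.ravel().astype(float)
        vec = (vec - vec.mean()) / (vec.std() + 1e-8)
        descriptors.append(vec)
        kept.append((r, c))
    return np.array(descriptors), np.array(kept)
```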
To match features, I compared descriptors of the chosen interest points, using SSIM as the metric for how similar two descriptors were. I then used Lowe's nearest-neighbor ratio test to determine which of the similar descriptors actually represented correspondences -- a candidate match is kept only if its best match is significantly closer than its second-best match.
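A sketch of the ratio-test logic -- for brevity it uses a plain squared-difference distance between descriptors rather than SSIM, but the accept/reject rule is the same idea (the threshold value is illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio_thresh=0.6):
    """Match descriptors with a nearest-neighbor ratio test.

    For each descriptor in desc1, find its two closest descriptors in desc2
    and accept the match only if the best distance is much smaller than the
    second best, i.e. the match is unambiguous.
    """
    dists = cdist(desc1, desc2, metric='sqeuclidean')
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        if row[nearest] < ratio_thresh * row[second]:
            matches.append((i, nearest))
    return matches
```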