Siddharth Shashi: Project 4 - Panorama Stitching

Description

This project is all about stitching together photographs taken from the same vantage point to create larger composite images. In the first part, I manually defined correspondences between photographs, warped them using those correspondences, and then stitched and blended the warps together. In the second part, I automated correspondence definition: I detected interest points, built descriptors for them, and matched those descriptors to recover the warp without hand labeling.

Creating the Homography Matrix

After labeling corresponding points on two images, I needed to create the homography matrix H to warp the first image to the second. This involved solving a system of equations in 8 unknowns; since each (x, y) correspondence contributes two equations, I needed at least 4 correspondences. Since even small errors in correspondence labeling result in drastically different homography matrices, I created an overdetermined system with 6 correspondences and solved for an approximate solution with least squares.
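
Here is a minimal sketch of that least-squares setup (simplified from my actual code; the function name and argument layout are illustrative):

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Least-squares homography mapping src_pts onto dst_pts.

    src_pts, dst_pts: (N, 2) arrays of (x, y) correspondences, N >= 4.
    Returns a 3x3 matrix H with H[2, 2] fixed to 1.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        # Each correspondence contributes two linear equations in the
        # eight unknown entries of H.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```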

Image Warping

Similar to how I used affine transformations in my face warping project, I warped images with the homography matrix using an inverse warp. I forward-mapped the corners of the image to determine the output range, then used scipy.interpolate's griddata to linearly interpolate pixel intensities from the original image at each inverse-mapped output location.
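
The sketch below shows the core of this inverse warp, assuming the output canvas shape has already been computed from the forward-mapped corners. It is written for clarity rather than speed (griddata over a full pixel grid is slow):

```python
import numpy as np
from scipy.interpolate import griddata

def inverse_warp(img, H, out_shape):
    """Inverse-warp a color image with homography H onto an output canvas."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Map every output pixel back into the source image through H^-1.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]  # divide out the homogeneous coordinate

    h_in, w_in = img.shape[:2]
    grid_y, grid_x = np.mgrid[0:h_in, 0:w_in]
    out = np.zeros((h_out, w_out, img.shape[2]))
    for c in range(img.shape[2]):
        # Linearly interpolate source intensities at the fractional
        # back-mapped locations; points outside the source become 0.
        out[..., c] = griddata(
            (grid_y.ravel(), grid_x.ravel()), img[..., c].ravel(),
            (src[1], src[0]), method="linear", fill_value=0,
        ).reshape(h_out, w_out)
    return out
```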

anchor_house_bottom
Anchor House Bottom Unwarped
anchor_house_bottom_warped
Anchor House Bottom Warped

Rectifying Images

Now that I could perspective-warp images, I used my function to rectify them. Given an image of tiles taken at an angle, I specified the coordinates the tile corners would have if the image were taken from directly above, and warped the image to those correspondences.
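
Rectification is then just the two sketches above chained together. The corner coordinates below are made up for illustration, and tiles_img stands in for the loaded photo:

```python
import numpy as np

# Hand-labeled tile corners in the angled photo (illustrative values).
src = np.array([[352, 410], [598, 395], [615, 642], [360, 655]])
# Where those corners should land in a top-down view: a square.
dst = np.array([[0, 0], [300, 0], [300, 300], [0, 300]])

H = compute_homography(src, dst)
rectified = inverse_warp(tiles_img, H, out_shape=(300, 300))
```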

tiles
Tiles Unwarped
rectified_tiles
Tiles Warped

Image Stitching

After I had the warped images, I overlaid them. To place each image in its appropriate location, I used my defined corresponding points to pad and align the images before overlaying. For the overlap regions, I used two blending methods: simple alpha blending and two-band blending. For simple alpha blending, an alpha factor determined each image's contribution to the overlap's pixel intensities. For two-band blending, I first separated the images into their high- and low-frequency components; the overlap took its high frequencies from only one image, while the low-frequency components were alpha-blended.
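
A minimal sketch of the two-band scheme on two already-aligned, same-size canvases (the Gaussian sigma is an illustrative choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(img1, img2, alpha):
    """Two-band blend of aligned color images.

    alpha: per-pixel weight for img1 in [0, 1], e.g. a feathered ramp
    across the overlap; img2 implicitly gets weight 1 - alpha.
    """
    alpha = alpha[..., None]  # broadcast over color channels
    # Split each image into low frequencies (blur) and high (residual).
    low1 = gaussian_filter(img1, sigma=(4, 4, 0))
    low2 = gaussian_filter(img2, sigma=(4, 4, 0))
    high1, high2 = img1 - low1, img2 - low2

    # Low frequencies: smooth alpha blend hides exposure differences.
    low = alpha * low1 + (1 - alpha) * low2
    # High frequencies: winner-take-all keeps fine detail from ghosting.
    high = np.where(alpha > 0.5, high1, high2)
    return low + high
```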

stitched_building_two_band
Anchor House Stitched w/ Two-band
stitched_building_alpha
Anchor House Stitched w/ Alpha
stitched_moff_alpha
Moffitt Trash Cans Stitched w/ Alpha
stitched_doe_alpha
Doe Entrance Stitched w/ Alpha

Detecting Interest Points

Given an image, I wanted to detect potential corners on which to align. I used the Harris corner detector, which smooths the image and computes its gradients to score each pixel. In essence, this chose points whose pixel intensities change sharply, in every direction, relative to the surrounding area.
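
A short sketch using scikit-image's Harris utilities (the sigma and peak parameters here are illustrative, not the values I tuned):

```python
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(gray):
    """Harris response map plus its local maxima as candidate corners."""
    # corner_harris smooths the image gradients with a Gaussian and
    # scores each pixel by how strongly intensity varies around it.
    response = corner_harris(gray, sigma=1)
    # Local maxima of the response become discrete (row, col) corners.
    coords = peak_local_max(response, min_distance=3, threshold_rel=0.01)
    return response, coords
```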

harris_corners
Anchor House Bottom Interest Points

Choosing Good Interest Points: Adaptive Non-Maximal Suppression

Given many corners, I then had to choose a few. Corner matching is computationally expensive, so I didn't want too many, but I wanted them spread out to sample as much of the image as possible. I used the Adaptive Non-Maximal Suppression (ANMS) algorithm to do this.
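
The idea: each corner gets a suppression radius, the distance to the nearest corner that is significantly stronger, and we keep the corners with the largest radii. A sketch (parameter values are illustrative):

```python
import numpy as np

def anms(coords, response, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression over Harris corners."""
    strengths = response[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Corners that dominate corner i even after discounting.
        stronger = strengths[i] < c_robust * strengths
        if stronger.any():
            # Squared distances suffice: only the ranking matters.
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = d2.min()
    keep = np.argsort(radii)[::-1][:n_keep]  # largest radii first
    return coords[keep]
```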

anms
Anchor House Bottom ANMS

Getting Feature Descriptors

Now that I had a good set of corners, I needed a metric to compare them on: a descriptor. For each corner, I created an 8 x 8 patch (a 64-dimensional vector) by sampling the corner's surrounding 40 x 40 region every 5 pixels. I then bias/gain-normalized each vector so that its mean was 0 and its standard deviation was 1. Normalization is necessary because I wanted invariance to lighting changes across comparisons.
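
A sketch of the extraction (it assumes the corner sits at least 20 pixels from the image border; ideally the samples come from a blurred copy of the image to avoid aliasing):

```python
import numpy as np

def get_descriptor(gray, r, c):
    """8x8 descriptor for a corner at (row r, col c)."""
    # Sample the surrounding 40x40 window every 5 pixels: 8x8 values.
    patch = gray[r - 20:r + 20:5, c - 20:c + 20:5]
    vec = patch.astype(float).ravel()
    # Bias/gain normalization: mean 0, std 1, for lighting invariance.
    return (vec - vec.mean()) / vec.std()
```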

descriptor
8 x 8 Feature Descriptor

Matching Features

To match features, I compared descriptors of the chosen interest points, using SSIM as the metric for how similar two descriptors were. I then used Lowe's ratio test, comparing each descriptor's best match against its second-best, to determine which of the similar descriptors actually represented correspondences.
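
A sketch of the matching loop; for simplicity it scores pairs with SSD distance rather than the SSIM similarity my code used, and the ratio threshold is illustrative:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Match (N, 64) and (M, 64) descriptor sets with Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.sum((desc2 - d) ** 2, axis=1)  # SSD to every descriptor
        nearest, second = np.argsort(dists)[:2]
        # Keep a pair only if the best match clearly beats the runner-up;
        # this rejects ambiguous matches on repeated texture.
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches
```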

anchor_house_bottom_correspondences
Anchor House Bottom Auto-correspondences
anchor_house_middle_correspondences
Anchor House Middle Auto-correspondences

RANSAC: Creating the Homography Matrix

Finally, I used 4-point RANSAC to choose which correspondences to use in creating the homography: repeatedly fit an exact homography to 4 random correspondences, count how many of the remaining correspondences it explains, keep the largest consistent set, and refit the final homography matrix on that set with least squares.
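
A sketch, reusing compute_homography from earlier (the iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """4-point RANSAC homography between matched (N, 2) point sets."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        # Project every pts1 point and measure reprojection error.
        proj = H @ np.column_stack([pts1, np.ones(len(pts1))]).T
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on the full inlier set.
    return compute_homography(pts1[best_inliers], pts2[best_inliers])
```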

anchor_house_bottom_ransac
Anchor House Bottom RANSAC
anchor_house_middle_ransac
Anchor House Middle RANSAC

Conclusion

As can be seen from my less-than-ideal Matching Features and RANSAC results, I ran into a bug that I wasn't able to find by the project deadline, which kept me from automatically recovering correct homography matrices. That said, I am certainly leaving this project having learned a lot of cool stuff. I think the coolest thing is honestly the power of perception: this is the first time I feel I'm writing a program that can perceive relevant points in an image on its own; I just supply the framework. It gets me excited about what's in store for the future and how these ideas can be applied to perception in the robotics domain.