For Fall 2024, I am taking an image processing class where one of the assignments involves creating panoramas by implementing image transformations. The process involves manually labeling correspondences between images and performing image warping, rectification, and blending. Below, I document my approach to this assignment with in-depth explanations of each step.
Part A: Stitching a Panorama
Shoot the Pictures
I used a Samsung Galaxy S22 with a 0.5x zoom to capture wide-angle shots. The goal is to capture overlapping images that can be aligned using homographies. Below are some of the pictures I took, which will be used for different stages of the project:
MLK Building
Night Skyline
Table Top
Guidelines for Capturing Images:
- Camera Movement: Keep the center of projection fixed and rotate the camera for consistent perspective changes.
- Overlap: Ensure about 40%-70% overlap between consecutive images to make alignment easier.
- Avoid Barrel Distortion: Use lenses that avoid distortion, keeping straight lines straight.
- Lighting: Capture images in quick succession to maintain consistent lighting and reduce potential artifacts.
Recover Homographies
Homographies are key to transforming points from one image into another. The homography matrix ( H ) is a 3x3 matrix that defines this transformation. It maps points from image 1 to image 2 based on their coordinates.
Given corresponding points ( p = (x, y, 1) ) in the first image and ( p' = (x', y', w') ) in the second, written in homogeneous coordinates, the transformation can be described by:
$$ p' = H \cdot p $$
Steps to Recover Homography:
- Select Corresponding Points: Using a manual labeling tool, I selected pairs of corresponding points in both images.
- Formulate the Equation: With four or more correspondences, the homography matrix ( H ) can be computed by solving the system ( Ah = b ), where ( A ) is a matrix formed from the correspondence pairs and ( h ) is the vectorized form of the homography matrix.
- Solve Using Least Squares: For robustness, I use more than four points, resulting in an overdetermined system. The least squares method minimizes error in finding the best-fit homography.
Since our H matrix contains 8 unknown variables (the lower-right entry is a scale factor and is set to 1), we need at least 8 equations to solve for the matrix. Each point correspondence gives us 2 equations, so at least 4 correspondences are needed to solve the system of linear equations. In practice, we use many more points to avoid an unstable, noisy transformation, and least squares then recovers the best-fit homography matrix.
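To make the setup concrete, here is a minimal sketch of how the system ( Ah = b ) can be assembled and solved with least squares. The function name computeH and the array layout are my own choices for illustration, not necessarily the exact code used in the project.

import numpy as np

def computeH(pts1, pts2):
    """Estimate the homography H mapping pts1 -> pts2 via least squares.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # x' = (h00*x + h01*y + h02) / (h20*x + h21*y + 1)
        # y' = (h10*x + h11*y + h12) / (h20*x + h21*y + 1)
        # Rearranged into two equations that are linear in the 8 unknowns:
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    A = np.array(A, dtype=np.float64)
    b = np.array(b, dtype=np.float64)
    # Solve the (generally overdetermined) system A h = b in the least-squares sense
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Append the fixed scale term and reshape into a 3x3 matrix
    return np.append(h, 1).reshape(3, 3)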
Warp the Images
Once the homography matrix ( H ) is computed, I use it to warp one image into the coordinate system of another. The warping operation applies a projective transformation, ensuring that the perspective between the images is corrected.
The warping process is defined as:
$$ p = H^{-1} \cdot p' $$
Where ( H^{-1} ) is the inverse of the homography matrix, used for inverse warping.
Key steps in warping:
- Forward vs. Inverse Warping: I used inverse warping to avoid holes in the final image, computing pixel values in the destination image by interpolating pixel values from the source.
- Interpolation: Using cv2.remap, I performed interpolation to fill in missing pixel values smoothly (a sketch of this step follows the list).
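Here is a minimal sketch of the inverse-warping step under the conventions above, assuming H maps source coordinates into destination coordinates and that the output canvas size has already been computed; the function name warpImage is illustrative.

import cv2
import numpy as np

def warpImage(src, H, out_shape):
    """Inverse-warp src into an output canvas of shape (height, width)."""
    h_out, w_out = out_shape
    # Grid of destination pixel coordinates
    xs, ys = np.meshgrid(np.arange(w_out), np.arange(h_out))
    dst_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # 3 x (h*w), homogeneous
    # Map destination coordinates back into the source image with H^{-1}
    src_pts = np.linalg.inv(H) @ dst_pts
    src_pts /= src_pts[2]  # divide by the homogeneous coordinate
    map_x = src_pts[0].reshape(h_out, w_out).astype(np.float32)
    map_y = src_pts[1].reshape(h_out, w_out).astype(np.float32)
    # cv2.remap samples src at the (map_x, map_y) locations with bilinear interpolation;
    # destination pixels that fall outside the source stay black
    return cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR)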
Image Rectification
Before blending the images into a mosaic, I rectified images to ensure the correctness of the homography transformation. Rectification involves mapping an image containing a known planar surface (like a poster or keyboard) so that the plane appears front-facing.
For example:
Starcraft Laptop
Poster
In both cases, I manually selected points on the objects and mapped them to a predefined square in the output space. This demonstrates that the warping process is functioning correctly.
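With exactly four clicked corners on a planar object, the homography to a front-facing rectangle is fully determined. The sketch below uses OpenCV's built-in helpers rather than my own solver and warper, and the file name and pixel coordinates are placeholders.

import cv2
import numpy as np

# Hypothetical example: four manually clicked corners of a planar object
# (e.g. a poster), ordered top-left, top-right, bottom-right, bottom-left.
img = cv2.imread("poster.jpg")
corners = np.float32([[412, 230], [980, 275], [965, 860], [395, 820]])
# Map them to a front-facing rectangle of the desired output size
w, h = 600, 800
square = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
# With exactly four correspondences the homography is fully determined
H = cv2.getPerspectiveTransform(corners, square)
rectified = cv2.warpPerspective(img, H, (w, h))
cv2.imwrite("poster_rectified.jpg", rectified)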
Blend the Images into a Mosaic
With the images properly aligned through homographies, the next step is to blend them into a seamless mosaic. Simple stitching often leads to harsh edges and artifacts. To avoid this, I used Weighted Distance Blending.
Weighted Distance Blending:
After aligning the images with homographies, I needed to blend them into a seamless mosaic. Initially, I experimented with Laplacian pyramids for multi-scale blending. While this approach is effective in preserving detail and avoiding harsh transitions, it was challenging to determine the exact boundaries between overlapping images, leading to imperfect blending results.
To address this, I switched to a more straightforward and effective method: weighted feathering using the bwdist function. This method worked exceptionally well and produced smooth transitions between the overlapping regions of the images.
Weighted Feathering with bwdist
The idea behind this technique is to compute a distance transform for each image, which measures the distance from the nearest edge of the mask. The further a pixel is from the boundary, the more weight it carries in the blending process. The weights are normalized to smoothly transition between the two images based on their distances from the edges.
Here’s the Python code I used for blending:
import numpy as np
# scipy's Euclidean distance transform plays the role of MATLAB's bwdist here
from scipy.ndimage import distance_transform_edt as bwdist

def bwdistBlend(panorama, enlarged_image):
    # Create binary masks marking the valid (non-black) pixels of each image
    mask_1 = np.any(panorama > 0, axis=-1).astype(np.float32)
    mask_2 = np.any(enlarged_image > 0, axis=-1).astype(np.float32)
    # Compute the intersection mask where both images overlap
    intersection = np.logical_and(mask_1, mask_2)
    # Compute the distance transform for both masks (distance from the nearest zero pixel)
    dist_1 = bwdist(mask_1)
    dist_2 = bwdist(mask_2)
    # Normalize distance maps to the range [0, 1]
    dist_1_norm = dist_1 / (dist_1.max() + 1e-8)
    dist_2_norm = dist_2 / (dist_2.max() + 1e-8)
    # Compute the blend weights based on the distance transforms
    blend_weights_1 = dist_1_norm / (dist_1_norm + dist_2_norm + 1e-8)
    blend_weights_2 = dist_2_norm / (dist_1_norm + dist_2_norm + 1e-8)
    # Ensure the blend weights are only applied in the overlapping regions
    blend_weights_1[~intersection] = 1  # Only panorama where there is no intersection
    blend_weights_2[~intersection] = 0  # Only enlarged_image where there is no intersection
    # Start from the simple sum (outside the overlap at most one image is non-zero)
    result = panorama.astype(np.float32) + enlarged_image.astype(np.float32)
    # Inside the overlap, replace the sum with the distance-weighted average
    result[intersection] = (panorama.astype(np.float32) * blend_weights_1[:, :, np.newaxis] +
                            enlarged_image.astype(np.float32) * blend_weights_2[:, :, np.newaxis])[intersection]
    # Clip the result to valid pixel values and convert back to uint8
    result = np.clip(result, 0, 255).astype(np.uint8)
    return result
Here are the final mosaics:
Night Skyline Mosaic
MLK Building Mosaic
Table Mosaic
Part B: Automatic Feature Detection and Matching
In this part, we implement automatic feature detection and matching to create panoramas without manual correspondence selection. We’ll follow the approach outlined in “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al., with some simplifications.
1. Harris Corner Detection
The first step is detecting corner features using the Harris corner detector. Corners are points where the image intensity changes significantly in both the horizontal and vertical directions.
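As a rough sketch of this step (using OpenCV's cv2.cornerHarris; the window sizes and the relative threshold are assumptions, not the exact values I used):

import cv2
import numpy as np

def harris_corners(image, k=0.04, rel_thresh=0.01):
    """Return (row, col) coordinates and responses of strong Harris corners."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Harris response at every pixel: 3x3 neighborhood, 3x3 Sobel aperture, constant k
    R = cv2.cornerHarris(gray, 3, 3, k)
    # Keep only responses above a fraction of the strongest corner response
    ys, xs = np.where(R > rel_thresh * R.max())
    return np.stack([ys, xs], axis=1), R[ys, xs]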
Implementation Details:
- Used single-scale Harris corner detection
- Threshold set to maintain strong corner responses
Results:
2. Adaptive Non-Maximal Suppression (ANMS)
To ensure a good spatial distribution of features and filter out weaker corners, I implemented ANMS as described in Section 3 of the paper.
Implementation Details:
- Computed suppression radius for each point
- Sorted points by their suppression radius, i.e., the distance to the nearest point with a significantly stronger Harris response, and kept only the top points
- This maintains an even spatial distribution while preserving the strongest features (a sketch of the procedure follows this list)
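Here is a minimal, brute-force sketch of ANMS, assuming the corner coordinates and Harris responses come from the detector above; the robustness constant of 0.9 follows the paper, while num_keep is an assumption.

import numpy as np

def anms(coords, responses, num_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression (Brown et al., Section 3).

    coords: (N, 2) corner coordinates; responses: (N,) Harris strengths.
    Keeps the num_keep corners with the largest suppression radius.
    """
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Points whose (scaled-down) response still dominates point i
        stronger = c_robust * responses > responses[i]
        if np.any(stronger):
            # Suppression radius = distance to the nearest dominating point
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()
        # Points with no stronger neighbor keep an infinite radius
    keep = np.argsort(-radii)[:num_keep]
    return coords[keep]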
Results:
3. Feature Descriptor Extraction
For each detected corner, I created a distinctive feature descriptor to enable matching. The idea is that, for a correct match, the first nearest neighbor in descriptor space should be much closer than the second nearest neighbor. A sketch of the extraction step follows the implementation details below.
Implementation Details:
- Extracted 40x40 patches around each corner
- Downsampled to 8x8 descriptors to handle noise
- Applied bias/gain normalization
- Used axis-aligned patches (no rotation invariance)
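A minimal sketch of the extraction step, assuming a grayscale image and (row, column) corner coordinates; using cv2.resize with area averaging for the 40x40 to 8x8 downsampling is my choice here, not necessarily the exact filtering used in the project.

import cv2
import numpy as np

def extract_descriptors(gray, corners, patch_size=40, desc_size=8):
    """Build bias/gain-normalized 8x8 descriptors from 40x40 axis-aligned patches."""
    half = patch_size // 2
    descriptors, kept = [], []
    for y, x in corners:
        # Skip corners whose 40x40 window falls outside the image
        if y < half or x < half or y + half > gray.shape[0] or x + half > gray.shape[1]:
            continue
        patch = gray[y - half:y + half, x - half:x + half].astype(np.float32)
        # Downsample to 8x8 (area averaging acts as a low-pass filter against noise)
        small = cv2.resize(patch, (desc_size, desc_size), interpolation=cv2.INTER_AREA)
        # Bias/gain normalization: zero mean, unit standard deviation
        small = (small - small.mean()) / (small.std() + 1e-8)
        descriptors.append(small.ravel())
        kept.append((y, x))
    return np.array(descriptors), np.array(kept)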
Example Normalized Descriptors:
4. Feature Matching
Implemented feature matching using Lowe’s ratio test between first and second nearest neighbors.
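A short sketch of the ratio test, assuming the descriptors of the two images are stacked as rows of two arrays; the 0.6 ratio threshold here is only a placeholder for the value I actually tuned.

import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.6):
    """Match descriptors with Lowe's ratio test on the 1-NN / 2-NN distance."""
    # Pairwise Euclidean distances between all descriptor pairs
    dists = cdist(desc1, desc2)
    matches = []
    for i, row in enumerate(dists):
        # Indices of the two closest descriptors in the second image
        nn1, nn2 = np.argsort(row)[:2]
        # Accept only if the best match is much closer than the runner-up
        if row[nn1] / (row[nn2] + 1e-8) < ratio:
            matches.append((i, nn1))
    return np.array(matches)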
Implementation Details:
- Computed distances between all descriptor pairs
- Applied ratio test for matching uniqueness
- Used threshold based on Figure 6b from the paper
Matched Descriptor Examples:
5. RANSAC Homography Estimation
Used RANSAC to robustly estimate the homography transformation between images; a sketch of the loop follows the implementation details below.
Implementation Details:
- Randomly sampled 4 point correspondences
- Computed homography for each sample
- Counted inliers using distance threshold
- Refined homography using all inliers
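A minimal sketch of the RANSAC loop, reusing the computeH least-squares solver sketched in Part A; the iteration count and the 2-pixel inlier threshold are assumptions.

import numpy as np

def ransac_homography(pts1, pts2, n_iters=2000, thresh=2.0):
    """Robustly estimate a homography from noisy correspondences (pts: (N, 2))."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous source points, N x 3
    for _ in range(n_iters):
        # 1. Randomly sample 4 correspondences and fit a homography to them
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = computeH(pts1[idx], pts2[idx])  # least-squares solver from Part A
        # 2. Project all source points and measure the reprojection error
        proj = pts1_h @ H.T
        proj = proj[:, :2] / (proj[:, 2:3] + 1e-12)
        err = np.linalg.norm(proj - pts2, axis=1)
        # 3. Count inliers within the pixel threshold
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # 4. Refine the homography using all inliers of the best model
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers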
RANSAC Results:
- Yellow points: Inliers
- Red points: Outliers
Manual vs. Automatic Comparison
Here’s a side-by-side comparison of the results from manual and automatic stitching for all three scenes:
MLK Building
Manual: Automatic:
Night Skyline
Manual: Automatic:
Table Scene
Manual: Automatic:
Analysis of Results:
- Accuracy: Both methods produced well-aligned panoramas, with the automatic method showing comparable accuracy to manual selection. Both methods struggle with the MLK panorama as the overlap region is small and the transformation is large.
- Efficiency: The automatic method is extremely fast, running in seconds rather than minutes
- Edge Cases: The manual method can sometimes handle cases where automatic detection fails (the manual result in the MLK example looks better)
Correspondence Comparison: Manual vs. Automatic
MLK Building
Manual Correspondences: Automatic Correspondences:
Night Skyline
Manual Correspondences: Automatic Correspondences:
Table Scene
Manual Correspondences: Automatic Correspondences:
Analysis of Correspondence Methods:
Manual Correspondence Selection
Advantages:
- High precision in selecting specific features
- Ability to choose semantically meaningful points (e.g., corners of windows, distinctive architectural features)
- Works well in low-texture or repetitive regions where automatic methods might struggle
- Can handle cases with significant lighting changes
Disadvantages:
- Time-consuming process
- Subject to human error
- Limited number of correspondences (typically 8-12 points)
Automatic Correspondence Detection
Advantages:
- Much faster execution
- Generates many more correspondences
- Consistent results across multiple runs
- Evenly distributed points across the image due to ANMS
Disadvantages:
- Can be fooled by repetitive patterns
- Sensitive to parameter tuning (Harris threshold, ANMS radius, matching ratio)
- May produce some incorrect matches (though RANSAC helps filter these)
- Requires sufficient texture in the images
Impact on Final Results:
- Alignment Quality: Both methods produced well-aligned panoramas.
- Processing Time: Manual method took about 2-3 minutes per image pair for point selection, while automatic method runs in seconds
- Reliability: The manual selection was more reliable for challenging cases
- Point Distribution: Automatic method achieved better spatial distribution of points thanks to ANMS, while manual selection often concentrated on obvious features
What I Learned
The coolest thing I learned from this project was how robust feature matching can be achieved through RANSAC. It's fascinating that Lowe's ratio test and RANSAC can produce correspondences that rival human labeling. Seeing random corner points turn into meaningful correspondences through these algorithmic steps was particularly enlightening.
Code
You can find the full implementation of my project here.