Description: I found an issue where the point cloud, obtained by lifting the depth image, appears misaligned with respect to the 3D model of the object, transformed according to its ground-truth (GT) pose.
To lift the depth image, I use the code shown below, passing the input depth image (already multiplied by the depth scale, 0.1) and the per-image camera parameters from test_primesense/0000XX/scene_camera.json.
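For reference, the parameters can be read like this (a minimal sketch, assuming the standard BOP scene_camera.json layout, where each image id maps to a row-major cam_K and a per-image depth_scale):

import json

with open('test_primesense/000001/scene_camera.json') as f:
    scene_camera = json.load(f)

im_id = '197'  # JSON keys are image ids as strings
cam_K = scene_camera[im_id]['cam_K']              # 9 floats, row-major: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
depth_scale = scene_camera[im_id]['depth_scale']  # 0.1 for T-LESS PrimeSense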
Problem: When visualizing the point cloud in 3D, the object's 3D model transformed using the GT pose does not align with the lifted point cloud. This misalignment occurs consistently across several scenes and test images.
Steps to Reproduce:
Scene: 000001
Test Image: 197
Objects: 2, 25, 29, 30
Here is the code I use for lifting the depth image to a point cloud:
import numpy as np
import torch
from typing import Tuple

def lift_pcd(depth: torch.Tensor, camera: torch.Tensor, xy_idxs: Tuple = None):
    '''
    Given a depth image and the corresponding camera, lifts the depth to a point cloud.
    If depth has 4 channels, the last 3 are used as RGB and an RGB point cloud is produced.
    The image size is implicitly given by the depth image size.
    Optionally, a set of (x, y) coordinates can be passed to lift only those points.
    '''
    H, W, D = depth.shape
    d = depth[:, :, 0]
    if xy_idxs is not None:
        xmap = xy_idxs[0].to(d.device)
        ymap = xy_idxs[1].to(d.device)
        pt2 = d[ymap, xmap]
        xmap = xmap.to(torch.float32)
        ymap = ymap.to(torch.float32)
    else:
        # Make a dense pixel coordinate grid. numpy's meshgrid is used for
        # compatibility with torch 1.8, whose torch.meshgrid lacks the
        # `indexing` argument (added in torch 1.10).
        xs = torch.linspace(0, W - 1, steps=W)
        ys = torch.linspace(0, H - 1, steps=H)
        xmap, ymap = np.meshgrid(xs.numpy(), ys.numpy(), indexing='xy')
        xmap = torch.tensor(xmap).flatten().to(d.device).to(torch.float32)
        ymap = torch.tensor(ymap).flatten().to(d.device).to(torch.float32)
        pt2 = d.flatten()
    # Intrinsics from the flattened row-major 3x3 camera matrix.
    fx = camera[0]
    fy = camera[4]
    cx = camera[2]
    cy = camera[5]
    # Back-project with the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    pt0 = (xmap - cx) * pt2 / fx
    pt1 = (ymap - cy) * pt2 / fy
    pcd_depth = torch.stack((pt0, pt1, pt2), dim=1)
    if D > 1:
        # Attach the remaining channels (e.g. RGB) as per-point features.
        feats = depth[ymap.long(), xmap.long(), 1:]
        if xy_idxs is None:
            feats = feats.reshape(H * W, D - 1)
        pcd_depth = torch.cat([pcd_depth, feats], dim=1)
    return pcd_depth
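The function expects the flattened row-major intrinsic matrix. A minimal call sketch (depth_t is a hypothetical (H, W, 1) float32 depth tensor in millimeters; see the loading snippet below):

cam = torch.tensor(cam_K, dtype=torch.float32)  # flattened 3x3 K: indices 0/4/2/5 give fx, fy, cx, cy
pcd = lift_pcd(depth_t, cam)                    # (H*W, 3) points in the camera frame, in mm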
Prior to lifting, I load the depth image with:

import cv2

depth = cv2.imread(depth_path, -1)  # IMREAD_UNCHANGED keeps the raw 16-bit values
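Before calling lift_pcd, the raw depth is scaled and given a channel dimension; a minimal sketch (assuming depth_scale as read from scene_camera.json above):

depth_mm = depth.astype(np.float32) * depth_scale   # depth_scale = 0.1 for T-LESS -> millimeters
depth_t = torch.from_numpy(depth_mm).unsqueeze(-1)  # (H, W, 1), the layout lift_pcd expects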
Example Results: The visualizations below show the misalignment.
Scene: 000001, Test Image: 197, Objects: 2, 25, 29, 30
Green: Object’s point cloud transformed according to the GT pose
Red: Lifted depth cropped on the object
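For context, the green cloud follows the standard BOP pose convention; a minimal sketch (assuming cam_R_m2c and cam_t_m2c from scene_gt.json, and model_pts, a hypothetical (N, 3) array of model vertices in mm):

import json
import numpy as np

with open('test_primesense/000001/scene_gt.json') as f:
    scene_gt = json.load(f)

gt = scene_gt['197'][0]  # one of the GT instances for image 197
R = np.array(gt['cam_R_m2c'], dtype=np.float64).reshape(3, 3)  # row-major model-to-camera rotation
t = np.array(gt['cam_t_m2c'], dtype=np.float64).reshape(3, 1)  # model-to-camera translation, in mm
model_pts_cam = (R @ model_pts.T + t).T  # transformed vertices in the camera frame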
Request: I suspect the issue might be related to how the depth image is lifted or to the interpretation of the camera parameters. I do not have this problem with other BOP datasets, only with T-LESS. Could anyone suggest how to correctly align the point cloud with the GT pose, or point out what might be wrong with the lifting process or the pose application?