Panorama 360 #19

Open
sctrueew opened this issue Nov 12, 2024 · 17 comments

@sctrueew

Hi,

I have a 360 panorama image. How can I display it in 3D?

@EasternJournalist (Collaborator) commented Nov 19, 2024

Hi, MoGe is designed to handle natural perspective images. While it’s possible to use MoGe with panorama images, it would require a few modifications (which could be quite tedious). Here's one approach to obtain a panorama depth map:

  1. Convert the panorama image to cube maps: split the panorama into 6 faces, predict a 3D point map for each face separately, and convert the point maps to Euclidean distance maps. This gives you a panorama depth (distance) map, though the scales of the 6 faces may not be aligned.

  2. Rotate and reprocess: next, convert the panorama image to another cube map by rotating it 45 degrees. Predict depth for the new cube map, which will overlap with the first one. The overlap allows you to fuse the two depth maps into a seamless panorama depth map, for example using Poisson blending.

The panorama depth map can then be easily lifted to 3D points via $d \cdot \vec r$, where $d$ is the depth (distance) and $\vec r$ is the viewing direction of the pixel.
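For step 1, splitting the equirectangular panorama into perspective cube-face views amounts to resampling the panorama along per-pixel ray directions. Below is a minimal sketch, not MoGe's actual implementation: the helper name, the axis convention, and the use of OpenCV's remap are assumptions to illustrate the idea.

import numpy as np
import cv2

def panorama_to_view(pano, R, fov_deg=90.0, size=512):
    """Resample one pinhole view (rotation R, square FOV) from an equirectangular panorama."""
    H, W = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)                    # focal length in pixels
    jj, ii = np.meshgrid(np.arange(size), np.arange(size), indexing='ij')
    # Ray directions in the camera frame (x right, y down, z forward)
    dirs = np.stack([(ii + 0.5 - size / 2) / f,
                     (jj + 0.5 - size / 2) / f,
                     np.ones((size, size))], axis=-1)
    dirs = dirs @ R.T                                                   # rotate rays into the panorama frame
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    lon = np.arctan2(x, z) % (2 * np.pi)                                # longitude in [0, 2*pi)
    lat = np.arcsin(np.clip(y / np.linalg.norm(dirs, axis=-1), -1, 1))  # latitude in [-pi/2, pi/2]
    # Inverse of the usual equirectangular mapping: column <- longitude, row <- latitude
    map_x = (lon / (2 * np.pi) * W - 0.5).astype(np.float32)
    map_y = ((lat / np.pi + 0.5) * H - 0.5).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

# Usage: one cube face per rotation matrix, e.g. the forward face:
# face = panorama_to_view(panorama, np.eye(3))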

@sctrueew (Author)

Thank you for your response. I appreciate your help.

I would like to ask if it's possible to extract points directly from a 360-degree image or environment. Currently, I am using a script with Depth Anything to obtain points.

import numpy as np

def get_uni_sphere_xyz(H, W):
    # Unit direction of each pixel of an H x W equirectangular image
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    u = (i + 0.5) / W * 2 * np.pi          # longitude in [0, 2*pi)
    v = ((j + 0.5) / H - 0.5) * np.pi      # latitude in [-pi/2, pi/2]
    z = -np.sin(v)
    c = np.cos(v)
    y = c * np.sin(u)
    x = c * np.cos(u)
    sphere_xyz = np.stack([x, y, z], -1)   # (H, W, 3) unit directions
    return sphere_xyz
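For completeness, combining this with a per-pixel distance map lifts the panorama to 3D points; a minimal sketch, assuming distance is an H x W array of Euclidean distances:

# Hypothetical usage: lift an H x W Euclidean distance map to 3D points
points = distance[..., None] * get_uni_sphere_xyz(H, W)   # (H, W, 3)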

Looking forward to your guidance on this matter.

@EasternJournalist (Collaborator)

The code is for lifting a panorama depth map to 3D points. The problem here, however, is how to obtain the panorama depth map in the first place.
It is not recommended to use models like DepthAnything and MoGe to infer directly on panorama images and take the depth, as the results will most likely be unsatisfying (distorted and inconsistent at boundaries). Only models trained on panorama images allow such direct estimation. Maybe you can find some here: https://github.com/bkhanal-11/awesome-360-depth-estimation?tab=readme-ov-file.

If using generalized monocular depth estimation models like DepthAnything or MoGe, it is a better idea to divide the panorama image into several perspective images, run inference on each, and finally fuse the results together. I am planning to implement this in a separate inference script. Please stay tuned.
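A minimal sketch of the per-view preprocessing such a fusion needs, assuming points is an (H, W, 3) point map predicted for one perspective view and mask its validity mask (illustrative names, not the actual script):

import numpy as np

# Euclidean distance from the camera center for every pixel of one view
distance = np.linalg.norm(points, axis=-1)
distance = np.where(mask, distance, 0.0)   # keep invalid pixels at zero rather than inf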

@sctrueew (Author)

@EasternJournalist Hi, thank you for sharing your thoughtful approach. It's a clever strategy to tackle the limitations of generalized models with panorama images. Good luck with your implementation.

@EasternJournalist (Collaborator) commented Nov 28, 2024

@sctrueew Hi. I have added scripts/infer_panorama.py. Please also check the README. Your feedback is welcome.

@sctrueew (Author)

@EasternJournalist Hi, thank you for the implementation. I will check it soon and give you feedback.

@DiamondGlassDrill commented Dec 1, 2024

Thanks, @EasternJournalist, for the amazing work!

I tried both infer_panorama and the regular infer function on a standard image.
The regular infer works flawlessly, without any issues, and is really fast.
(Note for those using self-written pipelines: the depth output filename in infer.py was updated from depth.png to depth_vis.png, and the script moved to scripts/infer.py.)

Now, coming back to my tests with infer_panorama.py: processing a 4K panoramic view is really slow, about 5 min on an RTX 3090. Is that to be expected?

Here's the Python output:

Inferring splited views:   0%|          | 0/3 [00:00<?, ?it/s]
Python Error: ...Code\Depth\MoGe\moge\model\utils.py:31: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  x = F.scaled_dot_product_attention(q, k, v, attn_bias)

Inferring splited views:  33%|###3      | 1/3 [00:01<00:02,  1.31s/it]
Inferring splited views:  67%|######6   | 2/3 [00:02<00:01,  1.04s/it]
Inferring splited views: 100%|##########| 3/3 [00:02<00:00,  1.06it/s]

Python Output: True
Merging...

Python Output: ...\Code\Depth\testPage\textures\billiard_hall
Inference type: panorama
Parsed options: {'resize': None, 'resolution_level': 9, 'threshold': 0.03, 'save_maps': True, 'save_glb': False, 'save_ply': False, 'show': False, 'fov_x': None, 'batch_size': 4, 'save_splited': True}

Current debug Steps:

  1. With the save_splited flag I do see that the images are being processed, 12 in total, each with a distance_vis.png. That part is rather fast! ~4 sec

  2. Additionally, I encountered the following warning message:
    Python Error: ...\Code\Depth\MoGe\moge\model\utils.py:31: UserWarning: Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
    x = F.scaled_dot_product_attention(q, k, v, attn_bias)

I suspect this warning is unrelated to the root cause of infer_panorama being so slow?

  1. It seems that the issue is between the save-split step and the merge step, but there is nothing in between :)
  2. Maybe some background calculation is running, as my GPU sits at 40% power consumption (130 W) for 5 min during whatever happens between save_splited and the start of merge; after that, merging and saving the files finish quickly as well. ~1 sec
        # Save splited
        if save_splited:

            splited_save_path = Path(output_path, image_path.stem, 'splited')
            splited_save_path.mkdir(exist_ok=True, parents=True)
            for i in range(len(splited_images)):
                cv2.imwrite(str(splited_save_path / f'{i:02d}.jpg'), cv2.cvtColor(splited_images[i], cv2.COLOR_RGB2BGR))
                cv2.imwrite(str(splited_save_path / f'{i:02d}_distance_vis.png'), cv2.cvtColor(colorize_depth(splited_distance_maps[i], splited_masks[i]), cv2.COLOR_RGB2BGR))

**BETWEEN HERE AND
HERE it takes 5 minutes?** 

        # Merge
        if pbar.disable:
            print('Merging...')
        else:
            pbar.set_postfix_str(f'Merging')
        panorama_depth, panorama_mask = merge_panorama_depth(width, height, splited_distance_maps, splited_masks, splited_extrinsics, splited_intriniscs)
        panorama_depth = panorama_depth.astype(np.float32)
        panorama_depth = np.where(panorama_mask, panorama_depth, np.inf)
        points = panorama_depth[:, :, None] * spherical_uv_to_directions(utils3d.numpy.image_uv(width=width, height=height))

Maybe that helps already a bit.

PS: Maybe you could update splited to splitted (double t), if time allows.

@DiamondGlassDrill commented Dec 1, 2024

Another issue came to mind: in utils3d/numpy/transforms.py I also received an error

Inferring splited views:  33%|###3      | 1/3 [00:01<00:02,  1.33s/it]
Inferring splited views:  67%|######6   | 2/3 [00:02<00:01,  1.05s/it]
Inferring splited views: 100%|##########| 3/3 [00:03<00:00,  1.06it/s]

Python Error: ...\Code\Depth\MoGe\utils3d\numpy\transforms.py:606: RuntimeWarning: divide by zero encountered in divide
  uv_coord = points[..., :2] / points[..., 2:]

Which I fixed with the change below. This brought me from 7 min down to the 5 min mentioned above:

@batched(2, 2, 2)
def project_cv(
        points: np.ndarray,
        extrinsics: np.ndarray = None,
        intrinsics: np.ndarray = None
    ) -> Tuple[np.ndarray, np.ndarray]:
    """
    Project 3D points to 2D following the OpenCV convention

    Args:
        points (np.ndarray): [..., N, 3] or [..., N, 4] 3D points to project, if the last
            dimension is 4, the points are assumed to be in homogeneous coordinates
        extrinsics (np.ndarray): [..., 4, 4] extrinsics matrix
        intrinsics (np.ndarray): [..., 3, 3] intrinsics matrix

    Returns:
        uv_coord (np.ndarray): [..., N, 2] uv coordinates, value ranging in [0, 1].
            The origin (0., 0.) is corresponding to the left & top
        linear_depth (np.ndarray): [..., N] linear depth
    """
    assert intrinsics is not None, "intrinsics matrix is required"

    # Ensure points are in homogeneous coordinates
    if points.shape[-1] == 3:
        points = np.concatenate([points, np.ones_like(points[..., :1])], axis=-1)

    # Apply extrinsics if provided
    if extrinsics is not None:
        points = points @ extrinsics.swapaxes(-1, -2)

    # Apply intrinsics
    points = points[..., :3] @ intrinsics.swapaxes(-1, -2)

    # Extract z-values (depth)
    z_values = points[..., 2]

    # Handle zero or negative depth values
    valid_mask = z_values > 0  # Valid points have positive depth
    uv_coord = np.zeros_like(points[..., :2])  # Initialize UV coordinates with zeros
    uv_coord[valid_mask] = points[..., :2][valid_mask] / z_values[valid_mask, None]  # Only divide valid points

    # Linear depth
    linear_depth = np.where(valid_mask, z_values, np.inf)  # Set invalid depths to infinity

    return uv_coord, linear_depth

@EasternJournalist (Collaborator) commented Dec 2, 2024

@DiamondGlassDrill Hi, thanks for providing the debugging information!

Q1: infer_panorama.py merging is very slow

The slow merging process might be due to inferring on a very high-resolution panorama image (e.g., 4K). High resolutions significantly slow down the merging step.

The script operates in two stages: first, inferring on separate views, and second, merging them. The initial step involves running MoGe on 12 individual views, which is efficient on GPUs. The merging step, however, performs a least-squares optimization with scipy on the CPU. For a medium resolution (1920x960), this typically takes around 10 seconds. However, higher resolutions increase the optimization time quadratically.

As a quick fix, you can try using the --resize 1920 option. Since 1920x960 is sufficient to capture geometry details, I will update the code to limit the resolution for depth merging, ensuring the optimization time remains manageable.
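For reference, a possible invocation with the resize option (only --resize is confirmed above; the other flag names are assumptions, so check the script's --help):

# hypothetical example; flag names other than --resize may differ
python scripts/infer_panorama.py --input panorama.jpg --output output/ --resize 1920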

Q2: Runtime warnings
The "division by zero" warnings are expected during the process. While these warnings do not affect the inference results, I agree they are unnecessary and can be annoying. A practical solution is to avoid warnings by only performing division on valid points.

However, invalid depth values should be set to zeros or left unchanged instead of infinity. This is because valid points are filtered using the condition depth > 0 when projecting spherical points onto separate views. Keeping invalid points might lead to a smoother optimization target during depth merging, which could explain why you observed faster convergence.
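A minimal sketch of that suggestion (illustrative names, not the actual utils3d code): divide only where the depth is positive and leave the other entries at zero, which avoids the warning.

import numpy as np

z = points[..., 2:]                      # per-point depth, shape [..., N, 1]
valid = z > 0
# Divide only valid points; invalid entries stay at zero, so no divide-by-zero warning
uv_coord = np.divide(points[..., :2], z,
                     out=np.zeros_like(points[..., :2]), where=valid)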

@EasternJournalist (Collaborator) commented Dec 2, 2024

@DiamondGlassDrill Merging now works at a limited resolution without manually resizing, and the typos are fixed. #27
Thanks!

@sctrueew (Author) commented Dec 3, 2024

@EasternJournalist thank you for the information. I have a couple of questions:

  1. How can I smooth the edges? I need to apply some post-processing to achieve smoother transitions.
    Image

  2. Could you guide me in creating SLAM (Simultaneous Localization and Mapping)? I have a video, and I need to generate a 3D point cloud similar to this example.
    Image

Looking forward to your guidance. Thank you in advance!

@EasternJournalist (Collaborator)

@sctrueew The jagged edges are caused by the image's low resolution. As you are visualizing the point map as a mesh grid connecting adjacent pixels, these artifacts are inevitable since they correspond to individual pixel sizes. To achieve a smoother view, you might consider visualizing it as a point cloud instead.

We're still working on expanding MoGe’s applications, including SLAM and video reconstruction. Since MoGe is an open-source project under the MIT license, you’re welcome to adapt it to meet your project's needs. You may find some information in this issue #6. We appreciate your understanding and cooperation.

@DiamondGlassDrill commented Dec 3, 2024

@EasternJournalist
I added the following to the top of \MoGe\utils3d\utils.py:

import functools
import warnings

def no_runtime_warnings(func):
    # Suppress RuntimeWarning (e.g. divide by zero) inside the wrapped call
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            return func(*args, **kwargs)
    return wrapper

Otherwise, the decorator didn't work in \MoGe\utils3d\utils.py.
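Applied to any function, the decorator then silences RuntimeWarnings raised inside it; a small illustration (the function below is made up):

import numpy as np

@no_runtime_warnings
def normalize(v):
    # divide-by-zero / invalid-value RuntimeWarnings from a zero norm are suppressed
    return v / np.linalg.norm(v, axis=-1, keepdims=True)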

@DiamondGlassDrill

@EasternJournalist , I have another question for you.

Currently, I load an HDR 360 image onto a sphere, and everything works perfectly fine. After that, I process it through your panorama pipeline and generate the depth map. My question is: should I overlay the depth map back onto a sphere? Or, in the case of a room, would it be better to work with a cube format instead?

  1. HDR on a sphere (image): the original HDR 360 image loaded onto a sphere looks great.
  2. HDR converted with MoGe Panorama (image): I process the image using MoGe Panorama and load the resulting depth map (depth.exr).
  3. Depth map on a flat plane (image): when I load the depth map onto a flat plane, it only creates a flat 2D effect, with the doors as 3D dents, not the nice 3D effect seen in point cloud examples. I thought the depth map would help achieve that 3D look, but it seems to fall short.
  4. Depth map on a sphere (image): applying the depth map to a sphere looks okay, and there's some nice depth detail around the doors. However, it still doesn't create the full 3D effect I'm aiming for. Could this be due to the lack of a proper cube-based 3D structure?

I know you talked about possible cube structuring of a room in another thread. Did you find some time to code it already, by chance? I will also try to find some time to check your steps.

Thanks in advance.

@EasternJournalist (Collaborator)

@DiamondGlassDrill Thanks for the quick fix solution! I am sorry that I forgot to sync the utils3d submodule. The missing decorator "no_warnings" is now added.

@EasternJournalist (Collaborator) commented Dec 4, 2024

@DiamondGlassDrill I don't quite get how the panorama depth map is visualized on a flat plane. The panorama depth map "depth.exr" is supposed to be multiplied by the UV-sphere directions, $d \cdot (\cos 2\pi (1-u)\sin\pi v,\ \sin 2\pi (1-u)\sin\pi v,\ \cos \pi v)$, to obtain 3D points. See this line:

points = panorama_depth[:, :, None] * spherical_uv_to_directions(utils3d.numpy.image_uv(width=width, height=height))
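Outside the script, the same lifting can be reproduced from the saved depth.exr; a minimal sketch following the formula above (the directions are re-derived here rather than taken from the utils3d spherical_uv_to_directions function, and reading .exr with OpenCV may require the OPENCV_IO_ENABLE_OPENEXR=1 environment variable):

import numpy as np
import cv2

depth = cv2.imread('depth.exr', cv2.IMREAD_UNCHANGED)       # (H, W) float distances
if depth.ndim == 3:
    depth = depth[..., 0]                                    # keep a single channel
H, W = depth.shape
u = (np.arange(W) + 0.5) / W
v = (np.arange(H) + 0.5) / H
u, v = np.meshgrid(u, v)
# UV-sphere directions: (cos 2*pi*(1-u) * sin pi*v, sin 2*pi*(1-u) * sin pi*v, cos pi*v)
directions = np.stack([np.cos(2 * np.pi * (1 - u)) * np.sin(np.pi * v),
                       np.sin(2 * np.pi * (1 - u)) * np.sin(np.pi * v),
                       np.cos(np.pi * v)], axis=-1)
points = depth[..., None] * directions                       # (H, W, 3); mask out pixels where depth is inf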

Given the following input image:
Image

the point cloud result saved in mesh.ply should look like this from the top view:
Image
inside:
Image
in mesh mode:
Image

Is this effect what you expected?

@DiamondGlassDrill commented Dec 11, 2024

@EasternJournalist Excellent indeed, it works perfectly now. I had made some multiplication errors, as you pointed out above!
Awesome work!
