
degrading issue as training progresses #68

Open
JH-CPG opened this issue Sep 9, 2024 · 11 comments
@JH-CPG

JH-CPG commented Sep 9, 2024

Hello, thanks for the code.
I tried running it on a COLMAP-based dataset, but I am encountering an issue where the training performance degrades as training progresses. Could you please let me know if I might have missed something?

I initially had problems with my own image set, so I tested on the Replica 'room0' dataset, using only the images.
Camera poses were generated with COLMAP, and depths and normals were estimated with monocular depth and normal estimation models, respectively.

At an early stage: [screenshot]

In the middle stage, some regions of Gaussians start flickering, disappearing and reappearing within a few iterations: [screenshot]

Later on, some areas disappear completely: [screenshot]

These are the commands and scripts I've used:

  • COLMAP
    colmap feature_extractor --database_path DB_PATH/colmap/database.db --image_path DB_PATH/images --SiftExtraction.max_num_features 10000
    colmap exhaustive_matcher --database_path DB_PATH/colmap/database.db
    mkdir DB_PATH/colmap/sparse
    colmap mapper --database_path DB_PATH/colmap/database.db --image_path DB_PATH/images --output_path DB_PATH/colmap/sparse

  • Depth estimation and alignment
    python dn_splatter/scripts/align_depth.py --data DB_PATH

  • Normal estimation
    python dn_splatter/scripts/normals_from_pretrain.py --data-dir DB_PATH --model-type dsine

  • Training
    ns-train dn-splatter --pipeline.model.use-depth-loss True --pipeline.model.mono-depth-lambda 0.1 --pipeline.model.use-depth-smooth-loss True --pipeline.model.use-normal-loss True --pipeline.model.normal-supervision mono coolermap --data DB_PATH --load_normals True
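A note on the alignment step above: monocular depth maps are only defined up to an unknown scale (and usually a shift), so a script like align_depth.py presumably fits a per-image scale and shift against the sparse COLMAP depths before the depth loss is applied. A minimal numpy sketch of that idea (an illustration of the principle, not the script's actual implementation):

```python
import numpy as np

def fit_scale_and_shift(mono_depth, sparse_depth):
    """Least-squares scale s and shift t so that s * mono + t ≈ sparse.

    mono_depth and sparse_depth are 1-D arrays of depth values sampled
    at the pixels where COLMAP triangulated a 3D point.
    """
    A = np.stack([mono_depth, np.ones_like(mono_depth)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, sparse_depth, rcond=None)
    return s, t

# Synthetic check: sparse depths are an exact affine transform of the
# mono depths, so the fit recovers the scale and shift.
mono = np.linspace(0.5, 5.0, 100)
sparse = 2.0 * mono + 0.3
s, t = fit_scale_and_shift(mono, sparse)
print(round(s, 3), round(t, 3))  # 2.0 0.3
```

If this alignment fails (too few sparse points, or a bad reconstruction), the supervised depth can be wildly off in scale, which is one way the degradation described above could arise.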

@maturk
Owner

maturk commented Sep 9, 2024

Hey, can you try using the dedicated replica dataparser? Does that have problems?

ns-train dn-splatter ... replica --data ./path_to_replica/ --sequence room0

Btw, monocular depth supervision is not so great, but you could try the Pearson correlation loss introduced in the new PR here #64. The Pearson loss is a relative loss, so it does not require scale alignment with the COLMAP points; you can use raw ZoeDepth estimates with it.
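To illustrate why a relative loss sidesteps the alignment problem, here is a minimal numpy sketch of a Pearson-correlation depth loss (an illustration of the idea, not the actual code from #64):

```python
import numpy as np

def pearson_depth_loss(rendered, mono, eps=1e-8):
    """1 - Pearson correlation between rendered and monocular depth.

    Pearson correlation is invariant to affine rescaling of either
    input, so raw ZoeDepth-style estimates can supervise rendered
    depth without first aligning their scale to COLMAP points.
    """
    r = rendered.ravel() - rendered.mean()
    m = mono.ravel() - mono.mean()
    return 1.0 - (r @ m) / (np.linalg.norm(r) * np.linalg.norm(m) + eps)

# An affinely rescaled copy of a depth map correlates perfectly,
# so the loss is ~0 regardless of the absolute scale or shift.
rng = np.random.default_rng(0)
depth = rng.random((48, 64))
print(pearson_depth_loss(depth, 3.0 * depth + 0.5) < 1e-6)  # True
```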

I suspect the issue here is that something is wrong with the depth estimates, perhaps their scale, and the regularization is failing, resulting in very large gradients and opacities going to zero.

@JH-CPG
Author

JH-CPG commented Sep 9, 2024

Thanks for the quick response!
The dedicated replica dataparser had no issues. The results were very good.

As you mentioned, either the depth estimation or COLMAP could be causing the flickering issue, so I ran some additional tests.
Although I haven't looked into the code deeply yet, I intentionally ran training with both depth and normal losses disabled, using the following settings:

  • Replica dataset (image, pose)
    ns-train dn-splatter --pipeline.model.use-depth-loss False --pipeline.model.use-normal-loss False replica --data DB_path/Replica --sequence room0

While there was some degradation in the depth and normal rendering quality, the RGB rendering results were still quite good.

  • Replica (image) + Colmap (pose)
    ns-train dn-splatter --pipeline.model.use-depth-loss False --pipeline.model.use-normal-loss False coolermap --data DB_path/room0

The rendering result seems to be good in the early stage, but the flickering issue reappears during the middle stage.

Though I'm not sure about COLMAP's accuracy, the sparse point cloud it produced seems reasonable.

@maturk
Owner

maturk commented Sep 9, 2024

I see, so when you disable the depth/normal losses, the results using coolermap are still bad? Hmmm. The only difference between the top command and the bottom command is then the poses: one uses the GT poses provided by the Replica dataset (with the dedicated replica dataparser), and the other uses COLMAP-estimated poses (with the coolermap dataparser). I wonder if the poses are somehow failing? Weird.
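One way to sanity-check this without solving for the global alignment between the two trajectories: compare consecutive relative rotations, which cancel out any global frame difference. A rough sketch, assuming camera-to-world rotation matrices are already loaded for both trajectories:

```python
import numpy as np

def relative_rotation_errors_deg(R_est, R_gt):
    """Angular error (degrees) between consecutive relative rotations.

    R_est, R_gt: lists of 3x3 camera-to-world rotation matrices for the
    same image sequence. The relative rotation R_i^T @ R_{i+1} cancels
    any global frame difference, so no trajectory alignment is needed;
    large errors here would point to bad COLMAP poses.
    """
    errors = []
    for i in range(len(R_est) - 1):
        rel_est = R_est[i].T @ R_est[i + 1]
        rel_gt = R_gt[i].T @ R_gt[i + 1]
        dR = rel_est.T @ rel_gt
        cos = np.clip((np.trace(dR) - 1.0) / 2.0, -1.0, 1.0)
        errors.append(np.degrees(np.arccos(cos)))
    return np.array(errors)

def rot_z(deg):
    """Rotation about the z-axis, for the synthetic check below."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# The same trajectory expressed in two different world frames yields
# (near-)zero relative-rotation errors.
traj = [rot_z(10.0 * i) for i in range(5)]
other_frame = [rot_z(33.0) @ R for R in traj]
print(relative_rotation_errors_deg(traj, other_frame).max() < 1e-5)  # True
```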

@ruiqiyan

Hi, I also ran into this problem. How did you solve it?

@ruiqiyan

Could this problem be caused by inaccurate poses?

@ruiqiyan

I used COLMAP to regenerate the poses and found that it was indeed a pose issue.

@Irving87

May I ask how you found that it was a COLMAP pose issue, and how did you solve it? I've run into the same problem.

@ruiqiyan

I used the same set of pictures with COLMAP-generated poses and ran into the situation above. Then, with the same data, I used COLMAP to regenerate the poses, and the problem disappeared. The only variable between the two experiments was the poses, so I inferred that the poses were the cause. I would recommend using Metashape to generate poses; as far as I know, its pose estimation is more accurate.

@Irving87

“Afterwards, it was still the same data. I used colmap to regenerate the pose, and the problem disappeared.” — Does this mean you used Metashape to regenerate the poses and the problem disappeared? By the way, I'm not familiar with Metashape, so should I use Metashape to generate the dataset's poses and then transform them to the COLMAP format manually?

@ruiqiyan

Not necessarily. The pose format generated by Metashape is the same as that generated by COLMAP.
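For reference, in the COLMAP text export each image in images.txt takes two lines, the first being `IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME` with the world-to-camera rotation as a w-first quaternion. A minimal sketch of reading those poses (illustrative only; file names with spaces and other corner cases are ignored):

```python
import numpy as np

def quat_to_rotmat(qw, qx, qy, qz):
    """3x3 rotation matrix from a unit quaternion in COLMAP's w-x-y-z order."""
    return np.array([
        [1 - 2 * (qy**2 + qz**2), 2 * (qx*qy - qz*qw),     2 * (qx*qz + qy*qw)],
        [2 * (qx*qy + qz*qw),     1 - 2 * (qx**2 + qz**2), 2 * (qy*qz - qx*qw)],
        [2 * (qx*qz - qy*qw),     2 * (qy*qz + qx*qw),     1 - 2 * (qx**2 + qy**2)],
    ])

def parse_images_txt(lines):
    """Map image name -> (R, t) from the lines of a COLMAP images.txt.

    Each image has a pose line followed by a 2D-points line; the
    latter is skipped here.
    """
    content = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    poses = {}
    for line in content[::2]:  # every other non-comment line is a pose line
        f = line.split()
        qw, qx, qy, qz, tx, ty, tz = map(float, f[1:8])
        poses[f[9]] = (quat_to_rotmat(qw, qx, qy, qz), np.array([tx, ty, tz]))
    return poses

sample = [
    "# Image list with two lines of data per image.",
    "1 1.0 0.0 0.0 0.0 0.5 0.0 -1.0 1 frame_000.jpg",
    "100.0 200.0 -1",  # the 2D-points line, skipped by the parser
]
R, t = parse_images_txt(sample)["frame_000.jpg"]
print(np.allclose(R, np.eye(3)))  # True: identity quaternion (1, 0, 0, 0)
```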

@Irving87

Thanks, I will try Metashape to generate the poses.


4 participants