
degrading issue as training progresses #68

Open
JH-CPG opened this issue Sep 9, 2024 · 11 comments
@JH-CPG

JH-CPG commented Sep 9, 2024

Hello, thanks for the code.
I tried running it on a COLMAP-based dataset, but I am encountering an issue where the training performance degrades as training progresses. Could you please let me know if I might have missed something?

I initially had problems with my own image set, so I tested on the Replica 'room0' dataset, using only the images.
Camera poses were generated with COLMAP, and depths and normals were estimated with monocular depth and normal estimation models, respectively.

At an early stage: [screenshot]

In the middle stage, some regions of Gaussians start flickering, disappearing and reappearing within a few iterations: [screenshot]

Later on, some areas disappear completely: [screenshot]

These are the commands and scripts I've used:

  • COLMAP
    colmap feature_extractor --database_path DB_PATH/colmap/database.db --image_path DB_PATH/images --SiftExtraction.max_num_features 10000
    colmap exhaustive_matcher --database_path DB_PATH/colmap/database.db
    mkdir DB_PATH/colmap/sparse
    colmap mapper --database_path DB_PATH/colmap/database.db --image_path DB_PATH/images --output_path DB_PATH/colmap/sparse

  • Depth estimation and alignment
    python dn_splatter/scripts/align_depth.py --data DB_PATH

  • Normal estimation
    python dn_splatter/scripts/normals_from_pretrain.py --data-dir DB_PATH --model-type dsine

  • Training
    ns-train dn-splatter --pipeline.model.use-depth-loss True --pipeline.model.mono-depth-lambda 0.1 --pipeline.model.use-depth-smooth-loss True --pipeline.model.use-normal-loss True --pipeline.model.normal-supervision mono coolermap --data DB_PATH --load_normals True
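A note on the alignment step above: monocular depth maps are only defined up to an unknown scale (and usually a shift), so a script like align_depth.py presumably fits a per-image scale and shift against the sparse COLMAP depths before the depth loss is applied. A minimal numpy sketch of that idea (an illustration of the principle, not the script's actual implementation):

```python
import numpy as np

def fit_scale_and_shift(mono_depth, sparse_depth):
    """Least-squares scale s and shift t so that s * mono + t ≈ sparse.

    mono_depth and sparse_depth are 1-D arrays of depth values sampled
    at the pixels where COLMAP triangulated a 3D point.
    """
    A = np.stack([mono_depth, np.ones_like(mono_depth)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, sparse_depth, rcond=None)
    return s, t

# Synthetic check: sparse depths are an exact affine transform of the
# mono depths, so the fit recovers the scale and shift.
mono = np.linspace(0.5, 5.0, 100)
sparse = 2.0 * mono + 0.3
s, t = fit_scale_and_shift(mono, sparse)
print(round(s, 3), round(t, 3))  # 2.0 0.3
```

If this alignment fails (too few sparse points, or a bad reconstruction), the supervised depth can be wildly off in scale, which is one way the degradation described above could arise.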

@maturk
Owner

maturk commented Sep 9, 2024

Hey, can you try using the dedicated replica dataparser? Does that have problems?

ns-train dn-splatter ... replica --data ./path_to_replica/ --sequence room0

Btw, monocular depth supervision is not so great, but you could try the Pearson correlation loss introduced in the new PR here #64. The Pearson loss is a relative loss, so it does not require scale alignment with the COLMAP points; you can use raw ZoeDepth estimates with it.
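To illustrate why a relative loss sidesteps the alignment problem, here is a minimal numpy sketch of a Pearson-correlation depth loss (an illustration of the idea, not the actual code from #64):

```python
import numpy as np

def pearson_depth_loss(rendered, mono, eps=1e-8):
    """1 - Pearson correlation between rendered and monocular depth.

    Pearson correlation is invariant to affine rescaling of either
    input, so raw ZoeDepth-style estimates can supervise rendered
    depth without first aligning their scale to COLMAP points.
    """
    r = rendered.ravel() - rendered.mean()
    m = mono.ravel() - mono.mean()
    return 1.0 - (r @ m) / (np.linalg.norm(r) * np.linalg.norm(m) + eps)

# An affinely rescaled copy of a depth map correlates perfectly,
# so the loss is ~0 regardless of the absolute scale or shift.
rng = np.random.default_rng(0)
depth = rng.random((48, 64))
print(pearson_depth_loss(depth, 3.0 * depth + 0.5) < 1e-6)  # True
```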

I suspect the issue here is that something is wrong with the depth estimates, perhaps their scale, and the regularization is failing, resulting in very large gradients and opacities going to zero.

@JH-CPG
Author

JH-CPG commented Sep 9, 2024

Thanks for the quick response!
The dedicated replica dataparser had no issues. The results were very good.

As you mentioned, either the depth estimation or COLMAP could be causing the flickering issue, so I ran some additional tests.
Although I haven't looked into the code deeply yet, I intentionally ran training with both depth and normal losses disabled, using the following settings:

  • Replica dataset (image, pose)
    ns-train dn-splatter --pipeline.model.use-depth-loss False --pipeline.model.use-normal-loss False replica --data DB_path/Replica --sequence room0

While there was some degradation in the depth and normal rendering quality, the RGB rendering results were still quite good.

  • Replica (image) + Colmap (pose)
    ns-train dn-splatter --pipeline.model.use-depth-loss False --pipeline.model.use-normal-loss False coolermap --data DB_path/room0

The rendering result seems to be good in the early stage, but the flickering issue reappears during the middle stage.

Though I'm not sure about COLMAP's accuracy, the sparse point cloud it produced seems reasonable.

@maturk
Owner

maturk commented Sep 9, 2024

I see, so when you disable the depth/normal losses, the results using coolermap are still bad? Hmmm. The only difference between the top command and the bottom command is then the poses: one uses the GT poses provided by the Replica dataset (with the dedicated replica dataparser), and the other uses COLMAP-estimated poses (with the coolermap dataparser). I wonder if the poses are somehow failing? Weird.
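One way to sanity-check this without solving for the global alignment between the two trajectories: compare consecutive relative rotations, which cancel out any global frame difference. A rough sketch, assuming camera-to-world rotation matrices are already loaded for both trajectories:

```python
import numpy as np

def relative_rotation_errors_deg(R_est, R_gt):
    """Angular error (degrees) between consecutive relative rotations.

    R_est, R_gt: lists of 3x3 camera-to-world rotation matrices for the
    same image sequence. The relative rotation R_i^T @ R_{i+1} cancels
    any global frame difference, so no trajectory alignment is needed;
    large errors here would point to bad COLMAP poses.
    """
    errors = []
    for i in range(len(R_est) - 1):
        rel_est = R_est[i].T @ R_est[i + 1]
        rel_gt = R_gt[i].T @ R_gt[i + 1]
        dR = rel_est.T @ rel_gt
        cos = np.clip((np.trace(dR) - 1.0) / 2.0, -1.0, 1.0)
        errors.append(np.degrees(np.arccos(cos)))
    return np.array(errors)

def rot_z(deg):
    """Rotation about the z-axis, for the synthetic check below."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# The same trajectory expressed in two different world frames yields
# (near-)zero relative-rotation errors.
traj = [rot_z(10.0 * i) for i in range(5)]
other_frame = [rot_z(33.0) @ R for R in traj]
print(relative_rotation_errors_deg(traj, other_frame).max() < 1e-5)  # True
```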

@ruiqiyan

Hi, I also ran into this problem. How did you solve it?

@ruiqiyan

Could this problem be caused by inaccurate poses?

@ruiqiyan

I used COLMAP to regenerate the poses and found that it was indeed a pose issue.

@Irving87

May I ask how you found that it was a COLMAP pose issue, and how did you solve it? I've run into the same problem.

@ruiqiyan

I used the same set of pictures with COLMAP-generated poses and ran into the situation above. Then, with the same data, I used COLMAP to regenerate the poses, and the problem disappeared. The only variable between the two experiments was the poses, so I inferred that the poses were the cause. I would recommend using Metashape to generate poses; as far as I know, its pose estimation is more accurate.

@Irving87

“Afterwards, it was still the same data. I used colmap to regenerate the pose, and the problem disappeared.” — Does this mean you used Metashape to regenerate the poses and the problem disappeared? By the way, I'm not familiar with Metashape, so should I use Metashape to generate the dataset's poses and then transform them to the COLMAP format manually?

@ruiqiyan

Not necessarily. The pose format generated by Metashape is the same as that generated by COLMAP.
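For reference, in the COLMAP text export each image in images.txt takes two lines, the first being `IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME` with the world-to-camera rotation as a w-first quaternion. A minimal sketch of reading those poses (illustrative only; file names with spaces and other corner cases are ignored):

```python
import numpy as np

def quat_to_rotmat(qw, qx, qy, qz):
    """3x3 rotation matrix from a unit quaternion in COLMAP's w-x-y-z order."""
    return np.array([
        [1 - 2 * (qy**2 + qz**2), 2 * (qx*qy - qz*qw),     2 * (qx*qz + qy*qw)],
        [2 * (qx*qy + qz*qw),     1 - 2 * (qx**2 + qz**2), 2 * (qy*qz - qx*qw)],
        [2 * (qx*qz - qy*qw),     2 * (qy*qz + qx*qw),     1 - 2 * (qx**2 + qy**2)],
    ])

def parse_images_txt(lines):
    """Map image name -> (R, t) from the lines of a COLMAP images.txt.

    Each image has a pose line followed by a 2D-points line; the
    latter is skipped here.
    """
    content = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    poses = {}
    for line in content[::2]:  # every other non-comment line is a pose line
        f = line.split()
        qw, qx, qy, qz, tx, ty, tz = map(float, f[1:8])
        poses[f[9]] = (quat_to_rotmat(qw, qx, qy, qz), np.array([tx, ty, tz]))
    return poses

sample = [
    "# Image list with two lines of data per image.",
    "1 1.0 0.0 0.0 0.0 0.5 0.0 -1.0 1 frame_000.jpg",
    "100.0 200.0 -1",  # the 2D-points line, skipped by the parser
]
R, t = parse_images_txt(sample)["frame_000.jpg"]
print(np.allclose(R, np.eye(3)))  # True: identity quaternion (1, 0, 0, 0)
```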

@Irving87

Thanks, I will try Metashape to generate the poses.


4 participants