Hi, thanks for the great work and the quick release of the code!
I have a question about the memory module used in Spann3R. I noticed that you use an approach similar to XMem, which was originally designed for video object segmentation and aims to store frequently used prototypes in long-term memory.
My understanding is that the long-term memory in Spann3R would likely store geometric prototypes that are frequently used or essential for establishing correspondence with other views, so that 3D points can be decoded in a single global coordinate frame. I have two questions: 1. Is my current understanding correct? 2. Have you noticed any forgetting issues when handling more frames, such as n > 300 or n > 1000?
Thanks!
Hi @crepejung00, yes, your understanding is correct. As for forgetting, what we see is more like "error accumulation", since we do not have a module that refines memory features as more observations are included. Also, because our training uses 5-frame sequences due to GPU memory constraints, the size of the spatial region the current version of Spann3R can deal with is limited. (Please refer to Sec. 4.4 for a detailed discussion of the current limitations.)
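For readers less familiar with the idea discussed above, here is a toy sketch of a usage-based memory: entries that receive more attention during readout are kept, and the least-used entry is evicted when the memory is full. This is only an illustration of the general concept and is not Spann3R's or XMem's actual code; `ToyMemory`, its methods, and the dot-product attention are all assumptions made for the example.

```python
import math


class ToyMemory:
    """Hypothetical key-value memory with usage-based eviction (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []    # feature vectors, stored as lists of floats
        self.values = []  # payload stored alongside each key
        self.usage = []   # accumulated attention weight per entry

    def write(self, key, value):
        # When full, evict the entry that has received the least attention so far.
        if len(self.keys) >= self.capacity:
            victim = min(range(len(self.usage)), key=lambda i: self.usage[i])
            for store in (self.keys, self.values, self.usage):
                del store[victim]
        self.keys.append(list(key))
        self.values.append(value)
        self.usage.append(0.0)

    def read(self, query):
        # Softmax attention over dot-product scores; the weights also update usage.
        if not self.keys:
            return []
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        for i, w in enumerate(weights):
            self.usage[i] += w
        return weights


mem = ToyMemory(capacity=2)
mem.write([1.0, 0.0], "a")
mem.write([0.0, 1.0], "b")
mem.read([5.0, 0.0])         # this query attends almost entirely to "a"
mem.write([1.0, 1.0], "c")   # memory is full, so the least-used entry "b" is evicted
print(mem.values)
```

Note that in this sketch stored entries are only kept or evicted, never refined after being written, which loosely mirrors the "error accumulation" point in the reply above.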