Hey,
thanks for sharing your exciting work!
I have a question about a minor detail in the memory pruning logic.
As far as I understand, at the beginning of memory_prune() the weights used to find relevant features are masked so that the working memory is kept:
weights[self.mem_count<self.work_mem_size+5] = 1e8
However, I noticed this mysterious +5 in there. Could you explain why it is there?
Additionally, Figure 7 in the paper shows a performance drop when the number of tokens is below 2000. Could that stem from this logic? My feeling is that the mask assigns the same weight to about 1960 tokens in memory, so if the memory is pruned to a smaller number, "random" features get thrown out. Do you think this could explain the trend, or did you use a different working memory size for that experiment?
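To make my concern concrete, here is a rough sketch of how I read the pruning step (my own reconstruction, not your actual code; the token counts, max_tokens, and the explicit top-k selection are assumptions on my side):

import torch

def prune_sketch(feats, weights, mem_count, work_mem_size, max_tokens):
    w = weights.clone()
    # Every feature younger than (work_mem_size + 5) frames gets the same huge
    # weight. With, e.g., a 5-frame working memory and 196 tokens per frame,
    # that is 10 * 196 = 1960 tokens all tied at 1e8.
    w[mem_count < work_mem_size + 5] = 1e8
    # Selecting the top-k by weight then breaks that tie arbitrarily whenever
    # max_tokens is smaller than the number of protected tokens, i.e. features
    # are dropped more or less at random once the budget goes below ~2000.
    keep = torch.topk(w, k=min(max_tokens, w.numel())).indices
    return feats[keep], w[keep], mem_count[keep]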
Thanks again!
Thanks @lorafib for flagging this issue! Yes, I think you are right. I double-checked the code: the +5 was there because, at an early stage, I wanted to ensure that memory features stay in long-term memory for an extra 5 frames before being pruned. However, this indeed causes the issue you described.
After fixing it, the number of memory features can be compressed further, to fewer than 1500 on the 7-Scenes dataset. I will update the paper for the camera-ready version. Thanks :)
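To make the fix concrete (just a sketch of the change implied above, not necessarily the exact released code), the mask would protect only the working memory itself, e.g.

weights[self.mem_count<self.work_mem_size] = 1e8

so long-term features compete on their accumulated attention weights and pruning below ~2000 tokens no longer discards them arbitrarily.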