Question about MQAR eval of Based #29
Comments
Also, unrelated to the Based model, I think I'm a little confused about the MQAR task in general. When the data generator hits

```python
if random_non_queries:
    inputs[inputs == 0] = torch.randint(vocab_size, size=inputs.shape)[inputs == 0]
```

we can see that this has a very high chance of clobbering a KV pair in the "context" section, especially with long sequences such as the test set of based_figure_2. Doesn't this create a significant mismatch between the train and test sets? I don't think this detracts from the idea that linear attention struggles with the task; however, the task then becomes quite different.
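To make the concern concrete, here is a minimal, self-contained sketch (not the actual Zoology generator; the toy layout of keys and values is assumed purely for illustration) of what that fill touches in a zero-padded MQAR-style sequence, and how the filler tokens can collide with key tokens already present in the context:

```python
import torch

# Minimal sketch, not the actual Zoology generator: the key/value layout
# below is a toy assumption just to show what the random fill touches.
torch.manual_seed(0)
vocab_size = 64
seq_len = 32

# Toy context: four key/value pairs at the front, remaining positions 0-padded.
inputs = torch.zeros(seq_len, dtype=torch.long)
kv_pairs = torch.tensor([10, 40, 11, 41, 12, 42, 13, 43])  # k1 v1 k2 v2 ...
inputs[: len(kv_pairs)] = kv_pairs

# The line in question: every 0 position is replaced by a token drawn
# uniformly from the whole vocabulary, including the key/value ranges.
inputs[inputs == 0] = torch.randint(vocab_size, size=inputs.shape)[inputs == 0]

# Filler tokens can now coincide with keys that already appear in the context.
keys = kv_pairs[::2]
collisions = torch.isin(inputs[len(kv_pairs):], keys).sum().item()
print(f"filler positions that collide with a key: {collisions}")
```

Since `torch.randint` draws from the full vocabulary, nothing stops a "non-query" filler position from looking identical to a real key, and the longer the padded region, the more likely that is.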
If you already have a trained model and data saved as .npy files (e.g., from the attention-360 model with the Pile as the dataset) and want to use the Zoology repo to test MQAR results directly on that data, it's a bit complex: Zoology isn't designed for direct compatibility with pre-existing .npy files; it's mainly set up for generating synthetic data.
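If it helps, one rough workaround is to skip Zoology's synthetic data builders entirely and wrap the saved arrays in a plain PyTorch dataset yourself; this is only a sketch, and the file names and array shapes below are hypothetical:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical file names; Zoology has no built-in loader for arbitrary .npy data.
inputs = torch.from_numpy(np.load("pile_eval_inputs.npy")).long()  # (num_seqs, seq_len)
labels = torch.from_numpy(np.load("pile_eval_labels.npy")).long()  # (num_seqs, seq_len)

# Iterate over batches and feed them to the already-trained model's forward pass.
loader = DataLoader(TensorDataset(inputs, labels), batch_size=32, shuffle=False)
```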
Sorry, yeah, my question isn't about that; it was more about how the synthetic data is being generated, and about the differences between the Based models used for the MQAR eval and for the pretraining eval.
Hey, thanks for the great work. I could be wrong, but I feel like there is a disconnect between what is described in the Based paper and what is used in the Figure 2 config for the MQAR eval. In the paper, Based is presented as an architecture that mixes linear attention layers with sliding-window attention layers; however, in the MQAR experiment configs there appear to be no sliding-window attention layers, just a mix of BaseConv layers and linear attention layers. In the appendix you note that BaseConv can improve performance, which I am assuming is why those layers are used in these experiments. But if it's the case that sliding windows were not used, I'm curious whether you have any ablations on using small sliding windows for MQAR?
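To make concrete what such an ablation would vary, here is a minimal sketch of the causal mask a small sliding-window attention layer applies; the window size is arbitrary, not a value taken from the paper or the configs:

```python
import torch

# Sketch of the masking a small sliding-window attention layer applies.
# Position i may attend to positions [i - window + 1, i] (causal and local).
def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]  # rel[i, j] = i - j
    return (rel >= 0) & (rel < window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.int())
```

The ablation I have in mind would just sweep `window` to see how small it can get before MQAR accuracy drops relative to the BaseConv + linear attention configs.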
Again I really appreciate the work, super cool stuff to think about!