Feedback as a guinea pig #1

Open
lbeyers opened this issue Aug 28, 2024 · 0 comments
lbeyers commented Aug 28, 2024

Not an issue, just feedback as someone who has never worked with this type of data before.

I find the notebook easy to follow given the explanation I received. It takes some time to get into, and I added a few comments for myself locally. To be specific, in the "filtering on precursor mass" section I noted that the lowest-index beam prediction within tolerance is chosen, which makes sense because confidence decreases with increasing beam index; in other words, breaking out of the filtering loop at the first option that meets tolerance is not an arbitrary choice. Nothing here was anything I couldn't work out easily enough.
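For the record, here is a minimal sketch of how I read that filtering step. The function name, the ppm tolerance, and the idea of passing candidate masses as a list are my own illustration, not the notebook's actual API; the one load-bearing assumption is that beam predictions are ordered by decreasing confidence, so the first candidate within tolerance is also the most confident one:

```python
# Hypothetical sketch of "filtering on precursor mass": beam predictions are
# assumed ordered by decreasing confidence, so returning at the FIRST
# candidate within tolerance also returns the most confident valid one.

def pick_prediction(beam_masses, observed_mass, tol_ppm=50.0):
    """Return the index of the lowest-index (most confident) beam
    prediction whose mass is within tol_ppm of the observed precursor
    mass, or None if no candidate qualifies."""
    for i, mass in enumerate(beam_masses):
        if abs(mass - observed_mass) / observed_mass * 1e6 <= tol_ppm:
            return i  # break out as soon as the first option meets tolerance
    return None

# toy example: beam 0 is ~466 ppm off, beam 1 is well within tolerance
print(pick_prediction([1500.9, 1500.20001, 1500.2], 1500.2))  # -> 1
```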

As a side investigation, I had a look at the limits of the AUC scores both for the train set and the test set. Essentially I wanted to answer the question: with perfect selection of predictions from those available in the pred_beam_i columns, how large can my AUC be?

On the train set, I found a max AUC of 0.715, so that's the highest AUC we can get without adding new prediction options into the dataset. Using Reference.csv as the ground truth, the highest AUC we can get on the test set is 0.419. That leaves a maximum improvement of 0.065 on the train set and 0.093 on the test set if only filtering is used!
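The core of the upper-bound argument can be sketched as below. This is illustrative only: the data layout, names, and exact-match criterion are my assumptions, not the notebook's code, and I'm reducing "perfect selection" to its simplest form (filtering can only choose among existing pred_beam_i candidates, so no strategy can beat an oracle that picks a correct candidate whenever one exists):

```python
# Hedged sketch of the "oracle" upper bound: the score of any filtering
# strategy is capped by how often ANY beam candidate is correct, because
# filtering only selects among existing candidates. All names are
# illustrative, not taken from the actual notebook.

def oracle_hit_rate(beam_candidates, ground_truth):
    """Fraction of spectra for which at least one beam candidate matches
    the ground-truth sequence -- an upper bound on any selection-only
    (i.e. filtering) strategy."""
    hits = sum(
        any(cand == truth for cand in beams)
        for beams, truth in zip(beam_candidates, ground_truth)
    )
    return hits / len(ground_truth)

# toy data: rows 1 and 2 contain a correct candidate, row 3 does not
beams = [["PEPTIDE", "PEPTLDE"], ["QWERTY", "QWERTZ"], ["ABC", "ABD"]]
truth = ["PEPTIDE", "QWERTZ", "XYZ"]
print(oracle_hit_rate(beams, truth))  # -> 0.666...
```

The actual AUC bound additionally ranks spectra by confidence, but the same principle applies: the oracle's choice per spectrum caps what selection alone can achieve.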

Assuming I used the right methods to get there, this seems useful to know, since it bounds the "legal" values participants should be getting if they don't supplement the data. As a hackathon facilitator, I may lean toward suggesting data supplementation as a strategy with more potential than filtering, and encourage a deep dive into Prosit.
