Not an issue, just feedback as someone who has never worked with this type of data before.
I find the notebook easy to follow given the explanation I received. It takes some time to get into, and I added comments for myself locally, but there was nothing I couldn't work out easily enough. To be specific, in the "filtering on precursor mass" section I noted that the lowest-index beam prediction within tolerance is chosen, which makes sense because confidence decreases with increasing beam index - i.e. breaking out of the filtering loop at the first option that meets tolerance is not an arbitrary choice.
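For anyone skimming, here is a minimal sketch of that filtering step as I understood it. The column names (`pred_beam_i`, `pred_mass_beam_i`, `precursor_mass`) and the tolerance value are placeholders for illustration, not the notebook's exact identifiers:

```python
from typing import Optional

import pandas as pd

TOLERANCE = 0.1  # assumed precursor mass tolerance; units depend on the dataset


def select_prediction(row: pd.Series, n_beams: int = 5) -> Optional[str]:
    """Return the lowest-index beam prediction whose predicted precursor mass
    is within tolerance of the observed precursor mass.

    Beams are ordered by decreasing confidence, so stopping at the first match
    keeps the most confident candidate that satisfies the mass constraint.
    """
    for i in range(n_beams):
        pred = row.get(f"pred_beam_{i}")
        pred_mass = row.get(f"pred_mass_beam_{i}")
        if pred is None or pred_mass is None or pd.isna(pred_mass):
            continue
        if abs(pred_mass - row["precursor_mass"]) <= TOLERANCE:
            return pred  # first (most confident) option within tolerance
    return None  # no beam candidate passes the mass filter


# df["selected_pred"] = df.apply(select_prediction, axis=1)
```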
As a side investigation, I looked at the upper limits of the AUC scores for both the train set and the test set. Essentially, I wanted to answer the question: with perfect selection of predictions from those available in the pred_beam_i columns, how large can the AUC get?
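Roughly, the "perfect selection" check looks like the sketch below. The ground-truth column name (`sequence`), exact-string matching, and the `compute_auc` helper are assumptions standing in for whatever the notebook actually uses:

```python
import pandas as pd


def oracle_selection(df: pd.DataFrame, n_beams: int = 5) -> pd.Series:
    """Pick, per spectrum, a beam prediction that matches the ground truth if
    one exists; otherwise fall back to the top beam. No filtering strategy can
    select better than this, so the resulting score is an upper bound."""

    def pick(row: pd.Series) -> str:
        for i in range(n_beams):
            pred = row.get(f"pred_beam_{i}")
            if pd.notna(pred) and pred == row["sequence"]:
                return pred  # a correct option exists among the beams
        return row["pred_beam_0"]  # no correct option: this spectrum caps the score

    return df.apply(pick, axis=1)


# train["oracle_pred"] = oracle_selection(train)
# max_train_auc = compute_auc(train["oracle_pred"], train["sequence"])  # hypothetical scorer
```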
On the train set, I found a max AUC of 0.715 - so that's the highest AUC we can get without adding new prediction options into the dataset. Using Reference.csv as the ground truth, the highest AUC we can get on the test set is 0.419. That leaves a maximum improvement of 0.065 on the train set and 0.093 on the test set if only filtering is used!
Assuming I used the right methods to get there, this might be useful to know, since it bounds the "legal" scores participants should be getting if they don't supplement the data. As a hackathon facilitator, I may lean towards suggesting data supplementation as a strategy with more potential than filtering, and encourage a deep dive into Prosit.