Thanks for the work on this benchmark.
I was wondering why the baseline accuracies on code.Debug are so low.
Since it's multiple choice with four options, random guessing should give about 25% in expectation.
Have you released the outputs from your evaluation runs anywhere?
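As a rough sanity check on the chance baseline, here is a minimal sketch of the arithmetic. The item count and number of correct answers are hypothetical placeholders, not figures from the benchmark, and the helper names are made up for illustration.

```python
from math import comb


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy when picking uniformly among num_options choices."""
    return 1.0 / num_options


def prob_at_most(correct: int, total: int, p: float) -> float:
    """P(X <= correct) for X ~ Binomial(total, p)."""
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(correct + 1)
    )


if __name__ == "__main__":
    p = chance_accuracy(4)  # four answer options

    # Hypothetical numbers, for illustration only.
    total = 100
    correct = 15

    tail = prob_at_most(correct, total, p)
    print(f"chance accuracy: {p:.2%}")
    print(f"P(at most {correct}/{total} correct by guessing): {tail:.4f}")
```

If that tail probability is very small for a reported score, the score is unlikely to come from uniform guessing alone, which would point to something like answer-parsing failures or truncated inputs rather than a model that is merely guessing.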