Create dataset loader for BioASQ Task A (2013-2021) #261

Open
jason-fries opened this issue Mar 22, 2022 · 8 comments
Labels
DUA Licence English Language QA Task

Comments

@jason-fries
Member

Adding a Dataset

@mdmanurung

#self-assign

@jason-fries
Member Author

Hi @mdmanurung, can you let us know whether you are still working on this so we can update our project board? Please let us know your status by Friday, April 8. You can respond to this comment or ping us on Slack or Discord.

No worries if you are not finished but still intend to work on this!

@balikasg

balikasg commented Apr 10, 2022

@jason-fries
Before committing, a question:

  • BioASQ Task A articles are released every year, but each year's set is a superset of the previous years' sets. Do we need a loader for every year, or just for 2022? (reference here) Also, the data link in this description is wrong: we want the Task A data, and you seem to have already added Task B.

@hakunanatasha
Collaborator

@balikasg Apologies for the incorrect link; it must have crept in during our large compilation phase. Do you have the link to the most recent version?

Additionally, I suspect yes: earlier work may reference older versions of the dataset. For reproducibility's sake, it's nice to have a dataloader that provides exactly the version we want, without needing to subset a different release. That said, thank you for letting us know about it.

Not to presumptively volunteer you, but if you think it would be useful to write either a general BioASQ dataloader based on the earlier ones (with a changeable URL) or one for 2022, we would be very grateful!
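A minimal sketch of the "general dataloader with a changeable URL" idea, assuming one config per release year from the issue title (2013-2021) and a URL template that callers can override. It models only the configuration layer, not parsing. The class name BioASQTaskAConfig, the config names, and the example.org URL are hypothetical placeholders, not the project's actual API or the real BioASQ download location.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder: the real BioASQ Task A download URLs are not
# given in this thread, so this template is for illustration only.
DEFAULT_URL_TEMPLATE = "https://example.org/bioasq/task_a/{year}/data.zip"

# Release years named in the issue title (2013-2021 inclusive).
SUPPORTED_YEARS = tuple(range(2013, 2022))


@dataclass(frozen=True)
class BioASQTaskAConfig:
    """Hypothetical per-year config whose download URL can be overridden."""

    year: int
    url_template: str = DEFAULT_URL_TEMPLATE

    def __post_init__(self) -> None:
        # Reject years outside the range this loader claims to support.
        if self.year not in SUPPORTED_YEARS:
            raise ValueError(f"unsupported BioASQ Task A year: {self.year}")

    @property
    def name(self) -> str:
        return f"bioasq_task_a_{self.year}"

    @property
    def url(self) -> str:
        # Fill the year into the (possibly user-supplied) template.
        return self.url_template.format(year=self.year)


# One explicit config per year, so a given release can be loaded as-is
# instead of being carved out of a later, larger release.
CONFIGS = {
    cfg.name: cfg for cfg in (BioASQTaskAConfig(y) for y in SUPPORTED_YEARS)
}

if __name__ == "__main__":
    cfg = CONFIGS["bioasq_task_a_2015"]
    print(cfg.name, cfg.url)
```

Keeping one named config per year trades a little duplication for reproducibility: a paper that used the 2015 release can request exactly that config, while a new release only needs a new year (and, if the hosting changes, a different url_template).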

@MFreidank
Contributor

#self-assign

@MFreidank
Contributor

MFreidank commented Apr 12, 2022

@jason-fries @hakunanatasha @galtay I noticed that a dataset loader, bioasq.py, already exists (implemented by @galtay), but it seems to implement only Task B.
Should I add Task A to this same file, or create a separate one (e.g., named biodatasets/bioasq_task_A)?

@jason-fries
Copy link
Member Author

Hi @MFreidank
Sorry for the delay in responding! We should rename bioasq.py to bioasq_task_b and use the full names for all the BioASQ tasks. I would go with bioasq_task_a for this task.

@MFreidank
Contributor

MFreidank commented Apr 20, 2022

Hi @jason-fries

Thank you for the clarification; I'll proceed that way (I have most of the changes ready and working locally).
If I open a PR for it today, can we still make it count towards the hackathon? ;)

It took a bit longer than I had hoped, but I've now opened PR #518.

@MFreidank MFreidank mentioned this issue Apr 25, 2022
Projects
Status: PR in Progress
Development

No branches or pull requests

5 participants