Codename: Discovery #968

Ly0n opened this issue Nov 26, 2024 · 1 comment

Ly0n commented Nov 26, 2024

Today @andrew and I discussed creating a new service. Using the OpenAlex API, we plan to scan newly published Open Access papers for the keywords derived from OST, which will let us discover new open source software and data repositories. We discussed a newsfeed service that reports new open source software and datasets published on sustainability- and climate-related topics. Here are the critical TODOs we identified:

  • Extracting Text from PDF Files Using OCR: we need to parse the content of Open Access papers efficiently and accurately to extract URLs pointing to GitHub, Zenodo, GitLab, Git..
  • Integrate the Zenodo API to retrieve additional metadata.
  • Analyse the OpenAlex rate limits and find ways to increase them.
  • A cool name for the service.
  • Some marketing slides.
  • A new namespace to develop this project.
  • A cool front end.
  • Further sponsors besides me.
  • Based on the current ~300 filtered keywords, we need to analyse how many Open Access articles are published on average per day.
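The URL-extraction step in the first TODO could start as a regular-expression pass over text that has already been pulled out of the PDF (by OCR or a parser). The sketch below is an assumption, not the planned implementation: the name `extract_repository_urls` and the host list are hypothetical, and it only matches plain GitHub, GitLab and Zenodo URLs, not Zenodo DOIs or URLs broken across lines by the PDF layout.

```python
import re

# Hypothetical host list based on the TODO above; extend the alternation
# for other forges as needed.
REPO_URL_PATTERN = re.compile(
    r"https?://(?:www\.)?(?:github\.com|gitlab\.com|zenodo\.org)/[^\s)\]>\"',;]+",
    re.IGNORECASE,
)


def extract_repository_urls(text: str) -> list[str]:
    """Return unique repository URLs in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order, acting as an ordered set
    for match in REPO_URL_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that the pattern may have captured.
        url = match.group(0).rstrip(".:")
        seen.setdefault(url, None)
    return list(seen)


if __name__ == "__main__":
    sample = (
        "Code is available at https://github.com/example/model. "
        "Data: https://zenodo.org/records/123456 (see also "
        "https://github.com/example/model)."
    )
    print(extract_repository_urls(sample))
    # prints ['https://github.com/example/model', 'https://zenodo.org/records/123456']
```

Deduplicating in order of appearance keeps the first mention, which in most papers is the "code availability" statement.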
Ly0n assigned andrew and Ly0n on Nov 26, 2024

Ly0n commented Dec 13, 2024

Thanks for the great meeting today @andrew. Here are the "milestones" we agreed on:

  1. Create a list of existing OpenAlex "Topics" that match the OST keywords. @Ly0n
  2. Create a simple feed of new Open Access papers discovered via these OpenAlex Topics. @andrew
  3. Parse PDFs using GROBID to find relevant data and software repositories on various services.
  4. Create another feed that shows only new datasets and software repositories. @andrew
  5. Estimate the service cost of processing, within a single day, all papers released that day.
  6. Create a front end with great UX.
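Milestones 1 and 2 amount to querying the OpenAlex `/works` endpoint filtered by topic and Open Access status. A minimal sketch, assuming the topic IDs from milestone 1 are available as short IDs (the `T10001`-style values below are placeholders) and using the OpenAlex `topics.id`, `open_access.is_oa` and `from_publication_date` filters with cursor paging; `build_works_query` and `iter_works` are hypothetical helper names. Passing `mailto` identifies the caller to OpenAlex (its "polite pool"), which is relevant to the rate-limit TODO.

```python
import json
from typing import Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_query(
    topic_ids: list[str],
    since: str,
    mailto: Optional[str] = None,
    cursor: str = "*",
    per_page: int = 200,
) -> str:
    """Build a /works URL for Open Access papers on any of the given topics."""
    filters = [
        "topics.id:" + "|".join(topic_ids),  # "|" means OR within one filter
        "open_access.is_oa:true",
        f"from_publication_date:{since}",    # ISO date, e.g. "2024-12-13"
    ]
    params = {"filter": ",".join(filters), "per-page": per_page, "cursor": cursor}
    if mailto:
        params["mailto"] = mailto  # identifies the caller (polite pool)
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def iter_works(topic_ids: list[str], since: str, mailto: Optional[str] = None) -> Iterator[dict]:
    """Yield every matching work, following OpenAlex cursor pagination."""
    cursor: Optional[str] = "*"
    while cursor:
        url = build_works_query(topic_ids, since, mailto, cursor)
        with urlopen(url) as response:
            page = json.load(response)
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")  # None once all pages are read
```

For a daily feed, `since` would be the previous run's date; each yielded work could then be passed on to the PDF parsing step in milestone 3.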
