As a community we would like to know the best-practices for archiving OSF #62

Open
14 tasks
mjy opened this issue Dec 15, 2023 · 5 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@mjy
Contributor

mjy commented Dec 15, 2023

OSF is seeking to provide an example of the best practices defined by the COL. One of those best practices is "archiving".

  • What does COL archiving entail?

The two targets for archiving I feel we should pursue first, in order, are:

  • Zenodo
  • Internet Archive

Additional exploration could target GitHub's policy on archiving.

Can we:

  • Identify the required data/software products to archive
  • Identify the metadata required: what is shared and what differs between Zenodo and Internet Archive
  • Define/describe the process of creating the archive
  • Define/describe an SOP for updating the archive(s)
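
To make the "what is shared, what differs" question concrete, here is a minimal sketch that maps one shared record onto the two targets. It assumes the field names I understand Zenodo's deposit metadata (`upload_type`, `creators`, `keywords`, `publication_date`) and Internet Archive item metadata (`identifier`, `mediatype`, `creator`, `subject`, `date`) to use; those should be checked against each service's documentation. The record values, the identifier, and the function names are all hypothetical placeholders, not real OSF metadata.

```python
# Hypothetical shared record: every value here is a placeholder.
shared = {
    "title": "Example dataset snapshot",
    "description": "Placeholder description of the archived snapshot.",
    "creators": ["Example, Author"],
    "keywords": ["taxonomy", "checklist"],
    "date": "2023-12-15",
}


def to_zenodo(record):
    """Map the shared record to Zenodo-style deposit metadata (assumed field names)."""
    return {
        "upload_type": "dataset",
        "title": record["title"],
        "description": record["description"],
        "creators": [{"name": name} for name in record["creators"]],
        "keywords": list(record["keywords"]),
        "publication_date": record["date"],
    }


def to_internet_archive(record, identifier):
    """Map the shared record to Internet Archive-style item metadata (assumed field names)."""
    return {
        "identifier": identifier,
        "mediatype": "data",
        "title": record["title"],
        "description": record["description"],
        "creator": list(record["creators"]),
        "subject": list(record["keywords"]),
        "date": record["date"],
    }


zenodo_meta = to_zenodo(shared)
ia_meta = to_internet_archive(shared, "example-snapshot-2023-12-15")

# Field names used identically by both targets, and those specific to each.
common_fields = sorted(set(zenodo_meta) & set(ia_meta))
zenodo_only = sorted(set(zenodo_meta) - set(ia_meta))
ia_only = sorted(set(ia_meta) - set(zenodo_meta))
```

Under these assumptions only `title` and `description` carry over unchanged; creators, keywords, and dates hold the same information under different field names, which is the part a shared SOP would need to spell out.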

Once the archive is live:

  • Make use of the discoverability the archive adds to link back to
    • OSF pages
    • The corresponding TaxonWorks API
    • Resources like this issue tracker
  • Ensure the DOIs and other identifiers associated with the archive are integrated back into our products
    • The data in TW itself
    • Webpages like the OSF
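
One possible way to record the "link back" targets is in the archive metadata itself. The sketch below assumes Zenodo's `related_identifiers` list, where each entry has an `identifier` and a `relation`; the relation values chosen here are a guess at what fits and would need to be confirmed. The URLs are placeholders, and `build_related_identifiers` is a hypothetical helper.

```python
def build_related_identifiers(links):
    """Turn (url, relation) pairs into Zenodo-style related_identifiers entries."""
    return [{"identifier": url, "relation": relation} for url, relation in links]


# Placeholder URLs only; the real OSF, TaxonWorks API, and tracker URLs would go here.
links = [
    ("https://example.org/osf-page", "isSupplementTo"),
    ("https://example.org/taxonworks-api", "isDerivedFrom"),
    ("https://example.org/issue-tracker", "isDocumentedBy"),
]

related = build_related_identifiers(links)
```

The reverse direction (getting the DOI back into TW and the OSF webpages) is not covered here, since it depends on how those products store identifiers.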
@debpaul
Contributor

debpaul commented Dec 15, 2023

Just starting to delve.

Zenodo FAQ has some information on metadata requirements and standards, for example.

COL archiving.

  1. See COL Archive Repository
  2. Note that when I tried to access and understand a COL archive, the downloads I tried failed.

From GitHub see Referencing and citing content

To make your repositories easier to reference in academic literature, you can create persistent identifiers, also known as Digital Object Identifiers (DOIs). You can use the data archiving tool Zenodo to archive a repository on GitHub.com and issue a DOI for the archive.
Zenodo archives your repository and issues a new DOI each time you create a new GitHub release. Follow the steps at "Managing releases in a repository" to create a new one.

  • Considering the above and the rate at which our projects might do a "release", keep in mind that:

You can create releases to bundle and deliver iterations of a project to users.
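
If we go the GitHub release route, my understanding is that the Zenodo GitHub integration can read a `.zenodo.json` file at the repository root to fill in the record metadata for each release; that should be verified before relying on it. A minimal sketch of generating such a file follows. The field values, the version string, and the function name are hypothetical.

```python
import json


def zenodo_json_text(title, description, creator_names, version):
    """Return the text of a .zenodo.json file for one release (assumed field names)."""
    metadata = {
        "title": title,
        "description": description,
        "upload_type": "dataset",
        "version": version,
        "creators": [{"name": name} for name in creator_names],
    }
    return json.dumps(metadata, indent=2, sort_keys=True)


# Placeholder values; a real release would use the project's own metadata.
text = zenodo_json_text(
    title="Example dataset snapshot",
    description="Placeholder description for this release.",
    creator_names=["Example, Author"],
    version="2023.4",
)

# To use it, the text would be written to ".zenodo.json" before tagging the release:
# with open(".zenodo.json", "w", encoding="utf-8") as handle:
#     handle.write(text)
```

Since each release gets a new DOI, the release cadence we pick directly determines how many DOIs accumulate.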

@MMCigliano
Collaborator

If downloads from the COL Archive Repository work, maybe this would be the natural repository.
We could export to ChecklistBank more frequently (4 times a year).

@debpaul
Contributor

debpaul commented Dec 15, 2023

If downloads from the COL Archive Repository work, maybe this would be the natural repository. We could export to ChecklistBank more frequently (4 times a year).

@MMCigliano I don't know (since it didn't work). I can't see what it contains.

Also, considering the points made by @mjy about connecting "the data" and "the web pages", we need to see what's inside the COL archive to judge whether it might work. For example, I can't see "the metadata" included in the COL file. More research is needed. I'll be posting a ticket to the COL GitHub repo to see if Markus Doering can fix the download.

@debpaul
Contributor

debpaul commented Dec 15, 2023

Update @mjy @MMCigliano: for some reason (unknown to me), the download of COL archives works for Geoff but fails on my computer. So you might both try it (so we can "see" what's in them). I'll suss out later why it is not working for me.

debpaul added the enhancement and question labels Jan 9, 2024
@klausriede

Redundancy in data storage and archiving is always good. However, just a reminder that OSF exceeds Zenodo and COL by far in longevity, reaching back to the 1990s when Daniel Otte started to establish a digital list of types and references. I think that this supremacy in data continuity should be maintained. It might be complemented by strategies for offline long-term digital storage.


4 participants