chore: data structure #11

Open
vancauwe opened this issue Jan 22, 2025 · 7 comments

@vancauwe
Contributor

vancauwe commented Jan 22, 2025

To define the Zarr hierarchy, a fine-grained understanding of the data structure is necessary.

The following class diagram aims to show the different levels of metadata and data in the Cat+ project and how they connect to one another.

The object names refer to this analytical workflow diagram: https://drive.google.com/drive/u/0/folders/1yFUpIsnpPJRgIskudSVJviBfy9Te6MUQ

The resulting Zarr hierarchy would be as follows:

campaign /
    metadata-campaign-HCI
    metadata-all
    batch / 
      metadata-batch-HCI
      sample1/
          metadata-synthesis
          metadata-agilent-LC
          peak1/
             peak1-UV/
                metadata-UV
                data-UV (array)
             peak1-IR/
                metadata-IR
                data-IR (array)
          ...
          peak2/
             data-agilent-SFC
      sample2/
          ...
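
To make the nesting above concrete, here is a minimal plain-Python sketch that enumerates the node paths of the proposed hierarchy. It does not call the Zarr library; the `hierarchy_paths` function and the `layout` input shape are hypothetical names used for illustration only, and the `peak2/data-agilent-SFC` case is left out for brevity.

```python
from typing import Dict, List

# Hypothetical input shape: sample name -> peak name -> analysis techniques.
Layout = Dict[str, Dict[str, List[str]]]


def hierarchy_paths(layout: Layout) -> List[str]:
    """List the metadata and array paths of the proposed hierarchy."""
    paths = [
        "campaign/metadata-campaign-HCI",
        "campaign/metadata-all",
        "campaign/batch/metadata-batch-HCI",
    ]
    for sample, peaks in layout.items():
        sample_path = f"campaign/batch/{sample}"
        paths.append(f"{sample_path}/metadata-synthesis")
        paths.append(f"{sample_path}/metadata-agilent-LC")
        for peak, techniques in peaks.items():
            for tech in techniques:
                node = f"{sample_path}/{peak}/{peak}-{tech}"
                paths.append(f"{node}/metadata-{tech}")
                paths.append(f"{node}/data-{tech}")  # the array nodes
    return paths


paths = hierarchy_paths({"sample1": {"peak1": ["UV", "IR"]}})
for p in paths:
    print(p)
```

Each path that ends in `data-<technique>` would map to a Zarr array, and every other path to group-level metadata.
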
classDiagram

Campaign --> Batch : generates
Campaign: Data Package
Campaign: Metadata | HCI
Campaign: unsure of difference between Campaign and Batch ? 

Batch: Metadata = HCI
Batch: Metadata | hasCampaign = Campaign-URI

Batch --> Sample1 
Batch --> Sample2 
Batch --> Sample3
Sample1: Metadata | hasBatch = Batch-URI
Sample1: Metadata | hasAction1 (Synthesis)
Sample1: Metadata | hasAction2 (Synthesis)
Sample1: Metadata | Agilent

Sample2: Metadata | hasBatch = Batch-URI
Sample2: Metadata | hasAction3 (Synthesis)
Sample2: Metadata | hasAction4 (Synthesis)
Sample2: Metadata | Agilent

Sample3: Metadata | hasBatch = Batch-URI
Sample3: Metadata | hasAction5 (Synthesis)
Sample3: Metadata | hasAction6 (Synthesis)
Sample3: Metadata | Agilent

Sample1 --> Peak1: identified as NEW
Peak1: Metadata | hasSample = Sample1-URI
Peak1: Metadata |  isKnown = False
Peak1: Metadata | Agilent
Peak1: Data | Agilent
Peak1: Metadata | hasAction (Bravo)

Sample1 --> Peak2: identified as KNOWN
Peak2: Metadata | hasSample = Sample1-URI
Peak2 : Metadata |  isKnown = True
Peak2: Metadata | isChemical = the chemical it was identified as

Peak1 --> Peak1-IM-Q-TOF : analyzed
Peak1 --> Peak1-IR : analyzed
Peak1 --> Peak1-UV : analyzed
Peak1 --> Peak1-NMR : analyzed

Peak1-IM-Q-TOF : Metadata |  hasPeak = Peak1-URI
Peak1-IM-Q-TOF : Metadata | Agilent IM-Q-TOF
Peak1-IM-Q-TOF : Data | Agilent IM-Q-TOF

Peak1-IR : Metadata | hasPeak = Peak1-URI
Peak1-IR : Metadata | IR
Peak1-IR : Data | IR

Peak1-UV : Metadata |  hasPeak = Peak1-URI
Peak1-UV : Metadata | UV
Peak1-UV : Data | UV

Peak1-NMR : Metadata | hasPeak = Peak1-URI
Peak1-NMR : Metadata | NMR
Peak1-NMR : Data | NMR
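
As a rough illustration of the back-references in the diagram (`hasBatch`, `hasSample`, `hasPeak`), the lower levels could be modelled with dataclasses along these lines. This is a sketch only: the class and field names are assumptions derived from the diagram, and the `urn:example:` URIs and the `"water"` value are placeholders, not the project's real identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Analysis:
    technique: str  # "IM-Q-TOF", "IR", "UV" or "NMR"
    has_peak: str   # URI of the parent Peak
    metadata: Dict[str, str] = field(default_factory=dict)
    data: List[float] = field(default_factory=list)


@dataclass
class Peak:
    uri: str
    has_sample: str                    # URI of the parent Sample
    is_known: bool
    is_chemical: Optional[str] = None  # only set for known peaks
    analyses: List[Analysis] = field(default_factory=list)


@dataclass
class Sample:
    uri: str
    has_batch: str                     # URI of the parent Batch
    peaks: List[Peak] = field(default_factory=list)


# Placeholder URIs, for illustration only.
sample1 = Sample(uri="urn:example:sample1", has_batch="urn:example:batch")
peak1 = Peak(uri="urn:example:peak1", has_sample=sample1.uri, is_known=False)
peak1.analyses.append(Analysis(technique="UV", has_peak=peak1.uri))
peak2 = Peak(uri="urn:example:peak2", has_sample=sample1.uri,
             is_known=True, is_chemical="water")
sample1.peaks.extend([peak1, peak2])
```
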

@vancauwe vancauwe added documentation Improvements or additions to documentation help wanted Extra attention is needed question Further information is requested labels Jan 22, 2025
@vancauwe vancauwe added this to cat+ Jan 22, 2025
@rmfranken
Member

rmfranken commented Jan 27, 2025

I have a diagram too 😎 This one is auto-generated from the shapes on my main branch. "Action", "Chemical" and "Observation" are some important concepts that I am still missing. I'm guessing the Zarr hierarchy should map pretty much 1-1 to the ontology concepts and hierarchy, right? Maybe we can discuss it tomorrow?

Image

@vancauwe
Contributor Author

Image

I cannot see the image @rmfranken :)

@sabinem
Member

sabinem commented Jan 28, 2025

@vancauwe , @rmfranken I get:

This page contains the following errors:
error on line 1 at column 79201: Double hyphen within comment: <!--SRC=[lHhPSjiuybrVmRNVTiO9Rv2gsgrLP2TnbQ-nFPcfcarDW
Below is a rendering of the page up to the first error.

And then I see different boxes from the ontology.

@rmfranken
Member

Ah, strange. It's a very large image; does it also not work if you click on it? Screenshotting it is not practical at this size, I'm afraid.

Image

@sabinem
Member

sabinem commented Jan 30, 2025

So the image can be generated by uploading the ontology here: https://shacl-play.sparna.fr/play/draw

@vancauwe
Contributor Author

Meeting notes from Cat+ team:

  • A Peak is a Chemical
  • HCI metadata: for Batch and Campaign
  • Campaign’s Chemicals are pure metadata (not the same as Peaks), e.g. “This campaign revolves around water”.
  • Synthesis metadata belongs to the Samples (Each sample has a synthesis)
  • AddAction: Sample1 and Sample2. Action is recursive.
  • One Agilent file per Sample: 96 samples give 96 Agilent files, each with 5 peaks. If a peak is unknown, all analyses are run on it.
  • After LC-DAD, the Sample is split into Peaks; whether each Peak is known or unknown determines if further analysis is needed.
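
The known/unknown rule from these notes can be sketched as a small function. The list of follow-up techniques is taken from the class diagram earlier in this thread; treating a known peak as needing no follow-up analysis is an assumption, and `analyses_to_run` is a hypothetical name.

```python
from typing import List

# Techniques listed for Peak1 in the class diagram.
FOLLOW_UP_ANALYSES = ["IM-Q-TOF", "IR", "UV", "NMR"]


def analyses_to_run(is_known: bool) -> List[str]:
    """Unknown peaks get every follow-up analysis; known peaks get none (assumed)."""
    return [] if is_known else list(FOLLOW_UP_ANALYSES)


# Upper bound on analysis groups for the example numbers in the notes
# (96 samples, 5 peaks each), if every single peak were unknown.
max_analysis_groups = 96 * 5 * len(FOLLOW_UP_ANALYSES)
```
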

Example Data Access Scenarios:

  • Per Data Type:
    Chemist wants to do AI on NMR:
    Search for all NMR data, together with the HCI metadata of each campaign

  • Per Campaign:
    Chemist wants an entire experiment/campaign:
    Return the entire Data Package
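
Both scenarios can be expressed as filters over hierarchy paths. The following is a sketch under the assumption that nodes follow the `data-<technique>` and `metadata-campaign-HCI` naming proposed at the top of this issue; the function names and the example path list are hypothetical.

```python
from typing import List


def select_by_technique(paths: List[str], technique: str) -> List[str]:
    """Per data type: campaign-level HCI metadata plus all arrays of one technique."""
    hci = [p for p in paths if p.endswith("/metadata-campaign-HCI")]
    data = [p for p in paths if p.endswith(f"/data-{technique}")]
    return hci + data


def select_campaign(paths: List[str], campaign: str) -> List[str]:
    """Per campaign: every node under one campaign, i.e. the whole data package."""
    return [p for p in paths if p.startswith(campaign + "/")]


example_paths = [
    "campaignA/metadata-campaign-HCI",
    "campaignA/batch/sample1/peak1/peak1-NMR/data-NMR",
    "campaignA/batch/sample1/peak1/peak1-UV/data-UV",
    "campaignB/metadata-campaign-HCI",
    "campaignB/batch/sample1/peak1/peak1-NMR/data-NMR",
]
nmr = select_by_technique(example_paths, "NMR")
package_a = select_campaign(example_paths, "campaignA")
```

The per-data-type query cuts across campaigns while the per-campaign query follows the tree, which is worth keeping in mind when choosing how deep to nest the analysis arrays.
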

@vancauwe
Contributor Author

@rmfranken after thinking about it, I think the Actions in the Synthesis and the Bravo will sit directly under their respective Samples and Peaks, because 1 Synthesis file = 1 Sample, just as 1 Agilent file = 1 Sample.

The Synthesis file generates the Sample URI (ResultSample as we discussed).
The Agilent file generates the Peak URIs (there is a peak list).

From Bravo, each peak has Actions (tied to it through the peakIdentifier).
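
Attaching Bravo Actions to their Peaks then amounts to grouping by `peakIdentifier`. A minimal sketch, assuming each Bravo action is available as a record with a `peakIdentifier` key; the `name` key, the identifier values, and the action names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def actions_by_peak(bravo_actions: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group Bravo action names under the peak they refer to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for action in bravo_actions:
        grouped[action["peakIdentifier"]].append(action["name"])
    return dict(grouped)


# Hypothetical records, not real Bravo output.
grouped = actions_by_peak([
    {"peakIdentifier": "peak1", "name": "action-a"},
    {"peakIdentifier": "peak2", "name": "action-b"},
    {"peakIdentifier": "peak1", "name": "action-c"},
])
```
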

We will need a representation of IR, UV, and NMR though.

I edited the diagram above; let me know if it helps. Otherwise, let's sit down with the ontology.

4 participants