---
attendees: "Sophie, Abigail, Nicolas, Alun, Silvia, Anton L., Anton B., Ana, Antoine, Uwe, Kat, Andy"
intro: "<h2>Feedback on LEAPS general assembly and next steps</h2>
<p>Andy and Sophie reported on the LEAPS general assembly on the 17th of Sept. Following the presentation of the main outcomes of ExPaNDS and PaNOSC, three main questions were asked to the LEAPS representatives:</p>
<ol>
<li>which of the <b>project outcomes</b> are you planning to adopt at your facility, and what stage have you reached?</li>
<li>are you interested in a <b>PaN open data commons</b>? and if yes,</li>
<ol>
<li>would you rather have a decentralised system: all centres curating their data and only the central federated search API being maintained by the community</li>
<li>or a centralised system: 1 PaN data management entity dealing with all the individual centres’ open data</li>
<li>or a hybrid system: smaller centres entrust their open data to other PaN centres who already curate hundreds of PB and have an open data portal</li>
</ol>
<li>should we write a <b>strategy paper on data for LEAPS</b>?</li>
</ol>
<p>The questionnaire and associated cost estimates, as they were sent to the LEAPS GA, are attached in the <b>appendix</b> of these minutes. An answer is expected by this Friday.</p>
<p>The next step is for Andy and Patrick to present the engagement of LEAPS facilities in the sustainability of our projects’ outcomes, based on the replies received, at the <b>LEAPS plenary on the 21st of October</b>. This will hopefully be echoed by both LEAPS and LENS representatives at our PaN EOSC symposium on the 26th of October.</p>
<p>A discussion followed on financial plans that have already been decided, on parallel data management funding at other levels, and on the concern that this could lead to big decisions having to be taken at short notice. Overall, however, it was seen as very positive that this led to important internal discussions at our facilities.</p>
<p>On the open data commons topic, there’s a tendency to say the <b>hybrid solution</b> seems the most promising. Not all facilities would agree to central storage, especially those that have already built a data management infrastructure. But others will definitely be interested in having at least their published open data stored and curated, without having to start from scratch.</p>
<p>It was also raised that in the PaN community, we think about data storage, but not enough about data curation. Selecting <b>which data to store</b> is very important and there is as yet no consensus or guideline in the community. For example, the reproducibility of an experiment could be weighed against keeping its data stored forever. Some experiments are more reproducible / expensive than others.</p>"
status-quo:
#WP1
- ""
#WP2
- "<p>Task organisation was discussed during the last WP2 monthly meeting. In particular, Nicolas is kicking off the work on data management.</p>
<p>Brian contributed to the INFRA-EOSC-5 projects workshop on Policies at the <b>OSFair</b>. The session was <a href='https://www.youtube.com/watch?v=efF32XcAWM8'>recorded and is available here</a>.</p>"
#WP3
- "<p>Following the last monthly management meeting with task leaders, tasks 3.4 and 3.5 are to be kicked off tomorrow, including discussions on the interactions between the two.</p>
<p>HZDR discussed the harvesting of their catalogue with B2FIND; the harvesting is in progress.</p>
<p><i>Note: the status of each facility <a href='https://github.com/ExPaNDS-eu/ExPaNDS/wiki/Delivering-data-services-to-EOSC#status-of-expands-facilities-towards-a-reachable-oai-pmh-endpoint'>is monitored here in GitHub</a> - any updates welcome.</i></p>"
#WP4
- "<p>The <b>workplan</b> is now picking up after a summer slowdown:</p>
<ul>
<li>onboarding existing services to EOSC,</li>
<li>connecting the same services to the federated PaN search portal under development by PaNOSC, </li>
<li>making analysis SW portable (jupyter and <a href='http://eosc-pan-git.desy.de/'>non-jupyter</a>),</li>
<li>exploring VISA,</li>
<li>documenting everything.</li>
</ul>
<p>Abigail mentioned an interesting discussion at STFC concerning the testing of services once they are registered in EOSC. It would be interesting to include this in the ongoing deliverable on the <b>testing framework</b>. This will be discussed with Zdenek, Anton, Franz and the rest of WP4.</p>
<p>Sophie also added that EOSC-Synergy invited us to present ExPaNDS as a use case for their SQAaaS framework during <a href='https://indico.egi.eu/event/5464/sessions/4746/#20211019'>this <b>EGI conference</b> workshop</a>. Talks are ongoing with Zdenek and Michael to see what we can present, in the spirit of the mid-term review demo. There is a meeting tomorrow morning.</p>"
#WP5
- "<p>The deliverable on the e-learning platform is under final review by Patrick and Kat. It will then be delivered to the EC.</p>
<p>Material on ontologies will soon be added to the catalogue, thanks to Sylvie.</p>
<p>Recent <b>enhancements to the training catalogue</b>:</p>
<ul>
<li>automatic upload,</li>
<li>setting up web services on Moodle to scrape the e-learning platform course metadata back into the catalogue,</li>
<li>adding workflows from SOLEIL and HZDR from the reference datasets (in progress) + making them more visual,</li>
<li>UmbrellaID implementation being tested by Giuseppe at EGI, feedback for next meeting at the end of Oct.,</li>
<li>normalisation and curation of metadata: compiling ideas now.</li>
</ul>
<p>HZDR is looking for a speaker for a use case lightning talk at the symposium, as the speaker originally foreseen is unfortunately not available that day.</p>
<p>Uwe also invited other WPs to add workflows to the platform (<a href='https://pan-training.hzdr.de/workflows/laservis'>see example here</a>), starting with the reference datasets but also including cross-WP work.</p>"
#WP6
- "<p>The <b>librarian symposium</b> takes place this Thursday. It has 63 registrants at this stage. It was well cascaded at each facility, notably thanks to the European heads of comms meeting.</p>
<p>Kat also secured a <b>10-minute slot for ExPaNDS to present at the HZB user meeting</b> on the 9th of Dec. The topic / presenter will be discussed with Patrick. Discussions are also ongoing for a possible talk at <a href='https://www.sxns16.org/'><b>SXNS 2022</b></a> in January in Lund.</p>"
aob: "<p>All thanked Sylvie for the great collaboration.</p>
<p>Sophie informed the PEB about the <b>PaNOSC Copenhagen workshop</b> planned for 29-30/11/2021 to discuss, face-to-face, the remaining period objectives among PaNOSC WP leaders. Sophie and Patrick will be “representing” ExPaNDS there and will ask for input in due time.</p>
<h1>APPENDIX</h1>
<h1>LEAPS Data Strategy – Questions to LEAPS General Assembly</h1>
<h2>Introduction</h2>
<p>The outcomes and future of the two EOSC projects <a href='https://panosc.eu'>PaNOSC</a> and <a href='https://expands.eu'>ExPaNDS</a> were presented at the LEAPS GA on 17/9/2021. <a href='https://cloud.esrf.fr/s/j66Nwazp4PZdeHH'>A copy of the presentation can be found here</a>. The two projects include almost all members of LEAPS and LENS, and their future therefore depends on LEAPS and LENS adopting their outcomes. The presenters posed three questions concerning the adoption of the outcomes and the future data strategy of LEAPS. The LEAPS GA requested some time to reflect on the answers and to have a summary of the costs involved. This document briefly summarises the questions and presents an estimate of the costs involved in adopting the solutions as requested. <i>Note: the costs of sustaining the solutions over the long term are not included in these estimates (see WP7 of PaNOSC for an estimate of these costs).</i></p>
<p>The answers from the GA will be presented at the Plenary session and discussed at the PaNOSC and ExPaNDS Symposium in October.</p>
<h2>Adopting the PaNOSC and ExPaNDS outcomes</h2>
<p>The following table gives figures estimating the actions and effort required to implement the outcomes of PaNOSC and ExPaNDS. The time estimates are approximate and assume that facilities build on existing solutions developed by PaNOSC and ExPaNDS and do not start from scratch. They will vary from site to site depending on the local resources available and priorities.</p>
<p><b>Almost all the outcomes require that facilities have a Data Manager dedicated to scientific data management. This is still not the case at many facilities, which makes the adoption of the outcomes more difficult.</b></p>
<p>The following abbreviations have been used: DM = Data Manager, DP = Data Policy, PM = Person Months, UO = User Office</p>
<table>
<tr>
<th>Outcome</th>
<th>Specific Actions</th>
<th>Resources to adopt Outcome</th>
<th>Comments</th>
</tr>
<tr>
<td>1. <b>FAIR data policy</b></td>
<td>Adopt or modify existing data policies to be FAIR.</td>
<td>2 PMs of a Data Manager + Consultation with Scientists + Management</td>
<td>Management support to prepare and present DP to governing bodies.</td>
</tr>
<tr>
<td>2. <b>Data Management Plans (DMPs)</b></td>
<td>Implement DMPs for users based on outcomes of PaNOSC + ExPaNDS.</td>
<td>6 PMs of a Data Manager</td>
<td>Implement DMP solution and integrate it into the UO workflow.</td>
</tr>
<tr>
<td>3. <b>FAIR assessment</b> and common <b>PID</b> framework</td>
<td>Implement Digital Object Identifiers (DOIs) for data. Set up a WG to assess FAIRness of data.</td>
<td>12 PMs of a Data Manager</td>
<td>Set up a contract with DataCite and implement a data repository for registering data. This assumes adopting one of the existing data repositories + having access to an infrastructure for archiving data.</td>
</tr>
<tr>
<td>4. Standardised metadata (<b>NeXus/HDF5</b>, PaN ontologies)</td>
<td>Adopt NeXus/HDF5 and produce data following the NeXus conventions.</td>
<td>12 PMs of a controls engineer. 2 PMs per technique for data scientists.</td>
<td>Requires adding support for NeXus/HDF5 to the control system.</td>
</tr>
<tr>
<td>5. <b>Federated search API</b> for PaN data catalogues</td>
<td>Implement the PaNOSC search API as a service for the local data repository.</td>
<td>6 PMs of a Data Manager for a non-standard catalogue.<br>2 PMs of a Data Manager for one of the supported catalogues.</td>
<td>Need to implement the search API if the local repository is not a standard one.</td>
</tr>
<tr>
<td>6. <b>Open Data portal</b> for searching + downloading data</td>
<td>Implement a metadata catalogue and data repository to upload, search and download open data.</td>
<td>24 PMs of a Data Manager + 12 PMs of an IT infrastructure engineer.<br>10-20 k€/PB/y of archived data.<br>(Amazon cold storage costs 12 k€/PB/y but does not include the local manpower required to integrate AWS in the local infrastructure.)</td>
<td>Need to adapt an existing solution locally. Cost for archiving is a rough estimate for long term storage.</td>
</tr>
<tr>
<td>7. Community <b>AAI UmbrellaId</b></td>
<td>Implement the PaN community AAI UmbrellaId based on eduTEAMS.</td>
<td>6 PMs of an IT engineer.<br>1 PM per new data service.</td>
<td>Implement the latest version of UmbrellaId locally for data services.</td>
</tr>
<tr>
<td>8. <b>JupyterLab notebooks</b> and HDF5/NeXus files visualisation</td>
<td>Implement a JupyterLab instance for remote data viewing and interactive analysis.</td>
<td>6 PMs of a software engineer and 6 PMs of an IT infrastructure engineer.</td>
<td>PaNOSC has packaged JupyterLab for Slurm and developed h5web for viewing HDF5.</td>
</tr>
<tr>
<td>9. <b>Remote data analysis</b> with VISA + data analysis pipelines</td>
<td>Implement the VISA remote analysis service.</td>
<td>12 PMs of an IT infrastructure engineer and 6 PMs of a Data Manager.</td>
<td></td>
</tr>
<tr>
<td>10. <b>Simulation</b> software for simulating experimental data (SIMEX)</td>
<td>Adopt and install the SIMEX simulation software.</td>
<td>6 PMs of a software engineer.</td>
<td></td>
</tr>
<tr>
<td>11. <b>PaN-learning</b> platform (pan-learning.org + pan-training.org)</td>
<td>Use and upload new training material to the PaN-learning platform.</td>
<td>1 PM of a trainer to prepare training material per new topic.</td>
<td></td>
</tr>
</table>
<p>It would be useful to know the state of each outcome at your facility:</p>
<ul>
<li>Yes, already adopted (Y)</li>
<li>Not planned to be adopted (N)</li>
<li>In the process of being adopted (WIP)</li>
<li>Planned to be adopted (P)</li>
<li>Under evaluation (U)</li>
</ul>
<strong>Q1: LEAPS GA members are requested to fill out the table below.</strong>
<table>
<tr>
<th>Facility</th>
<th>FAIR Data Policy</th>
<th>DMPs</th>
<th>DOIs</th>
<th>Nexus HDF5</th>
<th>Search API</th>
<th>Open Data Portal</th>
<th>AAI</th>
<th>Jupyter Lab</th>
<th>VISA</th>
<th>SIMEX</th>
<th>PaN training</th>
</tr>
<tr>
<td>Your facility</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
<h2>PaN Open Data Commons</h2>
<p>The visibility and impact of data from the LEAPS facilities could be greatly enhanced by exposing them via a common portal which can search for data across all facilities. This would make the data easier to find and access (F and A of FAIR) and increase their re-use (R of FAIR). The impact of LEAPS data would be easier to measure, which, thanks to a common branding, could lead to more funding for LEAPS. It would also contribute to implementing Open Science across the LEAPS facilities. The LEAPS data would be integrated into EOSC through the common portal. This approach is common in biology (ELIXIR), environment (ENVRI), sociology (SSHOC) and astronomy (Virtual Observatory). </p>
<p>At the LEAPS GA of 17 September 2021, the members were asked whether they shared the vision of a common portal. Before answering the question, the LEAPS partners requested more information on the potential cost of implementing such a portal. Here are some rough estimates for different scenarios:</p>
<p> <b>Scenario 1</b> – all LEAPS facilities implement an open data portal locally (see table above for estimates of costs) including the federated search API. In this case the cost of a PaN Open Data commons is that of implementing the common portal, linking it to the individual data portals and to the EOSC, and maintaining it, including making improvements to the search algorithms. The minimum additional cost would be to have a DM working part-time to deploy and maintain the common portal connected to the local portals. The DM would be hired by one of the facilities and paid for collectively through a collaboration contract.</p>
<p> <b>Scenario 2</b> – not all LEAPS facilities implement an open data portal locally. Some of them use the common data portal to upload and store data centrally. In this scenario the common cost is higher because it includes the cost for the infrastructure (10-20 k€/PB/y of archived data) for storing the common data, 1 IT engineer and 1 DM per year.</p>
<p> <b>Scenario 3</b> – would be a hybrid solution between 1 and 2. Some LEAPS facilities would implement an open data portal locally while others would rely on the common portal to upload and store data centrally. In this scenario the cost is lower than in scenario 2 because there would be fewer PBs to be archived centrally. It still includes the cost for the infrastructure (10-20 k€/PB/y of archived data) for storing the common data, 1 IT engineer and 1 DM per year.</p>
<strong>Q2: Do LEAPS GA members share the vision of a common open data portal for open data from LEAPS facilities?</strong>
<strong>Q2-bis: In case the answer to Q2 is yes, which scenario do you prefer?</strong>
<h2>A LEAPS Open Science and Data Strategy paper</h2>
<p>The vision of PaNOSC and ExPaNDS is ambitious and the answers to the above questions could be laid out in more detail in a common LEAPS strategy paper on Open Science and Data. The question to the LEAPS GA is whether there is a common agreement to write such a paper and whether members would contribute to and/or endorse it.</p>
<strong>Q3: Do you want a common LEAPS strategy paper on Open Science and Data and would you contribute to it?</strong>"
---