# Data sharing decision tree

## Open WIN Community Feedback

-----

**Important information**

**Where**: Teams

**When**: Friday 30th July 2021, 11:30-13:00 BST (UTC+1)

**Contact**: Email Cass:
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
or message on Open WIN Slack \#data-sharing-decision-tree

**Material we will be reviewing**:

-   Decision tree:
    https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md
-   Appendices:
    https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md

-----

## Agenda

1.  Introductions
2.  Participation guidelines
3.  Using this document

    a.  Add your name if you'd like to be listed as a contributor
    b.  "+1" where you agree
    c.  Everyone can write anywhere! Individual comments can be anonymous.

4.  General considerations
5.  Feedback on each step
6.  New repository issues (big jobs!)
7.  Feedback on the day

## Participants

(Name / Pronouns / Department / GitLab user ID, or "none")

1.  Cassandra Gould van Praag / she/her / Psychiatry / [@cassag](https://git.fmrib.ox.ac.uk/cassag)
2.  Mats van Es / he/him / Psychiatry / [@psyc1435](https://git.fmrib.ox.ac.uk/psyc1435)
3.  Alon Baram / he/him / FMRIB (WIN/NDCN) / none (using GitHub)
4.  Benjamin Tendler / He/Him / WIN / [@btendler](https://git.fmrib.ox.ac.uk/btendler)
5.  Thijs de Buck / he/him / FMRIB / [@ndcn0873](https://git.fmrib.ox.ac.uk/ndcn0873)
6.  Christoph Arthofer / He/Him / WIN / [@cart](https://git.fmrib.ox.ac.uk/cart)
7.  Jessica Walsh / she/her / WIN / [@ndcn1073](https://git.fmrib.ox.ac.uk/ndcn1073)
8.  Michiel Cottaar / he/him / FMRIB / [@ndcn0236](https://git.fmrib.ox.ac.uk/ndcn0236)
9.  Ludovica Griffanti / She/her / WIN (Psych and NCN) / [@ludovica](https://git.fmrib.ox.ac.uk/ludovica)
10. Tom Whyntie / he/him / Dept. Oncology / [@twhyntie](https://git.fmrib.ox.ac.uk/twhyntie)


## [Participation Guidelines](https://open.win.ox.ac.uk/pages/open-science/community/Open-WIN-Community/docs/community/CODE_OF_CONDUCT/)

-   We value the participation of every member of our community and want to ensure that every contributor has an enjoyable and fulfilling experience. Please show respect and courtesy to other community members at all times.
-   We are dedicated to a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, politics or technology choices. We do not tolerate harassment by and/or of members of our community in any form.
-   We fall under the formal policy and reporting guidelines of the [*University Bullying and Harassment Policy*](https://edu.admin.ox.ac.uk/harassment-policy) and we expect everyone to be a [*responsible bystander*](https://edu.web.ox.ac.uk/bystander).

## General questions to consider

-   How do you feel about data sharing? How does your PI feel about data
    sharing?

    -   Mats: Great! We should all do it! +1 +1
        My PI, I don't know. We mostly use other people's data, so there's not a lot of data collection. I think most data are shared with collaborators directly, but not openly.
    - Ludo: It would be great, but I realised it's not easy. Technical PI: sure, share it yesterday! Clinical PI: I want to know who/when/what/why people use it, and then maybe share it.
      - Alon: +100
      - Example from email: "it is fine to share the images provided the following points are satisfied. 1) Images are defaced 2) All identifiable information (including DOB, sex, height and weight) removed and that you can guarantee that no identifiers are left behind. 3) There is a log of who accesses the images and permission/password to access is provided by us and renewed frequently."
    - Benjamin Tendler: Is this a choice that a PI/member should be making? We are actively encouraged by Wellcome to share data: "All Wellcome-funded researchers are expected to manage their research outputs in a way that will achieve the greatest health benefit, maximising the availability of research data, software and materials with as few restrictions as possible".

**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**

The "decision tree" document and appendices should guide you through the necessary stages to prepare your data for sharing.

**This is not a trivial process, but the purpose of this guide is to help you, not scare you!**

Today we are not writing a "user guide"; we are simply determining whether the plans we have developed will work for you, whether we have missed anything, and whether there is anything we need to do to support you further.

**You are the ones who will be sharing and receiving credit for your
data, so we want to make it easy for you!**

## General questions for the materials

-   How (in what format(s)) would you like to engage with this material?
    -   Do you expect to go through the whole guide in one sitting?
    -   Would you like to take notes against it "online"?
    -   Should it be hierarchical (see only the topmost level first, then go deeper), or would you like to see it all at once?
    -   Would you like a glossary or FAQ (list below any terms which might be unfamiliar to the average researcher)?
-   What would make you want to engage or run away from this material?!

## Feedback on each step

-   While we're working through each step:
    -   Do you know how to do this thing?
    -   Does this flow work for your data?
        -   Current knowledge gap around secondary data analysis!

### Process 1: Data management, data security and ethics

This process mostly contains actions which are required for all research, irrespective of whether the data are intended to be shared. They are included here to highlight the importance of these stages in managing the additional risks where data are intended to be shared.

Add below any comments about each of the steps in this process.

-   Data management plan
    -   Who has done a DMP: -1-1 +1 -1 -1-1 -1-1 (Yes = 1, No = 7)
    -   TW: +1. DMPs are mandatory for clinical trials
    -   DMP created when applying for a fellowship. +1 (small section)
    -   Written as part of the protocol for clinical ethics committees
    -   Most DMPs seem to be included in ethics documents. I've never had to write one because of the Technical Development clearance, but it might be that there's a DMP included in there which I'm not aware of?
    -   No one had been on the University's data management training. We didn't know it existed.
    -   It would be useful to provide an example data management plan. For example, many researchers at the WIN will be sharing in vivo imaging data, which is stored using similar resources (e.g. the FMRIB cluster). This 'standard' pipeline could be provided as a template, and then edited for more specific cases. +100


-   Researcher data security training
    -   Who has completed it in the last year: +1 -1 +1 (Yes = 2, No = 1, rest unsure)
    -   Like the links to things (e.g. training profile), as they make things easier to find. Suggest adding the exact name of the course to search for.


-   DPIA screening
    -   Not sure whether it is me who has to take care of this, or my PI?
    -   Only if you acquire new data, or if you deal with any data? Primary or secondary data acquired from a person → needs further assessment
    -   For any study, not only if you share data
    -   Who has completed one: -1 -1 +1 -1 -1 -1 (Yes = 1, No = 5)
    -   Can we split this [the whole decision tree] into "PI level actions" and "researcher level actions"? It would make it easier to know what you should be concerned about.


-   Ethics approval
    - Discuss retrospective ethics approval: \*think\* if you don't already have consent to share openly, you need consent to recontact your participants and consent them to sharing! → Is a simple email ("I consent to ...") sufficient, or is there a specific form to be completed?
        - Comment that you do not need consent to share if it has been completely anonymised. When is it possible to make it 100% anonymous?
        - Secondary analysis output: Aggregation of data, minimum number of subjects?
        - Clarity on what (if any) data can be shared without specific ethical approval. For example, methods development scans which are routine at WIN and can contain \*no\* clinical information.

-   Participant consent

    -   Relying on the lab to have a sufficiently auditable consent process to remove (not share) participants who do not give consent for sharing.

    -   Maybe consent forms (and patient information sheets) need to be explicit about which parts of the data (e.g. reaction time vs. pupillometry) are anonymous by default.

    -   Example wording for consent forms and participant information sheets, and DMP. +1+1+1

        -   Caution that it can stop people thinking if they are just
            copy-pasting.

    -   How detailed do you need to be with participants about what data will be shared? Maybe we need to do some patient and public involvement (PPI) to create a standard.

    -   There is probably a lot of material we can borrow from UK Biobank, e.g. [this information sheet](https://www.ukbiobank.ac.uk/explore-your-participation/contribute-further/serology-study/information-sheet).

    -   If you move to "on request", add that you should then draw up a contract with the intended recipient of the data. That contract might be only a slight deviation from the standard DUA we are developing here for XNAT.
    -   Can a material transfer agreement (MTA) be used for sharing MRI data? Can also cover digital data. There are some stock templates about what you can and can't do.

    -   Are we putting limits on what people can and can't do with the shared data, e.g. non-commercial use, "research only", specific acknowledgement statements from funders...? Come back to Ben to find out more! → At the Donders Repo, you have to choose a license when sharing your data (options are various existing licenses, or a general Donders license).

    -   Collaboration agreements are being designed with the Trust, which have some nice statements that might be relevant. → At the Donders Repo, when you want to access data, you have to agree to a user agreement which also specifies that you won't try to de-anonymise the data. We will cover this in the final section of the processes :)
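The ethics question above about aggregated secondary-analysis outputs and a minimum number of subjects is often handled by withholding small groups before release. This is a minimal sketch in Python, assuming per-group summaries of a single measure. The `MIN_GROUP_SIZE` of 10, the function name and the example records are hypothetical placeholders, not an agreed WIN, University or ethics committee threshold.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold for illustration only: the real minimum should come
# from your ethics approval or data governance guidance.
MIN_GROUP_SIZE = 10


def aggregate_with_suppression(records, group_key, value_key, min_n=MIN_GROUP_SIZE):
    """Return {group: (n, mean)}, withholding groups smaller than min_n.

    Suppressed groups are reported as (None, None) so that neither the small
    count nor the group mean is released. The group label itself is still
    shown; drop it too if its existence alone would be disclosive.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    summary = {}
    for group, values in groups.items():
        if len(values) < min_n:
            summary[group] = (None, None)  # too few participants to release
        else:
            summary[group] = (len(values), mean(values))
    return summary


# Hypothetical example: 12 controls and 2 participants with a rare condition.
records = [{"diagnosis": "control", "rt_ms": 500 + i} for i in range(12)]
records += [{"diagnosis": "rare_condition", "rt_ms": 650},
            {"diagnosis": "rare_condition", "rt_ms": 700}]
print(aggregate_with_suppression(records, "diagnosis", "rt_ms"))
# → {'control': (12, 505.5), 'rare_condition': (None, None)}
```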

Other questions:

-   Is it clear that the burden of data governance exists irrespective of sharing plans? You do not have to additionally formulate governance plans if you choose to share your data, just include sharing specific descriptions. Would you like "governance anyway" steps to be identified separately?

### Process 2: Protected features of the data

This process identifies suitable repositories for different data types based on whether they contain information which is protected under UK GDPR.

Add below any comments about each of the steps in this process.

-   Protected data

    -   Ex-vivo data from the Oxford Brain Bank (for example): you will need to check with them regarding what you can do. For example, very rare diseases may make the tissue sample very identifiable. Consent for the Oxford Brain Bank has specific levels of sharing.

    -   Also some restrictions with non-human data - Benjamin Tendler: This can relate to data security (for example, not attempting to contact the source of brains)

    -   Ben and Rogier should be brought in on these discussions.

-   Primary data
-   Supplementary data
-   Bin data

    -   When you have many more variables than subjects, they can all be unique combinations. If you have to bin these data, they may not fit with what was necessary for the analysis. Compromise between the granularity you need for the analysis vs. the level appropriate for de-identified sharing.

    -   This is only a recommended step: show you have done your due diligence.

    -   Think about what a data breach would look like.

    -   It's the combination of these values (disease status, age, gender, handedness) which can make someone identifiable.

    -   Does binning apply to brains too?

    -   Give explicit list of things which are ok to be shared without MTAs etc.

    -   Give examples of linkage attack data.
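The point above that it is the combination of disease status, age, gender and handedness that makes someone identifiable can be checked by counting how many participants share each combination, before and after binning. This is a minimal sketch in Python, assuming a small list of participant records. The field names, the 10-year age bins, the helper names and the example records are hypothetical, and reporting the smallest group size (the "k" of k-anonymity) is one common heuristic rather than a WIN requirement.

```python
from collections import Counter

# Hypothetical quasi-identifiers, taken from the combination discussed above.
QUASI_IDENTIFIERS = ("disease_status", "age", "gender", "handedness")


def bin_age(age, width=10):
    """Replace an exact age with a range label, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def binned(records, width=10):
    """Return copies of the records with exact ages replaced by age ranges."""
    return [{**r, "age": bin_age(r["age"], width)} for r in records]


def smallest_group(records, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group sharing identical quasi-identifier values.

    A value of 1 means at least one participant is unique on these fields
    and could be re-identified by linking to another dataset.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


participants = [  # hypothetical records for illustration
    {"disease_status": "patient", "age": 34, "gender": "F", "handedness": "R"},
    {"disease_status": "patient", "age": 37, "gender": "F", "handedness": "R"},
    {"disease_status": "control", "age": 52, "gender": "M", "handedness": "L"},
    {"disease_status": "control", "age": 58, "gender": "M", "handedness": "L"},
]

print(smallest_group(participants))          # → 1 (exact ages: every record is unique)
print(smallest_group(binned(participants)))  # → 2 (binned ages: groups of two)
```

Wider bins raise the smallest group size but, as noted above, may remove granularity the analysis needs.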

Other questions:

-   Do you use any other data sharing sites (especially for sources other than human MRI)?

### Process 3: De-identification

Personal data cannot be made fully anonymous, only "de-identified" to the best of our practical ability. This process describes appropriate de-identification steps to take.

Add below any comments about each of the steps in this process.

-   Participant identifiers

    -   Where do you currently store your participant identifiers? Where would you store them if you were going to share data?

    -   For clinical studies, this is standard practice. In an Oxford-sponsored study =\> Oxford holds the key. Trust-sponsored =\> the Trust holds the key. The contract then says "you will never share that key".

    -   Calpendo may store the key. Check that Calpendo no longer holds names: you can see the name and the WIN ID (and time of scan) next to each other.

    -   We recommend not sharing the scan ID, to break the link with the data that WIN holds. You could share your own generated IDs, or you could randomise.

    -   Oncology has a policy to randomise again before sharing. Cass to look into this policy.

    -   Biobank rescrambles IDs (by contract) with each new share, but you can request bridging files across applications.

    -   Scrambling can make it hard when different recipients of the data want to recombine them.

-   Bin data
-   NWB:N

    -   Not familiar with this\...

-   Unique dicom fields

    -   Do you routinely or infrequently rely on any of these fields? Which would be problematic to remove? Any that should be kept for everyone?

    -   How would you like scrubbing to be applied? A default set of fields removed with optional extras (opt in), or all fields by default with option to keep some (opt out). At what stage of the process? Before upload to XNAT or after?

    -   How much date information would you like to keep (for example to handle a "product recall" scenario if a scanner fault was detected)?

    -   Patient size and weight are in there for the safety of the MR system. Did you know they were in there? Do we need to educate about what is sensitive in headers?

    -   Risk that not all scanner manufacturers have the same tags. An anonymisation tool may miss some tags.

-   BIDS

    -   Converting to BIDS can be run on XNAT. Would your processed data be in BIDS already?


-   Raw vs. processed

    -   1\) K-space: what file type? It can also contain tags which appear in the dicom. If you publish only the values which are measured, they don't contain protected info.

    -   2\) reconstructed unprocessed (dicom)

    -   3\) nii? Not "processed", as it contains the same data as the dicom

    -   This is more about the information which is included than the level of "processing": it concerns the branches which we use to define/name the paths. General agreement that naming based on the file format would work.

    -   Look explicitly at the type of data that people want to share with methods dev work. Cannot deface k-space.

    -   What do we need to add to WIN technical development ethics to make it fit e.g. with sharing, DPIA?

-   Structural data


-   Defacing

    -   The face is usually critical for coregistration. What would the advice be: share the defaced structural plus the coregistration info (derived from the complete structural)?


-   Unique .json fields


-   Quality control

    -   MRIQC can be run on XNAT. Would you run MRIQC before your own analysis anyway?

-   Face structure
-   [Reconstruction of defaced data](https://arxiv.org/pdf/1810.06455.pdf)
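The participant identifier steps above (not sharing the scan ID, randomising again before each share, and bridging files when recipients want to recombine data) can be sketched as issuing fresh random IDs for every share while the mapping stays private. This is a minimal Python sketch, assuming the mapping tables are stored securely by whoever holds the key and are never released with the data. The function names, the `sub-` prefix and the scan IDs are hypothetical, and this is not how Calpendo, XNAT or Biobank implement it.

```python
import secrets


def new_share_ids(internal_ids, prefix="sub-", nbytes=4):
    """Map each internal ID (e.g. a scan ID) to a fresh random ID for one share.

    The returned mapping is the re-identification key: keep it with the key
    holder and never release it alongside the shared data.
    """
    mapping = {}
    used = set()
    for internal_id in internal_ids:
        while True:
            candidate = prefix + secrets.token_hex(nbytes)
            if candidate not in used:  # guard against (unlikely) collisions
                used.add(candidate)
                break
        mapping[internal_id] = candidate
    return mapping


def bridging_table(share_a, share_b):
    """Link the random IDs of two shares via the private internal IDs.

    Only produce this if both recipients are permitted to recombine the data.
    """
    return {share_a[i]: share_b[i] for i in share_a.keys() & share_b.keys()}


scan_ids = ["scan-001", "scan-002", "scan-003"]  # hypothetical internal IDs
share_1 = new_share_ids(scan_ids)  # IDs issued to the first recipient
share_2 = new_share_ids(scan_ids)  # independent IDs for the second recipient
bridge = bridging_table(share_1, share_2)
```

Because each share gets independent IDs, two recipients cannot join their copies unless a bridging table is deliberately released to them.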

Other questions:

-   How would you work with XNAT alongside your data collection and processing (jalapeno) workflows?
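The "Unique dicom fields" questions above (a default set of removed fields with opt-in extras or opt-out exceptions, and how much date information to keep) can be sketched as a small header-scrubbing step that could run before or after upload to XNAT. This is a minimal Python sketch, assuming the header has already been read into a plain dict keyed by DICOM keyword; it does not use pydicom or any XNAT tool, the function names and example values are hypothetical, and the field list is an illustrative subset of standard DICOM attributes rather than a complete de-identification profile. As noted above, manufacturer-specific tags can be missed by any fixed list.

```python
# Illustrative subset of standard DICOM attributes (keyword and tag) that
# describe the participant. A real de-identification profile covers far more,
# including manufacturer-specific private tags.
DEFAULT_REMOVE = {
    "PatientName",       # (0010,0010)
    "PatientID",         # (0010,0020)
    "PatientBirthDate",  # (0010,0030)
    "PatientSex",        # (0010,0040)
    "PatientAge",        # (0010,1010)
    "PatientSize",       # (0010,1020), height in metres
    "PatientWeight",     # (0010,1030), weight in kg
    "InstitutionName",   # (0008,0080)
}

# Date attributes use the DICOM DA format YYYYMMDD.
DATE_FIELDS = {"StudyDate", "SeriesDate", "AcquisitionDate"}


def coarsen_date(da, precision):
    """Reduce a YYYYMMDD date to month or year precision, keeping it a valid date."""
    if precision == "full":
        return da
    if precision == "month":
        return da[:6] + "01"    # keep year and month, set day to 01
    if precision == "year":
        return da[:4] + "0101"  # keep year only, set to 1 January
    raise ValueError(f"unknown precision: {precision!r}")


def scrub_header(header, keep=(), extra=(), date_precision="year"):
    """Return a scrubbed copy of a header dict (the input is not modified).

    keep:  default fields the researcher opts to retain ("opt out").
    extra: additional fields the researcher opts to remove ("opt in").
    date_precision: "full", "month" or "year".
    """
    remove = (DEFAULT_REMOVE - set(keep)) | set(extra)
    scrubbed = {}
    for field, value in header.items():
        if field in remove:
            continue
        if field in DATE_FIELDS and isinstance(value, str):
            value = coarsen_date(value, date_precision)
        scrubbed[field] = value
    return scrubbed


header = {  # hypothetical header values for illustration
    "PatientName": "Doe^Jane",
    "PatientSize": 1.68,
    "PatientWeight": 60,
    "StudyDate": "20210730",
    "RepetitionTime": 2000,
}
print(scrub_header(header, date_precision="month"))
# → {'StudyDate': '20210701', 'RepetitionTime': 2000}
```

Keeping month-level dates is one possible compromise for a "product recall" trace; the right precision is exactly the open question above.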


-------

## Post-meeting summary

-   

-------

## Feedback on the meeting

-   Please take a few minutes to tell us how this day went for you! Your feedback is invaluable to making this community and these events work.
-   You are also welcome to email feedback to
    [*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)

*30th July 2021*

-   What worked?

    -   

-   What didn't work?

    -   

-   What would you change?

    -   

-   What surprised you?

    -