# Data sharing decision tree

## Open WIN Community Feedback

-----

**Important information**

**Where**: Teams

**When**: Friday 30th July 2021, 11:30-13:00 BST (UTC+1)

**Contact**: Email Cass:
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
or message on Open WIN Slack \#data-sharing-decision-tree

**Material we will be reviewing**:

-   Decision tree:
    https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md
-   Appendices:
    https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md

-----

## Agenda

1. Introductions
2. Participation guidelines
3. Using this document

    a.  Add your name if you'd like to be listed as a contributor

    b.  "+1" where you agree

    c.  All write everywhere! Individual comments can be anonymous.

4. General considerations
5. Feedback on each step
6. New repository issues (big jobs!)
7. Feedback on the day

## Participants

(Name / Pronouns / Department / GitLab user ID - or "none")

1. Cassandra Gould van Praag / she/her / Psychiatry / [@cassag](https://git.fmrib.ox.ac.uk/cassag)
2. Mats van Es / he/him / Psychiatry / [@psyc1435](https://git.fmrib.ox.ac.uk/psyc1435)
3. Alon Baram / he/him / FMRIB (WIN/NDCN) / none (using GitHub)
4. Benjamin Tendler / He/Him / WIN / [@btendler](https://git.fmrib.ox.ac.uk/btendler)
5. Thijs de Buck / he/him / FMRIB / [@ndcn0873](https://git.fmrib.ox.ac.uk/ndcn0873)
6. Christoph Arthofer / He/Him / WIN / [@cart](https://git.fmrib.ox.ac.uk/cart)
7. Jessica Walsh / she/her / WIN / [@ndcn1073](https://git.fmrib.ox.ac.uk/ndcn1073)
8. Michiel Cottaar / he/him / FMRIB / [@ndcn0236](https://git.fmrib.ox.ac.uk/ndcn0236)
9. Ludovica Griffanti / She/her / WIN (Psych and NCN) / [@ludovica](https://git.fmrib.ox.ac.uk/ludovica)
10. Tom Whyntie / he/him / Dept. Oncology / [@twhyntie](https://git.fmrib.ox.ac.uk/twhyntie)


## [Participation Guidelines](https://open.win.ox.ac.uk/pages/open-science/community/Open-WIN-Community/docs/community/CODE_OF_CONDUCT/)

- We value the participation of every member of our community and want to ensure that every contributor has an enjoyable and fulfilling experience. Please show respect and courtesy to other community members at all times.
- We are dedicated to a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, politics or technology choices. We do not tolerate harassment by and/or of members of our community in any form.
- We fall under the formal policy and reporting guidelines of the [*University Bullying and Harassment Policy*](https://edu.admin.ox.ac.uk/harassment-policy) and we expect everyone to be a [*responsible bystander*](https://edu.web.ox.ac.uk/bystander).

## General questions to consider

- How do you feel about data sharing? How does your PI feel about data
    sharing?

    - Mats: Great! We should all do it! +1 +1
          My PI, I don't know. We mostly use other people's data, so there's not a lot of data collection. I think most data are shared with
          collaborators directly, but not openly.
    - Ludo: would be great, but realised not easy. (technical) PI: sure, share it yesterday! Clinical PI: I want to know who/when/what/why people use it and then maybe share it.
      - Alon: +100
      - Example from email: "it is fine to share the images provided the following points are satisfied. 1) Images are defaced 2) All identifiable information (including DOB, sex, height and weight) removed and that you can guarantee that no identifiers are left behind. 3) There is a log of who accesses the images and permission/password to access is provided by us and renewed frequently."
    - Benjamin Tendler: Is this a choice that a PI/member should be making? We are actively encouraged by Wellcome to share data: "All Wellcome-funded researchers are expected to manage their research outputs in a way that will achieve the greatest health benefit, maximising the availability of research data, software and materials with as few restrictions as possible".

**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**

The "decision tree" document and appendices should guide you through the necessary stages to prepare your data for sharing.

**This is not a trivial process, but the purpose of this guide is to help you, not scare you!**

Today we are not writing a "user guide", simply determining whether the plans we have developed will work for you, if we have missed anything, or if there is anything we need to do to support you further.

**You are the ones who will be sharing and receiving credit for your
data, so we want to make it easy for you!**

## Feedback on each step

- While we're working through each step:
    - Do you know how to do this thing?
    - Does this flow work for your data?
        - Current knowledge gap around secondary data analysis!

### Process 1: Data management, data security and ethics

This process mostly contains actions which are required for all research, irrespective of whether the data are intended to be shared. They are included here to highlight the importance of these stages in managing the additional risks where data are intended to be shared.

Add below any comments about each of the steps in this process.

- Data management plan
    - Who has done a DMP: -1-1 +1 -1 -1-1 -1-1 (Yes = 1, No = 7)
    - TW: +1 - DMPs are mandatory for clinical trials
    - DMP created for applying for a fellowship. +1(small section)
    - Written for protocol for clinical ethics committees
    - Most DMPs seem to be included in ethics documents - I've never had to write one because of the Technical Development clearance, but it might be that there's a DMP included in there which I'm not aware of?
    - No-one had been on data management training from University. Didn't know it exists.
    - It would be useful to provide an example data management plan. For example, a lot of researchers at WIN will be sharing in vivo imaging data, which is stored using similar resources (e.g. FMRIB cluster). This 'standard' pipeline could be provided as a template, and then edited for more specific cases. +100


- Researcher data security training
    - Who has completed in the last year: +1 -1 +1 (Yes = 2, No = 1, rest unsure)
    - Like the links to things (e.g. training profile), to make it easier to find. Suggest adding the exact name of the course to search for


- DPIA screening
    - Not sure whether it is me who has to take care of this, or my PI?
    - Only if you acquire new data or if you deal with any data? Primary or secondary data acquired from a person → needs further assessment
    - For any study, not only if you share data
    - Who has completed one: -1 -1 +1 -1 -1 -1 (Yes = 1, No = 5)
    - Can we split this [the whole decision tree] off into "PI level actions" and "researcher level"? Would make it easier to know what you should be concerned about.


- Ethics approval
    - Discuss retrospective ethics approval - \*think\* if you don't already have consent to share openly, you need consent to recontact your participants and consent them to sharing! → is a simple e-mail ("I consent to ...") sufficient, or is there a certain form to be completed?
        - Comment that you do not need consent to share if it has been completely anonymised. When is it possible to make it 100% anonymous?
        - Secondary analysis output: Aggregation of data, minimum number of subjects?
        - Clarity on what (if any) data can be shared without specific ethical approval. For example, methods development scans which are routine at WIN and can contain \*no\* clinical information.

- Participant consent
    - Relying on the lab to have a sufficiently auditable consent process to remove (not share) participants who do not give consent for sharing.
    - Maybe consent forms (and patient information sheets) need to be explicit about which parts of the data (e.g. reaction time vs. pupillometry) are anonymous by default.
    - Example wording for consent forms and participant information sheets, and DMP. +1+1+1
        - Caution that it can stop people thinking if they are just
            copy-pasting.
    - How detailed do you need to be with participants about what data will be shared? Maybe we need to do some patient and public involvement (PPI) to create a standard.
    - Probably lots of stuff we can share from biobank e.g. [this information sheet](https://www.ukbiobank.ac.uk/explore-your-participation/contribute-further/serology-study/information-sheet).
    - If you move to "on request", add that you should then draw up a contract with the intended recipient of the data. That contract might be only a slight deviation from the standard DUA we are developing here for XNAT.
    - Can a material transfer agreement (MTA) be used for sharing MRI data? Can also cover digital data. There are some stock templates about what you can and can't do.
    - Are we putting limits on what people can and can't do with the shared data, e.g. non-commercial use, "research only", specific acknowledgement statements from funders? Come back to Ben to find out more! → At the Donders Repo, you have to choose a license when sharing your data (options are various existing licenses, or a general Donders license).
    - Collaboration agreements are being designed with the Trust, which have some nice statements which might be relevant. → At the Donders Repo, when you want to access data, you have to agree to a user agreement which also specifies you won't try to de-anonymise the data - we will cover this in the final section of the processes :)



### Process 2: Protected features of the data

This process identifies suitable repositories for different data types based on whether they contain information which is protected under UK GDPR.

Add below any comments about each of the steps in this process.

- Protected data
    - Ex-vivo data from the Oxford Brain Bank (for example) - you will need to check with them regarding what you can do. For example, very rare diseases may make the tissue sample very identifiable. Consent for the Oxford Brain Bank has specific levels of sharing.
    - Also some restrictions with non-human data - Benjamin Tendler: This can relate to data security (for example, not attempting to contact the source of brains)
    - Ben and Rogier should be brought in on these discussions.
- Primary data
- Supplementary data
- Bin data
    - When you have many more variables than subjects, they can all be unique combinations. If you have to bin these data, they may not fit with what was necessary for the analysis. Compromise between the granularity you need for the analysis vs. the level appropriate for de-identified sharing.
    - This is only a recommended step: show you have done your due diligence.
    - Think about what a data breach would look like.
    - It's the combination of these values (disease status, age, gender, handedness) which can make someone identifiable.
    - Does binning apply to brains too?
    - Give explicit list of things which are ok to be shared without MTAs etc.
    - Give examples of linkage attack data.
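
As a rough illustration of the binning trade-off discussed above, and of how a combination of values can single someone out, the sketch below bins age into bands and counts how many participants share each combination of values. The field names, the 10-year band width and the records are all invented for illustration; none of them come from the decision tree.

```python
from collections import Counter

# Hypothetical participant records; all values are invented for illustration.
participants = [
    {"age": 23, "sex": "F", "handedness": "R", "status": "control"},
    {"age": 27, "sex": "F", "handedness": "R", "status": "control"},
    {"age": 24, "sex": "M", "handedness": "L", "status": "patient"},
    {"age": 61, "sex": "M", "handedness": "R", "status": "patient"},
    {"age": 68, "sex": "M", "handedness": "R", "status": "patient"},
]


def bin_age(age, width=10):
    """Replace an exact age with a band such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def group_sizes(records, keys, age_width=None):
    """Count how many records share each combination of the given keys."""
    counts = Counter()
    for record in records:
        combo = []
        for key in keys:
            value = record[key]
            if key == "age" and age_width is not None:
                value = bin_age(value, age_width)
            combo.append(value)
        counts[tuple(combo)] += 1
    return counts


keys = ["age", "sex", "handedness", "status"]
raw = group_sizes(participants, keys)
binned = group_sizes(participants, keys, age_width=10)

# A combination held by only one person is the easiest to link to other data.
unique_raw = sum(1 for n in raw.values() if n == 1)
unique_binned = sum(1 for n in binned.values() if n == 1)
print(f"Unique combinations with exact age: {unique_raw}")
print(f"Unique combinations with 10-year bands: {unique_binned}")
```

In this toy example binning reduces the number of unique combinations but does not remove them all, which is the compromise between analysis granularity and de-identification raised in the notes.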


### Process 3: De-identification

Personal data cannot be made fully anonymous, only "de-identified" to the best of our practical ability. This process describes appropriate de-identification steps to take.

Add below any comments about each of the steps in this process.

- Participant identifiers
    - Where do you currently store your participant identifiers? Where would you store them if you were going to share data?
    - For clinical studies, this is standard practice. Oxford-sponsored study =\> Oxford holds the key. Trust-sponsored =\> Trust holds the key. Contract then says "you will never share that key".
    - Calpendo may store the key. Check that Calpendo no longer holds names. Can see the name and the WIN ID (and time of scan) next to each other.
    - We recommend not sharing the scan ID, to break the link to the data that WIN holds. You could share your own generated IDs, or you could randomise.
    - Oncology policy is to randomise again before sharing. Cass to look into this policy.
    - Biobank rescrambles the IDs (by contract) with each new share, but you can request bridging files across applications.
    - Scrambling can make it hard when different recipients of the data then want to recombine.
- Bin data
- NWB:N
    - Not familiar with this\...
- Unique dicom fields
    - Do you routinely or infrequently rely on any of these fields? Which would be problematic to remove? Any that should be kept for everyone?
    - How would you like scrubbing to be applied? A default set of fields removed with optional extras (opt in), or all fields by default with option to keep some (opt out). At what stage of the process? Before upload to XNAT or after?
    - How much date information would you like to keep (for example to handle a "product recall" scenario if a scanner fault was detected)?
    - Patient size and weight are in there for the safety of the MR system. Did you know they were in there? Do we need to educate about what is sensitive in headers?
    - Risk that not all scanner manufacturers have the same tags. An anonymisation tool may miss some tags.
- BIDS
    - Converting to BIDS can be run on XNAT. Would your processed data be in BIDS already?
- "Raw" vs. "processed"
    - 1) K-space - what file type? Can also contain tags which appear in the dicom, or if you publish the values which are measured, they don't contain protected info.
    - 2) reconstructed unprocessed (dicom)
    - 3) nii? Not "processed" as it contains the same data as the dicom
    - It is more about the information which is included rather than the level of "processing" - this is about the branches which we use to define/name the paths. General agreement that naming based on the file format would work.
    - Look explicitly at the type of data that people want to share with methods dev work. Cannot deface k-space.
    - What do we need to add to WIN technical development ethics to make it fit e.g. with sharing, DPIA?
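
Returning to the participant identifier discussion above, re-randomising IDs for each share while keeping the option of a bridging file could look roughly like the sketch below. The ID format, function names and example scan IDs are hypothetical. This is not the Biobank or Oncology procedure, only an illustration of the idea.

```python
import secrets


def make_share_ids(internal_ids, prefix="sub"):
    """Give every internal ID a fresh random ID for one data share.

    Returns a dict mapping internal ID -> shared ID. This mapping is the
    "key" and would be stored securely, never alongside the shared data.
    """
    mapping = {}
    used = set()
    for internal_id in internal_ids:
        shared_id = f"{prefix}-{secrets.token_hex(4)}"
        while shared_id in used:  # very unlikely, but avoid collisions
            shared_id = f"{prefix}-{secrets.token_hex(4)}"
        used.add(shared_id)
        mapping[internal_id] = shared_id
    return mapping


def bridging_file(key_a, key_b):
    """Link the shared IDs of two shares without exposing internal IDs."""
    return {key_a[i]: key_b[i] for i in key_a if i in key_b}


# Hypothetical internal scan IDs (invented format).
internal = ["SCAN001", "SCAN002", "SCAN003"]
key_recipient_a = make_share_ids(internal)
key_recipient_b = make_share_ids(internal)

# Only produced on request, for recipients who need to recombine data.
bridge = bridging_file(key_recipient_a, key_recipient_b)
for id_a, id_b in bridge.items():
    print(id_a, "->", id_b)
```

Because each share gets independent random IDs, two recipients cannot recombine their data unless the key holder issues a bridging file, which matches the difficulty noted above.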

- Structural data

- Defacing
    - The face is usually critical for coregistration. What would be the advice: sharing the defaced structural + the coregistration info (that was derived from the complete structural)?
- Unique .json fields
- Quality control
    - Mriqc can be run on XNAT. Would you run mriqc before your own analysis anyway?
- Face structure
- [Reconstruction of defaced data](https://arxiv.org/pdf/1810.06455.pdf)
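
The two header scrubbing options raised under "Unique dicom fields" above (remove a default set with optional extras, or remove everything except fields you choose to keep) can be compared with a small sketch. A plain dictionary stands in for a DICOM header so the example runs without extra libraries. The list of sensitive fields is an assumed, incomplete example, and `PrivateVendorField` is a made-up stand-in for a manufacturer-specific tag.

```python
# Fields treated as sensitive in this sketch (an assumed, incomplete list).
DEFAULT_REMOVE = {"PatientName", "PatientBirthDate", "PatientSize", "PatientWeight"}


def scrub_default_set(header, extra_remove=()):
    """Option A: remove a default set of fields, plus optional extras."""
    remove = DEFAULT_REMOVE | set(extra_remove)
    return {k: v for k, v in header.items() if k not in remove}


def scrub_all_but(header, keep=()):
    """Option B: remove every field unless it is explicitly kept."""
    keep = set(keep)
    return {k: v for k, v in header.items() if k in keep}


def coarsen_date(yyyymmdd):
    """Keep year and month of a YYYYMMDD date string, drop the day."""
    return yyyymmdd[:6] + "01"


# Invented header values for illustration only.
header = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19800101",
    "PatientSize": 1.70,
    "PatientWeight": 65.0,
    "StudyDate": "20210730",
    "RepetitionTime": 2000.0,
    "EchoTime": 30.0,
    "PrivateVendorField": "could contain anything",
}

option_a = scrub_default_set(header, extra_remove=["StudyDate"])
option_b = scrub_all_but(header, keep=["StudyDate", "RepetitionTime", "EchoTime"])
option_b["StudyDate"] = coarsen_date(option_b["StudyDate"])

print("Option A keeps:", sorted(option_a))
print("Option B keeps:", sorted(option_b))
```

Option A lets the unknown vendor field through, which is the risk noted above that a tool may miss manufacturer-specific tags. Option B drops it, at the cost of needing an explicit list of fields to keep. Coarsening the date is one possible way to retain enough information for a "product recall" scenario.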




-------

## Post-meeting summary

-   

-------

## Feedback on the meeting

- Please take a few minutes to tell us how this day went for you! Your feedback is invaluable to making this community and these events work.
- You are also welcome to email feedback to
    [*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)


- What worked?
    -   

- What didn't work?
    -   

- What would you change?
    -   

- What surprised you?
    -