# Data sharing decision tree

## Open WIN Community Feedback

-----

**Important information**

**Where**: Teams

**When**: Friday 30th July 2021, 11:30-13:00 BST (UTC+1)

**Contact**: Email Cass:
[cassandra.gouldvanpraag@psych.ox.ac.uk](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
or message on Open WIN Slack #data-sharing-decision-tree

**Material we will be reviewing**:
- [Decision tree](https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md)
- [Appendices](https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md)

-----

## Agenda

1. Introductions
2. Participation guidelines
3. Using this document
	a. Add your name if you'd like to be listed as a contributor
	b. "+1" where you agree
	c. All write everywhere! Individual comments can be anonymous.
4. General considerations
5. Feedback on each step
6. Feedback on the day

## Participants

(Name / Pronouns / Department / GitLab user ID - or "none")
1. Cassandra Gould van Praag / she/her / Psychiatry / [@cassag](https://git.fmrib.ox.ac.uk/cassag)
2. Mats van Es / he/him / Psychiatry / [@psyc1435](https://git.fmrib.ox.ac.uk/psyc1435)
3. Alon Baram / he/him / FMRIB (WIN/NDCN) / none (using GitHub)
4. Benjamin Tendler / He/Him / WIN / [@btendler](https://git.fmrib.ox.ac.uk/btendler)
5. Thijs de Buck / he/him / FMRIB / [@ndcn0873](https://git.fmrib.ox.ac.uk/ndcn0873)
6. Christoph Arthofer / He/Him / WIN / [@cart](https://git.fmrib.ox.ac.uk/cart)
7. Jessica Walsh / she/her / WIN / [@ndcn1073](https://git.fmrib.ox.ac.uk/ndcn1073)
8. Michiel Cottaar / he/him / FMRIB / [@ndcn0236](https://git.fmrib.ox.ac.uk/ndcn0236)
9. Ludovica Griffanti / She/her / WIN (Psych and NCN) / [@ludovica](https://git.fmrib.ox.ac.uk/ludovica)
10. Tom Whyntie / he/him / Dept. Oncology / [@twhyntie](https://git.fmrib.ox.ac.uk/twhyntie)


## Participation Guidelines
- We value the participation of every member of our community and want to ensure that every contributor has an enjoyable and fulfilling experience. Please show respect and courtesy to other community members at all times.
- We are dedicated to a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, politics or technology choices. We do not tolerate harassment by and/or of members of our community in any form.
- We fall under the formal policy and reporting guidelines of the [University Bullying and Harassment Policy](https://edu.admin.ox.ac.uk/harassment-policy) and we expect everyone to be a [responsible bystander](https://edu.web.ox.ac.uk/bystander).

## How do you feel about data sharing? How does your PI feel about data sharing?
- Mats: Great! We should all do it! +1 +1
- Mats: My PI, I don't know. We mostly use other people's data, so there's not a lot of data collection. I think most data are shared with collaborators directly, but not openly.
- Ludo: would be great, but realised not easy. (technical) PI: sure, share it yesterday! Clinical PI: I want to know who/when/what/why people use it and then maybe share it.
	- Alon: +100
	- Example from email: "it is fine to share the images provided the following points are satisfied. 1) Images are defaced 2) All identifiable information (including DOB, sex, height and weight) removed and that you can guarantee that no identifiers are left behind. 3) There is a log of who accesses the images and permission/password to access is provided by us and renewed frequently."
- Benjamin Tendler: Is this a choice that a PI/member should be making? We are actively encouraged by Wellcome to share data: "All Wellcome-funded researchers are expected to manage their research outputs in a way that will achieve the greatest health benefit, maximising the availability of research data, software and materials with as few restrictions as possible".

**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**

The "decision tree" document and appendices should guide you through the necessary stages to prepare your data for sharing.

**This is not a trivial process, but the purpose of this guide is to help you, not scare you!**

Today we are not writing a "user guide", simply determining whether the plans we have developed will work for you, if we have missed anything, or if there is anything we need to do to support you further.

**You are the ones who will be sharing and receiving credit for your data, so we want to make it easy for you!**

## Feedback on each step
While we're working through each step:
- Do you know how to do this thing?
- Does this flow work for your data?
	- Current knowledge gap around secondary data analysis!

### Process 1: Data management, data security and ethics
This process mostly contains actions which are required for all research, irrespective of whether the data are intended to be shared. They are included here to highlight the importance of these stages in managing the additional risks where data are intended to be shared.

Add below any comments about each of the steps in this process.

#### Data management plan
- Who has done a DMP: -1-1 +1 -1 -1-1 -1-1 (Yes = 1, No = 7)
- TW: +1 - DMPs are mandatory for clinical trials
- DMP created for applying for a fellowship. +1(small section)
- Written for protocol for clinical ethics committees
- Most DMPs seem to be included in ethics documents - I've never had to write one because of the Technical Development clearance, but it might be that there's a DMP included in there which I'm not aware of?
- No-one had been on data management training from University. Didn't know it exists.
- It would be useful to provide an example data management plan. For example, a lot of people at WIN will be sharing in vivo imaging data, which is stored using similar resources (e.g. FMRIB cluster). This 'standard' pipeline could be provided as a template, and then edited for more specific cases. +100

#### Researcher data security training
- Who has completed in the last year: +1 -1 +1 (Yes = 2, No = 1, rest unsure)
- Like the links to things (e.g. training profile), to make it easier to find. Suggest adding the exact name of the course to search for

#### DPIA screening
- Not sure whether it is me who has to take care of this, or my PI?
- Only if you acquire new data or if you deal with any data? Primary or secondary data acquired from a person → needs further assessment
- For any study, not only if you share data
- Who has completed one: -1 -1 +1 -1 -1 -1 (Yes = 1, No = 5)
- Can we split this [the whole decision tree] off into "PI level actions" and "researcher level"? Would make it easier to know what you should be concerned about.

#### Ethics approval
- Discuss retrospective ethics approval - *think* if you don't already have consent to share openly, you need consent to recontact your participants and consent them to sharing! → is a simple e-mail ("I consent to ...") sufficient, or is there a certain form to be completed?
    - Comment that you do not need consent to share if it has been completely anonymised. When is it possible to make it 100% anonymous?
    - Secondary analysis output: Aggregation of data, minimum number of subjects?
    - Clarity on what (if any) data can be shared without specific ethical approval. For example, methods development scans which are routine at WIN and can contain *no* clinical information.
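
The question above about aggregation and a minimum number of subjects could be illustrated with small-cell suppression: a group summary is only released if enough participants contribute to it. This is a minimal sketch, not an agreed WIN procedure. The threshold of 5 and the function and field names are assumptions made for illustration only.

```python
# Hypothetical threshold: the real minimum would need to be agreed with ethics/governance.
MIN_GROUP_SIZE = 5


def summarise_by_group(records, group_key, value_key, min_group_size=MIN_GROUP_SIZE):
    """Return the mean of value_key per group, suppressing groups that are too small."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record[value_key])

    summary = {}
    for group, values in groups.items():
        if len(values) < min_group_size:
            summary[group] = None  # suppressed: too few subjects to release
        else:
            summary[group] = sum(values) / len(values)
    return summary
```

A suppressed group is returned as `None` rather than dropped, so the recipient can see that a group existed but was withheld.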

#### Participant consent
- Relying on the lab to have a sufficiently auditable consent process to remove (not share) participants who do not give consent for sharing.
- Maybe consent forms (and patient information sheets) need to be explicit about which parts of the data (e.g. reaction time vs. pupillometry) are anonymous by default.
- Example wording for consent forms and participant information sheets, and DMP. +1+1+1
    - Caution that it can stop people thinking if they are just copy-pasting.
- How detailed do you need to be with participants about what data will be shared? Maybe we need to do some patient and public involvement (PPI) to create a standard.
- Probably lots of stuff we can share from biobank e.g. [this information sheet](https://www.ukbiobank.ac.uk/explore-your-participation/contribute-further/serology-study/information-sheet).
- If you move to "on request", add that you should then draw up a contract with the intended recipient of the data. That contract might be only a slight deviation from the standard DUA we are developing here for XNAT.
- Can a material transfer agreement (MTA) be used for sharing MRI data? Can also cover digital data. There are some stock templates about what you can and can't do.
- Are we putting limits on what people can and can't do with the shared data, e.g. non-commercial use, "research only", specific acknowledgement statements from funders.... Come back to Ben to find out more! → At the Donders Repo, you have to choose a license when sharing your data (options are various existing licenses, or a general Donders license).
- Collaboration agreements are being designed with the Trust, which have some nice statements which might be relevant. → At the Donders Repo, when you want to access data, you have to agree to a user agreement which also specifies you won't try to de-anonymise the data - we will cover this in the final section of the processes :)

### Process 2: Protected features of the data
This process identifies suitable repositories for different data types based on whether they contain information which is protected under UK GDPR.

Add below any comments about each of the steps in this process.

#### Protected data
- Ex-vivo data from the Oxford Brain Bank (for example) - you will need to check with them regarding what you can do. For example, very rare diseases may make the tissue sample very identifiable. Consent for the Oxford Brain Bank has specific levels of sharing.
- Also some restrictions with non-human data - Benjamin Tendler: This can relate to data security (for example, not attempting to contact the source of brains)
- Ben and Rogier should be brought in on these discussions.

#### Primary data
- no comments

#### Supplementary data
- no comments

#### Bin data
- When you have many more variables than subjects, they can all be unique combinations. If you have to bin these data, they may not fit with what was necessary for the analysis. Compromise between the granularity you need for the analysis vs. the level appropriate for de-identified sharing.
- This is only a recommended step: show you have done your due diligence.
- Think about what a data breach would look like.
- It's the combination of these values (disease status, age, gender, handedness) which can make someone identifiable.
- Does binning apply to brains too?
- Give explicit list of things which are ok to be shared without MTAs etc.
- Give examples of linkage attack data.
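
To make the binning discussion above concrete, the sketch below shows how a combination of values can single out individuals, and how binning one variable reduces that. It counts how many participants share the rarest combination of age, gender and handedness, before and after ages are grouped into 10-year bands. All participant values are invented for illustration, and the band width and helper names (`bin_age`, `smallest_group`) are assumptions rather than a recommended standard.

```python
from collections import Counter


def bin_age(age, width=10):
    """Map an exact age to a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def smallest_group(rows, keys):
    """Size of the rarest combination of the given variables (1 means someone is unique)."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return min(counts.values())


# Invented example data, not real participants.
participants = [
    {"age": 31, "gender": "F", "handedness": "R"},
    {"age": 34, "gender": "F", "handedness": "R"},
    {"age": 38, "gender": "F", "handedness": "R"},
    {"age": 52, "gender": "M", "handedness": "L"},
    {"age": 57, "gender": "M", "handedness": "L"},
]
keys = ["age", "gender", "handedness"]

binned = [{**person, "age": bin_age(person["age"])} for person in participants]

print(smallest_group(participants, keys))  # 1: every exact age is unique
print(smallest_group(binned, keys))        # 2: no combination is unique after binning
```

A check like this could be one way to show due diligence: if the smallest group is 1, someone who already knows those values about a person could link them to a row in the shared data.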


### Process 3: De-identification
Personal data cannot be made fully anonymous, only "de-identified" to the best of our practical ability. This process describes appropriate de-identification steps to take.

Add below any comments about each of the steps in this process.

#### Participant identifiers
- Where do you currently store your participant identifiers? Where would you store them if you were going to share data?
- For clinical studies, this is standard practice. In an Oxford-sponsored study => Oxford holds the key. Trust-sponsored => the Trust holds the key. The contract then says "you will never share that key".
- Calpendo may store the key. Check that Calpendo no longer holds names. It is possible to see the name and the WIN ID (and time of scan) next to each other.
- We recommend not sharing the scan ID, to break the link with the data that WIN holds. You could share your own generated IDs, or you could randomise.
- Oncology policy is to randomise again before sharing. Cass to look into this policy.
- Biobank rescrambles the IDs with each new share (by contract), but you can request bridging files across applications.
- Scrambling can make it hard when different recipients of the data then want to recombine it.
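
The idea of re-randomising IDs for each share, while still being able to issue a bridging file on request, could look something like the sketch below. This is an illustration under stated assumptions, not a description of how Biobank, Oncology or WIN actually do it: the function names, the `sub-` prefix and the ID format are all hypothetical, and the per-share mappings are the "key" that would need to be stored securely and never shared.

```python
import secrets


def new_share_mapping(internal_ids, prefix="sub"):
    """Create a fresh random ID for every internal ID, for one data share."""
    mapping = {}
    used = set()
    for internal_id in internal_ids:
        shared_id = f"{prefix}-{secrets.token_hex(4)}"
        while shared_id in used:  # regenerate in the unlikely event of a collision
            shared_id = f"{prefix}-{secrets.token_hex(4)}"
        used.add(shared_id)
        mapping[internal_id] = shared_id
    return mapping


def bridging_table(mapping_a, mapping_b):
    """Link the shared IDs of two releases without exposing the internal IDs."""
    return {mapping_a[i]: mapping_b[i] for i in mapping_a if i in mapping_b}
```

Each call to `new_share_mapping` gives a different set of IDs, so two recipients cannot match participants between releases unless the data holder issues the output of `bridging_table` for their two mappings.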

#### Bin data
- no comments

#### NWB:N
- Not familiar with this...

#### Unique dicom fields
- Do you routinely or infrequently rely on any of these fields? Which would be problematic to remove? Any that should be kept for everyone?
- How would you like scrubbing to be applied? A default set of fields removed with optional extras (opt in), or all fields by default with option to keep some (opt out). At what stage of the process? Before upload to XNAT or after?
- How much date information would you like to keep (for example to handle a "product recall" scenario if a scanner fault was detected)?
- Patient size and weight are in there for the safety of the MR system. Did you know they were in there? Do we need to educate about what is sensitive in headers?
- Risk that not all scanner manufacturers have the same tags. An anonymisation tool may miss some tags.
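
The two scrubbing options raised above (a default removal set with optional extras, versus removing everything except fields you ask to keep) and the question of how much date information to retain can be sketched as follows. To keep the example self-contained the header is a plain Python dictionary rather than a real DICOM file, the default removal list is an assumption made for illustration and not a vetted de-identification profile, and the function names are hypothetical.

```python
# Illustrative default set only: a real list would need to be agreed and tested per scanner vendor.
DEFAULT_REMOVE = {
    "PatientName",
    "PatientBirthDate",
    "PatientSize",
    "PatientWeight",
    "PatientAddress",
}


def scrub_opt_in(header, extra_remove=()):
    """Remove the default fields, plus any extras the researcher opts in to removing."""
    remove = DEFAULT_REMOVE | set(extra_remove)
    return {key: value for key, value in header.items() if key not in remove}


def scrub_opt_out(header, keep=()):
    """Remove every field except those the researcher explicitly asks to keep."""
    keep = set(keep)
    return {key: value for key, value in header.items() if key in keep}


def coarsen_date(dicom_date):
    """Reduce a DICOM date string (YYYYMMDD) to year and month by setting the day to 01."""
    return dicom_date[:6] + "01"


# Invented header values for demonstration.
header = {
    "PatientName": "DOE^JANE",
    "PatientWeight": 70.0,
    "StudyDate": "20210730",
    "RepetitionTime": 2.0,
}

print(scrub_opt_in(header, extra_remove=["StudyDate"]))  # {'RepetitionTime': 2.0}
print(scrub_opt_out(header, keep=["RepetitionTime"]))    # {'RepetitionTime': 2.0}
print(coarsen_date(header["StudyDate"]))                 # 20210701
```

The opt-out version fails safe when a manufacturer adds a tag nobody anticipated, because unknown fields are removed by default. The opt-in version is less likely to strip something an analysis depends on.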

#### BIDS
- Converting to BIDS can be run on XNAT. Would your processed data be in BIDS already?

#### "Raw" vs. "processed"
- 1) K-space - what file type? Can also contain tags which appear in the dicom, or if you publish only the values which are measured, they don't contain protected info.
- 2) reconstructed unprocessed (dicom)
- 3) nii? Not "processed" as it contains the same data as the dicom
- It is more about the information which is included rather than the level of "processing" - this is about the branches which we use to define/name the paths. General agreement that naming based on the file format would work.
    - Look explicitly at the type of data that people want to share with methods dev work. Cannot deface k-space.
    - What do we need to add to WIN technical development ethics to make it fit e.g. with sharing, DPIA?

#### Structural data
- no comments

#### Defacing
- The face is usually critical for coregistration. What would be the advice; sharing the defaced structural + the coregistration info (that was derived from the complete structural)?

#### Unique .json fields
- no comments

#### Quality control
- MRIQC can be run on XNAT. Would you run MRIQC before your own analysis anyway?

#### Face structure
- [Reconstruction of defaced data](https://arxiv.org/pdf/1810.06455.pdf)




----

## Feedback on the meeting
- Please take a few minutes to tell us how this day went for you! Your feedback is invaluable to making this community and these events work.
- You are also welcome to email feedback to
    [cassandra.gouldvanpraag@psych.ox.ac.uk](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)


#### What worked?
-   

#### What didn't work?
-   

#### What would you change?
-   

#### What surprised you?
-

-------

## Post-meeting summary
-