Commit fbfb3bca authored by Cassandra Gould van Praag

fix formatting

parent 8faf0e46
@@ -30,6 +30,7 @@ This consultation will start by inviting a small number of researchers (with a r
Notes from these consultation meetings are available below:
- [Meeting 1: Process 1-3](./docs/CallNotes-SoftLanuch-process1-3.md)
- [Meeting 2: Process 4-5](./docs/CallNotes-SoftLanuch-process4-5.md)
- [Meeting 3: General Questions](./docs/CallNotes-SoftLanuch-outstandingQuestions.md)
## What do we need
# Data sharing decision tree
## Open WIN Community Feedback
-----
**Important information**
**Where**: Teams
**When**: TBC
**Contact**: Email Cass:
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
or message on Open WIN Slack \#data-sharing-decision-tree
**Material we will be reviewing**:
- Decision tree: https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md
- Appendices: https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md
-----
## Agenda
1. Introductions
2. Participation guidelines
3. Using this document
a. Add your name if you'd like to be listed as a contributor
b. "+1" where you agree
c. All write everywhere! Individual comments can be anonymous.
4. General considerations
5. Feedback on each step
6. New repository issues (big jobs!)
7. Feedback on the day
## Participants
(Name / Pronouns / Department / GitLab user ID - or "none")
1. Cassandra Gould van Praag / she/her / Psychiatry / [@cassag](https://git.fmrib.ox.ac.uk/cassag)
2.
## [Participation Guidelines](https://open.win.ox.ac.uk/pages/open-science/community/Open-WIN-Community/docs/community/CODE_OF_CONDUCT/)
- We value the participation of every member of our community and want to ensure that every contributor has an enjoyable and fulfilling experience. Please show respect and courtesy to other community members at all times.
- We are dedicated to a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, politics or technology choices. We do not tolerate harassment by and/or of members of our community in any form.
- We fall under the formal policy and reporting guidelines of the [*University Bullying and Harassment Policy*](https://edu.admin.ox.ac.uk/harassment-policy) and we expect everyone to be a [*responsible bystander*](https://edu.web.ox.ac.uk/bystander)
## General questions to consider
- How do you feel about data sharing? How does your PI feel about data sharing?
-
**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**
The "decision tree" document and appendices should guide you through the necessary stages to prepare your data for sharing.
**This is not a trivial process, but the purpose of this guide is to help you, not scare you!**
Today we are not writing a "user guide", simply determining whether the plans we have developed will work for you, if we have missed anything, or if there is anything we need to do to support you further.
**You are the ones who will be sharing and receiving credit for your data, so we want to make it easy for you!**
## General questions for the materials
- How (in what format(s)) would you like to engage with this material?
- Do you expect to go through the whole guide in one sitting?
- Would you like to take notes against it "online"?
- Should it be hierarchical (see only the topmost level, then go deeper) or would you like to see it all at once?
- Would you like a glossary or FAQ (list below any terms which might be unfamiliar to the average researcher)?
- What would make you want to engage or run away from this material?!
- Is it clear that the burden of data governance exists irrespective of sharing plans? You do not have to formulate additional governance plans if you choose to share your data, just include sharing-specific descriptions. Would you like "governance anyway" steps to be identified separately?
- Do you use any other data sharing sites (especially for sources other than human MRI)?
- How would you work with XNAT alongside your data collection and processing (jalapeno) workflows?
-------
## Post-meeting summary
-
-------
## Feedback on the meeting
- Please take a few minutes to tell us how this day went for you! Your feedback is invaluable to making this community and these events work.
- You are also welcome to email feedback to
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
*30th July 2021*
- What worked?
-
- What didn't work?
-
- What would you change?
-
- What surprised you?
-
@@ -30,7 +30,9 @@ or message on Open WIN Slack \#data-sharing-decision-tree
3. Using this document
a. Add your name if you'd like to be listed as a contributor
b. "+1" where you agree
c. All write everywhere! Individual comments can be anonymous.
4. General considerations
@@ -84,15 +86,6 @@ Today we are not writing a "user guide", simply determining whether the plans we
**You are the ones who will be sharing and receiving credit for your
data, so we want to make it easy for you!**
## General questions for the materials
- How (in what format(s)) would you like to engage with this material?
- Do you expect to go through the whole guide in one sitting?
- Would you like to take notes against it "online"?
- Should it be hierarchical (see only the topmost level, then go deeper) or would you like to see it all at once?
- Would you like a glossary or FAQ (list below any terms which might be unfamiliar to the average researcher)?
- What would make you want to engage or run away from this material?!
## Feedback on each step
- While we're working through each step:
@@ -136,30 +129,19 @@ Add below any comments about each of the steps in this process.
- Clarity on what (if any) data can be shared without specific ethical approval. For example, methods development scans which are routine at WIN and can contain \*no\* clinical information.
- Participant consent
- Relying on the lab to have a sufficiently auditable consent process to remove (not share) participants who do not give consent for sharing.
- Maybe consent forms (and patient information sheets) need to be explicit about which parts of the data (e.g. reaction time vs. pupillometry) are anonymous by default.
- Example wording for consent forms and participant information sheets, and DMP. +1+1+1
- Caution that it can stop people thinking if they are just copy-pasting.
- How detailed do you need to be with participants about what data will be shared? Maybe we need to do some patient and public involvement (PPI) to create a standard.
- There is probably a lot we can reuse from UK Biobank, e.g. [this information sheet](https://www.ukbiobank.ac.uk/explore-your-participation/contribute-further/serology-study/information-sheet).
- If you move to "on request", add that you should then draw up a contract with the intended recipient of the data. That contract might be only a slight deviation from the standard DUA we are developing here for XNAT.
- Can a material transfer agreement (MTA) be used for sharing MRI data? Can also cover digital data. There are some stock templates about what you can and can't do.
- Are we putting limits on what people can and can't do with the shared data, e.g. non-commercial use, "research only", specific acknowledgement statements from funders... Come back to Ben to find out more! → At the Donders Repo, you have to choose a license when sharing your data (options are various existing licenses, or a general Donders license).
- Collaboration agreements are being designed with the Trust, and these have some nice statements which might be relevant. → At the Donders Repo, when you want to access data, you have to agree to a user agreement which also specifies you won't try to de-anonymise the data - we will cover this in the final section of the processes :)
Other questions:
- Is it clear that the burden of data governance exists irrespective of sharing plans? You do not have to formulate additional governance plans if you choose to share your data, just include sharing-specific descriptions. Would you like "governance anyway" steps to be identified separately?
### Process 2: Protected features of the data
@@ -168,34 +150,20 @@ This process identifies suitable repositories for different data types based on
Add below any comments about each of the steps in this process.
- Protected data
- Ex-vivo data from the Oxford Brain Bank (for example) - you will need to check with them regarding what you can do. For example, very rare diseases may make the tissue sample very identifiable. Consent for the Oxford Brain Bank has specific levels of sharing.
- Also some restrictions with non-human data - Benjamin Tendler: This can relate to data security (for example, not attempting to contact the source of brains)
- Ben and Rogier should be brought in on these discussions.
- Primary data
- Supplementary data
- Bin data
- When you have many more variables than subjects, every subject can end up with a unique combination of values. If you have to bin these data, the bins may not fit what was necessary for the analysis. There is a compromise between the granularity you need for the analysis and the level appropriate for de-identified sharing.
- This is only a recommended step: show you have done your due diligence.
- Think about what a data breach would look like.
- It's the combination of these values (disease status, age, gender, handedness) which can make someone identifiable.
- Does binning apply to brains too?
- Give explicit list of things which are ok to be shared without MTAs etc.
- Give examples of linkage attack data.
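To make the "unique combinations" point concrete, here is a minimal Python sketch (an illustration only, not part of the decision tree) that bins age into 10-year ranges and counts the records whose combination of quasi-identifiers still occurs only once. The column names, bin width and example rows are hypothetical.

```python
from collections import Counter

# Hypothetical participant table: each row holds values that, in
# combination, could single someone out (age, sex, handedness, diagnosis).
participants = [
    {"age": 23, "sex": "F", "handedness": "R", "diagnosis": "control"},
    {"age": 24, "sex": "F", "handedness": "R", "diagnosis": "control"},
    {"age": 37, "sex": "M", "handedness": "L", "diagnosis": "patient"},
    {"age": 38, "sex": "M", "handedness": "L", "diagnosis": "patient"},
    {"age": 61, "sex": "F", "handedness": "L", "diagnosis": "patient"},
]

QUASI_IDENTIFIERS = ("age", "sex", "handedness", "diagnosis")


def bin_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def unique_records(rows, keys=QUASI_IDENTIFIERS):
    """Return the rows whose combination of quasi-identifiers occurs only once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if combos[tuple(row[k] for k in keys)] == 1]


binned = [{**row, "age": bin_age(row["age"])} for row in participants]

print(len(unique_records(participants)))  # 5: with exact ages every record is unique
print(len(unique_records(binned)))        # 1: only the 60-69 participant remains unique
```

Any record still flagged after binning is the kind of linkage-attack risk discussed above; coarser bins reduce it, at the cost of granularity the analysis may need.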
Other questions:
- Do you use any other data sharing sites (especially for sources other than human MRI)?
### Process 3: De-identification
@@ -204,78 +172,43 @@ Personal data can not be made fully anonymous, only "de-identified" to the best
Add below any comments about each of the steps in this process.
- Participant identifiers
- Where do you currently store your participant identifiers? Where would you store them if you were going to share data?
- For clinical studies, this is standard practice. In Oxford-sponsored studies, Oxford holds the key; in Trust-sponsored studies, the Trust holds the key. The contract then says "you will never share that key".
- Calpendo may store the key. Check that Calpendo no longer holds names: the name and the WIN ID (and time of scan) can be seen next to each other.
- We recommend not sharing the scan ID, to break the link to the data that WIN holds. You could share your own generated IDs, or you could randomise.
- Oncology policy is to randomise again before sharing. Cass to look into this policy.
- UK Biobank rescrambles IDs with each new share (by contract), but you can request bridging files across applications.
- Scrambling can make it hard for different recipients of the data to recombine it.
- Bin data
- NWB:N
- Not familiar with this\...
- Unique dicom fields
- Do you routinely or infrequently rely on any of these fields? Which would be problematic to remove? Any that should be kept for everyone?
- How would you like scrubbing to be applied? A default set of fields removed with optional extras (opt in), or all fields by default with option to keep some (opt out). At what stage of the process? Before upload to XNAT or after?
- How much date information would you like to keep (for example to handle a "product recall" scenario if a scanner fault was detected)?
- Patient size and weight are in there for the safety of the MR system. Did you know they were in there? Do we need to educate about what is sensitive in headers?
- Risk that not all scanner manufacturers have the same tags. An anonymisation tool may miss some tags.
- BIDS
- Converting to BIDS can be run on XNAT. Would your processed data be in BIDS already?
- "Raw" vs. "processed"
- 1) K-space - what file type? Can also contain tags which appear in the dicom, or if you publish the values which are measured, they don't contain protected info.
- 2) reconstructed unprocessed (dicom)
- 3) nii? Not "processed" as it contains the same data as the dicom
- It is more about the information which is included rather than the level of "processing" - this is about the branches which we use to define/name the paths. General agreement the naming on the file format would work.
- Look explicitly at the type of data that people want to share with methods dev work. Cannot deface k-space.
- What do we need to add to WIN technical development ethics to make it fit e.g. with sharing, DPIA?
- Structural data
- Defacing
- The face is usually critical for coregistration. What would be the advice; sharing the defaced structural + the coregistration info (that was derived from the complete structural)?
- Unique .json fields
- Quality control
- MRIQC can be run on XNAT. Would you run MRIQC before your own analysis anyway?
- Face structure
- [Reconstruction of defaced data](https://arxiv.org/pdf/1810.06455.pdf)
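The participant identifier discussion above (randomise again before sharing, a new scramble per share, bridging files on request) can be sketched in a few lines of Python. This is a minimal illustration assuming IDs are plain strings; the `sub-` prefix, ID length, function names and example scan IDs are hypothetical, and this is not how WIN, Calpendo, XNAT or UK Biobank actually implement it.

```python
import secrets


def make_share_ids(internal_ids, prefix="sub"):
    """Map each internal ID to a fresh random ID for one share.

    The returned dict is the private key: it stays with the data owner
    and is never sent to the recipient.
    """
    mapping = {}
    used = set()
    for internal_id in internal_ids:
        new_id = f"{prefix}-{secrets.token_hex(4)}"
        while new_id in used:  # regenerate on the (unlikely) collision
            new_id = f"{prefix}-{secrets.token_hex(4)}"
        used.add(new_id)
        mapping[internal_id] = new_id
    return mapping


def bridging_table(key_a, key_b):
    """Link share-A IDs to share-B IDs for participants present in both shares."""
    return {key_a[i]: key_b[i] for i in key_a.keys() & key_b.keys()}


# Hypothetical internal scan IDs; these never leave WIN.
scan_ids = ["W0101", "W0102", "W0103"]

share_a = make_share_ids(scan_ids)      # IDs sent to recipient A
share_b = make_share_ids(scan_ids[:2])  # new, unrelated IDs for recipient B

# Only released on request, e.g. if A and B are approved to combine data.
bridge = bridging_table(share_a, share_b)
```

Because each recipient sees different IDs, two recipients cannot join their copies on ID alone, which is the recombination difficulty noted above; the bridging table is what would let them do so when that is approved.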
Other questions:
- How would you work with XNAT alongside your data collection and processing (jalapeno) workflows?
-------
@@ -292,20 +225,15 @@ Other questions:
- You are also welcome to email feedback to
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
*30th July 2021*
- What worked?
-
- What didn't work?
-
- What would you change?
-
- What surprised you?
-
@@ -16,10 +16,8 @@ or message on Open WIN Slack \#data-sharing-decision-tree
**Material we will be reviewing**:
- Decision tree: https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md
- Appendices: https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md
-----
@@ -30,7 +28,9 @@ or message on Open WIN Slack \#data-sharing-decision-tree
3. Using this document
a. Add your name if you'd like to be listed as a contributor
b. "+1" where you agree
c. All write everywhere! Individual comments can be anonymous.
4. General considerations
@@ -54,9 +54,7 @@ or message on Open WIN Slack \#data-sharing-decision-tree
## General questions to consider
- How do you feel about data sharing? How does your PI feel about data sharing?
-
**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**
@@ -69,21 +67,11 @@ Today we are not writing a "user guide", simply determining whether the plans we
**You are the ones who will be sharing and receiving credit for your data, so we want to make it easy for you!**
## General questions for the materials
- How (in what format(s)) would you like to engage with this material?
- Do you expect to go through the whole guide in one sitting?
- Would you like to take notes against it "online"?
- Should it be hierarchical (see only the topmost level, then go deeper) or would you like to see it all at once?
- Would you like a glossary or FAQ (list below any terms which might be unfamiliar to the average researcher)?
- What would make you want to engage or run away from this material?!
## Feedback on each step
- While we're working through each step:
- Do you know how to do this thing?
- Does this flow work for your data?
- Current knowledge gap around secondary data analysis!
### Process 4: Metadata
@@ -106,7 +94,6 @@ This process describes how to upload your data to XNAT and make it citable. You
Add below any comments about each of the steps in this process.
- Upload to XNAT
- Who would you like to give internal access to your project? You, your PI and XNAT admin? Your jalapeno user group?
- Data freeze
@@ -114,12 +101,10 @@ Add below any comments about each of the steps in this process.
- Data usage agreement
- Which statement do you prefer about authorship on papers which re-use your data:
1. "The data generators shall not be included as an author of publications or presentations without consent."
2. "Neither the Donders Institute or Radboud University, nor the researchers that provide this data should be included as an author of publications or presentations if this authorship would be based solely on the use of this data."
- Approve public sharing on XNAT
- How would you like to stage the approval process? Would you like the PI to approve? Email warnings / invitations? Would you like to see a review summary of the de-identification processes completed (e.g. a checklist read from the image metadata)?
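One way a "checklist read from the image metadata" could work, sketched in Python under the assumption that the header has already been parsed into a dict keyed by DICOM attribute keywords (for example by a DICOM reader, or from a BIDS .json sidecar). The field list is illustrative only, not an agreed de-identification profile, and this is not an existing XNAT feature.

```python
# Illustrative only: DICOM attribute keywords that might be required to be
# absent or blank before public sharing. Not an agreed WIN/XNAT profile.
FIELDS_TO_REMOVE = (
    "PatientName",
    "PatientBirthDate",
    "PatientSize",
    "PatientWeight",
)


def deidentification_checklist(header):
    """Return {field: passed}; a field passes if it is absent or blank."""
    return {field: header.get(field) in (None, "") for field in FIELDS_TO_REMOVE}


def summarise(checklist):
    """One-line summary, e.g. for an approval email to the PI."""
    failed = [field for field, passed in checklist.items() if not passed]
    if not failed:
        return "All checks passed"
    return "Still present: " + ", ".join(failed)


# Hypothetical header, already parsed into {keyword: value}.
header = {"PatientName": "", "PatientWeight": 70.0, "MagneticFieldStrength": 3.0}
print(summarise(deidentification_checklist(header)))  # Still present: PatientWeight
```

Checking for `None` or an empty string, rather than any falsy value, means a stored value of 0 is still reported as present.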
Other questions:
@@ -142,20 +127,15 @@ Other questions:
- You are also welcome to email feedback to
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
*30th July 2021*
- What worked?
-
- What didn't work?
-
- What would you change?
-
- What surprised you?
-