Commit 8faf0e46 authored by Cassandra Gould van Praag

add call notes

parent 9f4261c6
......@@ -27,6 +27,10 @@ We are now opening this work up to the community to provide feedback on bottlene
This consultation will start by inviting a small number of researchers (with a range of data types) to [add and comment on issues with the proposed guidance](https://git.fmrib.ox.ac.uk/open-science/data-sharing-decision-tree/-/issues) and update the diagram where appropriate.
Notes from these consultation meetings are available below:
- [Meeting 1: Process 1-3](./docs/CallNotes-SoftLanuch-process1-3.md)
- [Meeting 2: Process 4-5](./docs/CallNotes-SoftLanuch-process4-5.md)
## What do we need
We need researchers to walk through these documents with their existing or hypothetical research projects. Help us to identify gaps, issues and solutions!
......
# Data sharing decision tree
## Open WIN Community Feedback
-----
**Important information**
**Where**: Teams
**When**: Friday 30th July 2021, 11:30-13:00 BST (UTC+1)
**Contact**: Email Cass:
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
or message on Open WIN Slack \#data-sharing-decision-tree
**Material we will be reviewing**:
- Decision tree:
https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md
- Appendices:
https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md
-----
## Agenda
1. Introductions
2. Participation guidelines
3. Using this document
a. Add your name if you'd like to be listed as a contributor
b. "+1" where you agree
c. All write everywhere! Individual comments can be anonymous.
4. General considerations
5. Feedback on each step
6. New repository issues (big jobs!)
7. Feedback on the day
## Participants
(Name / Pronouns / Department / GitLab user ID - or "none")
1. Cassandra Gould van Praag / she/her / Psychiatry / [@cassag](https://git.fmrib.ox.ac.uk/cassag)
2. Mats van Es / he/him / Psychiatry / [@psyc1435](https://git.fmrib.ox.ac.uk/psyc1435)
3. Alon Baram / he/him / FMRIB (WIN/NDCN) / none (using GitHub)
4. Benjamin Tendler / He/Him / WIN / [@btendler](https://git.fmrib.ox.ac.uk/btendler)
5. Thijs de Buck / he/him / FMRIB / [@ndcn0873](https://git.fmrib.ox.ac.uk/ndcn0873)
6. Christoph Arthofer / He/Him / WIN / [@cart](https://git.fmrib.ox.ac.uk/cart)
7. Jessica Walsh / she/her / WIN / [@ndcn1073](https://git.fmrib.ox.ac.uk/ndcn1073)
8. Michiel Cottaar / he/him / FMRIB / [@ndcn0236](https://git.fmrib.ox.ac.uk/ndcn0236)
9. Ludovica Griffanti / She/her / WIN (Psych and NCN) / [@ludovica](https://git.fmrib.ox.ac.uk/ludovica)
10. Tom Whyntie / he/him / Dept. Oncology / [@twhyntie](https://git.fmrib.ox.ac.uk/twhyntie)
## [Participation Guidelines](https://open.win.ox.ac.uk/pages/open-science/community/Open-WIN-Community/docs/community/CODE_OF_CONDUCT/)
- We value the participation of every member of our community and want to ensure that every contributor has an enjoyable and fulfilling experience. Please show respect and courtesy to other community members at all times.
- We are dedicated to a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, politics or technology choices. We do not tolerate harassment by and/or of members of our community in any form.
- We fall under the formal policy and reporting guidelines of the [*University Bullying and Harassment Policy*](https://edu.admin.ox.ac.uk/harassment-policy) and we expect everyone to be a [*responsible bystander*](https://edu.web.ox.ac.uk/bystander)
## General questions to consider
- How do you feel about data sharing? How does your PI feel about data
sharing?
- Mats: Great! We should all do it! +1 +1
My PI, I don't know. We mostly use other people's data, so there's not a lot of data collection. I think most data are shared with
collaborators directly, but not openly.
- Ludo: would be great, but realised not easy. (technical) PI: sure, share it yesterday! Clinical PI: I want to know who/when/what/why people use it and then maybe share it.
- Alon: +100
- Example from email: "it is fine to share the images provided the following points are satisfied. 1) Images are defaced 2) All identifiable information (including DOB, sex, height and weight) removed and that you can guarantee that no identifiers are left behind. 3) There is a log of who accesses the images and permission/password to access is provided by us and renewed frequently."
- Benjamin Tendler: Is this a choice that a PI/member should be making? We are actively encouraged by Wellcome to share data: "All Wellcome-funded researchers are expected to manage their research outputs in a way that will achieve the greatest health benefit, maximising the availability of research data, software and materials with as few restrictions as possible".
**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**
The "decision tree" document and appendices should guide you through the necessary stages to prepare your data for sharing.
**This is not a trivial process, but the purpose of this guide is to help you, not scare you!**
Today we are not writing a "user guide", simply determining whether the plans we have developed will work for you, if we have missed anything, or if there is anything we need to do to support you further.
**You are the ones who will be sharing and receiving credit for your
data, so we want to make it easy for you!**
## General questions for the materials
- How (in what format(s)) would you like to engage with this material?
- Do you expect to go through the whole guide in one sitting?
- Would you like to take notes against it "online"?
- Should it be hierarchical (see only the top most level then deeper) or would you like to see all at once?
- Would you like a glossary or FAQ (list below any terms which might be unfamiliar to the average researcher)?
- What would make you want to engage or run away from this material?!
## Feedback on each step
- While we're working through each step:
- Do you know how to do this thing?
- Does this flow work for your data?
- Current knowledge gap around secondary data analysis!
### Process 1: Data management, data security and ethics
This process mostly contained actions which are required for all research, irrespective of whether the data are intended to be shared. They are included here to highlight the importance of these stages in managing the additional risks where data are intended to be shared.
Add below any comments about each of the steps in this process.
- Data management plan
- Who has done a DMP: -1-1 +1 -1 -1-1 -1-1 (Yes = 1, No = 7)
- TW: +1 - DMPs are mandatory for clinical trials
- DMP created for applying for a fellowship. +1(small section)
- Written for protocol for clinical ethics committees
- Most DMPs seem to be included in ethics documents - I've never had to write one because of the Technical Development clearance, but it might be that there's a DMP included in there which I'm not aware of?
- No-one had been on data management training from University. Didn't know it exists.
- It would be useful to provide an example data management plan. For example, many researchers at WIN will be sharing in vivo imaging data, which is stored using similar resources (e.g. FMRIB cluster). This 'standard' pipeline could be provided as a template, and then edited for more specific cases. +100
- Researcher data security training
- Who has completed in the last year: +1 -1 +1 (Yes = 2, No = 1, rest unsure)
- Like the links to things (e.g. training profile), to make it easier to find. Suggest adding the exact name of the course to search for
- DPIA screening
- Not sure whether it is me who has to take care of this, or my PI?
- Only if you acquire new data or if you deal with any data? Primary or secondary data acquired from a person → needs further assessment
- For any study, not only if you share data
- Who has completed one: -1 -1 +1 -1 -1 -1 (Yes = 1, No = 5)
- Can we split this [the whole decision tree] off into "PI level actions" and "researcher level"? Would make it easier to know what you should be concerned about.
- Ethics approval
- Discuss retrospective ethics approval - \*think\* if you don't already have consent to share openly, you need consent to recontact your participants and consent them to sharing! → is a simple e-mail ("I consent to ...") sufficient, or is there a certain form to be completed?
- Comment that you do not need consent to share if it has been completely anonymised. When is it possible to make it 100% anonymous?
- Secondary analysis output: Aggregation of data, minimum number of subjects?
- Clarity on what (if any) data can be shared without specific ethical approval. For example, methods development scans which are routine at WIN and can contain \*no\* clinical information.
- Participant consent
- Relying on the lab to have a sufficiently auditable consent process to remove (not share) participants who do not give consent for sharing.
- Maybe consent forms (and patient information sheets) need to be explicit about which parts of the data (e.g. reaction time vs. pupillometry) are anonymous by default.
- Example wording for consent forms and participant information sheets, and DMP. +1+1+1
- Caution that it can stop people thinking if they are just
copy-pasting.
- How detailed do you need to be with participants about what data will be shared? Maybe we need to do some patient and public involvement (PPI) to create a standard.
- Probably lots of stuff we can share from biobank e.g. [this information sheet](https://www.ukbiobank.ac.uk/explore-your-participation/contribute-further/serology-study/information-sheet).
- If you move to "on request", add that you should then draw up a contract with the intended recipient of the data. That contract might be only a slight deviation from the standard DUA we are developing here for XNAT.
- Can a material transfer agreement (MTA) be used for sharing MRI data? Can also cover digital data. There are some stock templates about what you can and can't do.
- Are we putting limits on what people can and can't do with the shared data, e.g. non-commercial use, "research only", specific acknowledgement statements from funders...? Come back to Ben to find out more! → At the Donders Repo, you have to choose a license when sharing your data (options are various existing licenses, or a general Donders license).
- Collaboration agreements are being designed with the Trust, which has some nice statements which might be relevant. → At the Donders Repo, when you want to access data, you have to agree with a user agreement which also specifies you won't try to de-anonymise the data - we will cover this in the final section of the processes :)
Other questions:
- Is it clear that the burden of data governance exists irrespective of sharing plans? You do not have to additionally formulate governance plans if you choose to share your data, just include sharing specific descriptions. Would you like "governance anyway" steps to be identified separately?
### Process 2: Protected features of the data
This process identifies suitable repositories for different data types based on whether they contain information which is protected under UK GDPR.
Add below any comments about each of the steps in this process.
- Protected data
- Ex-vivo data from the Oxford Brain Bank (for example) - you will need to check with them regarding what you can do. For example, very rare diseases may make the tissue sample very identifiable. Consent for the Oxford Brain Bank has specific levels of sharing.
- Also some restrictions with non-human data - Benjamin Tendler: This can relate to data security (for example, not attempting to contact the source of brains)
- Ben and Rogier should be brought in on these discussions.
- Primary data
- Supplementary data
- Bin data
- When you have many more variables than subjects, they can all be unique combinations. If you have to bin these data, they may not fit with what was necessary for the analysis. Compromise between the granularity you need for the analysis vs. the level appropriate for de-identified sharing.
- This is only a recommended step: show you have done your due diligence.
- Think about what a data breach would look like.
- It's the combination of these values (disease status, age, gender, handedness) which can make someone identifiable.
- Does binning apply to brains too?
- Give explicit list of things which are ok to be shared without MTAs etc.
- Give examples of linkage attack data.
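
The point above about combinations of values (disease status, age, gender, handedness) can be illustrated with a small check. This is only a minimal sketch: the records, field names, and the 10-year age band are invented for illustration and are not an agreed WIN standard. It counts how many participants have a unique combination of values before and after binning age.

```python
from collections import Counter

# Hypothetical participant records; all values are invented for illustration.
participants = [
    {"age": 23, "sex": "F", "handedness": "R", "status": "control"},
    {"age": 27, "sex": "F", "handedness": "R", "status": "control"},
    {"age": 24, "sex": "M", "handedness": "L", "status": "patient"},
    {"age": 29, "sex": "M", "handedness": "L", "status": "patient"},
    {"age": 61, "sex": "F", "handedness": "R", "status": "patient"},
]

# Assumed set of fields that could identify someone in combination.
QUASI_IDENTIFIERS = ("age", "sex", "handedness", "status")


def bin_age(age, width=10):
    """Replace an exact age with a band such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def unique_rows(records, keys=QUASI_IDENTIFIERS):
    """Return indices of records whose combination of `keys` occurs only once."""
    combos = [tuple(record[key] for key in keys) for record in records]
    counts = Counter(combos)
    return [i for i, combo in enumerate(combos) if counts[combo] == 1]


# Same records, with exact age replaced by a 10-year band.
binned = [{**record, "age": bin_age(record["age"])} for record in participants]

unique_before = unique_rows(participants)
unique_after = unique_rows(binned)
```

In this toy example every participant is unique on exact age, while only the oldest participant remains unique after binning. A check like this may help to show due diligence, but it does not settle the compromise with the granularity needed for analysis.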
Other questions:
- Do you use any other data sharing sites (especially for sources other than human MRI)?
### Process 3: De-identification
Personal data cannot be made fully anonymous, only "de-identified" to the best of our practical ability. This process describes appropriate de-identification steps to take.
Add below any comments about each of the steps in this process.
- Participant identifiers
- Where do you currently store your participant identifiers? Where would you store them if you were going to share data?
- For clinical studies, this is standard practice. In an Oxford-sponsored study, Oxford holds the key. In a Trust-sponsored study, the Trust holds the key. The contract then says "you will never share that key".
- Calpendo may store the key. Check that Calpendo no longer holds names. Can see the name and the WIN ID (and time of scan) next to each other.
- We recommend not sharing the scan ID, to break the link between the data that WIN hold. You could share your own generated IDs, or you could randomise.
- Oncology policy to randomise again before sharing. Cass look into this policy.
- Biobank rescrambles every time the data is shared (by contract) with each new share. But can request bridging files across applications.
- Scrambling can make it hard when different recipients of the data then want to recombine it.
- Bin data
- NWB:N
- Not familiar with this\...
- Unique dicom fields
- Do you routinely or infrequently rely on any of these fields? Which would be problematic to remove? Any that should be kept for everyone?
- How would you like scrubbing to be applied? A default set of fields removed with optional extras (opt in), or all fields by default with option to keep some (opt out). At what stage of the process? Before upload to XNAT or after?
- How much date information would you like to keep (for example to handle a "product recall" scenario if a scanner fault was detected)?
- Patient size and weight are in there for the safety of the MR system. Did you know they were in there? Do we need to educate about what is sensitive in headers?
- Risk that not all scanner manufacturers have the same tags. An anonymisation tool may miss some tags.
- BIDS
- Converting to BIDS can be run on XNAT. Would your processed data be in BIDS already?
- Raw vs. processed
- 1\) K-space - what file type? Can also contain tags which appear in the dicom, or if you publish the values which are measured, they don't contain protected info.
- 2\) reconstructed unprocessed (dicom)
- 3\) nii? Not "processed" as it contains the same data as the dicom
- More about the information which is included rather than the level of "processing" - this is about the branches which we use to define/name the paths. General agreement that naming based on the file format would work.
- Look explicitly at the type of data that people want to share with methods dev work. Cannot deface k-space.
- What do we need to add to WIN technical development ethics to make it fit e.g. with sharing, DPIA?
- Structural data
- Defacing
- The face is usually critical for coregistration. What would be the advice; sharing the defaced structural + the coregistration info (that was derived from the complete structural)?
- Unique .json fields
- Quality control
- Mriqc can be run on XNAT. Would you run mriqc before your own analysis anyway?
- Face structure
- [Reconstruction of defaced data](https://arxiv.org/pdf/1810.06455.pdf)
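
The opt-in versus opt-out question raised under "Unique dicom fields" above can be sketched as follows. This is a minimal sketch using a plain dictionary in place of a real DICOM header: the default removal set, the example values, and the function names are assumptions for illustration, not an agreed WIN or XNAT policy. It also shows one possible way to keep only coarse date information.

```python
# Hypothetical default set of header fields to remove; not an agreed policy.
DEFAULT_REMOVE = {"PatientName", "PatientBirthDate", "PatientSize", "PatientWeight"}


def scrub_opt_in(header, extra_remove=()):
    """Remove a default set of fields, plus any extras the researcher opts in to."""
    remove = DEFAULT_REMOVE | set(extra_remove)
    return {key: value for key, value in header.items() if key not in remove}


def scrub_opt_out(header, keep=()):
    """Remove every field except those the researcher explicitly opts to keep."""
    keep = set(keep)
    return {key: value for key, value in header.items() if key in keep}


def coarsen_date(yyyymmdd):
    """Keep only the year and month of a DICOM-style date string (YYYYMMDD)."""
    return yyyymmdd[:6] + "01"


# Invented header values for illustration only.
header = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19800101",
    "PatientSize": "1.70",
    "PatientWeight": "65",
    "StudyDate": "20210730",
    "RepetitionTime": "2000",
    "VendorPrivateTag": "xyz",  # stands in for a manufacturer-specific tag
}

opt_in = scrub_opt_in(header, extra_remove=["StudyDate"])
opt_out = scrub_opt_out(header, keep=["RepetitionTime", "StudyDate"])
opt_out["StudyDate"] = coarsen_date(opt_out["StudyDate"])
```

Note that the opt-in version leaves the unknown manufacturer-specific tag in place, while the opt-out version drops it. That difference is relevant to the concern that an anonymisation tool may miss tags which differ between scanner manufacturers.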
Other questions:
- How would you work with XNAT alongside your data collection and processing (jalapeno) workflows?
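
Returning to the "Participant identifiers" discussion above, re-scrambling IDs for each new share and providing a bridging file on request could look something like the sketch below. The ID format, prefix, and function names are hypothetical, and the internal IDs are invented; the mappings themselves would need to be stored as securely as any other key.

```python
import secrets


def scramble_ids(internal_ids, prefix="sub"):
    """Map each internal ID to a fresh random shared ID (one mapping per release)."""
    mapping = {}
    used = set()
    for internal in internal_ids:
        shared = f"{prefix}-{secrets.token_hex(4)}"
        while shared in used:  # regenerate in the unlikely event of a collision
            shared = f"{prefix}-{secrets.token_hex(4)}"
        used.add(shared)
        mapping[internal] = shared
    return mapping


def bridging_table(mapping_a, mapping_b):
    """Link the shared IDs of two releases without exposing the internal IDs."""
    return {mapping_a[i]: mapping_b[i] for i in mapping_a if i in mapping_b}


# Invented internal IDs for illustration only.
internal_ids = ["W001", "W002", "W003"]

release_a = scramble_ids(internal_ids)  # IDs given to the first recipient
release_b = scramble_ids(internal_ids)  # fresh IDs for a second recipient
bridge = bridging_table(release_a, release_b)
```

The bridging table contains only shared IDs, so two recipients could recombine their data without either seeing the internal scan IDs.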
-------
## Post-meeting summary
-
-------
## Feedback on the meeting
- Please take a few minutes to tell us how this day went for you! Your feedback is invaluable to making this community and these events work.
- You are also welcome to email feedback to
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
*30th July 2021*
- What worked?
-
- What didn't work?
-
- What would you change?
-
- What surprised you?
-
# Data sharing decision tree
## Open WIN Community Feedback
-----
**Important information**
**Where**: Teams
**When**: TBC
**Contact**: Email Cass:
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
or message on Open WIN Slack \#data-sharing-decision-tree
**Material we will be reviewing**:
- Decision tree:
https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree.md
- Appendices:
https://git.fmrib.ox.ac.uk/open-science/community/data-sharing-decision-tree/-/blob/master/docs/decision-tree-appendicies.md
-----
## Agenda
1. Introductions
2. Participation guidelines
3. Using this document
a. Add your name if you'd like to be listed as a contributor
b. "+1" where you agree
c. All write everywhere! Individual comments can be anonymous.
4. General considerations
5. Feedback on each step
6. New repository issues (big jobs!)
7. Feedback on the day
## Participants
(Name / Pronouns / Department / GitLab user ID - or "none")
1. Cassandra Gould van Praag / she/her / Psychiatry / [@cassag](https://git.fmrib.ox.ac.uk/cassag)
2.
## [Participation Guidelines](https://open.win.ox.ac.uk/pages/open-science/community/Open-WIN-Community/docs/community/CODE_OF_CONDUCT/)
- We value the participation of every member of our community and want to ensure that every contributor has an enjoyable and fulfilling experience. Please show respect and courtesy to other community members at all times.
- We are dedicated to a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, politics or technology choices. We do not tolerate harassment by and/or of members of our community in any form.
- We fall under the formal policy and reporting guidelines of the [*University Bullying and Harassment Policy*](https://edu.admin.ox.ac.uk/harassment-policy) and we expect everyone to be a [*responsible bystander*](https://edu.web.ox.ac.uk/bystander)
## General questions to consider
- How do you feel about data sharing? How does your PI feel about data
sharing?
-
**Imagine you are a WIN researcher asking "*Can I share my data?*", or "*How can I share my data?*"**
The "decision tree" document and appendices should guide you through the necessary stages to prepare your data for sharing.
**This is not a trivial process, but the purpose of this guide is to help you, not scare you!**
Today we are not writing a "user guide", simply determining whether the plans we have developed will work for you, if we have missed anything, or if there is anything we need to do to support you further.
**You are the ones who will be sharing and receiving credit for your data, so we want to make it easy for you!**
## General questions for the materials
- How (in what format(s)) would you like to engage with this material?
- Do you expect to go through the whole guide in one sitting?
- Would you like to take notes against it "online"?
- Should it be hierarchical (see only the top most level then deeper) or would you like to see all at once?
- Would you like a glossary or FAQ (list below any terms which might be unfamiliar to the average researcher)?
- What would make you want to engage or run away from this material?!
## Feedback on each step
- While we're working through each step:
- Do you know how to do this thing?
- Does this flow work for your data?
- Current knowledge gap around secondary data analysis!
### Process 4: Metadata
Your shared research data should be supported by additional metadata to make it findable, accessible, interoperable, and reusable ([*FAIR*](https://www.go-fair.org/fair-principles/)). This process helps you identify what metadata will be necessary to make your data FAIR, how to collate it, and how to sufficiently describe it.
Add below any comments about each of the steps in this process.
- Behavioural and clinical covariates
- Electrophysiological
- Image metadata
- BIDS
- Acquisition protocol
- WIN Open Acquisition database
- Experimental protocol
### Process 5: Sharing and attribution
This process describes how to upload your data to XNAT and make it citable. You are also asked to consider whether the standard WIN XNAT Data Usage Agreement (DUA) is appropriate for your data.
Add below any comments about each of the steps in this process.
- Upload to XNAT
- Who would you like to give internal access to your project? You, your PI and XNAT admin? Your jalapeno user group?
- Data freeze
- Generate a DOI
- Data usage agreement
- Which statement do you prefer about authorship on papers which re-use your data:
1. "The data generators shall not be included as an author of publications or presentations without consent."
2. "Neither the Donders Institute or Radboud University, nor the researchers that provide this data should be included as an author of publications or presentations if this authorship would be based solely on the use of this data."
- Approve public sharing on XNAT
- How would you like to stage the approval process? Would you like the PI to approve? Email warnings / invitations? Would you like to see a review summary of the de-identification processes completed (e.g. a checklist read from the image metadata)?
Other questions:
- How do you think access for external users should be managed? Should they have to create accounts? How will accounts be verified (e.g. ORCID or institutional)? Should accounts be closed after a fixed period?
- What metrics would you like to know about how your data is accessed?
-------
## Post-meeting summary
-
-------
## Feedback on the meeting
- Please take a few minutes to tell us how this day went for you! Your feedback is invaluable to making this community and these events work.
- You are also welcome to email feedback to
[*cassandra.gouldvanpraag\@psych.ox.ac.uk*](mailto:cassandra.gouldvanpraag@psych.ox.ac.uk)
*30th July 2021*
- What worked?
-
- What didn't work?
-
- What would you change?
-
- What surprised you?
-