ONTOX Training Reproducible Research
conquer the chaos
Workshop overview
Why you want to complete this workshop
Research projects are chaotic, and we all know how hard it can be to keep track of manuscript versions, changes to datasets, which figure was generated from which file, etc. However, when building a data platform such as the one within ONTOX, it is very important that every data point can be traced back to its origin.
At the same time, many people store data in a way that works for them but is almost impossible for anyone else to follow once they change jobs. A few systems that are easy to implement can conquer this chaos: they make your work more productive, collaborations easier, and ONTOX possible, and they save you a lot of time in the long run.
Learning objectives
After this workshop you:
- Can store your research projects (data, code, manuscripts, etc.) in a future-proof way that facilitates collaboration and saves you a lot of time if you ever want to revisit, share or alter your projects.
- Know which information needs to be stored with data, and how to keep these together when collaborating with others.
- Can keep your projects under version control, so you can always trace back what changed, when, and by whom.
- Have applied all this to your current most active research project.
- Can start applying this in your everyday work tomorrow (or well, next Monday).
Contents
This workshop is divided into 2 parts:
Practical Data Handling
- Folder structure
- Metadata
Data sharing
- Sending others your data or code
- Receiving data or code
- Version control
Schedule
Total of 4 hours (14:30 - 18:30)
The durations shown for each item are approximate, because an exercise can take anywhere from 5 minutes to days, depending on how deeply you dive into it.
14:30-15:00
- 10-20 min welcome, everyone introduces themselves
- 10 min broad introduction to research data management and FAIR data in ONTOX
15:00-15:50 hands-on storing your current data
- MAIN 5 min intro folder structure
- BREAKOUT (3 people, 15 min) DIY analysing and cleaning up your own current data storage practices (exercise A1)
- MAIN 5 min intro metadata
- BREAKOUT (3 people, 15-20 min) DIY and discussion metadata (exercises A2 and A3)
15:50-16:10 sticky notes / discussion
- MURAL on incentives and barriers for using reproducible methods
16:10-16:30 break
16:30-18:10 hands-on sharing your current data
- MAIN 10 min intro: The usual problematic workflow
- MAIN 5 min split group
- collaboration and digital data
- BREAKOUT (5 people, 30 min)
- Option 1: 5-10 min DIY sharing Excel files (exercise B1) and 15-20 min DIY data in Excel (exercise B2)
- Option 2: 5-10 min DIY sharing Excel files (exercise B1) and 15-20 min DIY GitHub (exercises C1, C2, C3)
- machine-readable data and workflows
- MAIN 10 min demo AI: why data needs to be machine-readable, tidy data / Mermaid, version control
- BREAKOUT (5 people, 45 min)
- Option 1: 15-20 min DIY tidy data (exercise B3) and 20-30 min version control (exercises B4 and B5)
- Option 2: finish C1 to C3, and 45 min visualising workflows (exercises C4 and C5)
18:10-18:30
- MAIN/MURAL immediate plans for tomorrow: prioritization of training needs
- closure