JISC Digital Festival 9-10 March 2015 #digifest15
The key problem: Information flow “too slow and too impoverished”
We bury data in the publication. RIP “Rest in Publication”.
Utopia Documents http://getutopia.com “datafies” PDFs by pulling the data out.
Why do we have to break publications into pieces to get at data instead of making data “born reproducible”?
Scientific Publications are “virtual witnessing”.
Publications are not the scholarship. They advertise the scholarship.
A lot of papers have no access to primary data, broken links, no software versioning, no released code, etc.
Not only can you not access the data, you can’t access the method either. You need both to be able to reproduce the work.
Broken software = broken science.
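One low-effort way to avoid the "no software versioning, no primary data" problem is to store a small provenance record next to every result. The following is a minimal sketch in Python, not something shown in the talk: the function names, the record fields and the sample data are all assumptions made for illustration.

```python
import hashlib
import json
import platform
import sys


def sha256_of(data: bytes) -> str:
    """Fingerprint the primary data so later readers can check they have the same bytes."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(data: bytes, script_name: str) -> dict:
    """Collect the minimum needed to say which data and which software produced a result.

    The field names are hypothetical; a real project would agree on its own.
    """
    return {
        "script": script_name,
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "data_sha256": sha256_of(data),
    }


# Invented sample data standing in for a primary data file.
record = provenance_record(b"time,value\n0,1.5\n", "analysis.py")
print(json.dumps(record, indent=2))
```

A record like this does not make a study reproducible by itself, but it does let someone detect that the data or the software environment has changed.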
Hilariously scathing about the effort put into creating software and training scientists in the use of software tools compared to, say, laboratory equipment.
Involves tools, standards, machine actionable, formats, reporting, policies, practices.
Libraries are being crushed between the “republic of science” and the regulation of science.
Discusses at length many of the ways science can go wrong: science is messy, and there is honest error, deliberate fraud, and problems inherent to the type of experiment.
Scientists’ desks are messy. Scientists find it difficult to reproduce their own research in their own labs.
There can be problems with the scientific method: poor training and approach. There are also problems from the social environment with pressure to publish, impact factor mania, broken peer review, time pressures and general disorganisation.
Really fragmented research and publishing environments/ecosystems.
- Data collection
- Data discovery
- Data assembly, cleaning, refinement
- Statistical analysis
- Scholarly Communication and reporting
A myExperiment (http://www.myexperiment.org/home) pack contains all the assets needed to report and reproduce an experiment.
Aggregate outputs. Compound investigations, research products. These are units of exchange. These form a commons and provide contextual metadata on the input to experiments not just the outputs.
Research objects are First Class Citizens. They include data, software, methods and paper. They have IDs, they can be managed, credited, tracked, profiled.
The resources in them may span multiple assets, not just those contained within the repository.
- closed to open
- local to alien
- embed to refer
- fixed to fluid
These multi-typed, stewarded, sited, authored objects span research, researchers, platforms and time. How do we store, cite and steward these?
Also a shift from
- document to package
- publish to release
Research objects are being used to package code, study, data and metadata and send them to others.
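As a rough illustration of the idea, a research object can be pictured as a manifest that carries an identifier and aggregates typed resources, some embedded in the package and some only referred to by URL. The sketch below is a toy in Python: the class names, field names and the example identifier are hypothetical and do not follow the actual Research Object model or any myExperiment API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Resource:
    """One aggregated asset; all field names here are illustrative."""
    kind: str        # e.g. "data", "software", "method", "paper"
    location: str    # a path inside the package or an external URL
    embedded: bool   # True if the bytes travel with the package


@dataclass
class ResearchObject:
    """A toy manifest: an ID plus the resources it aggregates."""
    identifier: str
    authors: List[str]
    resources: List[Resource] = field(default_factory=list)

    def add(self, kind: str, location: str) -> None:
        # Treat anything reached over HTTP(S) as "referred to", not embedded.
        embedded = not location.startswith(("http://", "https://"))
        self.resources.append(Resource(kind, location, embedded))

    def external(self) -> List[Resource]:
        """Resources that live outside the package and may break over time."""
        return [r for r in self.resources if not r.embedded]

    def to_manifest(self) -> str:
        """Serialise the whole object as JSON so it can be sent to others."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Invented example: two embedded assets and one referenced paper.
ro = ResearchObject(identifier="example-ro-001", authors=["A. Researcher"])
ro.add("data", "data/measurements.csv")
ro.add("software", "code/analysis.py")
ro.add("paper", "https://example.org/paper.pdf")
print(ro.to_manifest())
```

Listing the external resources separately is a deliberate choice here: those are the links most likely to break, so a steward would want to find them quickly.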
Mozilla Science Lab has been working on code as a research object.
Research is not a series of static documents that are published but a series of research objects that are released like software. They fork and merge like software. They are version controlled and cited much as software is. This applies to all of the research object's components.
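To make the software analogy concrete, here is a small sketch of research objects that are released as immutable versions and can be forked, each keeping a pointer to its parent so the lineage can be cited. The `Release` class, its `release` and `fork` methods and the version numbering are hypothetical choices for illustration, not anything presented in the talk. Merging is left out to keep the example short.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Release:
    """An immutable snapshot of a research object (illustrative only)."""
    name: str
    version: int
    contents: Tuple[str, ...]            # names of data, code, method, paper
    parent: Optional["Release"] = None   # the release this one derives from

    def release(self, *added: str) -> "Release":
        """Publish a new version; the old one stays citable as it was."""
        return replace(self, version=self.version + 1,
                       contents=self.contents + added, parent=self)

    def fork(self, new_name: str) -> "Release":
        """Start an independent line of work from this snapshot."""
        return Release(new_name, 1, self.contents, parent=self)

    def citation(self) -> str:
        return f"{self.name} v{self.version}"

    def lineage(self) -> list:
        """Walk back through parents, newest first."""
        node, chain = self, []
        while node is not None:
            chain.append(node.citation())
            node = node.parent
        return chain


# Invented example: release a study, add the paper, then fork it.
v1 = Release("study", 1, ("data.csv", "analysis.py"))
v2 = v1.release("paper.pdf")
forked = v2.fork("follow-up-study")
print(forked.lineage())
```

Because each release is frozen, citing "study v1" always refers to the same contents, even after later versions and forks exist.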
It is the entire study that backs a paper … not a piece of data.
FAIRDOM: Aggregated Commons infrastructure = uber cataloguing tool. Holds all of the pieces together for a particular study.
Research objects can be thought of not just as metadata packages but as instruments. Data and software as instrument. The Research Object workflow as an instrument. Reproducibility is facing uncertainty and change. The lab changes, science changes.
“The questions don’t change but the answers do” - Dan Reed
You have to “prepare to repair”.
Be careful with The Cloud. Try replacing the word Cloud with Clown and see how it sounds. If you use The Cloud, make sure there is a way to get your data out: a lifeboat, an escape pod.
Different types of reproducibility:
- Rerun (Robust) - Variations/Internal
- Repeat (Defend) - Same Experiment/Internal
- Replicate (Certify) - Same Experiment/Peer Review
- Reproduce (Compare) - Variations on Experiment
- Reuse (Transfer) - Different Experiment
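The list above can be written down as a small lookup table, which makes the two axes explicit: how far the experiment varies and in what setting the work is checked. This is only a transcription of the bullet points into Python. The names `Mode`, `MODES` and `modes_for` are assumptions for illustration, and entries the notes leave blank are recorded as `None` rather than guessed.

```python
from typing import NamedTuple, Optional


class Mode(NamedTuple):
    purpose: str             # the word in brackets in the notes
    experiment: str          # how the experiment relates to the original
    setting: Optional[str]   # where the check happens, if the notes say


# Transcribed from the list; None means the notes give no setting.
MODES = {
    "rerun": Mode("robust", "variations", "internal"),
    "repeat": Mode("defend", "same experiment", "internal"),
    "replicate": Mode("certify", "same experiment", "peer review"),
    "reproduce": Mode("compare", "variations on experiment", None),
    "reuse": Mode("transfer", "different experiment", None),
}


def modes_for(setting: Optional[str]) -> list:
    """Return the names of all modes recorded with the given setting."""
    return sorted(name for name, mode in MODES.items() if mode.setting == setting)


print(modes_for("internal"))
print(MODES["replicate"].purpose)
```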
It is a big jump from the RARE space (research environment) to the FAIR space (publishing environment)
- Reproduce by Reading Archived Record/Retaining
- Reproduce by Running (Virtual Machines)
Goble confesses to slight bitterness about the research/REF process. Tells of how she was criticised for writing a paper so that people would be able to read it.
Model and standards for packaging and publishing research object manifests.
This all sounds good, but it is a small part, used by computationally savvy researchers. The reality is lab books with things stuck in them, files and spreadsheets.
To move from there to RARE and FAIR we need:
- stealthy progress (reduce friction, optimise The Neylon Equation). For example better data structures, controlled vocabulary in spreadsheets.
- auto-magical end-to-end instrumentation. For example electronic lab notebooks.
- get over credit. Credit is not the same as authorship. Need to optimise love, money, fame and duty.
- training. For example software and data carpentry. Also establish a pool of software engineers that researchers can call on to help them develop software.
- need to make reproducibility (public good) a side effect of personal productivity.
- incremental for infrastructure providers
- moderate for policy makers and stewards
- paradigm for researchers and institutions
- method matters
- studies born reproducible
- be smart about reproducibility
- think commons not repository
- think release not publish
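
One of the "stealthy progress" suggestions in these notes is controlled vocabulary in spreadsheets. A minimal sketch of what that could look like in practice: check one column of a CSV export against an agreed list of terms and report the rows that stray from it. The column name, the vocabulary and the sample data below are invented for illustration.

```python
import csv
import io

# Hypothetical agreed vocabulary, stored in lower case for comparison.
ALLOWED_SPECIES = {"homo sapiens", "mus musculus", "danio rerio"}


def find_violations(csv_text, column, vocabulary):
    """Return (row_number, value) pairs for cells not in the vocabulary.

    Row numbers count the header as row 1, as a spreadsheet would.
    Empty cells are reported too, since a missing term is also a gap.
    """
    violations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        if value.lower() not in vocabulary:
            violations.append((row_number, value))
    return violations


# Invented sample sheet with one free-text term and one empty cell.
SAMPLE = """sample_id,species
S1,Homo sapiens
S2,mouse
S3,Danio rerio
S4,
"""

for row_number, value in find_violations(SAMPLE, "species", ALLOWED_SPECIES):
    print(f"row {row_number}: {value!r} is not in the controlled vocabulary")
```

A check like this can run quietly whenever a spreadsheet is saved or uploaded, which fits the aim of reducing friction rather than adding a new step for the researcher.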