@marcelomgarcia
Last active March 27, 2024 07:59

Research Data Tools Examples

Plan

Data management planning (DMP): Tools focused on enabling preparation and submission of data management plans:

  • DMPonline
  • Argos (machine actionable data plan)
  • RDMO (provides comprehensive documentation in English and a demo area, and is open source.)

Project planning: Tools designed to enable project planning:

  • MS Project (the standard tool for project management.)
  • Trello (probably the second most well-known project management tool, possibly with Atlassian integration.)
  • Asana (looked more like a project management tool than Monday.com, with connectors to other tools.)

Combined DMP/project: Tools which combine project planning with the ability to prepare data management plans:

  • DSW (This is the only tool that combines everything in the description above.)

Collect

Quantitative data collection tool: Tools that collect quantitative data:

  • MATLAB Data Acquisition Toolbox (a complete, although expensive, solution.)
  • CEDAR Workbench (biomedical data) (emphasis on the metadata.)

Qualitative data collection (e.g. Survey tool): Tools that collect qualitative data:

  • REDCap
  • SurveyJS (interesting open-source project to create forms for surveys.)
  • Kobo
  • TeamScope (Clinical Research Collection App)
  • Animal Tracker App (track animals in the wild.)
  • Track3D (track animals in a confined space)
  • Citizen Science Tools (citizen science) (this seems to be a kind of marketplace for citizen science projects.)

Note: there isn't much science here since I'm not familiar with these tools.

Harvesting tool (e.g. WebScrapers): Tools that harvest data from various sources:

  • Beautiful Soup (Python)
  • Scrapy (Python)
  • Netlytic (collect data from social media)
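As a rough illustration of what harvesting involves, the sketch below pulls link targets and link text out of an HTML page. It deliberately uses Python's standard-library `html.parser` rather than Beautiful Soup or Scrapy, so it runs without extra installs; those libraries do the same job with far less code. The names `LinkCollector`, `harvest_links` and `SAMPLE_PAGE`, and the example.org URLs, are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def harvest_links(html):
    """Return every (href, text) pair found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real harvester would download this first.
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li><a href="https://example.org/dataset-a">Dataset A</a></li>
    <li><a href="https://example.org/dataset-b">Dataset <b>B</b></a></li>
    <li>No link here</li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in harvest_links(SAMPLE_PAGE):
        print(href, text)
```

A real harvesting job would add the download step, rate limiting, and respect for the site's terms of use, which is where tools such as Scrapy help.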

Process

Electronic laboratory notebooks (ELNs): Tools that enable aggregation, management, and organization of experimental and physical sample data:

  • Benchling (Biology)
  • eLabFTW (Open source)
  • Mbook (Chemistry)
  • RSpace ELN

Scientific computing across all programming languages: Tools that enable creation and sharing of computational documents

Metadata Tool: Tools that enable creation, application, and management of metadata, and embedding of metadata in other kinds of tools

  • CEDAR Workbench (biomedical data)

Store

Repository (e.g. MySQL, DSpace): Tools that structure and provide a framework to organise information

  • Generalist repository: DSpace, Figshare, Zenodo, Dryad, etc.
  • Data repository: CKAN, Dataverse, NOMAD-OASIS (materials science), etc.
  • RDBMS: Oracle, MySQL, MariaDB, PostgreSQL, SQLite

Archive: Tools that facilitate the long-term storage of data

  • Libsafe
  • Preservica

Management tool (e.g. iRODS, GLOBUS, Mediaflux): Tools that facilitate the organisation of data:

Share

Data repository: Tools that enable storage, and public sharing of data

  • Data repository: CKAN, Dataverse, NOMAD-OASIS

Electronic laboratory notebooks (ELNs): Tools that enable aggregation, organization and management of experimental and physical sample data

Scientific computing across all programming languages: Tools that enable creation and sharing of computational documents

Transform

Electronic laboratory notebooks (ELNs): Tools that enable aggregation, management, and organization of experimental and physical sample data:

  • Benchling (Biology)
  • eLabFTW (Open source)
  • Mbook (Chemistry)
  • RSpace ELN

Programming languages:

  • Python (including pandas, NumPy, etc.)
  • R
  • Julia

Extract, Transform, Load (ETL) tools: Tools that enable 'extract, transform, load'—a data integration process used to combine data from multiple sources into a single, consistent data set for loading into a data warehouse, data lake or other target system:

  • Apache Spark
  • Apache Hadoop
  • Google: Cloud Data Fusion + Dataflow + Dataproc
  • AWS Glue
  • Microsoft SQL Server Integration Services (SSIS) or Azure Data Factory
  • Oracle Data Integrator

Note: The problem with the term ETL is that it implies big data.
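The extract, transform, load steps do not need a big-data platform. The sketch below runs them at small scale using only Python's standard library (`csv` and `sqlite3`). The two CSV sources, the column names, the `measurements` table and all function names are invented for illustration and are not taken from any of the tools listed above.

```python
import csv
import io
import sqlite3

# Two hypothetical sources that report the same quantity in different units.
SITE_A_CSV = """sample_id,temp_c
a1,20.5
a2,21.0
"""

SITE_B_CSV = """sample_id,temp_f
b1,68.0
b2,
"""


def extract(text):
    """Extract: read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, site):
    """Transform: convert everything to Celsius and drop empty readings."""
    cleaned = []
    for row in rows:
        if row.get("temp_c"):
            temp_c = float(row["temp_c"])
        elif row.get("temp_f"):
            temp_c = (float(row["temp_f"]) - 32.0) * 5.0 / 9.0
        else:
            continue  # no measurement in this row
        cleaned.append((row["sample_id"], site, round(temp_c, 2)))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a single target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(sample_id TEXT PRIMARY KEY, site TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline():
    """Run extract, transform, load for both sources into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    for text, site in ((SITE_A_CSV, "site_a"), (SITE_B_CSV, "site_b")):
        load(conn, transform(extract(text), site))
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    query = "SELECT sample_id, site, temp_c FROM measurements ORDER BY sample_id"
    for row in conn.execute(query):
        print(row)
```

The same three functions map onto what the larger tools do; they mainly add scheduling, connectors, and the ability to scale out.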
