@annakrystalli
Last active October 2, 2016 22:58
Crowdsourcing ideas for rmacroRDM basic data quality check strategy

rmacroRDM issue #10


Various checks are performed in a number of areas of the package, but the approach could be more strategic. This should link in with developing tests (#14).

Q: What do you consider the most important elements of ensuring the quality of your data? E.g.:

Check all that apply, and add your own.

  • handling of missing values
  • checking data types against expectations
  • handling of white space and blank lines
  • consistency of variable names throughout the file system
  • consistency of species names throughout the file system
  • identifying typos
  • identifying outliers
  • identifying duplicates
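
As a starting point for discussion, here is a minimal base R sketch of how a few of the checks above might look. It is not how rmacroRDM does things: the data frame, its column names (`species`, `body.mass`), the `expected` type table and the `known_species` reference list are all hypothetical and made up for illustration.

```r
# Hypothetical example data; names and values are illustrative only
dat <- data.frame(
  species   = c("Parus major", " Parus major", "Parus majr",
                "Corvus corax", "Corvus corax"),
  body.mass = c("18.5", "18.5", NA, "1200", "1200"),
  stringsAsFactors = FALSE
)

# Missing values: count NAs in each column
na_counts <- colSums(is.na(dat))

# White space: trim leading/trailing spaces in character columns
chr_cols <- vapply(dat, is.character, logical(1))
dat[chr_cols] <- lapply(dat[chr_cols], trimws)

# Data types: compare actual column classes with (assumed) expectations
expected <- c(species = "character", body.mass = "numeric")
actual <- vapply(dat, function(x) class(x)[1], character(1))
type_mismatch <- names(expected)[actual[names(expected)] != expected]

# Typos / name consistency: values absent from an (assumed) reference list
known_species <- c("Parus major", "Corvus corax")
unmatched_species <- setdiff(unique(dat$species), known_species)

# Duplicates: row indices that repeat an earlier row
dup_rows <- which(duplicated(dat))
```

Trimming white space before looking for duplicates matters here: the second row only shows up as a duplicate of the first once the stray leading space is removed.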


Q: What tools in R, and at what stage of data processing, have you found to be most effective for these?


Please feel free to fork and share thoughts and ideas!

@auremoser

Aure Moser

Q: What do you consider the most important elements of ensuring the quality of your data? E.g.:

Check all that apply, and add your own.

  • handling of missing values
  • checking data types against expectations
  • handling of white space and blank lines
  • consistency of variable names throughout the file system
  • consistency of species names throughout the file system
  • identifying typos
  • identifying outliers
  • identifying duplicates

I'm not sure I did this right. Some of the issues seem more related to data clean-up and prep before analysis (handling white space/blank lines, naming conventions/consistency), and some seem like they would be more valuable during analysis (identifying outliers, duplicates, missing values, checking against expectations, etc.).

Q: What tools in R, and at what stage of data processing, have you found to be most effective for these?
