@gatorwatt
Last active October 10, 2020 04:25

Prepare Tabular Data for ML with Automunge


Presenter

Nicholas Teague

website: automunge.com

twitter: @_NicT_


Context

Mainstream machine learning libraries have several prerequisites:

  • numerically encoded data
  • normalized sets
  • all valid entries
  • consistent transforms between training and test data
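
As a rough illustration of these prerequisites, the sketch below uses only the Python standard library. It is not Automunge's API; the function names (fit_numeric, apply_numeric, fit_categories, apply_categories) and the toy data are hypothetical, and mean infill with z-score normalization is just one assumed choice of transform. The point it shows is that parameters are learned from training data only and then reused unchanged on test data.

```python
from statistics import mean, pstdev

def fit_numeric(train_values):
    """Learn infill and normalization parameters from training data only."""
    present = [v for v in train_values if v is not None]
    sigma = pstdev(present) or 1.0  # fall back to 1.0 for constant columns
    return {"mean": mean(present), "std": sigma}

def apply_numeric(values, params):
    """Infill missing entries with the training mean, then z-score normalize."""
    filled = [params["mean"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]

def fit_categories(train_values):
    """Map each training category to an integer; 0 is reserved for missing/unseen."""
    categories = sorted({v for v in train_values if v is not None})
    return {category: i + 1 for i, category in enumerate(categories)}

def apply_categories(values, mapping):
    """Encode categories with the training mapping; missing or unseen entries become 0."""
    return [mapping.get(v, 0) for v in values]

# Hypothetical toy columns with missing entries marked as None.
train_size = [1.0, 2.0, None, 3.0]
test_size = [2.0, None, 5.0]
train_color = ["red", "blue", None, "red"]
test_color = ["blue", "green", None]

# Fit on training data, then apply the same parameters to both sets.
size_params = fit_numeric(train_size)
color_mapping = fit_categories(train_color)

train_size_prepared = apply_numeric(train_size, size_params)
test_size_prepared = apply_numeric(test_size, size_params)
train_color_prepared = apply_categories(train_color, color_mapping)
test_color_prepared = apply_categories(test_color, color_mapping)
```

Because the test columns are transformed with parameters fitted on the training columns, a value such as "green" that never appeared in training still receives a valid numeric code rather than breaking the pipeline.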

Automunge is a platform for preparing tabular data for machine learning.


Description

Automunge is an open source Python library. Applying it to tabular training data prepares that data to meet the prerequisites of machine learning, and subsequent data can then be prepared on a consistent basis.

Automunge is suitable for novices and experts alike: transformations can be performed under automation, or users can engineer custom data pipelines.

Some highlights include:

  • automated ML derived missing data infill aka "ML infill"
  • extensive library of data transformations, or users may define their own
  • simple specification of transformation sets that may include generations and branches of derivations
  • automated feature importance evaluation
  • automated dimensionality reductions
  • automated evaluation of data property drift between training data and subsequent data
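
To make the last highlight concrete, here is a minimal sketch of what comparing data properties between training data and subsequent data could look like for one numeric column. It is a generic illustration using the Python standard library, not Automunge's method or API; the helper names (column_stats, drift_report), the choice of statistics, and the toy data are all assumptions.

```python
from statistics import mean, pstdev

def column_stats(values):
    """Summarize one numeric column; None marks a missing entry."""
    present = [v for v in values if v is not None]
    return {
        "mean": mean(present),
        "std": pstdev(present),
        "missing_ratio": 1 - len(present) / len(values),
    }

def drift_report(train_values, new_values):
    """Return the change in each statistic from training data to subsequent data."""
    train_stats = column_stats(train_values)
    new_stats = column_stats(new_values)
    return {key: new_stats[key] - train_stats[key] for key in train_stats}

# Hypothetical column: the subsequent data is shifted upward by 3.
train = [1.0, 2.0, 3.0, None]
subsequent = [4.0, 5.0, 6.0, None]

report = drift_report(train, subsequent)
```

In this toy case the report shows a shift in the mean while the spread and the share of missing entries are unchanged, which is the kind of signal that suggests a model trained on the original data may need review.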

Documentation

Full documentation is available on GitHub
