Nicholas Teague
website: automunge.com
twitter: @_NicT_
Mainstream machine learning libraries have several prerequisites:
- numerically encoded data
- normalized sets
- all valid entries
- consistent transforms between training and test data
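To make these prerequisites concrete, here is a minimal sketch in plain Python of what they imply for a single numeric column. It is not Automunge code: the helper names `fit_column` and `apply_column` are hypothetical, and mean infill with z-score normalization is assumed purely for illustration. The point is that parameters are derived once from the training data and then reused unchanged on the test data.

```python
from statistics import mean, pstdev

def fit_column(train_values):
    """Derive infill and normalization parameters from training data only."""
    present = [v for v in train_values if v is not None]
    mu = mean(present)
    sigma = pstdev(present) or 1.0  # fall back to 1.0 if the column has zero variance
    return {"mean": mu, "std": sigma}

def apply_column(values, params):
    """Infill missing entries with the training mean, then z-score normalize."""
    filled = [params["mean"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]

# Hypothetical data: None marks a missing entry.
train = [1.0, 2.0, None, 3.0]
test = [2.0, None, 5.0]

params = fit_column(train)                    # fitted once, on training data
train_prepared = apply_column(train, params)
test_prepared = apply_column(test, params)    # same parameters reused on test data
```

Fitting on the training set alone keeps the two sets on the same scale and avoids leaking information from the test data into the transform.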
Automunge is a platform for preparing tabular data for machine learning.
Automunge is an open source Python library. Applied to tabular training data, it prepares the data to meet the prerequisites of machine learning, and subsequent data can then easily be prepared on a consistent basis.
Automunge is suitable for novices and experts alike: transformations may be performed under automation, or users may engineer custom data pipelines.
Some highlights include:
- automated ML-derived infill of missing data, aka "ML infill"
- extensive library of data transformations, or users may define their own
- simple specification of transformation sets that may include generations and branches of derivations
- automated feature importance evaluation
- automated dimensionality reductions
- automated evaluation of data property drift between training data and subsequent data
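As an illustration of the idea behind the first highlight, the sketch below fills a missing entry with a prediction from a model trained on the rows where that entry is present. It is an assumption-laden toy, not Automunge's implementation: the function names `fit_line` and `ml_infill` are hypothetical, and a one-feature least-squares line stands in for whatever models the library actually uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar

def ml_infill(feature, target):
    """Predict missing target entries from a model trained on the complete rows."""
    known = [(x, y) for x, y in zip(feature, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    # Keep observed values; replace only the missing ones with predictions.
    return [slope * x + intercept if y is None else y
            for x, y in zip(feature, target)]

# Hypothetical columns: the target column has one missing entry (None).
feature = [1.0, 2.0, 3.0, 4.0]
target = [2.0, 4.0, None, 8.0]
infilled = ml_infill(feature, target)
```

Compared with mean or median infill, a model-based prediction can use the relationship between columns, which is the motivation for deriving infill from the rest of the data.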
Full documentation is available on GitHub.