Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. A data wrangler is a person who performs these transformation operations. (Source: Wikipedia)
Wrangler is an interactive tool for data cleaning and transformation.
Spend less time formatting and more time analyzing your data. (Source: Stanford)
Example - 1
0 - Requirement
I was given a data problem in which I had to build a model that cleans database values automatically, without manual work. This was my first practical ML solution delivered to a client.
Analysing the dataset before processing.
I was given a column of actual values and their corresponding corrected values. I planned to reuse the approach I took for name gender prediction in my previous project (GitHub - Name Gender Prediction). The distinct corrected values are:
array(['Female', 'Male', 'Other/Prefer Not To Answer'], dtype=object)
2 - Solution
Making feature matrix X
def feature_extraction(_data):
    """This function is used to extract features in a given data value."""
    _data = _data.lower()
    f_1, f_2, f_3, f_4, l_1, l_2, l_3, l_4 = None, None, None, None, None, None, None, None
    # extracting first and last 4 characters
    if len(_data) >= 4:
        f_4 = _data[:4]
        l_4 = _data[-4:]
    # extracting first and last 3 characters
    if len(_data) >= 3:
        f_3 = _data[:3]
        l_3 = _data[-3:]
    # extracting first and last 2 characters
    if len(_data) >= 2:
        f_2 = _data[:2]
        l_2 = _data[-2:]
    # extracting first and last 1 character
    if len(_data) >= 1:
        f_1 = _data[:1]
        l_1 = _data[-1:]
    feature = {
        'f_1': f_1,
        'f_2': f_2,
        'l_1': l_1,
        'l_2': l_2,
        'f_3': f_3,
        'f_4': f_4,
        'l_3': l_3,
        'l_4': l_4
    }
    return feature
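As a usage sketch, the feature matrix X can be built by applying the extraction to every raw value. The compact function below is intended to behave the same as feature_extraction above. The names extract_features and raw_values and the sample values are hypothetical, since the real dataset is not shown here.

```python
def extract_features(value):
    """Prefix/suffix features of length 1-4; None when the value is too short."""
    value = value.lower()
    features = {}
    for n in range(1, 5):
        long_enough = len(value) >= n
        features[f"f_{n}"] = value[:n] if long_enough else None
        features[f"l_{n}"] = value[-n:] if long_enough else None
    return features


# Hypothetical raw values; the real dataset is not shown in this document.
raw_values = ["Male", "F", "女"]

# One feature dict per raw value forms the feature matrix X.
X = [extract_features(v) for v in raw_values]

for value, row in zip(raw_values, X):
    print(value, row)
```

Short values leave the longer prefix and suffix features as None, so a single character such as "F" contributes only f_1 and l_1.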
-4.701 f_1=='m' and label is 'Other/Prefer Not To Answer'
3.138 l_1=='f' and label is 'Female'
3.132 l_1=='男' and label is 'Male'
3.132 f_1=='男' and label is 'Male'
-2.761 f_1=='m' and label is 'Female'
2.704 l_1=='女' and label is 'Female'
2.704 f_1=='女' and label is 'Female'
2.640 l_2=='nő' and label is 'Female'
2.640 l_1=='ő' and label is 'Female'
2.640 f_2=='nő' and label is 'Female'
None
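To show how such weights can be read, here is a minimal sketch that assumes a log-linear (maximum-entropy style) model, where each label's score is the sum of the weights of its active features. This document does not name the classifier, so that scoring rule is an assumption, and the names WEIGHTS, score and predict are hypothetical. Only some of the weights listed above are used; a trained model would hold many more, so the predictions are illustrative only.

```python
# Weights copied from the listing above: (feature, value, label) -> weight.
WEIGHTS = {
    ("f_1", "m", "Other/Prefer Not To Answer"): -4.701,
    ("l_1", "f", "Female"): 3.138,
    ("l_1", "男", "Male"): 3.132,
    ("f_1", "男", "Male"): 3.132,
    ("f_1", "m", "Female"): -2.761,
    ("l_1", "女", "Female"): 2.704,
    ("f_1", "女", "Female"): 2.704,
}
LABELS = ["Female", "Male", "Other/Prefer Not To Answer"]


def score(features, label):
    """Sum the listed weights whose (feature, value) pair is active for this label."""
    return sum(WEIGHTS.get((name, value, label), 0.0) for name, value in features.items())


def predict(features):
    """Return the label with the highest total score."""
    return max(LABELS, key=lambda label: score(features, label))


# Single-character inputs, so only the length-1 prefix and suffix are set.
print(predict({"f_1": "m", "l_1": "m"}))    # Male
print(predict({"f_1": "女", "l_1": "女"}))  # Female
```

The negative weights matter as much as the positive ones: a value starting with "m" is pushed away from 'Female' and 'Other/Prefer Not To Answer', which leaves 'Male' with the highest score.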