@spidezad
Created May 21, 2019 06:21
shopee1_cleaning.ipynb
{
"cells": [
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import re\n\n\ndef clean_text(text):\n    \"\"\"Return a lowercased copy of `text` with separator symbols replaced by spaces and other special characters removed.\"\"\"\n    text = text.lower()\n    text = re.sub(r'[/(){}\\[\\]\\|@,;]', ' ', text)  # replace separator symbols with a space\n    text = re.sub(r'[^0-9a-z #+_]', '', text)  # keep only digits, a-z, space, #, + and _\n    return text",
"execution_count": 9,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Fill missing values with each column's most frequent value (the mode),\n# the choice made based on exploration.\nfor col in ['Warranty Period', 'Operating System', 'Network Connections']:\n    df[col] = df[col].fillna(df[col].mode()[0])",
"execution_count": 7,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\n# Cast attribute values to integer strings (e.g. 3.0 -> '3').\n# -1 is a temporary placeholder so that missing values end up as NaN again.\nattribute_cols = ['Operating System', 'Features', 'Network Connections', 'Memory RAM', 'Brand',\n                  'Warranty Period', 'Storage Capacity', 'Color Family', 'Phone Model', 'Camera',\n                  'Phone Screen Size']\nfor col in attribute_cols:\n    df[col] = df[col].fillna(-1).astype('int').astype('str').replace('-1', np.nan)\n\n# Store a cleaned copy of each product title in a new column\ndf['title1'] = df['title'].apply(clean_text)",
"execution_count": 10,
"outputs": []
}
],
"metadata": {
"_draft": {
"nbviewer_url": "https://gist.github.com/91545bc442cb4f4e5d5387f3bfb03f55"
},
"gist": {
"id": "91545bc442cb4f4e5d5387f3bfb03f55",
"data": {
"description": "shopee1_cleaning.ipynb",
"public": true
}
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.4",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
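A minimal sketch of how the notebook's clean_text behaves on sample titles, using only the standard library. The sample titles are made up, and normalize_title is a hypothetical wrapper that is not part of the notebook: it assumes you may want to tolerate missing (non-string) titles and to collapse the runs of spaces that the first substitution leaves behind.

```python
import re


def clean_text(text):
    """Same cleaning rules as the notebook's clean_text."""
    text = text.lower()
    text = re.sub(r'[/(){}\[\]\|@,;]', ' ', text)  # separator symbols -> space
    text = re.sub(r'[^0-9a-z #+_]', '', text)      # drop remaining special characters
    return text


def normalize_title(title):
    """Hypothetical wrapper (not in the notebook): return '' for non-string
    input such as NaN, and collapse repeated spaces left by clean_text."""
    if not isinstance(title, str):
        return ''
    return ' '.join(clean_text(title).split())


# Made-up sample titles, for illustration only
print(normalize_title('Apple iPhone 6S (64GB) [Gold], Promo!'))  # apple iphone 6s 64gb gold promo
print(normalize_title('Samsung Galaxy S8+ #1 Seller'))           # samsung galaxy s8+ #1 seller
print(repr(normalize_title(float('nan'))))                       # ''
```

If this behavior is wanted in the notebook, the last cell could call df['title'].apply(normalize_title) in place of clean_text; whether empty strings are the right stand-in for missing titles depends on the downstream model.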