- Model building in scikit-learn (refresher)
- Representing text as numerical data
- Reading a text-based dataset into pandas
- Vectorizing our dataset
- Building and evaluating a model
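The outline covers turning text into numbers before a model can use it. Below is a minimal pure-Python sketch of the bag-of-words idea behind a vectorizer such as scikit-learn's `CountVectorizer`. The helper names and the tokenization rule (lowercase alphanumeric runs) are assumptions. This is not scikit-learn's exact behavior; for example, its default token pattern drops single-character tokens.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token in the corpus to a column index (sorted for stability)."""
    tokens = {tok for doc in documents for tok in tokenize(doc)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def vectorize(documents, vocabulary):
    """Return one row of token counts per document (a document-term matrix)."""
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            idx = vocabulary.get(tok)
            if idx is not None:  # tokens unseen during fitting are ignored
                row[idx] += 1
        rows.append(row)
    return rows


docs = ["call me now", "me me me"]
vocab = build_vocabulary(docs)
print(vocab)                   # {'call': 0, 'me': 1, 'now': 2}
print(vectorize(docs, vocab))  # [[1, 1, 1], [0, 3, 0]]
```

The vocabulary is built only from the training documents. New text is mapped onto the same columns, so unknown words are simply dropped.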
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. A data wrangler is a person who performs these transformation operations. (Wikipedia)
Wrangler is an interactive tool for data cleaning and transformation: spend less time formatting and more time analyzing your data. (Stanford)
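As a small illustration of the "raw form to usable format" step, here is a sketch that cleans raw `name,gender` text lines into normalized tuples. The row layout and the cleaning rules (trim whitespace, uppercase names, lowercase labels, drop blank, malformed, or duplicate rows) are assumptions chosen for illustration. They are not rules taken from the dataset used here.

```python
def wrangle_rows(raw_lines):
    """Turn raw 'name,gender' lines into cleaned (NAME, gender) tuples.

    Hypothetical rules: strip whitespace, uppercase names, lowercase labels,
    and skip blank, malformed, or duplicate rows.
    """
    cleaned = []
    seen = set()
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not all(parts):
            continue  # blank, malformed, or missing a field
        record = (parts[0].upper(), parts[1].lower())
        if record in seen:
            continue  # exact duplicate after normalization
        seen.add(record)
        cleaned.append(record)
    return cleaned


raw = ["  alice , F", "bob,m", "", "carol", "ALICE,f", "dave,M,extra"]
print(wrangle_rows(raw))  # [('ALICE', 'f'), ('BOB', 'm')]
```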
This is a manual mode in which you can test the prediction model with names entered at runtime.
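```python
def model_evaluation(classifier):
    print('<<< Testing Module >>> ')
    print('Enter "q" or "quit" to end testing module')
    while True:
        test_name = input('\n Enter name to classify: ')
        if test_name.lower() in ('q', 'quit'):
            print('End')
            break  # end normally; exit(1) would signal an error status
        # Classify the entered name using the same features as in training
        print(classifier.classify(extract_feature(test_name)))
```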
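The interactive loop checks one name at a time. A held-out test set is usually summarized as a single accuracy score. NLTK provides `classify.accuracy(classifier, test_set)` for this. The stdlib sketch below shows the same idea, with a plain function standing in for the classifier. The `accuracy` helper, the `last_letter` feature key, and the toy rule in the usage are hypothetical.

```python
def accuracy(classify_fn, labeled_examples):
    """Fraction of (features, label) pairs that classify_fn labels correctly."""
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for features, label in labeled_examples
                  if classify_fn(features) == label)
    return correct / len(labeled_examples)


# Hypothetical toy rule: names ending in "A" are labeled female.
rule = lambda f: "female" if f["last_letter"] == "A" else "male"
examples = [({"last_letter": "A"}, "female"), ({"last_letter": "N"}, "male"),
            ({"last_letter": "E"}, "female"), ({"last_letter": "K"}, "male")]
print(accuracy(rule, examples))  # 0.75
```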
```python
def train_and_test(train_percent=0.80):
    feature_set = prepare_data_set()
    validate_data_set(feature_set)
    random.shuffle(feature_set)  # shuffle in place before splitting
    total = len(feature_set)
    cut_point = int(total * train_percent)
    # split the dataset into train and test sets
    train_set = feature_set[:cut_point]
    test_set = feature_set[cut_point:]
```
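`train_and_test` shuffles `feature_set` in place with the module-level `random` generator, so each run produces a different split. The variant below is a minimal sketch, assuming a seeded `random.Random` instance is acceptable for reproducibility. It also works on a copy so the caller's list keeps its order. `split_dataset` is a hypothetical helper, not part of the original script.

```python
import random


def split_dataset(examples, train_percent=0.80, seed=None):
    """Shuffle a copy of examples and cut it into (train, test) lists."""
    if not 0.0 < train_percent < 1.0:
        raise ValueError("train_percent must be between 0 and 1")
    shuffled = list(examples)          # copy: the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # same seed -> same split
    cut_point = int(len(shuffled) * train_percent)
    return shuffled[:cut_point], shuffled[cut_point:]


train, test = split_dataset(range(10), train_percent=0.8, seed=42)
print(len(train), len(test))  # 8 2
```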
Feature (attribute, input, or predictor) extraction from a given name string:
```python
def extract_feature(name: str):
    name = name.upper()
    feature = dict()
    # additional feature extraction
    # feature["first_1"] = name[0]
    # for letter in 'abcdefghijklmnopqrstuvwxyz'.upper():
    # ...
    return feature  # NLTK classifiers take a dict of feature name -> value
```
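The excerpt leaves out the main features. Last letters are a common choice for name-based gender classification; the NLTK book's first gender-classifier example uses the last letter of a name. The sketch below assumes suffix features of length 1 to 3 and returns them in the same dict form that NLTK classifiers accept. The function and key names (`suffix_features`, `suffix_1`, ...) are hypothetical.

```python
def suffix_features(name: str, max_suffix: int = 3) -> dict:
    """Build an NLTK-style feature dict from the last letters of a name."""
    name = name.strip().upper()
    features = {}
    for n in range(1, max_suffix + 1):
        # Names shorter than n simply contribute the whole name.
        features["suffix_{}".format(n)] = name[-n:]
    return features


print(suffix_features("Alice"))  # {'suffix_1': 'E', 'suffix_2': 'CE', 'suffix_3': 'ICE'}
```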
You can download the dataset here.
```python
#!/usr/bin/env python3.5
# -*- coding: utf-8 -*-
import os
import random
from zipfile import ZipFile
from nltk import NaiveBayesClassifier, MaxentClassifier, DecisionTreeClassifier, classify
```
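The script imports `ZipFile`, which suggests the names dataset is read directly from a zip archive. Here is a minimal sketch, assuming a hypothetical archive member that holds headerless `name,label` CSV rows. The helper name and the file layout are assumptions, not details of the actual dataset.

```python
import csv
import io
from zipfile import ZipFile


def read_names_from_zip(zip_source, member):
    """Return (name, label) rows from a CSV file stored inside a zip archive."""
    with ZipFile(zip_source) as archive:
        # ZipFile.open yields bytes; TextIOWrapper decodes them for csv.reader.
        with io.TextIOWrapper(archive.open(member), encoding="utf-8", newline="") as text:
            return [(row[0], row[1]) for row in csv.reader(text) if len(row) == 2]
```

`zip_source` can be a path or a binary file-like object, so the function can be exercised with an in-memory archive built via `ZipFile(io.BytesIO(), "w")`.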
```python
# -*- coding: utf-8 -*-
import re
import email
import smtplib
import mimetypes
from email.mime.multipart import MIMEMultipart
from email import encoders
from email.mime.audio import MIMEAudio
from email.mime.base import MIMEBase
```
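These imports match the standard-library pattern for mailing a file as an attachment. The sketch below covers only the message-building half, assuming a single in-memory attachment. `build_message` and its parameters are hypothetical. Sending the result would go through `smtplib.SMTP(host).send_message(msg)` against a server you configure.

```python
import mimetypes
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipient, subject, body, filename, payload):
    """Build a multipart email with a text body and one binary attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain"))

    # Guess the attachment's MIME type from its name; fall back to raw bytes.
    ctype, encoding = mimetypes.guess_type(filename)
    if ctype is None or encoding is not None:
        ctype = "application/octet-stream"
    maintype, subtype = ctype.split("/", 1)

    part = MIMEBase(maintype, subtype)
    part.set_payload(payload)
    encoders.encode_base64(part)  # base64 makes binary data safe for SMTP
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg


msg = build_message("a@example.com", "b@example.com", "Report",
                    "See attached.", "data.csv", b"name,gender\n")
print(msg["Subject"])  # Report
```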