@manuel103
Last active July 16, 2020 17:12
Machine Learning - Beginner to Advanced
import pandas as pd


def load_housing_data():
    return pd.read_csv('housing.csv')


housing = load_housing_data()
housing.head()
import numpy as np


# For illustration only; scikit-learn provides train_test_split() for this.
def split_train_test(data, test_ratio):
    # Shuffle the row positions, then take the first test_ratio share as the test set.
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]


# Split the housing data: 80% training, 20% test.
# Note: no seed is set, so each run produces a different split.
train_set, test_set = split_train_test(housing, 0.2)
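
The comment above mentions that scikit-learn already ships train_test_split(). A minimal sketch of the same 80/20 split using it might look like the following. The small `housing_demo` DataFrame and its column names are hypothetical stand-ins, since housing.csv is not included here, and `random_state=42` is an arbitrary seed chosen only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; in the gist this would be the housing DataFrame.
housing_demo = pd.DataFrame({
    "feature": [1.5, 2.0, 3.2, 4.1, 5.0, 2.7, 3.9, 6.3, 1.1, 4.8],
    "target": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# test_size plays the role of test_ratio; random_state fixes the shuffle
# so the same rows land in the test set on every run.
train_set_demo, test_set_demo = train_test_split(
    housing_demo, test_size=0.2, random_state=42
)

print(len(train_set_demo), len(test_set_demo))  # prints: 8 2
```

Fixing the seed matters because a split that changes on every run lets the model see, over time, rows that were meant to stay held out.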