@erykml
Last active August 7, 2021 13:21
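import numpy as np
import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
from sklearn.model_selection import train_test_split

# Load the Boston housing data as a feature DataFrame and a target array
boston = load_boston()
y = boston.target
X = pd.DataFrame(boston.data, columns=boston.feature_names)

# Add a column of uniform random noise (seeded for reproducibility)
np.random.seed(seed=42)
X['random'] = np.random.random(size=len(X))

# Hold out 20% of the rows for validation
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)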
abhinavnayak11 commented Aug 1, 2021

Shouldn't test_size be 0.2 or something?
With test_size = 0.8, len(X_train) = 101 and len(X_valid) = 405.

erykml (Author) commented Aug 7, 2021

Indeed, it should be 0.2. At first, I had train_size there and overlooked it while changing to test_size. Thanks for pointing this out!
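
For reference, the split sizes quoted in this thread can be reproduced with plain arithmetic. This is only a sketch: `split_sizes` is a hypothetical helper, not part of scikit-learn, and it assumes that a fractional `test_size` is rounded up to a whole number of rows with the remainder going to the training set. That assumption matches the 101/405 figures reported above for the 506-row dataset.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_valid) for a fractional test_size.

    Assumption: the validation share is rounded up and the training set
    receives whatever rows are left. Hypothetical helper for illustration.
    """
    n_valid = math.ceil(test_size * n_samples)
    n_train = n_samples - n_valid
    return n_train, n_valid


n_rows = 506  # 101 + 405, the row count implied by the lengths quoted above

print(split_sizes(n_rows, 0.8))  # the mistaken setting discussed above
print(split_sizes(n_rows, 0.2))  # the corrected setting
```

With `test_size = 0.8` only about a fifth of the rows are left for training, which is why the corrected value of 0.2 is the sensible one here.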
