@mutafaf
Created July 2, 2018 21:49
Data Science - Decision Tree
age dis gen
30 0 0
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 7 1
29 8 1
29 9 1
29 10 1
29 11 1
29 12 1
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 8 1
29 12 1
29 10 1
29 7 1
29 9 1
29 13 1
29 11 1
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 8 1
29 12 1
29 10 1
29 7 1
29 9 1
29 13 1
29 11 1
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 8 1
29 12 1
29 10 1
29 7 1
29 9 1
29 13 1
29 11 1
27 14 2
27 15 2
27 16 2
27 17 2
27 18 2
27 19 2
27 8 2
27 20 2
27 21 2
27 22 2
27 23 2
27 16 2
27 24 2
27 25 2
27 26 2
27 15 2
27 18 2
27 27 2
27 28 2
27 14 2
27 16 2
27 19 2
27 15 2
27 17 2
27 18 2
31 14 0
31 16 0
31 19 0
31 29 0
31 26 0
31 30 0
31 14 0
31 29 0
31 16 0
31 26 0
31 30 0
31 19 0
28 31 1
35 32 1
35 33 1
35 8 1
35 34 1
35 35 1
35 26 1
35 36 1
35 21 1
35 17 1
31 27 1
@greed2411

You are supposed to pass NumPy arrays, not lists, as arguments to the DecisionTreeClassifier. Because your training input was a flat 1-D list, it was treated as a single sample with 70 features; your test input was a list of 30 elements, so the classifier saw it as 30 features.

To fix this, convert the input to a NumPy array and reshape it into a matrix with one column, so that each row is one sample with a single feature.

In other words, pass X_train.values.reshape(-1, 1) instead of X_train (it should be a NumPy array, not a list).
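
To see what the reshape does, here is a minimal sketch using a small made-up array (the values are illustrative, not the gist's data). A 1-D array of n values becomes an (n, 1) matrix, i.e. n samples with one feature each:

```python
import numpy as np

# Illustrative values only; stands in for a single feature column such as `dis`.
x = np.array([0, 1, 2, 3, 4])

print(x.shape)        # 1-D array: (5,)

# -1 lets NumPy infer the number of rows; 1 fixes a single column.
X = x.reshape(-1, 1)

print(X.shape)        # 2-D array: (5, 1), five samples with one feature each
```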

This is the entire gist:

import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split

# `data` is the table above loaded as a pandas DataFrame with columns age, dis, gen
X_train, X_test, y_train, y_test = train_test_split(data.dis, data.gen, test_size=0.30, random_state=42)
c = tree.DecisionTreeClassifier()
# reshape(-1, 1) turns the 1-D column into an (n_samples, 1) matrix
c.fit(X_train.values.reshape(-1, 1), y_train)
accu_train = np.sum(c.predict(X_train.values.reshape(-1, 1)) == y_train) / len(y_train)
accu_test = np.sum(c.predict(X_test.values.reshape(-1, 1)) == y_test) / len(y_test)
print("Accuracy on Train: ", accu_train)
print("Accuracy on Test: ", accu_test)

I'm getting the following output:

Accuracy on Train:  0.8857142857142857
Accuracy on Test:  0.7333333333333333
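
As a sanity check on these numbers: assuming the file's 100 rows were split 70/30 by test_size=0.30, the two accuracies correspond to 62 of 70 and 22 of 30 correct predictions. The counts in this sketch are back-computed from the printed values, not read from the classifier:

```python
import math

# Assumed split sizes: 100 rows with test_size=0.30 gives 70 train, 30 test.
n_train, n_test = 70, 30

# Recover the number of correct predictions from the reported accuracies.
correct_train = round(0.8857142857142857 * n_train)
correct_test = round(0.7333333333333333 * n_test)

print(correct_train, "of", n_train)
print(correct_test, "of", n_test)

# The recovered counts reproduce the reported accuracies.
train_matches = math.isclose(correct_train / n_train, 0.8857142857142857)
test_matches = math.isclose(correct_test / n_test, 0.7333333333333333)
```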

Thanks for sharing the dataset; it was helpful for testing. Hope this helps.

@mutafaf
Author

mutafaf commented Jul 15, 2020

Sorry, I can't answer your question, @parthsalunke. Hopefully someone with more experience can.
Best.