@mutafaf

mutafaf/ndata.csv

Created Jul 2, 2018
Data Science - Decision Tree
age dis gen
30 0 0
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 7 1
29 8 1
29 9 1
29 10 1
29 11 1
29 12 1
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 8 1
29 12 1
29 10 1
29 7 1
29 9 1
29 13 1
29 11 1
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 8 1
29 12 1
29 10 1
29 7 1
29 9 1
29 13 1
29 11 1
29 1 1
29 2 1
29 3 1
29 4 1
29 5 1
29 6 1
29 8 1
29 12 1
29 10 1
29 7 1
29 9 1
29 13 1
29 11 1
27 14 2
27 15 2
27 16 2
27 17 2
27 18 2
27 19 2
27 8 2
27 20 2
27 21 2
27 22 2
27 23 2
27 16 2
27 24 2
27 25 2
27 26 2
27 15 2
27 18 2
27 27 2
27 28 2
27 14 2
27 16 2
27 19 2
27 15 2
27 17 2
27 18 2
31 14 0
31 16 0
31 19 0
31 29 0
31 26 0
31 30 0
31 14 0
31 29 0
31 16 0
31 26 0
31 30 0
31 19 0
28 31 1
35 32 1
35 33 1
35 8 1
35 34 1
35 35 1
35 26 1
35 36 1
35 21 1
35 17 1
31 27 1

@greed2411 commented Jul 3, 2018

You need to pass NumPy arrays, not lists, to the DecisionTree. Because your input was a 1D list, the classifier was trained as if it had 70 features, and your test list of 30 elements was then seen as 30 features, so the two did not match.

The input also has to be reshaped into a matrix, with one row per sample and a single column for the one feature.

That means passing X_train.values.reshape(-1, 1) instead of X_train (as a NumPy array, not a list).

This is the complete code:
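As a minimal sketch of what the reshape does, the snippet below uses made-up values in the style of the dis and gen columns (they are not rows from ndata.csv, and the variable names are only for illustration). It shows the 1D shape, the 2D shape after reshape(-1, 1), and that predictions need the same 2D layout.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy values made up for illustration; not taken from ndata.csv.
dis = np.array([0, 1, 2, 14, 15, 16])
gen = np.array([0, 1, 1, 2, 2, 2])

# A 1D array has shape (6,): samples and features cannot be told apart.
flat_shape = dis.shape

# reshape(-1, 1) fixes one column and lets NumPy infer the row count,
# giving 6 samples with 1 feature each.
X = dis.reshape(-1, 1)
matrix_shape = X.shape

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, gen)

# Inputs to predict() need the same 2D layout, even for a single value.
pred = clf.predict(np.array([15]).reshape(-1, 1))
print(flat_shape, matrix_shape, pred)
```

With a pandas Series such as X_train, calling .values first gives the underlying NumPy array, which is why the reshape is written as X_train.values.reshape(-1, 1).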

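import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split

# `data` is assumed to be a pandas DataFrame loaded from ndata.csv (columns: age, dis, gen).
X_train, X_test, y_train, y_test = train_test_split(data.dis, data.gen, test_size=0.30, random_state=42)
c = tree.DecisionTreeClassifier()
# reshape(-1, 1) turns the 1D column into a 2D (n_samples, 1) matrix.
c.fit(X_train.values.reshape(-1, 1), y_train)
# Accuracy is the number of correct predictions divided by the number of samples.
accu_train = np.sum(c.predict(X_train.values.reshape(-1, 1)) == y_train) / len(y_train)
accu_test = np.sum(c.predict(X_test.values.reshape(-1, 1)) == y_test) / len(y_test)
print("Accuracy on Train: ", accu_train)
print("Accuracy on Test: ", accu_test)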

I'm getting the following output:

Accuracy on Train:  0.8857142857142857
Accuracy on Test:  0.7333333333333333
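For reference, the accuracy here is just the fraction of predictions that match the true labels. Below is a small sketch with made-up labels (the accuracy helper is a hypothetical name, not part of scikit-learn). With the 70/30 split of the 100 rows, the figures above are consistent with 62 of 70 and 22 of 30 correct predictions.

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of predictions equal to the true labels (illustrative helper)."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    # The comparison yields a boolean array; its mean is the share of matches.
    return float(np.mean(y_pred == y_true))

# Made-up labels for illustration: 4 of the 5 predictions are correct.
y_true = [1, 1, 2, 0, 1]
y_pred = [1, 2, 2, 0, 1]
acc = accuracy(y_pred, y_true)

# Ratios matching the reported train and test accuracies.
train_ratio = 62 / 70
test_ratio = 22 / 30
print(acc, train_ratio, test_ratio)
```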

Thanks for sharing the dataset; it was helpful for testing. Hope this helps.


@mutafaf commented Jul 15, 2020

Sorry, I can't answer your question, @parthsalunke. Hopefully someone with more experience can.
Best.
