

@gauravbansal98
Last active May 24, 2018 07:36
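import random

import numpy as np

# create_lexicon() and feature_vectors() are defined elsewhere in the original tutorial (not in this gist).

def create_feature_sets_and_labels(test_size=0.1):
    """Split [feature_vector, label] pairs into training and testing sets.

    test_size is the fraction of the data held out for testing (0.1 = 10%).
    """
    lexicon = create_lexicon()
    features = []
    features += feature_vectors('pos.txt', lexicon, [1, 0])  # positive samples, one-hot label [1, 0]
    features += feature_vectors('neg.txt', lexicon, [0, 1])  # negative samples, one-hot label [0, 1]
    random.shuffle(features)  # mix positive and negative samples before splitting
    # Each entry is a [feature_vector, label] pair; dtype=object keeps these ragged pairs intact
    features = np.array(features, dtype=object)
    testing_size = int(test_size * len(features))  # number of samples held out for testing
    split = len(features) - testing_size  # avoids the [:-0] pitfall when testing_size is 0
    train_x = list(features[:split, 0])  # column 0 holds the feature vectors
    train_y = list(features[:split, 1])  # column 1 holds the one-hot labels
    test_x = list(features[split:, 0])
    test_y = list(features[split:, 1])
    return train_x, train_y, test_x, test_y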
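The helpers `create_lexicon()` and `feature_vectors()` are not part of this gist, so the following is only a minimal, self-contained sketch of the same shuffle-and-split idea under stated assumptions. `toy_feature_vectors` is a hypothetical stand-in that builds bag-of-words counts over a small hard-coded lexicon; the real helper reads `pos.txt`/`neg.txt` and may build its vectors differently. `split_features` is likewise a hypothetical name. It splits at a computed index instead of slicing with `[:-testing_size]`.

```python
import random

import numpy as np


def toy_feature_vectors(lines, lexicon, label):
    """Hypothetical stand-in for feature_vectors(): bag-of-words counts over `lexicon`."""
    index = {word: i for i, word in enumerate(lexicon)}
    out = []
    for line in lines:
        counts = np.zeros(len(lexicon))
        for word in line.lower().split():
            if word in index:
                counts[index[word]] += 1
        out.append([counts.tolist(), label])  # same [feature_vector, label] pair shape
    return out


def split_features(features, test_size=0.1, seed=None):
    """Shuffle [feature, label] pairs, then split them into train/test lists."""
    features = list(features)
    random.Random(seed).shuffle(features)
    testing_size = int(test_size * len(features))
    split = len(features) - testing_size  # index where the test set begins
    train_x = [x for x, _ in features[:split]]
    train_y = [y for _, y in features[:split]]
    test_x = [x for x, _ in features[split:]]
    test_y = [y for _, y in features[split:]]
    return train_x, train_y, test_x, test_y


lexicon = ["good", "great", "bad", "awful"]  # hypothetical toy lexicon
pos = ["good movie", "great great fun", "good and great", "really good", "great cast"]
neg = ["bad plot", "awful acting", "bad and awful", "so bad", "awful awful"]

features = []
features += toy_feature_vectors(pos, lexicon, [1, 0])  # [1, 0] marks positive
features += toy_feature_vectors(neg, lexicon, [0, 1])  # [0, 1] marks negative
train_x, train_y, test_x, test_y = split_features(features, test_size=0.2, seed=0)
print(len(train_x), len(test_x))
```

With 10 samples and `test_size=0.2`, 8 samples go to training and 2 to testing. With `test_size=0.05`, `int(0.5)` is 0, so all 10 samples stay in the training set. Slicing with `[:-0]` would instead leave the training set empty. Unpacking the pairs with list comprehensions also avoids building a ragged NumPy array at all.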