@dhfromkorea
Created September 11, 2017 19:55
keyword example
# This is the most naive model:
# find occurrences of the listed keywords and treat the blocks containing them as matches.
# TODO: apply a bag-of-words model or similar and assign a credit score.
# read raw caption files
caption_data, metadata = next(load_caption_files(path_to_caption_file))
# group captions into 10-second time blocks
X = split_caption_to_X(caption_data)
keywords = ['caption', 'type=story', 'type=commercial']
model = KeywordSearch(keywords)
is_matched = model.predict(X)
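# is_matched is a boolean vector with one entry per time block
# e.g. [True, False, False, False, ... True, ... False]
# select the matched blocks (assumes X supports boolean-mask indexing, e.g. a NumPy array)
matched_lines = X[is_matched]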
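
The snippet above relies on a `KeywordSearch` class that the gist does not show. As a rough illustration only, here is a minimal sketch of what such a model might look like, assuming each time block is a plain string and that a match means a case-insensitive substring hit. The class body and the sample blocks are hypothetical, not the author's implementation.

```python
import numpy as np


class KeywordSearch:
    """Hypothetical stand-in for the gist's KeywordSearch (not the author's code)."""

    def __init__(self, keywords):
        # Lowercase once so matching is case-insensitive (an assumption).
        self.keywords = [k.lower() for k in keywords]

    def predict(self, X):
        # One boolean per block: True if the block contains any keyword as a substring.
        return np.array(
            [any(k in str(block).lower() for k in self.keywords) for block in X],
            dtype=bool,
        )


# Made-up time blocks, for illustration only.
X = np.array([
    "CAPTION: good evening",
    "weather update for tonight",
    "type=commercial buy now",
])
model = KeywordSearch(["caption", "type=story", "type=commercial"])
is_matched = model.predict(X)
matched_lines = X[is_matched]  # boolean mask selects the matching blocks
```

Returning a NumPy boolean array keeps `predict` usable directly as a mask, which is what the `X[is_matched]` line depends on; with a plain Python list for `X`, a list comprehension over `zip(X, is_matched)` would be needed instead.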