@rahulbot
Last active February 22, 2018 00:52
Querying Media Cloud for the top recent tags on stories
import mediacloud.api
MC_API_KEY = "MY_KEY"  # replace with your own Media Cloud API key
mc = mediacloud.api.AdminMediaCloud(MC_API_KEY)
# tag sets that hold tags on stories
NYT_LABELS_TAG_SET = 1963 # one tag per theme in a story (Jasmin's transfer-learning model)
GEO_TAG_SET = 1011 # one tag per country/state stories are about (disambiguated)
CLIFF_ORGS_TAG_SET = 2388 # one tag for each org mentioned in stories
CLIFF_PEOPLE_TAG_SET = 2389 # one tag for each person mentioned in stories
US_PEW_TOP_MEDIA_COLLECTION_ID = 9139487
# find the most used tags within a set over the last three months in the US Top Online set of sources
mc.sentenceFieldCount('*', [
        'tags_id_media:{}'.format(US_PEW_TOP_MEDIA_COLLECTION_ID),
        'publish_date:[NOW-3MONTH TO NOW]'  # Solr range syntax: [start TO end]
    ],
    tag_sets_id=CLIFF_PEOPLE_TAG_SET,
    sample_size=5000)
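The call above returns the tag counts but does nothing with them. Below is a minimal sketch of ranking the results, assuming the response is a list of dicts with `count`, `tag`, and `label` keys. That shape is an assumption, so check it against what your client version actually returns. `top_tags` and `sample_results` are hypothetical names introduced for illustration, and the rows are made-up placeholders, not real Media Cloud data.

```python
def top_tags(results, limit=10):
    """Return (name, count) pairs for the most frequent tags, highest count first.

    Assumes each row is a dict with 'count', 'tag', and 'label' keys;
    falls back to the raw tag when the label is missing or empty.
    """
    ranked = sorted(results, key=lambda row: row.get("count", 0), reverse=True)
    return [(row.get("label") or row.get("tag"), row.get("count", 0))
            for row in ranked[:limit]]


# Hypothetical rows in the assumed response shape (placeholders, not real data)
sample_results = [
    {"tags_id": 1, "tag": "person_a", "label": "Person A", "count": 12},
    {"tags_id": 2, "tag": "person_b", "label": None, "count": 40},
    {"tags_id": 3, "tag": "person_c", "label": "Person C", "count": 7},
]

for name, count in top_tags(sample_results, limit=2):
    print("{}: {}".format(name, count))
```

To use it with the live query, assign the return value of `mc.sentenceFieldCount(...)` to a variable and pass that in place of `sample_results`.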