livibetter/README.md

## README.md

    Better YouTube Recommendation

yt-reco.py was written to filter recommendations and to avoid repeatedly seeing the same videos. Once a video has been shown, its video ID is stored in a JSON file for lookup on the next invocation.
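
The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the helper names `load_seen`, `save_seen`, and `unseen` are hypothetical, and the file is assumed to hold a plain JSON list of video IDs.

```python
import json
import os


def load_seen(json_path):
  """Return the set of video IDs stored in json_path (empty if no file yet)."""
  if not os.path.exists(json_path):
    return set()
  with open(json_path) as f:
    return set(json.load(f))


def save_seen(json_path, seen):
  """Write the video IDs back to json_path as a JSON list."""
  with open(json_path, 'w') as f:
    json.dump(sorted(seen), f, indent=0)


def unseen(video_ids, seen):
  """Keep only the IDs that have not been shown before, preserving order."""
  return [vid for vid in video_ids if vid not in seen]
```

On each run, the IDs returned by `unseen` would be rendered and then merged into the stored set, so they are skipped next time.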

installation


Save the script somewhere.


Run the script to authorize it:
$ /path/to/yt-reco.py -s /path/to/yt-reco.dat --auth
Follow the instructions; a /path/to/yt-reco.dat file should then be created.


Run the script to process:
$ /path/to/yt-reco.py -s /path/to/yt-reco.dat -j /path/to/yt-reco.json > output.html


You can also set up a cron job to do it automatically:
With cron

I have this cron task to run the helper script at five minutes past every third hour, starting from midnight:
5     */3   *     *       *       /path/to/yt-reco.sh

and this yt-reco.sh helper script:
#!/bin/bash

SCRIPT=/path/to/yt-reco.py
JSON=/path/to/yt-reco.json
STORAGE=/path/to/yt-reco.dat
printf -v TS '%(%Y%m%d-%H%M%S)T' -1
DEST="/path/to/yt-reco.$TS.html"

OUTPUT=$(PYTHONPATH="/path/to/custom/filter" "$SCRIPT" -s "$STORAGE" -j "$JSON" -f yt_reco_custom)
if [[ $OUTPUT ]]; then
  echo "$OUTPUT" > "$DEST"
fi
This helper script only writes the HTML file when there is output, and the filename is timestamped. I also have a custom filter script, /path/to/custom/filter/yt_reco_custom.py. The Bash script adds its directory to PYTHONPATH so that it can be found.
usage

usage: yt-reco.py [-h] [-s STORAGE] [-j JSON] [-f FILTER] [-a]

optional arguments:
  -h, --help            show this help message and exit
  -s STORAGE, --storage STORAGE
                        the credential file (default: yt-reco.dat)
  -j JSON, --json JSON  showed videos save file in JSON (default:
                        /home/livibetter/p/gist/yt-reco/yt-reco.py.json)
  -f FILTER, --filter FILTER
                        Python script to handle filter
  -a, --always          always output HTML, even no videos

STORAGE is where the credentials are stored, and JSON is where the list of video IDs is stored.
custom filter

You can use your own filter, for example:
$ ./yt-reco.py -f custom
There should be a custom.py matching the name custom given to the option. In that script, the FILTER function is used to filter:
def FILTER(item):

  return True
The code above does no filtering, that is, everything will be shown. The following code is equivalent to running without -f:
def FILTER(item):

  return item['snippet']['type'] == 'recommendation'
You can filter any way you like; return True if you want the video to be shown. The item structure can be found in the API documentation.
Here is the actual code I use to exclude certain channels by channel ID and/or title:
BLOCKED_CHANNEL_IDS = [
  'tHeChhNALLID',
]


def FILTER(item):

  s = item['snippet']
  channelId = s['channelId']

  c = item['contentDetails']

  if channelId in BLOCKED_CHANNEL_IDS:
    return False

  if s['type'] == 'upload':
    title = s['title']
    if channelId == 'AnOTheRCHANNelID' and 'unwanted string' in title:
      return False

  return True
I only use simple string checks because that is all I need for now; I might use regular expressions in the future when there is a need for more advanced filtering.
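
A regular-expression variant could look something like the sketch below. It is only an illustration under assumptions: the `BLOCKED_TITLE_PATTERNS` table and its pattern are hypothetical, and the channel ID is the same placeholder used above.

```python
import re

# Hypothetical table: channel ID -> compiled pattern for unwanted titles.
BLOCKED_TITLE_PATTERNS = {
  'AnOTheRCHANNelID': re.compile(r'unwanted (string|phrase)', re.IGNORECASE),
}


def FILTER(item):

  s = item['snippet']
  pattern = BLOCKED_TITLE_PATTERNS.get(s['channelId'])

  # Only uploads from a listed channel are checked against its pattern.
  if pattern and s['type'] == 'upload' and pattern.search(s.get('title', '')):
    return False

  return True
```

Keeping the patterns in a dictionary keyed by channel ID means adding a rule is a one-line change, and channels without a rule are never matched against anything.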
type


recommendation: a recommended video; the video that is the reason for the recommendation is linked from the "recommendation" text in the top-right corner.
upload: the channel owner uploaded the video.
comment, like: the channel owner commented on or liked the video.
playlistItem: the video was added to a playlist; the playlist item is linked from the "playlistItem" text in the top-right corner.
subscription: this type is useless because the returned JSON has no data about what the channel owner subscribed to.

There are more types and they may cause trouble, but I've only dealt with those above.
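
Since only some types are useful, one option is a custom filter that keeps an explicit allowlist of types and drops everything else. This is a small sketch, not part of the script; the `WANTED_TYPES` name and the choice of types in it are assumptions for illustration.

```python
# Hypothetical allowlist; adjust to the types you actually want to see.
WANTED_TYPES = frozenset(['recommendation', 'upload', 'playlistItem'])


def FILTER(item):

  # Unknown or unwanted types (for example 'subscription') are dropped.
  return item['snippet']['type'] in WANTED_TYPES
```

Saved as a module on PYTHONPATH, this would be selected with the -f option like any other custom filter.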
related links


my blog post about this script.


## example.png

## yt-reco.py
#!/usr/bin/env python
# Better YouTube Recommendations
# Copyright (c) 2013, 2015 Yu-Jie Lin
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
# Gist: https://gist.github.com/livibetter/7428568

from __future__ import print_function

import argparse
import datetime
import imp
import json
import os
import sys
import time
from os import path

import httplib2

from apiclient.discovery import build
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage as BaseStorage
from oauth2client.tools import argparser, run_flow

API_STORAGE = 'yt-reco.dat'

API_PART = 'snippet,contentDetails'
API_FIELDS = 'items(contentDetails,snippet),nextPageToken'

# default JSON file is in current working directory
DEFAULT_JSON = path.abspath(path.basename(sys.argv[0]) + '.json')
# extract video ID from thumbnail URL
#   https://i.ytimg.com/vi/VIDEO_ID/default.jpg
# the videoId field deep in items[].contentDetails isn't nested same:
#   contentDetails.upload.videoId
#   contentDetails.recommendation.resourceId.videoId
ITEM_THUMB_URL = lambda item: item['snippet']['thumbnails']['default']['url']
ITEM_VIDEO_ID = lambda item: ITEM_THUMB_URL(item).rsplit('/', 2)[-2]
FILTER_RECO_ONLY = lambda item: item['snippet']['type'] == 'recommendation'

TMPL_HTML = u'''\
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Better YouTube Recommendations</title>
<style>
body {{
  font-family: Arial;
}}
a {{
  color: #4AF;
  text-decoration: none;
  font-weight: bold;
}}
a:hover {{
  text-decoration: underline;
}}
.wrapper {{
  width: 640px;
  margin: 0 auto;
}}
.video {{
  margin: 10px 0;
}}
.video > .type {{
  float: right;
  color: #04C;
  font-style: italic;
}}
.video > .type > a {{
  color: #04C;
}}
.video > .thumbnail {{
  width: 120px;
  float: left;
}}
.video > .snippet {{
  margin-left: 130px;
}}
.video > .snippet > .channelTitle {{
  font-weight: bold;
  line-height: 1.5em;
}}
.video > .snippet > .channelTitle > a {{
  color: #666;
}}
.video > .snippet > .title {{
  line-height: 1.5em;
}}
.video > .description {{
  clear: both;
  padding-top: 10px;
  white-space: pre-wrap;
  line-height: 1.2em;
  max-height: 12em;
  overflow: auto;
}}
.generated {{
  clear: both;
  text-align: center;
}}
</style>
</head>
<body>
<div class='wrapper'>
{items}
<div class='generated'>
Generated at <time datetime='{utc}'>{now}</time>
</div>
</div>
</body>
</html>'''
TMPL_ITEM_BASE = u'''\
<div class='video'>
  <div class='type'>{snippet[type]}</div>
  <div class='thumbnail'>
    <img src="{snippet[thumbnails][default][url]}"/>
  </div>
  <div class='snippet'>
    <div class='channelTitle'>
      <a href="https://www.youtube.com/channel/{snippet[channelId]}">\
{snippet[channelTitle]}</a>
    </div>
    <div class='title'>
      <a href="{url}">{snippet[title]}</a>
    </div>
  </div>
  <pre class='description'>{snippet[description]}</pre>
</div>
'''
TMPL_ITEM_PL = u'''\
<div class='video'>
  <div class='type'>
    <a href="https://www.youtube.com/watch?\
v={contentDetails[playlistItem][resourceId][videoId]}&\
list={contentDetails[playlistItem][playlistId]}\
">{snippet[type]}</a>
  </div>
  <div class='thumbnail'>
    <img src="{snippet[thumbnails][default][url]}"/>
  </div>
  <div class='snippet'>
    <div class='channelTitle'>
      <a href="https://www.youtube.com/channel/{snippet[channelId]}">\
{snippet[channelTitle]}</a>
    </div>
    <div class='title'>
      <a href="{url}">{snippet[title]}</a>
    </div>
  </div>
  <pre class='description'>{snippet[description]}</pre>
</div>
'''
TMPL_ITEM_RECO = u'''\
<div class='video'>
  <div class='type'>
    <a href="\
http://youtu.be/{contentDetails[recommendation][seedResourceId][videoId]}\
">{snippet[type]}</a>
  </div>
  <div class='thumbnail'>
    <img src="{snippet[thumbnails][default][url]}"/>
  </div>
  <div class='snippet'>
    <div class='channelTitle'>
      <a href="https://www.youtube.com/channel/{snippet[channelId]}">\
{snippet[channelTitle]}</a>
    </div>
    <div class='title'>
      <a href="{url}">{snippet[title]}</a>
    </div>
  </div>
  <pre class='description'>{snippet[description]}</pre>
</div>
'''
TMPL_ITEM_SUB = u'''\
<div class='video'>
  <div class='type'>{snippet[type]}</div>
  <div class='thumbnail'>
    <img src="{snippet[thumbnails][default][url]}"/>
  </div>
  <div class='snippet'>
    <div class='channelTitle'>
      <a href="\
https://www.youtube.com/channel/\
{contentDetails[subscription][resourceId][channelId]}">\
{snippet[channelTitle]}</a>
    </div>
  </div>
  <pre class='description'></pre>
</div>
'''
TMPL_ITEM = {
  'base': TMPL_ITEM_BASE,
  'playlistItem': TMPL_ITEM_PL,
  'recommendation': TMPL_ITEM_RECO,
  'subscription': TMPL_ITEM_SUB,
}


class Storage(BaseStorage):
  """Inherit the API Storage to suppress CredentialsFileSymbolicLinkError
  """

  def __init__(self, filename):

    super(Storage, self).__init__(filename)
    self._filename_link_warned = False

  def _validate_file(self):

    if os.path.islink(self._filename) and not self._filename_link_warned:
      print('File: %s is a symbolic link.' % self._filename)
      self._filename_link_warned = True


def auth(filename):

  FLOW = OAuth2WebServerFlow(
    '129265312357.apps.googleusercontent.com',
    'IFN3AD_F0MbYEM4jQyeNbEYJ',
    'https://www.googleapis.com/auth/youtube.readonly',
    auth_uri='https://accounts.google.com/o/oauth2/auth',
    token_uri='https://accounts.google.com/o/oauth2/token',
  )

  storage = Storage(filename)
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    credentials = run_flow(FLOW, storage, argparser.parse_args([]))

  http = httplib2.Http()
  return build("youtube", "v3", http=credentials.authorize(http))


def main():

  p = argparse.ArgumentParser(description='better YouTube recommendations')
  p.add_argument('-s', '--storage', default=API_STORAGE,
                 help='the credential file (default: %(default)s)')
  p.add_argument('-j', '--json', default=DEFAULT_JSON,
                 help='showed videos save file in JSON (default: %(default)s)')
  p.add_argument('-f', '--filter', help='Python script to handle filter')
  p.add_argument('-a', '--always', action='store_true',
                 help='always output HTML, even no videos')
  p.add_argument('--auth', action='store_true',
                 help='authorize the script and exit')
  args = p.parse_args()

  yt = auth(args.storage)
  if args.auth:
    return

  FILTER = FILTER_RECO_ONLY
  # load custom filter
  if args.filter:
    _mod_data = imp.find_module(args.filter)
    try:
      FILTER = imp.load_module(args.filter, *_mod_data).FILTER
    finally:
      if _mod_data[0]:
        _mod_data[0].close()

  # get showed list
  LIST = []
  if path.exists(args.json):
    with open(args.json) as f:
      LIST = json.load(f)

  videos = []
  req = yt.activities().list(part=API_PART, home=True, maxResults=50,
                             fields=API_FIELDS)
  while req:
    resp = req.execute()

    items = filter(lambda item: ITEM_VIDEO_ID(item) not in LIST, resp['items'])
    videos += filter(FILTER, items)

    req = yt.activities().list_next(req, resp)

  items = []
  for v in videos:
    if v['snippet']['type'] in TMPL_ITEM:
      tmpl = TMPL_ITEM[v['snippet']['type']]
    else:
      tmpl = TMPL_ITEM['base']

    # TODO temporarily keep this try catch for debugging
    try:
      v['url'] = 'http://youtu.be/' + ITEM_VIDEO_ID(v)
      items.append(tmpl.format(**v))
    except KeyError as e:
      print(repr(e), file=sys.stderr)
      import pprint
      pprint.pprint(v, stream=sys.stderr)

  if videos or args.always:
    now = datetime.datetime.now()
    utc = datetime.datetime.utcfromtimestamp(time.mktime(now.timetuple()))
    print(TMPL_HTML.format(items=u'\n'.join(items),
                           utc=utc.isoformat() + 'Z',
                           now=now.ctime()).encode('utf8'))

  if videos:
    with open(args.json, 'w') as f:
      json.dump(list(set(LIST + map(ITEM_VIDEO_ID, videos))), f, indent=0)


if __name__ == '__main__':
  main()