@livibetter
Last active January 17, 2021 23:55
Better YouTube recommendation and more

Better YouTube Recommendation

yt-reco.py was written to filter YouTube recommendations and to avoid seeing the same videos repeatedly. Once a video has been shown, its video ID is stored in a JSON file, which is consulted on the next invocation.
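The bookkeeping described above comes down to a load, filter, save cycle. Below is a minimal Python 3 sketch of it, assuming the JSON file holds a flat list of video ID strings (which is what yt-reco.py writes). The function names load_shown, unseen and save_shown are hypothetical and not part of the script.

```python
import json
import os
import tempfile


def load_shown(json_path):
    """Return the list of video IDs shown in earlier runs ([] on the first run)."""
    if not os.path.exists(json_path):
        return []
    with open(json_path) as f:
        return json.load(f)


def unseen(video_ids, shown):
    """Keep only the IDs that were not shown before, preserving order."""
    seen = set(shown)
    return [v for v in video_ids if v not in seen]


def save_shown(json_path, shown, new_ids):
    """Persist the union of old and new IDs for the next invocation."""
    with open(json_path, 'w') as f:
        json.dump(sorted(set(shown) | set(new_ids)), f, indent=0)


if __name__ == '__main__':
    # Hypothetical IDs: the first run shows both, the second only the new one.
    store = os.path.join(tempfile.mkdtemp(), 'yt-reco.json')
    first = unseen(['id1', 'id2'], load_shown(store))
    save_shown(store, load_shown(store), first)
    print(first)                                      # ['id1', 'id2']
    print(unseen(['id2', 'id3'], load_shown(store)))  # ['id3']
```

Storing the union rather than overwriting means the file only grows; yt-reco.py does the same with list(set(...)).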

example HTML

installation

  1. Save the script somewhere.

  2. Run the script to authorize it:

    $ /path/to/yt-reco.py -s /path/to/yt-reco.dat --auth

    Follow the instructions; afterwards, /path/to/yt-reco.dat should have been created.

  3. Run the script to generate the HTML page:

    $ /path/to/yt-reco.py -s /path/to/yt-reco.dat -j /path/to/yt-reco.json > output.html

You can also set up a cron job to do this automatically:

With cron

I have this cron job run the helper script at five minutes past every third hour, starting from midnight:

5     */3   *     *       *       /path/to/yt-reco.sh

and this yt-reco.sh helper script:

#!/bin/bash

SCRIPT=/path/to/yt-reco.py
JSON=/path/to/yt-reco.json
STORAGE=/path/to/yt-reco.dat
printf -v TS '%(%Y%m%d-%H%M%S)T' -1
DEST="/path/to/yt-reco.$TS.html"

OUTPUT=$(PYTHONPATH="/path/to/custom/filter" "$SCRIPT" -s "$STORAGE" -j "$JSON" -f yt_reco_custom)
if [[ $OUTPUT ]]; then
  echo "$OUTPUT" > "$DEST"
fi

This helper script only writes the HTML file when there is output, and the file name is timestamped. I also have a custom filter script, /path/to/custom/filter/yt_reco_custom.py; as you can see, the Bash script adds its directory to PYTHONPATH so that it can be found.
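For comparison, here is a rough Python 3 equivalent of the write-only-when-there-is-output logic in yt-reco.sh. It is a sketch: write_if_output is a hypothetical name, and it only mimics the Bash details that $(...) strips trailing newlines and echo adds one back.

```python
import datetime
import os
import tempfile


def write_if_output(output, dest_dir, now):
    """Write output to yt-reco.<timestamp>.html in dest_dir if it is non-empty.

    Returns the path written, or None when there was nothing to write.
    """
    # $(...) in Bash drops trailing newlines, so "\n" alone counts as empty.
    output = output.rstrip('\n')
    if not output:
        return None
    # Same format as printf '%(%Y%m%d-%H%M%S)T' in the helper script.
    stamp = now.strftime('%Y%m%d-%H%M%S')
    dest = os.path.join(dest_dir, 'yt-reco.%s.html' % stamp)
    with open(dest, 'w') as f:
        f.write(output + '\n')  # echo appends a newline
    return dest


if __name__ == '__main__':
    d = tempfile.mkdtemp()
    when = datetime.datetime(2021, 1, 17, 23, 55, 0)
    print(write_if_output('', d, when))  # None
    print(os.path.basename(write_if_output('<html></html>\n', d, when)))
    # yt-reco.20210117-235500.html
```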

usage

usage: yt-reco.py [-h] [-s STORAGE] [-j JSON] [-f FILTER] [-a] [--auth]

optional arguments:
  -h, --help            show this help message and exit
  -s STORAGE, --storage STORAGE
                        the credential file (default: yt-reco.dat)
  -j JSON, --json JSON  showed videos save file in JSON (default:
                        /home/livibetter/p/gist/yt-reco/yt-reco.py.json)
  -f FILTER, --filter FILTER
                        Python script to handle filter
  -a, --always          always output HTML, even no videos
  --auth                authorize the script and exit

STORAGE is where the credentials are stored, and JSON is where the list of shown video IDs is stored.

custom filter

You can use your own filter, for example:

$ ./yt-reco.py -f custom

There should be a custom.py matching the custom given to the option, importable from the module search path (for example, via PYTHONPATH). In that module, the FILTER function decides which items are kept:

def FILTER(item):

  return True

The code above does no filtering; that is, everything will be shown. The following code is equivalent to running without -f:

def FILTER(item):

  return item['snippet']['type'] == 'recommendation'

You can filter any way you like; return True if you want the video to be shown. The structure of item is described in the YouTube Data API documentation for the activities resource.
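As for how -f custom becomes a FILTER function: yt-reco.py looks the module up with the old imp module, which searches sys.path (hence the PYTHONPATH trick in the helper script). imp was removed in Python 3.12, so here is a Python 3 sketch of the same lookup using importlib.import_module instead. load_filter is a hypothetical helper, not part of the script.

```python
import importlib
import os
import sys
import tempfile


def load_filter(name):
    """Import module `name` from sys.path and return its FILTER function."""
    # Modules created after start-up may be missed by cached path finders.
    importlib.invalidate_caches()
    return importlib.import_module(name).FILTER


if __name__ == '__main__':
    # Write a throwaway custom.py that shows everything, then load it.
    d = tempfile.mkdtemp()
    with open(os.path.join(d, 'custom.py'), 'w') as f:
        f.write('def FILTER(item):\n    return True\n')
    sys.path.insert(0, d)  # roughly what PYTHONPATH=/that/dir does
    FILTER = load_filter('custom')
    print(FILTER({'snippet': {'type': 'upload'}}))  # True
```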

Here is the actual code I am using to exclude certain channels by channel ID and/or title:

BLOCKED_CHANNEL_IDS = [
  'tHeChhNALLID',
]


def FILTER(item):

  s = item['snippet']
  channelId = s['channelId']

  c = item['contentDetails']

  if channelId in BLOCKED_CHANNEL_IDS:
    return False

  if s['type'] == 'upload':
    title = s['title']
    if channelId == 'AnOTheRCHANNelID' and 'unwanted string' in title:
      return False

  return True

I only use simple string checks because that is all I need for now; I might switch to regular expressions in the future if I need more advanced filtering.
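If plain string checks stop being enough, a regular-expression version of the filter above might look like the following Python 3 sketch. BLOCKED_TITLES and its patterns are hypothetical placeholders; the channel IDs are the same dummy values used above.

```python
import re

BLOCKED_CHANNEL_IDS = [
    'tHeChhNALLID',
]

# Hypothetical per-channel title patterns; matching uploads are hidden.
BLOCKED_TITLES = {
    'AnOTheRCHANNelID': re.compile(r'unwanted string|live ?stream', re.IGNORECASE),
}


def FILTER(item):
    s = item['snippet']
    channelId = s['channelId']

    if channelId in BLOCKED_CHANNEL_IDS:
        return False

    # Only uploads from channels with a pattern are checked against the title.
    pattern = BLOCKED_TITLES.get(channelId)
    if s['type'] == 'upload' and pattern is not None and pattern.search(s['title']):
        return False

    return True
```

Compiling the patterns once at module level keeps the per-item cost low, since FILTER is called for every activity item.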

type

  • recommendation: the video the recommendation is based on is linked from the "recommendation" text in the top-right corner.
  • upload: the channel owner uploaded the video.
  • comment, like: the channel owner commented on or liked the video.
  • playlistItem: the video was added to a playlist; the playlist item is linked from the "playlistItem" text in the top-right corner.
  • subscription: this type is useless because there is no data about what the channel owner is subscribed to in the returned JSON.

There are more types, and they may cause trouble, but I have only dealt with those above.
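The links behind the type labels come from contentDetails, and, as the HTML templates in the script show, the path to the video ID differs per type. Here is a small Python 3 sketch of that mapping, assuming the same field names the templates use. type_link is a hypothetical helper, and the sample item is made up.

```python
def type_link(item):
    """Return the URL the type label links to, or None for unlinked types."""
    kind = item['snippet']['type']
    details = item['contentDetails']
    if kind == 'recommendation':
        # the video the recommendation is based on
        seed = details['recommendation']['seedResourceId']['videoId']
        return 'http://youtu.be/' + seed
    if kind == 'playlistItem':
        pl = details['playlistItem']
        return 'https://www.youtube.com/watch?v=%s&list=%s' % (
            pl['resourceId']['videoId'], pl['playlistId'])
    # upload, comment, like, subscription, ...: plain text label
    return None


if __name__ == '__main__':
    sample = {
        'snippet': {'type': 'playlistItem'},
        'contentDetails': {'playlistItem': {'resourceId': {'videoId': 'VID'},
                                            'playlistId': 'PL123'}},
    }
    print(type_link(sample))  # https://www.youtube.com/watch?v=VID&list=PL123
```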

related links

#!/usr/bin/env python
# Better YouTube Recommendations
# Copyright (c) 2013, 2015 Yu-Jie Lin
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
# Gist: https://gist.github.com/livibetter/7428568
from __future__ import print_function
import argparse
import datetime
import imp
import json
import os
import sys
import time
from os import path
import httplib2
from apiclient.discovery import build
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage as BaseStorage
from oauth2client.tools import argparser, run_flow
API_STORAGE = 'yt-reco.dat'
API_PART = 'snippet,contentDetails'
API_FIELDS = 'items(contentDetails,snippet),nextPageToken'
# default JSON file is in current working directory
DEFAULT_JSON = path.abspath(path.basename(sys.argv[0]) + '.json')
# extract video ID from thumbnail URL
# https://i.ytimg.com/vi/VIDEO_ID/default.jpg
# the videoId field deep in items[].contentDetails isn't nested the same way for every type:
# contentDetails.upload.videoId
# contentDetails.recommendation.resourceId.videoId
ITEM_THUMB_URL = lambda item: item['snippet']['thumbnails']['default']['url']
ITEM_VIDEO_ID = lambda item: ITEM_THUMB_URL(item).rsplit('/', 2)[-2]
FILTER_RECO_ONLY = lambda item: item['snippet']['type'] == 'recommendation'
TMPL_HTML = u'''\
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Better YouTube Recommendations</title>
<style>
body {{
font-family: Arial;
}}
a {{
color: #4AF;
text-decoration: none;
font-weight: bold;
}}
a:hover {{
text-decoration: underline;
}}
.wrapper {{
width: 640px;
margin: 0 auto;
}}
.video {{
margin: 10px 0;
}}
.video > .type {{
float: right;
color: #04C;
font-style: italic;
}}
.video > .type > a {{
color: #04C;
}}
.video > .thumbnail {{
width: 120px;
float: left;
}}
.video > .snippet {{
margin-left: 130px;
}}
.video > .snippet > .channelTitle {{
font-weight: bold;
line-height: 1.5em;
}}
.video > .snippet > .channelTitle > a {{
color: #666;
}}
.video > .snippet > .title {{
line-height: 1.5em;
}}
.video > .description {{
clear: both;
padding-top: 10px;
white-space: pre-wrap;
line-height: 1.2em;
max-height: 12em;
overflow: auto;
}}
.generated {{
clear: both;
text-align: center;
}}
</style>
</head>
<body>
<div class='wrapper'>
{items}
<div class='generated'>
Generated at <time datetime='{utc}'>{now}</time>
</div>
</div>
</body>
</html>'''
TMPL_ITEM_BASE = u'''\
<div class='video'>
<div class='type'>{snippet[type]}</div>
<div class='thumbnail'>
<img src="{snippet[thumbnails][default][url]}"/>
</div>
<div class='snippet'>
<div class='channelTitle'>
<a href="https://www.youtube.com/channel/{snippet[channelId]}">\
{snippet[channelTitle]}</a>
</div>
<div class='title'>
<a href="{url}">{snippet[title]}</a>
</div>
</div>
<pre class='description'>{snippet[description]}</pre>
</div>
'''
TMPL_ITEM_PL = u'''\
<div class='video'>
<div class='type'>
<a href="https://www.youtube.com/watch?\
v={contentDetails[playlistItem][resourceId][videoId]}&\
list={contentDetails[playlistItem][playlistId]}\
">{snippet[type]}</a>
</div>
<div class='thumbnail'>
<img src="{snippet[thumbnails][default][url]}"/>
</div>
<div class='snippet'>
<div class='channelTitle'>
<a href="https://www.youtube.com/channel/{snippet[channelId]}">\
{snippet[channelTitle]}</a>
</div>
<div class='title'>
<a href="{url}">{snippet[title]}</a>
</div>
</div>
<pre class='description'>{snippet[description]}</pre>
</div>
'''
TMPL_ITEM_RECO = u'''\
<div class='video'>
<div class='type'>
<a href="\
http://youtu.be/{contentDetails[recommendation][seedResourceId][videoId]}\
">{snippet[type]}</a>
</div>
<div class='thumbnail'>
<img src="{snippet[thumbnails][default][url]}"/>
</div>
<div class='snippet'>
<div class='channelTitle'>
<a href="https://www.youtube.com/channel/{snippet[channelId]}">\
{snippet[channelTitle]}</a>
</div>
<div class='title'>
<a href="{url}">{snippet[title]}</a>
</div>
</div>
<pre class='description'>{snippet[description]}</pre>
</div>
'''
TMPL_ITEM_SUB = u'''\
<div class='video'>
<div class='type'>{snippet[type]}</div>
<div class='thumbnail'>
<img src="{snippet[thumbnails][default][url]}"/>
</div>
<div class='snippet'>
<div class='channelTitle'>
<a href="\
https://www.youtube.com/channel/\
{contentDetails[subscription][resourceId][channelId]}">\
{snippet[channelTitle]}</a>
</div>
</div>
<pre class='description'></pre>
</div>
'''
TMPL_ITEM = {
'base': TMPL_ITEM_BASE,
'playlistItem': TMPL_ITEM_PL,
'recommendation': TMPL_ITEM_RECO,
'subscription': TMPL_ITEM_SUB,
}


class Storage(BaseStorage):
  """Inherit the API Storage to suppress CredentialsFileSymbolicLinkError
  """

  def __init__(self, filename):
    super(Storage, self).__init__(filename)
    self._filename_link_warned = False

  def _validate_file(self):
    if os.path.islink(self._filename) and not self._filename_link_warned:
      # warn on stderr so the message does not end up in the HTML on stdout
      print('File: %s is a symbolic link.' % self._filename, file=sys.stderr)
      self._filename_link_warned = True


def auth(filename):
  FLOW = OAuth2WebServerFlow(
    '129265312357.apps.googleusercontent.com',
    'IFN3AD_F0MbYEM4jQyeNbEYJ',
    'https://www.googleapis.com/auth/youtube.readonly',
    auth_uri='https://accounts.google.com/o/oauth2/auth',
    token_uri='https://accounts.google.com/o/oauth2/token',
  )
  storage = Storage(filename)
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    credentials = run_flow(FLOW, storage, argparser.parse_args([]))
  http = httplib2.Http()
  return build("youtube", "v3", http=credentials.authorize(http))


def main():
  p = argparse.ArgumentParser(description='better YouTube recommendations')
  p.add_argument('-s', '--storage', default=API_STORAGE,
                 help='the credential file (default: %(default)s)')
  p.add_argument('-j', '--json', default=DEFAULT_JSON,
                 help='showed videos save file in JSON (default: %(default)s)')
  p.add_argument('-f', '--filter', help='Python script to handle filter')
  p.add_argument('-a', '--always', action='store_true',
                 help='always output HTML, even no videos')
  p.add_argument('--auth', action='store_true',
                 help='authorize the script and exit')
  args = p.parse_args()

  yt = auth(args.storage)
  if args.auth:
    return

  FILTER = FILTER_RECO_ONLY
  # load custom filter
  if args.filter:
    _mod_data = imp.find_module(args.filter)
    try:
      FILTER = imp.load_module(args.filter, *_mod_data).FILTER
    finally:
      if _mod_data[0]:
        _mod_data[0].close()

  # get showed list
  LIST = []
  if path.exists(args.json):
    with open(args.json) as f:
      LIST = json.load(f)

  videos = []
  req = yt.activities().list(part=API_PART, home=True, maxResults=50,
                             fields=API_FIELDS)
  while req:
    resp = req.execute()
    items = filter(lambda item: ITEM_VIDEO_ID(item) not in LIST, resp['items'])
    videos += filter(FILTER, items)
    req = yt.activities().list_next(req, resp)

  items = []
  for v in videos:
    if v['snippet']['type'] in TMPL_ITEM:
      tmpl = TMPL_ITEM[v['snippet']['type']]
    else:
      tmpl = TMPL_ITEM['base']
    # TODO temporarily keep this try catch for debugging
    try:
      v['url'] = 'http://youtu.be/' + ITEM_VIDEO_ID(v)
      items.append(tmpl.format(**v))
    except KeyError as e:
      print(repr(e), file=sys.stderr)
      import pprint
      pprint.pprint(v, stream=sys.stderr)

  if videos or args.always:
    now = datetime.datetime.now()
    utc = datetime.datetime.utcfromtimestamp(time.mktime(now.timetuple()))
    html = TMPL_HTML.format(items=u'\n'.join(items),
                            utc=utc.isoformat() + 'Z',
                            now=now.ctime())
    # write UTF-8 bytes: sys.stdout.buffer on Python 3, sys.stdout on Python 2
    out = getattr(sys.stdout, 'buffer', sys.stdout)
    out.write(html.encode('utf8') + b'\n')

  if videos:
    with open(args.json, 'w') as f:
      # set union works on Python 2 and 3 (map() returns an iterator on 3)
      json.dump(list(set(LIST) | set(map(ITEM_VIDEO_ID, videos))), f, indent=0)


if __name__ == '__main__':
  main()
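The core of main() pages through activities().list with list_next, drops IDs already in the JSON list, and applies FILTER. Below is an offline Python 3 sketch of that loop driven by nextPageToken, so it can be tried without API access. fetch_page is a hypothetical stand-in for executing one request, and the fake items carry only the fields the sketch reads.

```python
def video_id(item):
    """Same idea as ITEM_VIDEO_ID: .../vi/VIDEO_ID/default.jpg -> VIDEO_ID."""
    return item['snippet']['thumbnails']['default']['url'].rsplit('/', 2)[-2]


def collect(fetch_page, shown, keep):
    """Gather unseen items accepted by keep() from every page.

    fetch_page(token) returns {'items': [...], 'nextPageToken': ...}; the
    token is missing on the last page, which ends the loop.
    """
    seen = set(shown)
    videos = []
    token = None
    while True:
        resp = fetch_page(token)
        videos.extend(i for i in resp['items']
                      if video_id(i) not in seen and keep(i))
        token = resp.get('nextPageToken')
        if not token:
            return videos


def fake_item(vid, kind='recommendation'):
    """Build a hypothetical activity item with just a type and thumbnail URL."""
    url = 'https://i.ytimg.com/vi/%s/default.jpg' % vid
    return {'snippet': {'type': kind, 'thumbnails': {'default': {'url': url}}}}


if __name__ == '__main__':
    pages = {
        None: {'items': [fake_item('a'), fake_item('b', 'upload')],
               'nextPageToken': 'p2'},
        'p2': {'items': [fake_item('c'), fake_item('d')]},
    }
    reco_only = lambda i: i['snippet']['type'] == 'recommendation'
    found = collect(pages.__getitem__, ['c'], reco_only)
    print([video_id(v) for v in found])  # ['a', 'd']
```

Using a set for the already-shown IDs keeps each lookup constant-time; the script itself checks membership against the JSON list directly, which is fine for small lists.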