@Linusp
Created December 25, 2021 12:39
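Export Bilibili watch history data
  1. Log in to Bilibili

  2. Go to https://www.bilibili.com/account/history

  3. Open the Network panel in your browser's DevTools and refresh the page. In the Network panel, click any request whose response is JSON, then copy its request cookies into cookies.json. The script loads this file with json.load, so it must contain a JSON object mapping cookie names to values.

  4. Run the script to export the data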

    python export_bili_history.py -c cookies.json -o results.jsonl -p 10

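    The Bilibili API returns 20 records per page. To fetch more, change -p 10 to a larger number. At most the last three months of history can be exported; anything older is no longer available.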
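
Step 3 expects cookies.json to be a JSON object. If your browser only lets you copy the raw `Cookie` request header (a single `name=value; name=value` string), a small converter along these lines could produce the file. This is a minimal sketch: the helper name `cookie_header_to_dict` and the placeholder cookie names are hypothetical, not part of the original script.

```python
import json


def cookie_header_to_dict(header: str) -> dict:
    """Turn a raw Cookie header such as 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header.split(';'):
        pair = pair.strip()
        # Skip empty fragments and fragments without a name=value shape.
        if not pair or '=' not in pair:
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        name, value = pair.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == '__main__':
    # Placeholder header; paste the real string copied from DevTools instead.
    raw = 'name1=value1; name2=value2'
    print(json.dumps(cookie_header_to_dict(raw), indent=2))
```

Redirecting the printed output into cookies.json gives a file the export script can load with `-c cookies.json`.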

import json
import time

import click
import requests


@click.command()
@click.option("-c", "--cookie-file", required=True)
@click.option("-o", "--outfile", required=True)
@click.option("-p", "--page-num", type=int, default=10)
def main(cookie_file, outfile, page_num):
    # cookies.json must hold a JSON object mapping cookie names to values.
    with open(cookie_file, encoding='utf-8') as fin:
        cookies = json.load(fin)

    headers = {
        'Connection': 'keep-alive',
        'Host': 'api.bilibili.com',
        'Referer': 'https://www.bilibili.com/account/history',
        'User-Agent': (
            'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) '
            'Gecko/20100101 Firefox/90.0'
        )
    }
    session = requests.Session()
    url = 'https://api.bilibili.com/x/web-interface/history/cursor'
    # Cursor-based pagination: start from zero, then feed each response's
    # cursor back in as the parameters of the next request.
    params = {'max': 0, 'view_at': 0, 'business': ''}

    # Write UTF-8 explicitly, because items are dumped with ensure_ascii=False.
    with open(outfile, 'w', encoding='utf-8') as fout:
        # Use a separate loop variable so the page_num option is not shadowed.
        for page in range(page_num):
            time.sleep(1)  # wait one second between requests
            resp = session.get(url, params=params, headers=headers, cookies=cookies)
            if resp.status_code != 200:
                break

            result = resp.json()
            # Stop when the API returns no data or an empty page.
            if not result.get('data') or result['data']['cursor']['ps'] == 0:
                break

            print(
                f'page = {page} '
                f'code = {result.get("code")} '
                f'datalen = {len(result["data"]["list"])} '
                f'cursor = {result["data"]["cursor"]}'
            )
            # One JSON object per line (JSON Lines).
            for item in result['data']['list']:
                print(json.dumps(item, ensure_ascii=False), file=fout)

            params = {
                'max': result['data']['cursor']['max'],
                'view_at': result['data']['cursor']['view_at'],
                'business': result['data']['cursor']['business'],
            }


if __name__ == '__main__':
    main()
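
The script writes one JSON object per line, so the exported file can be read back line by line. The following is a minimal sketch of doing that; `load_jsonl` is a hypothetical helper name, and `results.jsonl` is the output path from the example command. It makes no assumptions about which fields each record contains.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding='utf-8') as fin:
        for line in fin:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == '__main__':
    # Assumes the export has already been run with -o results.jsonl.
    records = load_jsonl('results.jsonl')
    print(f'{len(records)} history records loaded')
```

Loading everything into a list is fine for a three-month history at 20 records per page; for much larger files, iterating line by line without collecting would use less memory.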