@upbit
Forked from ZipFile/README.md
Last active May 3, 2024 05:48
Pixiv OAuth Flow (with Selenium)

Retrieving Auth Token (with Selenium)

  1. Set up: install Selenium and ChromeDriver
pip install selenium requests
# download the chromedriver build matching your installed Chrome version from https://chromedriver.storage.googleapis.com/index.html?path=91.0.4472.101/
# eg: wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_mac64.zip && unzip chromedriver_mac64.zip
  2. Unzip chromedriver and place it next to pixiv_auth.py (on macOS, run chromedriver once manually so Gatekeeper allows it):
+
|-> pixiv_auth.py
\-> chromedriver      <- place it here
  3. Run the command:
python pixiv_auth.py login

This will open a browser with the Pixiv login page.

If you did everything right and Pixiv has not changed its auth flow, an access_token and refresh_token pair will be displayed.

After you enter your password, wait a moment. Chrome will close, and output like the following will appear in the console:

❯ python3 pixiv_auth.py login
[INFO] Get code: 3s3Xc075wd7njPLJBXgXc4qS-...
access_token: Fp9WaXhNapC8myQltgEn...
refresh_token: uXooTT7xz9v4mflnZqJ...
expires_in: 3600
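Behind the scenes, login() waits for the post-login redirect. It then scans Chrome's performance log for a request to a pixiv:// URL and takes that URL's code query parameter. The sketch below covers only that extraction step, using urllib.parse instead of the script's regex. extract_code is a hypothetical helper name, and the sample callback URL is made up: only the pixiv:// scheme and the code= parameter come from the script.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_code(url: str) -> Optional[str]:
    """Return the `code` query parameter of a pixiv:// callback URL, or None."""
    if not url.startswith("pixiv://"):
        return None
    # Python 3's urlparse splits the query string for any scheme, including pixiv://
    values = parse_qs(urlparse(url).query).get("code")
    return values[0] if values else None


# Hypothetical callback URL, for illustration only
print(extract_code("pixiv://account/login?code=abc123&via=login"))  # abc123
print(extract_code("https://example.com/?code=abc123"))  # None
```

Unlike the regex code=([^&]*), parse_qs only matches a parameter named exactly code. The regex would also match something like xcode=.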

Refresh Tokens

python pixiv_auth.py refresh OLD_REFRESH_TOKEN
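If you need fresh tokens inside another Python program, one option is to run the refresh command and parse the key: value lines it prints. The sketch below assumes the output format shown above (access_token:, refresh_token:, expires_in:). parse_token_output and refresh_via_cli are hypothetical helper names.

```python
import subprocess
import sys


def parse_token_output(text: str) -> dict:
    """Collect the `key: value` lines printed by pixiv_auth.py (values stay strings)."""
    tokens = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("access_token", "refresh_token", "expires_in"):
            tokens[key.strip()] = value.strip()
    return tokens


def refresh_via_cli(refresh_token: str, script: str = "pixiv_auth.py") -> dict:
    """Run `pixiv_auth.py refresh TOKEN` and return the parsed tokens.

    The script exits with status 1 on an error response, so check=True
    turns that into a subprocess.CalledProcessError here.
    """
    result = subprocess.run(
        [sys.executable, script, "refresh", refresh_token],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_token_output(result.stdout)
```

Passing the arguments as a list, rather than an f-string with shell=True, also avoids shell quoting problems with the token.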

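Usage:

  1. Download the pixiv_auth.py script and chromedriver (place chromedriver in the script's directory)
  2. pip install selenium requests
  3. python pixiv_auth.py login

On success, the refresh_token is displayed in the console window automatically. Save it, and from then on log in with api.auth(refresh_token=REFRESH_TOKEN).

Note: If you are accessing Pixiv from inside the Great Firewall, manually set a proxy in REQUESTS_KWARGS['proxies']. Otherwise the request to Pixiv cannot be submitted after the code is obtained (symptom: the script hangs after [INFO] Get code: xxxxx; configuring a proxy for requests fixes it).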
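The proxy setting from the note goes into the REQUESTS_KWARGS dict, which the script unpacks into every requests.post() call. The sketch below builds that dict from arguments instead of editing the commented-out lines. make_requests_kwargs is a hypothetical helper, and http://127.0.0.1:1087 is just the placeholder from the script's own commented-out example.

```python
from typing import Optional


def make_requests_kwargs(proxy_url: Optional[str] = None, verify: bool = True) -> dict:
    """Build the keyword arguments the script passes to every requests.post() call."""
    kwargs = {}
    if proxy_url:
        # Both Pixiv endpoints used by the script are https:// URLs
        kwargs["proxies"] = {"https": proxy_url}
    if not verify:
        # Disables TLS certificate checks; only use this if your proxy requires it
        kwargs["verify"] = False
    return kwargs


REQUESTS_KWARGS = make_requests_kwargs("http://127.0.0.1:1087")
```

By default, requests also honors the standard HTTPS_PROXY environment variable. Setting it can be an alternative to editing the script.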

#!/usr/bin/env python
import json
import re
import time
from argparse import ArgumentParser
from base64 import urlsafe_b64encode
from hashlib import sha256
from pprint import pprint
from secrets import token_urlsafe
from sys import exit
from urllib.parse import urlencode

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Latest app version can be found using GET /v1/application-info/android
USER_AGENT = "PixivIOSApp/7.13.3 (iOS 14.6; iPhone13,2)"
REDIRECT_URI = "https://app-api.pixiv.net/web/v1/users/auth/pixiv/callback"
LOGIN_URL = "https://app-api.pixiv.net/web/v1/login"
AUTH_TOKEN_URL = "https://oauth.secure.pixiv.net/auth/token"
CLIENT_ID = "MOBrBDS8blbauoSck0ZfDbtuzpyT"
CLIENT_SECRET = "lsACyCD94FhDUtGTXi3QzcFE2uU1hqtDaKeqrdwj"

REQUESTS_KWARGS = {
    # 'proxies': {
    #     'https': 'http://127.0.0.1:1087',
    # },
    # 'verify': False
}


def s256(data):
    """S256 transformation method."""
    return urlsafe_b64encode(sha256(data).digest()).rstrip(b"=").decode("ascii")


def oauth_pkce(transform):
    """Proof Key for Code Exchange by OAuth Public Clients (RFC7636)."""
    code_verifier = token_urlsafe(32)
    code_challenge = transform(code_verifier.encode("ascii"))
    return code_verifier, code_challenge


def print_auth_token_response(response):
    data = response.json()

    try:
        access_token = data["access_token"]
        refresh_token = data["refresh_token"]
    except KeyError:
        print("error:")
        pprint(data)
        exit(1)

    print("access_token:", access_token)
    print("refresh_token:", refresh_token)
    print("expires_in:", data.get("expires_in", 0))


def login():
    # Selenium 4.10+ no longer accepts desired_capabilities or a positional
    # driver path, so the capability goes on Options and the path on Service.
    options = Options()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})  # enable performance logs
    driver = webdriver.Chrome(service=Service("./chromedriver"), options=options)

    code_verifier, code_challenge = oauth_pkce(s256)
    login_params = {
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        "client": "pixiv-android",
    }
    print("[INFO] Gen code_verifier:", code_verifier)
    driver.get(f"{LOGIN_URL}?{urlencode(login_params)}")

    # Wait until the user has finished logging in
    while not driver.current_url.startswith("https://accounts.pixiv.net/post-redirect"):
        time.sleep(1)

    # Filter the pixiv:// callback URL (which carries the code) from the performance logs
    code = None
    for row in driver.get_log("performance"):
        data = json.loads(row.get("message", "{}"))
        message = data.get("message", {})
        if message.get("method") == "Network.requestWillBeSent":
            url = message.get("params", {}).get("documentURL", "")
            if url.startswith("pixiv://"):
                code = re.search(r"code=([^&]*)", url).groups()[0]
                break

    driver.quit()
    if code is None:
        print("error: no pixiv:// callback found in the performance logs")
        exit(1)
    print("[INFO] Get code:", code)

    response = requests.post(
        AUTH_TOKEN_URL,
        data={
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "code": code,
            "code_verifier": code_verifier,
            "grant_type": "authorization_code",
            "include_policy": "true",
            "redirect_uri": REDIRECT_URI,
        },
        headers={
            "user-agent": USER_AGENT,
            "app-os-version": "14.6",
            "app-os": "ios",
        },
        **REQUESTS_KWARGS
    )
    print_auth_token_response(response)


def refresh(refresh_token):
    response = requests.post(
        AUTH_TOKEN_URL,
        data={
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "grant_type": "refresh_token",
            "include_policy": "true",
            "refresh_token": refresh_token,
        },
        headers={
            "user-agent": USER_AGENT,
            "app-os-version": "14.6",
            "app-os": "ios",
        },
        **REQUESTS_KWARGS
    )
    print_auth_token_response(response)


def main():
    parser = ArgumentParser()
    subparsers = parser.add_subparsers()
    parser.set_defaults(func=lambda _: parser.print_usage())
    login_parser = subparsers.add_parser("login")
    login_parser.set_defaults(func=lambda _: login())
    refresh_parser = subparsers.add_parser("refresh")
    refresh_parser.add_argument("refresh_token")
    refresh_parser.set_defaults(func=lambda ns: refresh(ns.refresh_token))
    args = parser.parse_args()
    args.func(args)


if __name__ == "__main__":
    main()
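The s256() and oauth_pkce() helpers implement PKCE (RFC 7636). The login URL carries only code_challenge, and the later token request sends the original code_verifier, which the server can hash and compare. The standalone sketch below shows that round trip. check_pkce_pair is a hypothetical name for what the authorization server conceptually does; it is not a Pixiv API.

```python
from base64 import urlsafe_b64encode
from hashlib import sha256
from secrets import token_urlsafe


def s256(data: bytes) -> str:
    """Same S256 transform as the script: BASE64URL(SHA256(data)) without '=' padding."""
    return urlsafe_b64encode(sha256(data).digest()).rstrip(b"=").decode("ascii")


def check_pkce_pair(code_verifier: str, code_challenge: str) -> bool:
    """Conceptually what the server does at the token endpoint: re-hash and compare."""
    return s256(code_verifier.encode("ascii")) == code_challenge


code_verifier = token_urlsafe(32)  # 32 random bytes -> 43 URL-safe characters
code_challenge = s256(code_verifier.encode("ascii"))  # sent in the login URL
print(len(code_verifier), len(code_challenge))  # 43 43
print(check_pkce_pair(code_verifier, code_challenge))  # True
```

Only the hash travels through the browser. An intercepted pixiv:// callback code is therefore not enough on its own, because the verifier stays inside the script.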
upbit (author) commented Mar 22, 2024

If the refresh_token does not work for you, please provide more detail about the issue you are facing.

I am fetching details of illustrations. I get rate-limited after about 200 illustrations, so I use the following function to generate new tokens, then sleep for 400 seconds before resuming:

def get_new_tokens(api, auto_refresh_token, config):
    subprocess.run(f"python pixiv_auth.py refresh {auto_refresh_token}", shell=True)
    config.read("credentials/config.ini")
    auto_access_token = config["pixiv"]["access_token"]
    auto_refresh_token = config["pixiv"]["refresh_token"]
    api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
    return auto_refresh_token

The problem: even after all this, my next API call is still rate-limited, which puts me in a loop of fetching new tokens and sleeping.

while True:
    illust_detail = api.illust_detail(illust_id)
    if "error" not in illust_detail:
        break
    elif "Rate Limit" in illust_detail["error"]["message"]:
        auto_refresh_token = get_new_tokens(api, auto_refresh_token, config)
        time.sleep(400)
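The retry loop above sleeps a fixed 400 seconds and can spin forever. The sketch below is a bounded variant that doubles the wait after each rate-limited response and gives up after a few attempts. fetch_with_backoff is hypothetical, and fetch would be something like lambda: api.illust_detail(illust_id). The 400-second base delay and five attempts are guesses, since Pixiv's actual limit window is unknown here.

```python
import time
from typing import Callable


def fetch_with_backoff(fetch: Callable[[], dict],
                       base_delay: float = 400.0,
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until it stops reporting a rate limit, doubling the wait each time."""
    delay = base_delay
    for attempt in range(max_attempts):
        result = fetch()
        error = result.get("error") or {}
        if "Rate Limit" not in error.get("message", ""):
            return result  # success, or an error that waiting will not fix
        if attempt < max_attempts - 1:  # no point sleeping after the last attempt
            sleep(delay)
            delay *= 2
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Injecting sleep keeps the helper testable without actually waiting. If the limit is enforced per account, waiting alone may not be enough.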

The API rate limit seems to be applied at the account level, so changing the refresh_token does not let you exceed it.
One approach is to create a pool of accounts and pick one at random for each request.
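The account-pool idea can be sketched as a small container that holds one logged-in client per account and picks one at random per request. AccountPool and make_client are hypothetical names. The pixivpy3 wiring in the trailing comments is an assumption, not run here, and based only on the api.auth(refresh_token=...) call used in this thread.

```python
import random
from typing import Callable, Generic, List, Optional, TypeVar

Client = TypeVar("Client")


class AccountPool(Generic[Client]):
    """One authenticated API client per account; pick one at random for each request."""

    def __init__(self, clients: List[Client], rng: Optional[random.Random] = None):
        if not clients:
            raise ValueError("the pool needs at least one client")
        self._clients = list(clients)
        self._rng = rng or random.Random()

    @classmethod
    def from_refresh_tokens(cls, tokens: List[str],
                            make_client: Callable[[str], Client],
                            rng: Optional[random.Random] = None) -> "AccountPool[Client]":
        # make_client turns one refresh token into a logged-in client
        return cls([make_client(token) for token in tokens], rng)

    def pick(self) -> Client:
        return self._rng.choice(self._clients)


# Hypothetical pixivpy3 wiring (not executed here):
#   def make_client(token):
#       api = AppPixivAPI()
#       api.auth(refresh_token=token)
#       return api
#   pool = AccountPool.from_refresh_tokens(["TOKEN_A", "TOKEN_B"], make_client)
#   illust_detail = pool.pick().illust_detail(illust_id)
```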

@SakiSakiSakiSakiSaki

I may have to use a pool of accounts, then.

Do you know what an appropriate sleep time is? 400 seconds doesn't seem to be enough.

@SakiSakiSakiSakiSaki

Following the suggestion to use a pool of accounts, I tried this:

def get_new_tokens(api, config, index):
      config.read("credentials/config.ini")
      index += 1
      if index > 3:
          index = 1
      auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
      subprocess.run(f"python pixiv_auth.py refresh {auto_refresh_token}", shell=True)
      auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
      auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
      api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
      return auto_refresh_token, index

while True:
      illust_detail = api.illust_detail(illust_id)
      if "error" not in illust_detail:
          break
      elif "Rate Limit" in illust_detail["error"]["message"]:
          auto_refresh_token, index = get_new_tokens(api, config, index)
          time.sleep(400)

It rotates through the token pairs of 3 different accounts. The issue is that even after calling get_new_tokens(api, config, index) and waiting 400 seconds, the next request is still rate-limited. Do you see any mistakes I might have made?
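The index bookkeeping in get_new_tokens() (increment, then reset to 1 after 3) is a round-robin over the pixiv_alt_N sections. The sketch below does the same rotation with itertools.cycle, assuming a config.ini with one [pixiv_alt_N] section per account. The inline sample tokens are made up, and next_refresh_token is a hypothetical helper.

```python
from configparser import ConfigParser
from itertools import cycle

config = ConfigParser()
# Made-up stand-in for credentials/config.ini
config.read_string("""
[pixiv_alt_1]
refresh_token = token-1
[pixiv_alt_2]
refresh_token = token-2
[pixiv_alt_3]
refresh_token = token-3
""")

# cycle() replaces the manual "index += 1; if index > 3: index = 1" logic
alt_sections = cycle(s for s in config.sections() if s.startswith("pixiv_alt_"))


def next_refresh_token() -> str:
    return config[next(alt_sections)]["refresh_token"]


print([next_refresh_token() for _ in range(4)])  # ['token-1', 'token-2', 'token-3', 'token-1']
```

Because the lookup goes through config on every call, values updated by a later config.read() are picked up automatically.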

upbit (author) commented Mar 23, 2024

I don't see a problem here. api.set_auth(access_token) supplies the Authorization: Bearer token, which can be used continuously until the server returns an authentication failure.
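"Use the token until an authentication failure is returned" can be written as a thin wrapper. It makes the call and re-authenticates only when the response signals an OAuth failure, then retries once. call_with_reauth and is_auth_error are hypothetical names. Detecting the failure by the "invalid_grant" text is an assumption based on the error reported in this thread.

```python
from typing import Callable


def is_auth_error(result: dict) -> bool:
    """Assumption: an OAuth failure is recognized by 'invalid_grant' in the error message."""
    error = result.get("error") or {}
    return "invalid_grant" in error.get("message", "")


def call_with_reauth(call: Callable[[], dict], reauth: Callable[[], None]) -> dict:
    """Keep using the current Bearer token; re-authenticate only after the server rejects it."""
    result = call()
    if is_auth_error(result):
        reauth()         # e.g. api.auth(refresh_token=REFRESH_TOKEN) with pixivpy3
        result = call()  # retry once with the new access token
    return result
```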

I suspect Pixiv has added the IP address or other statistical signals to what triggers its rate limits. You could try a 'clean' account and verify manually after the rate limit is triggered.

SakiSakiSakiSakiSaki commented Mar 23, 2024

You can try using a 'clean' account

Can you define what you mean by 'clean'? Do you mean an account that's never sent an API request?

I edited my script to check whether I was failing to fetch the new tokens, and that introduced a new auth issue:

def get_new_tokens(api, config, index):
        config.read("credentials/config.ini")
        auto_refresh_token = config["pixiv"]["refresh_token"]
        subprocess.run(f"python pixiv_auth.py refresh {auto_refresh_token}", shell=True)
        config.read("credentials/config.ini")
        auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
        api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
        return index

while True:
      illust_detail = api.illust_detail(illust_id)
      if "error" not in illust_detail:
          break
      elif "Rate Limit" in illust_detail["error"]["message"]:
          index = get_new_tokens(api, config, index)
          time.sleep(400)
{
   'error': {
      'user_message': '',
      'message': 'Error occurred at the OAuth process. Please check your Access Token to fix this. Error Message: invalid_grant',
      'reason': '',
      'user_message_details': {}
   }
}

upbit (author) commented Mar 23, 2024

  1. Yes. If an account that has not sent any requests recently still hits the rate limit, Pixiv may be blocking by IP address.
  2. invalid_grant means the access token has expired (it is only valid for about 3600 seconds) and you need to re-authenticate. Keep only the refresh_token and call api.auth() with it, instead of restoring access_token + refresh_token with set_auth():
-        auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
-        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
-        api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)

+       auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
+       api.auth(refresh_token=auto_refresh_token)
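Since the access token lasts about 3600 seconds (the expires_in value the script prints), expiry can also be checked before a call instead of waiting for invalid_grant. TokenState is a hypothetical helper. The 60-second safety margin is an arbitrary choice, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenState:
    """An access token plus the time it was issued, so expiry can be checked before a call."""
    access_token: str
    refresh_token: str
    expires_in: int = 3600  # seconds, as printed by pixiv_auth.py
    margin: int = 60        # renew slightly early instead of at the exact deadline
    clock: Callable[[], float] = time.monotonic
    issued_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.issued_at = self.clock()

    def is_expired(self) -> bool:
        return self.clock() >= self.issued_at + self.expires_in - self.margin

    def renew(self, access_token: str, refresh_token: str, expires_in: int = 3600) -> None:
        """Store the values returned by a refresh and restart the expiry clock."""
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_in = expires_in
        self.issued_at = self.clock()
```

Before each API call, if state.is_expired(), re-authenticate with api.auth(refresh_token=state.refresh_token), then renew() the state with the new tokens.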

SakiSakiSakiSakiSaki commented Mar 23, 2024

Regarding 1: I change my IP address via VPN during the 400-second sleep.

Regarding 2: Sure, here is the updated code:

def get_new_tokens(api, config, index):
        index += 1
        if index > 3:
            index = 1
        config.read("credentials/config.ini")
        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
        refresh(auto_refresh_token, index)
        config.read("credentials/config.ini")
        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
        api.auth(refresh_token=auto_refresh_token)
        return index

while True:
      illust_detail = api.illust_detail(illust_id)
      if "error" not in illust_detail:
          break
      elif "Rate Limit" in illust_detail["error"]["message"]:
          index = get_new_tokens(api, config, index)
          time.sleep(400)

Even with a VPN and multiple accounts, I still get rate-limited after my 218th illustration (it's the 218th every time with this same dataset).
I know my tokens are being updated because I'm watching config.ini: the access_token for each alt account changes on every iteration.

I even tried a 10-minute sleep plus changing my IP via VPN, and I still can't get past the 218th illustration. It's worth mentioning that when I restart my program and call set_auth() with the last known tokens, it lets me parse the 218 images again with no problem.

Am I the only one facing issues with illust_detail()?

@zjDrummond

I'm having a problem with the script. When running "python pixiv_auth.py login", I get this:

Traceback (most recent call last):
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 6, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

upbit (author) commented Mar 27, 2024

For Python 3.x: pip install pixivpy3 (or simply pip install requests, the module that is missing).

@zjDrummond

Installed that, and now I get this...

PS C:\Users\Zachary\Downloads\pixiv auth> python pixiv_auth.py login
Traceback (most recent call last):
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 157, in <module>
    main()
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 153, in main
    args.func(args)
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 148, in <lambda>
    login_parser.set_defaults(func=lambda _: login())
                                             ^^^^^^^
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 68, in login
    driver = webdriver.Chrome("./chromedriver", desired_capabilities=caps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: WebDriver.__init__() got an unexpected keyword argument 'desired_capabilities'

upbit (author) commented Mar 27, 2024

See the earlier answer:
https://gist.github.com/upbit/6edda27cb1644e94183291109b8a5fde?permalink_comment_id=4996990#gistcomment-4996990

lFitzl commented May 3, 2024
