@upbit
Forked from ZipFile/README.md
Last active May 3, 2024 05:48
Pixiv OAuth Flow (with Selenium)

Retrieving Auth Token (with Selenium)

  1. Set up: install Selenium and download ChromeDriver.
     pip install selenium
     # download chromedriver from https://chromedriver.storage.googleapis.com/index.html?path=91.0.4472.101/
     # e.g.: wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_mac64.zip && unzip chromedriver_mac64.zip
  2. Unzip chromedriver and place it next to pixiv_auth.py (on macOS, run it once first because of Gatekeeper):
     +
     |-> pixiv_auth.py
     \-> chromedriver      <- place it here
  3. Run the command:
     python pixiv_auth.py login

This opens a browser window with the Pixiv login page.

After you enter your password, wait a moment. Chrome closes and, if you did everything right and Pixiv has not changed its auth flow, the access_token and refresh_token pair is printed in the console window:

❯ python3 pixiv_auth.py login
[INFO] Get code: 3s3Xc075wd7njPLJBXgXc4qS-...
access_token: Fp9WaXhNapC8myQltgEn...
refresh_token: uXooTT7xz9v4mflnZqJ...
expires_in: 3600

Refresh Tokens

python pixiv_auth.py refresh OLD_REFRESH_TOKEN

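Usage:

  1. Download the pixiv_auth.py script and chromedriver (place it in the script's directory)
  2. pip install selenium
  3. python pixiv_auth.py login

On success, the refresh_token is printed in the window. Save it, and log in afterwards with api.auth(refresh_token=REFRESH_TOKEN).

Note: if you are behind the Great Firewall, set the proxy in REQUESTS_KWARGS.proxies manually. Otherwise the request to Pixiv cannot be submitted after the code is obtained (the symptom is that the script hangs after [INFO] Get code: xxxxx; configuring a proxy for requests fixes it).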
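As a minimal sketch of the proxy note above, the REQUESTS_KWARGS dictionary could be built from an environment variable instead of being edited by hand. The variable name PIXIV_HTTPS_PROXY and the helper build_requests_kwargs are hypothetical and not part of the original script; only the shape of the 'proxies' entry follows the commented-out example in the script.

```python
import os


def build_requests_kwargs(environ=None):
    """Return extra keyword arguments for requests.post().

    PIXIV_HTTPS_PROXY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if environ is None else environ
    kwargs = {}
    proxy = env.get("PIXIV_HTTPS_PROXY")
    if proxy:
        # Same shape as the commented-out 'proxies' entry in REQUESTS_KWARGS.
        kwargs["proxies"] = {"https": proxy}
    return kwargs


# Example with an explicit mapping instead of the real environment.
REQUESTS_KWARGS = build_requests_kwargs({"PIXIV_HTTPS_PROXY": "http://127.0.0.1:1087"})
print(REQUESTS_KWARGS)
```

With no variable set, the helper returns an empty dictionary, so the requests call behaves as it does in the unmodified script.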

#!/usr/bin/env python
import time
import json
import re
import requests
from argparse import ArgumentParser
from base64 import urlsafe_b64encode
from hashlib import sha256
from pprint import pprint
from secrets import token_urlsafe
from sys import exit
from urllib.parse import urlencode
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Latest app version can be found using GET /v1/application-info/android
USER_AGENT = "PixivIOSApp/7.13.3 (iOS 14.6; iPhone13,2)"
REDIRECT_URI = "https://app-api.pixiv.net/web/v1/users/auth/pixiv/callback"
LOGIN_URL = "https://app-api.pixiv.net/web/v1/login"
AUTH_TOKEN_URL = "https://oauth.secure.pixiv.net/auth/token"
CLIENT_ID = "MOBrBDS8blbauoSck0ZfDbtuzpyT"
CLIENT_SECRET = "lsACyCD94FhDUtGTXi3QzcFE2uU1hqtDaKeqrdwj"
REQUESTS_KWARGS = {
    # 'proxies': {
    #     'https': 'http://127.0.0.1:1087',
    # },
    # 'verify': False
}


def s256(data):
    """S256 transformation method."""
    return urlsafe_b64encode(sha256(data).digest()).rstrip(b"=").decode("ascii")


def oauth_pkce(transform):
    """Proof Key for Code Exchange by OAuth Public Clients (RFC7636)."""
    code_verifier = token_urlsafe(32)
    code_challenge = transform(code_verifier.encode("ascii"))
    return code_verifier, code_challenge


def print_auth_token_response(response):
    data = response.json()
    try:
        access_token = data["access_token"]
        refresh_token = data["refresh_token"]
    except KeyError:
        print("error:")
        pprint(data)
        exit(1)
    print("access_token:", access_token)
    print("refresh_token:", refresh_token)
    print("expires_in:", data.get("expires_in", 0))


def login():
    caps = DesiredCapabilities.CHROME.copy()
    caps["goog:loggingPrefs"] = {"performance": "ALL"}  # enable performance logs
    # Needs selenium 4.9.1 or earlier; newer releases reject desired_capabilities.
    driver = webdriver.Chrome("./chromedriver", desired_capabilities=caps)

    code_verifier, code_challenge = oauth_pkce(s256)
    login_params = {
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        "client": "pixiv-android",
    }
    print("[INFO] Gen code_verifier:", code_verifier)
    driver.get(f"{LOGIN_URL}?{urlencode(login_params)}")

    while True:
        # wait for login
        if driver.current_url.startswith("https://accounts.pixiv.net/post-redirect"):
            break
        time.sleep(1)

    # filter code url from performance logs
    code = None
    for row in driver.get_log("performance"):
        data = json.loads(row.get("message", "{}"))
        message = data.get("message", {})
        if message.get("method") == "Network.requestWillBeSent":
            # documentURL can be missing, so fall back to an empty string
            url = message.get("params", {}).get("documentURL") or ""
            if url.startswith("pixiv://"):
                match = re.search(r"code=([^&]*)", url)
                if match:
                    code = match.group(1)
                    break
    driver.close()

    if code is None:
        print("error: no pixiv:// callback with a code found in the performance logs")
        exit(1)
    print("[INFO] Get code:", code)

    response = requests.post(
        AUTH_TOKEN_URL,
        data={
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "code": code,
            "code_verifier": code_verifier,
            "grant_type": "authorization_code",
            "include_policy": "true",
            "redirect_uri": REDIRECT_URI,
        },
        headers={
            "user-agent": USER_AGENT,
            "app-os-version": "14.6",
            "app-os": "ios",
        },
        **REQUESTS_KWARGS
    )
    print_auth_token_response(response)


def refresh(refresh_token):
    response = requests.post(
        AUTH_TOKEN_URL,
        data={
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "grant_type": "refresh_token",
            "include_policy": "true",
            "refresh_token": refresh_token,
        },
        headers={
            "user-agent": USER_AGENT,
            "app-os-version": "14.6",
            "app-os": "ios",
        },
        **REQUESTS_KWARGS
    )
    print_auth_token_response(response)


def main():
    parser = ArgumentParser()
    subparsers = parser.add_subparsers()
    parser.set_defaults(func=lambda _: parser.print_usage())
    login_parser = subparsers.add_parser("login")
    login_parser.set_defaults(func=lambda _: login())
    refresh_parser = subparsers.add_parser("refresh")
    refresh_parser.add_argument("refresh_token")
    refresh_parser.set_defaults(func=lambda ns: refresh(ns.refresh_token))
    args = parser.parse_args()
    args.func(args)


if __name__ == "__main__":
    main()
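
The S256 transformation used by the script can be checked on its own against the example values published in RFC 7636, Appendix B. The sketch below repeats the script's s256 helper so that it runs standalone; it makes no request to Pixiv and says nothing about how Pixiv validates the challenge on its side.

```python
from base64 import urlsafe_b64encode
from hashlib import sha256


def s256(data: bytes) -> str:
    """BASE64URL(SHA256(data)) without '=' padding, as in the script above."""
    return urlsafe_b64encode(sha256(data).digest()).rstrip(b"=").decode("ascii")


# Example verifier/challenge pair from RFC 7636, Appendix B.
code_verifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
expected_challenge = "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"

code_challenge = s256(code_verifier.encode("ascii"))
print("challenge matches RFC example:", code_challenge == expected_challenge)
```

The script sends code_challenge with the login URL and keeps code_verifier for the later token request, so the server can confirm that both requests come from the same client.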
@amber6hua

session not created: This version of ChromeDriver only supports Chrome version 91
Current browser version is 109.0.5414.75 with binary path C:\Program Files\Google\Chrome\Application\chrome.exe

Judging by this error, the ChromeDriver version does not match your installed browser version. Use a matching one.

@atomlayer

For Selenium 4:

from selenium import webdriver
from selenium.webdriver import ChromeOptions

options = ChromeOptions()
log_prefs = {'performance': 'ALL'}
options.set_capability('goog:loggingPrefs', log_prefs)

driver = webdriver.Chrome(options=options)

@InvisibleTroll

May I ask what this code does?

@shiwei25519

I modified pixiv_auth.py so that it works with Selenium 4 in my Ubuntu 22.04 environment.
Here is my proposed modification:
https://gist.github.com/shiwei25519/0d0d333b422707db9772ae66f4ef2030
You may merge it into the original gist if it also works in your environment. (I wonder whether gists have a pull request feature.)

@upbit
Author

upbit commented Dec 29, 2023

Great job!

@SakiSakiSakiSakiSaki

@upbit

python pixiv_auth.py login returns the following error for me:

driver = webdriver.Chrome("./chromedriver", desired_capabilities=caps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: WebDriver.__init__() got an unexpected keyword argument 'desired_capabilities'

I have selenium==4.18.1 installed, and chromedriver is in my directory.

What should I do?

@upbit
Author

upbit commented Mar 22, 2024

The simplest solution is to downgrade to selenium==4.9.1.

Newer versions of Selenium pass these parameters in a different way; see: https://stackoverflow.com/a/76432856

@SakiSakiSakiSakiSaki

SakiSakiSakiSakiSaki commented Mar 22, 2024

> The simplest solution is to downgrade to selenium==4.9.1.
>
> The new version of selenium also has a way to pass parameters, see reference: https://stackoverflow.com/a/76432856

Selenium works now, but I've run into the same issue I had with the other script that refreshes your token.
It keeps returning the same refresh_token:

(venv) PS C:\Users\USER\Desktop\test> python pixiv_auth.py login
access_token: ----N6GVnsSKs
refresh_token: ----8DRmkg
expires_in: 3600
(venv) PS C:\Users\USER\Desktop\test> python pixiv_auth.py refresh ----8DRmkg
access_token: ----_IHtnZVFVd8
refresh_token: ----8DRmkg
expires_in: 3600

It's actually been returning this same refresh_token for about 2 days, even when I used api.auth(refresh_token=OLD_REFRESH_TOKEN).

It's been a while since I've studied your repo, but I distinctly remember that you'd get both a new access_token and a new refresh_token.

@upbit
Author

upbit commented Mar 22, 2024

First, you can check whether the refresh_token is working properly. A refresh_token can be used for several months; just call:

_REFRESH_TOKEN = "8DRmkg***"

print(api.auth(refresh_token=_REFRESH_TOKEN))

It is possible to obtain the same refresh_token in the short term, which could be considered normal.
From a technical standpoint, the refresh_token can be used repeatedly until it expires. Therefore, receiving the same refresh_token twice in a row may be an internal strategy by Pixiv.

If you encounter a situation where the refresh_token is not effective, please provide more detail of the issue you are facing.

@SakiSakiSakiSakiSaki

> If you encounter a situation where the refresh_token is not effective, please provide more detail of the issue you are facing.

I am fetching details of illustrations. I get rate limited around 200, so I use the following function to generate new tokens, and sleep for 400 seconds before resuming:

def get_new_tokens(api, auto_refresh_token, config):
    subprocess.run(f"python pixiv_auth.py refresh {auto_refresh_token}", shell=True)
    config.read("credentials/config.ini")
    auto_access_token = config["pixiv"]["access_token"]
    auto_refresh_token = config["pixiv"]["refresh_token"]
    api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
    return auto_refresh_token

The problem is that even after all this, my next API call is still rate limited, which puts me in a loop of getting new tokens and sleeping:

while True:
    illust_detail = api.illust_detail(illust_id)
    if "error" not in illust_detail:
        break
    elif "Rate Limit" in illust_detail["error"]["message"]:
        auto_refresh_token = get_new_tokens(api, auto_refresh_token, config)
        time.sleep(400)

@upbit
Author

upbit commented Mar 22, 2024

The API rate limit seems to be applied at the account level, so changing the refresh_token does not let you exceed it.
One approach could be to create a pool of multiple accounts and randomly select from the pool on each request.
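
A minimal sketch of such an account pool, assuming one refresh_token per account. The TokenPool class and the placeholder token values are hypothetical; whether rotating accounts actually avoids Pixiv's rate limit is not established here.

```python
import random


class TokenPool:
    """Hold one refresh_token per account and hand out a random one."""

    def __init__(self, refresh_tokens, rng=None):
        if not refresh_tokens:
            raise ValueError("the pool needs at least one refresh_token")
        self._tokens = list(refresh_tokens)
        # A seeded random.Random can be passed in to make the choice repeatable.
        self._rng = rng or random.Random()

    def pick(self):
        """Return the refresh_token of a randomly chosen account."""
        return self._rng.choice(self._tokens)


# Placeholder values; real refresh tokens come from `pixiv_auth.py login`.
pool = TokenPool(["TOKEN_A", "TOKEN_B", "TOKEN_C"])
token = pool.pick()
print("using account token:", token)
```

Before each request, the chosen token would be passed to api.auth(refresh_token=token), the call already used elsewhere in this thread.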

@SakiSakiSakiSakiSaki

> The API rate limited seems to be applied at the account level, change the refresh_token does not allow exceeding the limit.
> One approach could be to create a pool of multiple accounts, and randomly selecting from the account pool on each request.

May have to do this ...

Do you know how long is an appropriate sleep time? 400 seconds doesn't seem enough.

@SakiSakiSakiSakiSaki

> One approach could be to create a pool of multiple accounts, and randomly selecting from the account pool on each request.

So I tried the following:

def get_new_tokens(api, config, index):
      config.read("credentials/config.ini")
      index += 1
      if index > 3:
          index = 1
      auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
      subprocess.run(f"python pixiv_auth.py refresh {auto_refresh_token}", shell=True)
      auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
      auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
      api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
      return auto_refresh_token, index

while True:
      illust_detail = api.illust_detail(illust_id)
      if "error" not in illust_detail:
          break
      elif "Rate Limit" in illust_detail["error"]["message"]:
          auto_refresh_token, index = get_new_tokens(api, config, index)
          time.sleep(400)

It basically rotates 3 different pairs of tokens from 3 different accounts. The issue is that even after get_new_tokens(api, config, index) and waiting 400 seconds, the next request still gets rate limited. Do you see any mistakes I might've made?

@upbit
Author

upbit commented Mar 23, 2024

There seems to be no problem here. api.set_auth(access_token) supplies the Authorization: Bearer token, which can be used continuously until an authentication failure is returned.

I suspect that Pixiv has added IP or other dimensions to what triggers its rate limits. You can try using a 'clean' account, and manually verify it after triggering the rate limit.

@SakiSakiSakiSakiSaki

SakiSakiSakiSakiSaki commented Mar 23, 2024

> You can try using a 'clean' account

Can you define what you mean by 'clean'? Do you mean an account that's never sent an API request?

Edited my script to see if I was failing to fetch the new tokens, it introduced a new Auth issue:

def get_new_tokens(api, config, index):
        config.read("credentials/config.ini")
        auto_refresh_token = config["pixiv"]["refresh_token"]
        subprocess.run(f"python pixiv_auth.py refresh {auto_refresh_token}", shell=True)
        config.read("credentials/config.ini")
        auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
        api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
        return index

while True:
      illust_detail = api.illust_detail(illust_id)
      if "error" not in illust_detail:
          break
      elif "Rate Limit" in illust_detail["error"]["message"]:
          auto_refresh_token, index = get_new_tokens(api, config, index)
          time.sleep(400)
{
   'error': {
      'user_message': '',
      'message': 'Error occurred at the OAuth process. Please check your Access Token to fix this. Error Message: invalid_grant',
      'reason': '',
      'user_message_details': {}
   }
}

@upbit
Author

upbit commented Mar 23, 2024

  1. Yes. If an account that has not sent a request recently still hits the rate limit, Pixiv may be blocking by IP address.
  2. invalid_grant means that the access token has expired (it is only valid for about 3600 seconds) and needs to be reauthorized. Store only the refresh_token instead of access_token + refresh_token, like this:
-        auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
-        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
-        api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)

+       auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
+       api.auth(refresh_token=auto_refresh_token)
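
Since the access token is only valid for about 3600 seconds, a long-running loop can track when the current token was issued and re-authenticate shortly before it expires. The sketch below is a minimal illustration under that assumption; the TokenState class, the 60-second margin, and the injected clock are hypothetical and not part of pixivpy3.

```python
import time


class TokenState:
    """Track when the current access token should be renewed."""

    def __init__(self, expires_in=3600, margin=60, clock=time.monotonic):
        self._clock = clock
        self._margin = margin  # renew this many seconds before expiry
        self._expires_at = clock() + expires_in

    def needs_reauth(self):
        """True once the token is within `margin` seconds of expiring."""
        return self._clock() >= self._expires_at - self._margin

    def renewed(self, expires_in=3600):
        """Record that a fresh access token was just obtained."""
        self._expires_at = self._clock() + expires_in


# Demonstration with a fake clock so that no real waiting is needed.
now = [0.0]
state = TokenState(expires_in=3600, margin=60, clock=lambda: now[0])
print(state.needs_reauth())  # token was just issued
now[0] = 3540.0              # 59 minutes later
print(state.needs_reauth())  # inside the renewal margin
```

In the loop from this thread, one would call api.auth(refresh_token=...) whenever needs_reauth() is true and then call renewed().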

@SakiSakiSakiSakiSaki

SakiSakiSakiSakiSaki commented Mar 23, 2024

> 1. Yes. If an account that has not sent a request recently still encounters Rate limit, it means that Pixiv may block by IP address.

I change my IP address via VPN during the 400 second sleep.

> 2. invalid_grant means that the access token is expired (Only valid for about 3600 seconds), and needs to be reauthorized. Store refresh_token replace access_token+refresh_token like this:
>
> -        auto_access_token = config[f"pixiv_alt_{index}"]["access_token"]
> -        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
> -        api.set_auth(access_token=auto_access_token, refresh_token=auto_refresh_token)
>
> +       auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
> +       api.auth(refresh_token=auto_refresh_token)

Sure

def get_new_tokens(api, config, index):
        index += 1
        if index > 3:
            index = 1
        config.read("credentials/config.ini")
        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
        refresh(auto_refresh_token, index)
        config.read("credentials/config.ini")
        auto_refresh_token = config[f"pixiv_alt_{index}"]["refresh_token"]
        api.auth(refresh_token=auto_refresh_token)
        return index

while True:
      illust_detail = api.illust_detail(illust_id)
      if "error" not in illust_detail:
          break
      elif "Rate Limit" in illust_detail["error"]["message"]:
          auto_refresh_token, index = get_new_tokens(api, config, index)
          time.sleep(400)

With a VPN and multiple accounts, I still get rate limited after my 218th illustration (it's 218 every time with this same dataset).
And I know my tokens are getting updated because I'm keeping my eye on config.ini; the access_token for each alt account changes per iteration.

I even put in a 10 minute sleep, plus changing IP with the VPN, and I still can't get past the 218th illustration. It's worth mentioning that when I restart my program and use set_auth() with the last known tokens, it lets me parse the 218 images again, no problem.

Am I the only one facing issues with illust_detail()?

@zjDrummond

Having a problem with the script. When running "python pixiv_auth.py login" I get this...

Traceback (most recent call last):
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 6, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

@upbit
Author

upbit commented Mar 27, 2024

for python3.x: pip install pixivpy3

@zjDrummond

Installed that, and now I get this...

PS C:\Users\Zachary\Downloads\pixiv auth> python pixiv_auth.py login
Traceback (most recent call last):
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 157, in <module>
    main()
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 153, in main
    args.func(args)
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 148, in <lambda>
    login_parser.set_defaults(func=lambda _: login())
                                             ^^^^^^^
  File "C:\Users\Zachary\Downloads\pixiv auth\pixiv_auth.py", line 68, in login
    driver = webdriver.Chrome("./chromedriver", desired_capabilities=caps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: WebDriver.__init__() got an unexpected keyword argument 'desired_capabilities'

@upbit
Author

upbit commented Mar 27, 2024

See the answer before:
https://gist.github.com/upbit/6edda27cb1644e94183291109b8a5fde?permalink_comment_id=4996990#gistcomment-4996990

@lFitzl

lFitzl commented May 3, 2024
