@leptos-null
Last active April 12, 2024 03:31
Writing an iOS YouTube Music client

I’ve been using YouTube Music as my main music streaming service for almost a year and a half. The iOS client is great; I’ve never had a single complaint. It’s potentially one of the most bug-free apps I’ve ever used, it has an extremely friendly and simple graphical interface, and the service itself is great.

I was curious how the client worked in terms of networking, and while curiosity may treat cats poorly, it can provide researchers with a lot of insight.

Step 0

The first thing I do when reverse engineering a client is monitor its HTTP requests while the application starts up and while I perform the tasks I’m interested in. On a jailbroken iOS device, I use FLEX by Flipboard.

In retrospect, I should have checked whether YouTube had a public API for getting videos and playlists, but I didn’t until later. It turns out they didn’t, so this research was still helpful.

The first thing I noticed while watching the HTTP requests was that there were any at all; there was a very real possibility that YouTube used TCP directly. In Steps to Step 1, an article I posted a few days after I began this research, I enumerated some of the authentication mechanisms I observed at this stage. The list below summarizes the general components of each request.

  • API key
  • HTTP Authorization header field
  • HTTP X-Goog-Device-Auth header field
  • HTTP X-Goog-Visitor-Id header field
  • HTTP body (Content-Type: application/x-protobuf)
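To make the list above concrete, here is a minimal Python sketch that assembles the headers for one request. It assumes the Authorization field uses the standard OAuth 2.0 `Bearer` scheme and treats the X-Goog-Device-Auth value as an opaque, already-computed string; `build_innertube_headers` is a hypothetical helper, not part of any Google library. The API key is absent because it travels in the URL rather than in a header.

```python
def build_innertube_headers(access_token: str, device_auth: str) -> dict[str, str]:
    """Assemble the HTTP headers observed on InnerTube requests.

    Assumptions: Authorization uses the OAuth 2.0 "Bearer" scheme, and
    device_auth is an already-formatted X-Goog-Device-Auth value.
    The API key is not a header; it is sent as the ?key= query parameter.
    """
    if not access_token:
        raise ValueError("an access token is required")
    return {
        "Authorization": f"Bearer {access_token}",
        "X-Goog-Device-Auth": device_auth,
        # Request bodies are serialized protobuf messages.
        "Content-Type": "application/x-protobuf",
    }
```

The resulting dictionary could be passed as the `headers` argument of `urllib.request.Request`, together with the serialized protobuf bytes as `data` and `method="POST"`.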

At this point, I don’t believe the X-Goog-Visitor-Id HTTP header field is required, which is why it isn’t covered in the steps below.

Step 1 (API key)

The API key was passed as the sole query parameter in the request URLs. After running the app twice and seeing the same API key, I ran strings over the binary and found the key, so I decided it was safe to hardcode this value.
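The `strings` check can be approximated in a few lines of Python, roughly like grepping the output of `strings` for `AIza`. This sketch assumes the key follows the usual Google API key shape, `AIza` followed by 35 URL-safe characters (39 in total), which the key in the browse URL later in this post matches; `find_api_keys` is a hypothetical helper, not part of any tool mentioned here.

```python
import re

# Assumption: Google API keys look like "AIza" followed by 35 characters from
# [0-9A-Za-z_-] (39 in total). The lookarounds reject matches that start or end
# in the middle of a longer run of such characters.
API_KEY_PATTERN = re.compile(
    rb"(?<![0-9A-Za-z_\-])AIza[0-9A-Za-z_\-]{35}(?![0-9A-Za-z_\-])"
)


def find_api_keys(binary: bytes) -> list[str]:
    """Return the distinct candidate API keys in a binary, in order of first appearance."""
    keys: list[str] = []
    for match in API_KEY_PATTERN.finditer(binary):
        key = match.group().decode("ascii")
        if key not in keys:
            keys.append(key)
    return keys
```

Running it over the app's main executable (read with `open(path, "rb").read()`) and comparing the result with the key seen in the captured requests is one way to gain the same confidence before hardcoding the value.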

Step 2 (Device-Auth)

The X-Goog-Device-Auth HTTP header field consisted of a comma-delimited dictionary with three constant keys: "device_id", "data", and "content". The values appeared to be encoded in some way. Using Hopper, I found the cross-references to the "X-Goog-Device-Auth" string, which led me to the YTApiaryDeviceCrypto class, and I reverse engineered it. My implementation is LMApiaryDeviceCrypto. The HTTP body and URL get encrypted using a “device key”, which is mapped to the “device_id” on Google’s servers. This is explained in more detail in the article mentioned above, and the code is attached as well (the header contains information on how to obtain device keys and IDs).
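The header layout can be sketched as follows. This is a minimal Python sketch that assumes the pairs are written as `key=value` and joined with commas in the order device_id, data, content; the exact syntax of the real header (quoting, ordering) is an assumption. The values are treated as opaque strings produced by the device crypto step, which is not reproduced here, and `format_device_auth` and `parse_device_auth` are hypothetical names.

```python
# Assumption: the three keys always appear in this order.
DEVICE_AUTH_KEYS = ("device_id", "data", "content")


def format_device_auth(device_id: str, data: str, content: str) -> str:
    """Join the three values as comma-delimited key=value pairs (assumed syntax)."""
    values = {"device_id": device_id, "data": data, "content": content}
    return ",".join(f"{key}={values[key]}" for key in DEVICE_AUTH_KEYS)


def parse_device_auth(header: str) -> dict[str, str]:
    """Split a header value back into a dictionary of its key/value pairs."""
    pairs: dict[str, str] = {}
    for item in header.split(","):
        # Split on the first "=" only, so base64 padding stays inside the value.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed pair: {item!r}")
        pairs[key.strip()] = value
    return pairs
```

Splitting on the first `=` only matters if the encoded values are base64 with `==` padding, which is itself an assumption; values containing commas would break this simple format.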

On November 27th, I tweeted that this field is not correctly validated on the server. To clarify, I can’t say for certain that it’s not “correct”, as there’s no public specification to compare against. That said, I don’t see a reason not to validate it properly. The issue the tweets outline is that nil or otherwise malformed data passes validation, when it should be rejected.
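For comparison, a stricter check might look like the following Python sketch. It assumes the header is made of comma-delimited `key=value` pairs and that the data and content values are base64; both are assumptions, since there is no public specification. A real server would also verify the cryptographic contents against the device key, which is not shown. `is_valid_device_auth` is a hypothetical name.

```python
import base64
import binascii
from typing import Optional

REQUIRED_KEYS = {"device_id", "data", "content"}


def is_valid_device_auth(header: Optional[str]) -> bool:
    """Reject nil, empty, or structurally malformed X-Goog-Device-Auth values.

    Assumptions: comma-delimited key=value pairs, and base64 "data" and
    "content" values. Cryptographic verification is out of scope here.
    """
    if not header:
        return False  # nil or empty header
    fields: dict[str, str] = {}
    for item in header.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or not value or key in fields:
            return False  # missing "=", empty value, or duplicate key
        fields[key] = value
    if set(fields) != REQUIRED_KEYS:
        return False  # a key is missing or an unexpected one is present
    for key in ("data", "content"):
        try:
            # Assumption: these two values are base64; validate=True rejects stray characters.
            base64.b64decode(fields[key], validate=True)
        except binascii.Error:
            return False
    return True
```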

Step 3 (Authorization)

I originally thought that the Authorization field was another hard-coded string. I didn’t see any reference to it in UserDefaults, and no network requests were going out when the app launched. I had doubts when I couldn’t find the string in the binary.

Google’s public OAuth API documentation was helpful in figuring this out. The process Google apps use is mostly public, but portions are private. The Google Account login page, for example, is different from the public method. The private version allows access to private auth scopes, which are required to consume the InnerTube API.

The refresh token provided by the login page is saved in Keychain. Using the refresh token, an access token for the OAuth scope can be requested; these are valid for 24 hours. Using an OAuth-scoped access token, an access token for the InnerTube APIs can be requested; these are valid for 60 minutes at a time. Once either of these tokens expires, it has to be refreshed using the token it depends on (the refresh token for the OAuth access token, and the OAuth access token for the InnerTube token). Refresh tokens are discussed in Google’s public API documentation, and do not expire. Standard refresh tokens (acquired via the public API) can be revoked by the user. Refresh tokens retrieved via the private method are not visible to the user, and can only be revoked through the same private REST API.
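The dependency between the three tokens can be sketched as a small cache. In this Python sketch, `fetch_oauth` and `fetch_innertube` are hypothetical callables standing in for the private token endpoints, which are not documented here; the lifetimes are the 24-hour and 60-minute figures above, and the clock is injectable so the expiry logic can be exercised without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # seconds, on the same clock the chain uses


class TokenChain:
    """refresh token -> OAuth access token (24 h) -> InnerTube access token (60 min).

    fetch_oauth and fetch_innertube are hypothetical stand-ins for the private
    token endpoints: each receives its parent token's value and returns a new token value.
    """

    OAUTH_LIFETIME = 24 * 60 * 60
    INNERTUBE_LIFETIME = 60 * 60

    def __init__(self, refresh_token: str,
                 fetch_oauth: Callable[[str], str],
                 fetch_innertube: Callable[[str], str],
                 clock: Callable[[], float] = time.time):
        self.refresh_token = refresh_token
        self.fetch_oauth = fetch_oauth
        self.fetch_innertube = fetch_innertube
        self.clock = clock
        self._oauth: Optional[Token] = None
        self._innertube: Optional[Token] = None

    def _expired(self, token: Optional[Token]) -> bool:
        return token is None or self.clock() >= token.expires_at

    def oauth_token(self) -> str:
        if self._expired(self._oauth):
            # The refresh token does not expire, so it can always mint a new access token.
            value = self.fetch_oauth(self.refresh_token)
            self._oauth = Token(value, self.clock() + self.OAUTH_LIFETIME)
        return self._oauth.value

    def innertube_token(self) -> str:
        if self._expired(self._innertube):
            # Going through oauth_token() refreshes the parent first if it has expired.
            value = self.fetch_innertube(self.oauth_token())
            self._innertube = Token(value, self.clock() + self.INNERTUBE_LIFETIME)
        return self._innertube.value
```

A production version would likely refresh slightly before expiry and take each lifetime from the server's response rather than hardcoding it.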

Step 4 (Body)

I think the most difficult part of this project was recreating the HTTP body contents. The Content-Type header was application/x-protobuf. I had never used Protobuf before, so this was very intimidating. After some poking around, I found out that protoc has a --decode_raw argument, which helped me work out the meaning of the binary messages.
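To give a feel for what --decode_raw works with, here is a minimal Python sketch of a raw protobuf wire-format decoder. It reads each field's key as a varint, splits it into a field number and wire type, and handles the four wire types in common use. It illustrates the wire format only; it is not how protoc is implemented, and `read_varint` and `decode_raw` are hypothetical names.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_raw(buf: bytes) -> list[tuple[int, int, object]]:
    """List the (field_number, wire_type, value) triples of one message."""
    fields: list[tuple[int, int, object]] = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError(f"field {field_number} runs past the end of the buffer")
        fields.append((field_number, wire_type, value))
    return fields
```

For example, the bytes `08 96 01` decode to field 1, wire type 0, value 150. Length-delimited values come back as raw bytes because, without the .proto definitions, they could be strings, nested messages, or packed repeated fields; that ambiguity is why raw decoding alone cannot recover the original message classes.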

I started working on a tool, ProtoDump, to recover the original .proto files from GPBMessage subclasses; however, at the time of writing, it’s not fully working. Instead, I wrote a small tool that copies the descriptor data of each message. With an Objective-C header dumping tool, I was able to reconstruct all 6577 message classes YouTube Music had. Originally I tried using only the classes needed for the requests I wanted, but this didn’t work because of the runtime class check: the entire class tree had to be available, otherwise the Protobuf runtime library would raise an exception.

I wrote a MobileSubstrate tweak to log GPBMessage encodes and decodes, as well as requests signed by YTApiaryDeviceCrypto. Using these three log points, I was able to piece together which messages were being sent to which endpoints, and how to decode the responses. Fortunately, Google made this fairly straightforward: the browse endpoint took a BrowseRequest and returned a BrowseResponse, and so on. In this example, the REST call looks like POST https://youtubei.googleapis.com/youtubei/v1/browse?key=AIzaSyDK3iBpDP9nHVTk2qL73FLJICfOC3c51Og.
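The endpoint convention can be captured in a couple of helpers. This Python sketch assumes every endpoint lives under `https://youtubei.googleapis.com/youtubei/v1/` and that the browse/BrowseRequest/BrowseResponse naming generalizes to other endpoints, which may not hold everywhere; `endpoint_url` and `message_names` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Assumption: every InnerTube endpoint shares this base path.
INNERTUBE_BASE = "https://youtubei.googleapis.com/youtubei/v1/"


def endpoint_url(endpoint: str, api_key: str) -> str:
    """Build an endpoint URL with the API key as its only query parameter."""
    return f"{INNERTUBE_BASE}{endpoint}?{urlencode({'key': api_key})}"


def message_names(endpoint: str) -> tuple[str, str]:
    """Guess the request and response message class names for an endpoint.

    Assumption: generalizes browse -> (BrowseRequest, BrowseResponse);
    other endpoints may not follow this pattern.
    """
    stem = endpoint[:1].upper() + endpoint[1:]
    return f"{stem}Request", f"{stem}Response"
```

A client would then POST the serialized request message to `endpoint_url(...)` and parse the body of the reply as the corresponding response message.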

Conclusion

Thanks! I hope this post was fun or helpful. I have a mostly working YouTube Music client, and hope to make that project public on GitHub too. If you have any questions, please reach out to me on Twitter @leptos_null, or email = "leptos.%d.null@gmail.com", NULL

This is my first time doing a write up on a project like this. If you have any recommendations, please let me know!

@steinybot

Thanks for sharing!

I'm investigating whether it would be possible to write an Amazon Alexa skill for YouTube Music. This is some useful information.

Have you made any progress on the client?

@leptos-null

@steinybot The project is here: https://github.com/leptos-null/LeptosMusic

The comments in this thread may also be helpful: https://gist.github.com/leptos-null/8792b9c50fddc00cf525ed5055a872dc

Feel free to ask any questions you have

@onewayticket255

Thanks a lot!

I'm interested in protobuf serialization and deserialization (request body and response body).

My goal is to write a MITM server that can deserialize the body, remove all ads, and then serialize it back to the YouTube app.

I think the most difficult part of this project was recreating the HTTP body contents.

LMusic is an iOS client for YouTube Music. It uses Google's InnerTube backend service, posing as YouTube Music. Currently, the protobuf messages are not included with this project

Is there any possibility, or any other way, to dump the protobuf definitions?

@leptos-null

@onewayticket255 I developed ProtoDump for this purpose, however it's not currently sufficient for sophisticated projects such as YouTube.
