Writing an iOS YouTube Music client
I’ve been using YouTube Music as my main music streaming service for almost a year and a half. The iOS client is great; I’ve never had a single complaint. It’s potentially one of the most bug-free apps I’ve ever used, it has an extremely friendly and simple graphical interface, and the service itself is great.
I was curious how the client worked in terms of networking, and while curiosity may treat cats poorly (and land researchers in black sites), it can provide a lot of insight.
The first thing I do when reverse engineering a client is monitor HTTP requests while the application starts up and while it performs the tasks I’m interested in. On a jailbroken iOS device, I use FLEX by Flipboard.
In retrospect, I should have checked whether YouTube had a public API for getting videos and playlists, but I didn’t until later. It turns out they didn’t, so this research was still helpful.
The first thing I noticed while watching the HTTP requests was that there were any. There was a very real possibility that YouTube used TCP directly. In Steps to Step 1, an article I posted a few days after I began research, I enumerated some of the authentication mechanisms I observed at this stage. The list below comprises the general components of the request.
- API key
- HTTP body
At this point, I don’t believe the X-Goog-Visitor-Id HTTP header field is required, and it’s not included in the list above for this reason.
Step 1 (API key)
The API key was passed as the sole query parameter in the request URLs. After running the app twice and seeing the same API key both times, I ran strings over the binary and saw the key there, so I decided it was safe to hardcode this value.
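The check described above can be approximated without the strings tool itself. The following is a minimal sketch, assuming the binary has been read into memory as bytes; the function names and the sample key are hypothetical, and the real strings utility has more options (other encodings, section selection) than this.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def contains_key(data: bytes, key: str) -> bool:
    """Report whether key appears inside any printable run of the binary."""
    return any(key in run for run in printable_strings(data))


if __name__ == "__main__":
    # Hypothetical stand-in for the bytes of an app binary.
    blob = b"\x00\x01EXAMPLE_KEY_123\x00ab\x00hello world\xff"
    print(printable_strings(blob))
    print(contains_key(blob, "EXAMPLE_KEY_123"))
```

If the key observed on the network shows up in the output, it is a compile-time constant rather than something fetched at runtime.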
Step 2 (Device-Auth)
The X-Goog-Device-Auth HTTP header field consisted of a comma-delimited dictionary with three constant keys: "device_id", "data", and "content". The values appeared to be encoded in some way. Using Hopper, I was able to find the cross references to the "X-Goog-Device-Auth" string. I found the YTApiaryDeviceCrypto class, and reverse engineered it. My implementation is LMApiaryDeviceCrypto. The HTTP body and URL get encrypted using a “device key”, which is mapped to the “device_id” on Google’s servers. This is explained in more detail in the above article, and the code is attached as well (the header contains information on how to obtain device keys and IDs).
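To make the shape of that header concrete, here is a small sketch that formats and strictly parses a comma-delimited dictionary with the three keys named above. The exact syntax is an assumption: this sketch assumes key=value pairs separated by commas, and the function names are hypothetical. It does not attempt the encryption that YTApiaryDeviceCrypto performs.

```python
DEVICE_AUTH_KEYS = ("device_id", "data", "content")


def format_device_auth(fields: dict[str, str]) -> str:
    """Join the three fields as 'key=value' pairs (assumed syntax)."""
    missing = [key for key in DEVICE_AUTH_KEYS if key not in fields]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return ", ".join(f"{key}={fields[key]}" for key in DEVICE_AUTH_KEYS)


def parse_device_auth(header: str) -> dict[str, str]:
    """Parse the header, rejecting anything that is not exactly the three keys."""
    fields: dict[str, str] = {}
    for part in header.split(","):
        # Split on the first '=' only, so '=' padding in a value survives.
        key, sep, value = part.strip().partition("=")
        if not sep or not value:
            raise ValueError(f"malformed pair: {part!r}")
        fields[key] = value
    if set(fields) != set(DEVICE_AUTH_KEYS):
        raise ValueError(f"unexpected keys: {sorted(fields)}")
    return fields


if __name__ == "__main__":
    header = format_device_auth(
        {"device_id": "abc", "data": "ZGF0YQ==", "content": "Y29udGVudA=="}
    )
    print(header)
    print(parse_device_auth(header))
```

The parser is deliberately strict: an empty or missing value raises instead of passing through.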
On November 27th, I tweeted that this field is not correctly validated on the server. To clarify, I can’t say that it’s not “correct”, as there’s no public specification to compare against. That said, I don’t see a reason not to validate in this manner. The issue the tweets outline is that nil, or otherwise malformed data, passes validation when it should not.
Step 3 (Authorization)
I originally thought that the Authorization field was another hard-coded string. I didn’t see any reference in UserDefaults, and no network requests were going out when the app launched. I had doubts when I couldn’t find the string in the binary.
Google’s public OAuth API documentation was helpful in figuring this out. The process Google apps use is mostly public; however, portions are private. The Google Account login page, for example, is different than the public method. The private version allows access to private auth scopes, which are required to consume the InnerTube API.
The refresh token provided by the login page is saved in the Keychain. Using the refresh token, an access token for the OAuth scope can be requested; these are valid for 24 hours. Using an OAuth-scoped access token, an access token for the InnerTube APIs can be requested; these are valid for 60 minutes at a time. Once either of these tokens expires, it has to be refreshed using the token it depends on. Refresh tokens are discussed in Google’s public API documentation, and do not expire. Standard refresh tokens (acquired via the public API) can be revoked by the user. Refresh tokens retrieved via the private method are not visible to the user, and can only be revoked through the same private REST API.
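The dependency chain between the three tokens can be sketched as a small cache that only re-requests a token once it has expired. This is an illustration under stated assumptions, not the client’s actual code: the TokenChain class and the two fetch callables are hypothetical stand-ins for the private requests, and only the 24 hour and 60 minute lifetimes come from the description above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

OAUTH_LIFETIME = 24 * 60 * 60   # seconds; OAuth-scoped access token
INNERTUBE_LIFETIME = 60 * 60    # seconds; InnerTube access token


@dataclass
class Token:
    value: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class TokenChain:
    """refresh token -> OAuth access token -> InnerTube access token."""

    def __init__(
        self,
        refresh_token: str,
        fetch_oauth: Callable[[str], str],      # hypothetical: refresh token in
        fetch_innertube: Callable[[str], str],  # hypothetical: OAuth token in
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._refresh_token = refresh_token
        self._fetch_oauth = fetch_oauth
        self._fetch_innertube = fetch_innertube
        self._clock = clock
        self._oauth: Optional[Token] = None
        self._innertube: Optional[Token] = None

    def oauth_token(self) -> str:
        now = self._clock()
        if self._oauth is None or not self._oauth.is_valid(now):
            value = self._fetch_oauth(self._refresh_token)
            self._oauth = Token(value, now + OAUTH_LIFETIME)
        return self._oauth.value

    def innertube_token(self) -> str:
        now = self._clock()
        if self._innertube is None or not self._innertube.is_valid(now):
            # Refreshing depends on a valid OAuth token, which may itself refresh.
            value = self._fetch_innertube(self.oauth_token())
            self._innertube = Token(value, now + INNERTUBE_LIFETIME)
        return self._innertube.value
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting an hour; a real client would also want to handle a rejected token by discarding the cached value early.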
Step 4 (Body)
I think the most difficult part of this project was recreating the HTTP body contents. The HTTP Content-Type header was marked application/x-protobuf. I had never used Protobuf before, so this was very intimidating. After some poking around, I found out that protoc has a --decode_raw argument. This helped me work out the meaning of the binary messages.
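For readers who, like me at the time, have never looked at the Protobuf wire format, a raw decode needs no schema: each field starts with a varint tag that carries the field number and a wire type. Below is a minimal sketch of that idea. It is not how protoc implements it, the function names are my own, and unlike protoc it does not try to recurse into length-delimited fields that might be nested messages.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint too long")


def decode_raw(buf: bytes) -> list[tuple[int, int, object]]:
    """Split a message into (field number, wire type, raw value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields


if __name__ == "__main__":
    # Example bytes from the Protobuf encoding documentation.
    print(decode_raw(bytes.fromhex("089601")))  # [(1, 0, 150)]
    print(decode_raw(bytes.fromhex("120774657374696e67")))  # [(2, 2, b'testing')]
```

The output only gives field numbers and raw values; the field names and message types still have to come from the descriptors.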
I started working on a tool, ProtoDump, to get the original proto files of GPBMessage subclasses; however, at the time of writing, it’s not fully working. I instead wrote a small tool that copies the descriptor data of each message. With an Objective-C header dumping tool, I was able to reconstruct all 6577 message classes YouTube Music had. Originally I tried using only the classes needed for the requests I wanted; however, this didn’t work out because of the runtime class check. The entire class tree had to be available, otherwise the Protobuf runtime library would raise an exception.
I wrote a MobileSubstrate tweak to log GPBMessage encodes and decodes, as well as requests signed by YTApiaryDeviceCrypto. Using these three log points, I was able to put together which messages were being sent to which endpoints, and then how to decode the response. Fortunately, Google made this fairly straightforward. The browse endpoint took a BrowseRequest and returned a BrowseResponse, etc. In this example the REST call looks like
Thanks! I hope this blog was fun or helpful. I have a mostly working YouTube Music client, and hope to make that project public on GitHub too. If you have any questions, please reach out to me on Twitter @leptos_null, or
email = "firstname.lastname@example.org", NULL
This is my first time doing a write-up on a project like this. If you have any recommendations, please let me know!