@oleavr

oleavr/_.md Secret

Created September 5, 2016 11:51
Response to saurik's comment

I strongly believe that the concepts of a language runtime and "dynamic introspection" should be fundamentally decoupled.

That we agree on, and is precisely how I designed Frida.

The architecture of Frida, and I'm going to be extremely blunt here (...snip...) is a disorganized mess that mixes all of these layers together in ways that I find completely unacceptable.

Frida has a modular architecture that is highly decoupled in nature, and you can use it à la carte. You can grab frida-gum, the instrumentation core, and use it from C. This gives you access to function hooking, introspection of loaded libraries, their exports, mapped memory ranges, etc. You can also use it through one of its two language bindings: from C++ via Gum++, or from JavaScript via GumJS.

Now, for a lot of applications you need a way to inject your own code into another process that you want to introspect or instrument. This is where frida-core provides an injector per OS, which is not in any way coupled to one specific payload. (As a side note, on Mac and iOS frida-gum provides an out-of-process dynamic linker that frida-core's injector uses to map your .dylib into sandboxed processes.) These injectors are not currently exposed through the public API, but the plan is to expose them; there just hasn't been much demand for it.

So, recognizing that most applications actually just want an easy way to use Gum's APIs from the inside of another process, we do provide a high-level API where you can simply say "run this piece of JavaScript with full access to Gum's APIs from the inside of that other process, and let me optionally exchange JSON messages with it". This simply composes the previously mentioned components to take care of the nitty-gritty for you: it packages GumJS into a shared library, frida-agent, injects it using the platform-specific injector, and does the necessary RPC for you to instantiate scripts and exchange messages. This high-level API is also exposed through multiple language bindings, like Python, Node.js, Swift, .NET, and Qt/Qml.

So, going back to the standalone injection, a lot of applications will also need a bi-directional communications channel. Once they've implemented that, they'll run into problems like this. This is where frida-core's internal pipe library gives you a portable transport that uses a low-level API on each platform, e.g. Mach ports on Mac and iOS, a named pipe on Windows, etc. This is a lot of complexity to burden every single application with, and Cycript is just one of many that will have to solve this problem.
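To illustrate one small part of what such a shared transport layer has to settle: whatever the underlying pipe is, both ends must agree on how JSON messages are delimited on a byte stream. The sketch below is a hypothetical length-prefixed framing written for this note. It is not frida-core's wire format, and `encodeFrame` and `FrameDecoder` are invented names.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Encode one message as a 4-byte big-endian length followed by UTF-8 JSON.
// (Assumed framing for illustration only.)
function encodeFrame(message) {
  const payload = encoder.encode(JSON.stringify(message));
  const frame = new Uint8Array(4 + payload.length);
  new DataView(frame.buffer).setUint32(0, payload.length, false);
  frame.set(payload, 4);
  return frame;
}

// Accumulates arbitrarily sized chunks and returns every complete message.
class FrameDecoder {
  constructor() {
    this.pending = new Uint8Array(0);
  }

  push(chunk) {
    // Append the new chunk to whatever was left over from earlier pushes.
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);
    this.pending = merged;

    const messages = [];
    while (this.pending.length >= 4) {
      const view = new DataView(
        this.pending.buffer, this.pending.byteOffset, this.pending.byteLength);
      const size = view.getUint32(0, false);
      if (this.pending.length < 4 + size) break; // payload not complete yet
      const payload = this.pending.subarray(4, 4 + size);
      messages.push(JSON.parse(decoder.decode(payload)));
      this.pending = this.pending.slice(4 + size);
    }
    return messages;
  }
}

// Two messages arriving split at an arbitrary byte boundary.
const stream = new Uint8Array([
  ...encodeFrame({ type: 'ping' }),
  ...encodeFrame({ type: 'send', payload: 42 }),
]);
const frameDecoder = new FrameDecoder();
const first = frameDecoder.push(stream.subarray(0, 7)); // incomplete: nothing decoded yet
const rest = frameDecoder.push(stream.subarray(7));     // both messages complete now
console.log(first.length, rest.map((m) => m.type));
```

Framing is only the portable part. The platform-specific part (Mach ports, named pipes, and so on) sits underneath and is what the pipe library abstracts away.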

(as you keep making tons of annoying and arbitrary claims about my work while simultaneously never giving me credit for things I pioneer, so I don't really feel like you deserve kid gloves)

My blog post opened by saying Cycript is awesome and that you created it. As far as Frida goes, it was created before I had ever heard of your project, so the way I see it no credit is due there.

If you want to provide Frida's function hooking to Cycript, write a language binding for it... that's all Substrate is to Cycript: a module you can import.

This was among the options I considered, and is really straight-forward to do, but I took a step back and saw a DSL wanting to come out, I saw a type system that didn't have to be written in C++ and tied to JavaScriptCore, and applications beyond an interactive console (which is awesome in itself, but it doesn't have to be coupled). The thinking behind GumJS is to provide a lean and mean runtime that gives you some essential APIs, and then make it easy for people to compile their own scripts by composing them from Frida-specific modules developed by the community, e.g. for tracing APIs, interacting with UIKit, grabbing screenshots, etc., while also giving you access to thousands of generic modules from npm. We do currently have two Frida-specific modules built in, specifically Objective-C and Java, but the plan is to move these out to their own modules in npm.

If you want to use Frida's process injection to support Cycript injecting into other processes on Windows or on Linux, you should note that cycript literally just runs, as an external process, cynject, which is a tool from Substrate which provides only process injection. As far as I know Frida doesn't have anything similar, but you should.

As mentioned earlier this is already there and the plan is to expose it, but there hasn't been much demand for it. Last time I had to do this it was in order to get Cycript loaded into a sandboxed process, so I quickly cobbled together this in a matter of minutes.

As it stands, Cycript is actually already extremely portable: it requires readline, JavaScriptCore, and libffi. It doesn't have any concept of assembly outside of libffi. It has no machine-specific concepts embedded into it. Its implementation of Objective-C bindings works on GNU Objective-C as well as Apple's runtime. Its implementation of Java bindings works almost entirely at the JNI level and works with both Google's runtimes for Android as well as Oracle's official runtimes.

For sure a lot of impressive work, I am merely suggesting how to make it even more portable. E.g. it does use a fair amount of GNU C extensions, POSIX APIs that don't exist on Windows, implements its own transport, needs to deal with the architecture-dependent quirks of objc_msgSend (stret, fpret), etc.

Given this context--that Frida's architecture seems almost hopelessly coupled

Except it isn't, as I debunked earlier.

--I want to examine your claims of a performance improvement by doing this: you seriously are linking to a comparison of one underutilized feature of Substrate vs. Frida. Putting aside for a second that I'd be surprised if most features of Substrate aren't actually faster than Frida, Cycript's language bindings are almost certainly faster than Frida's as Frida seems to be implementing its FFI layer in JavaScript :/.

This is incorrect. Mjølner is simply implementing a Cycript-compatible type system on top of the bare metal libffi API provided by GumJS. This just boils down to expressing Memory.writeU32(dimensions.add(8), 640) as dimensions->width = 640 (i.e. dimensions.$cyi.width = 640 after compilation).
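As a rough illustration of that idea, here is a self-contained sketch of layering named fields over raw offset-based writes. It uses neither GumJS nor Mjølner: a plain `ArrayBuffer` stands in for process memory, and `defineStruct` and the `Dimensions` layout (with `width` at offset 8) are hypothetical names chosen to match the example above.

```javascript
// Hypothetical stand-in for process memory: a flat little-endian buffer.
const memory = new DataView(new ArrayBuffer(64));

// Raw accessors, analogous in spirit to offset-based reads and writes.
function writeU32(address, value) {
  memory.setUint32(address, value, true);
}

function readU32(address) {
  return memory.getUint32(address, true);
}

// Returns a binder whose results expose named u32 fields at fixed offsets.
function defineStruct(fields) {
  return function bind(baseAddress) {
    const view = {};
    for (const [name, offset] of Object.entries(fields)) {
      Object.defineProperty(view, name, {
        enumerable: true,
        get: () => readU32(baseAddress + offset),
        set: (value) => writeU32(baseAddress + offset, value),
      });
    }
    return view;
  };
}

// Assumed layout for illustration only: four consecutive u32 fields.
const Dimensions = defineStruct({ x: 0, y: 4, width: 8, height: 12 });

const base = 16;
const dimensions = Dimensions(base);
dimensions.width = 640;         // sugar for writeU32(base + 8, 640)
console.log(readU32(base + 8)); // 640
```

The field accessors are set up once in JavaScript. Each read or write still bottoms out in a single raw memory operation.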

I am going to repeat: Cycript is a programming environment.

That's what it is, but couldn't it be more if you decoupled the pieces?

It is in many ways quite comparable to Python, and as such it is the implementation of a language called "Cycript" (where the hell did you come up with "cylang"?!?).

I was trying to disambiguate the language from the interactive console. Just as Python's REPL does not have to be part of its runtime.

Of course, a more obvious comparison is to node.js, but it is a weird mix of syntax designed to let you slide between semantics of various other programming languages. It integrates these syntax features to provide seamless and fluent bindings to Objective-C and Java.

I understand your point, and what you have built is awesome, but does that mean it should not evolve to the next level of awesomeness? :-) It's fine that we disagree on what precisely that is, though.

That's all Cycript is, and I've been careful to remove, not add, dependencies or concepts of "dynamic introspection" from Cycript. Cycript is extremely portable, and is only going to get more portable over time. As an example of this, right now Cycript has a -p argument which internally resolves a process name into a process identifier. That functionality should not be in cycript: instead, that functionality should be in cynject, which is part of Substrate. I actually have had that on my todo list the past two weeks.

That's great, but things are still highly coupled. But OTOH tighter integration vs highly decoupled architectures both have their pros and cons. Tighter integration obviously gives you more vertical control.

Moving that from Cycript to Substrate means that more complex ways of resolving processes, such as Frida's device target, would also be supported in a very natural manner... of course, assuming Frida had a stand-alone injection mechanism Cycript could run instead of cynject, which it doesn't seem to have. I would be more than happy to provide a way to specify the name of the code injection tool to use (which would be really clean now that Cycript no longer relies on socket backchannels).

As mentioned earlier this is trivially exposable and I plan to do that, but there just hasn't been much demand for it, as requesters typically end up realizing that Frida already has a higher-level solution to their problem: "Oh, those hooks don't have to be written in C, JavaScript with a feedback loop of milliseconds makes me so much more productive, and no need for my own platform-specific communication channels, etc."

I have friends who are on the ECMAScript standardization committee, and I care a lot about implementation details in the JavaScript runtime. The use cases and vision I have for Cycript are things which may eventually require modifications to the runtime to get the kind of performance I want binding across different VMs. Future versions of Cycript might not generate vanilla JavaScript.

That's fine, there's definitely more than one way to skin this cat.

I'm not going to merge a massive dependency on Frida, when Frida is literally just v8/duktape and doesn't have any benefits.

Agree to disagree on this one, as already explained. And if footprint is a concern it is trivial to build GumJS without V8, which is what we currently do for embedded targets. It is also trivial to build frida-core without the local backend, if you only care about remote iOS devices for example.

The reverse, however, just doesn't seem to be true: I don't understand why you have mingled all these parts together, and I don't understand why you are keen to keep them together.

They are not mingled together, as explained earlier.

If you provided tiny tools instead of a massive wad of stuff, Frida would probably get more use in the field.

Again, this is just not true. And it is being heavily used in the field. People are building tools on top of it, e.g., in no particular order:

- https://github.com/dpnishant/appmon
- https://github.com/mwrlabs/needle
- https://github.com/antojoseph/diff-gui
- https://github.com/AndroidSecurityTools/lobotomy
- https://immunityproducts.blogspot.no/2015_09_01_archive.html
- https://github.com/Nightbringer21/fridump
- https://github.com/OALabs/frida-extract
- https://github.com/nowsecure/r2frida
- etc.

There are also companies building products on top of it. This is precisely the kind of thing Frida was designed for – a platform for cross-platform dynamic instrumentation. Use it through its low-level building blocks or a higher-level API.

It also would let you use Cycript as is and yet still have all of your more-cross-platform function hooking and more-cross-platform code injection functionality. That architecture is just so beautiful and clean :(.

This is already possible, and I would encourage you to do it, but this fork is about exploring an even deeper integration where I tried to reimagine Cycript by recognizing that there is a lot of overlap. Both approaches have their pros and cons, as I mentioned regarding vertical control.

Ability to attach to sandboxed apps on Mac, without touching /usr or modifying the system in any way; Other than turning off SEP, I don't think Cycript still requires this as of a few weeks ago? The same changes I made for iOS 9.3 to cynject I believe also bypass all of these weird problems on Mac as well (and at 360|iDev I was working with people who had Cycript in a random folder in their home directory and we were able to inject into sandboxed apps, but maybe I wasn't testing an app with a sufficiently strong sandbox).

Actually s/sandboxed apps/sandboxed processes/ to be more precise, e.g. a daemon without any filesystem access.

Instead of crashing the process if you make a mistake and access a bad pointer, you will get a JavaScript exception; It is not clear to me this is actually a good thing, though I sometimes consider it; this would be a trivial thing to add to Cycript.

It's trivial, in theory, but this is where you quickly end up with OS and arch-specific quirks. Already a solved problem in GumJS.
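The sketch below models only the observable behavior under discussion: a bad access surfaces as an ordinary exception the script can catch. The hard part mentioned above, the OS- and architecture-specific fault handling, is deliberately not shown. `AddressSpace` is a hypothetical toy model, not a GumJS API.

```javascript
// Hypothetical model of an address space: only explicitly mapped ranges are readable.
class AddressSpace {
  constructor() {
    this.ranges = [];
  }

  map(base, bytes) {
    this.ranges.push({ base, data: new DataView(new Uint8Array(bytes).buffer) });
  }

  // Returns the byte at `address`, or throws a regular JavaScript error
  // when the address is not inside any mapped range.
  readU8(address) {
    for (const { base, data } of this.ranges) {
      if (address >= base && address < base + data.byteLength) {
        return data.getUint8(address - base);
      }
    }
    throw new Error(`access violation accessing 0x${address.toString(16)}`);
  }
}

const space = new AddressSpace();
space.map(0x1000, [0xde, 0xad, 0xbe, 0xef]);

let outcome;
try {
  space.readU8(0x0); // unmapped: a native dereference here would fault
  outcome = 'read succeeded';
} catch (e) {
  outcome = e.message; // the script keeps running and can report the error
}
console.log(outcome);
```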

Frida's function hooking is able to hook many functions not supported by Cydia Substrate. This literally has nothing to do with Cycript.

It is however what comes bundled with Cycript, and I have yet to speak to a Cycript user who did not do their function hooking through MS.

If nothing else, I'm going to strongly ask that you rename your project from "cycript" to "frida-cycript" or something like that, in the same way that Microsoft calls their fork of nodejs nodejs-chakracore. Someone might start using your fork, and then start talking about what it can and can't do, or provide code examples using it, and you are going to undermine people being able to talk about Cycript. People who find your fork should know that it isn't actually Cycript: it is extremely unrelated, really.

We wholeheartedly agree on this point. It just seemed premature to do this before we knew whether there's a chance you're interested in merging our changes. It's now been renamed frida-cycript.

@saurik

saurik commented Sep 5, 2016

My blog post opened by saying Cycript is awesome and that you created it. As far as Frida goes, it was created before I had ever heard of your project, so the way I see it no credit is due there.

My comments there were with respect to Substrate. Reading your release notes on your update of Frida to iOS 9 was essentially "fuck you saurik, we are so smart for finally figuring out this CS_VALID thing you had working on the day of the jailbreak release, and people should now use Frida because it is faster (in a way that doesn't matter to users of Substrate anyway)" (and that's also how it read to some of my users when you came to r/jailbreak to challenge me with your post).

Except it isn't, as I debunked earlier.

No, you didn't. You showed how in theory I could build the things I wanted, but you also claimed you didn't care about those use cases. If you actually had some of these parts standalone it would be trivial to let Cycript use them.

This is where frida-core's internal pipe library gives you a portable transport that uses a low-level API on each platform, e.g. mach ports on Mac and iOS, a named pipe on Windows, etc. This is a lot of complexity to burden every single application with, and Cycript is just one of many that will have to solve this problem.

Cycript no longer even uses a "transport": it is now exactly as portable as its underlying injection technique. The Unix domain socket server code was removed in the most recent release, and will never return. The concept of having platform-specific "transports" was a mistake. Again: my goal has been to remove platform specific things. There are no longer any platform-specific "transports" in Cycript.

Again, this is just not true. And it is being heavily used in the field. People are building tools on top of it, e.g., in no particular order:

These are all tools for researchers, not deployments of software to users or targets. I was talking about "in the field", not "in the lab".

It is however what comes bundled with Cycript, and I have yet to speak to a Cycript user who did not do their function hooking through MS.

This is like saying people tend to use a particular SQL binding from PHP, one which sucks, so let's replace all of PHP's internals instead of just writing a better SQL binding library. It is a completely insane way of going about showing a benefit.

This was among the options I considered, and is really straight-forward to do, but I took a step back and saw a DSL wanting to come out, I saw a type system that didn't have to be written in C++ and tied to JavaScriptCore, and applications beyond an interactive console (which is awesome in itself, but it doesn't have to be coupled).

The reason I implemented the type system in C++ is because I need efficient native access from the FFI layer. I don't see any benefit in doing this from JavaScript: it might make the execution of the setup parts slightly faster, but it is just going to make the FFI parts slower later. (FWIW, I wanted to see how much slower frida-cycript was at doing basic things like sending Objective-C messages, but when I tried to stress test it it didn't even work: I allocated ten thousand objects and it just started throwing errors and quit.)

(BTW, if you don't want to use the console, you don't have to. The library itself exposes an interface for running individual commands in the remote process. "cynject pid dylib -e command" will just output the result of running the command. The console is extremely uncoupled to everything else.)

That's great, but things are still highly coupled. But OTOH tighter integration vs highly decoupled architectures both have their pros and cons. Tighter integration obviously gives you more vertical control.

You say Cycript is highly coupled in multiple places. Please tell me what is coupled. As I have said: if you wanted to provide your own function hooking, Cycript is no more coupled to Substrate than Python is coupled to the Python Image Library: Substrate is literally just a library like any other library, and if Frida is going to provide bindings to C or Python or Ruby, why not also Cycript?

The process injection in Cycript is also extremely decoupled: the only coupling points currently are that 1) the process identifier is looked up inside of Cycript before providing it to the underlying process injection tool (which is due to the "will this be running in the iOS simulator" check; I need to work on that) and 2) the filename "cynject" is hardcoded. #1 is on my todo list and #2 is a trivial fix. There is no transport anymore, so if you provide a tool which works like cynject then Cycript will work fully out of the box in the way you are hoping.
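For readers following along, the second coupling point can be pictured as turning the tool name into a parameter. This is a minimal sketch, not Cycript's code: `buildInjectorCommand` is a hypothetical helper, the library name and pid are placeholders, and the argument order simply mirrors the "cynject pid dylib -e command" form quoted earlier in this comment.

```javascript
// Hypothetical helper: build the command line for an external injection tool.
// Only the argv is constructed; spawning the process is left out on purpose.
function buildInjectorCommand({ tool = 'cynject', pid, library, command }) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new TypeError('pid must be a positive integer');
  }
  const argv = [tool, String(pid), library];
  if (command !== undefined) {
    argv.push('-e', command); // optional one-shot command, as in "-e command"
  }
  return argv;
}

// Default tool name, then a drop-in replacement that accepts the same arguments.
const viaCynject = buildInjectorCommand({
  pid: 1234, library: 'libcycript.dylib', command: '1 + 1',
});
const viaOther = buildInjectorCommand({
  tool: 'other-injector', pid: 1234, library: 'libcycript.dylib',
});
console.log(viaCynject.join(' '));
console.log(viaOther.join(' '));
```

Resolving a process name to a pid (the first coupling point) would happen before this call, in whichever component ends up owning that lookup.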

Currently, Cycript does in fact compile down to vanilla JavaScript. Whether this is always the case or not, being a JavaScript compiler and minifier was always a goal, and you can actually use Cycript to just get its compiler by using "-c". I even have a supported build feature for building Cycript to just be a compiler, as I wanted to be able to use it as a minifier in places where I didn't have JavaScriptCore.

As I said in a different comment, if the goal is to just use Cycript for just its console, due to the extremely minimal interaction Cycript needs with the remote side, I could see turning Cycript into something like rlwrap, but for JavaScript. I have this on my todo list, actually, so you can use Cycript's console over node.js. I could easily see Cycript being layered over frida-repl in that way. This seems like it would be a really clean way to use Cycript's "user experience" without having any of its underlying logic you want to replace (as I really don't mind if you want to replace it: I just want to support it in a way that doesn't involve a dependency on Frida in Cycript's core).

All I'm seeing with what you are doing is tying Cycript to Frida, which in turn ties Cycript to tons of stuff that Cycript currently avoids having to understand. Cycript's implementations of all these bindings are extremely portable: you managed to point to exactly one thing in all of Cycript that is even sort of machine dependent (handling stret and fpret issues), and even that is barely a problem (and Cycript handles this great and I have been dealing with these issues since I started this work in 2007). In contrast, you seem to relish all of these machine-specific details, and you enjoy talking about them, and you want to accumulate more of them; I want fewer of them, and the fewer of them I have, the clearer it is how Frida can interface with Cycript.

So I've now given you tons of examples of clean architecture that would allow you to use Frida's function hooking from Cycript, use Frida's process injection to inject Cycript, or even (with some very minimal work which would also benefit node.js in addition to Frida) use Cycript's console without any of Cycript's internals. I've pointed out that Cycript already has a compiler mode, and that Cycript doesn't even have this transport feature you care so much about avoiding. So why continue with this massive fork to Cycript rather than breaking Frida apart ever so slightly to make its parts reusable in all these contexts?

@saurik

saurik commented Sep 7, 2016

On that first paragraph, another great example is "choose". The release notes for Frida 4.2 are all about how awesome this feature is, without even a single mention of Cycript, even though the entire concept of the feature is something I pioneered (including the name--"choose"--which I made up as part of a joke in my documentation, and is definitely not something one would have gotten from any other context). In fact: Tyilo kept coming onto #cycript asking how the feature worked while trying to work through the code for this feature in Cycript, and apparently then went and took what he learned to implement it for Frida, so the pedigree of the code even comes directly from Cycript, and yet still: "Frida is awesome and no mention of saurik or Cycript, even though we read his non-trivial AGPL-3 code to reimplement the feature for our LGPL project". It's all just so outright rude :/.
