@AadilGillani
Created October 1, 2021 06:49
Smali Cheat-Sheet
A little help in Smali
(To be supplemented)
#
general information
#
Smali
Types
Dalvik bytecode has two main type classes, primitive types and reference types. Reference types are objects and arrays, everything else is primitive.
Primitives are represented by a single letter.
V - void (can only be used as a return type)
Z - boolean
B - byte
S - short
C - char
I - int
J - long (64 bits)
F - float
D - double (64 bits)
Objects take the form Lpackage/name/ObjectName; where the leading "L" indicates that this is an object type, package/name/ is the package that contains the object, ObjectName is the name of the object, and ";" marks the end of the object name. This is equivalent to package.name.ObjectName in Java. For a more concrete example, Ljava/lang/String; is equivalent to java.lang.String.
Arrays take the form [I, which is a one-dimensional array of integers, i.e. int[] in Java. For multidimensional arrays, you simply add more "[" characters: [[I = int[][], [[[I = int[][][], etc. (Note: the maximum number of dimensions you can have is 255.)
You can also have arrays of objects: [Ljava/lang/String; is an array of strings.
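The descriptor rules above can be checked mechanically. The following is a minimal Python sketch, not part of smali or any tool; the names PRIMITIVES and descriptor_to_java are made up for this illustration. It converts a single type descriptor into its Java spelling.

```python
# Hypothetical helper for illustration only: maps one Dalvik type
# descriptor (e.g. "[[I") to the corresponding Java type name.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}

def descriptor_to_java(desc):
    """Convert one Dalvik type descriptor to its Java spelling."""
    # Every leading "[" adds one array dimension.
    dims = len(desc) - len(desc.lstrip("["))
    base = desc[dims:]
    if base in PRIMITIVES:
        name = PRIMITIVES[base]
    elif base.startswith("L") and base.endswith(";"):
        # Lpackage/name/ObjectName; -> package.name.ObjectName
        name = base[1:-1].replace("/", ".")
    else:
        raise ValueError("not a type descriptor: " + desc)
    return name + "[]" * dims
```

For example, descriptor_to_java("[Ljava/lang/String;") gives "java.lang.String[]".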
Methods
Methods are always specified in very detailed form, which includes the type that contains the method, the name of the method, the types of the parameters, and the return type. All this information is needed so that the virtual machine can find the correct method and be able to perform static analysis on bytecode.
They take the form Lpackage/name/ObjectName;->MethodName(III)Z
In this example, you should recognize Lpackage/name/ObjectName; as a type. MethodName is the name of the method. (III)Z is the signature of the method. III are the parameters (in this case, 3 integers), and Z is the return type (boolean).
Method parameters are listed one after the other, with no separators between them.
Here's a more complex example:
Lpackage/name/ObjectName;->MethodName(I[[IILjava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
In Java, this would be
String MethodName(int, int[][], int, String, Object[])
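Because parameters are written with no separators, reading a signature means splitting it at descriptor boundaries. Here is a small Python sketch of that splitting; TYPE_RE and split_signature are names invented for this example, and the regular expression is a simplification that assumes well-formed class names.

```python
import re

# One type descriptor: any number of "[" followed by either a primitive
# letter or an object type of the form L...;
TYPE_RE = re.compile(r"\[*(?:[ZBSCIJFD]|L[^;]+;)")

def split_signature(signature):
    """Split "(params)return" into (parameter descriptors, return descriptor)."""
    if not signature.startswith("(") or ")" not in signature:
        raise ValueError("not a method signature: " + signature)
    params, ret = signature[1:].split(")", 1)
    found = TYPE_RE.findall(params)
    # Every character of the parameter list must belong to some descriptor.
    if "".join(found) != params:
        raise ValueError("bad parameter list: " + params)
    return found, ret
```

Applied to the complex example above, this yields the five parameters I, [[I, I, Ljava/lang/String; and [Ljava/lang/Object; plus the return type Ljava/lang/String;.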
Fields
Fields are also always specified in a verbose form that includes the type containing the field, the field name, and the field type. Again, this allows the VM to find the correct field and also perform static analysis on the bytecode.
They take the form Lpackage/name/ObjectName;->FieldName:Ljava/lang/String;
This should be fairly self-explanatory: the package and object name, the field name, and the field type, respectively.
#
Registers
Introduction
In Dalvik bytecode, registers are always 32 bits wide and can hold any type of value. Two registers are used to store 64-bit types (long and double).
Specifying the number of registers in a method
There are two ways to specify how many registers are available in a method. The .registers directive specifies the total number of registers in the method, while the alternative .locals directive specifies the number of non-parameter registers in the method. With .locals, the total number of registers also includes however many registers are needed to hold the method parameters.
How method parameters are passed to a method
When the method is called, the parameters of the method are placed in the last n registers. If the method has 2 arguments and 5 registers (v0-v4), the arguments will be placed in the last 2 registers - v3 and v4.
The first parameter for non-static methods is always the object on which the method is called (the this object).
For example, let's say you are writing a non-static method LMyObject;->callMe(II)V. This method has 2 integer parameters, but it also has an implicit LMyObject; parameter before both integer parameters, so the method has 3 arguments in total.
Suppose you specified that there are 5 registers in the method (v0-v4), either with .registers 5 or with .locals 2 (i.e. 2 local registers + 3 parameter registers). When the method is called, the object on which the method is invoked (i.e. the this reference) will be in v2, the first integer parameter will be in v3, and the second integer parameter will be in v4.
For static methods it works the same way, except that there is no implicit this argument.
Register names
There are two naming schemes for registers: the normal v# naming scheme and the p# naming scheme for parameter registers. The first register in the p# naming scheme is the first parameter register in the method. So, let's go back to the previous example of a method with 3 arguments and 5 total registers. The following table shows the normal v# name for each register, followed by the p# name for parameter registers:
v0       First local register
v1       Second local register
v2  p0   First parameter register
v3  p1   Second parameter register
v4  p2   Third parameter register
You can refer to parameter registers by either name; it makes no difference.
Motivation for parameter registers
The p# naming scheme was introduced to solve a practical problem.
Let's say you have an existing method with multiple parameters, you add some code to that method, and you find that you need an extra register. You think: "It's okay, I'll just increase the number of registers specified in the .registers directive!"
Unfortunately, it is not that easy. Remember that method parameters are stored in the last registers of the method. If you increase the number of registers, you change which registers the method arguments are placed in. Therefore, you would have to change the .registers directive and renumber every parameter register.
But if the p# naming scheme is used to refer to parameter registers throughout the method, you can easily change the number of registers in the method without worrying about renumbering any existing registers.
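The renumbering problem can be made concrete with a little arithmetic. This is a Python sketch under the rule stated above (parameters occupy the last registers); param_to_v is a hypothetical helper name, not something smali provides.

```python
def param_to_v(p_index, total_registers, param_registers):
    """Return the v# index that the p# register with this index aliases."""
    if not 0 <= p_index < param_registers <= total_registers:
        raise ValueError("register out of range")
    # Parameters occupy the last `param_registers` registers of the method.
    first_param = total_registers - param_registers
    return first_param + p_index
```

With 3 parameter registers, p0 is v2 under .registers 5 but becomes v3 under .registers 6, which is why code written against v2 breaks while code written against p0 keeps working.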
Long / Double values
As mentioned earlier, long and double primitives (J and D respectively) are 64-bit values and require 2 registers. This is important to keep in mind when referring to method arguments. For example, suppose you have a (non-static) method LMyObject;->MyMethod(IJZ)V. The method parameters are LMyObject;, int, long, boolean. Thus, its parameters require 5 registers in total:
p0 this
p1 I
p2, p3 J
p4 Z
Also, when you call the method later, you need to specify both registers of every double-wide argument in the register list of the invoke instruction.
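The register assignment in the table above follows from one rule: J and D take two consecutive registers, everything else takes one. A minimal Python sketch of that rule follows; assign_param_registers is an illustrative name, and the function expects the parameter descriptors as an already-split list.

```python
def assign_param_registers(param_types, is_static=False):
    """Return a list of (register names, type) pairs in p# order."""
    layout = []
    next_reg = 0
    if not is_static:
        # Non-static methods receive the implicit this reference in p0.
        layout.append((["p0"], "this"))
        next_reg = 1
    for t in param_types:
        # long (J) and double (D) occupy two consecutive registers.
        width = 2 if t in ("J", "D") else 1
        regs = ["p%d" % (next_reg + i) for i in range(width)]
        layout.append((regs, t))
        next_reg += width
    return layout
```

For the (IJZ)V example this gives p0 for this, p1 for I, p2 and p3 for J, and p4 for Z.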
#
array
array-length vA, vB
A: Destination register (4 bits)
B: Array reference-bearing register (4 bits)
Stores the length (number of entries) of the array referenced by vB in vA
fill-array-data vAA, :target
A: Register containing the array reference (8 bits)
B: Target label defining the array data table
Populates the array referenced by vAA with the data at target. The reference must be to an array of primitives, and the data table must match it in type and must contain no more elements than fit in the array. The element width is defined in the table.
Throughout this sheet, a "+" after a register name (for example vA+) denotes a register pair occupying vX and vX+1, for example v1 and v2.
Example data table:
:target
.array-data 0x2
0x01 0x02
0x03 0x04
.end array-data
new-array vA, vB, type
A: Destination register (4 bits)
B: Size register (4 bits)
C: Type reference (16 bits)
Creates a new array of the specified type and size. The type must be an array type.
filled-new-array {vA, vB, .., vX}, type
vA-vX: Argument registers (4 bits each, at most 5 registers)
B: Type reference (16 bits)
Creates a new array of the specified type and size, filling it with the contents of the argument registers. The type must be an array type. A reference to the newly created array can be obtained with the move-result-object instruction immediately after the filled-new-array instruction.
filled-new-array/range {vA .. vX}, type
vA .. vX: Range of registers containing the array contents
B: Type reference (16 bits)
Creates a new array of the specified type, filling it with the contents of the register range. The type must be an array type. A reference to the newly created array can be obtained with the move-result-object instruction immediately after the filled-new-array/range instruction.
#
array accessors
Legend:
A (aget): Destination register
A (aput): Source register
B: Array reference
C: Index in the array
aget vA, vB, vC
Retrieves the integer value at index vC from the array referenced by vB and stores it in vA
aput vA, vB, vC
Stores the integer value from vA in the array referenced by vB at the index of vC
There are also other aget/aput variants; adding a suffix changes the value type. For example: aget-object (gets an object reference).
-boolean
-byte
-char
-object
-short
-wide
#
comparison
Legend:
A: Destination register
B: First source register
C: Second source register
B+: First source register pair
C+: Second source register pair
cmp-long vA, vB+, vC+
Compares the long values in the source register pairs and stores the result in vA:
0 if vB+ == vC+;
1 if vB+ > vC+;
-1 if vB+ < vC+.
cmpg-double vA, vB+, vC+
Compares the double values in the source register pairs and stores the result in vA:
0 if vB+ == vC+;
1 if vB+ > vC+;
-1 if vB+ < vC+.
If vB+ or vC+ is not a number (NaN), 1 is stored.
cmpg-float vA, vB, vC
Compares the float values in the source registers and stores the result in vA:
0 if vB == vC;
1 if vB > vC;
-1 if vB < vC.
If vB or vC is not a number (NaN), 1 is stored.
cmpl-double vA, vB+, vC+
Compares the double values in the source register pairs and stores the result in vA:
0 if vB+ == vC+;
1 if vB+ > vC+;
-1 if vB+ < vC+.
If vB+ or vC+ is not a number (NaN), -1 is stored.
cmpl-float vA, vB, vC
Compares the float values in the source registers and stores the result in vA:
0 if vB == vC;
1 if vB > vC;
-1 if vB < vC.
If vB or vC is not a number (NaN), -1 is stored.
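The only difference between the cmpl and cmpg variants is the result for NaN operands. The following Python model of the comparison results is a sketch for illustration (the function names are invented here); in the Dalvik specification the result is 0 if b == c, 1 if b > c, and -1 if b < c.

```python
import math

def cmp_long(b, c):
    """cmp-long: 0 if b == c, 1 if b > c, -1 if b < c."""
    return (b > c) - (b < c)

def _cmp_float(b, c, nan_result):
    # Any NaN operand short-circuits to the variant's fixed result.
    if math.isnan(b) or math.isnan(c):
        return nan_result
    return (b > c) - (b < c)

def cmpl(b, c):
    """cmpl-float / cmpl-double: NaN yields -1."""
    return _cmp_float(b, c, -1)

def cmpg(b, c):
    """cmpg-float / cmpg-double: NaN yields 1."""
    return _cmp_float(b, c, 1)
```

Having both variants lets a compiler choose the one for which a NaN operand makes the source-level comparison come out false. For a test such as x < y, cmpg is the natural choice, since only a result of -1 means the test succeeded.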
#
const
const vAA, #+BBBBBBBB
A: Destination register (8 bits)
B: 32-bit signed integer constant
Moves the specified integer constant into the vAA register.
const/16 vAA, #+BBBB
A: Destination register (8 bits)
B: Signed integer (16 bits)
Moves #+BBBB, sign-extended to 32 bits, into the vAA register.
const/4 vA, #+B
A: Destination register (4 bits)
B: Signed integer (4 bits)
Places the specified 4-bit integer constant, sign-extended to 32 bits, in the destination register vA.
const/high16 vAA, #+BBBB0000
A: Destination register (8 bits)
B: Integer (16 bits)
Places the 16-bit constant in the uppermost 16 bits of the vAA register; the lower 16 bits are zero. Used to initialize float values.
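To see why const/high16 is handy for floats, note that many common float values have all-zero lower 16 bits in their IEEE 754 encoding. Here is a Python sketch of the bit manipulation; const_high16_as_float is a made-up name, and its argument is the 16-bit BBBB part only.

```python
import struct

def const_high16_as_float(bbbb):
    """Interpret the register contents after const/high16 as a 32-bit float."""
    # The 16-bit constant lands in the upper half; the lower half is zero.
    bits = (bbbb & 0xFFFF) << 16
    # Reinterpret the 32-bit pattern as an IEEE 754 single-precision float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For instance, 0x3F80 in the upper half is the bit pattern 0x3F800000, which is the float 1.0.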
const-class vAA, Lclass;
A: Destination register (8 bits)
class: Class reference
Moves a reference to the specified class into the vAA destination register. If the specified type is primitive, this stores a reference to the primitive type's special class.
const-string vAA, "BBBB"
A: Destination register (8 bits)
B: String value
Moves a reference to the specified string into the vAA destination register.
const-string/jumbo vAA, "BBBBBBBB"
A: Destination register (8 bits)
B: String value
Moves a reference to the specified string into the vAA destination register.
jumbo indicates that the string is referenced by a 32-bit index instead of a 16-bit one, which is needed when the string table is too large for 16-bit indexes.
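const-wide/16 vA+, #+BBBB
Moves the 16-bit signed constant, sign-extended to 64 bits, into the register pair vA+.
const-wide/high16 vA+, #+BBBB000000000000
Places the 16-bit constant in the uppermost 16 bits of the register pair vA+; the remaining bits are zero. Used to initialize double values.
const-wide vA+, #+BBBBBBBBBBBBBBBB
Moves the 64-bit constant into the register pair vA+.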
#
goto
goto - Unconditional jump to :target.
goto :target # 8-bit offset
goto/16 :target # 16-bit offset
goto/32 :target # 32-bit offset
Note: goto literally uses +/- offsets from the current instruction. APKTool converts them to labels for readability. If the offset needs a 16-bit value, goto/16 must be used, and if it needs a 32-bit value, goto/32 must be used. When adding new instructions it is almost impossible to tell whether goto/16 or goto/32 is required (unless you know the offset for sure). If you are not sure which width you need, goto/16 can replace any goto, and goto/32 can replace any goto/16 or goto.
The replacement does not work in the other direction: goto cannot replace goto/16, which in turn cannot replace goto/32.
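The choice between the three forms comes down to whether the signed offset fits the instruction's offset field. The following Python sketch assumes the field widths from the Dalvik bytecode specification (8, 16 and 32 bits, counted in 16-bit code units, with a zero offset only allowed for goto/32); smallest_goto is a hypothetical helper, not an APKTool or smali function.

```python
def smallest_goto(offset):
    """Pick the shortest goto form whose signed offset field can hold `offset`."""
    # goto and goto/16 may not branch to themselves (offset 0).
    if offset != 0 and -(2 ** 7) <= offset < 2 ** 7:
        return "goto"
    if offset != 0 and -(2 ** 15) <= offset < 2 ** 15:
        return "goto/16"
    if -(2 ** 31) <= offset < 2 ** 31:
        return "goto/32"
    raise ValueError("offset does not fit in 32 bits")
```

This also shows why widening is always safe: any offset accepted by goto is also accepted by goto/16 and goto/32, but not the other way round.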
#
if
Legend:
A: First register to check (integer)
B: Second register to check (integer)
target: Target label
Note: != means "not equal"
if-eq vA, vB, :target
Execution jumps to :target if vA == vB
if-eqz vA, :target
:target if vA == 0
if-ge vA, vB, :target
:target if vA >= vB
if-gez vA, :target
:target if vA >= 0
if-gt vA, vB, :target
:target if vA > vB
if-gtz vA, :target
:target if vA > 0
if-le vA, vB, :target
:target if vA <= vB
if-lez vA, :target
:target if vA <= 0
if-lt vA, vB, :target
:target if vA < vB
if-ltz vA, :target
:target if vA < 0
if-ne vA, vB, :target
:target if vA != vB
if-nez vA, :target
:target if vA != 0
#
invoke
Legend:
vA-vX: Arguments passed to the method
class: The name of the class containing the method
method: The name of the method to call
R: Return type.
invoke-direct {vA, v.., vX}, Lclass;->method()R
Calls a non-static direct method (that is, an instance method that by its nature cannot be overridden, namely either a private instance method or a constructor).
invoke-interface {vA, v.., vX}, Lclass;->method()R
Calls an interface method, that is, a method on an object whose concrete class is unknown, using a method reference that refers to an interface.
invoke-static {vA, v.., vX}, Lclass;->method()R
Calls a static method (which is always considered a direct method).
invoke-super {vA, v.., vX}, Lclass;->method()R
Calls the virtual method of the immediate parent class.
invoke-virtual {vA, v.., vX}, Lclass;->method()R
Calls a virtual method (a method that is not private, static or final, and is not a constructor).
Note:
If the method returns a value (R is not "V" for void), the result must be captured on the next line with one of the move-result instructions, or it will be lost.
Instead of listing all the vA-vX arguments, you can pass a range of registers by adding the /range suffix. For example: invoke-direct/range {vA .. vX}, Lclass;->method()R. This can be done with any of the invoke instructions above.
invoke-direct {v1, v2, v3} is the same as invoke-direct/range {v1 .. v3}
invoke-direct {v0} is the same as invoke-direct/range {v0 .. v0}
Using invoke-virtual {vX} instead of invoke-virtual/range {vX .. vX} often causes errors in methods with a large number of local registers, because the non-range form can only address registers v0-v15 (v1 and v2 are fine, v22 is not).
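Whether the /range form is required can be decided from the register numbers alone. This Python sketch assumes the limits of the non-range encoding in the Dalvik bytecode specification (at most 5 argument registers, each addressed with 4 bits); needs_range and is_valid_range are names invented for this example.

```python
def needs_range(registers):
    """True if the non-range invoke form cannot encode these argument registers."""
    # The non-range form holds at most 5 registers, each in the range v0-v15.
    return len(registers) > 5 or any(r > 15 for r in registers)

def is_valid_range(registers):
    """True if the registers are consecutive, as {vA .. vX} requires."""
    return all(b - a == 1 for a, b in zip(registers, registers[1:]))
```

So invoke-virtual {v22} must become invoke-virtual/range {v22 .. v22}, while arguments that are not consecutive first have to be moved into consecutive registers before the /range form can be used.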
#
misc
check-cast vAA, Lclass;
A: Reference register (8 bits)
B: Type reference (16 bits)
Checks whether the object reference in vAA can be cast to the type referenced by class.
Throws a ClassCastException if it cannot; otherwise execution continues.
instance-of vA, vB, Lclass;
A: Destination register (4 bits)
B: Reference register (4 bits)
C: Class reference (16 bits)
Stores 1 in vA if the reference in vB is an instance of the given type, or 0 if it is not.
new-instance vAA, Lclass;
A: Destination register (8 bits)
B: Type reference
Creates a new instance of the specified type and places a reference to the newly created instance in vAA.
The type must refer to a non-array class.
nop
Empty command / No operation
throw vAA
A: Exception-bearing register (8 bits)
Throws the specified exception. The exception object reference is in vAA.
#
move
Legend:
A: Destination register (4, 8, 16 bits)
B: Source register (4, 16 bits)
The "#A: x bits. B: x bits" annotations are not part of the code. They are added only to show the register widths.
move vA, vB #A: 4 bits. B: 4 bits
Moves the contents of one non-object register to another.
move/16 vAAAA, vBBBB #A: 16 bits. B: 16 bits
Same as move, but the source and destination registers are 16 bits each.
move/from16 vAA, vBBBB #A: 8 bits. B: 16 bits
Same as move/16, but the destination register is only 8 bits.
move-exception vAA #A: 8 bits
Saves the just-caught exception to vAA. This must be the first instruction of any exception handler whose caught exception should not be ignored, and it may only ever occur as the first instruction of an exception handler. (PS: the tautology is unavoidable.)
move-object vA, vB #A: 4 bits. B: 4 bits
Moves the contents of one object-bearing register to another.
move-object/16 vAAAA, vBBBB #A: 16 bits. B: 16 bits
Same as move-object, but the source and destination registers are 16 bits each.
move-object/from16 vAA, vBBBB #A: 8 bits. B: 16 bits
Same as move-object/16, but the destination register is only 8 bits.
move-result vAA #A: 8 bits.
Moves the single-word non-object result of the most recent invoke into vAA. This must be the instruction immediately after an invoke whose (single-word, non-object) result should not be ignored.
move-result-object vAA #A: 8 bits.
Moves the object result of the most recent invoke into vAA. This must be the instruction immediately after an invoke or filled-new-array whose (object) result should not be ignored.
move-result-wide vA+ #A: 8 bits.
Moves the double-word (long or double) result of the most recent invoke into the register pair vA+.
move-wide vA+, vB+ #A: 4 bits. B: 4 bits
Moves the contents of one register pair to another.
move-wide/16 vA+, vB+ #A: 16 bits. B: 16 bits
Same as move-wide, but the source and destination registers are 16 bits each.
move-wide/from16 vA+, vBBBB #A: 8 bits. B: 16 bits
Same as move-wide/16, but the destination register is only 8 bits.
#
operations
ADD operator - adds the values on either side of the operator
#
add-double vA+, vB+, vC+
A: Destination register pair (8 bits)
B: Source register pair 1 (8 bits)
C: Source register pair 2 (8 bits)
Calculates vB+ + vC+ and stores the result in vA+
add-double/2addr vA+, vB+
A: Source register pair 1 / destination register pair (4 bits)
B: Source register pair 2 (4 bits)
Calculates vA+ + vB+ and stores the result in vA+
add-float vA, vB, vC
A: Destination register (8 bits)
B: Source register 1 (8 bits)
C: Source register 2 (8 bits)
Calculates vB + vC and stores the result in vA
add-float/2addr vA, vB
A: Source register 1 / destination register (4 bits)
B: Source register 2 (4 bits)
Calculates vA + vB and stores the result in vA
add-int vA, vB, vC
A: Destination register (8 bits)
B: Source register 1 (8 bits)
C: Source register 2 (8 bits)
Calculates vB + vC and stores the result in vA
add-int/lit8 vA, vB, 0xC
A: Destination register (8 bits)
B: Source register (8 bits)
C: Signed constant (8 bits)
Calculates vB + 0xC and stores the result in vA
add-int/lit16 vA, vB, 0xC
A: Destination register (4 bits)
B: Source register (4 bits)
C: Signed constant (16 bits)
Calculates vB + 0xC and stores the result in vA
add-int/2addr vA, vB
A: Source register 1 / destination register (4 bits)
B: Source register 2 (4 bits)
Calculates vA + vB and stores the result in vA
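Since registers are 32 bits wide, the add-int family wraps around on overflow using two's-complement arithmetic. Here is a Python model of that behavior; the helper names are invented for this sketch.

```python
def to_int32(value):
    """Wrap an arbitrary Python integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def add_int(b, c):
    """Model of add-int vA, vB, vC: returns the value stored in vA."""
    return to_int32(b + c)

def add_int_lit8(b, literal):
    """Model of add-int/lit8: the literal must fit in a signed 8-bit field."""
    if not -128 <= literal <= 127:
        raise ValueError("literal does not fit in 8 bits; use add-int/lit16")
    return to_int32(b + literal)
```

Adding 1 to 0x7FFFFFFF therefore yields -0x80000000 rather than raising an error.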
AND operator - bitwise operator that copies a bit into the result if it is set in both operands
#
# Empty for now
DIV operator - divides the left operand by the right operand
#
# Empty for now
MUL operator - multiplies the values on either side of the operator
#
# Empty for now
OR operator - copies a bit into the result if it is set in either operand
#
# Empty for now
REM operator - divides the left operand by the right operand and returns the remainder
#
# Empty for now
SHL operator - shifts the left operand to the left by the number of bits specified by the right operand
#
# Empty for now
SHR operator - shifts the left operand to the right by the number of bits specified by the right operand, preserving the sign
#
# Empty for now
SUB operator - subtracts the right operand from the left operand
#
# Empty for now
USHR operator - shifts the left operand to the right by the number of bits specified by the right operand, filling the vacated bits with zeros
#
# Empty for now
XOR operator - copies a bit into the result if it is set in one operand, but not in both
#
# Empty for now
#
return
The return instruction performs an explicit return from a method, that is, it transfers control back to the caller and stops execution of the current method. If the method returns a value, the return instruction names the register that holds it, and that value becomes the return value of the method.
return vAA
A: Return value register (8 bits)
Returns from a method that returns a single-word (32-bit) non-object value, using the value in vAA.
return-object vAA
A: Return value register (8 bits)
Returns from an object-returning method, using the object reference in vAA.
return-void
Returns from a void method with no value.
return-wide vA+
A: Return value register pair (8 bits)
Returns a double / long (64-bit) value in vA+.
#
switch
Legend:
A: The register that is being checked
target: Target label of the switch table
packed-switch vAA, :target
Implements a switch statement whose case constants are consecutive. The instruction uses an index table: vAA indexes into this table to find the jump offset for a specific case. If vAA falls outside the table, execution continues with the next instruction (the default case). packed-switch is used when the possible vAA values are consecutive, whatever the lowest value is.
Example of a packed-switch table:
:target
.packed-switch 0x1 # 0x1 = lowest case value of vAA
:pswitch_0 # Jump to pswitch_0 if vAA == 0x1
:pswitch_1 # Jump to pswitch_1 if vAA == 0x2
.end packed-switch
sparse-switch vAA, :target
Implements a switch statement whose case constants are not consecutive. The instruction uses a lookup table with the case constants and the offset for each case constant. If there is no match in the table, execution continues with the next instruction (the default case).
:target
.sparse-switch
0x3 -> :sswitch_1 # Jump to sswitch_1 if vAA == 0x3
0x65 -> :sswitch_2 # Jump to sswitch_2 if vAA == 0x65
.end sparse-switch
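The two table lookups can be modeled in a few lines. This Python sketch illustrates the lookup logic only; the function names and the "fallthrough" marker for the default case are assumptions made for the example.

```python
def packed_switch(value, first_key, targets, default="fallthrough"):
    """Model of packed-switch: targets[i] handles the case first_key + i."""
    index = value - first_key
    if 0 <= index < len(targets):
        return targets[index]
    # Outside the table: execution continues with the next instruction.
    return default

def sparse_switch(value, table, default="fallthrough"):
    """Model of sparse-switch: table maps each case constant to its label."""
    return table.get(value, default)
```

With the tables shown above, packed_switch(2, 0x1, ["pswitch_0", "pswitch_1"]) selects pswitch_1, and sparse_switch(0x65, {0x3: "sswitch_1", 0x65: "sswitch_2"}) selects sswitch_2.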