Created
September 11, 2020 20:42
Compare the performance of `subi' and `andi'
; subi_vs_andi.S: Compare the performance of `subi' and `andi'
;
; Does a bitwise "and" perform better than a subtraction on an
; Arduino Uno?
;
; There are two common ways of converting an ASCII digit to its numeric
; value:
;
;  - subtraction: character - '0'
;  - bitwise and: character & 0x0f
;
; On an AVR target, the first compiles to a `subi' instruction (subtract
; immediate), and the second to `andi' (and with immediate). Both are
; single-word, single-cycle instructions. The only possible difference,
; performance-wise, can be in their energy consumption.
;
; This program aims at measuring this consumption by repeatedly running
; those instructions. The `subi' instruction is tested with an infinite
; loop consisting of the following:
;
;     ldi  r24, '0'   ; let character = '0'
;     subi r24, '0'   ; compute character - '0'
;     ldi  r24, '1'   ; let character = '1'
;     subi r24, '0'   ; compute character - '0'
;     ...             ; and so on up to:
;     ldi  r24, '9'   ; let character = '9'
;     subi r24, '0'   ; compute character - '0'
;
; The loop is unrolled 10 times in order to minimize the fraction of the
; time spent looping back: for every 202 CPU cycles,
;
;  - 100 are spent executing `ldi' (load immediate)
;  - 100 are spent on `subi'
;  - 2 are spent on `rjmp' (relative jump)
;
; The `andi' instruction is tested by an identical loop, with the `subi'
; instructions replaced by:
;
;     andi r24, 0x0f  ; compute character & 0x0f
;
; Compile:
;     avr-gcc -mmcu=atmega328p subi_vs_andi.S -o subi_vs_andi.elf
;
; Upload:
;     DUDEFLAGS="-p atmega328p -c arduino -P /dev/ttyACM0 -b 115200"
;     avrdude $DUDEFLAGS -D -U subi_vs_andi.elf
;
; Usage:
;  - leave PD2 (digital 2) unconnected to test the `subi' instruction,
;    or ground it to test `andi'
;  - power the Arduino through Vin or +5V; after the bootloader runs,
;    the built-in LED will blink once if testing `subi' and twice if
;    testing `andi'
;  - measure the current consumption.

#include <avr/io.h>

#define io(x) _SFR_IO_ADDR(x)

; How many times to unroll the loop.
.set unrolling, 10

; Delay about 164 ms @ 16 MHz.
delay:
    clr  r26
    clr  r27
    ldi  r28, 8
0:  sbiw r26, 1
    sbci r28, 0
    brne 0b
    ret

; Test `subi' once for every ASCII digit from '0' to max_val.
.macro test_subi_once max_val='9'
    .if \max_val > '0'
        test_subi_once (\max_val-1)
    .endif
    ldi  r24, \max_val
    subi r24, '0'
.endm

; Test `subi' in an infinite loop.
test_subi:
    .rept unrolling
        test_subi_once
    .endr
    rjmp test_subi

; Test `andi' once for every ASCII digit from '0' to max_val.
.macro test_andi_once max_val='9'
    .if \max_val > '0'
        test_andi_once (\max_val-1)
    .endif
    ldi  r24, \max_val
    andi r24, 0x0f
.endm

; Test `andi' in an infinite loop.
test_andi:
    .rept unrolling
        test_andi_once
    .endr
    rjmp test_andi

.global main
main:
    ; Pull up on PD2 (digital 2) to sense whether it is grounded.
    sbi  io(PORTD), PD2

    ; Blink the builtin LED once to show the program is running.
    sbi  io(DDRB), PB5      ; PB5 output LOW
    sbi  io(PORTB), PB5     ; PB5 output HIGH
    call delay
    cbi  io(PORTB), PB5     ; PB5 output LOW

    ; If PD2 is not grounded, test the `subi' instruction.
    sbic io(PIND), PD2      ; unless PD2 is LOW
    rjmp test_subi          ; test the `subi' instruction

    ; Blink again to show we noticed PD2 is grounded.
    call delay
    sbi  io(DDRB), PB5      ; PB5 output LOW
    sbi  io(PORTB), PB5     ; PB5 output HIGH
    call delay
    cbi  io(PORTB), PB5     ; PB5 output LOW

    ; Test the `andi' instruction.
    rjmp test_andi
The results of my tests, on an Arduino Uno rev. 3, are:

- `subi`: 33.1 mA
- `andi`: 33.3 mA

The `subi` instruction seems to consume a little bit less energy. This may be somewhat surprising, as a subtraction is a more complex operation than a bitwise and. On the other hand, the addition/subtraction circuitry of the ALU is not powered down during the execution of `andi`, and it may well consume as much energy as when executing `subi`. Since the RTL design of the ALU is not public, the results are hard to interpret.

The final conclusion of the experiment is that the relative difference, at about 0.6% (0.2 mA out of roughly 33 mA), is too small to care about. It is also close to the noise limit of my multimeter. It could be made somewhat larger by using a bare microcontroller (no serial/USB bridge, no power LED). On the other hand, this test greatly exaggerates the consumption difference by spending almost half the CPU cycles executing either `subi` or `andi`. Any realistic program is likely to spend only a very small fraction of its time converting digits to their numeric values.

In any case, the difference is very unlikely to be relevant to real-world applications.
Edit: Note that, for the `andi` test, I disconnected PD2 before making the measurement in order to not be affected by the pull-up current.