Compare the performance of `subi' and `andi'
; subi_vs_andi.S: Compare the performance of `subi' and `andi'
;
; Does a bitwise "and" perform better than a subtraction on an
; Arduino Uno?
;
; There are two common ways of converting an ASCII digit to its numeric
; value:
;
; - subtraction: character - '0'
; - bitwise and: character & 0x0f
;
; On an AVR target, the first compiles to a `subi' instruction (subtract
; immediate), and the second to `andi' (and with immediate). Both are
; single-word, single-cycle instructions. The only possible difference,
; performance-wise, can be in their energy consumption.
;
; This program aims at measuring this consumption by repeatedly running
; those instructions. The `subi' instruction is tested with an infinite
; loop consisting of the following:
;
;     ldi  r24, '0'     ; let character = '0'
;     subi r24, '0'     ; compute character - '0'
;     ldi  r24, '1'     ; let character = '1'
;     subi r24, '0'     ; compute character - '0'
;     ...               ; and so on up to:
;     ldi  r24, '9'     ; let character = '9'
;     subi r24, '0'     ; compute character - '0'
;
; The loop is unrolled 10 times in order to minimize the fraction of the
; time spent looping back: for every 202 CPU cycles,
;
; - 100 are spent executing `ldi' (load immediate)
; - 100 are spent on `subi'
; - 2 are spent on `rjmp' (relative jump)
;
; The `andi' instruction is tested by an identical loop, with the `subi'
; instructions replaced by:
;
;     andi r24, 0x0f    ; compute character & 0x0f
;
; Compile:
;     avr-gcc -mmcu=atmega328p subi_vs_andi.S -o subi_vs_andi.elf
;
; Upload:
;     DUDEFLAGS="-p atmega328p -c arduino -P /dev/ttyACM0 -b 115200"
;     avrdude $DUDEFLAGS -D -U subi_vs_andi.elf
;
; Usage:
;   - leave PD2 (digital 2) unconnected to test the `subi' instruction,
;     or ground it to test `andi'
;   - power the Arduino through Vin or +5V; after the bootloader runs,
;     the built-in LED will blink once if testing `subi' and twice if
;     testing `andi'
;   - measure the current consumption.
#include <avr/io.h>
#define io(x) _SFR_IO_ADDR(x)
; How many times to unroll the loop.
.set unrolling, 10
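; Delay about 164 ms @ 16 MHz: the 24-bit counter r28:r27:r26 wraps from
; 0 through 0x080000 = 524288 iterations of 5 cycles each.
delay:
    clr   r26
    clr   r27
    ldi   r28, 8
0:  sbiw  r26, 1        ; decrement the low 16 bits (2 cycles)
    sbci  r28, 0        ; propagate the borrow into the high byte (1 cycle)
    brne  0b            ; loop until all 24 bits are zero (2 cycles if taken)
    ret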
; Test `subi' once for every ASCII digit from '0' to max_val.
.macro test_subi_once max_val='9'
    .if \max_val > '0'
        test_subi_once (\max_val-1)
    .endif
    ldi   r24, \max_val
    subi  r24, '0'
.endm

; Test `subi' in an infinite loop.
test_subi:
    .rept unrolling
        test_subi_once
    .endr
    rjmp  test_subi

; Test `andi' once for every ASCII digit from '0' to max_val.
.macro test_andi_once max_val='9'
    .if \max_val > '0'
        test_andi_once (\max_val-1)
    .endif
    ldi   r24, \max_val
    andi  r24, 0x0f
.endm

; Test `andi' in an infinite loop.
test_andi:
    .rept unrolling
        test_andi_once
    .endr
    rjmp  test_andi

.global main
main:
    ; Pull up on PD2 (digital 2) to sense whether it is grounded.
    sbi   io(PORTD), PD2

    ; Blink the builtin LED once to show the program is running.
    sbi   io(DDRB), PB5     ; PB5 output LOW
    sbi   io(PORTB), PB5    ; PB5 output HIGH
    call  delay
    cbi   io(PORTB), PB5    ; PB5 output LOW

    ; If PD2 is not grounded, test the `subi' instruction.
    sbic  io(PIND), PD2     ; unless PD2 is LOW
    rjmp  test_subi         ; test the `subi' instruction

    ; Blink again to show we noticed PD2 is grounded.
    call  delay
    sbi   io(DDRB), PB5     ; PB5 is already an output (no effect)
    sbi   io(PORTB), PB5    ; PB5 output HIGH
    call  delay
    cbi   io(PORTB), PB5    ; PB5 output LOW

    ; Test the `andi' instruction.
    rjmp  test_andi
edgar-bonet commented Sep 11, 2020

The results of my tests, on an Arduino Uno rev. 3, are

  • subi: 33.1 mA
  • andi: 33.3 mA

The subi instruction seems to draw slightly less current, and hence to consume slightly less energy per cycle. This may be somewhat surprising, as a subtraction is a more complex operation than a bitwise AND. On the other hand, the addition/subtraction circuitry of the ALU is presumably not powered down during the execution of andi, and it may well consume as much energy as when executing subi. Since the RTL design of the ALU is not public, the results are hard to interpret.

The final conclusion of the experiment is that the relative difference, about 0.6% (0.2 mA out of roughly 33 mA), is too small to care about. It is also close to the noise limit of my multimeter. The relative difference could be made somewhat larger by using a bare microcontroller (no serial/USB bridge, no power LED), since those parts add the same constant current to both measurements. On the other hand, this test greatly exaggerates the consumption difference by spending almost half the CPU cycles (100 out of every 202) executing either subi or andi. Any realistic program is likely to spend only a very small fraction of its time converting digits to their numeric values.
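
The arithmetic behind these figures is easy to reproduce. The sketch below is plain C meant to run on a PC, not on the Arduino. The function and constant names are made up for illustration; the inputs are the two currents reported above and the loop layout described in the source file (10 digits per macro expansion, repeated 10 times, plus one 2-cycle `rjmp`).

```c
/* Measured supply currents reported above, in mA. */
static const double I_SUBI_MA = 33.1;
static const double I_ANDI_MA = 33.3;

/* Loop layout taken from the source file. */
enum { DIGITS = 10, UNROLLING = 10, RJMP_CYCLES = 2 };

/* Difference between the two readings, in percent of their mean. */
double relative_difference_percent(void)
{
    double mean = (I_SUBI_MA + I_ANDI_MA) / 2.0;
    return 100.0 * (I_ANDI_MA - I_SUBI_MA) / mean;
}

/* CPU cycles for one pass through the test loop:
 * one single-cycle ldi and one single-cycle subi/andi per digit,
 * plus the final rjmp. */
int cycles_per_iteration(void)
{
    int loads  = DIGITS * UNROLLING;   /* ldi */
    int tested = DIGITS * UNROLLING;   /* subi or andi */
    return loads + tested + RJMP_CYCLES;
}

/* Share of the CPU cycles spent on the instruction under test, in percent. */
double tested_fraction_percent(void)
{
    return 100.0 * (DIGITS * UNROLLING) / cycles_per_iteration();
}
```

Calling these from a small `main` and printing the results gives a relative difference of roughly 0.6% and a tested-instruction share just under 50%, consistent with the description above.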

In any case, the difference is very unlikely to be relevant to real world applications.
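
For reference, the two C expressions being compared (`character - '0'` and `character & 0x0f`) give the same value for every ASCII digit and differ only on other input. A minimal sketch, with hypothetical function names, that checks this on a PC:

```c
/* Convert an ASCII digit by subtraction (the `subi` variant). */
unsigned char digit_by_sub(unsigned char c)
{
    return (unsigned char)(c - '0');
}

/* Convert an ASCII digit by masking the low nibble (the `andi` variant). */
unsigned char digit_by_and(unsigned char c)
{
    return (unsigned char)(c & 0x0f);
}

/* Return 1 if both conversions agree for every character '0'..'9'. */
int conversions_agree_on_digits(void)
{
    for (unsigned char c = '0'; c <= '9'; c++) {
        if (digit_by_sub(c) != digit_by_and(c))
            return 0;
    }
    return 1;
}
```

For a non-digit such as `'A'` (0x41) the two disagree: the subtraction yields 17, while the mask yields 1. With the subtraction, a result of 10 or more therefore flags invalid input, whereas the mask silently maps any character to 0..15.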

Edit: Note that, for the andi test, I disconnected PD2 before making the measurement, so that the reading would not include the pull-up current.
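
A rough estimate shows why the pull-up current matters here. With PD2 grounded, the internal pull-up conducts Vcc / R. The sketch below assumes a 5 V supply and a pull-up resistance between 20 kΩ and 50 kΩ; these are typical datasheet figures for the ATmega328P, not values measured in this test, and the names are hypothetical.

```c
/* Assumed values, not taken from the measurements above. */
#define VCC_VOLTS          5.0      /* assumed supply voltage */
#define R_PULLUP_MIN_OHMS  20000.0  /* assumed lower bound of the pull-up */
#define R_PULLUP_MAX_OHMS  50000.0  /* assumed upper bound of the pull-up */

/* Current through a pull-up resistor whose pin is tied to ground, in mA. */
double pullup_current_ma(double vcc_volts, double r_ohms)
{
    return 1000.0 * vcc_volts / r_ohms;
}
```

Under these assumptions the pull-up would draw between about 0.1 mA (`pullup_current_ma(VCC_VOLTS, R_PULLUP_MAX_OHMS)`) and 0.25 mA (`pullup_current_ma(VCC_VOLTS, R_PULLUP_MIN_OHMS)`), which is the same order of magnitude as the 0.2 mA difference being measured.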
