Measter/abstraction_adventures.md

My Adventures in MMIO Abstraction

Some years ago, I came across a simple Roguelike on Reddit called coreRL.
It's very simplistic; levels are just a box with two walls, only one enemy type with basic AI, no health or
character attributes, and the only goal is to see how far you can get before you die. Having nothing better to do, I
thought it'd be a fun little project to write a port for an Arduino Nano. The only inputs needed are the four movement
keys, and the display can just be a basic SSD1306-driven 128x64 OLED panel.
I could, of course, do this in C++. The language is a known quantity for the ATmega328P that powers the Arduino Nano.
The toolchain is mature, as are the abstractions for interacting with the onboard peripherals. There are also libraries
for SSD1306-driven displays. It's the obvious choice. All I would have to really do is write the game logic.
But where's the fun in that?
Memory-Mapped IO

To actually do anything useful, the microcontroller needs to interact with the outside world. The ATmega328P has four
peripherals that I would be using for this project: An IO Port, a Timer, the Two-Wire Interface (TWI) bus, and the
USART for sending debugging info back to my PC.
These are used by manipulating Memory-Mapped IO (MMIO) registers. Normally, when you read or write to a
memory address, you are accessing some sort of, well... memory. Memory-Mapped IO is what its name suggests: it's
a piece of hardware mapped to a memory address, so that instead of simply accessing a value in memory, you access the
connected hardware. The specifics of how exactly this happens depend on the hardware in question. Writing to the
IO ports, for example, just sets a couple of flip-flops. But other devices will be more complex.
Hello World!

To illustrate how you can use these, I'll start with the hello world of microcontrollers: turn on an LED. The Arduino
Nano has an LED connected to pin PB5. The name tells us what set of registers, and which bit we'll need to
manipulate: Port B, bit 5. The IO Ports on the 328P are quite simple: data direction is set using the
DDRn registers (set a bit to 0 for input, 1 for output), while output level is controlled with the PORTn registers
(set a bit to 0 for low, 1 for high).
We want port B, so this will be the DDRB and PORTB registers, whose addresses are:

PORTB: 0x25
DDRB: 0x24

This part, however, is something that Rust, the language I chose for the port, makes a bit awkward. Rust is very... particular about safety, and the problem
here is that, to Rust, these are arbitrary memory addresses that I've magicked out of nowhere. It doesn't know
anything about them, and certainly not that they're IO peripherals. This means I need to access them as raw pointers.
Manipulating raw pointers requires unsafe. Additionally, to prevent an overzealous optimizer removing our
(apparently) unused memory accesses, you need to use volatile reads and writes. In C++ this is easy: you just mark
the entire pointer type as volatile, and every access is handled correctly. Rust is different. In Rust, whether an
access is volatile or not is determined by the site of access, not the type of pointer.
So, with that in mind, here's how I can turn on that LED. I first define some constants:
const PORTB: *mut u8 = 0x25 as *mut _;
const DDRB: *mut u8 = 0x24 as *mut _;
const PB5: u8 = 5;
And then I can set the specific bits in those registers:
unsafe {
    DDRB.write_volatile(1 << PB5);
    PORTB.write_volatile(1 << PB5);
}
And my LED lights up.
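None of this needs real hardware to experiment with. As a minimal sketch (not part of the original project), the same two volatile writes can be pointed at an ordinary byte array standing in for the ATmega328P's I/O space, which makes the result easy to inspect on a desktop. The `light_led` function and the `*_OFFSET` names are made up for this illustration.

```rust
// Offsets into a fake I/O space; on the real chip these are absolute addresses.
const PORTB_OFFSET: usize = 0x25;
const DDRB_OFFSET: usize = 0x24;
const PB5: u8 = 5;

/// Performs the two volatile writes from the text against a caller-supplied
/// buffer instead of real hardware (hypothetical helper for illustration).
fn light_led(io_space: &mut [u8; 0x40]) {
    let base = io_space.as_mut_ptr();
    // SAFETY: both offsets are inside the 0x40-byte buffer we were handed.
    unsafe {
        base.add(DDRB_OFFSET).write_volatile(1 << PB5);
        base.add(PORTB_OFFSET).write_volatile(1 << PB5);
    }
}
```

Calling `light_led` on a zeroed buffer leaves 0x20 (bit 5) in both simulated registers. On the real chip the pointers would be the fixed addresses 0x24 and 0x25 rather than offsets into a buffer.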
Abstraction

This is clearly going to get very messy, very quickly. All this noise around the memory access is going to end up
making code that interacts with the registers hard to read. Hard-to-read code is hard-to-understand, buggy code.
And it gets worse if you want to just set one bit without altering any of the others:
let port_val = PORTB.read_volatile() | (1 << PB5);
PORTB.write_volatile(port_val);
What I want to do is hide away the details of twiddling the bits, and just leave me with the higher level concept.
To do this, I thought about what common functionality these registers will have:

They all have some address in memory.
Reading or replacing the entire contents of the register.
Setting or clearing a specific bit.
Getting the value of a specific bit.

There's a couple ways this could be done, but what I settled on was this trait:
pub trait Register {
    const ADDR: *mut u8;

    unsafe fn set_value(val: u8) {
        Self::ADDR.write_volatile(val);
    }

    unsafe fn get_value() -> u8 {
        Self::ADDR.read_volatile()
    }

    unsafe fn get_bit(bit: u8) -> bool {
        let bit = 1 << bit;

        (Self::get_value() & bit) != 0
    }

    unsafe fn set_bit(bit: u8) {
        let bit = 1 << bit;
        let val = Self::get_value();
        Self::set_value(val | bit);
    }

    unsafe fn clear_bit(bit: u8) {
        let bit = 1 << bit;
        let val = Self::get_value();
        Self::set_value(val & !bit);
    }
}
The function implementations are identical for each register, so I just have a default implementation. The functions
should be unsafe, because there's no way to prove here that what we're doing is actually correct. That will be down to
the caller. With this abstraction, I can then change my register definitions to this:
struct PORTB;
impl Register for PORTB {
    const ADDR: *mut u8 = 0x25 as *mut _;
}
struct DDRB;
impl Register for DDRB {
    const ADDR: *mut u8 = 0x24 as *mut _;
}
And finally, lighting the LED, without clobbering the other bits, now looks like this:
unsafe {
    DDRB::set_bit(PB5);
    PORTB::set_bit(PB5);
}
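To convince myself that set_bit and clear_bit really leave the other bits alone, the trait can be exercised on a desktop. This is only a sketch under a couple of assumptions: the register is faked with a static byte (`FAKE_PORTB`, `FakePortB`, and `exercise_fake_port` are invented names), and the address comes from a function rather than the `const ADDR` used above, since the fake register's address is not a compile-time constant.

```rust
use std::sync::atomic::AtomicU8;

// Hypothetical stand-in for the hardware: an ordinary byte in RAM instead of
// the real PORTB at 0x25.
static FAKE_PORTB: AtomicU8 = AtomicU8::new(0);

pub trait Register {
    // Assumption: a function instead of `const ADDR`, see the note above.
    fn addr() -> *mut u8;

    unsafe fn set_value(val: u8) {
        unsafe { Self::addr().write_volatile(val) }
    }

    unsafe fn get_value() -> u8 {
        unsafe { Self::addr().read_volatile() }
    }

    unsafe fn set_bit(bit: u8) {
        unsafe { Self::set_value(Self::get_value() | (1u8 << bit)) }
    }

    unsafe fn clear_bit(bit: u8) {
        unsafe { Self::set_value(Self::get_value() & !(1u8 << bit)) }
    }
}

struct FakePortB;
impl Register for FakePortB {
    fn addr() -> *mut u8 {
        FAKE_PORTB.as_ptr()
    }
}

/// Returns the register contents after setting bit 5, and again after
/// clearing bit 0.
fn exercise_fake_port() -> (u8, u8) {
    // SAFETY: the pointer refers to a live static byte, and nothing else
    // touches it while this function runs.
    unsafe {
        FakePortB::set_value(0b0000_0011);
        FakePortB::set_bit(5);
        let after_set = FakePortB::get_value();
        FakePortB::clear_bit(0);
        (after_set, FakePortB::get_value())
    }
}
```

Starting from 0b0000_0011, setting bit 5 gives 0b0010_0011 and clearing bit 0 then gives 0b0010_0010, so the untouched bits survive both operations.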
Improving (or Complicating) the Abstraction

There's two problems with the current implementation which, while they aren't huge, do bug me. The first is that not
all registers are 8-bit; some are 16-bit. Now, I could just define the high and low bytes of the registers separately,
and this is what the AVR C headers do, but I would prefer to be able to address it in one operation.
The second is that all the inputs for bit manipulation are just plain u8. This means I could do PORTB::set_bit(30)
and have the compiler accept an incorrect input. It's also not immediately clear whether I should be passing in a bit ID
or a pre-shifted value. There's an additional problem: not all bits in a register have meaning. For example, bit 1 of the
TWI Control Register (TWCR) has no function. Yet I can just pass in a 1 to the set_bit function without issue. This could
all be part of the documentation, but wouldn't it be better if I just couldn't do it wrong in the first place?
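To make the out-of-range case concrete, here is a small sketch using the standard integer methods rather than the trait itself (`shift_outcomes` is a made-up helper). A shift amount of 30 is not a valid bit position for a u8: `checked_shl` reports it as an overflow, while `wrapping_shl` silently masks the amount down to 30 % 8 = 6 and sets a completely different bit. As far as I understand it, a plain `<<` behaves like one or the other depending on whether overflow checks are enabled, which is exactly the kind of surprise I'd rather rule out at compile time.

```rust
/// Shows what happens to `1u8 << bit` for a given bit index, using the
/// explicit checked and wrapping variants (hypothetical helper).
fn shift_outcomes(bit: u32) -> (Option<u8>, u8) {
    (1u8.checked_shl(bit), 1u8.wrapping_shl(bit))
}
```

For a valid index such as 5 both variants agree on 0x20. For 30 the checked variant returns None, and the wrapping variant returns 0x40, i.e. bit 6.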
The first one is a little easier to tackle, so I'll do that first. I need the register to be generic over the type it
stores. That's fairly simple: just introduce a generic type T and replace all instances of u8 with T. If you just
do that, you'll run into a series of compile errors along the lines of:
error[E0277]: no implementation for `{integer} << T`
  --> src\register.rs:16:21
   |
16 |         let bit = 1 << bit;
   |                     ^^ no implementation for `{integer} << T`
   |
   = help: the trait `core::ops::Shl<T>` is not implemented for `{integer}`

I need to constrain T. The compiler errors give an idea of what traits we'll need: Shl, BitAnd, BitOr,
Not, and Eq. I should also constrain it to only the types that the registers can be: u8 and u16. It should also
be Copy, firstly because we're implicitly copying the value, but also because I'm doing pointer reads and writes, which
do not play well with drop implementations and the soundness issues around them, so it should enforce that there's no complex
drop behaviour. Copy does this too, because Copy cannot be implemented for types implementing Drop.
In addition to all of this, I'm also using two constants: 0 and 1. The compiler has no idea in these function
implementations that the 0 and 1 literals are Ts. I'll need associated constants.
To cover these requirements, I introduce another trait, RegisterType, which requires the types listed above, and
implement it for u8 and u16:
use core::ops::{BitAnd, BitOr, Not, Shl};

pub trait RegisterType: Copy + BitAnd<Output=Self> + BitOr<Output=Self> + Shl<Output=Self> + Not<Output=Self> + Eq + PartialEq {
    const ZERO: Self;
    const ONE: Self;
}

impl RegisterType for u8 {
    const ZERO: Self = 0;
    const ONE: Self = 1;
}

impl RegisterType for u16 {
    const ZERO: Self = 0;
    const ONE: Self = 1;
}
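As a quick check that these bounds are enough for the bit logic, here is a self-contained sketch that repeats the trait and adds two generic helpers. The helpers `with_bit_set` and `is_bit_set` are hypothetical and only exist for this example; the point is that the same code works for both u8 and u16 using nothing but the bounds and the two associated constants.

```rust
use core::ops::{BitAnd, BitOr, Not, Shl};

pub trait RegisterType:
    Copy + BitAnd<Output = Self> + BitOr<Output = Self> + Shl<Output = Self> + Not<Output = Self> + Eq
{
    const ZERO: Self;
    const ONE: Self;
}

impl RegisterType for u8 {
    const ZERO: Self = 0;
    const ONE: Self = 1;
}

impl RegisterType for u16 {
    const ZERO: Self = 0;
    const ONE: Self = 1;
}

/// Returns `value` with the given bit index set (hypothetical helper).
fn with_bit_set<T: RegisterType>(value: T, bit: T) -> T {
    value | (T::ONE << bit)
}

/// Tests whether the given bit index is set in `value` (hypothetical helper).
fn is_bit_set<T: RegisterType>(value: T, bit: T) -> bool {
    (value & (T::ONE << bit)) != T::ZERO
}
```

Here `with_bit_set(0u8, 5)` gives 0x20 and `with_bit_set(0u16, 12)` gives 0x1000, with no integer literals appearing inside the generic code.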
Now to solve the other problem. The bits that can and cannot be used are specific to the register, and also have names
(e.g. the TWCR register's bit 5 is called TWSTA). So what we have is a fixed set of specific, named, values. An enum
is perfect for this. This is how we could represent the valid bits for TWCR:
enum TWCRBits {
    TWIE,
    TWEN,
    TWWC,
    TWSTO,
    TWSTA,
    TWEA,
    TWINT
}
I then want to ensure that the bit twiddling functions can only take the enum representing that register's bits.
For that I need an associated type on the Register trait, so I can name it in the function signatures:
pub trait Register<T: RegisterType> {
    const ADDR: *mut T;
    type BitType;

    ...

    unsafe fn get_bit(bit: Self::BitType) -> bool {
        ...
    }

    unsafe fn set_bit(bit: Self::BitType) {
        ...
    }

    unsafe fn clear_bit(bit: Self::BitType) {
        ...
    }
}
Because all the logic here is based on shifting, I then need to be able to get a T from the BitType telling us
which bit the variant represents. So the BitType needs to implement a function returning that. This function will also
need to return a u8 or u16, depending on the register. Enter the NamedBits trait:
pub trait NamedBits: Copy {
    type DataType: RegisterType;

    fn bit_id(self) -> Self::DataType;
}
I then change the associated type in Register to properly restrict the bit type to only those that implement
NamedBits, and update the bit twiddling functions to call the bit_id function on their input:
pub trait Register<T: RegisterType> {
    const ADDR: *mut T;
    type BitType: NamedBits<DataType = T>;

    ...

    unsafe fn get_bit(bit: Self::BitType) -> bool {
        let bit = T::ONE << bit.bit_id();

        (Self::get_value() & bit) != T::ZERO
    }

    unsafe fn set_bit(bit: Self::BitType) {
        let bit = T::ONE << bit.bit_id();
        let val = Self::get_value();
        Self::set_value(val | bit);
    }

    unsafe fn clear_bit(bit: Self::BitType) {
        let bit = T::ONE << bit.bit_id();
        let val = Self::get_value();
        Self::set_value(val & !bit);
    }
}
The implementations for PORTB and DDRB now look like this:
#[derive(Copy, Clone)]
enum PortBBits {
    PB0,
    PB1,
    PB2,
    PB3,
    PB4,
    PB5,
    PB6,
    PB7,
}

impl NamedBits for PortBBits {
    type DataType = u8;
    fn bit_id(self) -> Self::DataType {
        use PortBBits::*;
        match self {
            PB0 => 0,
            PB1 => 1,
            PB2 => 2,
            PB3 => 3,
            PB4 => 4,
            PB5 => 5,
            PB6 => 6,
            PB7 => 7,
        }
    }
}

struct PORTB;
impl Register<u8> for PORTB {
    const ADDR: *mut u8 = 0x25 as *mut _;
    type BitType = PortBBits;
}

struct DDRB;
impl Register<u8> for DDRB {
    const ADDR: *mut u8 = 0x24 as *mut _;
    type BitType = PortBBits;
}
With this implementation, lighting that LED is done like this:
unsafe {
    DDRB::set_bit(PortBBits::PB5);
    PORTB::set_bit(PortBBits::PB5);
}
There's no longer any ambiguity whether the input is a pre-shifted value or not, nor can I just throw in arbitrary
numbers like before.
There is a final issue: some registers are just data storage. An example of this would be the USART's UDR0
register, which stores the byte being sent or received over the bus. In that case, the register is a byte of data, not a
collection of control bits, so being able to set a specific bit doesn't make sense. However, the abstraction here
requires a type representing the bits.
My solution to this was to create a struct called NoBits, with a private field so it couldn't be constructed outside
of its parent module:
use core::marker::PhantomData;

#[derive(Copy, Clone)]
pub struct NoBits<T>(PhantomData<T>);
impl<T: RegisterType> NamedBits for NoBits<T> {
    type DataType = T;
    fn bit_id(self) -> Self::DataType {
        T::ZERO
    }
}
The reason I went for a struct and not an enum with no variants is that it needs to be usable for both 8-bit and
16-bit registers, meaning it does need to be generic, and not using a generic type parameter is a compile error. This
means I can now define the UDR0 register, and still use the get_value and set_value functions, but not the
bit-related ones:
struct UDR0;
impl Register<u8> for UDR0 {
    const ADDR: *mut u8 = 0xC6 as *mut _;
    type BitType = NoBits<u8>;
}
At this point, those familiar with Rust might be wondering why not just use the From or Into traits from the core library for
converting the bit enum to T? I did try these at first, but found quickly that they, for some reason, don't optimize
well. Even with all the inputs being known at compile time, it would end up not inlining the into call, so you'd get an
unnecessary function call in the final binary. Defining my own specific conversion trait resulted in the inlining taking
place, meaning the entire thing got optimised away.
Multiple Bits

Ok, I can twiddle a single bit in a way that is easy to read, and harder to get wrong. But I also want to set
(or clear) multiple bits in one operation. I could do this with successive calls to set_bit (or clear_bit), but the
volatile accesses start to become a problem. So far, with what I have, setting PORTB's PB5 bit optimises to
this:
sbi     0x05, 5
But what if we set PB2, PB3, PB5, PB6, and PB7? We get this:
sbi     0x05, 2
sbi     0x05, 3
sbi     0x05, 5
sbi     0x05, 6
sbi     0x05, 7
Because the access is volatile, the compiler doesn't know that it can collect all the bits together and set it all at
once, so I need to do that myself. As before, I don't want to expose manual bit twiddling, so I'll implement two
functions (set_bits and clear_bits) to do it for me. Rust doesn't have variadic functions, but it does have slices,
so I'll make them take a slice:
pub trait Register<T: RegisterType> {
    
    ...

    unsafe fn set_bits(bits: &[Self::BitType]) {
        // Construct the final bit pattern by ORing the shifted bits together.
        let bits = bits.iter().copied()
            .map(NamedBits::bit_id)
            .fold(T::ZERO, |acc , b| acc | (T::ONE << b));

        let val = Self::get_value();
        Self::set_value(val | bits);
    }

    unsafe fn clear_bits(bits: &[Self::BitType]) {
        let bits = bits.iter().copied()
            .map(NamedBits::bit_id)
            .fold(T::ZERO, |acc , b| acc | (T::ONE << b));

        let val = Self::get_value();
        Self::set_value(val & !bits);
    }
}
And now we can just set all our pins like this:
PORTB::set_bits(&[
    PortBBits::PB2,
    PortBBits::PB5,
    PortBBits::PB7,
    PortBBits::PB3,
    PortBBits::PB6,
]);
And have it optimise to a single operation (0xEC is what you get when you OR together the above bits):
in      r24, 0x05
ori     r24, 0xEC
out     0x05, r24
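The 0xEC constant is easy to verify by hand. Stripped of the traits, the fold in set_bits boils down to the sketch below, where `combine_bits` is a made-up name and plain u8 bit indices stand in for the enum variants.

```rust
/// ORs together `1 << id` for every bit index in the slice, mirroring the
/// fold inside set_bits (hypothetical stand-alone version).
fn combine_bits(bit_ids: &[u8]) -> u8 {
    bit_ids
        .iter()
        .copied()
        .fold(0u8, |acc, id| acc | (1u8 << id))
}
```

For the indices 2, 5, 7, 3, and 6 this gives 0x04 | 0x20 | 0x80 | 0x08 | 0x40 = 0xEC, regardless of the order they are listed in.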
Replacing Bits

What I've got so far works pretty nicely for setting or clearing in one operation. But one thing that comes up a
couple times is when you want to replace the values of certain bits, but leave the others intact. This could be done with
successive calls to clear_bits then set_bits, but that runs into the volatile register issue we had before, with the
complication that actually clearing the bits in the register could do unexpected things for the more complex peripherals.
The bitwise logic involved is fairly simple:
let bits_to_replace = (1 << 2) | (1 << 4) | (1 << 7);
let replace_val = (1 << 2) | (1 << 7);
let reg_val = REG::get_value();

let masked = reg_val & !bits_to_replace;
let new_reg = masked | replace_val;
REG::set_value(new_reg);
But having a function that does it for me makes it easier and reduces the chance of an error. The logic here is just a
combination of the set_bits and clear_bits functions, so let's just do that. It will need to take two sets of bits:
one representing the bits to replace, and a second for the value to replace them with.
unsafe fn replace_bits(mask: &[Self::BitType], value: &[Self::BitType]) {
    let mask = mask.iter().copied()
        .map(NamedBits::bit_id)
        .fold(T::ZERO, |acc , b| acc | (T::ONE << b));

    let value = value.iter().copied()
        .map(NamedBits::bit_id)
        .fold(T::ZERO, |acc , b| acc | (T::ONE << b));

    let masked_value = value & mask;
    let masked_reg = Self::get_value() & !mask;

    Self::set_value(masked_reg | masked_value);
}
One thing we should be careful about here is to ensure the incoming value is also masked (but not with the
inverted mask!), so it doesn't clobber bits outside the masked area. So now I can do this:
PORTB::replace_bits(
    &[
        PortBBits::PB2,
        PortBBits::PB4,
        PortBBits::PB7,
    ],
    &[
        PortBBits::PB2,
        PortBBits::PB7
    ]
);
And have it optimize to this:
in      r24, 0x05
andi    r24, 0x6B
ori     r24, 0x84
out     0x05, r24
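The masking is the part that is easiest to get subtly wrong, so here is the same arithmetic as a pure function that can be checked without a register. This is a sketch only: `replace_bits_in` is an invented name, and it works on plain u8 values instead of going through the trait.

```rust
/// Replaces the bits selected by `mask` in `reg` with the corresponding bits
/// of `value`, leaving everything outside the mask untouched.
fn replace_bits_in(reg: u8, mask: u8, value: u8) -> u8 {
    // Drop any bits of the new value that fall outside the mask.
    let masked_value = value & mask;
    // Clear the masked area in the current register contents.
    let masked_reg = reg & !mask;
    masked_reg | masked_value
}
```

With the mask 0x94 (bits 2, 4, and 7) and the value 0x84 (bits 2 and 7) from the example above, a register holding 0xFF ends up as 0xEF: bit 4 is cleared, bits 2 and 7 stay set, and the rest is unchanged. Passing 0xFF as the value with the same mask on an empty register gives 0x94, showing that bits outside the mask are not clobbered.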
Usage Ergonomics

So far, this is looking OK to use for bit twiddling. The need to take in a slice for the multi-bit operations isn't
great, and there's also the fact that I need to explicitly import the bits enum, and know its name. Both of these
are mildly irritating, but not a huge problem. However, they can be solved. The first, we'll come back to later, but the
second one is trivial to deal with.
The solution is simple enough: associated constants. We simply declare a bunch of associated constants on the
register, which point to the enum variants:
pub struct TWCR;
impl TWCR {
    pub const TWIE: TWCRBits = TWCRBits::TWIE;
    pub const TWEN: TWCRBits = TWCRBits::TWEN;
    pub const TWWC: TWCRBits = TWCRBits::TWWC;
    pub const TWSTO: TWCRBits = TWCRBits::TWSTO;
    pub const TWSTA: TWCRBits = TWCRBits::TWSTA;
    pub const TWEA: TWCRBits = TWCRBits::TWEA;
    pub const TWINT: TWCRBits = TWCRBits::TWINT;
}
And now, when I want to bit-twiddle the TWCR register, I can just get the bit names through the register itself:
TWCR::set_bit(TWCR::TWEN);
There's another, much larger, issue when it comes to setting register values. At the moment, I can only set the register
value with a raw integer. But it's entirely reasonable to want to set the value based on a set of bits. An example
would be when configuring the TWI bus. There's several points where you need to replace the entire value; for example,
when releasing the bus after an arbitration loss, which currently looks like this:
let bits = (1 << TWCR::TWEN.bit_id()) | (1 << TWCR::TWEA.bit_id()) | (1 << TWCR::TWINT.bit_id());
TWCR::set_value(bits);
What's the point in doing all this abstraction when we're back to that? I could make the set_value function take a
slice of bits like the set_bits, etc. functions, but there are still times when you need to set an integer value. The
approach I chose was (yet) another trait:
pub trait SetValueType<T> {
    fn as_value(self) -> T;
}
I then implemented it for all register types, and slices of register bits, and updated the set_value function:
impl<T: RegisterType> SetValueType<T> for T {
    fn as_value(self) -> T {
        self
    }
}

impl<T: NamedBits> SetValueType<T::DataType> for &[T] {
    fn as_value(self) -> T::DataType {
        self.iter()
            .copied()
            .map(NamedBits::bit_id)
            .fold(T::DataType::ZERO, |acc, b| acc | (T::DataType::ONE << b))
    }
}

pub trait Register<T: RegisterType> {
    ...

    unsafe fn set_value<V: SetValueType<T>>(val: V) {
        let val = val.as_value();
        Self::ADDR.write_volatile(val);
    }

    ...
}
This allows me to set the value both ways (though passing in a slice requires as_ref here, which isn't great):
DDRB::set_value(0x25);
DDRB::set_value([DDRB::PB0, DDRB::PB2, DDRB::PB5].as_ref());
It also optimizes nicely:
ldi     r24, 0x25
out     0x04, r24 // First line
out     0x04, r24 // Second line
Dealing With All These Slices

I don't like all these slices. They look weird and awkward. There's also a couple points when operating the TWI bus where
the TWCR register is being replaced in both branches of an if-statement, but the difference is only one bit. For
example, when handling a packet, you need to configure the register, but may not want to send an ACK signal. Currently
you need to do something like this:
if ack {
    TWCR::set_value([TWCR::TWEN, TWCR::TWIE, TWCR::TWINT, TWCR::TWEA].as_ref());
} else {
    TWCR::set_value([TWCR::TWEN, TWCR::TWIE, TWCR::TWINT].as_ref());
}
It would be nice if I could build up the value, optionally set the TWEA bit, then set the register. In fact, it
would be pretty great if I could do something like this while still retaining some measure of idiot-protection:
let mut bits = TWCR::TWEN | TWCR::TWIE | TWCR::TWINT;
if ack {
    bits |= TWCR::TWEA;
}
TWCR::set_value(bits);
You may notice that what I'm wanting to do is similar to the rather ugly mess I started with:
let bits = (1 << TWCR::TWEN.bit_id()) | (1 << TWCR::TWEA.bit_id()) | (1 << TWCR::TWINT.bit_id());
So what I need is some way to do the same but in a way that retains information about what register they came from. My
solution was the BitBuilder. It should retain information about what bit type it's used for, so it needs to be generic
over that. Because the bit type is now part of the overall type, we can make the internal field be the register's data
type. We'll make that field private so it can't just be replaced with an arbitrary value.
#[derive(Copy, Clone)]
pub struct BitBuilder<T: NamedBits>(T::DataType);

impl<T: NamedBits> BitBuilder<T> {
    pub fn new() -> Self {
        BitBuilder(T::DataType::ZERO)
    }

    fn set_bit(&mut self, b: T) {
        self.0 = self.0 | (T::DataType::ONE << b.bit_id());
    }
}
Now, to get the ORing behaviour I want, I can simply implement the BitOr and BitOrAssign traits for when the right
hand side is both a Bit of the same type as the BitBuilder's Bit type, and when it's another BitBuilder over the
same Bit type:
impl<T: NamedBits> BitOr<Self> for BitBuilder<T> {
    type Output = BitBuilder<T>;
    fn bitor(mut self, rhs: Self) -> Self::Output {
        self.0 = self.0 | rhs.0;
        self
    }
}

impl<T: NamedBits> BitOr<T> for BitBuilder<T> {
    type Output = BitBuilder<T>;
    fn bitor(mut self, rhs: T) -> Self::Output {
        self.set_bit(rhs);
        self
    }
}

impl<T: NamedBits> BitOrAssign<Self> for BitBuilder<T> {
    fn bitor_assign(&mut self, rhs: Self) {
        self.0 = self.0 | rhs.0;
    }
}

impl<T: NamedBits> BitOrAssign<T> for BitBuilder<T> {
    fn bitor_assign(&mut self, rhs: T) {
        self.set_bit(rhs);
    }
}
With that, I can now build up a bit pattern like this:
let bits = BitBuilder::new() | DDRB::PB0 | DDRB::PB2 | DDRB::PB5;
Much closer to what I want. Of course, I can't actually use it, because our Register doesn't take BitBuilder.
Time to change every function that takes a slice, to instead take a BitBuilder:
pub trait Register<T: RegisterType> {

    ...

    unsafe fn set_bits(bits: BitBuilder<Self::BitType>) {
        let val = Self::get_value();
        Self::set_value(val | bits.0);
    }

    unsafe fn clear_bits(bits: BitBuilder<Self::BitType>) {
        let val = Self::get_value();
        Self::set_value(val & !bits.0);
    }

    unsafe fn replace_bits(mask: BitBuilder<Self::BitType>, new_val: BitBuilder<Self::BitType>) {
        let reg_val = Self::get_value() & !mask.0;
        Self::set_value(reg_val | (new_val.0 & mask.0));
    }
}
I also need to implement SetValueType for the BitBuilder so that set_value can take it:
impl<T: NamedBits> SetValueType<T::DataType> for BitBuilder<T> {
    fn as_value(self) -> T::DataType {
        self.0
    }
}
Now to get rid of that awkward construction at the beginning of the OR operation. The way to do that is to implement
BitOr on the bit type itself, but instead of the output type being the same bit type, it's a BitBuilder over the bit
type. The implementation is similar to BitBuilder:
impl BitOr for PortBBits {
    type Output = BitBuilder<PortBBits>;
    fn bitor(self, rhs: Self) -> Self::Output {
        BitBuilder::new() | self | rhs
    }
}

impl BitOr<BitBuilder<PortBBits>> for PortBBits {
    type Output = BitBuilder<PortBBits>;
    fn bitor(self, rhs: BitBuilder<PortBBits>) -> Self::Output {
        rhs | self
    }
}
And now, finally, I can do what I wanted, and just OR together bits:
DDRB::set_value(DDRB::PB0 | DDRB::PB2 | DDRB::PB5);
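To show the operator plumbing in isolation, here is a deliberately simplified, non-generic sketch of the same idea, applied to the conditional ACK case from earlier. It is not the BitBuilder above: `DemoTwcrBits`, `DemoBuilder`, and `twcr_value` are invented for this example, the builder is hard-wired to u8, and only four of the TWCR bits are included.

```rust
use core::ops::{BitOr, BitOrAssign};

// Cut-down stand-in for the TWCR bit enum; the discriminants are the bit IDs.
#[derive(Copy, Clone)]
enum DemoTwcrBits {
    TWIE = 0,
    TWEN = 2,
    TWEA = 6,
    TWINT = 7,
}

impl DemoTwcrBits {
    fn mask(self) -> u8 {
        1u8 << (self as u8)
    }
}

// Non-generic stand-in for the builder: the field is private to this module.
#[derive(Copy, Clone)]
struct DemoBuilder(u8);

// bit | bit starts a builder...
impl BitOr for DemoTwcrBits {
    type Output = DemoBuilder;
    fn bitor(self, rhs: Self) -> DemoBuilder {
        DemoBuilder(self.mask() | rhs.mask())
    }
}

// ...and builder | bit keeps accumulating into it.
impl BitOr<DemoTwcrBits> for DemoBuilder {
    type Output = DemoBuilder;
    fn bitor(self, rhs: DemoTwcrBits) -> DemoBuilder {
        DemoBuilder(self.0 | rhs.mask())
    }
}

impl BitOrAssign<DemoTwcrBits> for DemoBuilder {
    fn bitor_assign(&mut self, rhs: DemoTwcrBits) {
        self.0 |= rhs.mask();
    }
}

/// Builds the TWCR value, optionally including the ACK bit.
fn twcr_value(ack: bool) -> u8 {
    let mut bits = DemoTwcrBits::TWEN | DemoTwcrBits::TWIE | DemoTwcrBits::TWINT;
    if ack {
        bits |= DemoTwcrBits::TWEA;
    }
    bits.0
}
```

Without the ACK bit this yields 0x85 (bits 0, 2, and 7), and with it 0xC5. Because the only way to obtain a `DemoBuilder` is by ORing `DemoTwcrBits` values, an arbitrary integer cannot sneak in.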
One final problem is that if I try to do this:
DDRB::set_value(DDRB::PB0);
I get the following error:
error[E0277]: the trait bound `PortBBits: register::SetValueType<u8>` is not satisfied
   --> src\main.rs:94:25
    |
94  |         DDRB::set_value(DDRB::PB0);
    |                         ^^^^^^^^^ the trait `register::SetValueType<u8>` is not implemented for `PortBBits`
    |
   ::: src\register.rs:107:5
    |
107 |     unsafe fn set_value<V: SetValueType<T>>(val: V) {
    |     ----------------------------------------------- required by `register::Register::set_value`

Which feels really inconsistent. So to fix that, SetValueType needs to be implemented for the bit type, too:
impl SetValueType<u8> for PortBBits {
    fn as_value(self) -> u8 {
        1 << self.bit_id()
    }
}
And now it builds. You might be concerned at this point about how well this optimises. There is a fair bit of
indirection now. But, the following:
DDRB::set_value(DDRB::PB0 | DDRB::PB2 | DDRB::PB5);

DDRB::set_bits(DDRB::PB0 | DDRB::PB2 | DDRB::PB5);
DDRB::clear_bits(DDRB::PB0 | DDRB::PB2 | DDRB::PB5);
DDRB::replace_bits(
    DDRB::PB0 | DDRB::PB2 | DDRB::PB5,
    DDRB::PB2 | DDRB::PB5
);
Compiles down to:
// set_value
ldi     r24, 0x25
out     0x04, r24

// set_bits
in      r24, 0x04
ori     r24, 0x25
out     0x04, r24

// clear_bits
in      r24, 0x04
andi    r24, 0xDA
out     0x04, r24

// replace_bits
in      r24, 0x04
andi    r24, 0xDA
ori     r24, 0x24
out     0x04, r24
And also, because the bit type is a part of the BitBuilder, mixing bit types becomes a compile error:
error[E0277]: no implementation for `PortBBits | TWCRBits`
   --> src\main.rs:148:30
    |
148 |         let bits = DDRB::PB5 | TWCR::TWEN;
    |                              ^ no implementation for `PortBBits | TWCRBits`
    |
    = help: the trait `core::ops::BitOr<TWCRBits>` is not implemented for `PortBBits`

One issue remains, which is that this is accepted:
let bits = DDRB::PB5 | DDRB::PB4;
TWCR::set_value(bits);
That is because the trait bound on set_value only requires that the register type match, with no way to ensure that
the bit type also matches. A way to prevent that is to add a new function for setting a raw value, and change the
SetValueType to be generic over the bit type, not the register type:
pub trait SetValueType<T: NamedBits> {
    fn as_value(self) -> T::DataType;
}

pub trait Register<T: RegisterType> {
    
    ...

    unsafe fn set_raw_value(val: T) {
        Self::ADDR.write_volatile(val);
    }

    #[inline(always)]
    unsafe fn set_value<V: SetValueType<Self::BitType>>(val: V){
        let val = val.as_value();
        Self::ADDR.write_volatile(val);
    }
}
And now we get a nice compile error:
error[E0277]: the trait bound `register::BitBuilder<PortBBits>: register::SetValueType<TWCRBits>` is not satisfied
   --> src\main.rs:90:25
    |
90  |         TWCR::set_value(bits);
    |                         ^^^^ the trait `register::SetValueType<TWCRBits>` is not implemented for `register::BitBuilder<PortBBits>`
    |
   ::: src\register.rs:103:5
    |
103 |     unsafe fn set_value<V: SetValueType<Self::BitType>>(val: V){
    |     ----------------------------------------------------------- required by `register::Register::set_value`
    |
    = help: the following implementations were found:
              <register::BitBuilder<T> as register::SetValueType<T>>

Declaration Ergonomics

Ok, so I've got an abstraction for the registers and their associated bits which: is easy to use; is harder to get
wrong compared to bare bitwise; isn't noisy; and compiles well. Everything's great, right? We just have to define the
types for the register, and we're good to go!
#[derive(Copy, Clone, Eq, PartialEq)]
pub enum TWCRBits {
    TWIE,
    TWEN,
    TWWC,
    TWSTO,
    TWSTA,
    TWEA,
    TWINT,
}

impl NamedBits for TWCRBits {
    type DataType = u8;
    fn bit_id(self) -> u8 {
        match self {
            TWCRBits::TWIE  => 0,
            TWCRBits::TWEN  => 2,
            TWCRBits::TWWC  => 3,
            TWCRBits::TWSTO => 4,
            TWCRBits::TWSTA => 5,
            TWCRBits::TWEA  => 6,
            TWCRBits::TWINT => 7,
        }
    }
}

impl SetValueType<TWCRBits> for TWCRBits {
    fn as_value(self) -> u8 {
        1 << self.bit_id()
    }
}

impl BitOr for TWCRBits {
    type Output = BitBuilder<TWCRBits>;
    fn bitor(self, rhs: TWCRBits) -> Self::Output {
        BitBuilder::new() | self | rhs
    }
}

impl BitOr<BitBuilder<TWCRBits>> for TWCRBits {
    type Output = BitBuilder<TWCRBits>;
    fn bitor(self, rhs: BitBuilder<TWCRBits>) -> Self::Output {
        rhs | self
    }
}

pub struct TWCR;
impl TWCR {
    pub const TWIE:  TWCRBits = TWCRBits::TWIE;
    pub const TWEN:  TWCRBits = TWCRBits::TWEN;
    pub const TWWC:  TWCRBits = TWCRBits::TWWC;
    pub const TWSTO: TWCRBits = TWCRBits::TWSTO;
    pub const TWSTA: TWCRBits = TWCRBits::TWSTA;
    pub const TWEA:  TWCRBits = TWCRBits::TWEA;
    pub const TWINT: TWCRBits = TWCRBits::TWINT;
}

impl Register<u8> for TWCR {
    const ADDR: *mut u8 = 0xBC as *mut u8;
    type BitType = TWCRBits;
}
If you're like me, you just pulled quite a face looking at that. There's so much boilerplate! And it's going to be the
same for every register. There's a lot of repeated information, namely the register type (u8), and the name of the
bit type (TWCRBits) along with all its variants. It would be tedious and error-prone to write this out for the dozens
of registers needed. Fortunately, it's all very orderly, and, as mentioned, basically the same for every register.
Let's tackle the bit enum first, given it's two thirds of the entire definition. How much information is really needed
to construct this? The type name, the register type, the variant names, and which bits they are. Everything else is based
on that, so if we discard all the other boilerplate, and poke it round a bit to look Rust-like, we end up with this:
TWCRBits: u8 {
    TWIE  = 0,
    TWEN  = 2,
    TWWC  = 3,
    TWSTO = 4,
    TWSTA = 5,
    TWEA  = 6,
    TWINT = 7
}
That's all of the unique information needed. All the rest is boilerplate with copies of that information. Macros are
perfect for this kind of boilerplate creation. I'll call this one reg_named_bits, and start on our token pattern.
Looking at the definition above, the first token is an ident (TWCRBits), followed by a colon, followed by a type (u8),
then a pair of braces. Inside the braces is a sequence of idents (TWIE, etc.), each followed by an equals sign and
an expression (0, etc.), separated by commas. That's not too complex to define:
#[macro_export]
macro_rules! reg_named_bits {
    ( 
        $name:ident: $type:ty {
            $( $bit:ident = $id:expr ),+ $(,)*
        }
    ) => {
        
    };
}
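A cheap way to check that a pattern like this matches the intended input, before writing the real expansion, is to give it a throwaway body that just echoes what it captured. The sketch below does that: `describe_named_bits` and `describe_twcr` are hypothetical names, and the body is for inspection only, not the expansion the real macro needs.

```rust
// Same matcher as above, with a throwaway body that reports the captures.
macro_rules! describe_named_bits {
    (
        $name:ident : $type:ty {
            $( $bit:ident = $id:expr ),+ $(,)*
        }
    ) => {
        (
            stringify!($name),
            stringify!($type),
            [ $( (stringify!($bit), $id as u8) ),+ ],
        )
    };
}

/// Feeds the TWCR definition through the matcher, trailing comma included.
fn describe_twcr() -> (&'static str, &'static str, [(&'static str, u8); 7]) {
    describe_named_bits!(
        TWCRBits: u8 {
            TWIE  = 0,
            TWEN  = 2,
            TWWC  = 3,
            TWSTO = 4,
            TWSTA = 5,
            TWEA  = 6,
            TWINT = 7,
        }
    )
}
```

The result carries the type name, the register type, and one (name, id) pair per bit, so a mismatch between the pattern and the input shows up as a compile error at the invocation rather than somewhere deep in the generated code.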
The little $(,)* is so you can have an optional trailing comma. Now to actually use this information to generate the
boilerplate. One thing we need to be aware of is namespaces. There's no guarantee that traits and types will be imported
to the macro invocation site, so we need to fully qualify the traits and types we're using. The macro body is basically
what we have above, except with the specific bits replaced with the pattern matches:
#[macro_export]
macro_rules! reg_named_bits {
    ( 
        $name:ident : $type:ty {
            $( $bit:ident = $id:expr ),+ $(,)*
        }
    ) => {
        #[derive(Copy, Clone, Eq, PartialEq)]
        pub enum $name {
            $( $bit ),*
        }

        impl crate::register::NamedBits for $name {
            type DataType = $type;
            fn bit_id(self) -> $type {
                match self {
                    $( $name::$bit => $id ),*
                }
            }
        }

        impl crate::register::SetValueType<$name> for $name {
            fn as_value(self) -> $type {
                1 << self.bit_id()
            }
        }

        impl core::ops::BitOr for $name {
            type Output = crate::register::BitBuilder<$name>;
            fn bitor(self, rhs: $name) -> Self::Output {
                crate::register::BitBuilder::new() | self | rhs
            }
        }

        impl core::ops::BitOr<crate::register::BitBuilder<$name>> for $name {
            type Output = crate::register::BitBuilder<$name>;
            fn bitor(self, rhs: crate::register::BitBuilder<$name>) -> Self::Output {
                rhs | self
            }
        }
    };
}
Note the very verbose paths to the traits and BitBuilder. Another piece of boilerplate is the associated constants on
the struct. This could be done with the third and final macro, but it would be nice if the registers on the same port
(e.g. PORTB, DDRB, and PINB) could share the same register type, as the bits mean the same thing. So with that in
mind, we'll add another macro. This one will be similar to the first, with an input that looks like this:
TWCR: TWCRBits {
    TWIE,
    TWEN,
    TWWC,
    TWSTO,
    TWSTA,
    TWEA,
    TWINT
}
That one's just ident, colon, ident, opening brace, comma-separated ident sequence, closing brace. And the output
should be an impl on the struct with the constants:
#[macro_export]
macro_rules! reg_bit_consts {
    (
        $struct_name:ident : $bits_name:ident {
            $( $bit:ident ),+ $(,)*
        }
    ) => {
        impl $struct_name {
            $( pub const $bit: $bits_name = $bits_name::$bit; )*
        }
    }
}
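To illustrate the point about sharing one bit type between registers, here is a small host-side sketch. The macro is the one above with #[macro_export] dropped so that it works as a single file; PortBBits, its variants PB0 and PB5, and the empty PORTB and DDRB structs are stand-ins invented for the example rather than the real definitions.

```rust
// Same macro as above, minus #[macro_export] for a single-file sketch.
macro_rules! reg_bit_consts {
    (
        $struct_name:ident : $bits_name:ident {
            $( $bit:ident ),+ $(,)*
        }
    ) => {
        impl $struct_name {
            $( pub const $bit: $bits_name = $bits_name::$bit; )*
        }
    }
}

// Hypothetical shared bit type, written by hand for this example.
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum PortBBits {
    PB0,
    PB5,
}

// Stand-in register structs; the real ones also implement Register.
pub struct PORTB;
pub struct DDRB;

// Both registers get constants that refer to the same enum.
reg_bit_consts! { PORTB: PortBBits { PB0, PB5 } }
reg_bit_consts! { DDRB: PortBBits { PB0, PB5, } }
```

With this, PORTB::PB5 and DDRB::PB5 are the same PortBBits value, so code written against one register's bits can be reused for the other.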
This has shrunk the definitions down a lot, but there's still repeated information about the bit names. In most cases
this could be condensed further by declaring another macro. However, this third macro needs to take into account three
cases:

1. No bits.
2. Associating with an existing bits definition.
3. Fully generating the bits.

All three of these cases share some of the same information: the register name (ident), the register type, and the
register address (expression). So a similar structure to above can be used for this too:
TWCR: u8 {
    addr: 0xBC
}
This is enough for the first case, but what about the other two? The most obvious solution to me was to have
another part of the pattern, which names the bit type:
TWCR: u8 {
    addr: 0xBC,
    bits: TWCRBits
}
That handles the second case. For the third case, this could be extended with the same list from the first macro
defined earlier:
TWCR: u8 {
    addr: 0xBC,
    bits: TWCRBits {
        TWIE  = 0,
        TWEN  = 2,
        TWWC  = 3,
        TWSTO = 4,
        TWSTA = 5,
        TWEA  = 6,
        TWINT = 7
    }
}
So, that's three cases, all with simple patterns:
#[macro_export]
macro_rules! reg {
    (
        $name:ident : $type:ty {
            addr: $addr:expr $(,)*
        }
    ) => {
        
    };

    (
        $name:ident : $type:ty {
            addr: $addr:expr,
            bits: $bits_name:path $(,)*
        }
    ) => {

    };

    (
        $name:ident: $type:ty {
            addr: $addr:expr,
            bits: $bits_name:ident {
                $( $bit:ident = $id:expr ),+ $(,)*
            }
        }
    ) => {

    };
}
One thing to note is that in the second case, the $bits_name is a path, not an ident. This allows the user to
name a type in another module (e.g. crate::register::NoBits). The first pattern is trivial to implement; we just
forward to the second pattern, specifying that the bits type is the NoBits type we defined earlier. This is why we
needed path support for the second pattern.
(
    $name:ident : $type:ty {
        addr: $addr:expr $(,)*
    }
) => {
    reg! {
        $name: $type {
            addr: $addr,
            bits: crate::register::NoBits<$type>,
        }
    }
};
The second is where the struct and Register implementation live:
(
    $name:ident : $type:ty {
        addr: $addr:expr,
        bits: $bits_name:path $(,)*
    }
) => {
    pub struct $name;
    impl crate::register::Register<$type> for $name {
        const ADDR: *mut $type = $addr as *mut $type;
        type BitType = $bits_name;
    }
};
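The first two arms can be tried out on a PC without touching any hardware. The sketch below makes a few assumptions: Register and NoBits are minimal stand-ins whose shape is guessed from the macro body (the real definitions come from earlier in the article), the paths are left unqualified so everything fits in one file, and addr_of is a hypothetical helper that only reads the address constant and never dereferences it.

```rust
use core::marker::PhantomData;

// Stand-in for the Register trait (assumed shape: an address and a bit type).
pub trait Register<T> {
    const ADDR: *mut T;
    type BitType;
}

// Stand-in for the NoBits marker type.
pub struct NoBits<T>(PhantomData<T>);

macro_rules! reg {
    // Case 1: no bits. Forward to case 2 with NoBits filled in.
    ( $name:ident : $type:ty { addr: $addr:expr $(,)* } ) => {
        reg! { $name: $type { addr: $addr, bits: NoBits<$type>, } }
    };
    // Case 2: existing bits type. Emit the struct and its Register impl.
    ( $name:ident : $type:ty { addr: $addr:expr, bits: $bits_name:path $(,)* } ) => {
        pub struct $name;
        impl Register<$type> for $name {
            const ADDR: *mut $type = $addr as *mut $type;
            type BitType = $bits_name;
        }
    };
}

reg! { UDR0: u8 { addr: 0xC6 } }
reg! { UBRR0: u16 { addr: 0xC4, } }

// Hypothetical helper: returns a register's address as a plain integer.
fn addr_of<T, R: Register<T>>() -> usize {
    R::ADDR as usize
}
```

The order of the arms matters here: an input with a bits entry fails to match the first arm at the bits token, so the macro falls through to the second.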
And finally, the third pattern. This one forwards to the second pattern, as well as calling out to the two macros
defined earlier:
(
    $name:ident: $type:ty {
        addr: $addr:expr,
        bits: $bits_name:ident {
            $( $bit:ident = $id:expr ),+ $(,)*
        }
    }
) => {
    reg_named_bits! {
        $bits_name: $type {
            $( $bit = $id ),+
        }
    }

    reg! {
        $name: $type {
            addr: $addr,
            bits: $bits_name,
        }
    }

    reg_bit_consts! {
        $name : $bits_name {
            $( $bit ),+
        }
    }
};
With these three macros, the complete register definition listed earlier is now significantly smaller, and easier to read:
reg! {
    TWCR: u8 {
        addr: 0xBC,
        bits: TWCRBits {
            TWIE  = 0,
            TWEN  = 2,
            TWWC  = 3,
            TWSTO = 4,
            TWSTA = 5,
            TWEA  = 6,
            TWINT = 7
        }
    }
}
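To make the numbers in that definition concrete, here is a cut-down host-side sketch of what the generated bit type computes. It is a simplification under stated assumptions: named_bits_sketch and mask are hypothetical names, and bit_id and as_value are plain inherent methods instead of the NamedBits and SetValueType trait implementations, so none of the register traits are needed.

```rust
// Simplified stand-in for reg_named_bits!: enum plus two inherent methods.
macro_rules! named_bits_sketch {
    ( $name:ident : $type:ty { $( $bit:ident = $id:expr ),+ $(,)* } ) => {
        #[derive(Copy, Clone, Eq, PartialEq, Debug)]
        pub enum $name { $( $bit ),+ }

        impl $name {
            // Index of this bit within the register.
            pub fn bit_id(self) -> $type {
                match self { $( $name::$bit => $id ),+ }
            }
            // Register value with only this bit set.
            pub fn as_value(self) -> $type {
                1 << self.bit_id()
            }
        }
    };
}

named_bits_sketch! {
    TWCRBits: u8 {
        TWIE  = 0,
        TWEN  = 2,
        TWWC  = 3,
        TWSTO = 4,
        TWSTA = 5,
        TWEA  = 6,
        TWINT = 7
    }
}

// ORs the single-bit values together, much as a bit builder would.
fn mask(bits: &[TWCRBits]) -> u8 {
    bits.iter().fold(0, |acc, b| acc | b.as_value())
}
```

For example, combining TWEN (bit 2) and TWINT (bit 7) gives 0x04 | 0x80 = 0x84, which is the raw value that would end up in the register.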
Final Example

To demonstrate the macros in use, here's code declaring the registers for the USART, and then using them to send a
string over serial:
const CPU_FREQ: u32 = 16_000_000;
/// The baud rate we'll use for serial.
const BAUD_RATE: u32 = 9600;
/// The calculated value to put into the UBRR register to set the baud rate.
const UBBR_VAL: u16 = ((CPU_FREQ / 8 / BAUD_RATE) - 1) as u16;

reg! {
    UCSR0A: u8 {
        addr: 0xC0,
        bits: UCSR0ABits {
            MPCM0 = 0,
            U2X0 = 1,
            UDRE0 = 5,
        }
    }
}

reg! {
    UCSR0B: u8 {
        addr: 0xC1,
        bits: UCSR0BBits {
            TXEN0 = 3,
            RXEN0 = 4,
        }
    }
}

reg! {
    UCSR0C: u8 {
        addr: 0xC2,
        bits: UCSR0CBits {
            UCSZ00 = 1,
            UCSZ01 = 2
        }
    }
}

reg! {
    UDR0: u8 {
        addr: 0xC6,
    }
}

reg! {
    UBRR0: u16 {
        addr: 0xC4,
    }
}

#[no_mangle]
extern "C" fn main() {
    unsafe {
        // Set our baud rate.
        UBRR0::set_raw_value(UBBR_VAL);

        // Configure for:
        // * 2x speed
        // * Multi-processor communication mode (MPCM0), which only affects receiving
        // * 8-bit characters
        // * 1 stop bit
        // * No parity
        // * Async mode
        // * Enable RX/TX
        UCSR0A::set_value(UCSR0A::U2X0 | UCSR0A::MPCM0);
        UCSR0B::set_value(UCSR0B::RXEN0 | UCSR0B::TXEN0);
        UCSR0C::set_value(UCSR0C::UCSZ01 | UCSR0C::UCSZ00);

        let message = "Hello World!";

        message.bytes().for_each(|b| {
            // Wait for the data register to become available.
            while !UCSR0A::get_bit(UCSR0A::UDRE0) {}

            // Stick the byte into the buffer to send it.
            UDR0::set_raw_value(b);
        });
    }
}
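The baud rate constant at the top of that example is worth a quick sanity check. The sketch below is a host-side calculation, not part of the firmware: ubrr_for and actual_baud are hypothetical helpers that apply the asynchronous-mode formula from the ATmega328P datasheet, UBRR = f_osc / (N * baud) - 1, where N is 8 in double-speed mode (U2X0 set) and 16 otherwise. Like the constant above, it truncates rather than rounds.

```rust
// UBRR value for asynchronous mode, using truncating integer division.
const fn ubrr_for(cpu_freq: u32, baud: u32, double_speed: bool) -> u16 {
    let divisor = if double_speed { 8 } else { 16 };
    (cpu_freq / (divisor * baud) - 1) as u16
}

// Baud rate actually produced by a given UBRR value (rounded down).
const fn actual_baud(cpu_freq: u32, ubrr: u16, double_speed: bool) -> u32 {
    let divisor = if double_speed { 8 } else { 16 };
    cpu_freq / (divisor * (ubrr as u32 + 1))
}

// The same settings as the example: 16 MHz clock, 9600 baud, 2x speed.
const UBRR_2X: u16 = ubrr_for(16_000_000, 9600, true);
```

At 16 MHz and 9600 baud in double-speed mode this gives 207, and the rate that value actually produces is about 9615 baud, roughly 0.2% fast, which is well within what a typical serial receiver tolerates.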