(Note: This was originally from my blog post on AlexanderRobotics.com, but I brought that site down and wanted to keep this post alive since it's referenced by Stack Overflow).
The beauty of the Arduino lies in its simplicity. A look under the covers of the Arduino and its IDE, however, reveals a fascinating chipset and an open, extensive toolchain.
The MCU on the Arduino is an 8-bit RISC chip from the AVR family. I'll get to the Arduino-specific part of the chip shortly (hint: the bootloader plays a big role), but first, let's focus on what happens between writing a program on a computer and executing it on the AVR. AVRs come in both 8-bit and 32-bit varieties; we'll assume 8-bit for this post.
One reason for the popularity of the AVR is the free (as in beer and speech) toolchain. At the second highest level of this toolchain is the compiler: AVR-GCC. AVR-GCC converts C code into assembly language files. Although AVR-GCC compiles C, on its own it's bare-bones C, which means referencing registers by memory address, no floating-point math library, and so forth.
Luckily, AVR-GCC is only the second highest level; at the top of the toolchain is AVR Libc. This is the C library nearly all AVR-GCC programs use. The library takes care of naming all the registers, providing floating-point math routines, adding helpful AVR macros, and plenty more. The AVR Libc register definitions can be included with:
#include <avr/io.h>
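To make the difference concrete, here is a minimal sketch of driving one pin using the names AVR Libc provides rather than raw addresses. It assumes an ATmega328P (the chip on the Uno), where DDRB and PORTB live at memory addresses 0x24 and 0x25 and PB5 is bit 5. The function names led_init and led_toggle are made up for illustration, and the #else branch is a stand-in added so the same file also compiles with a desktop compiler; it is not part of AVR Libc.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>            /* real register names from AVR Libc */
#else
/* Host-only stand-ins (hypothetical) so the sketch builds off-target. */
static uint8_t fake_ddrb;
static uint8_t fake_portb;
#define DDRB  fake_ddrb
#define PORTB fake_portb
#define PB5   5
#endif

/* Without AVR Libc, on an ATmega328P the line in led_init would read
 * roughly:
 *   *(volatile uint8_t *)0x24 |= (1 << 5);
 */
void led_init(void)
{
    DDRB |= (uint8_t)(1u << PB5);    /* configure PB5 as an output */
}

void led_toggle(void)
{
    PORTB ^= (uint8_t)(1u << PB5);   /* flip the output level of PB5 */
}
```

On an Uno, PB5 is wired to the on-board LED (digital pin 13), so calling led_init() once and then led_toggle() in a loop with a delay should blink it.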
After writing C code that references the AVR Libc library, AVR-GCC compiles the code into individual assembly files, which the assembler turns into object files (AVR machine code that isn't runnable yet, because addresses between files are still unresolved). The linker then combines the object files and any library code into a single executable (an ELF file), which, finally, is converted into a HEX file. Whoa, let's take a breather.
...and breather over. The result of this build process is an Intel-formatted HEX file: a text file that encodes the program's machine code, along with the addresses it belongs at, as lines of hexadecimal digits. Once uploaded, those bytes sit in the AVR's flash memory and are executed as is.
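Because the HEX file is plain text, it can be sanity-checked off the chip. Each line is a record of the form :LLAAAATT<data>CC (byte count, 16-bit address, record type, data bytes, checksum), and the bytes of a valid record sum to zero modulo 256. The sketch below, with the hypothetical function name ihex_record_ok, checks a single record. It's an illustration of the format, not code taken from AVRDUDE or the Arduino IDE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it isn't a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Return 1 if `line` is a well-formed Intel HEX record with a correct
 * length and checksum, 0 otherwise. Trailing CR/LF is tolerated. */
int ihex_record_ok(const char *line)
{
    if (line == NULL || line[0] != ':') return 0;

    /* Number of characters after the leading ':'. */
    size_t len = strlen(line + 1);
    while (len > 0 && (line[len] == '\n' || line[len] == '\r')) len--;

    /* Smallest record: count, 2 address bytes, type, checksum. */
    if (len < 10 || (len % 2) != 0) return 0;

    uint8_t sum = 0;
    unsigned count = 0;
    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_nibble(line[1 + i]);
        int lo = hex_nibble(line[2 + i]);
        if (hi < 0 || lo < 0) return 0;
        uint8_t byte = (uint8_t)((hi << 4) | lo);
        if (i == 0) count = byte;          /* first byte is the data length */
        sum = (uint8_t)(sum + byte);       /* running sum modulo 256 */
    }

    /* The declared data length must match the record's actual size. */
    if (len != 2 * ((size_t)count + 5)) return 0;

    return sum == 0;
}
```

For example, ihex_record_ok(":00000001FF") returns 1: that line is the end-of-file record that closes every Intel HEX file.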
How do we upload the HEX file to the AVR? The 8-bit AVR on the Arduino cannot communicate over the USB protocol by itself. Instead, an In-System Programmer connects the computer (via USB or serial port) to the AVR. When starting the upload, the programmer holds the AVR's reset pin low, which puts the microcontroller in Programming mode. Once in Programming mode, the AVR accepts commands from the programmer, which writes the program into the AVR's flash memory.
The final piece of the AVR toolchain is AVRDUDE, a command-line utility that reads the HEX file and sends it through the In-System Programmer or development board to the AVR.
The AVR can reserve a small section of flash memory (a few kilobytes at most, depending on the chip) for a bootloader, a special region of memory that, when the chip's fuses are configured for it, executes first after every reset. A bootloader combined with the AVR's ability to write to its own flash memory allows the AVR to program itself over serial. Yep, a bootloader negates the need for In-System Programmers and development boards, requiring only a simple serial connection to the computer to program the chip. The bootloader performs this serial programming by listening to the Tx/Rx (transmit and receive) lines on the AVR and writing the instructions it receives to flash memory.
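Roughly, the bootloader's job is a loop: read bytes from the serial line, collect them into a page-sized buffer, and write each full page into flash. The sketch below simulates that idea on a desktop, with an ordinary array standing in for flash. All names (loader_t, loader_receive, sim_flash) are hypothetical, and the 128-byte page size is an assumption matching the ATmega328P. A real bootloader would also speak a framing protocol with the host and would use the self-programming routines in AVR Libc's avr/boot.h instead of memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  128u    /* assumption: ATmega328P flash page, in bytes */
#define FLASH_SIZE 1024u   /* tiny simulated flash, for illustration only */

static uint8_t sim_flash[FLASH_SIZE];   /* stand-in for real flash memory */

typedef struct {
    uint8_t page[PAGE_SIZE];   /* bytes received for the current page */
    size_t  fill;              /* how many bytes of `page` are in use */
    size_t  next_addr;         /* flash address the next page goes to */
} loader_t;

/* Start with erased flash (all 0xFF) and an empty page buffer. */
void loader_init(loader_t *l)
{
    memset(sim_flash, 0xFF, sizeof sim_flash);
    memset(l->page, 0xFF, sizeof l->page);
    l->fill = 0;
    l->next_addr = 0;
}

/* Copy the buffered page into simulated flash. Returns 0 on success,
 * -1 if the page would not fit. An empty buffer is a no-op. */
static int flush_page(loader_t *l)
{
    if (l->fill == 0) return 0;
    if (l->next_addr + PAGE_SIZE > FLASH_SIZE) return -1;
    memcpy(&sim_flash[l->next_addr], l->page, PAGE_SIZE);
    l->next_addr += PAGE_SIZE;
    l->fill = 0;
    memset(l->page, 0xFF, sizeof l->page);   /* unused bytes stay erased */
    return 0;
}

/* Feed one byte "from the serial line"; writes a page once it is full. */
int loader_receive(loader_t *l, uint8_t byte)
{
    if (l->fill >= PAGE_SIZE) return -1;     /* an earlier flush failed */
    l->page[l->fill++] = byte;
    return (l->fill == PAGE_SIZE) ? flush_page(l) : 0;
}

/* Write out any partially filled final page. */
int loader_finish(loader_t *l)
{
    return flush_page(l);
}
```

The page buffer is there because flash on these chips is written a page at a time rather than byte by byte, which is why the final, partially filled page is padded with 0xFF (the erased value) in this sketch.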
The Arduino comes with such a bootloader burned onto the AVR chip. See, even though a bootloader allows for programming via a serial connection, the bootloader itself needs to be programmed into the board initially using an In-System Programmer.
While any serial connection should work with the AVR chip and the bootloader, most Arduinos contain a USB interface. USB's protocol is different from the simple serial protocol the AVR expects. Instead of doing the conversion on the AVR, a separate chip converts from USB to serial. On boards like the Duemilanove and earlier, this was an FTDI chip. On boards starting with the Uno, an ATmega8U2 is used to convert USB to serial. The 8U2 is more capable than the FTDI chip: it's a programmable AVR in its own right, so with different firmware the Arduino can present itself as other USB devices, like a USB keyboard.
The Arduino IDE actually uses the same AVR toolchain described earlier. The IDE hides the complexity and presents a simple interface to the user.
But guess what? Any plain old text editor or IDE can be used to program the Arduino. Since the brain of the Arduino is simply the AVR chip with a bootloader, the exact same toolchain described in the AVR section can be used. AVRDUDE even comes with presets for dealing with the Arduino bootloader.
In addition to wrapping up the toolchain, the Arduino IDE also includes useful C libraries that build on top of AVR Libc. The libraries included with each program can be found in the hardware/cores/arduino directory of the IDE.