What if we built a Jetson Orin AGX micro cluster?

The follow-up to the M1 cluster: Jetson Orin AGX

Original M1 Piece here: https://gist.github.com/FCLC/6e0f0e79e9d4f5740573f09d7579eb72

No system exists in a vacuum, and so as a follow-up to the M1 cluster, I thought I’d look at a similar cluster based on another integrated ARM device.

Oracle typically builds an RPi cluster every few years. Their most recent unit, built from 1,060 RPi 3B+ boards, is an interesting piece of tech. Another is the 750-Pi cluster built by LANL. But Pi clusters seem like the domain of Jeff Geerling and co., so let’s look at something else. The most popular developer board family is the Nvidia Jetson series, and the most powerful unit is the latest Orin AGX 64GB.

Setting the stage

As with the M1 cluster, a few baselines before I go on:

  • We are going to go through this thought experiment from the point of view of a small laboratory/bootstrap cluster that can only use a single 48U, 42” deep rack.
  • You use the built-in 10GBase-T Ethernet as a BMC of sorts.

Density

There’s this interesting product from a Canadian company called Connect Tech that fits 12 Jetson Xavier AGX modules in 1U. I’m going to suppose that they’ll be following up with a model that supports Orin AGX Soon™. These 1U servers integrate switching for the built-in Ethernet and expose the built-in PCIe links, which we can use for storage and for connecting network interfaces. Speaking of which, if we’re going to be doing “proper” HPC, we need proper network bandwidth, and over the PCIe 4.0 x8 interface we can add a QSFP56 adapter.
12 modules means 12 QSFP56 ports per U, and at ~25U of compute we hit 300 modules. Accounting for the need to uplink, 8 * 40-port switches should do the trick (320 ports: 300 for nodes, 20 left for uplinks). That leaves us with 15U for power, which just barely works out, leaving us with ~3% peak-draw headroom.
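
A quick back-of-the-envelope sketch of the rack and port budget (a minimal Python sketch; the 1U-per-switch and 3U-per-UPS figures are assumptions that match the parts list below):

```python
# Rack/port budget sketch. Assumes 12 modules per 1U carrier, 1U per 40-port
# switch, and 3U per UPS (matching the parts chosen below).
RACK_U = 48
MODULES_PER_U = 12
SWITCH_PORTS = 40

compute_u = 25
modules = compute_u * MODULES_PER_U                  # 300 modules, one QSFP56 port each
switches = 8                                         # 8 * 40 = 320 switch ports
uplink_ports = switches * SWITCH_PORTS - modules     # 20 ports left over for uplinks

power_u = RACK_U - compute_u - switches * 1          # 15U left for UPSs
ups_count = power_u // 3                             # 5 UPSs at 3U each

# Note: PCIe 4.0 x8 tops out around 8 lanes * 16 GT/s * 128/130 ≈ 126 Gb/s per
# direction, so the NIC's host link, not the 200G QSFP56 port, is the ceiling.
print(modules, uplink_ports, power_u, ups_count)     # 300 20 15 5
```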

Performance

Unfortunately, the 64GB of built-in eMMC isn’t going to cut it. Adding an 8TB Sabrent NVMe SSD per module takes care of storage.

GPU compute is roughly 5.3 TFLOPS of FP32 per module, with FP16 in the ~95 TFLOPS range.

64 GB of memory per module, shared between CPU and GPU.

For our 25U of 12-module servers (300 modules):
  • ~1.6 PFLOPS of FP32
  • ~28.5 PFLOPS of FP16
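
Putting the per-module numbers together, as a sketch using the figures quoted above:

```python
# Aggregate throughput sketch from the per-module figures above.
modules = 25 * 12                 # 300 Orin AGX 64GB modules
fp32_per_module_tflops = 5.3
fp16_per_module_tflops = 95

fp32_pflops = modules * fp32_per_module_tflops / 1000   # ~1.6 PFLOPS
fp16_pflops = modules * fp16_per_module_tflops / 1000   # 28.5 PFLOPS
memory_tb = modules * 64 / 1000                         # 19.2 TB of shared CPU/GPU memory
nvme_tb = modules * 8                                   # 2,400 TB of NVMe scratch

print(f"{fp32_pflops:.1f} PFLOPS FP32, {fp16_pflops:.1f} PFLOPS FP16")
```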

Power

Between the 300 modules, network adapters, NVMe drives and 8 switches, I calculated ~26.2 kW.

As I did in the original Mac piece, I chose Eaton 9PX6K UPSs. Each is 5.4 kW/6 kVA in 3U; five of them supply 27 kW against 26.2 kW of draw, leaving us a ~3% power buffer.
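
The headroom math, as a sketch (the 26.2 kW figure is the total from above; the per-UPS rating is Eaton’s spec):

```python
# Power headroom sketch: 5 UPSs vs. the ~26.2 kW calculated above.
total_draw_kw = 26.2          # modules + NICs + NVMe + switches
ups_kw = 5.4                  # Eaton 9PX6K: 6 kVA / 5.4 kW
ups_count = 5

capacity_kw = ups_count * ups_kw              # 27.0 kW
headroom = 1 - total_draw_kw / capacity_kw    # ~0.03 -> ~3% at peak draw
print(f"{capacity_kw:.1f} kW of UPS capacity, {headroom:.1%} headroom")
```
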
[Screenshot: power budget table]

Parts

Component selection:

  • Nvidia Jetson Orin AGX 64GB * 300
  • MQM8700-HS2F 40-port 200G QSFP56 switches * 8
  • QSFP56 direct attach copper cables * 300
  • Mellanox ConnectX-6 PCIe to QSFP56 adapters * 300
  • Sabrent Rocket 8TB * 300
  • Eaton 9PX6K UPSs * 5

Cost

A few assumptions, in line with the market:

  • 1k-unit pricing
  • You can buy everything at near street prices.
  • You are not adding 300 PiKVMs.
  • You buy +2 of everything for hot swap.
  • Total taxes are 10%.

[Screenshot: cost breakdown table]

Total after taxes: $1.49M
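
To show how the total is built up, here’s a sketch of the cost model. The unit prices below are placeholders I made up (the real numbers live in the linked spreadsheet), but the structure — +2 hot spares per line item and 10% tax on the subtotal — follows the assumptions above.

```python
# Cost model sketch. Unit prices are placeholder guesses, NOT the real quotes;
# with these placeholders the output lands near, but not exactly on, the
# $1.49M total from the actual spreadsheet.
SPARES = 2
TAX = 0.10

# part name: (quantity, hypothetical unit price in USD)
parts = {
    "Jetson Orin AGX 64GB":    (300, 2000),
    "MQM8700-HS2F switch":     (8,   12000),
    "QSFP56 DAC cable":        (300, 100),
    "ConnectX-6 QSFP56 NIC":   (300, 900),
    "Sabrent Rocket 8TB NVMe": (300, 1100),
    "Eaton 9PX6K UPS":         (5,   4000),
}

subtotal = sum((qty + SPARES) * price for qty, price in parts.values())
total = subtotal * (1 + TAX)
print(f"${total / 1e6:.2f}M")
```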

Caveats

  • I did not include some sort of large storage array, nor the cost for cables/optics to get data to and from this rack.
  • Redundancy is not particularly great.
  • No ECC beyond what LPDDR5 already provides.

Conclusion

[Screenshot: summary table]

Once again, it’s an interesting thought exercise.

Here’s a spreadsheet with all the juicy data: https://www.icloud.com/numbers/05fF59zXuDyHbYG2L4Fzuz8Lg#Cursed_cluster
