Check Databricks Serverless CPU architecture and RAM

When benchmarking, it helps to know whether a Databricks Serverless SQL warehouse (or serverless notebook/job compute) is running on ARM (Graviton) or x86, and which CPU generation. The check below also reports the memory visible to the execution sandbox.

Here’s a Python UDF that detects all of this from /proc/cpuinfo and /proc/meminfo.

CREATE OR REPLACE FUNCTION detect_cpu_and_ram()
RETURNS STRING
LANGUAGE PYTHON
AS $$
import platform, os, re, json

def _read(path):
    try:
        with open(path) as f: return f.read()
    except Exception as e:
        return f"<unavailable: {e}>"

arch = platform.machine().lower()
is_arm = 'aarch' in arch or 'arm' in arch
arch_label = (
    "ARM (AArch64)" if is_arm
    else "x86_64" if 'x86' in arch or 'amd64' in arch
    else f"Unknown: {arch}"
)

cpuinfo = _read('/proc/cpuinfo')
model = re.search(r'model name\s*:\s*(.+)', cpuinfo)
vendor = re.search(r'vendor_id\s*:\s*(.+)', cpuinfo)
cpu_impl = re.search(r'CPU implementer\s*:\s*(\S+)', cpuinfo)
cpu_part = re.search(r'CPU part\s*:\s*(\S+)', cpuinfo)
flags_match = re.search(r'(?:Features|flags)\s*:\s*(.+)', cpuinfo)
flags = set(flags_match.group(1).split()) if flags_match else set()

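# Graviton fingerprinting: trust the CPU part code first and fall back to
# feature-flag heuristics only when the part code is missing.
# ARM CPU part codes: Neoverse N1=0xd0c, V1=0xd40, V2=0xd4f
GRAVITON_BY_PART = {
    "0xd0c": "Graviton 2 (Neoverse N1)",
    "0xd40": "Graviton 3 (Neoverse V1)",
    "0xd4f": "Graviton 4 (Neoverse V2)",
}
graviton = "n/a"
if is_arm:
    part = cpu_part.group(1).lower() if cpu_part else ""
    if part in GRAVITON_BY_PART:
        graviton = GRAVITON_BY_PART[part]
    elif part:
        # Unrecognized part code: report it instead of guessing from flags
        graviton = f"ARM, unknown gen (part={part})"
    elif 'sve2' in flags:
        graviton = "Graviton 4 (Neoverse V2), inferred from flags"
    elif 'sve' in flags:
        graviton = "Graviton 3 (Neoverse V1), inferred from flags"
    elif 'asimd' in flags:
        graviton = "Graviton 2 (Neoverse N1), inferred from flags"
    else:
        graviton = "ARM, unknown gen (no part code or flags)"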

# x86 fingerprinting
x86_gen = "n/a"
if not is_arm and flags:
    if 'avx512f' in flags:
        x86_gen = "x86 with AVX-512 (Intel Skylake-SP or later, AMD Zen 4 or later)"
    elif 'avx2' in flags:
        x86_gen = "x86 with AVX2 (Intel Haswell or later, AMD Zen or later)"
    else:
        x86_gen = "x86, pre-AVX2"

meminfo = _read('/proc/meminfo')
mem_total = re.search(r'MemTotal:\s*(\d+)\s*kB', meminfo)
# MemTotal is reported in kB (KiB); dividing by 1024 twice gives GiB
mem_gb = round(int(mem_total.group(1))/1024/1024, 2) if mem_total else None

# Notable flags for benchmarking context
notable = sorted(f for f in flags if f in {
    'sve','sve2','bf16','i8mm','svebf16','svei8mm','sveaes','svesha3',
    'avx2','avx512f','avx512bw','avx512vnni','amx_tile','amx_int8'
})

result = {
    "arch": arch_label,
    "generation": graviton if is_arm else x86_gen,
    "vendor": vendor.group(1).strip() if vendor else None,
    "model": model.group(1).strip() if model else None,
    "cpu_implementer": cpu_impl.group(1) if cpu_impl else None,
    "cpu_part": cpu_part.group(1) if cpu_part else None,
    "cpus": os.cpu_count(),
    "mem_gb": mem_gb,
    "notable_flags": notable,
}
return json.dumps(result)
$$;

Run the UDF:

SELECT detect_cpu_and_ram();

Example output (pretty-printed) from an ARM warehouse:

{
  "arch": "ARM (AArch64)",
  "generation": "Graviton 3 (Neoverse V1)",
  "vendor": null,
  "model": null,
  "cpu_implementer": "0x41",
  "cpu_part": "0xd40",
  "cpus": 8,
  "mem_gb": 30.93,
  "notable_flags": ["bf16", "i8mm", "sve", "svebf16", "svei8mm"]
}

How the generation detection works

The CPU part field is the most reliable identifier. The exposed feature flags can vary with the kernel and the virtualization layer, but the part code identifies the core design itself:

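| Graviton | Core        | CPU part | Distinguishing flags |
|----------|-------------|----------|----------------------|
| 2        | Neoverse N1 | 0xd0c    | asimd, no sve        |
| 3        | Neoverse V1 | 0xd40    | sve, no sve2         |
| 4        | Neoverse V2 | 0xd4f    | sve2                 |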

The function checks the part code first and falls back to flag heuristics if it’s missing.
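The part-code-first ordering can be sketched as a standalone Python function that runs outside Databricks, which makes the logic easy to test. The cpuinfo excerpt below is an abbreviated, illustrative AArch64 block (not captured from a real warehouse), and `classify_graviton` and `GRAVITON_BY_PART` are hypothetical names chosen for this sketch; the part codes come from the table above.

```python
import re

# Abbreviated, illustrative /proc/cpuinfo excerpt in the AArch64 format
# (hypothetical values; a real file repeats a block like this per vCPU).
SAMPLE_CPUINFO = """\
processor\t: 0
BogoMIPS\t: 2100.00
Features\t: fp asimd aes sha2 crc32 atomics sve i8mm bf16 svei8mm svebf16
CPU implementer\t: 0x41
CPU architecture: 8
CPU part\t: 0xd40
"""

# Neoverse part codes from the table (implementer 0x41 = Arm Ltd.)
GRAVITON_BY_PART = {
    "0xd0c": "Graviton 2 (Neoverse N1)",
    "0xd40": "Graviton 3 (Neoverse V1)",
    "0xd4f": "Graviton 4 (Neoverse V2)",
}


def classify_graviton(cpuinfo: str) -> str:
    """Label the Graviton generation: part code first, flags only as a fallback."""
    part_m = re.search(r"CPU part\s*:\s*(\S+)", cpuinfo)
    flags_m = re.search(r"Features\s*:\s*(.+)", cpuinfo)
    part = part_m.group(1).lower() if part_m else ""
    flags = set(flags_m.group(1).split()) if flags_m else set()

    if part in GRAVITON_BY_PART:
        return GRAVITON_BY_PART[part]
    if part:
        # A part code we don't recognize: report it rather than guess from flags
        return f"ARM, unknown gen (part={part})"
    # No part code at all: fall back to the flag heuristics, newest first
    if "sve2" in flags:
        return "Graviton 4 (Neoverse V2), inferred from flags"
    if "sve" in flags:
        return "Graviton 3 (Neoverse V1), inferred from flags"
    if "asimd" in flags:
        return "Graviton 2 (Neoverse N1), inferred from flags"
    return "ARM, unknown gen"


print(classify_graviton(SAMPLE_CPUINFO))  # Graviton 3 (Neoverse V1)
```

Checking the flags newest-first (sve2, then sve, then asimd) matters in the fallback path: every Graviton exposes asimd, so testing for it first would label all of them as Graviton 2.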

Notebook / job compute version

In a notebook Python cell you can run the same checks directly and call out to lscpu, where it is installed, for more detail:

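import os, platform, shutil, subprocess
print(platform.platform(), platform.machine())
print("CPUs:", os.cpu_count())
# lscpu may not be installed in every image; skip it rather than crash
if shutil.which("lscpu"):
    print(subprocess.run(["lscpu"], capture_output=True, text=True).stdout)
# Print the first few /proc/meminfo lines (MemTotal, MemFree, MemAvailable, ...)
with open("/proc/meminfo") as f:
    print("".join(f.readlines()[:5]))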

Caveats

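  • MemTotal is whatever the execution environment exposes in /proc/meminfo. In a sandbox it generally reflects the sandbox's allocation (what your query actually gets) rather than the underlying instance size, but an ordinary container can show host memory here, so cross-check against the cgroup memory limit where it is readable.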
  • On some Databricks runtimes the SQL Python UDF sandbox may restrict /proc reads; if the CPU fields come back null or empty, run the same logic in a notebook cell.
  • Results may vary by region and rollout, so if you're publishing comparisons, run the check in several regions (for example us-east-1, us-west-2, and EU regions).
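To cross-check MemTotal against the limit actually enforced on the process, a sketch like the following reads the cgroup memory limit. It assumes the standard Linux cgroup v2 (`memory.max`) and v1 (`memory.limit_in_bytes`) file locations; whether either file is visible inside a Databricks sandbox is an assumption, and the function returns `None` when neither can be read. `parse_cgroup_limit` and `cgroup_memory_limit_gib` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup file locations (assumed; a sandbox may not expose them)
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_cgroup_limit(text: str) -> Optional[int]:
    """Parse a cgroup memory limit in bytes; None means no limit is set."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    limit = int(value)
    # cgroup v1 reports "unlimited" as a huge sentinel close to 2**63
    if limit >= 2**60:
        return None
    return limit


def cgroup_memory_limit_gib() -> Optional[float]:
    """Return the cgroup memory limit in GiB, or None if unset or unreadable."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            limit = parse_cgroup_limit(path.read_text())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a number: try the next one
        return None if limit is None else round(limit / 1024**3, 2)
    return None


print("cgroup memory limit (GiB):", cgroup_memory_limit_gib())
```

If this prints a number well below `mem_gb`, the cgroup limit is the more realistic figure for what a query can use; if it prints `None`, MemTotal is the only signal available.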