timestamp,cluster_id,source_node,dest_node,latency_ms,packet_loss_percent,bandwidth_gbps
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-01,akgpu-lab1-001-node-02,1.283,0.1174,8.53
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-02,akgpu-lab1-001-node-03,1.534,0.0426,8.61
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-03,akgpu-lab1-001-node-04,0.778,0.0579,9.4
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-04,akgpu-lab1-001-node-05,1.305,0.116,9.22
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-05,akgpu-lab1-001-node-06,1.049,0.0678,9.38
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-06,akgpu-lab1-001-node-07,1.076,0.2336,9.01
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-07,akgpu-lab1-001-node-08,1.227,0.3968,9.6
2025-10-20T22:58:33.659198,akgpu-lab1-002,akgpu-lab1-002-node-01,akgpu-lab1-002-node-02,1.06,0.2583,9.21
2025-10-20T22:58:33.659198,akgpu-lab1-002,akgpu-lab1-002-node-02,akgpu-lab1-002-node-03,1.3
timestamp,cluster_id,node_id,hardware_type,gpu_utilization,gpu_temperature,gpu_memory_used,gpu_memory_total,power_draw
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-01,NVIDIA-A100-40GB,65.63,68.09,73.02,80.0,334.29
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-02,NVIDIA-A100-40GB,72.64,64.1,50.7,80.0,341.39
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-03,NVIDIA-A100-40GB,78.46,71.96,49.65,80.0,303.84
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-04,NVIDIA-A100-40GB,73.96,71.41,55.34,80.0,348.22
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-05,NVIDIA-A100-40GB,67.32,71.55,45.21,80.0,253.89
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-06,NVIDIA-A100-40GB,74.01,76.17,64.52,80.0,313.85
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-07,NVIDIA-A100-40GB,79.48,72.7,69.08,80.0,302.41
2025-10-20T22:58:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-08,NVIDIA-A100-40GB,72.87,75.27,48.18,80.0,274.84
2025-10-20T2
timestamp,cluster_id,node_id,event_type,severity,message,details
2025-10-20T22:58:33.659198,akgpu-lab2-001,akgpu-lab2-001-node-04,GPU_TEMP_WARNING,WARNING,GPU temperature high: 95.0C,"{""threshold"": 85, ""actual"": 94.99495528645227}"
2025-10-20T22:59:03.659198,akgpu-lab2-001,akgpu-lab2-001-node-03,PKT_LOSS_SPIKE,WARNING,Packet loss spike: 1.19%,"{""packets_dropped"": 1193}"
2025-10-20T22:59:33.659198,akgpu-lab1-001,akgpu-lab1-001-node-08,GPU_TEMP_WARNING,WARNING,GPU temperature high: 88.9C,"{""threshold"": 85, ""actual"": 88.91242098672411}"
2025-10-20T22:59:33.659198,akgpu-lab2-003,akgpu-lab2-003-node-03,GPU_TEMP_WARNING,WARNING,GPU temperature high: 91.4C,"{""threshold"": 85, ""actual"": 91.35443687680055}"
2025-10-20T22:59:33.659198,akgpu-lab3-002,akgpu-lab3-002-node-01,GPU_TEMP_WARNING,WARNING,GPU temperature high: 93.7C,"{""threshold"": 85, ""actual"": 93.7231912349286}"
2025-10-20T22:59:33.659198,akgpu-lab3-002,akgpu-lab3-002-node-04,GPU_TEMP_WARNING,WARNING,GPU temperature high: 87.7C,"{""threshold""
arkenig / cluster_info.csv
Created October 27, 2025 21:32
AKGPU Telemetry Data
cluster_id,lab_name,location,latitude,longitude,cuda_version,hardware_type,num_nodes,created_at
akgpu-lab1-001,Seattle-Lab-A,"Seattle, USA",47.6062,-122.3321,12.1,NVIDIA-A100-40GB,8,2025-06-04T22:58:33.659198
akgpu-lab1-002,Seattle-Lab-A,"Seattle, USA",47.6062,-122.3321,12.1,NVIDIA-A100-40GB,8,2024-10-24T22:58:33.659198
akgpu-lab1-003,Seattle-Lab-A,"Seattle, USA",47.6062,-122.3321,12.1,NVIDIA-A100-40GB,8,2025-09-04T22:58:33.659198
akgpu-lab2-001,Tokyo-Lab-B,"Tokyo, Japan",35.6762,139.6503,12.2,NVIDIA-H100-80GB,8,2025-07-25T22:58:33.659198
akgpu-lab2-002,Tokyo-Lab-B,"Tokyo, Japan",35.6762,139.6503,11.8,NVIDIA-H100-80GB,8,2025-03-04T22:58:33.659198
akgpu-lab2-003,Tokyo-Lab-B,"Tokyo, Japan",35.6762,139.6503,11.8,NVIDIA-H100-80GB,8,2025-06-03T22:58:33.659198
akgpu-lab3-001,Dublin-Lab-C,"Dublin, Ireland",53.3498,-6.2603,12.1,NVIDIA-A100-40GB,8,2025-02-16T22:58:33.659198
akgpu-lab3-002,Dublin-Lab-C,"Dublin, Ireland",53.3498,-6.2603,12.1,NVIDIA-H100-80GB,8,2025-07-08T22:58:33.659198
akgpu-lab3-003,Dublin-Lab-C,"Dubl