thakkarparth007 / sdpa_error_repro.py
Last active April 4, 2023 03:16
Simple repro for an error with PyTorch's SDPA that happens in very specific settings.
# Torch version: 2.1.0.dev20230403+cu117
# CUDA: 11.7
# Issue summary:
# PyTorch's SDPA function (scaled_dot_product_attention) is a means of using flash attention. It doesn't work on sm_86 in some scenarios:
# - with bs=1, there's no issue for most sequence lengths (though it was found to error for seq len 3)
# - with bs>1, the module throws an error during loss.backward()
# - both failures occur only when head_dim > 64. This repro uses codegen-2B, which has head_dim=80.
#
# See this for the error log: https://pastebin.com/t2Xdyb0d
#
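# The rest of the original script isn't captured above. A minimal sketch of
# the failure mode it describes might look like the following; the shapes,
# dtype, and forced flash backend are assumptions (head_dim=80 mirrors
# codegen-2B), not the author's exact repro.

import torch
import torch.nn.functional as F

device = "cuda"
bs, n_heads, seq_len, head_dim = 2, 16, 128, 80  # bs>1 and head_dim>64

q = torch.randn(bs, n_heads, seq_len, head_dim,
                device=device, dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

# Force the flash-attention kernel so a fallback backend can't mask the bug.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The error reportedly surfaces during the backward pass, not the forward.
out.sum().backward()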

Making Ray work with Firewalled servers

Assumptions:

  1. Head node is firewalled but worker nodes are not.
  2. Nodes can ssh into each other.

Steps:

On the head node, run the following commands:

  1. ray stop
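
The remaining steps are cut off in this capture. Under the stated assumptions, one plausible pattern (not necessarily the author's) is to reverse-tunnel the head's Ray ports over SSH, so the unfirewalled workers reach the firewalled head via their own localhost. A hypothetical sketch, with hostnames and ports assumed:

  # On the head node, after `ray stop`: start the head with a fixed GCS port.
  ray start --head --port=6379

  # Reverse-tunnel the GCS port to each (unfirewalled) worker.
  ssh -f -N -R 6379:localhost:6379 worker1

  # Then, on each worker node:
  ray stop
  ray start --address=127.0.0.1:6379

  # A real setup must also forward the remaining Ray ports (raylet, object
  # manager, dashboard, worker port range), pinned with flags such as
  # --node-manager-port, --object-manager-port, and --min-worker-port /
  # --max-worker-port.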