
Last active May 17, 2024 13:42
StreamDiffusion on a Mac

This gist shows how to get StreamDiffusion running on a Mac using the MPS backend (mps).

  1. Clone the repo

git clone
  2. Set up the environment

cd StreamDiffusion
python -m venv venv  # or: python3 -m venv venv
source venv/bin/activate
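
Before installing anything, it can help to confirm that the virtual environment is actually active. The sketch below is not part of StreamDiffusion; `environment_report` is a hypothetical helper name, and it only uses the Python standard library.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect the facts this guide relies on: venv active, Python version, CPU arch."""
    return {
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "python": "{}.{}".format(*sys.version_info[:2]),
        # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
        "machine": platform.machine(),
        # macOS reports "Darwin".
        "system": platform.system(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `in_venv` is False, run `source venv/bin/activate` again before continuing. The `python` value is also the version that appears in the site-packages path used later in this guide.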
  3. Install dependencies and StreamDiffusion

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url

pip install .          
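
After the install, a quick check that the installed torch build can see the MPS backend saves debugging later. This is a minimal sketch, assuming a torch version that ships `torch.backends.mps`; `mps_report` is a hypothetical helper name, not something StreamDiffusion provides.

```python
import importlib.util


def mps_report() -> dict:
    """Report whether torch is importable and whether its MPS backend is usable."""
    if importlib.util.find_spec("torch") is None:
        return {
            "torch_installed": False,
            "torch_version": None,
            "mps_built": False,
            "mps_available": False,
        }
    import torch  # imported lazily so the script also runs without torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # is_built(): this torch binary was compiled with MPS support.
        "mps_built": torch.backends.mps.is_built(),
        # is_available(): the current machine and macOS version can actually use it.
        "mps_available": torch.backends.mps.is_available(),
    }


if __name__ == "__main__":
    for key, value in mps_report().items():
        print(f"{key}: {value}")
```

If `mps_built` is True but `mps_available` is False, the torch build is fine but the machine or macOS version is the limiting factor.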
  4. Edit to support non-CUDA devices

  • Go to venv/lib/python3.11/site-packages/streamdiffusion/
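
The python3.11 part of that path depends on the interpreter the venv was created with. As a sketch (standard library only, with `package_dir` as a hypothetical helper name), the installed location can be printed from inside the activated venv instead of guessed:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    # Plain modules have no search locations; only packages have a directory.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    # Prints None if streamdiffusion is not installed in the current environment.
    print(package_dir("streamdiffusion"))
```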

At line 439, replace the __call__ method with this:

    # Condition hack: event sync/track only on cuda devices.
    # Non-cuda devices (e.g. mps) skip the timing, RIP profiling etc.
    def __call__(
        self, x: Union[torch.Tensor, PIL.Image.Image, np.ndarray] = None
    ) -> torch.Tensor:
        if self.device == "cuda":
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
        if x is not None:
            x = self.image_processor.preprocess(x, self.height, self.width).to(
                device=self.device, dtype=self.dtype
            )
            if self.similar_image_filter:
                x = self.similar_filter(x)
                if x is None:
                    return self.prev_image_result
            x_t_latent = self.encode_image(x)
        else:
            # TODO: check the dimension of x_t_latent
            x_t_latent = torch.randn((1, 4, self.latent_height, self.latent_width)).to(
                device=self.device, dtype=self.dtype
            )
        x_0_pred_out = self.predict_x0_batch(x_t_latent)
        x_output = self.decode_image(x_0_pred_out).detach().clone()

        self.prev_image_result = x_output
        if self.device == "cuda":
            # Both events must be recorded and synchronized before elapsed_time().
            end.record()
            torch.cuda.synchronize()
            inference_time = start.elapsed_time(end) / 1000
            self.inference_time_ema = 0.9 * self.inference_time_ema + 0.1 * inference_time
        return x_output
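
The patch above simply drops timing on non-CUDA devices. If a rough number is still wanted on mps, one option is wall-clock timing with the same 0.9 / 0.1 moving average. The sketch below is an assumption of mine, not what the patch does; `update_ema` and `timed_call` are hypothetical helper names. Because device work can run asynchronously, wall-clock time only reflects the real cost if the timed function waits for the device to finish.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def update_ema(previous: float, sample: float, weight: float = 0.1) -> float:
    """Blend a new sample into the running average (0.9 * old + 0.1 * new by default)."""
    return (1.0 - weight) * previous + weight * sample


def timed_call(fn: Callable[[], T], ema: float) -> Tuple[T, float]:
    """Run fn, measure wall-clock seconds, and return (result, updated EMA)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, update_ema(ema, elapsed)


if __name__ == "__main__":
    ema = 0.0
    for _ in range(3):
        # Stand-in workload; in the pipeline this would be the inference step.
        _, ema = timed_call(lambda: sum(range(100_000)), ema)
    print(f"inference_time_ema ~ {ema:.6f} s")
```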
  5. Run the demos


Make sure you have Node installed and npm or pnpm in your path.
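
A quick way to check this from the activated venv is sketched below, using only the standard library. `find_tools` and `frontend_ready` are hypothetical helper names.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_tools(names: Sequence[str] = ("node", "npm", "pnpm")) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def frontend_ready(found: Dict[str, Optional[str]]) -> bool:
    """The frontend build needs node plus at least one of npm or pnpm."""
    has_node = found.get("node") is not None
    has_package_manager = found.get("npm") is not None or found.get("pnpm") is not None
    return has_node and has_package_manager


if __name__ == "__main__":
    tools = find_tools()
    for name, path in tools.items():
        print(f"{name}: {path or 'not found'}")
    print("frontend build possible:", frontend_ready(tools))
```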

  • Edit to set
    torch.device("cuda" if torch.cuda.is_available() else "mps")
    acceleration: Literal["none", "xformers", "tensorrt"] = "none"
  • Run
cd demo/realtime-txt2img
pip install -r requirements.txt


cd frontend
pnpm i
pnpm run build
cd ..


  • Edit (line 161)
device = torch.device("cuda" if torch.cuda.is_available() else "mps")
  • Run the demo
cd demo/realtime-img2img
pip install -r requirements.txt
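
Both demo edits fall back to "mps" whenever CUDA is missing, even on a machine where MPS is not usable. A slightly more defensive selection is sketched below. The CPU fallback is my assumption, not something the demos do, and CPU is likely too slow for realtime use. `pick_device_name` and `detect_device_name` are hypothetical helper names.

```python
import importlib.util


def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer cuda, then mps, then cpu as a last resort."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Query torch for available backends; report cpu if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so the script also runs without torch

    return pick_device_name(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )


if __name__ == "__main__":
    print(detect_device_name())
```

In the demos this would be used as `device = torch.device(detect_device_name())` in place of the hard-coded expression.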
  6. References

[1] cumulo-autumn/StreamDiffusion#34

[2] cumulo-autumn/StreamDiffusion#125


[4] cumulo-autumn/StreamDiffusion#134


fbarretto commented May 17, 2024

Thanks @fbarretto for this guide, it worked on my M1 Air! For Realtime img2img, can you please correct the last code snippet for "Run the demo" to cd demo/realtime-img2img? Currently it says realtime-txt2img. One more thing for other users, from my own test: when using the img2img demo, please use localhost:port instead of as the latter is causing an error in bringing up the webcam!

Thanks for the feedback! I've updated the img2img command.
