@Christopher-Hayes
Last active February 16, 2024 18:37
Building a Serverless SlackBot with Bolt on Vercel - Things to Know

Some gotchas from my recent experience of building a serverless Next.JS + Bolt.JS Slack App on Vercel.

Note that if you're building an app that you want to distribute to other workspaces, AFAIK you need to build an API. So, Next.JS is used here to help with the public API. The alternative to an API is using "socket mode".

Slack API with Bolt must use /slack/events endpoint

  • When building out the API, Bolt's default HTTP receiver ONLY listens on the /slack/events endpoint. The Slack config settings will suggest you provide a different endpoint for each feature, like /slack/commands for Slash Commands. That would work if you were routing the requests yourself, but Bolt handles events, Slash Commands, and interactivity all on /slack/events. You can still use Bolt functions like app.command(), just remember to put the /slack/events endpoint in the Slack config.
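To see why a single endpoint can serve everything, here is a rough sketch of how the request types can be told apart from the body alone. This is NOT Bolt's internal code; classifySlackRequest and SlackRequestKind are hypothetical names made up for illustration. It relies only on the payload shapes Slack sends: JSON with a top-level type field for the Events API, and form-encoded bodies for Slash Commands and interactivity.

```typescript
// Illustrative only: this is NOT Bolt's internal code. It shows how one
// endpoint can tell Slack's request types apart by inspecting the body.
type SlackRequestKind =
  | "url_verification"
  | "event"
  | "slash_command"
  | "interactivity"
  | "unknown";

function classifySlackRequest(contentType: string, rawBody: string): SlackRequestKind {
  if (contentType.startsWith("application/json")) {
    // The Events API posts JSON with a top-level `type` field.
    const json = JSON.parse(rawBody) as { type?: string };
    if (json.type === "url_verification") return "url_verification";
    if (json.type === "event_callback") return "event";
    return "unknown";
  }
  if (contentType.startsWith("application/x-www-form-urlencoded")) {
    const form = new URLSearchParams(rawBody);
    // Interactive components arrive as a JSON string in a `payload` field.
    if (form.has("payload")) return "interactivity";
    // Slash commands arrive as plain form fields, including `command`.
    if (form.has("command")) return "slash_command";
  }
  return "unknown";
}

// prints "slash_command"
console.log(
  classifySlackRequest("application/x-www-form-urlencoded", "command=%2Fcommand-a&text=hi"),
);
```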

Serverless is not supported with Bolt

  • Serverless is not officially supported by Slack with the Bolt API. It is possible, as shown by this Vercel serverless Next.JS + Bolt app. However, beware that the project is only good for an app that always responds IMMEDIATELY. If your app uses any 3rd party endpoints or does anything that takes a second or two, Slack will throw an "operation_timeout" error.
  • Serverless with Bolt puts you in a bind
    • If you respond immediately to an event (to avoid Slack timing out after 3 seconds), any time-consuming code you have will get prematurely terminated by the serverless function ending. This happens because the serverless platform sees that the endpoint gave a response, treats the work as done, and terminates anything still running in the callback. Also, note that Bolt with processBeforeResponse: true will purposefully delay the ack() until the entire callback is done. On the one hand, this makes sure your function does not terminate early; on the other hand, ack() may not get sent within Slack's 3-second timeout.
    • If you avoid responding immediately to give your serverless function time to finish, after 3 seconds Slack will timeout and send the user an error. Strangely, with Vercel in this case you might be able to still run everything you wanted to in the callback and post a new message, but Slack will also show the user an error all the same.
  • Fixes for the serverless issue above (with long-running tasks)
    • Send time-consuming work to a queueing system (e.g. AWS SQS), but that's adding a lot of complexity with Bolt, a framework that was supposed to make things simple!
      • I ultimately ended up using a variation of this method as described at the bottom of this document. Send the long-running job to a separate Next.JS endpoint.
    • Ditch Slack's Bolt framework. Slack has no plans to fix this long-running task issue with serverless Bolt. However, Slack's Python API does have a feature (lazy listeners) that makes serverless with long tasks actually work.
      • Alternatively - Vercel is right now working on an example Slack App that runs on Vercel serverlessly, and has all the required auth functions that Bolt has.

Helpful Links

StackOverflow - "How to avoid slack command timeout error?"

StackOverflow - ack() does not send immediately, waits for entire workflow to finish before sending

GitHub Issues - Preventing AWS Lambdas from self-terminating when an ack() is sent

Helpful Example Projects

Vercel SlackBot WITHOUT using Bolt. Beware that this is a simple example and does not do Slack Install (OAuth) for you; Bolt would be able to handle this automatically. Vercel says they're working on another example Slack app that would also do OAuth for you.

Modified Bolt.JS for web frameworks. Not sure how well this works with serverless, but it seems to be built to work better with Next.JS and similar. Built by a guy who works on Bolt.

Next.JS + Bolt boilerplate. I linked this further above. The Next.JS side seems to work nicely. It works with serverless only if your app responds instantly to events. Note that this is slightly different from the "Modified Bolt.JS" project, in that it does use a "custom receiver". The modified Bolt.JS project purposefully avoids a custom receiver.

The (Very) Hacky Way I Did It

I did manage to get long-running serverless tasks to work using the "Next.JS + Bolt boilerplate" linked above. The approach is basically - create a separate job to allow the function to return quickly, do this with a Next.JS endpoint to simplify infrastructure.

  • My project lives in the Vercel world. While Vercel does use AWS Lambda behind the scenes, I was not interested in setting up extra AWS infrastructure to do AWS SQS jobs.
  • The work-around in Vercel was to create another Next.JS endpoint which would run a separate Vercel function. This separate "worker" function would still use Bolt, but only for sending events to Slack, not listening to events.
  • The initial API function would send a network request to this second "worker" function with all the data from Slack about the event. It does NOT await the axios.post, this is because awaiting the post would mean waiting for the entire "worker" function to finish, defeating the whole point. Instead it "fires and forgets" about the function.
    • Something important to note - the axios.post request getting sent can actually get interrupted by the Lambda terminating before the request was sent. So, the hacky fix is to use await new Promise(resolve => setTimeout(resolve, 500)) after axios.post to ensure the request is sent off.
  • For the "worker" function I used res.end() to end the function. Don't use res.status(200) on its own: it only sets the status code and never sends the response, so the worker function would just hang and end up timing out after 30 or 60 seconds.
  • Don't forget that this worker function is also a public endpoint, so validation should be treated the same for both endpoints.
    • In the example below I crudely used an arbitrary INTERNAL_WORKER_TOKEN to only accept requests coming from the internal function. There's probably a more robust way to do this.
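One small step toward a more robust check is comparing the token in constant time instead of with !==. The sketch below is only a suggestion and not what the gist's code does; isValidWorkerToken is a hypothetical helper name. It uses Node's built-in crypto.timingSafeEqual and hashes both values first, because that function throws when the two buffers differ in length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: returns true only when the received token matches
// the expected one. Both values are hashed first so the buffers always
// have equal length (timingSafeEqual throws if the lengths differ).
function isValidWorkerToken(received: unknown, expected: string | undefined): boolean {
  // Reject missing tokens and an unset/empty INTERNAL_WORKER_TOKEN outright.
  if (typeof received !== "string" || !expected) return false;
  const a = createHash("sha256").update(received).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

In the worker this would replace the inline comparison, for example isValidWorkerToken(req.body.internalWorkerToken, process.env.INTERNAL_WORKER_TOKEN). It still relies on a shared secret, so it is a hardening of the crude approach rather than a different design.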

Sample Code for the initial /api/slack/events function

A portion of the file at /pages/api/[[...route]].ts

// Slack Slash Command for /command-a
app.command('/command-a', async ({ ack, body, context, say }) => {
  // Let the user know we're working on it
  const workingOnItMessage = await say({
    text: `:building_construction: Working on this long running task.`
  })

  // Run a post request to /api/worker with the arguments as a JSON string
  axios.post(
    'https://your-app-url-here.vercel.app/api/worker',
    {
      command: '/command-a',
      body,
      context,
      workingOnItMessage,
      internalWorkerToken: process.env.INTERNAL_WORKER_TOKEN,
    },
    {
      headers: {
        'Content-Type': 'application/json',
      },
    }
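  // Not awaited on purpose (fire and forget), but log failures instead of
  // leaving an unhandled promise rejection
  ).catch((err) => console.error('Worker request failed:', err));

  await ack()
  // HACK - Ensure that the axios.post request gets sent out
  await new Promise(resolve => setTimeout(resolve, 500))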
})

Sample Code for the worker /api/worker function

A portion of the (same) file at /pages/api/[[...route]].ts

router.post('/api/worker', async (req: NextApiRequest, res: NextApiResponse) => {
  if (req.method === 'POST') {
    // Check that the request is coming from an internal serverless function
    if (!req.body.internalWorkerToken || req.body.internalWorkerToken !== process.env.INTERNAL_WORKER_TOKEN) {
      return res.end()
    }
    
    let command: string = req.body.command
    let workingOnItMessage: any = req.body.workingOnItMessage
    let slackReqBody: any = req.body.body
    let context: any = req.body.context

    if (command === '/command-a') {
      await runCommandA({ body: slackReqBody, context, workingOnItMessage })
    }

    // Force this worker function to terminate now
    res.end()
  }
})

Installation Store Database

For the installation store database, Upstash seemed like the quickest and easiest to set up with Vercel. It has a Vercel integration that worked nicely, and the free plan (10K commands a day) looks perfect for small Slack apps. I paired that with ioredis for the fetchInstallation, storeInstallation, and deleteInstallation functions.

One issue I ran into: Bolt fetches the installation when the serverless function starts up, and that alone was not a timeout issue. But fetching the installation again inside an event handler re-ran the network request to Upstash, which pushed the app into Slack timeout territory. The crude solution was to store the installation locally on the serverless function. So, from Bolt auto-running fetchInstallation to an event handler like app.command running, you still have the installation object handy without re-running a network request to Upstash.
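A minimal sketch of that local caching idea, assuming a simplified Installation shape (a real Bolt installation object has more fields) and a hypothetical withInstallationCache wrapper. The fetchRemote argument stands in for the ioredis lookup against Upstash.

```typescript
// Hypothetical, simplified shape; a real Bolt installation has more fields.
interface Installation {
  teamId: string;
  botToken: string;
}

type FetchInstallation = (teamId: string) => Promise<Installation>;

// Wrap the slow lookup so repeated calls within one warm function
// instance reuse the first result instead of hitting the network again.
function withInstallationCache(fetchRemote: FetchInstallation): FetchInstallation {
  const cache = new Map<string, Promise<Installation>>();
  return (teamId) => {
    let pending = cache.get(teamId);
    if (!pending) {
      pending = fetchRemote(teamId);
      // Drop failed lookups so a later call can retry.
      pending.catch(() => cache.delete(teamId));
      cache.set(teamId, pending);
    }
    return pending;
  };
}

// Usage: the async callback stands in for the Redis round trip.
let remoteCalls = 0;
const fetchInstallation = withInstallationCache(async (teamId) => {
  remoteCalls += 1;
  return { teamId, botToken: "example-token" };
});

async function demo(): Promise<number> {
  await fetchInstallation("T123"); // first call goes to the remote store
  await fetchInstallation("T123"); // second call is served from memory
  return remoteCalls;
}
```

The cache only lives as long as the warm function instance, and this sketch never invalidates entries, so storeInstallation and deleteInstallation would need to clear the matching key as well.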

A more reliable way to do jobs in Vercel

Using a Next.JS endpoint isn't the most robust way to run a job. ServerlessQ looks pretty nice for serverless with a good Vercel integration. AWS SQS looks like overkill, so if I transition away from a Next.JS endpoint, ServerlessQ is the way I'm leaning.

@Christopher-Hayes
Author

Christopher-Hayes commented Aug 12, 2022

Additional notes

Having used this method for a little while now, I've found the job doesn't always trigger. I'm pretty sure this is just due to the /api function ending before the request was sent to the job, so the setTimeout waiting period likely needs to be increased.

I'm personally just going to skip straight to using a queueing service, since that code is already delicate timing-wise. But if this is not an issue for you, it does work 80% of the time, particularly when it is in regular use.

Also, the ack() + setTimeout() lines at the end of /api/slack/events can probably be combined into an await Promise.all(). This guarantees that the ack() runs, potentially reducing the instances of the Slack timeout message showing up.
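A sketch of what that combination could look like. ackWithGrace and delay are hypothetical helper names, and the ack parameter stands in for Bolt's ack(). Both promises start at once, so the function waits for whichever takes longer rather than for the sum of the two.

```typescript
const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// `ack` stands in for Bolt's ack(); `graceMs` is the hacky wait that gives
// the un-awaited worker request time to leave the function.
async function ackWithGrace(ack: () => Promise<void>, graceMs: number): Promise<void> {
  // ack() is invoked immediately; the delay runs concurrently with it.
  await Promise.all([ack(), delay(graceMs)]);
}
```

Inside the command handler this would replace the last lines with await ackWithGrace(ack, 500). Note that Promise.all rejects as soon as ack() rejects, so a failed ack would surface as an error instead of being silently dropped.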

Curious if there's a "smarter" way to check the status of axios.post and end the function once the network request was sent (before receiving a response).

@jpvalery

jpvalery commented Jan 5, 2023

Hi Christopher,

Thanks for assembling this!

The initial API function would send a network request to this second "worker" function with all the data from Slack about the event. It does NOT await the axios.post, this is because awaiting the post would mean waiting for the entire "worker" function to finish, defeating the whole point. Instead it "fires and forgets" about the function.

Something important to note - the axios.post request getting sent can actually get interrupted by the Lambda terminating before the request was sent. So, the hacky fix is to use await new Promise(resolve => setTimeout(resolve, 500)) after axios.post to ensure the request is sent off.

This is what I did with another low-code tool (Slack sends to an /incoming endpoint that passes it over untouched to a /worker endpoint).

However, I'm currently migrating this to nextjs and despite using the same structure:

fetch(`${process.env.WWW}slack/create/worker/`, {
    method: "POST",
    body: req.body,
  });

 await new Promise((r) => setTimeout(r, 1500));

it doesn't seem that the /worker endpoint is receiving it (and therefore processing it).

Have you run into this?

@Christopher-Hayes
Author

Christopher-Hayes commented Jan 6, 2023

Hi @jpvalery,
The worker not seeing the request usually means your function is getting terminated too early. I'd say play with where you're calling ack(). ack() sends a response back to Slack from your endpoint. Slack examples have ack() be the first thing your code runs to ensure Slack doesn't time out (the timeout being Slack thinks your app is offline). However, in serverless, that response tells Vercel (or whatever platform) to terminate your function. With that said, I was putting ack() near the end but before the timeout because Vercel wouldn't kill the function instantly, so it would let me respond to Slack sooner but usually still have enough time to fire off the worker request.
This is all to say, try putting ack() after your timeout, in case ack() is making your function terminate before the worker request has been sent. I think in that situation, you can set timeout as long as you want, and your constraining factor should be responding to Slack in time to avoid a timeout.
With serverless, each platform has their own quirks and limits. So, there could be a really obscure reason your worker request is not going through. Usually these limits are related to runtime though, so if you're responding to slack within 3 seconds, then platform runtime limit is probably not an issue.
I would also say try putting an await on your worker fetch request, just to be sure that it runs at all. If it's not running in that case then you have a networking issue. Otherwise, I can only think of ack() and the function returning being the 2 ways the function can early terminate (ignoring runtime limits). Unless there's some other slack callback that manages to send a response back to slack that I'm not familiar with.
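As a sketch of that diagnostic step, the hypothetical helper below awaits the worker call and throws on a non-2xx status. FetchLike and postToWorker are made-up names, and the fetch implementation is injected so the helper can be exercised without a network; in a route you would presumably pass the global fetch. One more thing that may be worth checking, as an assumption about the snippet above rather than a confirmed cause: if req.body is already a parsed object, passing it straight to fetch as the body coerces it to the string "[object Object]", so the sketch serialises the payload explicitly.

```typescript
// Minimal fetch-like signature so the helper can run without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Awaits the worker call purely as a diagnostic: if this fails, the problem
// is the request itself rather than early termination of the function.
async function postToWorker(
  url: string,
  payload: unknown,
  fetchImpl: FetchLike,
): Promise<number> {
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // A parsed body object must be serialised; passed directly it would be
    // coerced to the string "[object Object]".
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Worker responded with HTTP ${res.status}`);
  return res.status;
}
```

Once the awaited version is confirmed to reach the worker, the await can be dropped again to get back to the fire-and-forget behaviour.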

@jpvalery

jpvalery commented Jan 9, 2023

@Christopher-Hayes I actually got this to work thanks to Delba @ Vercel's guidance:
https://gist.github.com/jpvalery/643de66ea7fa6755f29a8fca63bc09a4

@AlexIsMaking

AlexIsMaking commented Jun 1, 2023

Hi @Christopher-Hayes thank you for writing about this topic in so much detail, your posts have been a great reference point while working on this. Someone's just made me aware of Vercel's waitUntil() method and I thought I'd see whether you've tried it? It sounds like it should enable you to use ack() but keep the function running.

That's an Edge - rather than Serverless function - though and I'm not sure whether using Edge would be suitable here, as it doesn't allow access to Node.js APIs...

@Christopher-Hayes
Author

@AlexIsMaking that's pretty cool, didn't know the edge API had that. I didn't think the edge api could be used for longer operations, but the docs mention you have up to 30 seconds to send a response. I haven't tried it, but it makes sense for the ack() use-case. Should be possible since you still can use fetch(), whether the code I used would still work would depend on what APIs are used by the libraries I relied on.

@AlexIsMaking

Ok that sounds promising, I'm in the process of migrating my project to Firebase right now but I'll probably try it in the future.

@Christopher-Hayes
Author

Christopher-Hayes commented Aug 17, 2023

An update - I just noticed Vercel has an official blog post / example project on this now, and it seems pretty capable. I trust Vercel has optimized it for performance. It doesn't use Slack's Bolt framework, but that's kind of the situation we're in with trying to use the Slack API in a node+serverless environment.

This gist mentions that Vercel's example doesn't handle OAuth for you, the example below appears to be the "new example" Vercel was working on that would now handle OAuth. They have some custom code to handle it.

https://upstash.com/blog/vercel-note-taker-slackbot

@AlexIsMaking

AlexIsMaking commented Aug 17, 2023

Great, thanks for sharing that.

Another option that I've discovered recently - Inngest lets you send a response from Vercel immediately and then process the request in the background - https://www.inngest.com/docs/guides/enqueueing-future-jobs. I've only just started using it but it's looking like a must-have tool when working in serverless environments in general.

@Christopher-Hayes
Author

Great, thanks for sharing that.

Another option that I've discovered recently - Inngest lets you send a response from Vercel immediately and then process the request in the background - https://www.inngest.com/docs/guides/enqueueing-future-jobs. I've only just started using it but it's looking like a must-have tool when working in serverless environments in general.

Very cool, thanks! Wasn't aware of them for bg jobs.

@enesakar

enesakar commented Jan 8, 2024

qstash would be another option: https://upstash.com/docs/qstash/overall/getstarted

@leerob

leerob commented Feb 16, 2024

If you don't need Bolt specifically, we recently built a Slackbot using their Rest API https://vercel.com/templates/other/openai-gpt-slackbot-vercel-functions!
