AI models with FastAPI Streaming and Vercel AI SDK

Getting started

First of all, we need to create our project structure. Start by creating the backend and frontend folders:

├── .env
├── backend/
└── frontend/
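
Assuming a Unix-like shell, the layout above can be created with (the folder names simply mirror the tree; adjust them as you like):

mkdir backend frontend
touch .env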

Setting up our API:

# backend/main.py

import uvicorn

from os import getenv, path
from dotenv import load_dotenv

from fastapi import BackgroundTasks, FastAPI, Request, Response

app_base = path.dirname(__file__)
app_root = path.join(app_base, '../')

load_dotenv(dotenv_path=path.join(app_root, '.env'))

app_host = getenv("APP_HTTP_HOST")
app_port = int(getenv("APP_HTTP_PORT"))

app = FastAPI()

@app.get("/api/reply")
def reply(value: str):
    print(f"reply: {value}")
    return {"reply": value}

if __name__ == "__main__":
    uvicorn.run("main:app", host=app_host, reload=True, port=app_port)

Create a new Vite app inside the frontend folder:

yarn create vite --template react-ts

Setting up the SPA integration:

After creating the backend and frontend projects, we need to combine them. The easiest way is to serve our frontend app as static files from our backend. We should also set up a SPA proxy for development purposes.

1. Setting Up environment variables:

Inside the .env file we can place common environment variables for both apps.

APP_ENVIRONMENT='Development'
APP_HTTP_HOST='127.0.0.1'
APP_HTTP_PORT='5000'
APP_HTTP_URL='http://${APP_HTTP_HOST}:${APP_HTTP_PORT}'

APP_SPA_PROXY_PORT='3000'
APP_SPA_PROXY_URL='http://${APP_HTTP_HOST}:${APP_SPA_PROXY_PORT}'
APP_SPA_FOLDER_ROOT='frontend'
APP_SPA_PROXY_LAUNCH_CMD='yarn dev --port ${APP_SPA_PROXY_PORT}'
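
Note that python-dotenv expands the nested ${...} references by default when the file is loaded, so APP_HTTP_URL should resolve to the full address. A tiny (hypothetical) check:

# check_env.py - verify that the ${...} variables are expanded on load
from os import getenv
from dotenv import load_dotenv

load_dotenv(dotenv_path=".env")
print(getenv("APP_HTTP_URL"))  # expected: http://127.0.0.1:5000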

2. Configuring the Vite request proxy:

We can now tell Vite to proxy all fetch requests to our API endpoint, so that we can make API calls without specifying the host server. Our frontend app will then behave as if it were hosted by the backend app, even in the development environment.

Inside frontend/vite.config.ts, add the following:

import { env } from "node:process";
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

const apiProxyTarget = env.APP_HTTP_URL;

export default defineConfig({
  define: {
    "process.env": process.env,
    _WORKLET: false,
    __DEV__: env.DEV,
    global: {},
  },
  plugins: [react()],
  server: {
    strictPort: true,
    proxy: {
      "/api": {
        target: apiProxyTarget,
        changeOrigin: true,
        secure: false,
        rewrite: (path) => path.replace(/^\/api/, "/api"),
      },
    },
  },
});

We can test API calls by adding a fetch request inside our App.tsx:

import "./App.css";
import { useEffect, useState } from "react";

function App() {
  const [apiResponse, setApiResponse] = useState("");

  useEffect(() => {
    fetch("/api/reply?value=Hello from React App!")
      .then((response) => response.json())
      .then((result) => setApiResponse(JSON.stringify(result)));
  }, []);

  return (
    <div>
      <code>{apiResponse}</code>
    </div>
  );
}

export default App;

3. Serving the SPA app from FastAPI:

In this step we need to set up our backend to serve the React app as static files in production, and to proxy to the Vite dev server in development.

Let's update backend/main.py:

import subprocess
import uvicorn

from os import getenv, path
from dotenv import load_dotenv

from fastapi import FastAPI, Request
from fastapi.responses import RedirectResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles

app_base = path.dirname(__file__)
app_root = path.join(app_base, '../')
app_public = path.join(app_base, "public/")

load_dotenv(dotenv_path=path.join(app_root, '.env'))

app_env = getenv("APP_ENVIRONMENT")
app_host = getenv("APP_HTTP_HOST")
app_port = int(getenv("APP_HTTP_PORT"))
app_spa_folder = path.join(app_root, getenv("APP_SPA_FOLDER_ROOT"))
app_spa_proxy_url = getenv("APP_SPA_PROXY_URL")
app_spa_proxy_launch_cmd = getenv("APP_SPA_PROXY_LAUNCH_CMD")


app = FastAPI()
templates = Jinja2Templates(directory=app_public)
app.mount("/public", StaticFiles(directory=app_public), name="public")


@app.get("/api/reply")
def reply(value: str):
    print(f"reply: {value}")
    return {"reply": value}


@app.get("/{full_path:path}")
async def serve_spa_app(request: Request, full_path: str):
    """Serve the react app
    `full_path` variable is necessary to serve each possible endpoint with
    `index.html` file in order to be compatible with `react-router-dom
    """
    if app_env.lower() == "development":
        return RedirectResponse(app_spa_proxy_url)

    return templates.TemplateResponse("index.html", {"request": request})


if __name__ == "__main__":

    # Launching the SPA proxy server
    if app_env.lower() == "development":
        print("Launching the SPA proxy server...", app_spa_folder)
        spa_process = subprocess.Popen(
            args=app_spa_proxy_launch_cmd.split(" "),
            cwd=app_spa_folder)

    uvicorn.run("main:app", host=app_host, reload=True, port=app_port)

That's it: now when we run our backend app with python main.py, it will also launch our frontend app in development mode.

It will also fall back to our SPA app for all non-API routes.

Note: In production, we need to publish our frontend's dist files to the public folder inside our backend.

You can set this up inside your Dockerfile or during a CI/CD routine.
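
For example, a minimal publish step could look like this (a rough sketch, assuming yarn and the folder layout above; adjust the paths to your own pipeline):

yarn --cwd frontend build
mkdir -p backend/public
cp -r frontend/dist/. backend/public/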

Streaming AI results from FastAPI:

Now let's add an AI feature to our backend/main.py:

# Other imports...

from pydantic import BaseModel
from langchain.llms.openai import OpenAI
from fastapi.responses import StreamingResponse

app = FastAPI()

llm = OpenAI(
    streaming=True,
    verbose=True,
    temperature=0,
    openai_api_key=getenv("OPENAI_API_KEY")
)

class Question(BaseModel):
    prompt: str

@app.post('/api/ask')
async def ask(question: Question):
    print(question)

    def generator(prompt: str):
        for item in llm.stream(prompt):
            yield item

    return StreamingResponse(
        generator(question.prompt), media_type='text/event-stream')

# More code...

That's it: we now have an AI ask endpoint with streaming support.
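
With the backend running, you can exercise the streaming endpoint with a small client sketch (a hypothetical test script, assuming the host/port from our .env and a valid OPENAI_API_KEY):

# test_ask.py - hypothetical streaming client for the /api/ask endpoint
import requests

with requests.post(
    "http://127.0.0.1:5000/api/ask",
    json={"prompt": "Write a haiku about streaming APIs"},
    stream=True,
) as response:
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None):
        # Each chunk is a piece of the completion, printed as it arrives.
        print(chunk.decode("utf-8"), end="", flush=True)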

Now let's integrate it with our frontend app.

First, add the ai package:

yarn add ai

Then update App.tsx. The useCompletion hook sends a POST request whose JSON body contains the prompt, which is what the Question model on our backend expects:

import "./App.css";
import { useCompletion } from "ai/react";

function App() {

  // Some code here ...

  const { input, completion, handleInputChange, handleSubmit } = useCompletion({
    api: "/api/ask",
    headers: {
      "Content-Type": "application/json",
    },
  });

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <label htmlFor="ask-input">Ask something:</label>
        <input id="ask-input" type="text" value={input} onChange={handleInputChange} />

        <button type="submit">POST</button>
      </form>

      <textarea value={completion} rows={20}></textarea>
    </div>
  );
}

export default App;
import "./App.css";
import { useEffect, useState } from "react";
import { useCompletion } from "ai/react";
function App() {
const [apiResponse, setApiResponse] = useState("");
useEffect(() => {
fetch("/api/reply?value=Hello from React App!")
.then((response) => response.json())
.then((result) => setApiResponse(JSON.stringify(result)));
}, []);
const { input, completion, handleInputChange, handleSubmit } = useCompletion({
api: "/api/ask",
headers: {
"Content-Type": "application/json",
},
});
return (
<div>
<code>{apiResponse}</code>
<form onSubmit={handleSubmit}>
<label htmlFor="ask-input"></label>
<input
id="ask-input"
type="text"
value={input}
onChange={handleInputChange}
/>
<button type="submit">POST</button>
</form>
<textarea value={completion} rows={20}></textarea>
</div>
);
}
export default App;
.env:

APP_ENVIRONMENT='Development'
APP_HTTP_HOST='127.0.0.1'
APP_HTTP_PORT='5000'
APP_HTTP_URL='http://${APP_HTTP_HOST}:${APP_HTTP_PORT}'

APP_SPA_PROXY_PORT='3000'
APP_SPA_PROXY_URL='http://${APP_HTTP_HOST}:${APP_SPA_PROXY_PORT}'
APP_SPA_FOLDER_ROOT='frontend'
APP_SPA_PROXY_LAUNCH_CMD='yarn dev --port ${APP_SPA_PROXY_PORT}'

OPENAI_API_KEY=""
backend/main.py:

import subprocess
from pydantic import BaseModel
import uvicorn

from os import getenv, path
from dotenv import load_dotenv

from fastapi import FastAPI, Request
from fastapi.responses import RedirectResponse, StreamingResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles

from langchain.llms.openai import OpenAI

app_base = path.dirname(__file__)
app_root = path.join(app_base, '../')
app_public = path.join(app_base, "public/")

load_dotenv(dotenv_path=path.join(app_root, '.env'))

app_env = getenv("APP_ENVIRONMENT")
app_host = getenv("APP_HTTP_HOST")
app_port = int(getenv("APP_HTTP_PORT"))
app_spa_folder = path.join(app_root, getenv("APP_SPA_FOLDER_ROOT"))
app_spa_proxy_url = getenv("APP_SPA_PROXY_URL")
app_spa_proxy_launch_cmd = getenv("APP_SPA_PROXY_LAUNCH_CMD")


class Question(BaseModel):
    prompt: str


app = FastAPI()
templates = Jinja2Templates(directory=app_public)
app.mount("/public", StaticFiles(directory=app_public), name="public")

llm = OpenAI(
    streaming=True,
    verbose=True,
    temperature=0,
    openai_api_key=getenv("OPENAI_API_KEY")
)


@app.post('/api/ask')
async def ask(question: Question):
    print(question)

    def generator(prompt: str):
        for item in llm.stream(prompt):
            yield item

    return StreamingResponse(
        generator(question.prompt), media_type='text/event-stream')


@app.get("/api/reply")
def reply(value: str):
    print(f"reply: {value}")
    return {"reply": value}


@app.get("/{full_path:path}")
async def serve_spa_app(request: Request, full_path: str):
    """Serve the react app

    The `full_path` variable is necessary to serve every possible endpoint with
    the `index.html` file, in order to be compatible with `react-router-dom`.
    """
    if app_env.lower() == "development":
        return RedirectResponse(app_spa_proxy_url)

    return templates.TemplateResponse("index.html", {"request": request})


if __name__ == "__main__":

    # Launching the SPA proxy server
    if app_env.lower() == "development":
        print("Launching the SPA proxy server...", app_spa_folder)
        spa_process = subprocess.Popen(
            args=app_spa_proxy_launch_cmd.split(" "),
            cwd=app_spa_folder)

    uvicorn.run("main:app", host=app_host, reload=True, port=app_port)
Comments

@untilhamza:

Hello, thanks for this.

Where did you deploy this? I want to deploy a similar FastAPI setup on Vercel. However, in my case, my frontend is decoupled from the backend.

@kallebysantos (Author) commented Feb 3, 2024:

Hi @untilhamza,
I deployed it on a VPS, but if you want to put it on Vercel or any other platform without the frontend, you should focus only on the "Streaming AI results from FastAPI" section. It provides a scaffold API route for streaming AI results that is compatible with the Vercel AI SDK.
Then you can call this endpoint from your frontend using the ai package, or even manually with an HTTP fetch().

Have a look at danielcorin's example.

@untilhamza:

Thank you so much @kallebysantos

@LouisMlr commented Feb 29, 2024:

Hello, I'm trying to do something similar with FastAPI, Next.js and Vercel. I don't have any problem calling a GET API from the web app, but when I try to send a request to a POST API, I get the following error: 422 Unprocessable Entity.

  • If I simulate a user's question directly in the API link of useChat (or with useCompletion), it works:
const { messages, input, setInput, handleSubmit, isLoading } = useChat({
    api: '/api/ask?question=hello',
    headers: {
      "Content-Type": "application/json",
    },
  });
  • But when I try to send the user's question from the input field of the web app, I get the error above.
const { messages, input, setInput, handleSubmit, isLoading } = useChat({
    api: '/api/ask',
    headers: {
      "Content-Type": "application/json",
    },
  });

Do you have any idea what the issue is? Thank you in advance.

@LouisMlr commented Mar 4, 2024:

I finally found the problem: it was coming from the input format of the web application.
I'm now trying to send additional information from the API. For example, I'd like to retrieve the source documents to build a RAG web application. Do you know how to add information other than content and role?

@kallebysantos (Author) commented Mar 4, 2024:

You may use the onResponse callback to retrieve the raw Response object from the API call.

You could also set up two endpoints, one for searching and another for AI completion. You would first search for your documents with a semantic search, and then use the returned document IDs to perform an AI completion.

Since the completion endpoint expects a response stream, I think the second approach is better than returning everything in a single API call.

@LouisMlr commented Mar 4, 2024:

Thanks for your reply! Indeed, the second option seems to be the best.
I also have another idea:

  1. Keep a single POST API that takes the user's question, retrieves the sources, stores them somewhere (like a database or folder), and then sends the LLM-generated answer to the web application.
  2. Set a variable in the web application to store the sources:
    const [sources, setSources] = useState("")
  3. Use the onFinish callback to update the sources variable with a GET request to the database that just stored the sources.

What do you think?

@kallebysantos (Author):

It should work, but I would probably do something like this:

  1. Perform a GET request searching for the sources, then send them back while putting the results in some cache data store, like Redis.
    At this point I'd create a unique identifier that correlates the user with the sources, and generate a pre-signed URL from it.

  2. In the frontend, I'd show the resulting documents and only then perform an API call to the pre-signed URL. This URL points to my completion endpoint, which uses the pre-signed values to get the sources back from the cache.

Using this approach I avoid invoking my completion endpoint when no results are found, and the client also gets their documents earlier, since the AI completion can take a long time to finish.

If you're not doing any extra operation over your sources, like highlighting the most relevant phrases for the question, you can just store the source ID in your cache. But if you need to keep the whole result, you should probably optimize it before storing.
You could also map common questions to the same results to avoid duplication.

@rgsk commented Jun 21, 2024:

This is not working now. I'm getting an error: "Failed to parse stream string. No separator found."
