GenAI is a platform from AB InBev, a global alcoholic beverage company with a presence in South America, Central America, and Europe. The platform, currently under construction, aims to provide Retrieval-Augmented Generation (RAG) for internal purposes such as training, daily sales insights, and so on.
Initially, there are two agents:
- RAG Connector
- Orchestrator
The RAG Connector has several responsibilities: it receives PDF files, reads them, and generates chunks that are used to create question/answer pairs for training a Large Language Model (LLM). It also generates metrics for files, referred to as "Groundtruth." These benchmarks measure heuristics for each question/answer pair produced by the Orchestrator. In the future, through event sourcing, it aims to measure possible AI hallucination.
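The chunking step can be sketched roughly as follows. This is a minimal illustration only; the Connector's actual chunk size, overlap, and splitting strategy are not specified in this document, so the parameters here are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split extracted document text into overlapping chunks.

    chunk_size and overlap are hypothetical values, not the Connector's real settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Stand-in for text extracted from an ingested PDF.
document = "Python is a programming language. " * 40
chunks = chunk_text(document, chunk_size=200, overlap=20)
```

Each chunk would then be handed to the QA-generation step to produce question/answer pairs.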
- Ingest a PDF document, such as documentation about Python, defining the context and sending it to the Connector.
- The document is divided into chunks, and metadata is stored in a Postgres database.
- Add a background task to the API (FastAPI) that handles chunk splitting and sends the chunks to Table Storage, interacting with Cognitive Search (Azure environment) to generate 5 questions and 5 answers.
- Trigger the Orchestrator for the learning process.
- In the next step, ingest TXT/CSV documents containing the same context. Questions and answers are generated by humans (initially) and synthetically. Ingestion involves storing the raw document in blob storage, exactly as sent by the user, before handing off to the background task.
- The API receives the document, splits questions and answers, and stores them as JSON in Postgres. Each document has a unique ID and a "data" column representing [questions, answers, answers from the Orchestrator, the distance between the sent answer and the Orchestrator's answer, embeddings generated for the sent answer, and embeddings generated for the Orchestrator's answer].
- After completion, metrics are generated for the document using BERTScore, which is currently disabled due to excessive memory consumption.
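As a concrete illustration of the per-document record described above, the row could look like the sketch below. The field names inside "data" are assumptions for illustration; the source only lists the contents of the column, not its schema:

```python
import json
import uuid

# Hypothetical shape of a row stored in Postgres: a unique ID plus a JSON
# "data" column holding the QA pairs and their evaluation artifacts.
record = {
    "id": str(uuid.uuid4()),
    "data": {
        "questions": ["What is a Python list?"],
        "answers": ["A mutable, ordered sequence of objects."],
        "orchestrator_answers": ["An ordered, mutable collection of items."],
        "answer_distance": [0.12],          # distance between sent and received answers
        "answer_embeddings": [[0.1, 0.3]],  # embedding of the sent answer (truncated)
        "orchestrator_embeddings": [[0.11, 0.29]],
    },
}

payload = json.dumps(record)  # serialized form written to the JSON column
```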
I cannot provide much context on this, as it is a C# service owned by another team. It is essentially a solution built on Semantic Kernel that enables the creation of multiple connectors, each with its own history. The Orchestrator is responsible for generating responses, context routing, fine-tuning, and so on.
This part is causing internal friction due to ongoing discussions about the implementation of context routing, among other topics.
Essentially, when one or more documents are submitted, whether PDF or CSV/TXT, the API returns the status indicated in the controller and starts the background task. While the background task is running, no other requests can be made to the API until it completes.
We added more workers to Uvicorn.
We do not know how long scaling will be necessary, and it may be a problem for the future.
/bin/bash,-c,gunicorn -k geventwebsocket.gunicorn.workers.GeventWebSocketWorker -b 0.0.0.0:5000 --workers=5 chat.app:app
This was one of the solutions implemented in another part of the project facing similar issues, but it is not clear that bringing in Gunicorn would definitively solve this problem. It may be more of a temporary fix, if it works at all.
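For comparison, the command above uses a gevent WebSocket worker for a Flask-style chat app; an ASGI app like FastAPI would instead be run under Gunicorn with Uvicorn worker processes. A hedged sketch (the module path and worker count are placeholders, not the project's actual configuration):

```shell
# Hypothetical invocation: Gunicorn supervising 5 Uvicorn worker processes.
# "app.main:app" is a placeholder module path for the FastAPI application.
gunicorn -k uvicorn.workers.UvicornWorker -b 0.0.0.0:5000 --workers=5 app.main:app
```

More worker processes let other requests be served while one worker is busy with a background task, but they do not remove the underlying blocking behavior within each worker.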
- Python 3.9
- FastAPI
- scikit-learn for question/answer metrics
- SciPy for metrics
- BERTScore to define benchmark scores for documents
In upcoming sprints we will implement OCR, for instance, which will be responsible for reading documents containing images so they can be included in learning, with synthetic QA Groundtruths inserted automatically to stress-test the LLM further.