
Self-Directed Q&A Over Documents

In the expanding universe of machine learning, accurately answering questions over a corpus of proprietary documents presents an exciting yet challenging frontier. At the intersection of natural language processing and information retrieval, the quest for efficient and accurate "Q&A over documents" systems drives many developers and data scientists.

While large language models (LLMs) such as GPT have greatly advanced the field, there are still hurdles to overcome. One such challenge is identifying and retrieving the most relevant documents based on user queries. User questions can be tricky; they're often not well-formed and can cause our neatly designed systems to stumble.

In this blog post, we'll first delve into the intricacies of this challenge and then explain a simple yet innovative solution that leverages the new function calling capabilities baked into the chat completion API for GPT. This approach aims to streamline the retrieval process, making it more robust and effective, regardless of how users phrase their questions.

The Challenges of Finding Relevant Documents Based on User Input

In the realm of building Q&A processes over documents, we, as developers, often fall into a common trap. We design and test our systems using well-formed questions, often tailor-made to our understanding of the problem. The system performs admirably, returning precise and accurate answers, which in turn prompts a sense of accomplishment. However, this bubble of success is often burst when real users start engaging with the system.

Real-world user inquiries are rarely as neat and structured as our test queries. They come in various shapes and forms, often unstructured and ambiguous, causing the document retrieval process to falter. It's at this juncture we realize the complexity and diversity of human language and its potential impact on the performance of our Q&A systems.

To improve document retrieval, techniques like query expansion, named entity resolution, and intent analysis are often employed. Query expansion enables the system to understand a broader range of user inputs by considering synonyms, acronyms, or other related terms. Named entity resolution can identify and categorize specific entities in the user's input, such as people, places, or organizations, making it easier for the system to retrieve relevant information. Intent analysis, on the other hand, tries to understand what the user is actually asking, which is especially useful when the inquiry is implicit or unclear.
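To make the first of these concrete, here is a toy sketch of query expansion. The synonym table and the `expand_query` helper are invented purely for illustration; a real system would typically draw expansions from a thesaurus, embedding neighbors, or a domain-specific acronym list.

```python
# Toy query expansion: append hand-maintained synonyms so the search
# also covers related phrasings. The synonym table is illustrative only.
SYNONYMS = {
    "detective": ["investigator", "inspector"],
    "boss": ["captain", "supervisor"],
}

def expand_query(query: str) -> str:
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Strip trailing punctuation before the lookup so "detective?" still matches.
        expanded.extend(SYNONYMS.get(term.strip("?.,!"), []))
    return " ".join(expanded)

print(expand_query("Why did the boss dislike the detective?"))
# -> why did the boss dislike the detective? captain supervisor investigator inspector
```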

Despite these efforts, perfecting the process of document retrieval remains a challenge. But what if there were a simpler way to improve the relevance of document retrieval, one that leverages the capabilities of LLMs such as GPT?

"Self-Directed Q&A" using function calling

The conventional method of finding relevant documents relies on querying a vector store with user questions, augmented with any applicable query processing techniques. Once the query results are obtained, they're coupled with the original user question and sent to the model for further processing.
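As a point of reference, here's a minimal sketch of that conventional flow. It assumes the pre-1.0 `openai` Python client and stands in for a real vector store with an in-memory list of `(text, embedding)` pairs; the helper names (`embed`, `search`, `answer`) are our own choices, not a particular library's API.

```python
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    # Embed a piece of text with the OpenAI embeddings endpoint.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def search(corpus: list[tuple[str, np.ndarray]], query: str, top_k: int = 5) -> list[str]:
    # Embed the raw query and rank documents by cosine similarity.
    q = embed(query)
    scored = sorted(
        corpus,
        key=lambda item: -np.dot(q, item[1]) / (np.linalg.norm(q) * np.linalg.norm(item[1])),
    )
    return [text for text, _ in scored[:top_k]]

def answer(corpus: list[tuple[str, np.ndarray]], question: str) -> str:
    # Couple the retrieved passages with the original question and hand both to the model.
    context = "\n\n".join(search(corpus, question))
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```

Note that the vector store is queried with the user's question verbatim: whatever noise the question carries goes straight into the search.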

This standard process, however, has its limitations, mainly due to the noise present in the user's question and the dependency on phrasing. It's here that the function calling capabilities of the chat completion API for GPT offer a promising solution.

By exposing "query the vector store" as a callable function to the model, we can leverage the model's ability to generate an optimal query based on the user's question. This means the model can effectively filter out noise from the question and construct a cleaner, more targeted query. Consider this complex question: "Josephus Miller and Dimitri Havelock were detectives on Ceres station. Why did Captain Shaddid, their boss, dislike Havelock?" With traditional vector-based retrieval methods, a question this detailed could return a whopping 1000 documents. Why so many? The reason lies in the nature of vector space models.

Vector space models map words and phrases into a high-dimensional space, where semantic similarities between them translate into geometric closeness. When a user query is executed, it's converted into a vector and the system retrieves documents whose vectors are closest to that of the query. This is where the challenge arises.
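Here's a toy illustration of that geometric picture, using made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means the vectors point in the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented vectors: the query lands closest to the document that shares its meaning.
query             = np.array([0.9, 0.1, 0.2])  # "why does Shaddid dislike Havelock"
doc_about_shaddid = np.array([0.8, 0.2, 0.1])  # passage about Shaddid's resentment
doc_about_ceres   = np.array([0.1, 0.9, 0.3])  # passage describing Ceres station

print(cosine(query, doc_about_shaddid))  # higher score -> retrieved first
print(cosine(query, doc_about_ceres))    # lower score -> ranked below it
```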

In our example, the query contains multiple entities and a relationship ("dislike") between them. Each entity and relationship forms its own vector and pulls in documents that are similar in some way to those vectors. Due to the abundance of information in the query, it touches a broad region in the vector space, hence pulling in a vast array of documents—each relevant in some aspect but potentially diluting the focus from the main query. This leads to the retrieval of a massive amount of loosely related documents, like the 1000 we've hypothetically mentioned.

Even when the question is pared down to "Why did Captain Shaddid, their boss, dislike Havelock?", the nature of the vector-based search can still lead to an avalanche of results, around 500 documents in this case. This is because the query, while more focused than the previous one, still contains several vectors—Captain Shaddid, the boss, dislike, Havelock—that can each pull in a swath of related documents, some of which might only tangentially touch upon the central question.

In contrast, our new approach uses the LLM's ability to extract the crux of the user's question, distilling it into a more concise query: "Shaddid dislike Havelock reason".
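Here is one way this could look in code, again using the pre-1.0 `openai` client. The function name `query_vector_store`, its JSON schema, and the `search()` helper from the earlier sketch are our own illustrative choices, not a prescribed API; the point is simply that the model, rather than the raw user text, authors the query that hits the vector store.

```python
import json
import openai

# Describe the vector-store search as a function the model may call.
SEARCH_FUNCTION = {
    "name": "query_vector_store",
    "description": "Search the document store for passages relevant to a question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A short, focused search query distilled from the user's question.",
            }
        },
        "required": ["query"],
    },
}

def self_directed_answer(corpus, question: str) -> str:
    messages = [
        {"role": "system", "content": "Use query_vector_store to find context before answering."},
        {"role": "user", "content": question},
    ]
    # First call: force the model to write a search query instead of answering directly.
    first = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=[SEARCH_FUNCTION],
        function_call={"name": "query_vector_store"},
    )
    call = first["choices"][0]["message"]["function_call"]
    distilled_query = json.loads(call["arguments"])["query"]  # e.g. "Shaddid dislike Havelock reason"

    # Run retrieval with the model's distilled query, then answer as usual.
    context = "\n\n".join(search(corpus, distilled_query))
    messages.append(first["choices"][0]["message"])
    messages.append({"role": "function", "name": "query_vector_store", "content": context})
    final = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
    return final["choices"][0]["message"]["content"]
```

For the example question above, the arguments the model writes back would ideally look like `{"query": "Shaddid dislike Havelock reason"}`, and that distilled string is what actually reaches the vector store.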

Now, this seemingly minor change has a significant impact on the vector space search. By reducing the number of entities and relationships, we effectively narrow down the region of the vector space we're interested in. The search now returns a much smaller but far more focused set of documents, say around 50.

This compact and relevant selection dramatically improves the user experience, as it reduces the noise and increases the signal, thereby making it easier for the user to find the specific information they're looking for.

This approach harnesses the superior language understanding capabilities of LLMs like GPT, enhancing the effectiveness of document retrieval in the Q&A process. While this method is no silver bullet and has its own complexities to consider, it presents a considerable leap forward in improving the way we find and present relevant information to user inquiries.

Parting thoughts

The evolving landscape of large language models continues to offer promising avenues to improve our interaction with information. As developers and data scientists, the task before us is not just to revel in these advancements but also to find creative ways to leverage them.

The function calling capability of the chat completion API for GPT represents one such opportunity. By letting the model generate more targeted queries, it significantly improves the relevance of document retrieval in Q&A processes. It's an exciting development that demonstrates the potential of machine learning and NLP to enhance our ability to make sense of information.

But, like every solution, it has its complexities and limitations. As we move forward, we'll continue to test, iterate, and refine this approach.
