Building a fast and efficient AI Assistant with LangChain and OpenAI

Mauro Krikorian explores how to harness the power of AI to improve customer experiences in the banking and finance sector.

A Short Intro

In an era where digital transformation is reshaping industries, the financial sector is no exception. We’ve recently embarked on a pivotal project to implement an AI assistant for a large bank. This initiative reflects the bank’s strategic move to leverage AI technologies and deliver enhanced solutions for their customers.

The bank’s primary objective was to explore innovative solutions that harness the power of AI to improve customer experiences. Our team collaborated closely with theirs, examining several business cases to identify the most critical and impactful one. Through this process, we pinpointed the area causing the most significant pain points for both their customers and the bank itself, making it our focal point for AI enhancement.

By addressing this critical business case, the bank stands to significantly elevate its reputation. The targeted enhancements are expected to boost customer satisfaction, and in the competitive banking industry, such improvements can differentiate the bank from its competitors, fostering stronger customer relationships and loyalty, in addition to attracting new customers through innovation.

The implementation aligns with stringent government regulations on financial information sharing and openness, ensuring compliance while driving innovation. This balance of regulatory adherence and cutting-edge technology positions the bank as a leader in the financial sector.

The Business Case

Unrecognized transactions on customers’ bank statements are a common issue that frequently leads to increased support calls. When customers identify transactions they do not recognize, they often reach out to the bank’s support team for clarification. Support agents then have to go through multiple data sources to link and verify records, a process that can be time-consuming and prone to errors. This cumbersome procedure not only impacts the efficiency of support operations but also contributes to customer frustration and dissatisfaction.

In many instances, if the support agent cannot resolve the issue quickly, a formal claim must be filed to follow up and consolidate all relevant information. This additional step not only delays resolution but also increases the administrative burden on the bank. Financial regulations mandate that the number of claims remains below certain thresholds, which adds pressure on the bank to resolve unrecognized transaction issues promptly and efficiently, and, more importantly, to prevent them from happening in the first place. Failure to comply with these regulatory requirements can result in penalties and damage to the bank’s reputation.

To address these challenges, a fast and smart solution for identifying and contextualizing unrecognized transactions is crucial. Implementing an AI-powered assistant can significantly streamline this process by providing both customers and support agents with instant access to transaction details and context. This technology can quickly analyze transaction data, cross-referencing it with multiple sources to provide accurate and comprehensive information. By doing so, it reduces the need for filing claims and ensures that issues are resolved promptly, keeping the number of claims within regulatory limits.

The adoption of this AI assistant not only improves operational efficiency but also enhances customer satisfaction and loyalty. Customers benefit from quicker resolutions and a more seamless banking experience, while support agents can focus on more complex issues rather than spending excessive time on routine transaction inquiries. Ultimately, this smart solution helps the bank maintain compliance with financial regulations while enhancing its reputation and strengthening its competitive edge in the financial services industry.

Our Solution

Leveraging the capabilities of LangChain and Azure OpenAI, we developed an advanced AI agent specifically designed to tackle this issue. This agent, crafted in Python, incorporates a wide array of tools to enhance the LLM’s capabilities, ensuring comprehensive and accurate transaction analysis.
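
To make this more concrete, here is a minimal sketch of what a single tool-equipped agent can look like with LangChain and Azure OpenAI. Everything below (tool names, deployment names, the stubbed data) is illustrative, not the bank’s actual integration:

```python
from langchain import hub
from langchain_openai import AzureChatOpenAI
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_react_agent

# Hypothetical tools: the real ones wrap the bank's transaction and merchant data sources.
@tool
def lookup_transaction(transaction_id: str) -> str:
    """Return the raw record for a transaction id."""
    return "2024-04-23 | card *1234 | XJ*MKTPL 0423 | USD 42.10"  # stubbed data

@tool
def lookup_merchant(descriptor: str) -> str:
    """Resolve a statement descriptor into a merchant name and category."""
    return "XJ*MKTPL resolves to 'Acme Marketplace', category: online retail"  # stubbed data

# Endpoint and key are read from the standard AZURE_OPENAI_* environment variables.
llm = AzureChatOpenAI(azure_deployment="gpt-4", api_version="2024-02-01", temperature=0)
tools = [lookup_transaction, lookup_merchant]

# Standard ReAct prompt from the LangChain hub; our production prompt carried far more instructions.
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

executor.invoke({"input": "What is this USD 42.10 charge labelled 'XJ*MKTPL 0423'?"})
```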

In a multi-cloud setup where compute and data sources are deployed in physically isolated environments, our solution maintains optimal performance and security. The AI assistant operates seamlessly across these isolated systems, effectively managing the complex data landscape typical of large financial institutions. By enabling real-time querying over a few Data Lake sources, the assistant uses various Retrieval-Augmented Generation (RAG) techniques to ground its responses with precise and relevant information. This ability to instantly cross-reference multiple data sources ensures that both customers and support agents receive accurate and comprehensive information regarding unrecognized transactions.
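
As a rough illustration of that grounding step, the sketch below indexes a couple of made-up merchant descriptions and retrieves the relevant snippets for a query. In the real system the indexes are built over curated Data Lake extracts rather than an in-memory store, and the deployment name is a placeholder:

```python
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Hypothetical embedding deployment; credentials come from AZURE_OPENAI_* env variables.
embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-ada-002")

# Made-up reference snippets standing in for curated Data Lake extracts.
merchant_notes = [
    "XJ*MKTPL: online marketplace, card-not-present purchases, settled via acquirer 0423",
    "GLOBALAIR 77: airline ticketing, typically posts 1-3 days after booking",
]
index = FAISS.from_texts(merchant_notes, embeddings)
retriever = index.as_retriever(search_kwargs={"k": 2})

# The retrieved snippets are injected into the agent's prompt so that answers are
# grounded in actual reference data instead of the model's own guesses.
context = retriever.invoke("What is the charge labelled 'XJ*MKTPL 0423'?")
```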

Furthermore, the AI assistant employs the latest OpenAI models, including GPT-Turbo and GPT-4, to deliver sophisticated and contextually relevant financial insights. These models enhance the assistant’s ability to understand and respond to complex queries, providing users with detailed explanations and solutions. By quickly analyzing transaction data and offering clear context, the AI assistant reduces the need for filing claims and ensures that issues are resolved promptly.

Ultimately, the implementation of this AI-powered assistant transforms the way unrecognized transactions are handled. Customers benefit from quicker resolutions and a more seamless banking experience, while support agents can focus on more complex issues, reducing their administrative burden.

Sounds amazing, right? But…

Problems along the way

As we progressed through the project, we encountered several issues that added layers of complexity to our implementation. Initially, our scope included a manageable number of data sources. However, as different teams within the bank continued to bring more data sources into the picture, the complexity of information retrieval and grounding queries increased significantly. Each new data source required integration, validation, and adjustment, which complicated the AI assistant’s prompt engineering and data handling processes.

The initial approach of using a single ReAct (Reason & Act) agent to respond to all types of questions became increasingly cumbersome and slow over time. This methodology, which relies on a cycle of interactions between the LangChain ReAct agent and the underlying LLM, struggled to keep up with the growing complexity. Because the schema and quality of the data varied widely, the original prompts became laden with detailed instructions to handle specific use cases. This increased prompt complexity led to the model “unlearning” certain tasks and hallucinating more frequently, producing inaccurate or fabricated responses.
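
To give a flavor of the problem, the single agent’s system prompt ended up looking something like the excerpt below. The rules here are made up for illustration, but every new data source added more of them:

```python
# Illustrative only, not the production prompt: use-case-specific rules kept piling up
# in the one ReAct system prompt as more data sources joined the picture.
SYSTEM_INSTRUCTIONS = """You are the bank's transaction assistant.
- For card purchases, join the authorization record with the clearing record before answering.
- For wire transfers, the counterparty name lives in the SWIFT details source, not the statement line.
- If a descriptor starts with 'XJ*', resolve it through the marketplace catalog first.
- For expense summaries, only aggregate rows actually returned by the tools; never estimate.
- Never expose internal ledger identifiers to the customer.
(...dozens more rules like these...)
"""
```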

One significant issue we encountered was LLM hallucinations. On multiple occasions, the language model responded with seemingly proper answers that were actually based on incorrect or invented data. For instance, when users asked for a summary of expenses, the LLM sometimes generated a summary without actually processing any real data, resulting in a fabricated expense report. These hallucinations were particularly problematic because they undermined the trust in the AI assistant’s reliability.

In some use cases, these hallucinations were easy to detect because the LLM’s responses did not correlate with any accessed data. Reiterating the user’s question often resolved the issue temporarily, but this was not a sustainable solution. The need to continually adjust and refine prompts to mitigate hallucinations and ensure accurate data retrieval highlighted the limitations of our initial approach and underscored the need for more robust and scalable solutions. As a result, we ended up exploring more advanced techniques and different architectural models to enhance the accuracy and reliability of the AI assistant.
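
One cheap way to catch that kind of mismatch is to verify that the figures quoted in an answer actually appear in the data the tools returned. The check below is a minimal sketch of that idea, not the exact guardrail we shipped:

```python
import re

def answer_is_grounded(answer: str, retrieved_rows: list[dict]) -> bool:
    """Flag answers that quote monetary amounts never returned by the tools."""
    amounts_in_answer = set(re.findall(r"\d+\.\d{2}", answer))
    amounts_in_data = {f"{row['amount']:.2f}" for row in retrieved_rows}
    # If the answer cites figures the tools never produced, treat it as a likely
    # hallucination and retry the query or escalate to a human agent.
    return amounts_in_answer <= amounts_in_data
```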

Have you ever heard of the good old ‘divide and conquer’ technique? 😏

Refactoring for Performance and Efficiency

After considering different options and architectures (which I’ll share in another article, as I want to keep this one short), we ended up implementing a multi-agent system to better manage the increasing complexity. This refactoring aimed to improve the speed and accuracy of the AI assistant’s responses, thereby enhancing the overall user experience.

This new approach involves first identifying the type of inquiry and then routing it to a more specialized agent. By leveraging a mix of GPT-Turbo and GPT-4 models, we can tailor responses more effectively while ensuring that each agent is optimized for specific tasks. By creating specific agents for a subset of use cases, we managed to keep the prompts smaller and the LLM interactions faster and more reliable. This modular approach allows each agent to handle its designated tasks more efficiently, reducing the risk of hallucinations and improving response times.

Before moving forward, allow me to share a high-level diagram of our multi-agent architecture:

In our multi-agent system, we employ different OpenAI models to validate the user inquiry and detect its intent up front. These models are responsible for identifying the proper use case and routing the query to the appropriate specialized agent. This initial validation step ensures that each inquiry is directed to an agent best suited to handle it, streamlining the entire process and enhancing accuracy. By routing inquiries correctly from the start, we minimize unnecessary interactions with the LLMs and ensure that users receive precise and relevant responses.
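
In practice, that routing step can be as simple as a cheap classification call followed by a dispatch to the right handler. The intent labels, deployment names, and stub handlers below are illustrative placeholders, not our actual taxonomy:

```python
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# A cheaper deployment classifies the inquiry; specialized agents do the heavy lifting.
router_llm = AzureChatOpenAI(azure_deployment="gpt-35-turbo", api_version="2024-02-01", temperature=0)

router_prompt = ChatPromptTemplate.from_messages([
    ("system", "Classify the banking question into exactly one label: balance_inquiry, "
               "unrecognized_transaction, expense_summary or other. Reply with the label only."),
    ("human", "{question}"),
])
router = router_prompt | router_llm | StrOutputParser()

# Stubbed handlers: each would be a small specialized agent, or a plain data query
# plus a templated response, as described below.
HANDLERS = {
    "balance_inquiry": lambda q: "route to the balance agent",
    "unrecognized_transaction": lambda q: "route to the transactions agent",
    "expense_summary": lambda q: "route to the summaries agent",
}

def answer(question: str) -> str:
    intent = router.invoke({"question": question}).strip()
    return HANDLERS.get(intent, lambda q: "route to the general agent")(question)
```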

The need for the ReAct model diminishes in this refactored approach, as each use case is now tailored to a specific data query. When necessary, the LLMs can still be asked to generate the required data query, but many use cases do not require further LLM interaction once the initial inquiry is detected. For example, a “balance inquiry” can be directly mapped to a specific data query and a templated text response, eliminating the need for additional processing. This targeted handling of use cases not only speeds up response times but also reduces the cognitive load on the LLMs, further minimizing the risk of hallucinations.
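
That “balance inquiry” path can be as simple as the sketch below, where a parameterized query plus a text template replaces any further LLM call; the query, template, and data-access function are simplified placeholders:

```python
# Hypothetical direct path: once the router labels an inquiry as a balance inquiry,
# a parameterized query and a text template are enough; no further LLM calls needed.
BALANCE_QUERY = "SELECT available_balance, currency FROM accounts WHERE account_id = ?"
BALANCE_TEMPLATE = "Your available balance is {currency} {available_balance:,.2f}."

def answer_balance_inquiry(account_id: str, run_query) -> str:
    # run_query stands in for whatever data-access layer fronts the bank's sources.
    row = run_query(BALANCE_QUERY, (account_id,))
    return BALANCE_TEMPLATE.format(**row)

# Example with a stubbed data-access function:
print(answer_balance_inquiry("12345", lambda sql, params: {"available_balance": 1523.40, "currency": "USD"}))
```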

Conclusions

Large prompts were one of the significant challenges we faced. Depending on the underlying LLM model, extensive prompts with numerous instructions were often ignored, took longer to process, and tended to cause more hallucinations compared to smaller, more focused prompts. Additionally, maintaining large and complex prompts proved to be cumbersome.

The multi-agent approach allowed us to manage the complexity of data sources and queries more effectively. By assigning specialized agents to handle specific tasks, we ensure that the AI assistant remains responsive and accurate, even as the scope of the project expands.

This refactored implementation not only addresses the initial problems but also provides a scalable framework that can adapt to future requirements, ultimately delivering a more robust and reliable AI-powered assistant to solve the business case.

This collaboration between SOUTHWORKS and this top-5 bank on building this Personal Finance AI assistant underscores the transformative impact of AI in financial services. By addressing key pain points, enhancing customer satisfaction, and ensuring regulatory compliance, the bank is set to achieve significant gains in reputation, customer loyalty, and market expansion.