Customer Feedback Analysis with Vertex AI Search and LLMs

Explore the architecture and roadmap of a system that processes thousands of reviews into valuable insights. See how a two-stage LLM pipeline and advanced search transform data into strategic decisions.

Łukasz Kidoń
Published on: March 5, 2025

This article presents the practical implementation of an advanced system for analyzing and synthesizing thousands of customer reviews using a dedicated search engine based on Google Vertex AI Search and a two-stage Large Language Model (LLM) pipeline. Discover the architecture that transforms unstructured data into strategic insights, and a detailed roadmap for its evolution towards a fully autonomous analytical engine.

What business problem does this system solve?

Technology companies, especially in the SaaS sector, collect vast amounts of customer feedback—in support tickets, surveys, or reviews. Despite having this data, product teams are often unable to process it effectively. Manual analysis is time-consuming and inefficient, and traditional keyword search engines struggle with the nuances of natural language. This leads to a situation where the company is "data-rich, but insight-poor."

The main goal of the implemented system was to solve this problem by creating an automated tool that synthesizes thousands of opinions into a single, coherent, and searchable knowledge base. This allows the product team to verify hypotheses in real-time, prioritize feature development, and make decisions based on the authentic voice of the customer.

Figure: Workflow diagram of the feedback synthesis system, showing the steps from a query in Slack, through searching in Vertex AI, to the two-stage processing by LLMs and the final response.

What does the RAG-based system architecture look like?

The system's architecture is a classic example of the Retrieval-Augmented Generation (RAG) approach, which ensures that every generated response is grounded in actual customer data, greatly reducing the risk of AI hallucinations. The process is fully integrated with the company's existing communication tools to make it as easy to use as possible.

The workflow is as follows:

  1. User Interface (Slack): An employee (e.g., a Product Manager) asks a question in natural language on a dedicated Slack channel.
  2. Search Core (Vertex AI Search): The question is transformed into a query for the Google Vertex AI-based search engine. The search engine scans the entire feedback repository and returns a list of the most relevant text fragments (snippets) based on semantic analysis.
  3. Synthesis Engine (Two-Stage LLM Pipeline): The returned fragments are sent to a two-stage processing pipeline where two different LLMs collaborate.
  4. Answer Delivery: The final, concise, and precise answer is published on the Slack channel, closing the loop and delivering a ready-made insight in moments.
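The four steps above can be sketched as a single orchestration function. The names below are illustrative, not the production code: the Slack, Vertex AI Search, and LLM calls are injected as plain callables so the loop itself stays testable.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Snippet:
    """One retrieved feedback fragment with its source review ID."""
    review_id: str
    text: str

def answer_question(
    question: str,
    search: Callable[[str], List[Snippet]],   # e.g. wraps Vertex AI Search
    generate: Callable[[str], str],           # stage 1: draft synthesis
    refine: Callable[[str], str],             # stage 2: polish and verify
) -> str:
    """Run the RAG loop: retrieve snippets, synthesize a draft, refine it."""
    snippets = search(question)
    if not snippets:
        return "No relevant feedback found."
    # Ground the prompt in the retrieved fragments only.
    context = "\n".join(f"[{s.review_id}] {s.text}" for s in snippets)
    draft = generate(
        f"Answer using ONLY these customer quotes:\n{context}\n\nQuestion: {question}"
    )
    return refine(draft)

# Toy run with stand-in callables in place of the real services:
fake_search = lambda q: [Snippet("r1", "The subscription is too expensive.")]
fake_generate = lambda p: "Draft: customers find pricing high."
fake_refine = lambda d: d.replace("Draft: ", "")
print(answer_question("Any pricing issues?", fake_search, fake_generate, fake_refine))
# prints: customers find pricing high.
```

In production, `search` would call the Vertex AI Search API and the final string would be posted back to the Slack channel; injecting these dependencies keeps the orchestration logic independent of any one vendor SDK.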

What is the two-stage LLM pipeline (Generator-Refiner)?

The use of two LLMs in sequence is a deliberate architectural decision that represents an early form of a modular multi-agent system. This pattern, known as "Generator-Refiner," allows for a balance of computational power, cost, and precision.

The first model (the Generator, a powerful model such as GPT-4o) is tasked with the initial, creative synthesis. It reads the often-inconsistent fragments and produces a coherent working draft of the answer. This draft is then passed to the second model (the Refiner, a fast and efficient model such as Claude 3 Haiku). Its role is different: it does not generate new content but polishes the draft and verifies it for factual consistency, conciseness, and compliance with guidelines. This two-stage LLM pipeline is the foundation of the quality and reliability of the entire system.
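The division of labor lives mostly in the prompts. The templates below are a hypothetical sketch of how the two roles could be separated, with the actual model calls again passed in as callables:

```python
from typing import Callable, List

# Stage 1: a stronger model synthesizes a draft from raw quotes.
GENERATOR_PROMPT = """You are synthesizing customer feedback.
Using ONLY the quotes below, write a concise draft answer.

Quotes:
{snippets}

Question: {question}"""

# Stage 2: a cheaper model verifies the draft against the same quotes.
REFINER_PROMPT = """You are a fact-checking editor.
Rewrite the draft so that every claim is supported by the quotes.
Remove anything unsupported; keep it under 120 words.

Quotes:
{snippets}

Draft:
{draft}"""

def run_pipeline(
    question: str,
    snippets: List[str],
    call_generator: Callable[[str], str],
    call_refiner: Callable[[str], str],
) -> str:
    """Two-stage pass: the Generator drafts, the Refiner verifies."""
    joined = "\n".join(snippets)
    draft = call_generator(GENERATOR_PROMPT.format(snippets=joined, question=question))
    return call_refiner(REFINER_PROMPT.format(snippets=joined, draft=draft))
```

Note that the Refiner is given both the draft and the original quotes, so it can check every claim against the source data rather than merely rewording the draft.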

System Evolution: From RAG to an Autonomous Analytical Engine

The described system is a solid foundation. Its true potential is unlocked through the systematic evolution of each component, transforming it from a question-answering tool into a proactive Insight Engine. The development roadmap is outlined below.

  • Step 1: Advanced Data Pre-processing. Instead of indexing raw text, "semantic chunking" is implemented—intelligently dividing opinions into logical, complete fragments. Then, for each fragment, an LLM automatically extracts and assigns metadata, such as: sentiment, key topics (e.g., "interface," "price"), and named entities (e.g., names of specific product features). This creates a rich, structured knowledge base.
  • Step 2: Intelligent Search Strategies. Standard semantic search is enhanced with hybrid search (combining meaning and keywords for precision), and the embedding model is fine-tuned on company-specific vocabulary. Additionally, the system implements query transformation—another LLM improves and expands the user's query before it reaches the search engine, ensuring more relevant results.
  • Step 3: Rigorous Post-Retrieval Verification. Instead of immediately generating an answer, the system introduces a "re-ranking" stage, where a more powerful model (a cross-encoder) re-evaluates and re-ranks the search results to place the most critical ones at the very top. Furthermore, the role of the second LLM evolves into "corrective RAG" (Self-RAG), where its task is to actively fact-check the generated draft and compare it with the source data to eliminate any inaccuracies.
  • Step 4: Integration with Business Intelligence (BI). The structured metadata (topics, sentiment, trends) is made available via an API to tools like Tableau or Power BI. This allows for the creation of interactive dashboards for managers, who can independently explore customer data without involving analysts.
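As a concrete illustration of Step 1, metadata extraction can be implemented by asking an LLM for a strict JSON reply and parsing it into a typed record. Everything below is an assumed sketch (prompt wording, field names, and the stand-in LLM are all hypothetical), not the production schema:

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical prompt asking the LLM for structured tags as JSON.
EXTRACTION_PROMPT = """Return JSON with keys "sentiment" (positive|neutral|negative),
"topics" (list of short labels), and "entities" (product feature names) for:

{chunk}"""

@dataclass
class ChunkMetadata:
    sentiment: str
    topics: List[str]
    entities: List[str]

def enrich_chunk(chunk: str, call_llm: Callable[[str], str]) -> ChunkMetadata:
    """Tag one semantic chunk via an LLM and parse its JSON reply."""
    raw = call_llm(EXTRACTION_PROMPT.format(chunk=chunk))
    data = json.loads(raw)
    return ChunkMetadata(
        sentiment=data.get("sentiment", "neutral"),
        topics=data.get("topics", []),
        entities=data.get("entities", []),
    )

# Stand-in LLM returning a canned JSON reply for demonstration:
fake_llm = lambda p: '{"sentiment": "negative", "topics": ["price"], "entities": ["Pro plan"]}'
meta = enrich_chunk("The Pro plan is too expensive.", fake_llm)
print(meta.sentiment, meta.topics)  # negative ['price']
```

Stored alongside each chunk in the index, these fields enable the filtering, trend analysis, and BI dashboards described in Step 4.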

The culmination of this evolution is an AI agent. Equipped with these advanced components, it can autonomously carry out complex tasks. For example, it could be given the goal: "Analyze the potential impact of a 15% price increase." The agent would first analyze historical data on price sensitivity, then generate thousands of simulated customer reactions, and finally present a report with the predicted sentiment distribution and key concerns. This is no longer just an analytical tool—it's an autonomous strategic partner.

Figure: Interactive BI dashboard visualizing trends in customer feedback, such as sentiment analysis over time and the most frequently discussed topics, integrated directly with the RAG system.

Frequently Asked Questions (FAQ)

Why does the system use two different LLMs instead of one?

The "Generator-Refiner" pattern with two models allows for optimization. A more powerful, more expensive model (Generator) is used for the complex task of synthesis, while a faster, cheaper model (Refiner) handles validation and formatting. This architecture ensures high-quality responses while controlling costs and speed.

How does this differ from a traditional keyword search engine?

The key difference lies in understanding meaning (semantic analysis), not just matching words (lexical analysis). The RAG system understands the context of the query. For example, for a question about "pricing issues," it will find opinions containing phrases like "the subscription is too expensive" or "unfavorable value for money," even if they don't contain the word "price."

How does the system avoid hallucinating answers?

Thanks to the Retrieval-Augmented Generation (RAG) architecture. The LLM does not answer from memory but is required by the prompt to create a response based solely on the data fragments (snippets) provided by the search engine. In the more advanced evolutionary version, the second LLM acts as a fact-checker, further minimizing the risk.

What distinguishes an "Insight Engine" from a basic RAG system?

A basic RAG reacts to questions by retrieving and summarizing existing information. The "Insight Engine" is its evolution: it proactively processes data, enriches it with metadata (sentiment, topics), enables trend analysis, and integrates with BI systems. It can answer "why is something happening?" and not just "what was said?".

Why was Vertex AI Search chosen as the search core?

Vertex AI Search offers advanced hybrid search capabilities (combining semantic analysis and keywords), which ensures high relevance of results. Additionally, as a managed service, it provides scalability and easy integration with other tools in the Google Cloud ecosystem, which is crucial for the system's further evolution.

Can the system realistically evolve into a fully autonomous AI agent?

Yes, it is a realistic and logical next step. Modern frameworks for AI agents allow "equipping" LLMs with tools (such as APIs for an analytical system, Jira, or simulation tools). By giving the agent a goal and access to these tools, it can independently plan and execute complex tasks that today require the work of many analysts.

Łukasz Kidoń - AI Specialist

Contact the author

If you want to automate processes in your company or have any questions, I will gladly analyze your needs and propose a dedicated solution.

Or write directly to: lukasz@kidon.pro