Gustavo De Mari Pereira

MSc. Computer Science | University of Sao Paulo

LLM + RAG for public Brazilian companies

October 24, 2024

This project combines a Large Language Model (LLM) with Retrieval-Augmented Generation (RAG) to answer queries about public Brazilian companies. I used PostgreSQL to store structured company data, with the pgvector extension providing vector similarity search for document retrieval.
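As a rough illustration of how pgvector-based retrieval can look, here is a minimal sketch. The table name `document_chunks`, its columns, and the embedding dimension of 384 are assumptions made for the example, not details from the project. The SQL uses pgvector's `<=>` cosine-distance operator, and the helper functions are pure Python so they run without a database.

```python
import math

# Hypothetical schema: table and column names are illustrative, and the
# dimension 384 is an assumed embedding size, not taken from the project.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS document_chunks (
    id        bigserial PRIMARY KEY,
    company   text NOT NULL,
    content   text NOT NULL,
    embedding vector(384)
);
"""

# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT company, content
FROM document_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def cosine_distance(a, b):
    """Pure-Python version of what `<=>` computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_params(query_embedding, k=5):
    """Parameters to pass alongside SEARCH_SQL to a database driver's execute()."""
    return (to_pgvector_literal(query_embedding), k)
```

With a DB-API style driver such as psycopg (an assumption; the post does not name the driver), the query would be run as `cursor.execute(SEARCH_SQL, search_params(query_embedding, k=5))`.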

For natural language processing, I used the Google Gemini Flash model and Hugging Face models to generate embeddings. User queries are handled through an interactive Gradio interface, and Polars handles data manipulation and analysis.
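To show how a Gradio interface can wrap a question-answering function, here is a small sketch. The function `answer_query` is a hypothetical placeholder that only echoes the question; in the real system it would call the retrieval and generation steps. The labels and title are also illustrative.

```python
def answer_query(question: str) -> str:
    """Hypothetical placeholder for the retrieval + generation step."""
    question = question.strip()
    if not question:
        return "Please enter a question about a company."
    # A real implementation would embed the question, retrieve documents,
    # and call the LLM; this stub only echoes the input.
    return f"(stub) You asked: {question}"


def build_ui():
    """Wrap answer_query in a simple text-in, text-out Gradio interface."""
    # Imported lazily so the module can be loaded without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=answer_query,
        inputs=gr.Textbox(label="Question"),
        outputs=gr.Textbox(label="Answer"),
        title="LLM + RAG for public Brazilian companies",
    )
```

Calling `build_ui().launch()` starts a local web server with the form.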

By integrating these technologies, the system retrieves relevant documents published by the Securities and Exchange Commission of Brazil (CVM) and generates responses grounded in them, making corporate data in Brazil easier to access.
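The retrieve-then-generate flow described above can be sketched as follows. This is an assumption about the general shape of a RAG pipeline, not the project's actual code: `retrieve` and `generate` are hypothetical callables standing in for the pgvector search and the Gemini call, and the prompt wording is illustrative.

```python
from typing import Callable, Sequence


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Assemble a grounded prompt: numbered context passages, then the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve the top-k passages, then ask the model to answer from them."""
    passages = retrieve(question, k)
    return generate(build_prompt(question, passages))
```

Passing the retriever and the generator as arguments keeps the flow testable with simple stand-ins before a database or an LLM API is connected.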