LLM + RAG for public Brazilian companies
This project combines a Large Language Model (LLM) with Retrieval-Augmented Generation (RAG) to answer queries about public Brazilian companies. I used PostgreSQL to store structured company data, with the pgvector extension providing vector similarity search for document retrieval.
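As a rough illustration of the pgvector retrieval step, the sketch below runs a cosine-distance search over stored embeddings. The table and column names (`document_chunks`, `content`, `embedding`) and the helper functions are hypothetical, since the project's schema is not shown here. It assumes a psycopg-style connection whose cursors accept `%s` placeholders.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical table and column names; adapt them to the real schema.
# `<=>` is pgvector's cosine-distance operator (smaller means more similar).
SEARCH_SQL = """
SELECT id, content, embedding <=> %s::vector AS cosine_distance
FROM document_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_chunks(conn: Any, query_embedding: Sequence[float], k: int = 5) -> List[Tuple]:
    """Return the k stored chunks closest to the query embedding.

    `conn` is assumed to be an open psycopg-style connection to a database
    where the pgvector extension is enabled.
    """
    literal = to_vector_literal(query_embedding)
    with conn.cursor() as cur:
        # The literal is passed twice: once for the selected distance column
        # and once for the ORDER BY expression.
        cur.execute(SEARCH_SQL, (literal, literal, k))
        return cur.fetchall()
```

Ordering directly by the distance expression, rather than by a derived value, is the form pgvector's approximate indexes are designed to accelerate, which is why the expression is repeated in `ORDER BY`.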
For natural language processing, I used the Google Gemini Flash model and Hugging Face models to generate embeddings. User queries are handled through an interactive Gradio interface, and Polars handles high-performance data manipulation and analysis.
By integrating these technologies, the system retrieves relevant documents published by the Securities and Exchange Commission of Brazil (CVM, Comissão de Valores Mobiliários) and generates responses grounded in them, making corporate data in Brazil easier to access.
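The retrieve-then-generate flow described here can be sketched as a small orchestration function. This is only an illustration under assumptions: `Chunk`, `build_prompt`, and `answer` are hypothetical names, and the `embed`, `retrieve`, and `generate` callables stand in for the embedding model, the vector search, and the LLM call, none of which are shown in this description.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Chunk:
    source: str  # hypothetical field, e.g. an identifier of the filing
    text: str


def build_prompt(question: str, chunks: Sequence[Chunk]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved text."""
    context = "\n\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(
    question: str,
    embed: Callable[[str], List[float]],
    retrieve: Callable[[List[float], int], List[Chunk]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Embed the question, fetch the k nearest chunks, then call the LLM."""
    query_vector = embed(question)
    chunks = retrieve(query_vector, k)
    return generate(build_prompt(question, chunks))
```

Passing the three stages in as callables keeps the sketch independent of any particular model or database client, so each stage can be swapped or stubbed out when testing.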