RAG System

Step-by-step tutorial for building a Retrieval-Augmented Generation (RAG) system with vector embeddings, NeurosLink AI, and the Model Context Protocol (MCP)


What You'll Build

A production-ready RAG (Retrieval-Augmented Generation) system featuring:

  • 📚 Document ingestion from multiple formats (PDF, MD, TXT)

  • 🔍 Semantic search with vector embeddings

  • 🤖 AI-powered Q&A with source citations

  • 🔧 MCP integration for file system access

  • 💾 Vector storage with Pinecone/in-memory

  • 🎯 Context-aware responses

  • 📊 Relevance scoring and ranking

Tech Stack:

  • Next.js 14+

  • TypeScript

  • NeurosLink AI with MCP

  • OpenAI Embeddings

  • Pinecone (or in-memory vector store)

  • PDF parsing libraries

Time to Complete: 60-90 minutes


Prerequisites

  • Node.js 18+

  • OpenAI API key (for embeddings)

  • Anthropic API key (for generation)

  • Pinecone account (optional, free tier)

  • Sample documents to index


Understanding RAG

RAG combines retrieval and generation: the system first retrieves the most relevant document chunks from a vector store, then passes them to the language model as context so that answers are grounded in your own data.

Why RAG?

  • ✅ Access to custom/private data

  • ✅ Up-to-date information

  • ✅ Reduced hallucinations

  • ✅ Source attribution

  • ✅ Cost-effective (smaller context windows)


Step 1: Project Setup

Initialize Project
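
The setup command isn't reproduced here; a typical invocation looks like the following (the project name rag-system is just an example). Answer the interactive prompts with the options below.

```bash
npx create-next-app@latest rag-system
cd rag-system
```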

Options:

  • TypeScript: Yes

  • Tailwind CSS: Yes

  • App Router: Yes

Install Dependencies
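
The exact dependency list isn't shown on this page. The sketch below matches the code sketches used later in this tutorial; the NeurosLink AI package name is a placeholder, so install it per its own documentation.

```bash
npm install openai @anthropic-ai/sdk @pinecone-database/pinecone pdf-parse
npm install -D @types/pdf-parse

# NeurosLink AI SDK — package name depends on its install docs:
# npm install <neuroslink-package>
```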

Environment Setup

Create .env.local:
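
A sketch of .env.local. The variable names are the conventions assumed by the later sketches, not fixed requirements; the Pinecone values are only needed if you use the Pinecone store in Step 6.

```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Optional: only needed for the Pinecone vector store
PINECONE_API_KEY=...
PINECONE_INDEX=rag-tutorial
```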


Step 2: Document Processing

Create Document Parser

Create src/lib/document-parser.ts:
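
The original parser isn't shown here, so the following is a minimal sketch: it reads .md/.txt files with Node's fs module and uses the pdf-parse package for PDFs. The ParsedDocument shape and directory layout are assumptions.

```typescript
// src/lib/document-parser.ts (sketch)
import { promises as fs } from "fs";
import path from "path";
import pdf from "pdf-parse";

export interface ParsedDocument {
  source: string;   // file path, used later for citations
  content: string;  // raw extracted text
}

export async function parseDocument(filePath: string): Promise<ParsedDocument> {
  const ext = path.extname(filePath).toLowerCase();

  if (ext === ".pdf") {
    // pdf-parse extracts plain text from the PDF buffer
    const buffer = await fs.readFile(filePath);
    const data = await pdf(buffer);
    return { source: filePath, content: data.text };
  }

  if (ext === ".md" || ext === ".txt") {
    const content = await fs.readFile(filePath, "utf-8");
    return { source: filePath, content };
  }

  throw new Error(`Unsupported file type: ${ext}`);
}

// Parse every supported file in a directory (non-recursive for simplicity)
export async function parseDirectory(dir: string): Promise<ParsedDocument[]> {
  const entries = await fs.readdir(dir);
  const supported = entries.filter((f) => /\.(pdf|md|txt)$/i.test(f));
  return Promise.all(supported.map((f) => parseDocument(path.join(dir, f))));
}
```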


Step 3: Text Chunking

Create src/lib/text-chunker.ts:
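
The chunker itself isn't included on this page, so here is a minimal sketch using the 1000-character chunks with 200-character overlap described in Step 7; the Chunk shape is an assumption carried through the later sketches.

```typescript
// src/lib/text-chunker.ts (sketch)
export interface Chunk {
  id: string;
  text: string;
  source: string; // which document this chunk came from
}

export function chunkText(
  text: string,
  source: string,
  chunkSize = 1000,
  overlap = 200
): Chunk[] {
  const chunks: Chunk[] = [];
  let start = 0;
  let index = 0;

  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      id: `${source}#${index}`,
      text: text.slice(start, end),
      source,
    });
    if (end === text.length) break;
    // Step back by `overlap` so context carries across chunk boundaries
    start = end - overlap;
    index++;
  }

  return chunks;
}
```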


Step 4: Embedding Service

Create src/lib/embeddings.ts:
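
A sketch of the embedding service using the official openai npm package and the text-embedding-3-small model mentioned in Step 5. The batch size of 100 inputs per request mirrors the note in Step 5.

```typescript
// src/lib/embeddings.ts (sketch)
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const MODEL = "text-embedding-3-small"; // 1536-dimensional vectors
const BATCH_SIZE = 100;                 // texts per API call

export async function embedTexts(texts: string[]): Promise<number[][]> {
  const embeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);
    const response = await openai.embeddings.create({
      model: MODEL,
      input: batch,
    });
    // response.data preserves input order
    embeddings.push(...response.data.map((d) => d.embedding));
  }

  return embeddings;
}

export async function embedQuery(query: string): Promise<number[]> {
  const [embedding] = await embedTexts([query]);
  return embedding;
}
```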


Step 5: Vector Store (In-Memory)

Create src/lib/vector-store.ts (a sketch follows the notes below):

  1. Vector entry structure: Each entry stores the chunk's embedding vector, metadata, and a reference to the original chunk.

  2. In-memory storage: All vectors are stored in RAM. For production with large datasets (>10K docs), use Pinecone or another vector database.

  3. Batch embedding: Process all chunks together for efficiency. OpenAI allows up to 100 texts per API call.

  4. Convert text to vectors: Each chunk is converted to a 1536-dimensional embedding vector (using OpenAI's text-embedding-3-small model).

  5. Semantic search: Find the most relevant chunks by comparing vector similarity, not keyword matching.

  6. Query embedding: Convert the user's question into the same vector space as the document chunks.

  7. Calculate similarity: Compute cosine similarity between query vector and all document vectors. Score ranges from -1 to 1 (higher = more similar).

  8. Rank by relevance: Sort results by similarity score in descending order (most relevant first).

  9. Return top results: Return only the topK most relevant chunks to use as context for the AI.
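
Putting the notes above together, here is a minimal in-memory store sketch; it assumes the Chunk type and embedding helpers from the earlier sketches.

```typescript
// src/lib/vector-store.ts (sketch)
import { Chunk } from "./text-chunker";
import { embedTexts, embedQuery } from "./embeddings";

interface VectorEntry {
  vector: number[];
  chunk: Chunk;
}

export interface SearchResult {
  chunk: Chunk;
  score: number; // cosine similarity, -1..1
}

// All vectors live in RAM; state is lost when the server process restarts
const entries: VectorEntry[] = [];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed all chunks in batches and store them
export async function indexChunks(chunks: Chunk[]): Promise<void> {
  const vectors = await embedTexts(chunks.map((c) => c.text));
  chunks.forEach((chunk, i) => entries.push({ vector: vectors[i], chunk }));
}

// Semantic search: embed the query, rank all entries by similarity
export async function search(query: string, topK = 5): Promise<SearchResult[]> {
  const queryVector = await embedQuery(query);
  return entries
    .map((e) => ({ chunk: e.chunk, score: cosineSimilarity(queryVector, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

export function clear(): void {
  entries.length = 0;
}
```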


Step 6: Pinecone Vector Store (Alternative)

Create src/lib/pinecone-store.ts:
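
A sketch using the @pinecone-database/pinecone client; it reads the PINECONE_INDEX variable assumed in the environment setup, and the exact client API may differ slightly between SDK versions. It mirrors the in-memory store's exports so the RAG service can swap between the two.

```typescript
// src/lib/pinecone-store.ts (sketch)
import { Pinecone } from "@pinecone-database/pinecone";
import { Chunk } from "./text-chunker";
import { embedTexts, embedQuery } from "./embeddings";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index(process.env.PINECONE_INDEX!);

export async function indexChunks(chunks: Chunk[]): Promise<void> {
  const vectors = await embedTexts(chunks.map((c) => c.text));
  await index.upsert(
    chunks.map((chunk, i) => ({
      id: chunk.id,
      values: vectors[i],
      metadata: { text: chunk.text, source: chunk.source },
    }))
  );
}

export async function search(query: string, topK = 5) {
  const queryVector = await embedQuery(query);
  const results = await index.query({
    vector: queryVector,
    topK,
    includeMetadata: true,
  });
  return results.matches.map((m) => ({
    score: m.score ?? 0,
    chunk: {
      id: m.id,
      text: String(m.metadata?.text ?? ""),
      source: String(m.metadata?.source ?? ""),
    },
  }));
}
```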


Step 7: RAG Service

Create src/lib/rag-service.ts (a sketch follows the notes below):

  1. Use Claude for generation: Claude 3.5 Sonnet excels at following instructions and citing sources accurately in RAG applications.

  2. Chunk configuration: 1000 characters per chunk with 200 character overlap to maintain context across chunk boundaries.

  3. Indexing pipeline: Parse documents → chunk text → create embeddings → store in vector database. Run this once when documents change.

  4. Text chunking: Split documents into smaller chunks. Large documents can't fit in context windows, and smaller chunks improve retrieval precision.

  5. Create embeddings: Convert each chunk to a vector representation. This is the most expensive operation (OpenAI API costs ~$0.02/1M tokens).

  6. RAG query flow: Retrieve relevant chunks → build context → generate answer with citations.

  7. Semantic search: Find the 5 most relevant chunks using vector similarity (not keyword matching).

  8. Build augmented context: Format retrieved chunks with source labels to enable the AI to cite sources in its answer.

  9. Structured prompt: Clear instructions help the AI stay grounded in the provided context and cite sources properly.

  10. Generate final answer: NeurosLink AI sends the question + context to Claude, which generates an answer based on the retrieved information.
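
NeurosLink AI's generation API isn't shown on this page, so the sketch below calls the Anthropic SDK directly for the generation step; swap in NeurosLink's equivalent call if you are using it. The prompt wording, model alias, and helper names are assumptions building on the earlier sketches.

```typescript
// src/lib/rag-service.ts (sketch)
import Anthropic from "@anthropic-ai/sdk";
import { parseDirectory } from "./document-parser";
import { chunkText } from "./text-chunker";
import { indexChunks, search, SearchResult } from "./vector-store";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Indexing pipeline: parse -> chunk -> embed -> store (run when documents change)
export async function indexDocuments(dir: string): Promise<number> {
  const docs = await parseDirectory(dir);
  const chunks = docs.flatMap((doc) => chunkText(doc.content, doc.source));
  await indexChunks(chunks);
  return chunks.length;
}

// Query flow: retrieve relevant chunks -> build context -> generate cited answer
export async function answerQuestion(question: string) {
  const results: SearchResult[] = await search(question, 5);

  // Label each chunk with its source so the model can cite it
  const context = results
    .map((r, i) => `[Source ${i + 1}: ${r.chunk.source}]\n${r.chunk.text}`)
    .join("\n\n");

  const prompt =
    `Answer the question using ONLY the context below. ` +
    `Cite sources as [Source N]. If the context is insufficient, say so.\n\n` +
    `Context:\n${context}\n\nQuestion: ${question}`;

  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });

  const answer = response.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("");

  return { answer, sources: results };
}
```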


Step 8: API Routes

Index Documents API

Create src/app/api/index/route.ts:
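
A sketch of the indexing route, assuming the indexDocuments helper from the rag-service sketch and a docs/ folder at the project root.

```typescript
// src/app/api/index/route.ts (sketch)
import { NextResponse } from "next/server";
import path from "path";
import { indexDocuments } from "@/lib/rag-service";

export async function POST() {
  try {
    const chunkCount = await indexDocuments(path.join(process.cwd(), "docs"));
    return NextResponse.json({ success: true, chunkCount });
  } catch (error) {
    return NextResponse.json(
      { success: false, error: (error as Error).message },
      { status: 500 }
    );
  }
}
```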

Query API

Create src/app/api/query/route.ts:
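
A sketch of the query route, again assuming the rag-service helpers above.

```typescript
// src/app/api/query/route.ts (sketch)
import { NextResponse } from "next/server";
import { answerQuestion } from "@/lib/rag-service";

export async function POST(request: Request) {
  const { question } = await request.json();

  if (!question || typeof question !== "string") {
    return NextResponse.json({ error: "Missing question" }, { status: 400 });
  }

  const { answer, sources } = await answerQuestion(question);
  return NextResponse.json({
    answer,
    sources: sources.map((s) => ({ source: s.chunk.source, score: s.score })),
  });
}
```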


Step 9: Frontend Interface

Create src/app/page.tsx:
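
The page component isn't reproduced here; below is a minimal client-side sketch that exercises both API routes. Layout and styling are assumptions.

```tsx
// src/app/page.tsx (sketch)
"use client";

import { useState } from "react";

interface Source { source: string; score: number; }

export default function Home() {
  const [question, setQuestion] = useState("");
  const [answer, setAnswer] = useState("");
  const [sources, setSources] = useState<Source[]>([]);
  const [status, setStatus] = useState("");

  async function indexDocuments() {
    setStatus("Indexing...");
    const res = await fetch("/api/index", { method: "POST" });
    const data = await res.json();
    setStatus(data.success ? `Indexed ${data.chunkCount} chunks` : data.error);
  }

  async function ask() {
    setStatus("Thinking...");
    const res = await fetch("/api/query", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    });
    const data = await res.json();
    setAnswer(data.answer);
    setSources(data.sources ?? []);
    setStatus("");
  }

  return (
    <main className="mx-auto max-w-2xl space-y-4 p-8">
      <button onClick={indexDocuments} className="rounded bg-blue-600 px-4 py-2 text-white">
        Index Documents
      </button>
      <div className="flex gap-2">
        <input
          className="flex-1 rounded border p-2"
          value={question}
          onChange={(e) => setQuestion(e.target.value)}
          placeholder="Ask a question about your documents"
        />
        <button onClick={ask} className="rounded bg-green-600 px-4 py-2 text-white">Ask</button>
      </div>
      {status && <p>{status}</p>}
      {answer && <p className="whitespace-pre-wrap">{answer}</p>}
      <ul>
        {sources.map((s, i) => (
          <li key={i}>[Source {i + 1}] {s.source} (score: {s.score.toFixed(3)})</li>
        ))}
      </ul>
    </main>
  );
}
```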


Step 10: Testing

Prepare Test Documents

Create a docs/ folder with sample files:

docs/introduction.md:

docs/architecture.md:

Index Documents

  1. Start dev server: npm run dev

  2. Click "Index Documents"

  3. Wait for completion

Test Queries

Ask a few questions that target the content of your sample documents.

Verify:

  • Relevant sources retrieved

  • Answer cites sources

  • Relevance scores make sense


Step 11: Production Enhancements

Add Streaming Responses

Add Document Upload

Add Metadata Filtering


Step 12: MCP Integration (Advanced)

Using Model Context Protocol for file access:
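
NeurosLink's own MCP integration isn't shown on this page, so the sketch below uses the official @modelcontextprotocol/sdk client together with the reference filesystem server to read documents. Class names, call signatures, and tool names can vary across SDK and server versions, so treat this as a starting point.

```typescript
// Sketch: reading documents through an MCP filesystem server
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

export async function readDocsViaMcp(docsDir: string) {
  // Launch the reference filesystem server scoped to the docs directory
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-filesystem", docsDir],
  });

  const client = new Client(
    { name: "rag-tutorial", version: "1.0.0" },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Inspect the tools the server exposes (names vary by server version)
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Call a file-reading tool to fetch one document's contents
  const result = await client.callTool({
    name: "read_file",
    arguments: { path: `${docsDir}/introduction.md` },
  });

  await client.close();
  return result;
}
```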


Troubleshooting

Embeddings API Errors

Memory Issues with Large Documents

Poor Retrieval Quality




Summary

You've built a production-ready RAG system with:

✅ Multi-format document ingestion (PDF, MD, TXT)

✅ Text chunking with overlap

✅ Vector embeddings (OpenAI)

✅ Semantic search

✅ AI-powered Q&A with source citations

✅ Relevance scoring

✅ Modern web interface

Cost Analysis (a rough check of these figures follows the list):

  • Embedding: ~$0.02 per 1M tokens

  • Generation: ~$3 per 1M input tokens (Claude 3.5 Sonnet)

  • 1000 documents → ~$0.50 to index

  • 1000 queries → ~$2
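
As a sanity check, here are the averages implied by those estimates; the token counts are assumptions inferred from the quoted figures, not measurements.

```typescript
// Rough cost estimate (assumed averages, not measured values)
const EMBED_PRICE = 0.02 / 1_000_000; // $ per token, text-embedding-3-small
const GEN_PRICE = 3 / 1_000_000;      // $ per input token, Claude 3.5 Sonnet

const docs = 1000;
const tokensPerDoc = 25_000;          // assumption implied by the ~$0.50 figure
const indexCost = docs * tokensPerDoc * EMBED_PRICE;    // ≈ $0.50

const queries = 1000;
const tokensPerQuery = 650;           // assumption implied by the ~$2 figure
const queryCost = queries * tokensPerQuery * GEN_PRICE; // ≈ $1.95 (input tokens only)

console.log({ indexCost, queryCost });
```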

Next Steps:

  1. Add authentication

  2. Implement caching

  3. Add document versioning

  4. Deploy to production
