Sameer Singh

If you have been following along with this series on Retrieval-Augmented Generation (RAG), you already know that building a robust AI system is not just about hooking a language model to some text and hoping for the best. There is a carefully orchestrated pipeline behind every intelligent search or question-answering system.
In the previous parts of this series, we explored two foundational components: Document Loaders and Text Splitters.
Now we arrive at arguably the most technically interesting component of the entire RAG pipeline:
Vector Stores
In this article, we are going to go deep. We will cover not just what vector stores are, but why they exist, what problems they solve, how they work internally, and how to use them in a real LangChain project. By the end, you will have a solid conceptual and practical understanding of this critical AI infrastructure component.
Before diving into definitions and technical jargon, let us build intuition by working through a concrete real-world problem.
Imagine you are building a movie platform similar to IMDb. At its core, your system needs to:
Your basic architecture looks like this:
Database --> Backend (Python/Node) --> Frontend Website
This is straightforward. But now your product manager comes to you with a new requirement.
You want to show users movies similar to what they are currently viewing. For example:
This feature can dramatically increase:
Now, how do you actually build this?
The first instinct most developers have is to implement keyword or attribute matching. The logic is simple: if two movies share the same director, actor, or genre, they are probably similar.
| Movie A | Movie B | Shared Attributes |
|---|---|---|
| Avengers | Iron Man | Same universe, overlapping cast |
| The Dark Knight | Batman Begins | Same director (Nolan), same character |
In cases like these, attribute matching does a decent job.
Consider this example. My Name is Khan and Kabhi Alvida Naa Kehna share:
A keyword-based system would confidently recommend one when the user is viewing the other. But if you have seen both films, you know they are fundamentally different stories. One is a deeply political and social narrative about prejudice post-9/11. The other is a romantic drama about extramarital relationships and complex family dynamics.
The keyword system fails because it has no understanding of what the movies are actually about.
Now consider Taare Zameen Par (an Indian film about a dyslexic child) and A Beautiful Mind (an American film about a mathematician with schizophrenia). These two films share almost nothing in terms of conventional attributes:
Yet anyone who has watched both will immediately recognize the thematic and emotional similarity: a brilliant, struggling mind navigating a world that does not understand them, eventually finding redemption and recognition.
A keyword-based system would never surface one when viewing the other. But a semantically intelligent system absolutely should.
This gap between surface-level attributes and actual meaning is the core problem that embeddings and vector stores are designed to solve.
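To make the failure mode concrete, here is a minimal sketch of attribute matching using Jaccard overlap (shared attributes divided by total attributes). The attribute sets are hand-written for illustration and are an assumption of this example, not data from a real movie database.

```python
def jaccard(a: set, b: set) -> float:
    """Fraction of attributes two movies share (0.0 = nothing, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Illustrative attribute sets only; a real catalogue would have many more fields.
movies = {
    "My Name is Khan": {
        "director:Karan Johar", "actor:Shah Rukh Khan", "actor:Kajol", "language:Hindi",
    },
    "Kabhi Alvida Naa Kehna": {
        "director:Karan Johar", "actor:Shah Rukh Khan", "actor:Rani Mukerji", "language:Hindi",
    },
    "Taare Zameen Par": {
        "director:Aamir Khan", "actor:Darsheel Safary", "language:Hindi",
    },
    "A Beautiful Mind": {
        "director:Ron Howard", "actor:Russell Crowe", "language:English",
    },
}

# High overlap, even though the stories are very different.
khan_vs_kank = jaccard(movies["My Name is Khan"], movies["Kabhi Alvida Naa Kehna"])

# Zero overlap, even though the themes are closely related.
tzp_vs_abm = jaccard(movies["Taare Zameen Par"], movies["A Beautiful Mind"])

print(f"My Name is Khan vs Kabhi Alvida Naa Kehna: {khan_vs_kank:.2f}")
print(f"Taare Zameen Par vs A Beautiful Mind: {tzp_vs_abm:.2f}")
```

The score only ever reflects which labels match. Nothing in it looks at the plot, which is why it ranks the first pair as similar and the second pair as unrelated.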
The breakthrough idea behind modern semantic search is deceptively simple: represent the meaning of text as a point in mathematical space.
An embedding is a numerical vector (a list of floating-point numbers) that encodes the semantic meaning of a piece of text. This conversion is performed by a neural network that has been trained on massive amounts of text data.
When you pass a movie plot through an embedding model, you get something like this:
"3 Idiots is a story about three engineering students navigating friendship, academic pressure, and the meaning of true success."
--> Embedding Model (e.g., OpenAI text-embedding-ada-002, Google PaLM, BERT)
--> [0.23, -0.81, 0.54, 0.12, 0.77, -0.33, 0.91, 0.05, ...]
This vector might have 768, 1536, or even 3072 dimensions depending on the model used. Each dimension captures some abstract aspect of the text's meaning, though these dimensions are not directly human-interpretable.
Here is the beautiful part. When you convert many movie plots to embeddings and plot them in high-dimensional space:
For example:
This spatial relationship is measured mathematically using cosine similarity: a value between -1 and 1 that tells you how similar two vectors are, regardless of their magnitude.
Cosine Similarity = 1 --> Identical meaning
Cosine Similarity = 0 --> Completely unrelated
Cosine Similarity = -1 --> Opposite meaning
Embedding models are trained to understand context, synonyms, analogies, and semantic relationships. Consider:
| Phrase | Embedding Behavior |
|---|---|
| "car" vs "automobile" | Very similar vectors |
| "king" vs "queen" | Close but different, direction encodes gender |
| "happy" vs "joyful" | Nearly identical |
| "bank (financial)" vs "bank (river)" | Different vectors despite same word |
This contextual understanding is what allows a vector-based recommendation system to surface Taare Zameen Par when someone is watching A Beautiful Mind. Both plots are encoded as vectors that sit close together in semantic space, even though they share no keywords.
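The cosine similarity measure described above is short enough to write out by hand. The sketch below uses plain Python and tiny made-up vectors (an assumption for readability; real embeddings have hundreds or thousands of dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

v = [1.0, 2.0, 3.0]

same_direction = cosine_similarity(v, [2.0, 4.0, 6.0])    # scaled copy of v
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # perpendicular vectors
opposite = cosine_similarity(v, [-1.0, -2.0, -3.0])       # v pointing the other way

print(same_direction, orthogonal, opposite)
```

The first pair differs only in magnitude, so the score comes out at (approximately) 1. That is the "regardless of their magnitude" property in practice.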
Once you decide to use embeddings as the backbone of your search or recommendation system, you face three significant engineering challenges.
If your movie database has one million entries, you need to generate one million embedding vectors. Each embedding generation call involves sending text to a neural network and receiving a vector in return.
This requires:
Traditional relational databases like MySQL, PostgreSQL, and Oracle are designed for structured, tabular data. They are optimized for operations like filtering rows, joining tables, and aggregating counts.
A vector like [0.23, -0.81, 0.54, ...] with 1536 dimensions is not something these systems handle well. Storing it as a serialized blob is technically possible but completely impractical for the next challenge.
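As a rough illustration of the "serialized blob" approach, the sketch below stores a vector in SQLite using only the standard library. The table name, column names, and the three-value vector are assumptions made for this example.

```python
import sqlite3
import struct

def pack_vector(vec):
    """Serialize a list of floats into raw 32-bit float bytes."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per float."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT PRIMARY KEY, embedding BLOB)")
conn.execute(
    "INSERT INTO movies VALUES (?, ?)",
    ("3 Idiots", pack_vector([0.25, -0.5, 0.75])),
)
conn.commit()

# Looking a vector up by its key works fine...
blob = conn.execute(
    "SELECT embedding FROM movies WHERE title = ?", ("3 Idiots",)
).fetchone()[0]
restored = unpack_vector(blob)
print(restored)

# ...but the database sees the blob as opaque bytes. Ranking rows by similarity
# means reading and decoding every row in application code.
```

Storage itself is not the obstacle. The obstacle is that the database has no way to index or compare the contents of the blob.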
This is where things get computationally expensive. Suppose a user is viewing Movie A, and you want to find the top 10 most similar movies from a database of 1 million entries. A naive approach would be to compute the similarity between Movie A's vector and every other vector in the database, sort the results, and return the top 10.
For 1 million vectors with 1536 dimensions each, this requires approximately 1.5 billion multiply-add operations per query. Without careful optimization this can take anywhere from hundreds of milliseconds to several seconds per search, and the cost grows linearly with the size of the database, which is unacceptable for a real-time user experience.
You need a smarter approach.
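For reference, this is roughly what the naive approach looks like: score every stored vector against the query and keep the best few. The sketch uses random 8-dimensional vectors as stand-ins for real embeddings (an assumption to keep it fast and self-contained).

```python
import heapq
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Compare the query against every vector: O(number of vectors x dimensions)."""
    scored = ((cosine_similarity(query, vec), idx) for idx, vec in enumerate(vectors))
    return heapq.nlargest(k, scored)  # list of (score, index), best first

random.seed(0)
dims = 8
database = [[random.gauss(0, 1) for _ in range(dims)] for _ in range(1000)]

query = database[42]  # pretend the user is viewing item 42
top = brute_force_top_k(query, database, k=3)

for score, idx in top:
    print(f"item {idx}: similarity {score:.3f}")
```

Every query touches every vector, so doubling the catalogue doubles the work. The indexing strategies discussed later exist to avoid exactly this full scan.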
A vector store is a specialized data management system designed from the ground up to store, index, and retrieve high-dimensional numerical vectors efficiently.
In plain terms: it is the database purpose-built for the world of embeddings.
Vector stores can store:
Storage can happen in two ways:
The core query operation in a vector store is "given this query vector, find me the N most similar vectors in the database."
This is called a k-nearest neighbor (kNN) search when the result is exact, or an approximate nearest neighbor (ANN) search when a small amount of accuracy is traded for speed, and it is what enables:
Naive kNN search is too slow at scale. Vector stores solve this with indexing strategies that dramatically reduce search time by avoiding comparisons with irrelevant vectors.
Here is how a common indexing approach works conceptually:
Step 1: Cluster all vectors into N groups during index build time.
Step 2: Calculate the centroid (geometric center) of each cluster.
Step 3: When a query arrives, first compare the query vector against all cluster centroids. This is cheap (only N comparisons).
Step 4: Identify the closest cluster(s) and search only within them.
Result: Instead of comparing against 1,000,000 vectors, you might only compare against 50,000, achieving a 20x speedup with minimal accuracy loss.
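The four steps above can be sketched in plain Python. This is a toy illustration, not how any particular library implements its index: it assumes Euclidean distance, a few rounds of k-means for the clustering, and random 4-dimensional vectors in place of real embeddings. The `n_probe` parameter (how many clusters to search) is a name chosen for this example.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def assign(vectors, centroids):
    """Put each vector index into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_distance(vec, centroids[c]))
        clusters[nearest].append(idx)
    return clusters

def build_index(vectors, n_clusters, iterations=5, seed=0):
    """Steps 1 and 2: cluster the vectors and compute each cluster's centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iterations):
        clusters = assign(vectors, centroids)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean([vectors[i] for i in members]) if members else centroids[c]
                     for c, members in enumerate(clusters)]
    return centroids, assign(vectors, centroids)

def search(query, vectors, centroids, clusters, k=3, n_probe=1):
    # Step 3: compare the query against the centroids only.
    ranked = sorted(range(len(centroids)),
                    key=lambda c: squared_distance(query, centroids[c]))
    # Step 4: search only inside the closest n_probe clusters.
    candidates = [i for c in ranked[:n_probe] for i in clusters[c]]
    candidates.sort(key=lambda i: squared_distance(query, vectors[i]))
    return candidates[:k], len(candidates)

random.seed(1)
vectors = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2000)]
centroids, clusters = build_index(vectors, n_clusters=20)

hits, scanned = search(vectors[7], vectors, centroids, clusters, k=3, n_probe=1)
print(f"nearest ids: {hits}, vectors compared: {scanned} of {len(vectors)}")
```

Raising `n_probe` searches more clusters, which recovers neighbors that fall just across a cluster boundary at the cost of more comparisons. That is the accuracy versus speed dial behind approximate search.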
Popular indexing algorithms include:
| Algorithm | Full Name | Best For |
|---|---|---|
| HNSW | Hierarchical Navigable Small World | High accuracy, moderate memory |
| IVF | Inverted File Index | Large-scale datasets |
| PQ | Product Quantization | Memory-constrained environments |
| Flat | Brute-force (no approximation) | Small datasets, maximum accuracy |
Beyond search, vector stores also support standard database operations:
This is essential for keeping your knowledge base current as your content evolves.
Vector stores are not a niche technology. They power some of the most widely used AI features in production today.
Traditional search engines match keywords. Semantic search engines match meaning. When you type "how to cook chicken in a healthy way" into a semantic search engine, it can surface a recipe titled "Grilled Poultry with Mediterranean Vegetables" even though none of your query words appear in the title.
Companies like Notion, Confluence, and Linear use semantic search to help users find documents and issues using natural language.
Every major streaming and e-commerce platform uses some form of vector similarity to power recommendations:
In a Retrieval-Augmented Generation pipeline, the vector store plays a central role:
Without a vector store, the LLM would be generating answers purely from its training data, with no access to your private or up-to-date information.
Vector stores are not limited to text. Modern embedding models can encode:
This enables cross-modal search, where you can search for images using a text description, or find similar audio clips by providing a sample.
These two terms are frequently used interchangeably, but they represent meaningfully different things.
A vector store is a lightweight system focused exclusively on vector storage and similarity search. It provides the core search functionality without enterprise database features.
Examples:
Best for: Development, prototyping, small-to-medium scale applications, embedded use cases.
A vector database is a full-featured database system built around vector operations. It includes everything a vector store has, plus enterprise-grade capabilities:
Examples:
| Database | Key Strength |
|---|---|
| Pinecone | Fully managed, zero ops overhead |
| Weaviate | Multi-modal support, GraphQL interface |
| Milvus | Highly scalable, open-source |
| Qdrant | Rust-based, payload filtering, self-hostable |
| pgvector | PostgreSQL extension (familiar for SQL users) |
Best for: Production applications, enterprise deployments, systems requiring reliability guarantees.
Vector Database = Vector Store + Full Database Feature Set
Every vector database IS a vector store.
Not every vector store IS a vector database.
LangChain is a popular Python framework for building LLM-powered applications. It provides a unified, consistent interface for working with many different vector stores, so you can switch backends without rewriting your application logic.
LangChain natively supports dozens of vector stores, including:
Regardless of which vector store you choose, LangChain exposes the same core methods:
# Create a vector store from documents
vectorstore = VectorStore.from_documents(documents, embedding_model)
# Add more documents later
vectorstore.add_documents(new_documents)
# Search by semantic similarity
results = vectorstore.similarity_search("your query", k=5)
# Search with similarity scores returned
results_with_scores = vectorstore.similarity_search_with_score("your query", k=5)
This abstraction is powerful. You can prototype with FAISS locally, then switch to Pinecone for production by changing a single import line and configuration block.
Let us walk through a complete practical example. We will build a simple RAG system that can answer questions about a cricket dataset using Chroma as our vector store.
Chroma is an excellent choice for development and small production workloads:
Before writing code, it helps to understand how Chroma organizes data:
Tenant
  |
  +--> Database
         |
         +--> Collection (like a table in SQL)
                |
                +--> Document
                       |
                       +--> Text Content
                       +--> Metadata (key-value pairs)
                       +--> Embedding Vector
Each Collection groups related documents together. You might have separate collections for different topics, departments, or data sources.
pip install langchain langchain-community langchain-openai chromadb openai
from langchain_core.documents import Document
# Sample cricket player data
players = [
    Document(
        page_content="Rohit Sharma is a right-handed opening batsman known for his elegant stroke play and record-breaking double centuries in ODI cricket.",
        metadata={"name": "Rohit Sharma", "team": "Mumbai Indians", "role": "Batsman"}
    ),
    Document(
        page_content="Jasprit Bumrah is an Indian fast bowler renowned for his unique bowling action, exceptional yorkers, and ability to bowl in death overs.",
        metadata={"name": "Jasprit Bumrah", "team": "Mumbai Indians", "role": "Bowler"}
    ),
    Document(
        page_content="Ravindra Jadeja is a prolific all-rounder for Chennai Super Kings who contributes significantly with both left-arm spin bowling and aggressive lower-order batting.",
        metadata={"name": "Ravindra Jadeja", "team": "Chennai Super Kings", "role": "All-rounder"}
    ),
    Document(
        page_content="MS Dhoni is a legendary wicketkeeper-batsman and captain for Chennai Super Kings, famous for his finishing abilities and calm decision-making under pressure.",
        metadata={"name": "MS Dhoni", "team": "Chennai Super Kings", "role": "Wicketkeeper-Batsman"}
    ),
]
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dimensions
    openai_api_key="your-api-key"
)
Note: You can substitute OpenAI embeddings with free alternatives like HuggingFaceEmbeddings using models such as sentence-transformers/all-MiniLM-L6-v2 for local development without API costs.
from langchain_community.vectorstores import Chroma
# Create vector store from documents
# This automatically generates embeddings and stores them
vectorstore = Chroma.from_documents(
    documents=players,
    embedding=embedding_model,
    collection_name="cricket_players",
    persist_directory="./chroma_db"  # Omit this for in-memory only
)
print(f"Vector store created with {vectorstore._collection.count()} documents")
# Basic similarity search
query = "Which player is known for bowling?"
results = vectorstore.similarity_search(query, k=2)
for doc in results:
    print(f"Player: {doc.metadata['name']}")
    print(f"Role: {doc.metadata['role']}")
    print(f"Description: {doc.page_content}")
    print("---")
Output:
Player: Jasprit Bumrah
Role: Bowler
Description: Jasprit Bumrah is an Indian fast bowler renowned for his unique bowling action...
---
Player: Ravindra Jadeja
Role: All-rounder
Description: Ravindra Jadeja is a prolific all-rounder...
---
The system correctly identified the bowler even though the query phrase "known for bowling" does not appear verbatim in any document.
# Returns results alongside their distance scores
# (Chroma uses L2 distance by default unless the collection is configured for cosine)
results_with_scores = vectorstore.similarity_search_with_score(
    "Who is a great finisher under pressure?",
    k=2
)
for doc, score in results_with_scores:
    print(f"Player: {doc.metadata['name']}")
    print(f"Distance: {score:.4f}")  # Lower = more similar
    print("---")
One powerful feature of vector stores is combining semantic search with exact metadata filtering. This lets you narrow results before similarity comparison:
# Semantic search restricted to Chennai Super Kings players
results = vectorstore.similarity_search(
    "Who bowls spin?",
    k=3,
    filter={"team": "Chennai Super Kings"}
)
This is extremely useful in real applications where you want semantic search within a specific category, date range, department, or data source.
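Conceptually, a metadata filter is an exact-match step applied before similarity ranking. The sketch below shows that idea in plain Python. The two-dimensional vectors are invented for illustration (they are not real embeddings), and `filtered_search` is a hypothetical helper, not a LangChain or Chroma function.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vec, records, metadata_filter, k=3):
    """Keep records whose metadata matches every filter key, then rank by similarity."""
    matches = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    matches.sort(key=lambda r: cosine_similarity(query_vec, r["vector"]), reverse=True)
    return matches[:k]

# Made-up vectors: the second component loosely stands for "bowling".
records = [
    {"name": "Rohit Sharma", "metadata": {"team": "Mumbai Indians"}, "vector": [0.9, 0.1]},
    {"name": "Jasprit Bumrah", "metadata": {"team": "Mumbai Indians"}, "vector": [0.1, 0.9]},
    {"name": "Ravindra Jadeja", "metadata": {"team": "Chennai Super Kings"}, "vector": [0.5, 0.6]},
    {"name": "MS Dhoni", "metadata": {"team": "Chennai Super Kings"}, "vector": [0.8, 0.3]},
]

query = [0.0, 1.0]
results = filtered_search(query, records, {"team": "Chennai Super Kings"}, k=2)
names = [r["name"] for r in results]
print(names)
```

Jasprit Bumrah has the vector closest to the query, but he never reaches the ranking step because the team filter removes him first.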
from langchain_core.documents import Document
new_players = [
    Document(
        page_content="Virat Kohli is a right-handed middle-order batsman and former Indian captain, widely regarded as one of the greatest batsmen of his generation.",
        metadata={"name": "Virat Kohli", "team": "Royal Challengers Bangalore", "role": "Batsman"}
    )
]
vectorstore.add_documents(new_players)
print("New document added successfully")
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Create retriever from vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)
# Build RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
# Ask a question
response = qa_chain.invoke({"query": "Tell me about players who are known for finishing matches under pressure."})
print(response["result"])
The LLM now grounds its answer in your actual document data rather than relying on potentially outdated training knowledge.
Pure semantic search sometimes misses exact keyword matches that matter. For example, if a user searches for a product by its exact model number (like "RTX 4090"), semantic search might return conceptually similar GPUs when the user wanted that specific model.
Hybrid search combines keyword-based (lexical) search with semantic vector search.
The two result lists are often merged using a technique called Reciprocal Rank Fusion (RRF). This gives you the best of both worlds: conceptual understanding plus exact match capability.
Vector databases like Weaviate and Qdrant have native hybrid search support.
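Reciprocal Rank Fusion is simple enough to sketch directly: each document earns 1 / (k + rank) from every ranked list it appears in, and the totals decide the final order. The constant k = 60 is the value commonly used in practice. The document IDs and both rankings below are invented for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for the query "RTX 4090".
keyword_ranking = ["rtx-4090", "rtx-4080"]                     # exact-match search
semantic_ranking = ["rtx-4080-super", "rtx-4090", "rtx-4080"]  # vector search

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['rtx-4090', 'rtx-4080', 'rtx-4080-super']
```

Because RRF works on ranks rather than raw scores, it can merge a keyword score and a vector distance without first putting them on a common scale.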
Your choice of embedding model significantly impacts search quality. Key considerations:
| Factor | Consideration |
|---|---|
| Dimensions | More dimensions generally means more expressive vectors (and higher storage cost) |
| Domain | General-purpose models vs. domain-specific (e.g., legal, medical, code) |
| Language | Multilingual support if your content is in multiple languages |
| Speed | Self-hosted models vs. API-based models (latency tradeoffs) |
| Cost | API pricing vs. compute cost of self-hosting |
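The storage side of the dimensions tradeoff is easy to estimate. The sketch below assumes each value is stored as a 32-bit float (4 bytes) and counts only the raw vectors, ignoring index structures and metadata.

```python
def embedding_storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone, assuming 32-bit floats by default."""
    return num_vectors * dimensions * bytes_per_value

for dims in (768, 1536, 3072):
    size_gb = embedding_storage_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dimensions: {size_gb:.2f} GB")

# 768 dimensions: 3.07 GB
# 1536 dimensions: 6.14 GB
# 3072 dimensions: 12.29 GB
```

Doubling the dimension count doubles both the storage and the per-comparison cost, so a larger model is only worth it if retrieval quality improves for your data.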
Popular choices:
Vector stores are only as good as the data you put into them. Your chunking strategy (how you split documents before embedding) has a huge impact on retrieval quality:
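As a minimal sketch of what chunking with overlap means, the function below splits text into fixed-size character windows that share some characters with their neighbors. It is a simplification: production text splitters usually break on separators such as paragraphs or sentences rather than raw character counts, and `chunk_text` is a hypothetical helper written for this example.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters, each sharing `overlap`
    characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.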
| Use Case | Recommended Option | Reason |
|---|---|---|
| Local development / learning | Chroma or FAISS | Zero setup, free, works offline |
| Production (fully managed) | Pinecone | No infrastructure to manage |
| Production (self-hosted) | Milvus or Qdrant | Full control, open-source |
| Existing PostgreSQL setup | pgvector | No new infrastructure |
| Multi-modal search | Weaviate | Native image, text, audio support |
| High-performance filtering | Qdrant | Advanced payload filtering with great performance |
Let us bring it all together with a clear picture of the full RAG pipeline and where vector stores fit:
1. Load Documents
(PDFs, web pages, databases via Document Loaders)
|
v
2. Split into Chunks
(via Text Splitters with overlap)
|
v
3. Generate Embeddings
(via OpenAI, Cohere, HuggingFace, etc.)
|
v
4. Store in Vector Store
(Chroma, FAISS, Pinecone, Weaviate, etc.)
|
v
5. User Asks a Question
|
v
6. Embed the Question
(same model as step 3)
|
v
7. Similarity Search in Vector Store
(retrieve top-k relevant chunks)
|
v
8. Inject Context into LLM Prompt
|
v
9. LLM Generates Grounded Answer
Vector stores live at the center of this pipeline. They are the memory of your RAG system.
Vector stores represent a fundamental shift in how we think about data retrieval. Traditional databases answer the question "does this exact record exist?" Vector stores answer the far more powerful question: "what does my data mean, and what is similar to this query?"
This shift from exact matching to semantic understanding is what makes modern AI applications feel genuinely intelligent rather than just fast databases with a chatbot wrapper.
As you build more sophisticated RAG systems, you will find that the quality of your vector store setup, your embedding model choice, your chunking strategy, and your metadata design are often more impactful on final output quality than the LLM you choose.
Get the retrieval right, and your LLM has great material to work with. Get it wrong, and even the most powerful model will produce hallucinated, irrelevant answers.
In the next article of this series, we will bring everything together to build a complete, end-to-end RAG application. Stay tuned.
Part of the Retrieval-Augmented Generation (RAG) series. Previous articles covered Document Loaders and Text Splitters.