Sameer Singh

Introduction
We are living through one of the most transformative technological shifts in human history. Large Language Models (LLMs) like GPT-4, Claude, and Gemini have demonstrated capabilities that seemed like science fiction just a few years ago - writing code, answering complex questions, summarizing thousands of pages of text, and even reasoning through multi-step problems. The potential is staggering.
But here's the thing most tutorials don't tell you: using an LLM API is the easy part. The real challenge is building a production-grade application around it - one that can retrieve the right information, maintain conversation history, orchestrate multiple components, and switch providers without rewriting your entire codebase.
That's exactly the problem LangChain was designed to solve.
In this deep-dive guide, we'll go far beyond the surface-level explanation. We'll cover what LangChain is, why it exists, how its internals work, and how to think about building real AI-powered systems using it - with a detailed walkthrough of a practical, end-to-end use case.
Before diving into LangChain itself, it's essential to understand the landscape it operates in. Why can't you just call an LLM API and be done with it?
Every LLM has a maximum number of tokens (roughly, words) it can process at once. GPT-4 Turbo, for example, supports around 128,000 tokens - which sounds like a lot, until you try to feed it a 500-page technical manual or an entire codebase.
You cannot simply paste in your entire knowledge base and ask questions. You need a smarter way to retrieve only the relevant portions and pass those to the model.
Every time you call an LLM API, it starts fresh. It has no memory of previous conversations. If a user asks:
"What are the assumptions of Linear Regression?"
...and then follows up with:
"Can you generate interview questions on that topic?"
The model has no idea what "that topic" refers to. You need to explicitly manage and inject conversation history.
Out of the box, an LLM can only generate text. It cannot:
- browse the web or fetch live data
- run code or perform precise calculations
- query your database
- call external APIs or take actions on your behalf
If you want your AI application to do things - not just talk - you need to build a layer that gives the LLM the ability to use tools.
A real AI application might involve:
- an LLM provider (possibly more than one)
- a vector database for retrieval
- a memory store for conversation history
- external tools and APIs
- prompt templates and output parsers
- orchestration logic that ties everything together
Getting all of these to work together reliably, with proper error handling and the flexibility to swap out components, is a significant engineering challenge.
LangChain solves all of these problems.
LangChain is an open-source framework for building applications powered by Large Language Models. It provides:
- a standard interface to dozens of LLM, chat model, and embedding providers
- document loaders and text splitters for ingesting your data
- integrations with vector databases for retrieval
- memory abstractions for conversation history
- chains (via the LangChain Expression Language) for composing steps
- agents and tools for taking actions
- output parsers for extracting structured data
Think of LangChain as the plumbing and electrical wiring of an AI application. You bring the ideas; LangChain handles the infrastructure.
Core philosophy: Components should be modular, interchangeable, and composable. You should be able to switch from OpenAI to Anthropic, or from Pinecone to Weaviate, by changing a single configuration line - not rewriting your entire codebase.
Let's build our understanding around a concrete, realistic example: a PDF-based AI knowledge assistant.
A user uploads a PDF - say, a machine learning textbook. They want to:
- ask questions about its content in natural language
- get summaries of specific chapters or topics
- generate interview or quiz questions from the material
- ask follow-up questions that build on earlier answers
This is a Retrieval-Augmented Generation (RAG) system - one of the most common and powerful patterns in LLM application development.
Let's walk through exactly how this system works, layer by layer.
When a user uploads a PDF, the following happens:
Step 1: Document Loading
The PDF is loaded into memory using a document loader. LangChain provides loaders for:
- PDFs, Word documents, and PowerPoint files
- HTML pages, Markdown, CSV, and JSON
- SaaS sources like Notion, Confluence, Google Drive, and Slack
- YouTube transcripts, Arxiv papers, and Wikipedia articles
Each loader produces a standardized Document object containing the text content and metadata (like source file, page number, etc.).
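For instance, loading a PDF takes only a couple of lines. A minimal sketch (the file name ml_textbook.pdf is hypothetical):

```python
from langchain_community.document_loaders import PyPDFLoader

# Load a local PDF into a list of Document objects (one per page).
loader = PyPDFLoader("ml_textbook.pdf")
docs = loader.load()

print(len(docs))                   # number of pages loaded
print(docs[0].metadata)            # e.g. {"source": "ml_textbook.pdf", "page": 0}
print(docs[0].page_content[:200])  # first 200 characters of page 1
```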
Step 2: Text Splitting
Raw documents are rarely the right size to feed into an embedding model or LLM. A 300-page PDF needs to be broken into smaller, semantically meaningful chunks.
LangChain provides several text splitters:
- CharacterTextSplitter - splits on a fixed separator
- RecursiveCharacterTextSplitter - the most common choice; splits on paragraphs first, then lines, then words
- token-based splitters that respect model token limits
- Markdown, HTML, and code-aware splitters that preserve document structure
Key parameters:
- chunk_size: how many tokens/characters per chunk (e.g., 1000)
- chunk_overlap: how many tokens adjacent chunks share (e.g., 200) - this prevents answers from being cut off at chunk boundaries

Step 3: Embedding Generation
Each chunk of text is converted into a vector embedding - a list of floating-point numbers that captures the semantic meaning of the text.
Two pieces of text about similar topics will have embeddings that are close together in this high-dimensional vector space, even if they use different words. This is the foundation of semantic search.
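A minimal sketch of what this looks like in code, assuming an OpenAI API key is configured (the model name and example sentences are illustrative):

```python
import numpy as np
from langchain_openai import OpenAIEmbeddings

# Embed two related sentences and compare them.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

v1 = np.array(embeddings.embed_query("Regularization reduces model variance."))
v2 = np.array(embeddings.embed_query("How do I prevent overfitting?"))

# Cosine similarity: closer to 1.0 means more semantically similar.
cosine = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(float(cosine), 3))
```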
Common embedding models:
- text-embedding-ada-002 (OpenAI) - 1536 dimensions
- text-embedding-3-large (OpenAI) - 3072 dimensions
- all-MiniLM-L6-v2 (HuggingFace, runs locally) - 384 dimensions

Step 4: Vector Database Storage
The embeddings (along with the original text chunks and metadata) are stored in a vector database, which is optimized for similarity search.
Popular vector databases supported by LangChain:
- Chroma and FAISS (lightweight, great for local development)
- Pinecone, Weaviate, Qdrant, and Milvus (managed, scalable options)
- pgvector, MongoDB Atlas, Elasticsearch, and Redis (add vector search to infrastructure you already run)
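A minimal sketch of the storage step, reusing the chunks from the splitting step and a local Chroma instance (the persist directory is a hypothetical path):

```python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed each chunk and store it (with its text and metadata) in a local collection.
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",  # hypothetical local path
)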
When a user asks a question, this pipeline kicks in:
Step 1: Query Embedding
The user's question is converted into an embedding using the same model that was used to embed the document chunks. This ensures an apples-to-apples comparison.
Step 2: Similarity Search
The query embedding is compared against all stored embeddings using a distance metric - typically cosine similarity or dot product. The k most similar chunks are retrieved (e.g., top 3 or top 5).
This is much more powerful than keyword search because it finds semantically relevant content, even when the exact words don't match.
Example:
Query: "What factors affect model overfitting?"
Even if the relevant chapter talks about "regularization techniques to reduce variance" - with no mention of "overfitting" in those exact words - semantic search will still retrieve it because the meaning is similar.
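In code, the retrieval step is a single call against the vector store built during indexing. A minimal sketch reusing the vectorstore from above:

```python
# Embed the query and return the 4 most similar chunks.
query = "What factors affect model overfitting?"
results = vectorstore.similarity_search(query, k=4)

for doc in results:
    print(doc.metadata.get("page"), doc.page_content[:100])
```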
Step 3: Context Construction
The retrieved chunks are assembled into a "context" block that will be passed to the LLM along with the user's question.
Step 4: Prompt Engineering
LangChain uses a PromptTemplate to structure the input to the LLM. A typical RAG prompt looks like:
```
You are a helpful assistant. Use the following context to answer the user's question.
If the answer is not in the context, say "I don't know."

Context:
{context}

Question: {question}

Answer:
```

This technique - called "grounding" - dramatically reduces hallucinations because the model is instructed to rely on provided evidence rather than its parametric knowledge.
Step 5: LLM Generation
The filled prompt is sent to the LLM (OpenAI, Anthropic, Google, or a local model via Ollama). The model generates a response grounded in the retrieved context.
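Putting steps 3 through 5 together, a minimal sketch that reuses the results and query variables from the retrieval sketch above:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

rag_prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant. Use the following context to answer the user's question.
If the answer is not in the context, say "I don't know."

Context:
{context}

Question: {question}

Answer:"""
)

# Step 3: assemble the retrieved chunks into a single context block.
context = "\n\n".join(doc.page_content for doc in results)

# Step 4: fill the prompt template.
messages = rag_prompt.format_messages(context=context, question=query)

# Step 5: send the filled prompt to the model.
llm = ChatOpenAI(model="gpt-4o")
print(llm.invoke(messages).content)
```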
Step 6: Response Delivery
The response is returned to the user, optionally with source citations (LangChain can return the source Document objects alongside the answer).
Without memory, every message in a conversation is isolated. LangChain provides several memory strategies:
- ConversationBufferMemory - stores the entire conversation history verbatim.
- ConversationBufferWindowMemory - keeps only the last k messages. Good for token efficiency.
- ConversationSummaryMemory - uses the LLM to summarize older turns so long conversations still fit in the context window.
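A minimal sketch of the windowed strategy (note that in recent LangChain releases, RunnableWithMessageHistory is the recommended successor to these memory classes):

```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 3 exchanges in memory.
memory = ConversationBufferWindowMemory(k=3, return_messages=True)

memory.save_context(
    {"input": "What are the assumptions of Linear Regression?"},
    {"output": "Linearity, independence, homoscedasticity, and normally distributed residuals."},
)

# Later turns can load this history and inject it into the prompt.
print(memory.load_memory_variables({}))
```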
LangChain wraps all LLM providers under a common interface. This means the rest of your code doesn't need to change when you switch providers.
```python
# OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")

# Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-opus-4-5")

# Local model via Ollama
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3")
```

All three are drop-in replacements for each other in your pipeline.
A chain is a sequence of operations where the output of one step becomes the input of the next. This is the core composability primitive in LangChain.
Modern LangChain (v0.2+) uses the LangChain Expression Language (LCEL) with the pipe (|) operator:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Summarize this text: {text}")
model = ChatOpenAI()
parser = StrOutputParser()

chain = prompt | model | parser
result = chain.invoke({"text": "LangChain is a framework for building LLM apps..."})
```

Chains can be:
- invoked synchronously or asynchronously, streamed, or batched through the same interface
- composed into larger chains, branched, or run in parallel
- inspected and debugged step by step
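For example, here is a minimal sketch of running two sub-chains in parallel over the same input with RunnableParallel (both prompts are illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI()
parser = StrOutputParser()

# Two sub-chains over the same input, executed in parallel.
summary_chain = ChatPromptTemplate.from_template("Summarize this text: {text}") | model | parser
keywords_chain = ChatPromptTemplate.from_template("List five keywords for this text: {text}") | model | parser

parallel_chain = RunnableParallel(summary=summary_chain, keywords=keywords_chain)

result = parallel_chain.invoke({"text": "LangChain is a framework for building LLM apps..."})
print(result["summary"])
print(result["keywords"])
```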
Combining a retriever with a chain gives you a full RAG pipeline:
```python
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)

result = retrieval_chain.invoke({"input": "What is the bias-variance tradeoff?"})
print(result["answer"])
```

Agents are the most powerful abstraction in LangChain. Rather than following a fixed pipeline, an agent uses the LLM as a reasoning engine to decide which actions to take, in what order, based on the current state and goal.
This is often implemented using the ReAct pattern (Reasoning + Acting):
```
Thought: I need to find the current price of a flight.
Action: search_flights(from="Delhi", to="Mumbai", date="2026-06-01")
Observation: Flight IndiGo 6E-204 costs ₹3,450. Flight Air India AI-677 costs ₹4,100.
Thought: IndiGo is cheaper. I should book it.
Action: book_flight(flight_id="6E-204", passenger="Rahul Kumar")
Observation: Booking confirmed. PNR: XYZ123.
Final Answer: I've booked the cheapest flight for you. Your PNR is XYZ123.
```

LangChain supports many built-in tools:
- web search (Tavily, SerpAPI, DuckDuckGo)
- Wikipedia and Arxiv lookups
- a Python REPL for calculations
- SQL database querying
- HTTP requests
- custom tools you define yourself with the @tool decorator
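Here is a minimal sketch of a tool-calling agent with a single custom tool; search_flights is a hypothetical stand-in for a real flight API, and the prompt wiring follows the standard create_tool_calling_agent pattern:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent

# A hypothetical tool: in a real app this would call a flight-search API.
@tool
def search_flights(origin: str, destination: str, date: str) -> str:
    """Search for available flights between two cities on a given date."""
    return "IndiGo 6E-204: Rs 3,450; Air India AI-677: Rs 4,100"

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a travel assistant. Use tools when needed."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required slot for the agent's intermediate steps
])

llm = ChatOpenAI(model="gpt-4o")
agent = create_tool_calling_agent(llm, [search_flights], prompt)
executor = AgentExecutor(agent=agent, tools=[search_flights], verbose=True)

result = executor.invoke({"input": "Find the cheapest flight from Delhi to Mumbai on 2026-06-01"})
print(result["output"])
```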
LLMs return unstructured text. Output parsers help you extract structured data:
Example: Extract structured job data from a job posting:
```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class JobPosting(BaseModel):
    title: str = Field(description="Job title")
    company: str = Field(description="Company name")
    salary_range: str = Field(description="Salary range if mentioned")
    required_skills: list[str] = Field(description="List of required skills")

parser = PydanticOutputParser(pydantic_object=JobPosting)
```

One of LangChain's greatest strengths is the breadth of its integrations.
LLM providers: OpenAI, Anthropic, Google (Gemini), Mistral, Cohere, Hugging Face, Ollama (local), AWS Bedrock, Azure OpenAI, Groq
Vector stores: Pinecone, Weaviate, Chroma, FAISS, Qdrant, Milvus, pgvector, MongoDB Atlas, Elasticsearch, Redis
Document loaders: PDF, Word, Excel, PowerPoint, HTML, Markdown, CSV, JSON, Notion, Confluence, Google Drive, Slack, GitHub, YouTube transcripts, Arxiv, Wikipedia
Text splitters: Recursive Character, Token, Markdown, HTML, Code (Python, JavaScript, Go, etc.)
Memory backends: In-memory, Redis, MongoDB, SQLite, PostgreSQL, DynamoDB, Upstash
A single query might not capture all relevant angles of a question. Multi-query retrieval uses the LLM to generate multiple rephrased versions of the user's question, retrieves documents for each, and combines the results.
This significantly improves recall - you're more likely to find all relevant chunks.
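A minimal sketch using LangChain's built-in MultiQueryRetriever, reusing the vectorstore from earlier:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Wrap the base retriever so the LLM rephrases each query into several variants,
# retrieves for each, and merges the unique results.
multi_query = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI(model="gpt-4o"),
)

docs = multi_query.invoke("What factors affect model overfitting?")
```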
Instead of embedding the query directly, you ask the LLM to generate a hypothetical document that would answer the question - then embed that document to search the vector store. Since the hypothetical document is in the same "language" as the actual documents, similarity search is often more effective.
This pattern uses the LLM to automatically construct metadata filters for the vector store based on the user's query.
For example: "What were the revenue figures from Q3 2024 reports?" → the LLM parses this and generates a filter like {"quarter": "Q3", "year": 2024, "document_type": "revenue_report"}, narrowing the search space dramatically.
Combines multiple retrieval strategies - for instance, a keyword-based BM25 search and a semantic vector search - and uses a weighted ensemble of their results. This hybrid approach often outperforms either method alone.
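A minimal sketch using EnsembleRetriever to combine BM25 and vector search, assuming the chunks and vectorstore from the indexing step (BM25Retriever additionally requires the rank_bm25 package):

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword-based retrieval over the raw chunks.
bm25 = BM25Retriever.from_documents(chunks)

# Semantic retrieval over the vector store.
semantic = vectorstore.as_retriever(search_kwargs={"k": 4})

# Weighted ensemble of both result lists (50/50 here; tune per use case).
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.5, 0.5])
docs = hybrid.invoke("What is the bias-variance tradeoff?")
```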
Companies are deploying internal AI assistants trained on their own documentation, policies, product manuals, and support histories. Employees can ask natural language questions and get accurate, sourced answers - without digging through wikis or Confluence pages.
First-tier customer support can be fully automated using a RAG system trained on product FAQs, troubleshooting guides, and past support tickets. Complex or edge cases are automatically escalated to human agents.
Law firms use RAG systems to query vast corpora of legal documents, case law, and contracts. Instead of spending hours manually searching, a lawyer can ask "What precedents exist for software patent disputes in the 9th Circuit?" and get a synthesized, sourced answer in seconds.
Online learning platforms embed AI tutors that can answer student questions in the context of their specific course content - not generic internet content. This keeps answers accurate and on-topic.
More advanced deployments use LangChain agents to automate complex workflows: reading emails, updating CRM records, scheduling meetings, generating reports, and triggering downstream processes - all from a natural language instruction.
LangChain vs. Alternatives

| Framework | Strength | Best For |
|---|---|---|
| LangChain | Broadest ecosystem, most integrations, mature community | General-purpose LLM app development |
| LlamaIndex | Superior document indexing and query patterns | Heavily document-centric RAG applications |
| Haystack | Strong NLP pipeline heritage, good for search | Enterprise search and document QA |
| CrewAI | Multi-agent collaboration | Complex agentic workflows with multiple AI personas |
| AutoGen | Microsoft-backed, strong multi-agent orchestration | Research and complex reasoning pipelines |
Debugging an LLM application is notoriously difficult. A chain might involve 10+ steps, and if the final answer is wrong, which step failed?
LangSmith is LangChain's observability platform for LLM applications. It provides:
- full traces of every step in a chain or agent run
- latency, token usage, and cost tracking
- prompt and response inspection
- dataset-based evaluation and regression testing
For any serious production deployment, LangSmith is an essential companion to LangChain.
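Enabling it is mostly configuration. A minimal sketch using environment variables (the project name is a hypothetical example):

```python
import os

# Once tracing is enabled, every chain and agent run is logged to LangSmith automatically.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "pdf-assistant"           # hypothetical project name
```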
```bash
pip install langchain langchain-openai langchain-community chromadb
```

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# 1. Load PDF
loader = PyPDFLoader("your_document.pdf")
docs = loader.load()

# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3. Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# 4. Build retrieval chain
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:

{context}

Question: {input}
""")
combine_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(vectorstore.as_retriever(), combine_chain)

# 5. Query
result = retrieval_chain.invoke({"input": "Explain the bias-variance tradeoff"})
print(result["answer"])
```

That's a complete RAG system in under 30 lines of code.
We are at an inflection point. The gap between "I have access to a powerful LLM" and "I've shipped a production AI application" is still large - and LangChain is one of the most powerful bridges across that gap.
By providing standardized abstractions, a rich ecosystem of integrations, and battle-tested patterns for retrieval, memory, and agents, LangChain lets developers focus on building products rather than reinventing infrastructure.
The LLM application ecosystem is still maturing rapidly. Patterns, tools, and best practices are evolving month by month. But the core problems LangChain addresses - orchestration, retrieval, memory, and tool use - are fundamental to any serious AI application. Learning LangChain is not just learning a framework; it's learning the vocabulary and patterns of LLM-powered software development.
We are at the beginning of the LLM application era. The developers who understand how to build, orchestrate, and ship production-grade AI systems will define the next generation of software. LangChain is an excellent place to start.