
Most LangChain tutorials make the same mistake: they throw you into building a project before you understand what you are actually building with. You copy code, run it, it works, and you still have no idea why. The moment something breaks, you are lost.
The better approach is to first understand the conceptual architecture of LangChain. Not just what each component is called, but what problem it solves, how it works internally, and how it connects to everything else.
This guide does exactly that. We will go deep on all six core components of LangChain: Models, Prompts, Chains, Indexes, Memory, and Agents. By the end, you will not just know the vocabulary - you will understand the reasoning behind every design decision, and building real projects will feel natural rather than confusing.
LangChain is an open-source framework for building applications powered by Large Language Models (LLMs). It was created to solve a specific problem: LLMs are powerful in isolation, but building a complete application around them involves many moving parts that are painful to wire together manually.
In the previous lesson, we covered the basics of what LangChain is and the problem it exists to solve.
Now we go deeper. Let us break down each of LangChain's six core components, understand what they do, and see how they work together.
LangChain is built around six key abstractions: Models, Prompts, Chains, Indexes, Memory, and Agents.
Understanding these six components gives you a mental model for the entire framework. Every feature, integration, and advanced pattern in LangChain builds on top of these foundations.
In LangChain, a "model" does not refer to the neural network weights themselves. It refers to the interface your application uses to communicate with an AI model - whether hosted remotely via an API or running locally on your machine.
This distinction matters because LangChain wraps every model behind a consistent interface, regardless of the provider or model type.
Before modern LLMs, building a system that could understand natural language and generate meaningful responses required enormous effort. Two core challenges existed:
Natural Language Understanding (NLU): Teaching a computer to understand what a human actually means - not just the words, but the intent, context, and nuance.
Natural Language Generation (NLG): Producing a response that is coherent, contextually appropriate, and actually useful.
These were separate, hard research problems. Then came the Transformer architecture (2017), followed by BERT, GPT, and eventually the modern LLM era. These models solved both problems simultaneously, at a quality level that made production deployment viable.
Modern LLMs are enormous. GPT-4, for example, has hundreds of billions of parameters and requires massive GPU clusters to run. It is completely impractical for most developers to host these models themselves.
The solution is the API model: providers like OpenAI, Anthropic, and Google host the models on their own infrastructure and expose them through an API. Your application sends a request containing a prompt and receives the generated text back.
This abstraction means you never need to think about model hosting, GPU provisioning, or inference optimization. You just call an API.
Here is where things get complicated without LangChain.
Every AI provider has a different API format:
OpenAI:

import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
text = response.choices[0].message.content

Anthropic:
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
text = response.content[0].text

Google:

import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello")
text = response.text

These are three completely different interfaces. If you build your application around OpenAI's API and later want to switch to Anthropic (for cost, performance, or compliance reasons), you need to rewrite significant portions of your codebase.
LangChain solves this by wrapping all providers behind a single, unified interface:
# OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")

# Anthropic
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-opus-4-5")

# Local model
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3")

# All three are called identically
response = llm.invoke("Explain neural networks simply.")

Switching providers is now a one-line change.
1. Language Models (Chat Models)
Input: text (or a list of messages)
Output: text (or a message object)
These are what power chatbots, assistants, code generators, summarizers, and agents. Any task where the goal is to generate or transform text.
Examples: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3
2. Embedding Models
Input: text
Output: a vector (a list of floating-point numbers)
Embedding models convert text into a mathematical representation that captures semantic meaning. Two sentences with similar meaning will have embeddings that are close together in vector space, even if the exact words are different.
Examples: text-embedding-3-large (OpenAI), embed-english-v3.0 (Cohere), all-MiniLM-L6-v2 (HuggingFace)
These are the foundation of semantic search, RAG systems, and document retrieval. Without embeddings, you cannot do meaningful similarity search.
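To make the idea concrete, here is a minimal sketch (assuming an OpenAI API key is set in the environment) that embeds two sentences with similar meaning and measures how close their vectors are:

from langchain_openai import OpenAIEmbeddings
import numpy as np

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Two sentences with similar meaning but almost no shared words
v1, v2 = embeddings.embed_documents([
    "What causes inflation?",
    "Factors that increase the general price level",
])

# Cosine similarity: values close to 1.0 mean the sentences are semantically similar
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(similarity, 3))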
A prompt is the input you send to an LLM. It seems simple, but prompts are arguably the most important factor in determining the quality of your AI application's output.
The same model, given two different prompts, can produce dramatically different results. Prompt engineering - the skill of crafting effective prompts - is a genuine discipline.
Consider these two prompts asking about the same topic:
Prompt A: "Explain linear regression."
Prompt B: "You are a patient tutor explaining to a first-year computer science student who has strong math skills but no machine learning experience. Explain linear regression, starting with the intuition, then the math, then a real-world example. Use analogies where helpful."
The outputs will be completely different in depth, tone, structure, and usefulness. The model did not change. Only the prompt did.
LangChain provides a rich set of tools for managing, structuring, and reusing prompts.
PromptTemplate - Dynamic Prompts
Instead of hardcoding prompts, you create templates with variables that get filled at runtime:
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Explain {topic} to a {audience} in {tone} tone. Keep it under {word_limit} words."
)

prompt = template.invoke({
    "topic": "gradient descent",
    "audience": "high school student",
    "tone": "conversational",
    "word_limit": "200"
})

This makes your prompts reusable, testable, and easy to modify without touching your application logic.
ChatPromptTemplate - Role-Based Prompts
Modern LLMs are instruction-tuned and respond well to role assignment. A role gives the model a persona, an area of expertise, and a behavioral framework.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an experienced cardiologist. Explain medical concepts clearly but accurately. Always recommend consulting a physician for personal medical decisions."),
    ("human", "{question}")
])

The system message sets the model's role and behavior for the entire conversation. The human message is the actual user input.
You can assign many different roles depending on the application: a tutor, a senior software engineer reviewing code, a legal assistant, a customer support agent, and so on.
FewShotPromptTemplate - Teaching by Example
Few-shot prompting is one of the most powerful techniques in prompt engineering. Instead of telling the model what to do, you show it examples of correct behavior.
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

examples = [
    {"input": "Charged twice for the same order", "output": "Billing Issue"},
    {"input": "App crashes when I open it", "output": "Technical Issue"},
    {"input": "Wrong item delivered", "output": "Fulfillment Issue"},
    {"input": "Cannot reset my password", "output": "Account Issue"},
]

example_prompt = PromptTemplate.from_template("Input: {input}\nCategory: {output}")

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Classify the following customer complaint into the correct category:",
    suffix="Input: {complaint}\nCategory:",
    input_variables=["complaint"]
)

The model learns the classification pattern from the examples and applies it to new inputs - often without any additional fine-tuning.
MessagesPlaceholder - Dynamic Conversation History
When building chatbots, you need to inject the conversation history into each prompt so the model maintains context:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

The MessagesPlaceholder gets replaced at runtime with the actual message history, giving the model the full conversational context it needs.
A chain is a sequence of operations where each step takes the output of the previous step as its input. Chains are the core composability primitive in LangChain - they are how you build complex, multi-step workflows from simple, reusable components.
LangChain is literally named after this concept.
Without chains, you have to manually manage the data flow between every component:
# Without chains - manual, messy, hard to maintain
text = load_document("report.pdf")
chunks = split_text(text)
embeddings = generate_embeddings(chunks)
store_in_vectordb(embeddings)
query_embedding = generate_embedding(user_query)
relevant_chunks = search_vectordb(query_embedding)
prompt = build_prompt(user_query, relevant_chunks)
response = call_llm(prompt)
parsed = parse_response(response)

Each step is tightly coupled to the next. Swapping out any component requires rewriting the surrounding code. Testing individual steps is awkward.
With chains, you declare the pipeline once and LangChain handles the data flow:
# With LangChain Expression Language (LCEL)
chain = retriever | prompt | llm | output_parser
result = chain.invoke({"input": user_query})

Clean, readable, and every component is independently swappable.
Modern LangChain uses LCEL with the pipe (|) operator to compose chains. This is inspired by Unix pipe syntax and makes complex pipelines readable at a glance.
Basic chain:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Summarize this in 3 bullet points: {text}")
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()

chain = prompt | model | parser
result = chain.invoke({"text": "LangChain is a framework for building LLM applications..."})

Streaming support built in:

for chunk in chain.stream({"text": long_article}):
    print(chunk, end="", flush=True)

Async support built in:

result = await chain.ainvoke({"text": article})

Sequential Chains
The most common pattern: each step processes the output of the previous step in a linear sequence.
Example: English text input -> translate to Hindi -> summarize the Hindi text -> return summary
# Assumes translate_prompt and summarize_prompt are prompt templates with a {text} variable
translate_chain = translate_prompt | llm | StrOutputParser()
summarize_chain = summarize_prompt | llm | StrOutputParser()
# The translated string is mapped back into the {text} variable expected by summarize_prompt
full_chain = translate_chain | (lambda translated: {"text": translated}) | summarize_chain

Parallel Chains (RunnableParallel)
Multiple branches process the same input simultaneously and their results are combined.
from langchain_core.runnables import RunnableParallel

analysis_chain = RunnableParallel({
    "summary": summarize_chain,
    "sentiment": sentiment_chain,
    "key_topics": topics_chain
})

result = analysis_chain.invoke({"article": news_article})
# result = {"summary": "...", "sentiment": "positive", "key_topics": [...]}

This is significantly faster than running the three chains sequentially.
Conditional Chains (RunnableBranch)
Route to different sub-chains based on the content of the input:
from langchain_core.runnables import RunnableBranch

chain = RunnableBranch(
    (lambda x: "billing" in x["topic"].lower(), billing_support_chain),
    (lambda x: "technical" in x["topic"].lower(), tech_support_chain),
    general_support_chain  # default fallback
)

This pattern is extremely useful for building routers, classifiers, and conditional workflows.
LLMs are trained on a snapshot of the world up to a certain date. They have no knowledge of anything that happened after that cutoff, and no knowledge of your private data - internal documents, company policies, support tickets, or proprietary codebases.
Ask an LLM "What is our company's parental leave policy?" and it simply cannot answer - not because it is not smart enough, but because it was never trained on that information.
This is the knowledge gap that Indexes are designed to bridge.
An Index is a data structure that organizes your external documents in a way that allows an LLM application to efficiently retrieve the most relevant pieces at query time. It is the infrastructure layer that makes Retrieval-Augmented Generation (RAG) possible.
Building an Index involves four components working together: document loaders, text splitters, vector stores, and retrievers.
Document loaders are responsible for ingesting data from external sources and converting it into a standard Document object that LangChain can work with.
LangChain supports an extraordinary range of loaders:
File-based loaders:

- PyPDFLoader - PDFs
- Docx2txtLoader - Microsoft Word
- CSVLoader - CSV spreadsheets
- UnstructuredExcelLoader - Excel files
- TextLoader - plain text files

Web-based loaders:

- WebBaseLoader - any public web page
- YoutubeLoader - YouTube video transcripts
- ArxivLoader - academic papers from arXiv

Platform loaders:

- NotionDBLoader - Notion databases
- ConfluenceLoader - Atlassian Confluence
- SlackDirectoryLoader - Slack message history
- GitLoader - Git repository files
- GoogleDriveLoader - Google Drive documents

Every loader produces the same output format - a list of Document objects with page_content (the text) and metadata (source, page number, author, etc.). This standardization means the rest of your pipeline does not need to know or care where the data came from.
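As a quick illustration, here is a minimal sketch of loading a single web page (the URL is a placeholder) and inspecting the standardized Document objects it produces:

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://example.com/blog/post")  # placeholder URL
docs = loader.load()

# Every loader returns the same shape: page_content plus metadata
print(docs[0].page_content[:200])
print(docs[0].metadata)  # e.g. {"source": "https://example.com/blog/post", ...}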
Raw documents are almost never the right size to use directly. A 200-page PDF cannot be embedded and stored as a single chunk. You need to break it into smaller, semantically meaningful pieces.
But how you split matters enormously. Naive splitting (every 1000 characters) can slice sentences mid-thought, destroying the meaning of a passage. Smart splitting preserves semantic boundaries.
RecursiveCharacterTextSplitter (recommended default):
Attempts to split on semantic boundaries in order of preference: paragraph breaks (\n\n), then line breaks (\n), then sentences (.), then words (spaces), then individual characters. It only falls back to smaller splits when necessary.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # target chunk size in characters
    chunk_overlap=200,   # overlap between chunks to avoid context loss
    separators=["\n\n", "\n", ".", " ", ""]
)

Why chunk_overlap matters:
Imagine an answer spans two adjacent chunks. Without overlap, no single chunk contains the complete answer, and retrieval will fail. With an overlap of 200 characters, adjacent chunks share context, making it much more likely that relevant information is fully contained within a single retrieved chunk.
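Putting the loader and the splitter together - a minimal sketch, assuming a local report.pdf and the splitter configured above:

from langchain_community.document_loaders import PyPDFLoader

docs = PyPDFLoader("report.pdf").load()   # one Document per page
chunks = splitter.split_documents(docs)   # overlapping ~1000-character chunks

print(len(docs), "pages ->", len(chunks), "chunks")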
Specialized splitters:

- MarkdownHeaderTextSplitter - respects Markdown heading structure
- HTMLHeaderTextSplitter - respects HTML heading hierarchy
- TokenTextSplitter - splits by token count (critical when you need exact token control)
- Language splitters - language-aware splitting for code (Python, JS, Go, etc.)

Once your documents are split into chunks, each chunk is converted into an embedding vector and stored in a vector database.
How similarity search works:

- Each chunk is converted into a vector by an embedding model (for example, text-embedding-ada-002) and stored in the vector database.
- At query time, the user's question is embedded with the same model.
- The database finds the stored vectors closest to the query vector and returns the corresponding chunks.

The key insight is that semantic similarity in meaning corresponds to geometric proximity in vector space. "What causes inflation?" and "factors that increase the price level" will have similar embeddings even though they share almost no words.
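A minimal sketch of building and querying a vector store with Chroma (assuming the chunks from the splitting step and an OpenAI API key):

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
)

# Returns the 4 chunks whose embeddings are closest to the query's embedding
results = vectorstore.similarity_search("What causes inflation?", k=4)
for doc in results:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])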
Popular vector databases:
| Database | Best For | Hosting |
|---|---|---|
| Chroma | Development, prototyping | Local / self-hosted |
| FAISS | Fast in-memory search | Local |
| Pinecone | Production at scale | Managed cloud |
| Weaviate | Hybrid search (semantic + keyword) | Self-hosted or cloud |
| Qdrant | High performance, filtering | Self-hosted or cloud |
| pgvector | Already using PostgreSQL | Self-hosted |
A retriever is the interface that ties everything together: given a query, return the most relevant documents.
The simplest retriever does a direct similarity search:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # return top 4 most similar chunks
)

docs = retriever.invoke("What are the assumptions of linear regression?")

More advanced retrieval strategies include:
MMR (Maximal Marginal Relevance): Balances relevance with diversity. Instead of returning the 4 most similar chunks (which might all be near-identical), it returns 4 chunks that are collectively diverse while still being relevant.
Multi-query retrieval: Uses the LLM to generate 3-5 rephrased versions of the user's question, retrieves results for each, and combines them. This improves recall significantly.
Self-query retrieval: Uses the LLM to extract metadata filters from the query automatically. "Show me documents about neural networks from 2024" becomes a filtered search: {"year": 2024, "topic": "neural networks"}.
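Here is a hedged sketch of two of these strategies - MMR via the standard retriever interface, and multi-query retrieval via MultiQueryRetriever (both assume the vectorstore and llm from earlier):

from langchain.retrievers.multi_query import MultiQueryRetriever

# MMR: fetch 20 candidates, return 4 that are relevant but mutually diverse
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)

# Multi-query: the LLM rewrites the question several ways and the results are merged
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)

docs = multi_query_retriever.invoke("What are the assumptions of linear regression?")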
LLM APIs are completely stateless. Every call is independent. There is no concept of a session, a user, or a conversation history at the API level.
This creates a jarring experience for users:
User: What are the main algorithms in supervised learning?
AI: The main supervised learning algorithms include Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, and Neural Networks.
User: Which of those is best for classification?
AI: I'd be happy to help! Could you clarify what topic you're asking about?

The model has completely forgotten the previous message. Without memory management, building a coherent conversational experience is impossible.
LangChain provides memory components that automatically track conversation history and inject it into each new prompt. From the user's perspective, the AI "remembers" the conversation. Under the hood, LangChain is including the history as part of every request.
ConversationBufferMemory
The simplest approach: store every message in the conversation and include the complete history in every prompt.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)

memory.save_context(
    {"input": "What are supervised learning algorithms?"},
    {"output": "They include Linear Regression, Decision Trees, SVM..."}
)

# On next message, full history is included
history = memory.load_memory_variables({})

Pros: Perfect memory, no information loss.
Cons: Token usage grows linearly with conversation length. For long conversations, this becomes expensive and can exceed the context window.
ConversationBufferWindowMemory
Keep only the last k message pairs. Older messages are discarded.
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5, return_messages=True)

Pros: Predictable token usage.
Cons: Loses older context. If the user mentioned something important 10 messages ago, the model will not remember it.
ConversationSummaryMemory
Instead of storing raw messages, the LLM is used to progressively summarize the conversation. Only the summary is stored.
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm, return_messages=True)

Pros: Handles arbitrarily long conversations, retains the gist of everything discussed.
Cons: Summarization introduces latency and additional LLM calls. Fine-grained details may be lost in summarization.
ConversationSummaryBufferMemory
A hybrid: keep recent messages verbatim (for precise short-term context), summarize older messages (for long-term context). This is often the best practical choice for production chatbots.
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,  # keep recent messages up to this token count verbatim
    return_messages=True
)

VectorStoreRetrieverMemory
Store conversation turns as embeddings. When a new message arrives, retrieve only the most semantically relevant past exchanges (rather than the most recent). This is powerful for very long-running conversations where the user might return to a topic discussed much earlier.
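A rough sketch of the idea, assuming an existing vectorstore (exact parameters may differ slightly between LangChain versions):

from langchain.memory import VectorStoreRetrieverMemory

memory = VectorStoreRetrieverMemory(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)

memory.save_context({"input": "My favourite framework is LangChain"}, {"output": "Noted!"})

# Later - even hundreds of turns later - the relevant exchange is retrieved by meaning, not recency
memory.load_memory_variables({"prompt": "What framework do I prefer?"})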
By default, all of the above memory types are in-memory and lost when the application restarts. For production systems, you need to persist memory to a database.
LangChain supports memory backends including Redis, MongoDB, DynamoDB, PostgreSQL, and SQLite.
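As an example of what persistence looks like, here is a hedged sketch using a Redis-backed message history (assumes a local Redis instance; import paths have moved between LangChain versions):

from langchain_community.chat_message_histories import RedisChatMessageHistory

# One history object per conversation, keyed by session_id
history = RedisChatMessageHistory(session_id="user-42", url="redis://localhost:6379/0")

history.add_user_message("What are supervised learning algorithms?")
history.add_ai_message("They include Linear Regression, Decision Trees, SVM...")

# Survives application restarts - the messages live in Redis, not in process memory
print(history.messages)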
Chains are powerful, but they follow a predetermined path. The sequence of steps is decided at development time and does not change based on the input.
Agents are fundamentally different. An agent uses the LLM itself as a reasoning engine to dynamically decide what to do next. It can choose which tools to use, in what order, and how many times - all based on the current state of the task.
A chain says: "Always do step 1, then step 2, then step 3."
An agent says: "Here is my goal and here are my available tools. Let me think about what I need to do to achieve the goal."
An agent can decide which tool to call and with what input, interpret the result, decide what to do next, retry or change strategy when something fails, and stop when the goal has been achieved.
The most common agent architecture in LangChain is ReAct (Reasoning + Acting). The agent alternates between:
Thought: What do I need to do? What do I know? What do I still need to find out?
Action: Call a specific tool with specific inputs.
Observation: What did the tool return?
...repeat until the goal is achieved...
Final Answer: Provide the result to the user.
Example - "What is 3 times today's temperature in Delhi?"
Thought: The user wants a calculation that depends on today's temperature in Delhi. I need to look up the current temperature first.
Action: weather_search(location="Delhi")
Observation: Current temperature in Delhi: 38°C
Thought: Now I have the temperature (38°C). I need to multiply it by 3. I should use the calculator for accuracy.
Action: calculator(expression="38 * 3")
Observation: 114
Thought: I have all the information I need to answer.
Final Answer: 3 times today's temperature in Delhi (38°C) is 114°C.

The LLM is not just answering a question - it is planning, using tools, evaluating results, and adapting its approach.
A tool is any Python function that the agent can call. LangChain has many built-in tools and makes it easy to create custom ones.
Built-in tools:
- TavilySearchResults - web search
- WikipediaQueryRun - Wikipedia lookup
- PythonREPLTool - execute Python code
- Calculator - mathematical computation
- SQLDatabaseToolkit - query SQL databases
- RequestsGetTool - make HTTP requests

Creating custom tools:
from langchain_core.tools import tool

@tool
def get_product_inventory(product_id: str) -> str:
    """Look up the current inventory level for a product. Use this when the user asks about product availability."""
    # Your database query here (use parameterized queries in real code)
    inventory = db.query(f"SELECT quantity FROM inventory WHERE id = '{product_id}'")
    return f"Product {product_id} has {inventory} units in stock."

The docstring is critical - it is what the LLM reads to understand when and how to use the tool.
OpenAI Functions/Tools Agent: Uses OpenAI's function calling capability for reliable, structured tool selection. The recommended choice when using OpenAI models.
ReAct Agent: The classic reasoning + acting loop described above. Works with any LLM.
Structured Chat Agent: Designed for multi-input tools. Forces structured output format.
Conversational Agent: Optimized for multi-turn conversations, with memory built in.
For the most complex applications, you can build systems where multiple specialized agents collaborate - for example, a research agent that gathers information, a writer agent that drafts content, and a reviewer agent that critiques and refines the result.
This pattern, popularized by frameworks like CrewAI and AutoGen, allows you to tackle problems that would be too complex for a single agent.
Let us trace a realistic end-to-end flow for a corporate knowledge assistant:
Setup (done once when the system is deployed):

- Document loaders ingest the company's internal documents (policies, wikis, PDFs).
- Text splitters break them into chunks.
- An embedding model converts each chunk into a vector, which is stored in a vector database.
- The retriever is exposed to the agent as a company_docs_search tool.

At runtime (every user query):

- Memory injects the conversation history into the prompt.
- The agent reasons about the question and calls the company_docs_search tool when it needs internal knowledge.
- The retrieved chunks are combined with the user's question in a prompt template.
- The chat model generates the answer, and the exchange is saved back to memory.

Each of the six components plays a specific, essential role. Remove any one of them and the system degrades significantly.
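Stripping away the agent and memory for a moment, the retrieval core of such an assistant can be sketched in a few lines of LCEL (assuming the retriever and llm built earlier; the prompt wording is illustrative):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

rag_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is our company's parental leave policy?")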
| Component | Use When You Need To... |
|---|---|
| Models | Communicate with an LLM or generate embeddings |
| Prompts | Structure, templatize, or reuse instructions to the LLM |
| Chains | Connect multiple steps into a repeatable, structured pipeline |
| Indexes | Give your app access to external or private knowledge |
| Memory | Maintain context across multiple turns in a conversation |
| Agents | Let the LLM reason about and dynamically choose its own actions |
LangChain's six core components are not arbitrary - each one was designed to solve a real, painful problem that every developer hits when building LLM applications from scratch.
Understanding these components at a deep level is not just useful for using LangChain - it gives you a mental model for thinking about any LLM application, regardless of the framework.
Once the concepts are clear, the code almost writes itself.