Migrating from LangChain to LlamaIndex
How to migrate your RAG pipeline from LangChain to LlamaIndex, with side-by-side code examples for document loading, indexing, querying, and agents.

TL;DR
- LlamaIndex achieves 30ms p99 latency vs LangChain's 45ms at 1,000 concurrent requests for RAG workloads
- Both frameworks are open source and free - the real cost is in LLM APIs and vector stores you plug in
- Migrate retrieval components first, keep agent orchestration in LangChain initially if needed
- Medium difficulty, expect 1-2 weeks for a typical RAG application
Why Switch Your RAG Framework?
LangChain and LlamaIndex solve different problems. LangChain is an orchestration framework - it chains LLM calls, tool use, and workflows together. LlamaIndex is a data framework - it's built specifically for indexing, retrieving, and querying documents. If your primary workload is RAG, LlamaIndex does that job with less boilerplate and better defaults.
The practical difference shows up in code volume. A basic RAG pipeline in LangChain requires a document loader, a text splitter, an embedding model, a vector store, a retriever, a prompt template, and a chain connecting them all. In LlamaIndex, you load documents, create an index, and query it. Three steps.
That doesn't mean LlamaIndex replaces LangChain everywhere. For agent workflows with complex tool calling, multi-step reasoning, and human-in-the-loop patterns, LangChain (and LangGraph) is still the stronger choice. Many production stacks now use both: LlamaIndex for data ingestion and retrieval, LangChain for orchestration. This guide focuses on migrating the retrieval layer.
Feature Parity Table
| Feature | LangChain | LlamaIndex | Notes |
|---|---|---|---|
| Document loading | 160+ loader integrations | 200+ data connectors | LlamaIndex has more connectors via LlamaHub |
| Text splitting | RecursiveCharacterTextSplitter | SentenceSplitter, NodeParser | LlamaIndex splits at semantic boundaries |
| Vector indexing | Via Chroma, FAISS, Pinecone, etc. | VectorStoreIndex (built-in) | LlamaIndex wraps stores into an index abstraction |
| Retrieval | Retriever interface | QueryEngine + Retriever | LlamaIndex bundles retrieval and synthesis |
| RAG chain | LCEL pipe operator | QueryEngine.query() | Single call in LlamaIndex vs multi-step chain |
| Agents | LangGraph (state machines) | ReActAgent, FunctionAgent | LangGraph is more flexible for complex workflows |
| Memory | ConversationBufferMemory | ChatMemoryBuffer | Both support conversation history |
| Streaming | LCEL.stream() | QueryEngine streaming | Both support token streaming |
| Observability | LangSmith (free tier, then paid) | LlamaTrace | Both have tracing platforms |
| Structured output | Output parsers + JSON mode | Pydantic program + output parsers | Similar capabilities |
| Hybrid search | Via retrievers | Built-in fusion retriever | LlamaIndex makes this easier |
| Multi-modal | Via model integrations | Native multi-modal index | LlamaIndex has tighter integration |
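One row worth unpacking is hybrid search: a "fusion retriever" merges result lists from multiple retrievers (e.g., BM25 keyword search plus vector search). The usual merge scheme is reciprocal rank fusion. Here is a minimal, framework-free sketch of the idea - the function and the document IDs are illustrative, not library code:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists by summing 1/(k + rank) per document.

    A sketch of the fusion scheme hybrid retrievers commonly use; k=60 is the
    conventional damping constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d1", "d3", "d2"]     # keyword ranking
vector_results = ["d2", "d1", "d4"]   # embedding ranking
print(reciprocal_rank_fusion([bm25_results, vector_results]))
# ['d1', 'd2', 'd3', 'd4'] - d1 wins by appearing near the top of both lists
```

Both frameworks wrap this pattern for you; LlamaIndex's fusion retriever just makes it a one-liner to enable.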
Code Examples - Side by Side
Document Loading and Indexing
This is where the biggest difference in developer experience shows up.
LangChain:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
# Load documents
loader = PyPDFLoader("report.pdf")
documents = loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(documents)
# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
LlamaIndex:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load and index in two lines
documents = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)
LlamaIndex handles text splitting, embedding, and vector storage internally. The defaults use a SentenceSplitter with 1024-token chunks, OpenAI embeddings, and an in-memory vector store. You can override any of these, but the defaults work for most use cases.
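The gap between 1000-character and 1024-token defaults is larger than it looks. A rough, framework-free illustration, assuming a crude ~4-characters-per-token heuristic - `char_chunks` and `approx_token_chunks` are toy stand-ins, not the real splitters:

```python
def char_chunks(text, size=1000, overlap=200):
    """Toy character-based splitter, mimicking LangChain's default sizes."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def approx_token_chunks(text, size=1024, overlap=200, chars_per_token=4):
    """Toy token-based splitter, mimicking LlamaIndex's default sizes
    via a rough 4-chars-per-token heuristic."""
    return char_chunks(text, size * chars_per_token, overlap * chars_per_token)

text = "x" * 20_000  # ~20 KB document
print(len(char_chunks(text)))          # 25 chunks under character defaults
print(len(approx_token_chunks(text)))  # 7 chunks under token defaults
```

Same document, roughly 4x fewer (and larger) chunks - which changes what a top-5 retrieval returns. This is why the gotchas below recommend re-testing recall after migrating.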
Querying
LangChain:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template(
    "Answer based on context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
response = chain.invoke("What were the key findings?")
LlamaIndex:
query_engine = index.as_query_engine()
response = query_engine.query("What were the key findings?")
print(response)
LlamaIndex's query_engine handles retrieval, context formatting, and LLM synthesis in a single call. The LangChain version gives you more control over each step, which matters when you need custom retrieval logic or prompt templates. For standard RAG, though, the LlamaIndex version is clearly less code.
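Conceptually, `query()` collapses the same three steps the LCEL chain spells out. A framework-free sketch of that bundling - the `fake_*` functions are stand-ins for the retriever and the LLM, not real API calls:

```python
def fake_retrieve(question):
    # Stand-in for vector retrieval: return the top-k matching chunks
    return ["Q3 revenue grew 12%.", "Churn fell to 2%."]

def fake_llm(prompt):
    # Stand-in for the LLM call
    return f"(answer synthesized from a {len(prompt)}-char prompt)"

def query(question):
    # The three steps a query engine bundles:
    context = "\n".join(fake_retrieve(question))               # 1. retrieve
    prompt = f"Context:\n{context}\n\nQuestion: {question}"    # 2. format
    return fake_llm(prompt)                                    # 3. synthesize

print(query("What were the key findings?"))
```

When you need to customize one of those steps, LangChain exposes it as a pipe stage, while LlamaIndex exposes it as a configuration hook on the query engine.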
Streaming Responses
LangChain:
for chunk in chain.stream("Summarize the main conclusions"):
    print(chunk, end="")
LlamaIndex:
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Summarize the main conclusions")
response.print_response_stream()
Both frameworks support streaming natively. The LlamaIndex version just requires passing streaming=True when creating the query engine.
Agent with Tools
This is where LangChain still has an advantage. If you're building agents that call external tools, LangChain's ecosystem is deeper.
LangChain:
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search internal documents."""
    docs = retriever.invoke(query)
    return "\n".join(d.page_content for d in docs)

# Tool-calling agents need a prompt with an agent_scratchpad placeholder
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a document research assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [search_docs], agent_prompt)
executor = AgentExecutor(agent=agent, tools=[search_docs])
result = executor.invoke({"input": "Find revenue figures"})
LlamaIndex:
from llama_index.core.agent import FunctionCallingAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI

search_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="doc_search",
    description="Search internal documents",
)

# Note: this is a LlamaIndex LLM, not the LangChain ChatOpenAI above
llm = OpenAI(model="gpt-4o")
agent = FunctionCallingAgent.from_tools(
    [search_tool],
    llm=llm,
    verbose=True,
)
response = agent.chat("Find revenue figures")
Both approaches work. LlamaIndex's QueryEngineTool wraps any query engine into a tool automatically, which is convenient when your agent's primary action is searching indexed data. For agents that need to interact with APIs, databases, or external services, LangChain's tooling ecosystem has more pre-built integrations.
Mapping LangChain Concepts to LlamaIndex
| LangChain Concept | LlamaIndex Equivalent | Notes |
|---|---|---|
| DocumentLoader | SimpleDirectoryReader / LlamaHub readers | LlamaHub has 200+ connectors |
| RecursiveCharacterTextSplitter | SentenceSplitter / NodeParser | LlamaIndex splits at sentence boundaries |
| VectorStore | VectorStoreIndex | Wraps stores (Chroma, Pinecone, pgvector) |
| Retriever | index.as_retriever() | Returns nodes instead of documents |
| Chain (LCEL) | QueryEngine | Combines retrieval + synthesis |
| ConversationBufferMemory | ChatMemoryBuffer | Both track conversation history |
| AgentExecutor | FunctionCallingAgent / ReActAgent | Different agent patterns |
| Tool decorator | FunctionTool / QueryEngineTool | Both wrap callables as agent tools |
| LangSmith | LlamaTrace / Arize Phoenix | Observability platforms |
Pricing Impact
Both frameworks are open source under MIT-compatible licenses. There's no licensing cost for either one.
The real costs come from the services they connect to:
| Cost Component | LangChain | LlamaIndex |
|---|---|---|
| Framework license | Free (MIT) | Free (MIT) |
| LLM API calls | Same | Same |
| Embedding API calls | Same | Same |
| Vector store | Same | Same |
| Managed platform | LangSmith: free tier, then $39-$400+/mo | LlamaCloud: $0.001/credit, Pro from $500/mo |
| Enterprise support | Custom pricing | Custom pricing |
If you're using LangSmith for tracing and evaluation, switching to LlamaIndex means adopting LlamaTrace or a third-party tool like Arize Phoenix (also open source). The observability platform is the only area where migration creates a cost difference.
Known Gotchas
- **Node vs Document abstraction.** LangChain uses `Document` objects with `page_content` and `metadata`. LlamaIndex uses `Node` objects with `text`, `metadata`, and relationships between nodes. If your code references `doc.page_content`, you'll need to change it to `node.text`.
- **Default chunking is different.** LangChain's `RecursiveCharacterTextSplitter` defaults to 1000-character chunks. LlamaIndex's `SentenceSplitter` defaults to 1024-token chunks with sentence-aware boundaries. Your retrieval quality may change even with the same data - test recall on your evaluation set.
- **Query engines aren't chains.** LangChain's LCEL lets you compose arbitrary pipelines with the pipe operator. LlamaIndex's query engines are more opinionated - they handle retrieval and synthesis together. For custom post-processing, you'll use LlamaIndex's `Transformations` or `NodePostprocessors` instead of pipe steps.
- **Agent patterns differ.** LangChain's LangGraph uses state machines for complex agent flows. LlamaIndex offers simpler `FunctionCallingAgent` and `ReActAgent` patterns. If your agent has branching logic, conditional tool calls, or human-in-the-loop steps, LangGraph may still be the better fit.
- **Embedding model defaults matter.** LlamaIndex defaults to OpenAI's `text-embedding-ada-002` unless you specify otherwise. If your LangChain setup uses a different embedding model, set it explicitly in LlamaIndex or your vectors won't be compatible with existing data.
- **Async support varies.** LangChain has broad async support through LCEL's `.ainvoke()` method. LlamaIndex supports async querying, but some older integrations may not have full async implementations.
- **Fewer pre-built chains.** LangChain has chains for summarization, Q&A, conversational retrieval, and more. LlamaIndex provides similar capabilities through different abstractions (ResponseSynthesizer modes like `tree_summarize` and `compact`), but the names and APIs are different.
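The Document-to-Node rename is usually the most mechanical fix. A toy shim showing the attribute mapping - the dataclasses here are simplified stand-ins for `langchain_core.documents.Document` and LlamaIndex's `TextNode`, not the real classes:

```python
from dataclasses import dataclass, field

@dataclass
class LCDocument:
    """Simplified stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class LINode:
    """Simplified stand-in for a LlamaIndex TextNode."""
    text: str
    metadata: dict = field(default_factory=dict)

def to_nodes(docs):
    """Map LangChain-style documents onto LlamaIndex-style nodes:
    page_content becomes text; metadata carries over."""
    return [LINode(text=d.page_content, metadata=dict(d.metadata)) for d in docs]

docs = [LCDocument(page_content="Q3 revenue grew 12%.", metadata={"page": 4})]
nodes = to_nodes(docs)
print(nodes[0].text)  # "Q3 revenue grew 12%."
```

In a real migration you would construct `llama_index.core.schema.TextNode` objects directly, but the rename is the same: `page_content` becomes `text`.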
FAQ
Can I use LlamaIndex and LangChain together?
Yes, and many teams do. Use LlamaIndex for data indexing and retrieval, LangChain for orchestration and agent logic. LlamaIndex provides a LangChainToolAdapter for this pattern.
Will my existing vector store work with LlamaIndex?
Most likely. LlamaIndex supports Chroma, Pinecone, pgvector, Weaviate, Qdrant, FAISS, and 30+ other vector stores. You can point LlamaIndex at your existing store without re-indexing.
Is LlamaIndex better for production RAG?
For pure retrieval workloads, LlamaIndex's defaults produce strong results with less code. For complex pipelines with multiple tools and conditional logic, LangChain with LangGraph offers more flexibility.
Do I need to re-embed all my documents?
Only if you change the embedding model. If you keep the same model and vector store, LlamaIndex can query existing vectors without re-indexing.
How does the learning curve compare?
LlamaIndex has a gentler learning curve for RAG applications. LangChain's LCEL and LangGraph have steeper learning curves but offer more composability for non-RAG use cases.
✓ Last verified March 11, 2026
