Migrating from LangChain to LlamaIndex

How to migrate your RAG pipeline from LangChain to LlamaIndex, with side-by-side code examples for document loading, indexing, querying, and agents.

From: LangChain To: LlamaIndex Difficulty: Medium

TL;DR

  • LlamaIndex achieves 30ms p99 latency vs LangChain's 45ms at 1,000 concurrent requests for RAG workloads
  • Both frameworks are open source and free - the real cost is in LLM APIs and vector stores you plug in
  • Migrate retrieval components first, keep agent orchestration in LangChain initially if needed
  • Medium difficulty, expect 1-2 weeks for a typical RAG application

Why Switch Your RAG Framework?

LangChain and LlamaIndex solve different problems. LangChain is an orchestration framework - it chains LLM calls, tool use, and workflows together. LlamaIndex is a data framework - it's built specifically for indexing, retrieving, and querying documents. If your primary workload is RAG, LlamaIndex does that job with less boilerplate and better defaults.

The practical difference shows up in code volume. A basic RAG pipeline in LangChain requires a document loader, a text splitter, an embedding model, a vector store, a retriever, a prompt template, and a chain connecting them all. In LlamaIndex, you load documents, create an index, and query it. Three steps.

That doesn't mean LlamaIndex replaces LangChain everywhere. For agent workflows with complex tool calling, multi-step reasoning, and human-in-the-loop patterns, LangChain (and LangGraph) is still the stronger choice. Many production stacks now use both: LlamaIndex for data ingestion and retrieval, LangChain for orchestration. This guide focuses on migrating the retrieval layer.

Feature Parity Table

| Feature | LangChain | LlamaIndex | Notes |
|---|---|---|---|
| Document loading | 160+ loader integrations | 200+ data connectors | LlamaIndex has more connectors via LlamaHub |
| Text splitting | RecursiveCharacterTextSplitter | SentenceSplitter, NodeParser | LlamaIndex splits at semantic boundaries |
| Vector indexing | Via Chroma, FAISS, Pinecone, etc. | VectorStoreIndex (built-in) | LlamaIndex wraps stores into an index abstraction |
| Retrieval | Retriever interface | QueryEngine + Retriever | LlamaIndex bundles retrieval and synthesis |
| RAG chain | LCEL pipe operator | QueryEngine.query() | Single call in LlamaIndex vs multi-step chain |
| Agents | LangGraph (state machines) | ReActAgent, FunctionAgent | LangGraph is more flexible for complex workflows |
| Memory | ConversationBufferMemory | ChatMemoryBuffer | Both support conversation history |
| Streaming | LCEL .stream() | QueryEngine streaming | Both support token streaming |
| Observability | LangSmith (free tier, then paid) | LlamaTrace | Both have tracing platforms |
| Structured output | Output parsers + JSON mode | Pydantic program + output parsers | Similar capabilities |
| Hybrid search | Via retrievers | Built-in fusion retriever | LlamaIndex makes this easier |
| Multi-modal | Via model integrations | Native multi-modal index | LlamaIndex has tighter integration |

Code Examples - Side by Side

Document Loading and Indexing

This is where the biggest difference in developer experience shows up.

LangChain:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Load documents
loader = PyPDFLoader("report.pdf")
documents = loader.load()

# Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

LlamaIndex:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load and index in two lines
documents = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)
```

LlamaIndex handles text splitting, embedding, and vector storage internally. The defaults use a SentenceSplitter with 1024-token chunks, OpenAI embeddings, and an in-memory vector store. You can override any of these, but the defaults work for most use cases.
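If you want to carry your LangChain chunking behavior over rather than accept the defaults, you can override them. A minimal sketch, assuming current llama-index `Settings` and `SentenceSplitter` APIs; the chunk values are illustrative, not a recommendation:

```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Configure a splitter closer to your previous LangChain settings.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Option 1: apply globally for all subsequent indexes.
Settings.node_parser = splitter

documents = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()

# Option 2: apply per-index via the transformations argument.
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
```

The same `Settings` object also holds `Settings.llm` and `Settings.embed_model`, which is where you'd swap in a non-OpenAI embedding model.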

Querying

LangChain:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o")

prompt = ChatPromptTemplate.from_template(
    "Answer based on context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = chain.invoke("What were the key findings?")
```

LlamaIndex:

```python
query_engine = index.as_query_engine()
response = query_engine.query("What were the key findings?")
print(response)
```

LlamaIndex's query_engine handles retrieval, context formatting, and LLM synthesis in a single call. The LangChain version gives you more control over each step, which matters when you need custom retrieval logic or prompt templates. For standard RAG though, the LlamaIndex version is clearly less code.
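When you need some of that control without building a custom pipeline, `as_query_engine` accepts tuning keyword arguments. A sketch using two standard parameters (`similarity_top_k`, `response_mode`); verify the names against your installed llama-index version:

```python
# Retrieve more chunks and change the synthesis strategy.
query_engine = index.as_query_engine(
    similarity_top_k=5,            # match the k=5 used in the LangChain retriever
    response_mode="tree_summarize",  # hierarchical summarization over retrieved nodes
)
response = query_engine.query("What were the key findings?")
```

`response_mode="compact"` (the default) stuffs as much context as fits into each LLM call; `tree_summarize` trades extra calls for better answers over many chunks.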

Streaming Responses

LangChain:

```python
for chunk in chain.stream("Summarize the main conclusions"):
    print(chunk, end="")
```

LlamaIndex:

```python
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Summarize the main conclusions")
response.print_response_stream()
```

Both frameworks support streaming natively. The LlamaIndex version just requires passing streaming=True when creating the query engine.
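If you need the raw token stream rather than printed output, for example to forward tokens to a client, the streaming response also exposes a generator (`response_gen` in current llama-index versions):

```python
# Iterate over tokens yourself instead of calling print_response_stream().
for token in response.response_gen:
    print(token, end="", flush=True)
```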

Agent with Tools

This is where LangChain still has an advantage. If you're building agents that call external tools, LangChain's ecosystem is deeper.

LangChain:

```python
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search internal documents."""
    docs = retriever.invoke(query)
    return "\n".join(d.page_content for d in docs)

# The agent prompt must include an agent_scratchpad placeholder;
# reusing the RAG prompt from earlier would raise an error.
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful research assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, [search_docs], agent_prompt)
executor = AgentExecutor(agent=agent, tools=[search_docs])
result = executor.invoke({"input": "Find revenue figures"})
```

LlamaIndex:

```python
from llama_index.core.agent import FunctionCallingAgent
from llama_index.core.tools import QueryEngineTool

search_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="doc_search",
    description="Search internal documents"
)

agent = FunctionCallingAgent.from_tools(
    [search_tool],
    llm=llm,
    verbose=True
)
response = agent.chat("Find revenue figures")
```

Both approaches work. LlamaIndex's QueryEngineTool wraps any query engine into a tool automatically, which is convenient when your agent's primary action is searching indexed data. For agents that need to interact with APIs, databases, or external services, LangChain's tooling ecosystem has more pre-built integrations.

Mapping LangChain Concepts to LlamaIndex

| LangChain Concept | LlamaIndex Equivalent | Notes |
|---|---|---|
| DocumentLoader | SimpleDirectoryReader / LlamaHub readers | LlamaHub has 200+ connectors |
| RecursiveCharacterTextSplitter | SentenceSplitter / NodeParser | LlamaIndex splits at sentence boundaries |
| VectorStore | VectorStoreIndex | Wraps stores (Chroma, Pinecone, pgvector) |
| Retriever | index.as_retriever() | Returns nodes instead of documents |
| Chain (LCEL) | QueryEngine | Combines retrieval + synthesis |
| ConversationBufferMemory | ChatMemoryBuffer | Both track conversation history |
| AgentExecutor | FunctionCallingAgent / ReActAgent | Different agent patterns |
| Tool decorator | FunctionTool / QueryEngineTool | Both wrap callables as agent tools |
| LangSmith | LlamaTrace / Arize Phoenix | Observability platforms |

Pricing Impact

Both frameworks are open source under MIT-compatible licenses. There's no licensing cost for either one.

The real costs come from the services they connect to:

| Cost Component | LangChain | LlamaIndex |
|---|---|---|
| Framework license | Free (MIT) | Free (MIT) |
| LLM API calls | Same | Same |
| Embedding API calls | Same | Same |
| Vector store | Same | Same |
| Managed platform | LangSmith: free tier, then $39-$400+/mo | LlamaCloud: $0.001/credit, Pro from $500/mo |
| Enterprise support | Custom pricing | Custom pricing |

If you're using LangSmith for tracing and evaluation, switching to LlamaIndex means adopting LlamaTrace or a third-party tool like Arize Phoenix (also open source). The observability platform is the only area where migration creates a cost difference.

Known Gotchas

  1. Node vs Document abstraction. LangChain uses Document objects with page_content and metadata. LlamaIndex uses Node objects with text, metadata, and relationships between nodes. If your code references doc.page_content, you'll need to change it to node.text.

  2. Default chunking is different. LangChain's RecursiveCharacterTextSplitter defaults to 1000-character chunks. LlamaIndex's SentenceSplitter defaults to 1024-token chunks with sentence-aware boundaries. Your retrieval quality may change even with the same data - test recall on your evaluation set.

  3. Query engines aren't chains. LangChain's LCEL lets you compose arbitrary pipelines with the pipe operator. LlamaIndex's query engines are more opinionated - they handle retrieval and synthesis together. For custom post-processing, you'll use LlamaIndex's Transformations or NodePostprocessors instead of pipe steps.

  4. Agent patterns differ. LangChain's LangGraph uses state machines for complex agent flows. LlamaIndex offers simpler FunctionCallingAgent and ReActAgent patterns. If your agent has branching logic, conditional tool calls, or human-in-the-loop steps, LangGraph may still be the better fit.

  5. Embedding model defaults matter. LlamaIndex defaults to OpenAI's text-embedding-ada-002 unless you specify otherwise. If your LangChain setup uses a different embedding model, set it explicitly in LlamaIndex or your vectors won't be compatible with existing data.

  6. Async support varies. LangChain has broad async support through LCEL's .ainvoke() method. LlamaIndex supports async querying but some older integrations may not have full async implementations.

  7. Fewer pre-built chains. LangChain has chains for summarization, Q&A, conversational retrieval, and more. LlamaIndex provides similar capabilities through different abstractions (ResponseSynthesizer modes like tree_summarize, compact), but the names and APIs are different.
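The Document-to-Node mapping in gotcha 1 is mostly mechanical. A minimal sketch of the field mapping, using a stand-in dataclass so the shape is explicit (the real classes come from langchain_core and llama_index.core, which this sketch deliberately avoids importing):

```python
from dataclasses import dataclass, field

# Stand-in for langchain_core.documents.Document, for illustration only.
@dataclass
class LCDocument:
    page_content: str
    metadata: dict = field(default_factory=dict)

def to_llama_kwargs(doc) -> dict:
    """Map LangChain Document fields onto the kwargs LlamaIndex's Document takes."""
    return {"text": doc.page_content, "metadata": doc.metadata}

# With llama-index installed, the actual conversion is then roughly:
#   from llama_index.core import Document
#   llama_docs = [Document(**to_llama_kwargs(d)) for d in chunks]
print(to_llama_kwargs(LCDocument("Q3 revenue grew 12%.", {"page": 4})))
```

Going the other direction (`node.text` back to `page_content`) is the same mapping in reverse.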

FAQ

Can I use LlamaIndex and LangChain together?

Yes, and many teams do. Use LlamaIndex for data indexing and retrieval, LangChain for orchestration and agent logic. LlamaIndex provides a LangChainToolAdapter for this pattern.
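Even without an adapter, wrapping a query engine by hand is straightforward. A sketch, assuming `query_engine` is the LlamaIndex query engine built earlier and that you are on a recent langchain_core:

```python
from langchain_core.tools import tool

# `query_engine` is the LlamaIndex query engine from the indexing step.
@tool
def llama_doc_search(query: str) -> str:
    """Search the document index maintained by LlamaIndex."""
    # str() extracts the synthesized answer text from the Response object.
    return str(query_engine.query(query))

# llama_doc_search can now go into a LangChain agent's tools list.
```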

Will my existing vector store work with LlamaIndex?

Most likely. LlamaIndex supports Chroma, Pinecone, pgvector, Weaviate, Qdrant, FAISS, and 30+ other vector stores. You can point LlamaIndex at your existing store without re-indexing.
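For example, with a Chroma collection you populated from LangChain, attaching LlamaIndex looks roughly like this (requires the llama-index-vector-stores-chroma package; the collection name and path are illustrative):

```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Open the collection your LangChain pipeline already wrote to.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")

# Wrap it and build an index view over the existing vectors - no re-embedding.
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store)
```

Keep the same embedding model configured on both sides, or query vectors won't match the stored ones (see gotcha 5).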

Is LlamaIndex better for production RAG?

For pure retrieval workloads, LlamaIndex's defaults produce strong results with less code. For complex pipelines with multiple tools and conditional logic, LangChain with LangGraph offers more flexibility.

Do I need to re-embed all my documents?

Only if you change the embedding model. If you keep the same model and vector store, LlamaIndex can query existing vectors without re-indexing.

How does the learning curve compare?

LlamaIndex has a gentler learning curve for RAG applications. LangChain's LCEL and LangGraph have steeper learning curves but offer more composability for non-RAG use cases.


Last verified: March 11, 2026

About the author

Priya is an AI educator and technical writer whose mission is making artificial intelligence approachable for everyone - not just engineers.