
(Deprecated) Callback-based LlamaIndex Integration

⚠️ This integration is deprecated. We recommend using the new instrumentation-based integration with Langfuse as described here.

Add Langfuse to your LlamaIndex application

Make sure you have both llama-index and langfuse installed.

pip install llama-index langfuse

At the root of your LlamaIndex application, register Langfuse’s LlamaIndexCallbackHandler in the LlamaIndex Settings.callback_manager. When instantiating LlamaIndexCallbackHandler, make sure to configure it correctly with your Langfuse API keys and the Host URL.

.env
LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_PUBLIC_KEY="pk-lf-..."
LANGFUSE_HOST="https://cloud.langfuse.com" # 🇪🇺 EU region
# LANGFUSE_HOST="https://us.cloud.langfuse.com" # 🇺🇸 US region

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# Instantiate the handler and register it globally in the LlamaIndex Settings
langfuse_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_callback_handler])

Done! Traces and metrics from your LlamaIndex application are now automatically tracked in Langfuse. If you construct a new index or query an LLM with your documents in context, your traces and metrics are immediately visible in the Langfuse UI.
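The handler in the setup step is created without arguments, which suggests it picks up its credentials from the environment. The following is a minimal sketch, using only the Python standard library, for checking that the three variables from the .env example are set before the handler is instantiated. The helper name unset_langfuse_vars is hypothetical and not part of the Langfuse SDK. Whether LANGFUSE_HOST may be omitted is not covered here, so the sketch simply reports every name that is unset.

```python
import os

# Variable names taken from the .env example in the setup step.
EXPECTED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST")


def unset_langfuse_vars(env=None):
    """Return the expected variable names that are missing or empty.

    Hypothetical helper for illustration; not part of the Langfuse SDK.
    """
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = unset_langfuse_vars()
    if missing:
        print("Unset Langfuse variables:", ", ".join(missing))
    else:
        print("All Langfuse variables from the .env example are set.")
```

Running a check like this at startup makes a missing key visible immediately instead of showing up later as traces that never arrive.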

Check out the notebook for end-to-end examples of the integration.

Additional configuration

Queuing and flushing

The Langfuse SDKs queue and batch events in the background to reduce the number of network requests and improve overall performance. In a long-running application, this works without any additional configuration.

If you are running a short-lived application, you need to flush the handler before the application exits so that all queued events are sent to Langfuse.

# Flush the handler that was registered in the setup step
langfuse_callback_handler.flush()

Learn more about queuing and batching of events here.
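For a short-lived script, one way to make sure the flush is not skipped is to wrap the work in try/finally. The sketch below assumes only that the handler exposes flush(), as shown above. The names run_and_flush and RecordingHandler are hypothetical: the latter is a stand-in used so the example runs without Langfuse installed, and in a real application you would pass your Langfuse handler instead.

```python
def run_and_flush(handler, job):
    """Run job() and flush the handler afterwards, even if job() raises."""
    try:
        return job()
    finally:
        # Send any queued events before control leaves this function.
        handler.flush()


class RecordingHandler:
    """Hypothetical stand-in for the Langfuse handler; only counts flushes."""

    def __init__(self):
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1


demo_handler = RecordingHandler()
result = run_and_flush(demo_handler, lambda: "done")
```

The finally clause matters because an unhandled exception in the job would otherwise end the process with events still in the queue.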

Custom trace parameters

You can update trace parameters at any time to add context to a trace, such as a user ID, session ID, or tags. All subsequent traces will include the parameters you set. See the Python SDK Trace documentation for more information.

| Property | Description |
| --- | --- |
| name | Identify a specific type of trace, e.g. a use case or functionality. |
| metadata | Additional information that you want to see in Langfuse. Can be any JSON. |
| session_id | The current session. |
| user_id | The current user_id. |
| tags | Tags to categorize and filter traces. |
| release | The release tag of your application. |
| version | The version of your application. |
| sample_rate | Sample rate for tracing. |

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# Instantiate a new LlamaIndexCallbackHandler and register it in the LlamaIndex Settings
langfuse_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_handler])

def my_func():
  # Set trace parameters before executing your LlamaIndex code
  langfuse_handler.set_trace_params(
    user_id="user-123",
    session_id="session-abc",
    tags=["production"]
  )

  # Your LlamaIndex code, trace will include the set parameters

Notes

  • The params will be applied to all traces and spans created after the set_trace_params call. You can unset them by calling e.g. set_trace_params(user_id=None).
  • If you run this in a Jupyter Notebook, you need to run set_trace_params in the same cell as your LlamaIndex code.
  • If you provide a root trace or span, these parameters have no effect because the root trace or span is used instead. See the next section for more information.
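Because the parameters stay in effect until they are unset, it can help to limit them to one block of code. The following sketch wraps set_trace_params in a context manager that unsets the same parameters on exit by passing None, as described in the notes above. It assumes that passing None unsets every parameter the same way it does for user_id. The names scoped_trace_params and FakeHandler are hypothetical: FakeHandler is a stand-in that only records the current parameters so the example runs without Langfuse installed.

```python
from contextlib import contextmanager


@contextmanager
def scoped_trace_params(handler, **params):
    """Set trace parameters for the duration of a with-block, then unset them."""
    handler.set_trace_params(**params)
    try:
        yield handler
    finally:
        # Unset by passing None. This does not restore earlier values.
        handler.set_trace_params(**{name: None for name in params})


class FakeHandler:
    """Hypothetical stand-in for the Langfuse handler; stores the params it is given."""

    def __init__(self):
        self.params = {}

    def set_trace_params(self, **params):
        self.params.update(params)


handler = FakeHandler()
with scoped_trace_params(handler, user_id="user-123", tags=["production"]):
    inside = dict(handler.params)  # parameters active inside the block
after = dict(handler.params)       # same keys, now set to None
```

Note that the sketch resets parameters to None rather than to whatever was set before the block, so nested use would clear the outer values as well.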

Interoperability with Langfuse SDK

The Langfuse Python SDK is fully interoperable with the LlamaIndex integration.

This is useful when your LlamaIndex executions are part of a larger application and you want to link all traces and spans together. This can also be useful when you’d like to group multiple LlamaIndex executions to be part of the same trace or span.

When using the Langfuse @observe() decorator, langfuse_context.get_current_llama_index_handler() exposes a callback handler scoped to the current trace context, in this case llama_index_fn(). Pass it to the LlamaIndex Settings.callback_manager to trace subsequent LlamaIndex executions.

from langfuse.decorators import langfuse_context, observe
from llama_index.core import Document, VectorStoreIndex
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager

# Placeholder documents for illustration; replace with your own data
doc1 = Document(text="First example document.")
doc2 = Document(text="Second example document.")

@observe()
def llama_index_fn(question: str):
    # Set callback manager for LlamaIndex, will apply to all LlamaIndex executions in this function
    langfuse_handler = langfuse_context.get_current_llama_index_handler()
    Settings.callback_manager = CallbackManager([langfuse_handler])

    # Run application
    index = VectorStoreIndex.from_documents([doc1, doc2])
    response = index.as_query_engine().query(question)
    return response

Notes

  • The LlamaIndex integration will not make any changes to your provided root trace or span. If you want to add additional context or input/output to your root trace or span, you can do so via the Python SDK.
  • This relies on context variables (contextvars). In Jupyter, it works reliably when everything runs in the same cell.
