Masking of sensitive LLM data

Masking is a feature that allows precise control over the tracing data sent to the Langfuse server. It enables you to:

Redact sensitive information from trace or observation inputs and outputs.
Customize the content of events before transmission.
Implement fine-grained data filtering based on your specific requirements.

The process works as follows:

You define a custom masking function and pass it to the Langfuse client constructor.
All event inputs and outputs are processed through this function.
The masked data is then sent to the Langfuse server.

This approach ensures that you have complete control over the event input and output data traced by your application.
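The three steps above can be illustrated without a Langfuse client. The following is a minimal sketch, not the SDK's internal implementation: `prepare_event` and `redact_secrets` are hypothetical names, and the event is modeled as a plain dictionary purely for illustration.

```python
from typing import Any, Callable


def redact_secrets(data: Any) -> Any:
    """Example mask: replace strings that start with SECRET_."""
    if isinstance(data, str) and data.startswith("SECRET_"):
        return "REDACTED"
    return data


def prepare_event(event: dict, mask: Callable[[Any], Any]) -> dict:
    """Hypothetical helper: mask only the input and output fields of an event."""
    masked = dict(event)  # shallow copy so the caller's event is not modified
    for field in ("input", "output"):
        if field in masked:
            masked[field] = mask(masked[field])
    return masked


event = {"name": "demo", "input": "What is 1 + 1?", "output": "SECRET_DATA"}
sent = prepare_event(event, redact_secrets)
print(sent)
```

Only the `input` and `output` fields pass through the mask; other fields, such as the event name, are left as they are.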

Define a masking function:

def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
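The prefix check above only matches top-level strings, while traced inputs and outputs are often dictionaries or lists. A minimal sketch of a recursive variant follows; the function name, the email pattern, and the `[EMAIL]` placeholder are illustrative assumptions, not part of the Langfuse API.

```python
import re
from typing import Any

# Simplified email pattern for illustration; real-world matching may need more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def recursive_masking_function(data: Any) -> Any:
    """Walk nested dicts and lists, masking every string that is found."""
    if isinstance(data, str):
        if data.startswith("SECRET_"):
            return "REDACTED"
        return EMAIL_RE.sub("[EMAIL]", data)
    if isinstance(data, dict):
        return {key: recursive_masking_function(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [recursive_masking_function(item) for item in data]
    # Numbers, booleans, None and other types pass through unchanged.
    return data


masked = recursive_masking_function(
    {
        "messages": [{"role": "user", "content": "Mail me at jane.doe@example.com"}],
        "token": "SECRET_abc",
    }
)
print(masked)
```

Such a function can be passed as `mask` in the same way as the simpler version.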

Use with the @observe() decorator:

from langfuse.decorators import langfuse_context, observe
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def fn():
    return "SECRET_DATA"
 
fn()
 
langfuse_context.flush()
 
# The trace output in Langfuse will have the output masked as "REDACTED".

Use with the low-level SDK:

from langfuse import Langfuse
 
langfuse = Langfuse(mask=masking_function)
 
trace = langfuse.trace(output="SECRET_DATA")
 
langfuse.flush()
 
# The trace output in Langfuse will have the output masked as "REDACTED".
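Because the masking function runs on every input and output, it may receive types it was not written for. One defensive option is to wrap it so that a failure yields a placeholder instead of the raw data. This is a sketch under assumptions: `make_safe_mask`, `fragile_mask`, and the placeholder text are hypothetical and not part of the SDK.

```python
from typing import Any, Callable


def make_safe_mask(
    mask: Callable[[Any], Any], placeholder: str = "<masking failed>"
) -> Callable[[Any], Any]:
    """Return a wrapper that never raises and never leaks unmasked data."""

    # The parameter keeps the name `data` in case the caller passes it by keyword.
    def safe_mask(data: Any) -> Any:
        try:
            return mask(data)
        except Exception:
            # Fail closed: drop the payload rather than send it unmasked.
            return placeholder

    return safe_mask


def fragile_mask(data):
    # Works for strings only; raises AttributeError for dicts, lists, and so on.
    return data.replace("SECRET_DATA", "REDACTED")


safe = make_safe_mask(fragile_mask)
print(safe("my SECRET_DATA"))
print(safe({"key": "SECRET_DATA"}))
```

The wrapped function can then be passed as `mask` in place of the original one.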

Use with the JS/TS SDK:

import { Langfuse } from "langfuse";
 
function maskingFunction(params: { data: any }) {
  if (typeof params.data === "string" && params.data.startsWith("SECRET_")) {
    return "REDACTED";
  }
 
  return params.data;
}
 
const langfuse = new Langfuse({ mask: maskingFunction });
 
const trace = langfuse.trace({
  output: "SECRET_DATA",
});
 
await langfuse.flushAsync();
 
// The trace output in Langfuse will have the output masked as "REDACTED".

See JS/TS SDK docs for more details.

When using the OpenAI SDK Integration, set openai.langfuse_mask to the masking function:

from langfuse.openai import openai
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
openai.langfuse_mask = masking_function
 
completion = openai.chat.completions.create(
  name="test-chat",
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a bot."},
    {"role": "user", "content": "1 + 1 = "}],
  temperature=0,
)
 
openai.flush_langfuse()

When using the OpenAI integration with the @observe() decorator (see interop docs), set the masking function via the langfuse_context:

from langfuse.decorators import langfuse_context, observe
from langfuse.openai import openai
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def fn():
    completion = openai.chat.completions.create(
      name="test-chat",
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": "You are a calculator."},
        {"role": "user", "content": "1 + 1 = "}],
      temperature=0,
    )
 
fn()

When using the Langchain integration (Python), pass mask as a keyword argument to the CallbackHandler:

from langfuse.callback import CallbackHandler
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
handler = CallbackHandler(
  mask=masking_function
)

When using the Langchain integration with the @observe() decorator (see interop docs), set the mask via the langfuse_context:

from langfuse.decorators import langfuse_context, observe
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def fn():
    langfuse_handler = langfuse_context.get_current_langchain_handler()
 
    # Pass the handler when invoking your Langchain chain or agent
    # (chain and person are assumed to be defined elsewhere)
    chain.invoke({"person": person}, config={"callbacks": [langfuse_handler]})
 
fn()
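Chain inputs such as {"person": person} are dictionaries, so a mask that inspects only strings cannot tell which field a value belongs to. A sketch of key-based redaction follows; the function name and the set of sensitive key names are assumptions chosen for illustration.

```python
from typing import Any

# Hypothetical list of field names whose values should never be traced.
SENSITIVE_KEYS = {"password", "api_key", "authorization", "ssn"}


def key_based_masking_function(data: Any) -> Any:
    """Redact values stored under sensitive keys, at any nesting depth."""
    if isinstance(data, dict):
        return {
            key: "REDACTED"
            if str(key).lower() in SENSITIVE_KEYS
            else key_based_masking_function(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [key_based_masking_function(item) for item in data]
    return data


masked = key_based_masking_function(
    {"person": "Ada", "API_KEY": "sk-123", "nested": [{"password": "hunter2"}]}
)
print(masked)
```

Key matching is case-insensitive here, and values under other keys are traced unchanged.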

When using the Langchain integration (JS/TS), pass mask to the CallbackHandler constructor:

import { CallbackHandler } from "langfuse-langchain";
 
function maskingFunction(params: { data: any }) {
  if (typeof params.data === "string" && params.data.startsWith("SECRET_")) {
    return "REDACTED";
  }
 
  return params.data;
}
 
const handler = new CallbackHandler({
  mask: maskingFunction,
});

When using the LlamaIndex Integration, pass the mask to the LlamaIndexInstrumentor constructor:

from langfuse.llama_index import LlamaIndexInstrumentor
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
instrumentor = LlamaIndexInstrumentor(mask=masking_function)
 
with instrumentor.observe():
    # ... your LlamaIndex index creation ...
 
    index.as_query_engine().query("What is the capital of France?")
 
instrumentor.flush()

When using the LlamaIndex integration with the @observe() decorator (see interop docs), set the mask via the langfuse_context:

from langfuse.decorators import langfuse_context, observe
from langfuse.llama_index import LlamaIndexInstrumentor
from llama_index.core import VectorStoreIndex
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
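langfuse_context.configure(mask=masking_function)
 
# Assumed setup: the instrumentor used inside llama_index_fn below
instrumentor = LlamaIndexInstrumentor()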
 
@observe()
def llama_index_fn(question: str):
    # Get IDs
    current_trace_id = langfuse_context.get_current_trace_id()
    current_observation_id = langfuse_context.get_current_observation_id()
 
    # Pass to instrumentor
    with instrumentor.observe(
        trace_id=current_trace_id,
        parent_observation_id=current_observation_id,
        update_parent=False
    ) as trace:
        # Build the index and run the query
        # (doc1 and doc2 are assumed to be LlamaIndex documents defined elsewhere)
        index = VectorStoreIndex.from_documents([doc1, doc2])
        response = index.as_query_engine().query(question)
 
        return response
