Langfuse Custom Tracer

✓ 81 tests passing · 64% coverage · Python 3.10+ · MIT license

A lightweight Python library that records every AI call your application makes — what was sent, what came back, how many tokens were used, and exactly how much it cost. All of it flows into your Langfuse dashboard automatically.

Google Gemini · Anthropic Claude · Dynamic pricing · OpenTelemetry · Zero-setup auto tracing

What is this library?

When you build applications with AI models, you quickly need answers to questions like: how many tokens did that cost? which step is the slowest? which user is spending the most? This library answers all of those — automatically — by integrating with Langfuse, an open-source observability platform for LLM apps.

🔢 Automatic token counting
Reads token usage directly from the API response. No manual parsing required.

💰 Dynamic cost calculation
Fetches live pricing from a remote JSON file. Prices update without any code changes.

🌿 Nested trace visualisation
Multi-step pipelines appear as a parent-child tree in Langfuse.

⚡ Zero-setup auto tracing
Call observe() once and all AI calls are traced from that point on.

🛡️ Graceful degradation
If the network is unavailable, the library uses cached pricing and never crashes your app.

⏱️ TTL-based caching
Pricing data is cached for 10 minutes, so repeated requests do not trigger a network call each time.

Installation

Install with pip. Pick the extras that match the AI providers you use. If you are unsure, install everything with [all].

1. Core library only
Just the tracer, no provider SDKs.

pip install langfuse-custom-tracer

2. Add .env file support
Lets you load API keys from a .env file using load_env().

pip install "langfuse-custom-tracer[env]"   # quotes keep zsh from expanding the brackets

3. Add provider support
Choose the providers you need, or install everything at once.

# Google Gemini only
pip install "langfuse-custom-tracer[gemini]"

# Anthropic Claude only
pip install "langfuse-custom-tracer[anthropic]"

# Everything — recommended for most projects
pip install "langfuse-custom-tracer[all]"

Quick start

Get up and running in three steps. You will need API keys from Langfuse and at least one AI provider.

Step 1 — Get your API keys

Langfuse (required)
Sign up at cloud.langfuse.com. Find your secret key and public key in project settings.

Google Gemini (optional)
Get your key from ai.google.dev. Only needed if you use Gemini models.

Anthropic Claude (optional)
Get your key from console.anthropic.com. Only needed if you use Claude models.

Step 2 — Create a .env file

In your project root, create a file named .env and add your keys:

# .env — never commit this file to version control
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...

GEMINI_API_KEY=...
ANTHROPIC_API_KEY=...
Important: Never commit your .env file. Add it to .gitignore. The library will raise an error if required keys are missing.
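
For example, these .gitignore entries (a common convention, not required by the library) keep the file and any environment-specific variants such as .env.production out of version control:

# .gitignore
.env
.env.*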

Step 3 — Run your first traced call

from langfuse_custom_tracer import load_env, observe
import os

load_env()   # reads your .env file
observe()    # turns on automatic tracing — do this before importing AI SDKs

import google.generativeai as genai
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")

# This call is now automatically traced and sent to Langfuse
response = model.generate_content("Summarise the history of Python in 3 sentences.")
print(response.text)
Open your Langfuse dashboard after running the script. You will see the trace with token counts, latency, and cost already calculated.

Auto tracing

The recommended approach for most applications. Call observe() once at startup and every subsequent AI call is recorded automatically — no wrapping, no boilerplate.

Call observe() before importing any AI SDK. It works by patching the SDK internally, so the import order matters.
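
To make that concrete, here is a minimal sketch of the patching idea. This is illustrative only, not the library's actual implementation; the real wrapper builds a Langfuse generation instead of printing.

# Illustrative sketch of SDK patching, not the library's real code.
import time
import google.generativeai as genai

_original = genai.GenerativeModel.generate_content

def _traced_generate_content(self, *args, **kwargs):
    start = time.perf_counter()
    response = _original(self, *args, **kwargs)             # the real API call
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"traced generate_content: {elapsed_ms:.0f} ms")  # stand-in for sending a span
    return response

genai.GenerativeModel.generate_content = _traced_generate_content

Import order matters for a related reason: any code that saved a direct reference to the original method before observe() ran would bypass the wrapper.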

What gets recorded on every call

🤖 Model name
The exact model string passed to the API.

📥 Input prompt
The full message or prompt sent to the model.

📤 Output response
The model's full reply.

🔢 Token counts
Input tokens, output tokens, and cached tokens (where applicable).

💰 Cost in USD
Calculated using dynamic pricing. Input and output costs are recorded separately.

⏱️ Latency
Wall-clock time in milliseconds from request to response.

⚠️ Errors
If the call fails, the error is captured and the trace is marked as failed in Langfuse.

Tag calls with a user ID

Call set_user() at the start of each request to tag all subsequent AI calls with that user's ID. Langfuse aggregates cost and usage per user automatically.

from langfuse_custom_tracer import observe, set_user

observe()

# In a web app, set the user at the start of each request
def handle_chat_request(user_id: str, message: str):
    set_user(user_id)

    # All AI calls below are now tagged to user_id in Langfuse
    response = model.generate_content(message)
    return response.text

Group calls into sessions

A session groups multiple traces together — useful for multi-turn conversations. All messages in one conversation appear together on the Langfuse Sessions tab.

from langfuse_custom_tracer import observe, set_user, set_session, end_session

observe()

set_user("user-123")
session_id = set_session()   # auto-generates a UUID; returns it so you can store it

response1 = model.generate_content("Hello")
response2 = model.generate_content("Tell me more")
response3 = model.generate_content("Thanks, that is helpful")

end_session()   # clears the session so the next conversation starts fresh

Attach quality scores

After a call completes, you can attach a score — useful for recording user feedback (thumbs up/down), LLM-as-a-judge evaluations, or custom metrics.

from langfuse_custom_tracer import observe, get_trace_id, score

observe()

response = model.generate_content("Explain recursion")
trace_id = get_trace_id()   # capture the ID immediately after the call

# Later — for example when the user clicks thumbs up
score("thumbs_up",    1.0,  trace_id=trace_id, comment="Very clear explanation")
score("relevance",    0.95, trace_id=trace_id, data_type="NUMERIC")
score("hallucination", False, trace_id=trace_id, data_type="BOOLEAN")

Async support

Auto tracing works with asyncio out of the box. Each asyncio task has its own isolated context via Python's contextvars, so concurrent requests never mix up each other's user IDs or session IDs.

import asyncio
from langfuse_custom_tracer import observe, set_user

observe()

async def process_user(user_id: str, messages: list):
    set_user(user_id)   # ContextVar — isolated per asyncio task
    tasks = [model.generate_content_async(m) for m in messages]
    return await asyncio.gather(*tasks)

# Both users are processed concurrently without interfering with each other
async def main():
    await asyncio.gather(
        process_user("user-1", ["msg1", "msg2"]),
        process_user("user-2", ["msg3", "msg4"]),
    )

asyncio.run(main())

Manual tracing

When you want full control — custom names, explicit inputs and outputs, metadata — use the context manager API directly. This is particularly useful for labelling multi-step pipelines so each step is clearly visible in Langfuse.

With Google Gemini

from langfuse_custom_tracer import load_env, create_langfuse_client, GeminiTracer
import google.generativeai as genai
import os

load_env()

lf     = create_langfuse_client(os.getenv("LANGFUSE_SECRET_KEY"), os.getenv("LANGFUSE_PUBLIC_KEY"))
tracer = GeminiTracer(lf)

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")

with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
    with tracer.generation("extract-data", model="gemini-2.0-flash",
                          input="Extract name, amount, date") as gen:
        response = model.generate_content("Extract name, amount, date from this invoice: ...")
        usage    = tracer.extract_usage(response, model="gemini-2.0-flash")
        gen.update(output=response.text, usage_details=usage)
    span.update(output="Extraction complete")

tracer.flush()   # required at end of script

With Anthropic Claude

from langfuse_custom_tracer import load_env, create_langfuse_client, AnthropicTracer
from anthropic import Anthropic
import os

load_env()

lf     = create_langfuse_client(os.getenv("LANGFUSE_SECRET_KEY"), os.getenv("LANGFUSE_PUBLIC_KEY"))
tracer = AnthropicTracer(lf)
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
    with tracer.generation("extract-data", model="claude-3-5-sonnet-20241022") as gen:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,   # required by the Anthropic Messages API
            messages=[{"role": "user", "content": "Extract name, amount, date from this invoice: ..."}]
        )
        usage = tracer.extract_usage(response, model="claude-3-5-sonnet-20241022")
        gen.update(output=response.content[0].text, usage_details=usage)
    span.update(output="Extraction complete")

tracer.flush()

Multi-step pipelines

Traces can be nested to any depth. Each tracer.trace() block inside another becomes a child span. Langfuse uses OpenTelemetry context propagation to detect the nesting automatically — you do not need to pass parent IDs manually.

with tracer.trace("document-processing", user_id="user-123",
                 metadata={"doc_type": "invoice"}) as root:

    with tracer.trace("step-1-ocr"):
        with tracer.generation("extract-text", model="gemini-2.0-flash-lite") as gen:
            text = model.generate_content("Extract all text from this image.")
            gen.update(output=text.text, usage_details=tracer.extract_usage(text, model="gemini-2.0-flash-lite"))

    with tracer.trace("step-2-classify"):
        with tracer.generation("classify-doc", model="gemini-2.0-flash") as gen:
            classification = model.generate_content(f"Classify this document: {text.text}")
            gen.update(output=classification.text, usage_details=tracer.extract_usage(classification, model="gemini-2.0-flash"))

    with tracer.trace("step-3-extract-fields"):
        with tracer.generation("get-fields", model="gemini-2.0-flash") as gen:
            fields = model.generate_content(f"Extract name, amount, date from: {text.text}")
            gen.update(output=fields.text, usage_details=tracer.extract_usage(fields, model="gemini-2.0-flash"))

    root.update(output="All steps complete")

tracer.flush()

In Langfuse you will see a clean tree: the root trace contains three child spans, each with its own latency, token count, and cost. The root trace also shows the combined totals.

Error handling

Wrap AI calls in a try/except inside the generation block. Even if the call fails, the trace is sent to Langfuse with the error attached — so you can track failure rates over time.

with tracer.trace("risky-task"):
    with tracer.generation("ai-call", model="gemini-2.0-flash") as gen:
        try:
            response = model.generate_content("...")
            usage    = tracer.extract_usage(response, model="gemini-2.0-flash")
            gen.update(output=response.text, usage_details=usage)
        except Exception as e:
            gen.update(status_code=500, error=str(e))
            raise   # re-raise so your application can handle it

Langfuse dashboard

After running your code, open cloud.langfuse.com. Each traced call appears in the Traces list. Here is what a typical trace looks like:

Trace: invoice-processing
  ID: trace-abc123 · Duration: 2.3s · User: user-123 · Tags: [production, batch] · Status: success
  └─ Generation: extract-data
       Model:   gemini-2.0-flash
       Tokens:  input 156 · output 89 · total 245
       Cost:    $0.0000768 (input $0.0000234 + output $0.0000534)
       Pricing: source=json · version=2026-04-22-v1
       Latency: 1.8s
       Output:  "Name: John Doe, Amount: $500, Date: 2025-03-31"

The dashboard also aggregates all calls: total daily tokens, total cost, cost per user, and a breakdown by model — with no extra configuration.

API reference

create_langfuse_client() (langfuse_custom_tracer)

Creates and returns a Langfuse client. Pass the returned object into your tracer constructor.

Parameter   Type  Required?  Description
secret_key  str   required   Your Langfuse secret key (sk-lf-...)
public_key  str   required   Your Langfuse public key (pk-lf-...)
host        str   optional   EU: https://cloud.langfuse.com · US: https://us.cloud.langfuse.com

lf = create_langfuse_client(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host="https://us.cloud.langfuse.com"  # optional — US region
)
load_env() (langfuse_custom_tracer)

Loads environment variables from a .env file. Requires the [env] extra (python-dotenv). Call this at the very top of your script, before anything else.

Parameter  Type  Required?  Description
path       str   optional   Path to the env file. Defaults to .env in the current directory.

load_env()                      # reads .env in current directory
load_env(".env.production")    # reads a custom file
tracer.trace() (context manager)

Creates a root-level span — the top of your trace tree. Everything nested inside this block becomes a child span. You can nest trace() inside another trace() to build multi-level trees.

Parameter   Type       Required?  Description
name        str        required   A label for this span, e.g. "invoice-processing"
input       any        optional   The input to this step (any Python value, shown in Langfuse)
metadata    dict       optional   Custom key/value pairs, e.g. {"version": "1.0"}
user_id     str        optional   Identifier for the user who triggered this trace
session_id  str        optional   Groups this trace into a conversation session
tags        list[str]  optional   String labels for filtering in Langfuse, e.g. ["production", "batch"]

Call span.update(output=...) inside the block to record the output after the work is done.
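
A minimal usage sketch combining several of these parameters; do_work() is a placeholder for the step being traced:

with tracer.trace("nightly-batch", input={"batch_id": 42},
                  user_id="user-123", tags=["production", "batch"]) as span:
    result = do_work()          # placeholder for your own logic
    span.update(output=result)  # record the output once the work is done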

tracer.generation() (context manager)

Creates a generation span — one individual LLM call. Nest this inside a trace() block. After the AI call returns, call gen.update(output=..., usage_details=...) to record the result.

Parameter  Type  Required?  Description
name       str   required   Label for this specific LLM call, e.g. "extract-data"
model      str   optional   The model name string, e.g. "gemini-2.0-flash"
input      any   optional   The prompt or messages you sent
metadata   dict  optional   Custom metadata, e.g. temperature or system prompt version
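
For example, assuming model is configured as in the manual-tracing section above:

prompt = "Summarise this contract in two sentences."
with tracer.generation("summarise", model="gemini-2.0-flash", input=prompt) as gen:
    response = model.generate_content(prompt)
    gen.update(output=response.text,
               usage_details=tracer.extract_usage(response, model="gemini-2.0-flash"))
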
tracer.extract_usage() (GeminiTracer · AnthropicTracer)

Reads token counts from the model response object and calculates costs using dynamic pricing. Pass the returned dict directly to gen.update(usage_details=...).

Parameter  Type    Required?  Description
response   object  required   The raw response object returned by the AI SDK
model      str     required   Model name, used to look up the correct pricing

Returns a dict with these keys:

input            int    number of prompt tokens
output           int    number of completion tokens
total            int    input + output
unit             str    always "TOKENS"
inputCost        float  input cost in USD
outputCost       float  output cost in USD
totalCost        float  combined cost in USD
cachedTokens     int    (optional) cached tokens used (Gemini and Anthropic)
pricing_source   str    "json" (remote) or "default" (fallback)
pricing_version  str    version tag from the pricing file, e.g. "2026-04-22-v1"

tracer.flush() (BaseTracer)

Sends all buffered trace data to Langfuse and blocks until the upload completes.

Always call tracer.flush() at the end of short-lived scripts (CLI tools, batch jobs, notebooks). Long-running servers flush automatically in the background — you can skip this there.
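
In a script, a try/finally block guarantees the flush even when the job fails part-way; run_batch_job() is a placeholder for your script's main work:

try:
    run_batch_job(tracer)   # placeholder for the actual work
finally:
    tracer.flush()          # upload buffered traces even if an exception was raised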

Auto tracing API

These functions are only available when using observe() mode.

observe() — Patches the Gemini and Anthropic SDKs globally. Call once at startup.
set_user(user_id) — Tags all subsequent calls in the current context with the given user ID.
set_session(session_id=None) — Starts a new session. Auto-generates a UUID if no ID is passed. Returns the session ID.
end_session() — Clears the current session from the context.
get_trace_id() — Returns the trace ID of the most recent AI call.
score(name, value, trace_id=None, comment=None, data_type="NUMERIC") — Attaches a quality score to a trace. data_type can be "NUMERIC" or "BOOLEAN".
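
Putting these together, a typical request lifecycle might look like the following sketch, assuming model is configured as in the quick start:

from langfuse_custom_tracer import (
    observe, set_user, set_session, end_session, get_trace_id, score,
)

observe()   # once, at startup

def handle_conversation(user_id: str, messages: list[str]) -> list[str]:
    set_user(user_id)
    set_session()                      # one session per conversation
    replies = []
    for message in messages:
        response = model.generate_content(message)
        trace_id = get_trace_id()      # remember which trace to score later
        score("answered", 1.0, trace_id=trace_id)
        replies.append(response.text)
    end_session()
    return replies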

Dynamic pricing

Pricing is decoupled from source code and managed via a remote JSON file. When AI providers change their prices, your cost numbers update automatically — no new library version required.

How it works

1. Fetch
On first use, the library fetches pricing.json from a remote URL (GitHub by default).

2. Cache
The data is cached in memory for 10 minutes (TTL-based). Subsequent calls within that window use the cache.

3. Fallback
If the remote is unavailable, the library uses the last cached value. If no cache exists, it uses built-in default prices. It never crashes. (A sketch of this flow follows the list.)

4. Audit trail
Every trace records pricing_source and pricing_version so you can trace cost calculations back to a specific pricing snapshot.
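
Here is an illustrative sketch of that fetch, cache, and fallback flow. The names TTL_SECONDS, DEFAULT_PRICES, and get_prices() are placeholders for this example, not the library's internals:

import json
import time
import urllib.request

TTL_SECONDS = 600   # 10 minutes
DEFAULT_PRICES = {"gemini-2.0-flash": {"input": 0.15, "output": 0.60}}  # built-in fallback
_cache = {"data": None, "fetched_at": 0.0}

def get_prices(url: str) -> dict:
    now = time.time()
    if _cache["data"] is not None and now - _cache["fetched_at"] < TTL_SECONDS:
        return _cache["data"]   # fresh cache hit, no network call
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            _cache["data"] = json.loads(resp.read())
            _cache["fetched_at"] = now
    except Exception:
        pass   # network or parse failure: fall back to stale cache or defaults
    return _cache["data"] if _cache["data"] is not None else DEFAULT_PRICES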

Use a custom pricing URL

Point the library at your own pricing file by setting an environment variable or passing a URL in code:

# Option 1 — environment variable
PRICING_JSON_URL=https://your-domain.com/pricing.json

# Option 2 — in code
from langfuse_custom_tracer import get_pricing_manager

pm = get_pricing_manager(url="https://your-domain.com/pricing.json")
price, version, source = pm.get_price("gemini-2.0-flash")
print(f"Input: ${price['input']} per 1M tokens (v{version}, from {source})")

Supported models

All prices are in USD per 1 million tokens (snapshot as of Q1 2026). Prices update automatically via the remote JSON.

Google Gemini

Model                  Input    Output  Cache read
gemini-2.5-pro         $1.25    $10.00  $0.3125
gemini-2.0-flash       $0.15    $0.60   $0.0375
gemini-2.0-flash-lite  $0.075   $0.30   $0.01875
gemini-1.5-pro         $1.25    $5.00   $0.3125
gemini-1.5-flash       $0.075   $0.30   $0.01875
gemini-1.5-flash-8b    $0.0375  $0.15   $0.01
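
As a worked example, the extract-data call shown in the dashboard section (156 input tokens, 89 output tokens on gemini-2.0-flash) costs:

input_cost  = 156 / 1_000_000 * 0.15    # $0.0000234
output_cost =  89 / 1_000_000 * 0.60    # $0.0000534
total_cost  = input_cost + output_cost  # $0.0000768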

Anthropic Claude

Model                       Input   Output  Cache read  Cache write
claude-3-5-sonnet-20241022  $3.00   $15.00  $0.30       $3.75
claude-3-5-haiku-20241022   $0.80   $4.00   $0.08       $1.00
claude-3-opus-20240229      $15.00  $75.00  $1.50       $18.75
claude-3-sonnet-20240229    $3.00   $15.00  $0.30       $3.75
claude-3-haiku-20240307     $0.80   $4.00   $0.08       $1.00

Supported SDKs

SDK              Package              Patched method                      Status
Gemini (legacy)  google-generativeai  GenerativeModel.generate_content()  ✓ Supported
Gemini (new)     google-genai         Models.generate_content()           ✓ Supported
Anthropic        anthropic            Messages.create()                   ✓ Supported

Project structure

langfuse-custom-tracer/
├── langfuse_custom_tracer/
│   ├── __init__.py            # package exports
│   ├── client.py              # Langfuse client setup
│   ├── pricing_manager.py     # remote pricing: fetch, cache, fallback
│   ├── auto.py                # observe() — SDK patching logic
│   ├── context.py             # ContextVar state: user, session, trace ID
│   ├── factory.py             # tracer factory helper
│   ├── scoring.py             # score() helper
│   └── tracers/
│       ├── base.py            # BaseTracer — trace() and generation()
│       ├── gemini.py          # GeminiTracer — Gemini-specific extract_usage()
│       └── anthropic.py       # AnthropicTracer — Claude-specific extract_usage()
├── tests/                     # 81 unit tests
│   ├── conftest.py
│   ├── test_pricing_manager.py
│   ├── test_gemini_tracer.py
│   ├── test_anthropic_tracer.py
│   ├── test_base_tracer.py
│   ├── test_auto_patch.py
│   ├── test_factory.py
│   └── test_client.py
├── pricing.json               # local fallback pricing data
├── examples/
│   └── env_setup_example.py
├── SETUP.md
├── TESTING.md
└── pyproject.toml

Testing

81 unit tests with 64% overall coverage. The full suite runs in approximately 1 second.

81  total tests passing
43  AnthropicTracer tests (100% coverage)
19  PricingManager tests (79% coverage)
19  GeminiTracer tests (76% coverage)

# run the full test suite
pytest

# run with a coverage report
pytest --cov

# run a single test
pytest tests/test_gemini_tracer.py::TestGeminiTracer::test_extract_usage_basic -v

# run by module
pytest tests/test_anthropic_tracer.py -v
pytest tests/test_pricing_manager.py -v

Security

🚫 Never commit .env files
The .env file is already listed in .gitignore. Do not remove it from there. If you accidentally commit keys, rotate them immediately.

🔒 Keys only from environment variables
Always read secrets via os.getenv() or load_env(). Never hardcode a key string in source code. The library raises an error if required keys are missing.

🔐 HTTPS only
All communication with Langfuse and the remote pricing endpoint is encrypted via HTTPS.