Langfuse Custom Tracer
A lightweight Python library that records every AI call your application makes — what was sent, what came back, how many tokens were used, and exactly how much it cost. All of it flows into your Langfuse dashboard automatically.
What is this library?
When you build applications with AI models, you quickly need answers to questions like: how many tokens did that cost? which step is the slowest? which user is spending the most? This library answers all of those — automatically — by integrating with Langfuse, an open-source observability platform for LLM apps.
Automatic token counting
Reads token usage directly from the API response. No manual parsing required.
Dynamic cost calculation
Fetches live pricing from a remote JSON file. Prices update without any code changes.
Nested trace visualisation
Multi-step pipelines appear as a parent-child tree in Langfuse.
Zero-setup auto tracing
Call observe() once — all AI calls are traced from that point on.
Graceful degradation
If the network is unavailable, the library uses cached pricing and never crashes your app.
TTL-based caching
Pricing data is cached for 10 minutes so you never make a network call on every request.
Installation
Install with pip. Pick the extras that match the AI providers you use.
If you are unsure, install everything with [all].
Core library only
Just the tracer, no provider SDKs.
pip install langfuse-custom-tracer
Add .env file support
Lets you load API keys from a .env file using load_env().
pip install "langfuse-custom-tracer[env]"
Add provider support
Choose the providers you need, or install everything at once.
# Google Gemini only
pip install "langfuse-custom-tracer[gemini]"
# Anthropic Claude only
pip install "langfuse-custom-tracer[anthropic]"
# Everything — recommended for most projects
pip install "langfuse-custom-tracer[all]"
Quick start
Get up and running in three steps. You will need API keys from Langfuse and at least one AI provider.
Step 1 — Get your API keys
Langfuse required
Sign up at cloud.langfuse.com. Find your secret key and public key in project settings.
Google Gemini optional
Get your key from ai.google.dev. Only needed if you use Gemini models.
Anthropic Claude optional
Get your key from console.anthropic.com. Only needed if you use Claude models.
Step 2 — Create a .env file
In your project root, create a file named .env and add your keys:
# .env — never commit this file to version control
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
GEMINI_API_KEY=...
ANTHROPIC_API_KEY=...
Never commit your .env file. Add it to .gitignore. The library will raise an error if required keys are missing.
Step 3 — Run your first traced call
from langfuse_custom_tracer import load_env, observe
import os
load_env() # reads your .env file
observe() # turns on automatic tracing — do this before importing AI SDKs
import google.generativeai as genai
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")
# This call is now automatically traced and sent to Langfuse
response = model.generate_content("Summarise the history of Python in 3 sentences.")
print(response.text)
Auto tracing
The recommended approach for most applications. Call observe() once at startup and every subsequent AI call is recorded automatically — no wrapping, no boilerplate.
Important: call observe() before importing any AI SDK. It works by patching the SDK internally, so the import order matters.
What gets recorded on every call
Model name
The exact model string passed to the API.
Input prompt
The full message or prompt sent to the model.
Output response
The model's full reply.
Token counts
Input tokens, output tokens, and cached tokens (where applicable).
Cost in USD
Calculated using dynamic pricing. Both input and output costs are recorded separately (see the worked example after this list).
Latency
Wall-clock time in milliseconds from request to response.
Errors
If the call fails, the error is captured and the trace is marked as failed in Langfuse.
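As a concrete example, using the pricing table later in this document: a gemini-2.0-flash call with 1,000 input tokens and 500 output tokens costs 1,000/1,000,000 × $0.15 + 500/1,000,000 × $0.60 = $0.00015 + $0.00030 = $0.00045.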
Tag calls with a user ID
Call set_user() at the start of each request to tag all subsequent AI calls with that user's ID. Langfuse aggregates cost and usage per user automatically.
from langfuse_custom_tracer import observe, set_user
observe()
# In a web app, set the user at the start of each request
def handle_chat_request(user_id: str, message: str):
    set_user(user_id)
    # All AI calls below are now tagged to user_id in Langfuse
    response = model.generate_content(message)
    return response.text
Group calls into sessions
A session groups multiple traces together — useful for multi-turn conversations. All messages in one conversation appear together on the Langfuse Sessions tab.
from langfuse_custom_tracer import observe, set_user, set_session, end_session
observe()
set_user("user-123")
session_id = set_session() # auto-generates a UUID; returns it so you can store it
response1 = model.generate_content("Hello")
response2 = model.generate_content("Tell me more")
response3 = model.generate_content("Thanks, that is helpful")
end_session() # clears the session so the next conversation starts fresh
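On a later request you can resume the same conversation by passing the stored ID back to set_session(). A small sketch (the handler name and how you store the ID are illustrative, not part of the library):

def handle_followup(user_id: str, stored_session_id: str, message: str):
    set_user(user_id)
    set_session(stored_session_id)  # reuse the stored ID instead of generating a new one
    response = model.generate_content(message)
    return response.text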
Attach quality scores
After a call completes, you can attach a score — useful for recording user feedback (thumbs up/down), LLM-as-a-judge evaluations, or custom metrics.
from langfuse_custom_tracer import observe, get_trace_id, score
observe()
response = model.generate_content("Explain recursion")
trace_id = get_trace_id() # capture the ID immediately after the call
# Later — for example when the user clicks thumbs up
score("thumbs_up", 1.0, trace_id=trace_id, comment="Very clear explanation")
score("relevance", 0.95, trace_id=trace_id, data_type="NUMERIC")
score("hallucination", False, trace_id=trace_id, data_type="BOOLEAN")
Async support
Auto tracing works with asyncio out of the box. Each asyncio task has its own isolated context via Python's contextvars, so concurrent requests never mix up each other's user IDs or session IDs.
import asyncio
from langfuse_custom_tracer import observe, set_user
observe()
async def process_user(user_id: str, messages: list):
    set_user(user_id)  # ContextVar — isolated per asyncio task
    tasks = [model.generate_content_async(m) for m in messages]
    return await asyncio.gather(*tasks)

# Both users are processed concurrently without interfering with each other
async def main():
    await asyncio.gather(
        process_user("user-1", ["msg1", "msg2"]),
        process_user("user-2", ["msg3", "msg4"]),
    )

asyncio.run(main())
Manual tracing
When you want full control — custom names, explicit inputs and outputs, metadata — use the context manager API directly. This is particularly useful for labelling multi-step pipelines so each step is clearly visible in Langfuse.
With Google Gemini
from langfuse_custom_tracer import load_env, create_langfuse_client, GeminiTracer
import google.generativeai as genai
import os
load_env()
lf = create_langfuse_client(os.getenv("LANGFUSE_SECRET_KEY"), os.getenv("LANGFUSE_PUBLIC_KEY"))
tracer = GeminiTracer(lf)
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")
with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
with tracer.generation("extract-data", model="gemini-2.0-flash",
input="Extract name, amount, date") as gen:
response = model.generate_content("Extract name, amount, date from this invoice: ...")
usage = tracer.extract_usage(response, model="gemini-2.0-flash")
gen.update(output=response.text, usage_details=usage)
span.update(output="Extraction complete")
tracer.flush() # required at end of script
With Anthropic Claude
from langfuse_custom_tracer import load_env, create_langfuse_client, AnthropicTracer
from anthropic import Anthropic
import os
load_env()
lf = create_langfuse_client(os.getenv("LANGFUSE_SECRET_KEY"), os.getenv("LANGFUSE_PUBLIC_KEY"))
tracer = AnthropicTracer(lf)
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
with tracer.generation("extract-data", model="claude-3-5-sonnet-20241022") as gen:
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
messages=[{"role": "user", "content": "Extract name, amount, date from this invoice: ..."}]
)
usage = tracer.extract_usage(response, model="claude-3-5-sonnet-20241022")
gen.update(output=response.content[0].text, usage_details=usage)
span.update(output="Extraction complete")
tracer.flush()
Multi-step pipelines
Traces can be nested to any depth. Each tracer.trace() block inside another becomes a child span. Langfuse uses OpenTelemetry context propagation to detect the nesting automatically; you do not need to pass parent IDs manually.
with tracer.trace("document-processing", user_id="user-123",
metadata={"doc_type": "invoice"}) as root:
with tracer.trace("step-1-ocr"):
with tracer.generation("extract-text", model="gemini-2.0-flash-lite") as gen:
text = model.generate_content("Extract all text from this image.")
gen.update(output=text.text, usage_details=tracer.extract_usage(text, model="gemini-2.0-flash-lite"))
with tracer.trace("step-2-classify"):
with tracer.generation("classify-doc", model="gemini-2.0-flash") as gen:
classification = model.generate_content(f"Classify this document: {text.text}")
gen.update(output=classification.text, usage_details=tracer.extract_usage(classification, model="gemini-2.0-flash"))
with tracer.trace("step-3-extract-fields"):
with tracer.generation("get-fields", model="gemini-2.0-flash") as gen:
fields = model.generate_content(f"Extract name, amount, date from: {text.text}")
gen.update(output=fields.text, usage_details=tracer.extract_usage(fields, model="gemini-2.0-flash"))
root.update(output="All steps complete")
tracer.flush()
In Langfuse you will see a clean tree: the root trace contains three child spans, each with its own latency, token count, and cost. The root trace also shows the combined totals.
Error handling
Wrap AI calls in a try/except inside the generation block. Even if the call fails, the trace is sent to Langfuse with the error attached — so you can track failure rates over time.
with tracer.trace("risky-task"):
with tracer.generation("ai-call", model="gemini-2.0-flash") as gen:
try:
response = model.generate_content("...")
usage = tracer.extract_usage(response, model="gemini-2.0-flash")
gen.update(output=response.text, usage_details=usage)
except Exception as e:
gen.update(status_code=500, error=str(e))
raise # re-raise so your application can handle it
Langfuse dashboard
After running your code, open cloud.langfuse.com. Each traced call appears in the Traces list. Here is what a typical trace looks like:
└─ Generation: extract-data
     Cost: $0.000287 (input $0.000234 + output $0.000053)
API reference
create_langfuse_client()
Creates and returns a Langfuse client. Pass the returned object into your tracer constructor.
| Parameter | Type | Description | Required |
|---|---|---|---|
| secret_key | str | Your Langfuse secret key (sk-lf-...) | required |
| public_key | str | Your Langfuse public key (pk-lf-...) | required |
| host | str | EU: https://cloud.langfuse.com · US: https://us.cloud.langfuse.com | optional |
lf = create_langfuse_client(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host="https://us.cloud.langfuse.com",  # optional — US region
)
load_env()
Loads environment variables from a .env file. Requires the [env] extra (python-dotenv). Call this at the very top of your script, before anything else.
| Parameter | Type | Description | Required |
|---|---|---|---|
| path | str | Path to the env file. Defaults to .env in the current directory. | optional |
load_env() # reads .env in current directory
load_env(".env.production") # reads a custom file
tracer.trace()
Creates a root-level span — the top of your trace tree. Everything nested inside this block becomes a child span. You can nest trace() inside another trace() to build multi-level trees.
| Parameter | Type | Description | Required |
|---|---|---|---|
| name | str | A label for this span, e.g. "invoice-processing" | required |
| input | any | The input to this step (any Python value — shown in Langfuse) | optional |
| metadata | dict | Custom key/value pairs, e.g. {"version": "1.0"} | optional |
| user_id | str | Identifier for the user who triggered this trace | optional |
| session_id | str | Groups this trace into a conversation session | optional |
| tags | list[str] | String labels for filtering in Langfuse, e.g. ["production", "batch"] | optional |
Call span.update(output=...) inside the block to record the output after the work is done.
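For example, a span that uses every parameter from the table (assuming the tracer from the manual-tracing setup above; the session ID here is illustrative):

with tracer.trace(
    "invoice-processing",
    input={"file": "invoice.pdf"},
    metadata={"version": "1.0"},
    user_id="user-123",
    session_id="session-abc",
    tags=["production", "batch"],
) as span:
    # ... nested generation() calls go here ...
    span.update(output="Extraction complete")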
tracer.generation()
Creates a generation span — one individual LLM call. Nest this inside a trace() block. After the AI call returns, call gen.update(output=..., usage_details=...) to record the result.
| Parameter | Type | Description | Required |
|---|---|---|---|
| name | str | Label for this specific LLM call, e.g. "extract-data" | required |
| model | str | The model name string, e.g. "gemini-2.0-flash" | optional |
| input | any | The prompt or messages you sent | optional |
| metadata | dict | Custom metadata, e.g. temperature or system prompt version | optional |
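A typical generation block, following the pattern from the manual-tracing examples above (assumes tracer and model from that setup; the metadata values are illustrative):

with tracer.generation(
    "extract-data",
    model="gemini-2.0-flash",
    input="Extract name, amount, date",
    metadata={"temperature": 0.2},  # illustrative metadata
) as gen:
    response = model.generate_content("Extract name, amount, date from this invoice: ...")
    gen.update(output=response.text,
               usage_details=tracer.extract_usage(response, model="gemini-2.0-flash"))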
tracer.extract_usage()
Reads token counts from the model response object and calculates costs using dynamic pricing. Pass the returned dict directly to gen.update(usage_details=...).
| Parameter | Type | Description | Required |
|---|---|---|---|
| response | object | The raw response object returned by the AI SDK | required |
| model | str | Model name — used to look up the correct pricing | required |
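A sketch of typical use, continuing the generation example above. The exact keys of the returned dict are not spelled out in this reference, so the commented shape is an illustrative assumption pieced together from the fields this page describes:

usage = tracer.extract_usage(response, model="gemini-2.0-flash")
# Assumed shape; the exact key names are illustrative, not confirmed:
# {
#     "input": 1000,                      # input tokens
#     "output": 500,                      # output tokens
#     "unit": "TOKENS",
#     "input_cost": 0.00015,              # USD
#     "output_cost": 0.00030,             # USD
#     "pricing_source": "json",           # or "default" (built-in fallback)
#     "pricing_version": "2026-04-22-v1",
# }
gen.update(output=response.text, usage_details=usage)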
"TOKENS""json" (remote) or "default" (fallback)"2026-04-22-v1"Sends all buffered trace data to Langfuse and blocks until the upload completes.
Call tracer.flush() at the end of short-lived scripts (CLI tools, batch jobs, notebooks). Long-running servers flush automatically in the background, so you can skip it there.

Auto tracing API
These functions are only available when using observe() mode.
| Function | Description |
|---|---|
| observe() | Patches the Gemini and Anthropic SDKs globally. Call once at startup. |
| set_user(user_id) | Tags all subsequent calls in the current context with the given user ID. |
| set_session(session_id=None) | Starts a new session. Auto-generates a UUID if no ID is passed. Returns the session ID. |
| end_session() | Clears the current session from the context. |
| get_trace_id() | Returns the trace ID of the most recent AI call. |
| score(name, value, trace_id=None, comment=None, data_type="NUMERIC") | Attaches a quality score to a trace. data_type can be "NUMERIC" or "BOOLEAN". |
Dynamic pricing
Pricing is decoupled from source code and managed via a remote JSON file. When AI providers change their prices, your cost numbers update automatically — no new library version required.
How it works
Fetch
On first use, the library fetches pricing.json from a remote URL (GitHub by default).
Cache
The data is cached in memory for 10 minutes (TTL-based). Subsequent calls within that window use the cache.
Fallback
If the remote is unavailable, the library uses the last cached value. If no cache exists, it uses built-in default prices. It never crashes.
Audit trail
Every trace records pricing_source and pricing_version so you can trace cost calculations back to a specific pricing snapshot.
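A minimal sketch of this fetch/cache/fallback flow, assuming a module-level cache and illustrative names (PRICING_URL, DEFAULT_PRICES, get_prices); this is not the library's actual implementation:

import json
import time
import urllib.request

PRICING_URL = "https://example.com/pricing.json"  # hypothetical URL
DEFAULT_PRICES = {"gemini-2.0-flash": {"input": 0.15, "output": 0.60}}  # built-in fallback
TTL_SECONDS = 600  # 10 minutes, matching the documented TTL

_cache = {"data": None, "fetched_at": 0.0}

def get_prices() -> dict:
    now = time.time()
    if _cache["data"] is not None and now - _cache["fetched_at"] < TTL_SECONDS:
        return _cache["data"]  # fresh cache hit: no network call
    try:
        with urllib.request.urlopen(PRICING_URL, timeout=5) as resp:
            _cache["data"] = json.loads(resp.read())
            _cache["fetched_at"] = now
    except Exception:
        if _cache["data"] is None:
            return DEFAULT_PRICES  # no cache yet: use built-in defaults
        # network failed: keep serving the stale cache (never crash)
    return _cache["data"]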
Use a custom pricing URL
Point the library at your own pricing file by setting an environment variable or passing a URL in code:
# Option 1 — environment variable
PRICING_JSON_URL=https://your-domain.com/pricing.json
# Option 2 — in code
from langfuse_custom_tracer import get_pricing_manager
pm = get_pricing_manager(url="https://your-domain.com/pricing.json")
price, version, source = pm.get_price("gemini-2.0-flash")
print(f"Input: ${price['input']} per 1M tokens (v{version}, from {source})")
Supported models
All prices are per 1 million tokens (Q1 2026). Prices update automatically via the remote JSON.
Google Gemini
| Model | Input | Output | Cache read |
|---|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 | $0.3125 |
| gemini-2.0-flash | $0.15 | $0.60 | $0.0375 |
| gemini-2.0-flash-lite | $0.075 | $0.30 | $0.01875 |
| gemini-1.5-pro | $1.25 | $5.00 | $0.3125 |
| gemini-1.5-flash | $0.075 | $0.30 | $0.01875 |
| gemini-1.5-flash-8b | $0.0375 | $0.15 | $0.01 |
Anthropic Claude
| Model | Input | Output | Cache read | Cache write |
|---|---|---|---|---|
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 | $0.30 | $3.75 |
| claude-3-5-haiku-20241022 | $0.80 | $4.00 | $0.08 | $1.00 |
| claude-3-opus-20240229 | $15.00 | $75.00 | $1.50 | $18.75 |
| claude-3-sonnet-20240229 | $3.00 | $15.00 | $0.30 | $3.75 |
| claude-3-haiku-20240307 | $0.80 | $4.00 | $0.08 | $1.00 |
Supported SDKs
google-generativeai
Patches GenerativeModel.generate_content()
google-genai
Patches Models.generate_content()
anthropic
Patches Messages.create()
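Under the hood, observe() presumably follows the standard wrap-and-delegate monkey-patching pattern. A minimal sketch for the google-generativeai case (illustrative only, not the library's actual code):

import functools
import time
import google.generativeai as genai

# Keep a reference to the original method, then swap in a wrapper.
_original = genai.GenerativeModel.generate_content

@functools.wraps(_original)
def _traced_generate_content(self, *args, **kwargs):
    start = time.monotonic()
    try:
        response = _original(self, *args, **kwargs)
        latency_ms = (time.monotonic() - start) * 1000
        # record model, prompt, output, token usage and latency_ms here
        return response
    except Exception:
        # mark the trace as failed, then let the error propagate
        raise

genai.GenerativeModel.generate_content = _traced_generate_content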
Testing
81 unit tests with 64% overall coverage. The full suite runs in approximately 1 second.
# run the full test suite
pytest
# run with a coverage report
pytest --cov
# run a single test
pytest tests/test_gemini_tracer.py::TestGeminiTracer::test_extract_usage_basic -v
# run by module
pytest tests/test_anthropic_tracer.py -v
pytest tests/test_pricing_manager.py -v
Security
Never commit .env files
The .env file is already listed in .gitignore. Do not remove it from there. If you accidentally commit keys, rotate them immediately.
Keys only from environment variables
Always read secrets via os.getenv() or load_env(). Never hardcode a key string in source code. The library raises an ImportError if required keys are missing.
HTTPS only
All communication with Langfuse and the remote pricing endpoint is encrypted via HTTPS.