Langfuse Custom Tracer

✓ 81 tests passing · 64% coverage · Python 3.10+ · MIT license

A lightweight Python library that records every AI call your application makes — what was sent, what came back, how many tokens were used, and exactly how much it cost. All of it flows into your Langfuse dashboard automatically.

Google Gemini · Anthropic Claude · Dynamic pricing · OpenTelemetry · Zero-setup auto tracing

What is this library?

When you build applications with AI models, you quickly need answers to questions like: how many tokens did that cost? which step is the slowest? which user is spending the most? This library answers all of those — automatically — by integrating with Langfuse, an open-source observability platform for LLM apps.

🔢 Automatic token counting
Reads token usage directly from the API response. No manual parsing required.

💰 Dynamic cost calculation
Fetches live pricing from a remote JSON file. Prices update without any code changes.

🌿 Nested trace visualisation
Multi-step pipelines appear as a parent-child tree in Langfuse.

⚡ Zero-setup auto tracing
Call observe() once and all AI calls are traced from that point on.

🛡️ Graceful degradation
If the network is unavailable, the library uses cached pricing and never crashes your app.

⏱️ TTL-based caching
Pricing data is cached for 10 minutes, so repeated requests do not trigger a network call each time.

Installation

Install with pip. Pick the extras that match the AI providers you use. If you are unsure, install everything with [all].

1. Core library only
Just the tracer, no provider SDKs.

pip install langfuse-custom-tracer

2. Add .env file support
Lets you load API keys from a .env file using load_env().

pip install "langfuse-custom-tracer[env]"   # quotes keep zsh from expanding the brackets

3. Add provider support
Choose the providers you need, or install everything at once.

# Google Gemini only
pip install "langfuse-custom-tracer[gemini]"

# Anthropic Claude only
pip install "langfuse-custom-tracer[anthropic]"

# Everything — recommended for most projects
pip install "langfuse-custom-tracer[all]"

Quick start

Get up and running in three steps. You will need API keys from Langfuse and at least one AI provider.

Step 1 — Get your API keys

Langfuse (required)
Sign up at cloud.langfuse.com. Find your secret key and public key in project settings.

Google Gemini (optional)
Get your key from ai.google.dev. Only needed if you use Gemini models.

Anthropic Claude (optional)
Get your key from console.anthropic.com. Only needed if you use Claude models.

Step 2 — Create a .env file

In your project root, create a file named .env and add your keys:

# .env — never commit this file to version control
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...

GEMINI_API_KEY=...
ANTHROPIC_API_KEY=...
Important: Never commit your .env file. Add it to .gitignore. The library will raise an error if required keys are missing.
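
For example, these .gitignore entries (a common convention, not required by the library) keep the file and any environment-specific variants such as .env.production out of version control:

# .gitignore
.env
.env.*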

Step 3 — Run your first traced call

from langfuse_custom_tracer import load_env, observe
import os

load_env()   # reads your .env file
observe()    # turns on automatic tracing — do this before importing AI SDKs

import google.generativeai as genai
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")

# This call is now automatically traced and sent to Langfuse
response = model.generate_content("Summarise the history of Python in 3 sentences.")
print(response.text)
Open your Langfuse dashboard after running the script. You will see the trace with token counts, latency, and cost already calculated.

Auto tracing

The recommended approach for most applications. Call observe() once at startup and every subsequent AI call is recorded automatically — no wrapping, no boilerplate.

Call observe() before importing any AI SDK. It works by patching the SDK internally, so the import order matters.
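
To make that concrete, here is a minimal sketch of the patching idea. This is illustrative only, not the library's actual implementation; the real wrapper builds a Langfuse generation instead of printing.

# Illustrative sketch of SDK patching, not the library's real code.
import time
import google.generativeai as genai

_original = genai.GenerativeModel.generate_content

def _traced_generate_content(self, *args, **kwargs):
    start = time.perf_counter()
    response = _original(self, *args, **kwargs)             # the real API call
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"traced generate_content: {elapsed_ms:.0f} ms")  # stand-in for sending a span
    return response

genai.GenerativeModel.generate_content = _traced_generate_content

Import order matters for a related reason: any code that saved a direct reference to the original method before observe() ran would bypass the wrapper.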

What gets recorded on every call

🤖 Model name
The exact model string passed to the API.

📥 Input prompt
The full message or prompt sent to the model.

📤 Output response
The model's full reply.

🔢 Token counts
Input tokens, output tokens, and cached tokens (where applicable).

💰 Cost in USD
Calculated using dynamic pricing. Input and output costs are recorded separately.

⏱️ Latency
Wall-clock time in milliseconds from request to response.

⚠️ Errors
If the call fails, the error is captured and the trace is marked as failed in Langfuse.

Tag calls with a user ID

Call set_user() at the start of each request to tag all subsequent AI calls with that user's ID. Langfuse aggregates cost and usage per user automatically.

from langfuse_custom_tracer import observe, set_user

observe()

# In a web app, set the user at the start of each request
def handle_chat_request(user_id: str, message: str):
    set_user(user_id)

    # All AI calls below are now tagged to user_id in Langfuse
    response = model.generate_content(message)
    return response.text

Group calls into sessions

A session groups multiple traces together — useful for multi-turn conversations. All messages in one conversation appear together on the Langfuse Sessions tab.

from langfuse_custom_tracer import observe, set_user, set_session, end_session

observe()

set_user("user-123")
session_id = set_session()   # auto-generates a UUID; returns it so you can store it

response1 = model.generate_content("Hello")
response2 = model.generate_content("Tell me more")
response3 = model.generate_content("Thanks, that is helpful")

end_session()   # clears the session so the next conversation starts fresh

Attach quality scores

After a call completes, you can attach a score — useful for recording user feedback (thumbs up/down), LLM-as-a-judge evaluations, or custom metrics.

from langfuse_custom_tracer import observe, get_trace_id, score

observe()

response = model.generate_content("Explain recursion")
trace_id = get_trace_id()   # capture the ID immediately after the call

# Later — for example when the user clicks thumbs up
score("thumbs_up",    1.0,  trace_id=trace_id, comment="Very clear explanation")
score("relevance",    0.95, trace_id=trace_id, data_type="NUMERIC")
score("hallucination", False, trace_id=trace_id, data_type="BOOLEAN")

Async support

Auto tracing works with asyncio out of the box. Each asyncio task has its own isolated context via Python's contextvars, so concurrent requests never mix up each other's user IDs or session IDs.

import asyncio
from langfuse_custom_tracer import observe, set_user

observe()

async def process_user(user_id: str, messages: list):
    set_user(user_id)   # ContextVar — isolated per asyncio task
    tasks = [model.generate_content_async(m) for m in messages]
    return await asyncio.gather(*tasks)

# Both users are processed concurrently without interfering with each other
async def main():
    await asyncio.gather(
        process_user("user-1", ["msg1", "msg2"]),
        process_user("user-2", ["msg3", "msg4"]),
    )

asyncio.run(main())

Manual tracing

When you want full control — custom names, explicit inputs and outputs, metadata — use the context manager API directly. This is particularly useful for labelling multi-step pipelines so each step is clearly visible in Langfuse.

With Google Gemini

from langfuse_custom_tracer import load_env, create_langfuse_client, GeminiTracer
import google.generativeai as genai
import os

load_env()

lf     = create_langfuse_client(os.getenv("LANGFUSE_SECRET_KEY"), os.getenv("LANGFUSE_PUBLIC_KEY"))
tracer = GeminiTracer(lf)

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")

with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
    with tracer.generation("extract-data", model="gemini-2.0-flash",
                          input="Extract name, amount, date") as gen:
        response = model.generate_content("Extract name, amount, date from this invoice: ...")
        usage    = tracer.extract_usage(response, model="gemini-2.0-flash")
        gen.update(output=response.text, usage_details=usage)
    span.update(output="Extraction complete")

tracer.flush()   # required at end of script

With Anthropic Claude

from langfuse_custom_tracer import load_env, create_langfuse_client, AnthropicTracer
from anthropic import Anthropic
import os

load_env()

lf     = create_langfuse_client(os.getenv("LANGFUSE_SECRET_KEY"), os.getenv("LANGFUSE_PUBLIC_KEY"))
tracer = AnthropicTracer(lf)
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
    with tracer.generation("extract-data", model="claude-3-5-sonnet-20241022") as gen:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,   # required by the Anthropic Messages API
            messages=[{"role": "user", "content": "Extract name, amount, date from this invoice: ..."}]
        )
        usage = tracer.extract_usage(response, model="claude-3-5-sonnet-20241022")
        gen.update(output=response.content[0].text, usage_details=usage)
    span.update(output="Extraction complete")

tracer.flush()

Multi-step pipelines

Traces can be nested to any depth. Each tracer.trace() block inside another becomes a child span. Langfuse uses OpenTelemetry context propagation to detect the nesting automatically — you do not need to pass parent IDs manually.

with tracer.trace("document-processing", user_id="user-123",
                 metadata={"doc_type": "invoice"}) as root:

    with tracer.trace("step-1-ocr"):
        with tracer.generation("extract-text", model="gemini-2.0-flash-lite") as gen:
            text = model.generate_content("Extract all text from this image.")
            gen.update(output=text.text, usage_details=tracer.extract_usage(text, model="gemini-2.0-flash-lite"))

    with tracer.trace("step-2-classify"):
        with tracer.generation("classify-doc", model="gemini-2.0-flash") as gen:
            classification = model.generate_content(f"Classify this document: {text.text}")
            gen.update(output=classification.text, usage_details=tracer.extract_usage(classification, model="gemini-2.0-flash"))

    with tracer.trace("step-3-extract-fields"):
        with tracer.generation("get-fields", model="gemini-2.0-flash") as gen:
            fields = model.generate_content(f"Extract name, amount, date from: {text.text}")
            gen.update(output=fields.text, usage_details=tracer.extract_usage(fields, model="gemini-2.0-flash"))

    root.update(output="All steps complete")

tracer.flush()

In Langfuse you will see a clean tree: the root trace contains three child spans, each with its own latency, token count, and cost. The root trace also shows the combined totals.

Error handling

Wrap AI calls in a try/except inside the generation block. Even if the call fails, the trace is sent to Langfuse with the error attached — so you can track failure rates over time.

with tracer.trace("risky-task"):
    with tracer.generation("ai-call", model="gemini-2.0-flash") as gen:
        try:
            response = model.generate_content("...")
            usage    = tracer.extract_usage(response, model="gemini-2.0-flash")
            gen.update(output=response.text, usage_details=usage)
        except Exception as e:
            gen.update(status_code=500, error=str(e))
            raise   # re-raise so your application can handle it

Langfuse dashboard

After running your code, open cloud.langfuse.com. Each traced call appears in the Traces list. Here is what a typical trace looks like:

Trace: invoice-processing
  ID: trace-abc123 · Duration: 2.3s · User: user-123 · Tags: [production, batch] · Status: success
  └─ Generation: extract-data
       Model:   gemini-2.0-flash
       Tokens:  input 156 · output 89 · total 245
       Cost:    $0.0000768 (input $0.0000234 + output $0.0000534)
       Pricing: source=json · version=2026-04-22-v1
       Latency: 1.8s
       Output:  "Name: John Doe, Amount: $500, Date: 2025-03-31"

The dashboard also aggregates all calls: total daily tokens, total cost, cost per user, and a breakdown by model — with no extra configuration.

API reference

create_langfuse_client() (langfuse_custom_tracer)

Creates and returns a Langfuse client. Pass the returned object into your tracer constructor.

Parameter   Type  Required?  Description
secret_key  str   required   Your Langfuse secret key (sk-lf-...)
public_key  str   required   Your Langfuse public key (pk-lf-...)
host        str   optional   EU: https://cloud.langfuse.com · US: https://us.cloud.langfuse.com

lf = create_langfuse_client(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host="https://us.cloud.langfuse.com"  # optional — US region
)
load_env() (langfuse_custom_tracer)

Loads environment variables from a .env file. Requires the [env] extra (python-dotenv). Call this at the very top of your script, before anything else.

Parameter  Type  Required?  Description
path       str   optional   Path to the env file. Defaults to .env in the current directory.

load_env()                      # reads .env in current directory
load_env(".env.production")    # reads a custom file
tracer.trace() (context manager)

Creates a root-level span — the top of your trace tree. Everything nested inside this block becomes a child span. You can nest trace() inside another trace() to build multi-level trees.

Parameter   Type       Required?  Description
name        str        required   A label for this span, e.g. "invoice-processing"
input       any        optional   The input to this step (any Python value, shown in Langfuse)
metadata    dict       optional   Custom key/value pairs, e.g. {"version": "1.0"}
user_id     str        optional   Identifier for the user who triggered this trace
session_id  str        optional   Groups this trace into a conversation session
tags        list[str]  optional   String labels for filtering in Langfuse, e.g. ["production", "batch"]

Call span.update(output=...) inside the block to record the output after the work is done.
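
A minimal usage sketch combining several of these parameters; do_work() is a placeholder for the step being traced:

with tracer.trace("nightly-batch", input={"batch_id": 42},
                  user_id="user-123", tags=["production", "batch"]) as span:
    result = do_work()          # placeholder for your own logic
    span.update(output=result)  # record the output once the work is done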

tracer.generation() (context manager)

Creates a generation span — one individual LLM call. Nest this inside a trace() block. After the AI call returns, call gen.update(output=..., usage_details=...) to record the result.

Parameter  Type  Required?  Description
name       str   required   Label for this specific LLM call, e.g. "extract-data"
model      str   optional   The model name string, e.g. "gemini-2.0-flash"
input      any   optional   The prompt or messages you sent
metadata   dict  optional   Custom metadata, e.g. temperature or system prompt version
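
For example, assuming model is configured as in the manual-tracing section above:

prompt = "Summarise this contract in two sentences."
with tracer.generation("summarise", model="gemini-2.0-flash", input=prompt) as gen:
    response = model.generate_content(prompt)
    gen.update(output=response.text,
               usage_details=tracer.extract_usage(response, model="gemini-2.0-flash"))
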
tracer.extract_usage() (GeminiTracer · AnthropicTracer)

Reads token counts from the model response object and calculates costs using dynamic pricing. Pass the returned dict directly to gen.update(usage_details=...).

Parameter  Type    Required?  Description
response   object  required   The raw response object returned by the AI SDK
model      str     required   Model name, used to look up the correct pricing

Returns a dict with these keys:

input            int    number of prompt tokens
output           int    number of completion tokens
total            int    input + output
unit             str    always "TOKENS"
inputCost        float  input cost in USD
outputCost       float  output cost in USD
totalCost        float  combined cost in USD
cachedTokens     int    (optional) cached tokens used (Gemini and Anthropic)
pricing_source   str    "json" (remote) or "default" (fallback)
pricing_version  str    version tag from the pricing file, e.g. "2026-04-22-v1"

tracer.flush() (BaseTracer)

Sends all buffered trace data to Langfuse and blocks until the upload completes.

Always call tracer.flush() at the end of short-lived scripts (CLI tools, batch jobs, notebooks). Long-running servers flush automatically in the background — you can skip this there.
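
In a script, a try/finally block guarantees the flush even when the job fails part-way; run_batch_job() is a placeholder for your script's main work:

try:
    run_batch_job(tracer)   # placeholder for the actual work
finally:
    tracer.flush()          # upload buffered traces even if an exception was raised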

Auto tracing API

These functions are only available when using observe() mode.

observe() — Patches the Gemini and Anthropic SDKs globally. Call once at startup.
set_user(user_id) — Tags all subsequent calls in the current context with the given user ID.
set_session(session_id=None) — Starts a new session. Auto-generates a UUID if no ID is passed. Returns the session ID.
end_session() — Clears the current session from the context.
get_trace_id() — Returns the trace ID of the most recent AI call.
score(name, value, trace_id=None, comment=None, data_type="NUMERIC") — Attaches a quality score to a trace. data_type can be "NUMERIC" or "BOOLEAN".
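
Putting these together, a typical request lifecycle might look like the following sketch, assuming model is configured as in the quick start:

from langfuse_custom_tracer import (
    observe, set_user, set_session, end_session, get_trace_id, score,
)

observe()   # once, at startup

def handle_conversation(user_id: str, messages: list[str]) -> list[str]:
    set_user(user_id)
    set_session()                      # one session per conversation
    replies = []
    for message in messages:
        response = model.generate_content(message)
        trace_id = get_trace_id()      # remember which trace to score later
        score("answered", 1.0, trace_id=trace_id)
        replies.append(response.text)
    end_session()
    return replies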

Dynamic pricing

Pricing is decoupled from source code and managed via a remote JSON file. When AI providers change their prices, your cost numbers update automatically — no new library version required.

How it works

1. Fetch
On first use, the library fetches pricing.json from a remote URL (GitHub by default).

2. Cache
The data is cached in memory for 10 minutes (TTL-based). Subsequent calls within that window use the cache.

3. Fallback
If the remote is unavailable, the library uses the last cached value. If no cache exists, it uses built-in default prices. It never crashes. (A sketch of this flow follows the list.)

4. Audit trail
Every trace records pricing_source and pricing_version so you can trace cost calculations back to a specific pricing snapshot.
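
Here is an illustrative sketch of that fetch, cache, and fallback flow. The names TTL_SECONDS, DEFAULT_PRICES, and get_prices() are placeholders for this example, not the library's internals:

import json
import time
import urllib.request

TTL_SECONDS = 600   # 10 minutes
DEFAULT_PRICES = {"gemini-2.0-flash": {"input": 0.15, "output": 0.60}}  # built-in fallback
_cache = {"data": None, "fetched_at": 0.0}

def get_prices(url: str) -> dict:
    now = time.time()
    if _cache["data"] is not None and now - _cache["fetched_at"] < TTL_SECONDS:
        return _cache["data"]   # fresh cache hit, no network call
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            _cache["data"] = json.loads(resp.read())
            _cache["fetched_at"] = now
    except Exception:
        pass   # network or parse failure: fall back to stale cache or defaults
    return _cache["data"] if _cache["data"] is not None else DEFAULT_PRICES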

Use a custom pricing URL

Point the library at your own pricing file by setting an environment variable or passing a URL in code:

# Option 1 — environment variable
PRICING_JSON_URL=https://your-domain.com/pricing.json

# Option 2 — in code
from langfuse_custom_tracer import get_pricing_manager

pm = get_pricing_manager(url="https://your-domain.com/pricing.json")
price, version, source = pm.get_price("gemini-2.0-flash")
print(f"Input: ${price['input']} per 1M tokens (v{version}, from {source})")

Supported models

All prices are in USD per 1 million tokens (snapshot as of Q1 2026). Prices update automatically via the remote JSON.

Google Gemini

Model                  Input    Output  Cache read
gemini-2.5-pro         $1.25    $10.00  $0.3125
gemini-2.0-flash       $0.15    $0.60   $0.0375
gemini-2.0-flash-lite  $0.075   $0.30   $0.01875
gemini-1.5-pro         $1.25    $5.00   $0.3125
gemini-1.5-flash       $0.075   $0.30   $0.01875
gemini-1.5-flash-8b    $0.0375  $0.15   $0.01
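
As a worked example, the extract-data call shown in the dashboard section (156 input tokens, 89 output tokens on gemini-2.0-flash) costs:

input_cost  = 156 / 1_000_000 * 0.15    # $0.0000234
output_cost =  89 / 1_000_000 * 0.60    # $0.0000534
total_cost  = input_cost + output_cost  # $0.0000768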

Anthropic Claude

Model                       Input   Output  Cache read  Cache write
claude-3-5-sonnet-20241022  $3.00   $15.00  $0.30       $3.75
claude-3-5-haiku-20241022   $0.80   $4.00   $0.08       $1.00
claude-3-opus-20240229      $15.00  $75.00  $1.50       $18.75
claude-3-sonnet-20240229    $3.00   $15.00  $0.30       $3.75
claude-3-haiku-20240307     $0.80   $4.00   $0.08       $1.00

Supported SDKs

SDK              Package              Patched method                      Status
Gemini (legacy)  google-generativeai  GenerativeModel.generate_content()  ✓ Supported
Gemini (new)     google-genai         Models.generate_content()           ✓ Supported
Anthropic        anthropic            Messages.create()                   ✓ Supported

Project structure

langfuse-custom-tracer/
├── langfuse_custom_tracer/
│   ├── __init__.py            # package exports
│   ├── client.py              # Langfuse client setup
│   ├── pricing_manager.py     # remote pricing: fetch, cache, fallback
│   ├── auto.py                # observe() — SDK patching logic
│   ├── context.py             # ContextVar state: user, session, trace ID
│   ├── factory.py             # tracer factory helper
│   ├── scoring.py             # score() helper
│   └── tracers/
│       ├── base.py            # BaseTracer — trace() and generation()
│       ├── gemini.py          # GeminiTracer — Gemini-specific extract_usage()
│       └── anthropic.py       # AnthropicTracer — Claude-specific extract_usage()
├── tests/                     # 81 unit tests
│   ├── conftest.py
│   ├── test_pricing_manager.py
│   ├── test_gemini_tracer.py
│   ├── test_anthropic_tracer.py
│   ├── test_base_tracer.py
│   ├── test_auto_patch.py
│   ├── test_factory.py
│   └── test_client.py
├── pricing.json               # local fallback pricing data
├── examples/
│   └── env_setup_example.py
├── SETUP.md
├── TESTING.md
└── pyproject.toml

Testing

81 unit tests with 64% overall coverage. The full suite runs in approximately 1 second.

81  total tests passing
43  AnthropicTracer tests (100% coverage)
19  PricingManager tests (79% coverage)
19  GeminiTracer tests (76% coverage)

# run the full test suite
pytest

# run with a coverage report
pytest --cov

# run a single test
pytest tests/test_gemini_tracer.py::TestGeminiTracer::test_extract_usage_basic -v

# run by module
pytest tests/test_anthropic_tracer.py -v
pytest tests/test_pricing_manager.py -v

Security

🚫 Never commit .env files
The .env file is already listed in .gitignore. Do not remove it from there. If you accidentally commit keys, rotate them immediately.

🔒 Keys only from environment variables
Always read secrets via os.getenv() or load_env(). Never hardcode a key string in source code. The library raises an error if required keys are missing.

🔐 HTTPS only
All communication with Langfuse and the remote pricing endpoint is encrypted via HTTPS.