A Python decorator is all you need to trace LLM apps (open-source).
Most LLM evals treat the app as an end-to-end black box.
But LLM apps need component-level evals and tracing, because the issue can hide anywhere inside that box: the retriever, a tool call, or the LLM itself.
In @deepeval, you can do that with just 3 lines of code:
- Trace individual LLM components (tools, retrievers, generators) with the @observe decorator.
- Attach different metrics to each part.
- Get a visual breakdown of what’s working and what’s not.
Done!
You don't need to refactor any of your existing code.
See the example below for a RAG app.
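Here's roughly what that looks like for a simple RAG pipeline. This is a minimal sketch based on my reading of deepeval's tracing docs; the metric choices are illustrative, helper names like update_current_span may differ slightly across versions, and my_vector_db_search / my_llm_call are hypothetical stand-ins for your existing code:

from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, ContextualRelevancyMetric

# Retriever component: scored on whether the retrieved chunks are relevant to the query.
@observe(metrics=[ContextualRelevancyMetric()])
def retrieve(query: str) -> list[str]:
    chunks = my_vector_db_search(query)  # hypothetical: your existing retrieval logic
    update_current_span(
        test_case=LLMTestCase(
            input=query,
            actual_output="\n".join(chunks),
            retrieval_context=chunks,
        )
    )
    return chunks

# Generator component: scored on whether the final answer actually addresses the query.
@observe(metrics=[AnswerRelevancyMetric()])
def generate(query: str, chunks: list[str]) -> str:
    answer = my_llm_call(query, chunks)  # hypothetical: your existing LLM call
    update_current_span(test_case=LLMTestCase(input=query, actual_output=answer))
    return answer

# Top-level app: unchanged except for the decorator, which ties the whole trace together.
@observe()
def rag_app(query: str) -> str:
    return generate(query, retrieve(query))

Run this inside a deepeval test run and each decorated component shows up as its own span with its own metric scores, so you can see whether the retriever or the generator is the weak link.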
Deepeval is 100% open-source with 8500+ stars, and you can easily self-host it so your data stays where you want.
Find the repo in the replies!