You shipped your agent. Now what? Without monitoring, you won't know when it breaks, when costs spike, or when quality degrades. Observability for AI products differs from observability for traditional software.
What to Track
Latency
Time per tool call: Is any tool consistently slow?
Total task time: How long does the full agent workflow take?
Time to first token: For streaming responses, how quickly does the user see output?
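A lightweight way to capture per-tool latency is a timing wrapper around each tool call. This is an in-memory sketch; the `timed_tool` decorator, the `latencies` store, and the `search` tool are all illustrative, not part of any particular framework.

```python
import time
from functools import wraps

# Collected latencies keyed by tool name (in-memory sketch;
# in production you'd ship these to your metrics backend).
latencies: dict[str, list[float]] = {}

def timed_tool(name):
    """Wrap a tool function and record how long each call takes."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                latencies.setdefault(name, []).append(
                    time.perf_counter() - start)
        return wrapper
    return decorator

@timed_tool("search")
def search(query):
    return f"results for {query}"

search("observability")
print(len(latencies["search"]))  # one recorded call
```

With every call recorded per tool, "is any tool consistently slow?" becomes a simple aggregation over `latencies`.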
Cost
Tokens per request: Input + output tokens. Spikes indicate prompt bloat or runaway loops.
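Turning token counts into dollars is a one-line calculation once you know your provider's rates. The prices below are hypothetical placeholders; check your provider's actual rate card.

```python
# Hypothetical per-million-token prices -- illustrative only,
# not any provider's real rate card.
PRICE_PER_M = {"input": 3.00, "output": 15.00}

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed prices."""
    return (input_tokens * PRICE_PER_M["input"]
            + output_tokens * PRICE_PER_M["output"]) / 1_000_000

# A normal request vs. a suspected runaway loop:
print(round(request_cost(2_000, 1_000), 4))   # 0.021
print(round(request_cost(2_000, 80_000), 4))  # 1.206
```

The second call shows why token spikes matter: a loop that balloons output tokens can multiply cost per request by 50x or more.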
Traces
Log every step the agent takes. Each step includes: timestamp, tool called, input/output summary, and success/failure. This lets you debug agent failures by replaying the decision chain.
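A trace step can be as simple as a small record with exactly those fields. The `TraceStep` shape and the tool names here are illustrative, assuming you store traces as an ordered list per task.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TraceStep:
    """One step in an agent's decision chain (field names are illustrative)."""
    tool: str
    input_summary: str
    output_summary: str
    success: bool
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

trace: list[TraceStep] = []
trace.append(TraceStep("search", "query: pricing page", "3 results", True))
trace.append(TraceStep("fetch_url", "url: /pricing", "timeout", False))

# Replay the chain to find where the task went wrong:
for step in trace:
    status = "ok" if step.success else "FAILED"
    print(f"{step.timestamp} {step.tool}: {status}")
```

Replaying the list in order shows exactly which step failed and with what inputs, without re-running the agent.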
PostHog for AI Products
You already use PostHog. For AI products, track custom events:
agent_task_started — with task_type, model, estimated_complexity
agent_tool_called — with tool_name, latency_ms, success
agent_task_completed — with total_tokens, total_cost, quality_score
agent_error — with error_type, tool_name, recovery_attempted
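In code, each of these events is just a name plus a properties dict. The `capture` function below is a stand-in for your analytics client's capture call (e.g. PostHog's), and the property values are made up for illustration.

```python
events: list[dict] = []  # stand-in sink; in production, send to PostHog

def capture(event: str, properties: dict) -> None:
    """Stand-in for an analytics client's capture call."""
    events.append({"event": event, "properties": properties})

capture("agent_task_started",
        {"task_type": "summarize", "model": "example-model",
         "estimated_complexity": "low"})
capture("agent_tool_called",
        {"tool_name": "search", "latency_ms": 420, "success": True})
capture("agent_task_completed",
        {"total_tokens": 3100, "total_cost": 0.02, "quality_score": 0.9})

print([e["event"] for e in events])
```

Keeping property names consistent across events (`tool_name`, `task_type`) is what makes the dashboards below possible.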
Build dashboards that answer:
What are the most common task types?
What's the average cost per task type?
Which tools fail most frequently?
What's the trend in task completion rate?
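Each of these dashboard questions reduces to a group-by over the events. As a sketch, here is "average cost per task type" computed over a handful of fabricated `agent_task_completed` records:

```python
from collections import defaultdict

# Fabricated completed-task events; fields mirror agent_task_completed.
completed = [
    {"task_type": "summarize", "total_cost": 0.02},
    {"task_type": "summarize", "total_cost": 0.04},
    {"task_type": "research", "total_cost": 0.30},
]

# Group costs by task type, then average each group.
costs = defaultdict(list)
for e in completed:
    costs[e["task_type"]].append(e["total_cost"])

avg_cost = {t: round(sum(c) / len(c), 4) for t, c in costs.items()}
print(avg_cost)  # {'summarize': 0.03, 'research': 0.3}
```

In practice you'd express this as a PostHog insight or query rather than application code, but the aggregation is the same.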
Alerting
Set alerts for:
Cost spike: Daily cost > 2x average → alert
Quality drop: Completion rate < 90% → alert
Error rate: Any tool > 10% error rate → alert
Latency: Average task time > 2x baseline → alert
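The four thresholds above can be checked in one function run against rolling metrics. The metric and baseline field names here are assumptions; wire them to whatever your monitoring backend exposes.

```python
def check_alerts(metrics: dict, baseline: dict) -> list[str]:
    """Return fired alerts for the four thresholds (field names illustrative)."""
    alerts = []
    if metrics["daily_cost"] > 2 * baseline["daily_cost"]:
        alerts.append("cost spike")
    if metrics["completion_rate"] < 0.90:
        alerts.append("quality drop")
    for tool, rate in metrics["tool_error_rates"].items():
        if rate > 0.10:
            alerts.append(f"error rate: {tool}")
    if metrics["avg_task_seconds"] > 2 * baseline["avg_task_seconds"]:
        alerts.append("latency")
    return alerts

print(check_alerts(
    {"daily_cost": 55.0, "completion_rate": 0.93,
     "tool_error_rates": {"search": 0.02, "fetch_url": 0.15},
     "avg_task_seconds": 40},
    {"daily_cost": 20.0, "avg_task_seconds": 30},
))  # ['cost spike', 'error rate: fetch_url']
```

Note that cost and latency alert on a multiple of your own baseline, while quality and error rate use absolute thresholds: pick whichever makes the threshold stable for your workload.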
The Feedback Loop
The most valuable monitoring signal is user behavior:
User accepts agent output → positive signal
User edits agent output → partial success, log the delta
User redoes from scratch → failure, log the original output for eval
Over time, these signals become your eval dataset. Real-world failures are the best test cases.
❓ Quiz 1
What's the most important monitoring metric for catching runaway agent loops?
Runaway loops generate massive token usage. A spike in tokens per request is the fastest signal that something is wrong — the agent is repeating actions or generating excessive content.
🛠 Exercise 1
Design the monitoring setup for an agent you'd build for Muno Labs. List: (1) 5 custom events you'd track in PostHog, (2) 3 alert conditions with thresholds, (3) The key dashboard you'd build. Be specific to your actual use case.