LLM Observability: Practical Methods for Monitoring Prompt and Response
Three Key Metric Types
# 1. Latency metrics
latency_seconds         # end-to-end request-to-response time
first_token_latency     # time until the first token arrives (streaming)

# 2. Quality metrics
response_length         # response length in tokens
tokens_per_second       # generation speed

# 3. Security metrics
prompt_injection_score  # estimated risk that the prompt is an injection attempt
pii_detection_count     # number of sensitive-information (PII) matches found
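
A minimal sketch of how the quality and security metrics above might be computed per request. The regex patterns, the injection keyword list, and the helper names are illustrative assumptions only; production systems typically replace these heuristics with trained classifiers.

import re

# Hypothetical patterns; a real detector needs a trained classifier.
INJECTION_PATTERNS = [r"ignore (all|previous) instructions", r"system prompt"]
PII_PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b",        # US SSN-like number
                r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"]  # email address

def prompt_injection_score(prompt):
    # Fraction of known injection patterns matched (crude heuristic).
    hits = sum(bool(re.search(p, prompt, re.IGNORECASE)) for p in INJECTION_PATTERNS)
    return hits / len(INJECTION_PATTERNS)

def pii_detection_count(text):
    # Count of sensitive-information matches across all patterns.
    return sum(len(re.findall(p, text)) for p in PII_PATTERNS)

def tokens_per_second(token_count, latency_seconds):
    # Generation speed; guard against division by zero.
    return token_count / latency_seconds if latency_seconds > 0 else 0.0
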
Implementation

from prometheus_client import Counter, Histogram

REQUESTS = Counter('llm_requests_total', 'Total number of LLM requests')
LATENCY = Histogram('llm_latency_seconds', 'End-to-end LLM request latency in seconds')

class LLMObservability:
    def __init__(self, storage, should_sample):
        self.storage = storage              # prompt/response store (e.g., a database client)
        self.should_sample = should_sample  # predicate deciding which requests to keep

    def track(self, prompt, response, latency):
        # Record metrics to Prometheus.
        REQUESTS.inc()
        LATENCY.observe(latency)
        # Store only a sample of requests, not all of them.
        if self.should_sample(prompt):
            self.storage.store(prompt, response)
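
A usage sketch under stated assumptions: InMemoryStorage and the hash-based sampler are hypothetical stand-ins for a real store and sampling policy, and the response string stands in for an actual LLM call.

import time
import hashlib

class InMemoryStorage:
    # Toy store; swap in a database or object store in production.
    def __init__(self):
        self.records = []
    def store(self, prompt, response):
        self.records.append((prompt, response))

def hash_sampler(rate=0.1):
    # Deterministic sampling: hash the prompt and keep roughly `rate` of requests,
    # so identical prompts are always sampled (or skipped) consistently.
    def should_sample(prompt):
        return hashlib.sha256(prompt.encode('utf-8')).digest()[0] < rate * 256
    return should_sample

obs = LLMObservability(storage=InMemoryStorage(), should_sample=hash_sampler(0.1))

prompt = "Summarize this document."
start = time.perf_counter()
response = "..."  # stand-in for a real LLM client call
latency = time.perf_counter() - start
obs.track(prompt, response, latency)

Deterministic hash-based sampling makes debugging reproducible; a simpler alternative is random sampling with random.random() < rate.
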
Conclusion

Effective LLM observability rests on the trio of latency, quality, and security metrics. Put all three in place before launch, not after.