LLM Observability: Practical Methods for Monitoring Prompts and Responses

Three Key Metric Types

# 1. Latency metrics
latency_seconds  # request to response time
first_token_latency  # time to first token
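Both latency metrics can be captured in one pass over a streaming response. A minimal sketch, assuming the response arrives as an iterable of tokens (the `iter([...])` at the end is a stand-in for a real streaming client call):

```python
import time

def measure_latency(stream):
    """Return (latency_seconds, first_token_latency) for a token stream."""
    start = time.perf_counter()
    first_token_latency = None
    for token in stream:
        # record time-to-first-token exactly once
        if first_token_latency is None:
            first_token_latency = time.perf_counter() - start
    latency_seconds = time.perf_counter() - start
    return latency_seconds, first_token_latency

# stand-in for a streaming LLM response
total, first = measure_latency(iter(["Hello", ",", " world"]))
```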

# 2. Quality metrics
response_length  # answer length
tokens_per_second  # generation speed (tokens/s)
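Generation speed is derived from the two values already tracked: token count and wall-clock latency. A trivial helper, guarding against division by zero:

```python
def tokens_per_second(token_count: int, latency_seconds: float) -> float:
    """Generation speed: tokens emitted divided by wall-clock time."""
    if latency_seconds <= 0:
        return 0.0
    return token_count / latency_seconds

tokens_per_second(128, 3.2)  # → 40.0
```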

# 3. Security metrics
prompt_injection_score  # injection risk score
pii_detection_count  # sensitive info detections
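A `pii_detection_count` can start as simple pattern matching. The regexes below are a naive, illustrative baseline (an assumption, not a production detector; real systems use dedicated PII classifiers):

```python
import re

# Deliberately simple patterns; illustrative only.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def pii_detection_count(text: str) -> int:
    """Count matches of simple PII patterns in a prompt or response."""
    return sum(len(p.findall(text)) for p in PII_PATTERNS)

pii_detection_count("contact me at alice@example.com")  # → 1
```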

Implementation

from prometheus_client import Counter, Histogram

# metrics are created once at module level, not per request
REQUESTS = Counter('llm_requests_total', 'Total LLM requests')
LATENCY = Histogram('llm_latency_seconds', 'Request latency in seconds')

class LLMObservability:
    def track(self, prompt, response, latency):
        # record to Prometheus
        REQUESTS.inc()
        LATENCY.observe(latency)

        # store full payloads for a sample of requests only
        if should_sample(prompt):
            storage.store(prompt, response)
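The `should_sample` helper is left undefined above. One possible implementation (an assumption, not from the source) is deterministic hash-based sampling: the same prompt always gets the same decision, and roughly `rate` of distinct prompts are stored, which makes stored traces reproducible across retries.

```python
import hashlib

def should_sample(prompt: str, rate: float = 0.05) -> bool:
    """Deterministically sample ~`rate` of distinct prompts for storage."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    # map the first 8 bytes of the hash to [0, 1) and compare to the rate
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```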

Conclusion

The observability trio of latency, quality, and security metrics covers the essentials: how fast the model responds, how well it generates, and whether inputs or outputs pose a risk.

All three should be in place before launch, not bolted on afterwards.