RAG Evaluation Guide: How to Know If Your RAG Is Good

Evaluation Dimensions

Three LLM-as-judge checks cover the RAG pipeline end to end: did retrieval find useful context, does the answer stick to that context, and does the answer actually address the query.

# 1. Context relevance
context_relevance = llm.judge(
    "Context: {context}\nQuery: {query}\n"
    "Rate relevance 1-5:"
)

# 2. Answer faithfulness
answer_faithfulness = llm.judge(
    "Context: {context}\nAnswer: {answer}\n"
    "Does answer match context? Yes/No:"
)

# 3. Answer relevance
answer_relevance = llm.judge(
    "Query: {query}\nAnswer: {answer}\n"
    "Does answer address query? Yes/No:"
)

RAGAS Scoring

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
# each metric returns a 0-1 score (higher is better)
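Before wiring up RAGAS, it can help to smoke-test a pipeline with a dependency-free proxy. The sketch below scores faithfulness as simple token overlap between answer and context; this is a crude stand-in and NOT the statement-level check RAGAS actually performs.

```python
# Crude 0-1 faithfulness proxy: fraction of answer tokens that also
# appear in the context. Useful only as a smoke test; real faithfulness
# metrics (e.g. RAGAS) verify individual claims, not word overlap.

def overlap_faithfulness(context: str, answer: str) -> float:
    """Return the fraction of answer tokens present in the context."""
    ctx_tokens = set(context.lower().split())
    ans_tokens = answer.lower().split()
    if not ans_tokens:
        return 0.0
    return sum(tok in ctx_tokens for tok in ans_tokens) / len(ans_tokens)
```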

Conclusion

RAG evaluation rests on a trio of checks: context relevance, answer faithfulness, and answer relevance.

Run evaluation regularly so you can see when scores drift and know whether retrieval or generation needs optimizing.
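"Run evaluation regularly" is easiest to enforce as a gate in CI: fail the run when any metric drops below a floor. A minimal sketch, with illustrative thresholds you would tune for your own system:

```python
# Regression gate for scheduled evaluation runs. Thresholds here are
# illustrative assumptions, not recommended values.

THRESHOLDS = {"faithfulness": 0.8, "answer_relevancy": 0.7}

def failing_metrics(scores: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of metrics that fell below their threshold."""
    return [name for name, floor in thresholds.items()
            if scores.get(name, 0.0) < floor]

def gate(scores: dict) -> bool:
    """True if the run passes; False if any metric regressed."""
    return not failing_metrics(scores)
```

Feeding this the dictionary of 0-1 scores from an evaluation run gives a single pass/fail signal plus the list of metrics that need attention.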