RAG Evaluation Guide: How to Know If Your RAG Is Good
Evaluation Dimensions
A RAG pipeline is judged along three dimensions: did retrieval find relevant context, is the answer faithful to that context, and does the answer actually address the query. Each dimension can be scored with an LLM-as-judge prompt:
```python
# `llm.judge` is a hypothetical LLM-as-judge helper: it fills in the
# prompt template and returns the model's rating.

# 1. Context relevance: did retrieval return passages useful for the query?
context_relevance = llm.judge(
    "Context: {context}\nQuery: {query}\n"
    "Rate relevance 1-5:"
)

# 2. Answer faithfulness: is the answer grounded in the retrieved context?
answer_faithfulness = llm.judge(
    "Context: {context}\nAnswer: {answer}\n"
    "Is the answer supported by the context? Yes/No:"
)

# 3. Answer relevance: does the answer address the query?
answer_relevance = llm.judge(
    "Query: {query}\nAnswer: {answer}\n"
    "Does the answer address the query? Yes/No:"
)
```

RAGAS Scoring
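RAGAS scores a whole evaluation set rather than a single triple, so results are usually assembled as records with fixed column names. A minimal sketch of that record shape (the column names `question`, `answer`, and `contexts` follow RAGAS conventions for the metrics used here; the sample values are invented):

```python
# Assemble evaluation records in the column layout RAGAS-style
# evaluators expect: the question, the generated answer, and the
# list of retrieved context passages. Sample values are invented.
records = [
    {
        "question": "What is the capital of France?",
        "answer": "Paris is the capital of France.",
        "contexts": ["Paris is the capital and largest city of France."],
    },
]

# Sanity-check that every record has the required columns before evaluation.
required = {"question", "answer", "contexts"}
for rec in records:
    missing = required - rec.keys()
    assert not missing, f"record missing columns: {missing}"
```

A real evaluation set would hold dozens of such records, one per test query, built from your pipeline's actual retrieval and generation outputs.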
```python
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
# each metric is reported on a 0-1 scale
```

Conclusion
RAG evaluation comes down to a trio: context relevance, answer faithfulness, and answer quality. Run the evaluation regularly against a fixed test set so that when scores drop, you know which component of the pipeline to optimize.
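To make "run evaluation regularly" concrete, the three judge scores can be averaged over a fixed test set and compared against a threshold, so regressions surface automatically. A minimal sketch, assuming a `judge` callable that maps a prompt to a 0-1 score (the lambda below is a stand-in for a real LLM call; `evaluate_rag` is a hypothetical helper, not a library function):

```python
def evaluate_rag(examples, judge):
    """Average the three judge scores over a test set.

    `examples` holds (query, context, answer) triples; `judge` maps a
    filled prompt to a 0-1 score (an LLM call in practice).
    """
    totals = {"context_relevance": 0.0, "faithfulness": 0.0, "answer_relevance": 0.0}
    for query, context, answer in examples:
        totals["context_relevance"] += judge(f"Context: {context}\nQuery: {query}\nRate relevance:")
        totals["faithfulness"] += judge(f"Context: {context}\nAnswer: {answer}\nSupported by context?")
        totals["answer_relevance"] += judge(f"Query: {query}\nAnswer: {answer}\nAddresses the query?")
    n = len(examples)
    return {metric: total / n for metric, total in totals.items()}

# Stub judge for illustration only: a real run would call an LLM.
scores = evaluate_rag(
    [("q1", "ctx1", "a1"), ("q2", "ctx2", "a2")],
    judge=lambda prompt: 0.8,
)
# Fail the run if any averaged metric dips below a chosen threshold.
assert all(s >= 0.7 for s in scores.values()), f"RAG regression: {scores}"
```

Wiring a harness like this into CI turns the trio from a one-off check into a regression gate.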