More reading
LLM and Agentic Evaluation: Why Eval Dataset Coverage Matters More Than Size
The issue is rarely too few eval rows. It is eval data that misses the real spread of cases, slices, and failure modes.
Long-Form Synthetic Data Generation: Same LLM, Dramatically Different Results
DataFramer vs. raw Claude on 50K-token document generation.
Synthetic Text-to-SQL Data Generation with 100% SQL Validity Using Claude Haiku
From production SQL failure traces to 500 labeled, execution-validated eval samples — using only Claude Haiku.
How a 3B Model Outperformed GPT-4o on Hallucination Detection: The Training, Evals, Validation, and Benchmark Synthetic Data Pipeline Behind HDM-2
A 3B open-source model beat GPT-4o at hallucination detection, built on purpose-built training and eval data.
Field Notes
What we've learned building this
Learning Center
Foundational reading on LLMs, RAG, and AI evaluation
An Overview of Retrieval-Augmented Generation (RAG) and Its Different Components
RAG components, retrieval strategies, and how to build systems that ground LLM outputs in real data.
DataFramer Team
Top Problems with RAG Systems and Ways to Mitigate Them
The most common RAG failure modes and the best practices that address each one.
DataFramer Team
LLM-as-Judge: Why It's Hard to Get Right and Why It Still Matters
When LLM-as-judge works, when it breaks down, and what it actually takes to build one you can trust.
DataFramer Team
A Quick Comparison of Vector Databases for RAG Systems
ApertureDB, Pinecone, Weaviate, and Milvus compared on features, performance, and RAG use cases.
DataFramer Team
A Practical Guide to Agentic LLM Frameworks
A practical overview of agentic LLM frameworks: reasoning, planning, tool use, and the real challenges of running them in production.
DataFramer Team
How to Fix Hallucinations in RAG LLM Apps
Concrete techniques for diagnosing and reducing hallucinations in RAG-based LLM applications.
DataFramer Team
An Expert's Guide to Picking Your LLM Tech Stack
A practical breakdown of every layer in the LLM stack: models, orchestration, storage, and ops.
DataFramer Team
Top Strategies for Detecting LLM Hallucinations
Detection strategies for hallucinations in both RAG and non-RAG LLM applications, and the real tradeoffs between them.
DataFramer Team