Research Datasets
Access datasets from Kamiwaza AIR research to support your agentic AI research and development.
How Much Do LLMs Hallucinate in Document Q&A Scenarios?
Raw model outputs from the 172-billion-token hallucination study across 35 models, three context lengths, four temperatures, and three hardware platforms. Ground truth, document corpora, and test sets are provided separately.
📦 Download RIKER2_March2026.zip (4.34 GB) — Raw model outputs
📦 Download RIKER2_corpora_groundtruth_testsets.zip (0.60 MB) — Ground truth, corpora, and test sets
🔗 Read the paper | arXiv
Citation:
@article{roig2026hallucinate,
title={How Much Do LLMs Hallucinate in Document Q\&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms},
author={Roig, JV},
year={2026},
eprint={2603.08274},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.08274}
}
Scalable and Reliable Evaluation of AI Knowledge Retrieval Systems (aka "RIKER")
Generated ground truth, document corpora, test sets, and raw results from the evaluated models.
📦 Download riker_2025.zip (612.45 MB)
Citation:
@techreport{roig2025riker,
title={Scalable and Reliable Evaluation of AI Knowledge Retrieval Systems: RIKER and the Coherent Simulated Universe},
author={Roig, JV},
institution={Kamiwaza AI},
year={2025},
url={https://docs.kamiwaza.ai/research/papers/riker}
}
Towards a Standard, Enterprise-Relevant Agentic AI Benchmark (aka "KAMI v0.1")
Test suite definitions and evaluation code from our enterprise-focused agentic AI benchmark study.
📦 Download KAMI_v0.1_2025-12-17.zip (999.62 MB)
Citation:
@techreport{roig2025kami,
title={Towards a Standard, Enterprise-Relevant Agentic AI Benchmark: Lessons from 5.5 billion tokens' worth of agentic AI evaluations},
author={Roig, JV},
institution={Kamiwaza AI},
year={2025},
url={https://docs.kamiwaza.ai/research/papers/kami-v0-1}
}
How Do LLMs Fail In Agentic Scenarios?
Execution traces from our qualitative analysis of LLM failure modes in agentic simulations.
📦 Download HowDoLLMsFail_2025.zip (2.60 MB)
Citation:
@techreport{roig2025llmfailures,
title={How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations},
author={Roig, JV},
institution={Kamiwaza AI},
year={2025},
url={https://docs.kamiwaza.ai/research/papers/llm-agentic-failures}
}