Research Insights
Short, timely research insights and perspectives from the Kamiwaza Agentic Intelligence Research team, published more frequently than our full papers. Research insights preview our latest findings before they are aggregated and summarized in future paper releases.
Latest Insights
A 9B Model Just Crashed the Big Leagues
JV Roig · March 5, 2026
Qwen3.5-9B scores 88.1% on our KAMI agentic benchmark — a bracket previously reserved for 70B+ dense models, 200B+ MoEs, and flagship cloud APIs. The small model revolution isn't coming. It's here.
Hallucination Resistance Holds at 64K and 128K Context
JV Roig · February 18, 2026
We pushed our LoRA-finetuned Granite 4.0 Micro from 32K to 64K and 128K context — 4-16x longer than training. Hallucination resistance held (92% → 88% → 87%). Extraction didn't. The "don't fabricate" lesson is durable; finding needles in bigger haystacks is not.
Can We Reduce LLM Hallucinations for Enterprise Use? RIKER+LoRA Says Yes
JV Roig · February 15, 2026
Using RIKER + LoRA SFT on IBM Granite 4.0 Micro with just ~1,100 lease contract examples boosted accuracy from 32% to 80% — and the hallucination resistance transferred to document types the model never saw during training.
Qwen3 Next 80B: The Long-Context Champion You Haven't Heard Of
JV Roig · January 28, 2026
Our RIKER benchmark testing reveals Qwen3 Next 80B-A3B as the top performer at 200K context, beating models 6x its size while using only 3B active parameters. A deep dive into what makes this model special for long-context knowledge retrieval.
Related Resources
- RIKER Paper - Full methodology for long-context knowledge retrieval evaluation
- KAMI Leaderboard - Live rankings for agentic AI performance
- Main Blog - Articles on agentic computing, orchestration and AI platform development