Partial Evidence Bench — when agents answer with incomplete data
New benchmark tests a concrete failure mode: enterprise agents produce plausible answers from access-restricted datasets, masking the gaps left by evidence they are not authorized to see.
• 72 tasks across due diligence, compliance audit, and security incident response
• ACL-partitioned corpora to enforce authorization boundaries
• Reference answers for both the complete corpus (oracle) and the authorized view, plus gap-report metrics
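The sketch below is a minimal illustration of how an ACL-partitioned corpus yields an authorized view, and how a gap report might be scored. It is not the benchmark's code. The `Document` schema, the group-overlap access rule, and the precision/recall framing of gap reports are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    # Hypothetical schema: the groups allowed to read this document.
    acl: frozenset


def authorized_view(corpus, principal_groups):
    """IDs of documents whose ACL shares at least one group with the principal (assumed rule)."""
    return {d.doc_id for d in corpus if d.acl & principal_groups}


def hidden_evidence(required_ids, corpus, principal_groups):
    """Oracle evidence the principal cannot read: required IDs minus the authorized view."""
    return set(required_ids) - authorized_view(corpus, principal_groups)


def gap_detection_scores(tasks):
    """Hypothetical metric. tasks holds (gap_exists, agent_flagged_gap) pairs, one per task.

    Returns (precision, recall) of the agent's gap reports.
    """
    tp = sum(1 for gap, flagged in tasks if gap and flagged)
    fp = sum(1 for gap, flagged in tasks if not gap and flagged)
    fn = sum(1 for gap, flagged in tasks if gap and not flagged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    corpus = [
        Document("contract.pdf", frozenset({"legal"})),
        Document("ledger.xlsx", frozenset({"finance"})),
        Document("memo.txt", frozenset({"legal", "finance"})),
    ]
    agent_groups = frozenset({"legal"})
    print(sorted(authorized_view(corpus, agent_groups)))  # contract.pdf, memo.txt
    print(hidden_evidence(["contract.pdf", "ledger.xlsx"], corpus, agent_groups))
    print(gap_detection_scores([(True, True), (True, False), (False, True), (False, False)]))
```

In this toy setup, an agent in the `legal` group can answer from `contract.pdf` and `memo.txt`, but `ledger.xlsx` is hidden evidence. A well-behaved agent should flag that gap rather than answer as if the authorized view were complete.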