TL;DR
A study of 2,961 AI prompt runs found a less than 1% chance of getting identical results twice, challenging the $100M+ AI tracking industry.
Key Points
- Less than a 1-in-100 chance that ChatGPT or Google AI returns the same brand list twice; Claude is slightly better but still near-useless for ranking
- 600 volunteers ran 12 different prompts across ChatGPT, Claude, Google AI; results varied in list length, brand selection, and ordering
- Visibility percentage (how often a brand appears across 60-100 runs of a prompt) appears statistically valid despite the randomization, contradicting the researchers' initial hypothesis
- Human-crafted prompts show only 0.081 semantic similarity but still produce a consistent core set of brands, suggesting intent recognition works despite prompt chaos
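
The two measurements in the points above (how often a brand appears across repeated runs, and how often two runs return the same list) can be illustrated with a minimal Python sketch. The function names and the sample brands are hypothetical, and the source does not describe the study's actual computation, so treat this as one plausible way to compute the numbers rather than the researchers' method.

```python
from collections import Counter
from itertools import combinations


def visibility_percentages(runs):
    """Return {brand: percent of runs in which the brand appears at least once}."""
    counts = Counter()
    for run in runs:
        counts.update(set(run))  # count each brand at most once per run
    total = len(runs)
    return {brand: 100.0 * n / total for brand, n in counts.items()}


def identical_pair_rate(runs, ordered=True):
    """Return the fraction of run pairs that produced the same brand list.

    ordered=True requires the same brands in the same order;
    ordered=False only requires the same set of brands.
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0
    if ordered:
        same = sum(1 for a, b in pairs if list(a) == list(b))
    else:
        same = sum(1 for a, b in pairs if set(a) == set(b))
    return same / len(pairs)


# Hypothetical responses to one prompt, repeated four times (not study data).
runs = [
    ["Acme", "Bolt", "Cato"],
    ["Bolt", "Acme"],
    ["Acme", "Dune", "Bolt", "Cato"],
    ["Acme", "Bolt", "Cato"],
]

if __name__ == "__main__":
    for brand, pct in sorted(visibility_percentages(runs).items()):
        print(f"{brand}: {pct:.0f}% of runs")
    print(f"Identical ordered lists: {identical_pair_rate(runs):.2%} of pairs")
```

In this toy data a brand can appear in every run even though most pairs of runs disagree on the full list, which is the distinction the key points draw between visibility percentage and single-response rankings.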
Why It Matters
Companies spending millions on AI visibility tracking need to understand the fundamental randomness they're measuring. While visibility percentages may have statistical merit, a single AI response is an unreliable basis for product recommendations, which matters most for high-stakes decisions like medical care. This challenges the entire premise of AI tracking as a marketing metric and exposes potential for manipulation similar to SEO gaming.
Source: sparktoro.com