Study: AI Tools Give Wildly Inconsistent Brand Recommendations

TL;DR

An analysis of 2,961 AI prompt runs found a less than 1% chance of getting identical results twice, challenging the $100M+ AI tracking industry.

Key Points

There is less than a 1 in 100 chance that ChatGPT or Google AI returns the same brand list twice; Claude is slightly more consistent but still near-useless for ranking
600 volunteers ran 12 different prompts across ChatGPT, Claude, and Google AI; results varied in list length, brand selection, and ordering
Visibility percentage (how often a brand appears across 60-100 runs of a prompt) proved statistically valid despite the randomization, contradicting the researchers' initial hypothesis
Human-crafted prompts show only 0.081 semantic similarity but still produce a consistent set of core brands, suggesting intent recognition works despite prompt chaos
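
The visibility percentage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual code: the function names, the input format (one list of brand names per run), and the sample brands are all hypothetical. The first function computes the share of runs in which each brand appears; the second estimates how often two runs return exactly the same list in the same order.

```python
from collections import Counter
from itertools import combinations


def visibility_percentages(runs):
    """Share of runs (0-100) in which each brand appears at least once."""
    if not runs:
        return {}
    counts = Counter()
    for brands in runs:
        counts.update(set(brands))  # count a brand once per run
    return {brand: 100 * n / len(runs) for brand, n in counts.items()}


def identical_pair_rate(runs):
    """Fraction of run pairs that returned the same brands in the same order."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical data: four runs of one prompt, brand names invented for illustration.
runs = [
    ["Acme", "Birch", "Cobalt"],
    ["Birch", "Acme"],
    ["Acme", "Delta", "Birch", "Cobalt"],
    ["Acme", "Birch", "Cobalt"],
]

visibility = visibility_percentages(runs)
pair_rate = identical_pair_rate(runs)
```

In this made-up sample only one of the six run pairs matches exactly, yet the two brands that appear in every run still stand out clearly by visibility, which is the kind of contrast the key points describe.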

Why It Matters

Companies spending millions on AI visibility tracking need to understand the fundamental randomness of what they are measuring. Visibility percentages may have statistical merit, but a single AI response is unreliable as a product recommendation, which matters most for high-stakes decisions such as medical care. This challenges the entire premise of AI tracking as a marketing metric and exposes the potential for manipulation similar to SEO gaming.

Read the full research and methodology

Source: sparktoro.com