How the Index is built
The Index measures structural AI power: a company's current frontier capability plus the ingredients that produce future capability. A panel of frontier evaluator models scores fifteen labs across ten areas on a 1.0–10.0 scale. The panel mean for each area is the raw figure; the weighted overall described below is the published default.
1. Framing
A single ranking has to optimize for something. We deliberately reject two narrower framings:
- Raw technological supremacy would over-index on Model Quality and ignore the ingredients that decide who's still leading in 18 months.
- Long-term commercial dominance would reward current revenue and distribution at the expense of frontier capability — which is what actually moves this market.
Model Quality is the headline number; Research, Compute, Talent, and Data are the inputs that decide where that number goes next. Product, Distribution, and Business matter — but as multipliers on capability, not as substitutes for it.
2. The three tiers
Areas are grouped into three structural tiers totaling 100%.
Core Engine (56%)
Foundational prerequisites for frontier AI. A company without strength here cannot compete at the highest level no matter how strong its commercial position.
| Area | Weight | Why this weight |
|---|---|---|
| 🧠 Model Quality | 18% | The headline number: if the models aren't intelligent, nothing downstream matters. Held slightly below 20% because Model Quality is a snapshot, while Research & Innovation is the leading indicator that produces tomorrow's snapshot. |
| 🔬 Research & Innovation | 12% | Promoted out of the accelerants bucket. Frontier breakthroughs (transformers, RLHF, MoE, reasoning) all came from research, and the labs that lead consistently do the research first. Weighting R&I at 5% would reward copying over originating. |
| ⚡ Compute & Infrastructure | 14% | A hard gate on what you can train. Whoever controls the GPUs/TPUs and the energy supply controls the next generation of models. |
| 🏰 Data & Moats | 12% | Proprietary data, flywheels, and structural lock-in are major differentiators. Held slightly below 15% because synthetic data, distillation, and shared web-scale pretraining have eroded the data moat faster than expected. |
Delivery (28%)
Translating raw capability into reach and execution. Critical, but downstream of the Core Engine.
| Area | Weight | Why this weight |
|---|---|---|
| 🏗️ Product & Platform | 10% | How well raw intelligence becomes usable, reliable tools and APIs. Execution at the product layer is where capability becomes power for users and developers. |
| 👥 Talent & Org | 10% | Compounds everything else. Top-tier talent density and leadership's ability to execute and pivot decide whether a Core Engine advantage gets converted into product. |
| 🌐 Distribution & Reach | 8% | Existing OS, browser, and cloud distribution is a real structural advantage. It is an amplifier, not a generator, which is why it sits below Talent and Product. |
Accelerants & Stabilizers (16%)
Important multipliers and indicators, but lagging or narrative-driven rather than structurally decisive in the current AI arms race.
| Area | Weight | Why this weight |
|---|---|---|
| 💰 Business & Market | 5% | Current revenue matters less than expected: the industry tolerates massive capex without near-term profitability, and penalizing companies for this would distort the ranking. |
| 🛡️ Safety & Alignment | 6% | Doesn't generate raw power today, but enterprise trust is starting to gate distribution. Set slightly above 5% to reflect that emerging dynamic. |
| 🚀 Momentum & Execution | 5% | Captures the present narrative and shipping velocity. A snapshot, not a moat; narratives shift quickly. |
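Expressed as data, the full weight table is easy to sanity-check. A minimal sketch, assuming a plain Python dict (the `WEIGHTS` name and layout are illustrative, not part of any published schema); the percentages are the ones documented in the tables above:

```python
# Illustrative representation of the ten area weights. The dict name and
# structure are assumptions; the percentages come from the tables above.
WEIGHTS = {
    # Core Engine (56%)
    "Model Quality": 18,
    "Research & Innovation": 12,
    "Compute & Infrastructure": 14,
    "Data & Moats": 12,
    # Delivery (28%)
    "Product & Platform": 10,
    "Talent & Org": 10,
    "Distribution & Reach": 8,
    # Accelerants & Stabilizers (16%)
    "Business & Market": 5,
    "Safety & Alignment": 6,
    "Momentum & Execution": 5,
}

assert sum(WEIGHTS.values()) == 100, "tier weights must total exactly 100"
```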
3. How the math works
Each company's overall score is a weighted average:
overall = Σ (area_score × area_weight) / 100
Worked example — a hypothetical company:
| Area | Score | Weight | Contribution |
|---|---|---|---|
| Model Quality | 9.0 | 18 | 1.62 |
| Research & Innovation | 8.5 | 12 | 1.02 |
| Compute & Infrastructure | 8.0 | 14 | 1.12 |
| Data & Moats | 7.0 | 12 | 0.84 |
| Talent & Org | 9.0 | 10 | 0.90 |
| Product & Platform | 8.0 | 10 | 0.80 |
| Distribution & Reach | 6.0 | 8 | 0.48 |
| Business & Market | 7.0 | 5 | 0.35 |
| Momentum & Execution | 8.0 | 5 | 0.40 |
| Safety & Alignment | 8.0 | 6 | 0.48 |
| Weighted overall | | | 8.01 |
A 10.0 in every area still produces a 10.0 weighted overall — the ceiling is preserved.
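In code, the same computation is a one-line weighted average. A minimal sketch reusing the illustrative `WEIGHTS` dict from section 2; the scores are the hypothetical ones in the table above:

```python
# Hypothetical area scores from the worked example above.
scores = {
    "Model Quality": 9.0,
    "Research & Innovation": 8.5,
    "Compute & Infrastructure": 8.0,
    "Data & Moats": 7.0,
    "Talent & Org": 9.0,
    "Product & Platform": 8.0,
    "Distribution & Reach": 6.0,
    "Business & Market": 7.0,
    "Momentum & Execution": 8.0,
    "Safety & Alignment": 8.0,
}

def weighted_overall(scores: dict[str, float], weights: dict[str, int]) -> float:
    """overall = sum(area_score * area_weight) / 100"""
    return sum(scores[area] * weights[area] for area in weights) / 100

print(weighted_overall(scores, WEIGHTS))  # 8.01
```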
4. The 1.0–10.0 rubric
Evaluators score each area on the same anchored scale:
| Score | Meaning |
|---|---|
| 9–10 | World-leading; clear #1 or #2 in this area |
| 7–8 | Very strong; a top-tier competitor |
| 5–6 | Competitive; solid but not differentiated |
| 3–4 | Below average; notable gaps |
| 1–2 | Minimal presence or capability in this area |
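If raw scores need to be labeled mechanically, the bands translate directly into thresholds. A sketch under one stated assumption: a fractional score (e.g. 8.5) falls into the band whose lower anchor it has reached; the function name is hypothetical:

```python
def rubric_band(score: float) -> str:
    """Map a 1.0-10.0 area score to its anchored rubric band.

    Assumption: a fractional score belongs to the band whose lower
    anchor it has reached (so 8.5 reads as "Very strong").
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError("scores live on the 1.0-10.0 scale")
    if score >= 9.0:
        return "World-leading; clear #1 or #2 in this area"
    if score >= 7.0:
        return "Very strong; a top-tier competitor"
    if score >= 5.0:
        return "Competitive; solid but not differentiated"
    if score >= 3.0:
        return "Below average; notable gaps"
    return "Minimal presence or capability in this area"
```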
5. Risk-adjusted (optional)
Safety carries a modest weight in the main ranking by design. To prevent companies with serious trust issues from ranking too highly purely on capability and distribution, an optional risk-adjusted overall can be computed alongside the main weighted total:
if safety_score < 4.0: risk_adjusted = weighted_total × 0.90
else if safety_score < 5.0: risk_adjusted = weighted_total × 0.95
otherwise: risk_adjusted = weighted_total
The risk-adjusted figure is exposed as a separate output, not the headline: a number that conflates power and risk is a useful secondary view, not a sensible default.
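As a runnable sketch of the same rule (the function name is illustrative; the thresholds and multipliers are the ones above):

```python
def risk_adjusted(weighted_total: float, safety_score: float) -> float:
    """Optional safety haircut on the weighted overall.

    Safety below 4.0 costs 10%; safety in [4.0, 5.0) costs 5%;
    otherwise the weighted total passes through unchanged.
    """
    if safety_score < 4.0:
        return weighted_total * 0.90
    if safety_score < 5.0:
        return weighted_total * 0.95
    return weighted_total

print(risk_adjusted(8.01, 4.5))  # ~7.61 (5% haircut)
```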
6. How the weights were set
Weights were developed through structured back-and-forth between three frontier evaluator models, with a human arbitrating the major calls.
v1 (Claude + Gemini). Gemini proposed the 3-tier grouping. The notable disagreement was R&I: Gemini argued 5% (open research gets copied), Claude argued 12% (leaders originate, followers copy). 12% won; the structural framing was kept.
v1.1 (ChatGPT input). ChatGPT proposed a more output-weighted scheme (Business 12, Momentum 9, Talent 5). The framing dispute was resolved in favor of the structural framing, since output-weighting embeds incumbent bias. Three of its threads were absorbed: Talent 11→10, Product 9→10, and a Safety risk modifier in place of a fight over Safety's headline weight.
Grok and additional evaluator models were not consulted for v1.1.
7. Open questions
- R&I vs. Model Quality overlap. Both reward capability. If evaluators flag this as double-counting, we may rebalance.
- Safety drift. If enterprise trust becomes a stronger gate on distribution, Safety may need to move up.
- Compute commoditization. If access becomes meaningfully more even, Compute may need to come down.
- Annual re-evaluation. The industry shifts fast enough that 2026 weights may not fit 2027.