How the Index is built
The Index measures structural AI power: a company's current frontier capability plus the ingredients that produce future capability. A panel of frontier evaluator models scores fifteen labs across ten areas on a 1.0–10.0 scale. The panel mean for each area is the raw figure; the weighted overall described below is the published default.
1. Framing
A single ranking has to optimize for something. We deliberately reject two narrower framings:
- Raw technological supremacy would over-index on Model Quality and ignore the ingredients that decide who's still leading in 18 months.
- Long-term commercial dominance would reward current revenue and distribution at the expense of frontier capability — which is what actually moves this market.
Model Quality is the headline number; Research, Compute, Talent, and Data are the inputs that decide where that number goes next. Product, Distribution, and Business matter — but as multipliers on capability, not as substitutes for it.
2. The three tiers
Areas are grouped into three structural tiers totaling 100%.
Core Engine (56%)
Foundational prerequisites for frontier AI. A company without strength here cannot compete at the highest level no matter how strong its commercial position.
| Area | Weight | Why this weight |
|---|---|---|
| 🧠 Model Quality | 18% | The headline number: if the models aren't intelligent, nothing downstream matters. Held slightly below 20% because Model Quality is a snapshot, while Research & Innovation is the leading indicator that produces tomorrow's snapshot. |
| 🔬 Research & Innovation | 12% | Promoted out of the accelerants bucket. Frontier breakthroughs (transformers, RLHF, MoE, reasoning) all came from research, and the labs that lead consistently do the research first. Weighting R&I at 5% would reward copying over originating. |
| ⚡ Compute & Infrastructure | 14% | A hard gate on what you can train. Whoever controls the GPUs/TPUs and the energy supply controls the next generation of models. |
| 🏰 Data & Moats | 12% | Proprietary data, flywheels, and structural lock-in are major differentiators. Held slightly below 15% because synthetic data, distillation, and shared web-scale pretraining have eroded the data moat faster than expected. |
Delivery (28%)
Translating raw capability into reach and execution. Critical, but downstream of the Core Engine.
| Area | Weight | Why this weight |
|---|---|---|
| 🏗️ Product & Platform | 10% | How well raw intelligence becomes usable, reliable tools and APIs. Execution at the product layer is where capability becomes power for users and developers. |
| 👥 Talent & Org | 10% | Compounds everything else. Top-tier talent density and leadership's ability to execute and pivot decide whether a Core Engine advantage gets converted into product. |
| 🌐 Distribution & Reach | 8% | Existing OS, browser, and cloud distribution is a real structural advantage. It is an amplifier, not a generator, which is why it sits below Talent and Product. |
Accelerants & Stabilizers (16%)
Important multipliers and indicators, but lagging or narrative-driven rather than structurally decisive in the current AI arms race.
| Area | Weight | Why this weight |
|---|---|---|
| 💰 Business & Market | 5% | Current revenue matters less than expected: the industry tolerates massive capex without near-term profitability, and penalizing companies for this would distort the ranking. |
| 🛡️ Safety & Alignment | 6% | Doesn't generate raw power today, but enterprise trust is starting to gate distribution. Set slightly above 5% to reflect that emerging dynamic. |
| 🚀 Momentum & Execution | 5% | Captures the present narrative and shipping velocity. A snapshot, not a moat; narratives shift quickly. |
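Expressed as data, the full weight table is easy to sanity-check. A minimal sketch, assuming a plain Python dict (the `WEIGHTS` name and layout are illustrative, not part of any published schema); the percentages are the ones documented in the tables above:

```python
# Illustrative representation of the ten area weights. The dict name and
# structure are assumptions; the percentages come from the tables above.
WEIGHTS = {
    # Core Engine (56%)
    "Model Quality": 18,
    "Research & Innovation": 12,
    "Compute & Infrastructure": 14,
    "Data & Moats": 12,
    # Delivery (28%)
    "Product & Platform": 10,
    "Talent & Org": 10,
    "Distribution & Reach": 8,
    # Accelerants & Stabilizers (16%)
    "Business & Market": 5,
    "Safety & Alignment": 6,
    "Momentum & Execution": 5,
}

assert sum(WEIGHTS.values()) == 100, "tier weights must total exactly 100"
```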
3. How the math works
Each company's overall score is a weighted average:
overall = Σ (area_score × area_weight) / 100
Worked example — a hypothetical company:
| Area | Score | Weight | Contribution |
|---|---|---|---|
| Model Quality | 9.0 | 18 | 1.62 |
| Research & Innovation | 8.5 | 12 | 1.02 |
| Compute & Infrastructure | 8.0 | 14 | 1.12 |
| Data & Moats | 7.0 | 12 | 0.84 |
| Talent & Org | 9.0 | 10 | 0.90 |
| Product & Platform | 8.0 | 10 | 0.80 |
| Distribution & Reach | 6.0 | 8 | 0.48 |
| Business & Market | 7.0 | 5 | 0.35 |
| Momentum & Execution | 8.0 | 5 | 0.40 |
| Safety & Alignment | 8.0 | 6 | 0.48 |
| Weighted overall | | | 8.01 |
A 10.0 in every area still produces a 10.0 weighted overall — the ceiling is preserved.
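In code, the same computation is a one-line weighted average. A minimal sketch reusing the illustrative `WEIGHTS` dict from section 2; the scores are the hypothetical ones in the table above:

```python
# Hypothetical area scores from the worked example above.
scores = {
    "Model Quality": 9.0,
    "Research & Innovation": 8.5,
    "Compute & Infrastructure": 8.0,
    "Data & Moats": 7.0,
    "Talent & Org": 9.0,
    "Product & Platform": 8.0,
    "Distribution & Reach": 6.0,
    "Business & Market": 7.0,
    "Momentum & Execution": 8.0,
    "Safety & Alignment": 8.0,
}

def weighted_overall(scores: dict[str, float], weights: dict[str, int]) -> float:
    """overall = sum(area_score * area_weight) / 100"""
    return sum(scores[area] * weights[area] for area in weights) / 100

print(weighted_overall(scores, WEIGHTS))  # 8.01
```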
4. The 1.0–10.0 rubric
Evaluators score each area on the same anchored scale:
| Score | Meaning |
|---|---|
| 9–10 | World-leading; clear #1 or #2 in this area |
| 7–8 | Very strong; a top-tier competitor |
| 5–6 | Competitive; solid but not differentiated |
| 3–4 | Below average; notable gaps |
| 1–2 | Minimal presence or capability in this area |
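If raw scores need to be labeled mechanically, the bands translate directly into thresholds. A sketch under one stated assumption: a fractional score (e.g. 8.5) falls into the band whose lower anchor it has reached; the function name is hypothetical:

```python
def rubric_band(score: float) -> str:
    """Map a 1.0-10.0 area score to its anchored rubric band.

    Assumption: a fractional score belongs to the band whose lower
    anchor it has reached (so 8.5 reads as "Very strong").
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError("scores live on the 1.0-10.0 scale")
    if score >= 9.0:
        return "World-leading; clear #1 or #2 in this area"
    if score >= 7.0:
        return "Very strong; a top-tier competitor"
    if score >= 5.0:
        return "Competitive; solid but not differentiated"
    if score >= 3.0:
        return "Below average; notable gaps"
    return "Minimal presence or capability in this area"
```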
5. Risk-adjusted (optional)
Safety carries a modest weight in the main ranking by design. To prevent companies with serious trust issues from ranking too highly purely on capability and distribution, an optional risk-adjusted overall can be computed alongside the main weighted total:
if safety_score < 4.0: risk_adjusted = weighted_total × 0.90
else if safety_score < 5.0: risk_adjusted = weighted_total × 0.95
otherwise: risk_adjusted = weighted_total
The risk-adjusted figure is exposed as a separate output, not the headline: a number that conflates power and risk is a useful secondary view, not a sensible default.
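As a runnable sketch of the same rule (the function name is illustrative; the thresholds and multipliers are the ones above):

```python
def risk_adjusted(weighted_total: float, safety_score: float) -> float:
    """Optional safety haircut on the weighted overall.

    Safety below 4.0 costs 10%; safety in [4.0, 5.0) costs 5%;
    otherwise the weighted total passes through unchanged.
    """
    if safety_score < 4.0:
        return weighted_total * 0.90
    if safety_score < 5.0:
        return weighted_total * 0.95
    return weighted_total

print(risk_adjusted(8.01, 4.5))  # ~7.61 (5% haircut)
```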
6. How the weights were set
Weights were developed through structured back-and-forth between three frontier evaluator models, with a human arbitrating the major calls.
v1 (Claude + Gemini). Gemini proposed the 3-tier grouping. The notable disagreement was R&I: Gemini argued 5% (open research gets copied), Claude argued 12% (leaders originate, followers copy). 12% won; the structural framing was kept.
v1.1 (ChatGPT input). ChatGPT proposed a more output-weighted scheme (Business 12, Momentum 9, Talent 5). The framing dispute was resolved in favor of the structural framing, since output-weighting embeds incumbent bias. Three of its threads were absorbed: Talent 11→10, Product 9→10, and a Safety risk modifier in place of a fight over Safety's headline weight.
Grok and additional evaluator models were not consulted for v1.1.
7. Open questions
- R&I vs. Model Quality overlap. Both reward capability. If evaluators flag this as double-counting, we may rebalance.
- Safety drift. If enterprise trust becomes a stronger gate on distribution, Safety may need to move up.
- Compute commoditization. If access becomes meaningfully more even, Compute may need to come down.
- Annual re-evaluation. The industry shifts fast enough that 2026 weights may not fit 2027.