◆ Methodology

How dimensions are measured

Your overall score is a weighted average of dimensions. Each dimension below explains what it measures, why it matters, how it's computed, and what to do to grow it.

Cross-tool dimensions

These dimensions score across Claude Code, Codex, and Cowork where the underlying signal exists. Source coverage is noted per dimension.

Customization

What it measures. Whether you configure your AI environment — CLAUDE.md, AGENTS.md, and MCP configs.

Why it matters. The biggest leap from "user" to "operator" is when you start shaping the tool to your work, not the other way around.

How it's measured. Sum of custom_mcp_config_writes and claude_md_writes (or agents_md_writes for Codex) over the rolling 30 days, log-normalized. Skill file authorship is tracked separately under Custom Skills.

How to grow it. Add a CLAUDE.md to your project that captures your conventions. Add one MCP server you'd use weekly.

Sources: Claude · Codex · Cowork

Parallel Agents

What it measures. Whether you're running multiple agents in parallel within a session.

Why it matters. Parallel orchestration is how senior operators compress hours into minutes — branching work across subagents and converging the results.

How it's measured. Weighted blend of breadth (sessions with orchestration), depth (parallel agent turns), and peak (max parallel agents in any session).

How to grow it. Use the Agent tool to fan out independent subtasks. When two pieces of work don't depend on each other, launch them at the same time.

Sources: Claude · Codex · Cowork

Background Work (was Context Leverage)

What it measures. Whether you delegate work to run unattended — Cron, Task, or Monitor primitives.

Why it matters. Senior operators don't sit and wait. They queue work, walk away, and come back to results.

How it's measured. Sessions in which you used context-leverage tools (TaskCreate, ScheduleWakeup, CronCreate, Monitor, etc.), log-normalized to a reference of 80.

How to grow it. The next time a subtask will take a while, schedule it via Task and switch contexts. Use Cron for nightly jobs you'd otherwise babysit.

Sources: Claude

Tool Breadth

What it measures. How broadly you engage with skills and MCP servers.

Why it matters. Skills and MCPs are how Claude Code reaches outside the conversation — into your tools, services, and data.

How it's measured. Distinct count of skill_counts + mcp_server_counts in the 30-day window, log-normalized to a reference of 100.

How to grow it. Install one MCP server you'd actually use weekly. Try a built-in skill you've been ignoring.

Sources: Claude · Codex · Cowork

Planning

What it measures. Observable structured-planning signals in Claude and Codex—not every way you might plan.

Why it matters. Plans cut rework. Power users front-load thinking.

How it's measured. Sessions with Claude's exact built-in Plan agent or ExitPlanMode, or Codex's native Plan Mode or update_plan, log-normalized to a reference of 10. Generic orchestration and multitasking are measured separately and do not count as Planning.

How to grow it. When a task touches more than two files, plan first.

Sources: Claude · Codex

Repetition NEW

What it measures. When you use a skill, do you go deep — or just try once?

Why it matters. Repeat usage signals that a skill has become real practice, not a one-off experiment.

How it's measured. Total invocations across skill_counts in the 30-day window, log-normalized to a reference of 20.

How to grow it. Pick one or two skills and use them repeatedly across a week, instead of trying twenty different things once each.

Sources: Claude · Codex · Cowork

Custom Skills NEW

What it measures. How many of your own custom skills you've authored and actively practice.

Why it matters. Authoring a skill is one thing; making it part of your daily flow is the real signal. Writing your own automation and then using it repeatedly shows both investment and payoff.

How it's measured. Log-normalized blend of two signals: the set of skill names you authored (detected via Write/Edit/apply_patch calls on ~/.claude/skills/<name>/... or ~/.codex/skills/<name>/... paths in your transcripts) intersected with your skill_counts keys, plus the summed practice counts for those authored skills. Namespace suffix-match applies: a plugin-installed skill named <plugin>:<name> still credits you if <name> matches a skill you authored locally and shipped back.

How to grow it. Author a skill that automates a recurring task. Use it in real workflows. Share it back to your company plugin — the namespace suffix-match means you still get credit. Author another.

Sources: Claude · Codex · Cowork

Multi-Tasking NEW

What it measures. How many concurrent sessions you run at peak — multi-tasking across distinct workstreams.

Why it matters. Operators who run several workstreams at once are squeezing more value from the tool than serial users.

How it's measured. Peak max_concurrent_sessions over the 30-day window, log-normalized to a reference of 15.

How to grow it. Try Cowork or run two terminals. The next time you're blocked on one task, start the next.

Sources: Claude · Codex · Cowork

Codex dimensions

These dimensions score only for Codex — the underlying signal doesn't exist in Claude Code or Cowork sessions.

Thinking Mode

What it measures. Reasoning-block frequency per session.

Why it matters. Reasoning blocks are where Codex thinks before acting; high engagement maps to thoughtful operators.

How it's measured. reasoning_blocks ÷ sessions, log-normalized to a reference of 20.

How to grow it. Ask Codex to reason through edge cases before changing code.

Sources: Codex

Cowork dimensions

These dimensions score only for Cowork — they measure scheduling and queue behavior unique to the Cowork runner.

Schedule Reliability

What it measures. How consistently your scheduled Cowork tasks actually run — reliability over time.

Why it matters. Consistent scheduled tasks are the foundation of autonomous orchestration. Reliability compounds.

How it's measured. Ratio of scheduled task completions to scheduled task definitions, log-normalized.

How to grow it. Define more scheduled tasks and run them on a consistent cadence. Audit your scheduled tasks weekly — drop the ones you're not actually using.

Sources: Cowork

Queue Discipline

What it measures. How actively you use Cowork's task queue — chaining work across runs.

Why it matters. Chaining tasks in the queue multiplies Cowork's leverage — each run can feed the next.

How it's measured. Queue event volume over the 30-day window, log-normalized.

How to grow it. Cowork's queue lets you stack multiple tasks — try chaining work that builds on prior runs.

Sources: Cowork

Tracked but no longer scored

We removed these from the score because they were too easily saturated, didn't separate users in practice, or measured the wrong thing. We still capture the raw data so we can revisit if the user base diversifies.

Prompt Efficiency — was constant 100 across users.
Tool Diversity — saturated at the top of the cohort.
Conversation Depth — saturated similarly.
Basic Chat — measured the floor, not the ceiling.

Cowork

Cowork is a multi-workspace agent runner. When you run agents through Cowork, each session aggregates into your daily metrics like any other session.

Cowork sessions tend to score high on Parallel Agents (parallel agents are the default mode) and Custom Skills if you've authored Cowork-specific skills. Other dimensions are where Cowork operators usually find their next leverage.