Profiling Agent Autonomy

Agent autonomy profiling must be grounded in the notion of task complexity.

More autonomy is not always better.

Unlock the full potential of your agents by operating them at the right level of autonomy.

Across Work Domains

• Most domains evaluate agents only on relatively simple tasks.

• Agents show slightly higher autonomy in engineering and design tasks.


What Does Complexity-k Look Like?

For Cross-Occupational Skills

• Agents perform better on self-contained activities: mental processes & work output

• But they struggle with identifying and retrieving information & coordinating with others



Comparing agent frameworks is hard

• Systematic comparisons are still difficult due to the lack of trajectory data

• We need more agent trajectories! Submit your trajectories here.



How LM backbones compare

• Most released trajectories rely on a small set of API-based models

• Contribute more trajectories with various models! Submit your trajectories here.

