Profiling Agent Success and Autonomy

More autonomy is not always better.

Unlock the full potential of your agents by operating them at the right level of autonomy.

• In most domains, agents are evaluated only on relatively simple tasks

• Agents show slightly higher autonomy in engineering and design tasks.

[Interactive visualization: domain breakdown. Fields: Domain, Complexity Level, Example Task Instruction.]

• Agents perform better on self-contained activities: mental processes & work output

• But they struggle with identifying and retrieving information and with coordinating with others

[Interactive visualization: skill breakdown. Fields: Skill, Complexity Level, Example Task Instruction.]

• Systematic comparisons are still difficult due to the lack of trajectory data

• We need more agent trajectories! Submit your trajectories here

[Interactive visualization: agent breakdown. Fields: Agent, Complexity Level, Example Task Instruction.]

• Most released trajectories rely on a small set of API-based models

• Contribute more trajectories from a wider variety of models! Submit your trajectories here

[Interactive visualization: model breakdown. Fields: Model, Complexity Level, Example Task Instruction.]

Profiling Agent Autonomy