must be grounded in the notion of task complexity.
More autonomy is not always better.
Unlock the full potential of your agents by operating them at the right level of autonomy.
• Most domains only evaluate agents on relatively simple tasks
• Agents show slightly higher autonomy in engineering and design tasks.
• Agents perform better on self-contained activities: mental processes & work output
• But they struggle with identifying and retrieving information and with coordinating with others
• Systematic comparisons are still difficult due to the lack of trajectory data
• We need more agent trajectories! Submit your trajectories here
• Most released trajectories rely on a small set of API-based models
• Contribute more trajectories from a wider variety of models! Submit your trajectories here