Glossary term
Evaluation and Benchmarks
An agent that controls graphical desktop applications or OS interfaces.
Anthropic's Computer Use feature allows Claude to control a Linux desktop environment. It has been demonstrated performing tasks in VS Code, LibreOffice, and terminal windows, and scored 14.9% on the OSWorld benchmark.
Microsoft's UFO agent (2024) controls Windows desktop applications by interpreting screenshots and issuing UI Automation API commands. It was tested on tasks across Word, Excel, PowerPoint, and the Edge browser.
Adept AI's ACT-1 model was trained to operate desktop GUIs for enterprise software (Salesforce, Workday, SAP). It was demonstrated filling in CRM records and running reports through business-software UIs without API access.
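
The examples above share an observe-then-act loop: capture the screen state, choose a UI action, execute it, and repeat until the task is finished. The sketch below illustrates that loop only, and does not reflect how any of the named products is implemented. `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names; the in-memory form stands in for a real desktop, and the rule-based policy stands in for a model that reads screenshots.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # UI element name, used by "click"
    text: str = ""     # text to enter, used by "type"


class FakeDesktop:
    """Hypothetical in-memory stand-in for a desktop with one form."""

    def __init__(self) -> None:
        self.fields = {"name": ""}
        self.focused: Optional[str] = None
        self.saved = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; a dict keeps the sketch runnable.
        return {"fields": dict(self.fields),
                "focused": self.focused,
                "saved": self.saved}

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            if action.target == "save":
                self.saved = True
            else:
                self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] += action.text


def scripted_policy(goal_name: str) -> Callable[[dict], Action]:
    """Rule-based placeholder for a model that maps observations to actions."""
    def policy(obs: dict) -> Action:
        if obs["saved"]:
            return Action("done")
        if obs["fields"]["name"] != goal_name:
            if obs["focused"] != "name":
                return Action("click", target="name")
            return Action("type", text=goal_name)
        return Action("click", target="save")
    return policy


def run_agent(env: FakeDesktop,
              policy: Callable[[dict], Action],
              max_steps: int = 10) -> List[Action]:
    """Observe, decide, act; stop on "done" or when the step budget runs out."""
    trace: List[Action] = []
    for _ in range(max_steps):
        action = policy(env.screenshot())
        trace.append(action)
        if action.kind == "done":
            break
        env.execute(action)
    return trace


if __name__ == "__main__":
    desktop = FakeDesktop()
    for step in run_agent(desktop, scripted_policy("Ada")):
        print(step)
```

The step budget (`max_steps`) is a design choice of this sketch: it ensures the loop terminates even if the policy never emits a "done" action.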