Glossary term
Evaluation and Benchmarks
An agent that interacts with websites or web applications through browser actions.
OpenAI Operator uses a browser agent to navigate e-commerce sites, fill checkout forms, compare prices, and confirm purchases on behalf of users. It has been demonstrated on DoorDash, Instacart, and United Airlines.
Anthropic's Computer Use API (Claude 3.5 Sonnet) enables browser agents that fill government forms, scrape structured data from websites, and automate multi-step web workflows in enterprise RPA modernisation.
WebVoyager, an academic benchmark agent, achieves a 59.1% task completion rate on real-world web tasks. Researchers use it to measure browser-agent progress on booking flights, finding information, and filling forms.
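To make "browser actions" concrete, the sketch below shows the observe-decide-act loop that a browser agent typically runs. It is an illustration only and does not reflect how Operator, Computer Use, or WebVoyager are implemented. `Action`, `FakeBrowser`, `scripted_policy`, and `run_agent` are all hypothetical names, the browser is a toy object that only records state, and the policy is hard-coded where a real agent would consult a model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str         # "navigate", "type", or "click"
    target: str = ""  # URL or element name
    text: str = ""    # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser (assumption); it only records state."""
    url: str = "about:blank"
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def apply(self, action: Action) -> None:
        if action.kind == "navigate":
            self.url = action.target
        elif action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action}")


def scripted_policy(browser: FakeBrowser):
    """Hypothetical policy: choose the next action from the observed state."""
    if browser.url == "about:blank":
        return Action("navigate", "https://example.com/form")
    if "name" not in browser.fields:
        return Action("type", "name", "Ada")
    if not browser.submitted:
        return Action("click", "submit")
    return None  # nothing left to do: the task is finished


def run_agent(browser, policy, max_steps=10):
    """Observe-decide-act loop; returns the list of actions taken."""
    trace = []
    for _ in range(max_steps):
        action = policy(browser)
        if action is None:
            break
        browser.apply(action)
        trace.append(action)
    return trace


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent(browser, scripted_policy):
        print(step.kind, step.target, step.text)
```

The `max_steps` cap is a design choice in this sketch: it ends the loop even if the policy never signals that the task is finished.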