Anthropic has introduced a public-beta feature for its Claude 3.5 Sonnet model that lets the AI interact with a computer much as a human does. This capability, called “computer use”, allows Claude to analyze the screen, move the cursor, click buttons, and type text. It is currently available via the API, so developers can integrate Claude into applications that require direct interaction with a computer. A demonstration video shows Claude working on a Mac, opening new possibilities for process automation and user support.
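As a rough illustration of what API integration looks like, the sketch below assembles the JSON body for a Messages API call with the computer tool enabled. The tool type (`computer_20241022`), model name, and beta flag (`computer-use-2024-10-22`) are the identifiers Anthropic published at the beta's launch and may change as the feature evolves; no request is actually sent here.

```python
# Sketch of a "computer use" request payload, based on Anthropic's public
# beta announcement. Identifiers below are the launch-time values and are
# subject to change while the feature remains in beta.

def build_computer_use_request(prompt: str, width: int = 1280, height: int = 800) -> dict:
    """Assemble the JSON body for a Messages API call with the computer tool."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "tools": [
            {
                # Tool definition telling the model it can see and control
                # a screen of the given resolution.
                "type": "computer_20241022",
                "name": "computer",
                "display_width_px": width,
                "display_height_px": height,
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_computer_use_request("Open the calculator app and compute 2 + 2.")
# This body would be POSTed to the Messages endpoint with the beta header
# "anthropic-beta: computer-use-2024-10-22".
```

In practice a client library such as Anthropic's official SDK would send this request; the dictionary above only shows the shape of the payload.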
Although other companies, such as Microsoft (with Copilot Vision), OpenAI (with the ChatGPT desktop app), and Google (with Gemini for Android), have explored interaction between AI and the computer screen, Anthropic appears to be the first to release tools of this type at scale. It is important to note that the “computer use” feature is still experimental and, as Anthropic points out, may be limited in both accuracy and speed. Some common actions, such as dragging and zooming, are not yet supported by Claude. Furthermore, because the screen is analyzed through a series of screenshots rather than a continuous video stream, transient elements such as notifications can be missed.
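The screenshot-driven loop described above can be sketched as a simple dispatcher: the model issues an action (take a screenshot, click, type), the host application executes it against the OS, and the result is fed back to the model. The action names below match those in Anthropic's beta announcement, but the execution side is a hypothetical placeholder for real OS automation; this sketch only returns textual descriptions of what would happen, and it mirrors the beta's limitation that actions like drag and zoom are unsupported.

```python
# Minimal sketch of the screenshot-driven action loop described above.
# In a real integration, the model returns tool_use blocks and the host
# executes them with an OS automation layer (e.g. pyautogui); here each
# branch just returns a placeholder description of the action.

def execute_action(action: dict) -> str:
    """Dispatch one model-issued action; returns a description of the result."""
    kind = action.get("action")
    if kind == "screenshot":
        # A real handler would capture the screen and return a base64 PNG.
        return "screenshot: <base64 PNG of current screen>"
    if kind == "left_click":
        x, y = action["coordinate"]
        return f"clicked at ({x}, {y})"
    if kind == "type":
        return f"typed {action['text']!r}"
    # Actions such as drag and zoom are not supported in the current beta.
    return f"unsupported action: {kind}"

print(execute_action({"action": "left_click", "coordinate": [640, 400]}))
# → clicked at (640, 400)
```

Because the model only sees discrete screenshots between actions, anything that appears and disappears between two captures (a notification, a tooltip) is invisible to it, which is the limitation the article notes.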
Anthropic has also implemented safeguards to limit Claude's use in sensitive areas. For example, there are mechanisms to prevent the AI from being used for election-related activities, for generating and publishing content on social media, for registering web domains, or for interacting with government sites. Beyond the new “computer use” feature, Claude 3.5 Sonnet shows significant improvements on several benchmarks, particularly in coding and tool use. Coding performance has improved markedly, surpassing that of other publicly available models, including reasoning models such as OpenAI's o1-preview and specialized systems.