Skip to content
AgentRadar
UI TARS Desktop

UI TARS Desktop

ByteDance's open-source multimodal AI agent that controls your computer, browser, and terminal via vision

MCP Servers Open Source

Released

2025-01

Country

China

API

Available

Self-Host

Yes

GitHub Stars

36,428

Last Reviewed

2026-06

About UI TARS Desktop

UI TARS Desktop is an open-source multimodal AI agent stack from ByteDance that brings vision-driven 'computer use' to your everyday machine. Rather than relying on accessibility APIs or brittle DOM selectors alone, its underlying vision-language models (the UI-TARS / Seed-1.5-VL family) perceive raw screenshots and rendered UI frames, then decide and execute the clicks, typing, and navigation needed to complete a task — much like Claude Computer Use or OpenAI's computer-using agent, but open-source and self-hostable. The product spans several surfaces. UI TARS Desktop is a native Electron application that lets the agent take over and operate your whole computer; Agent TARS focuses on the browser, controlling web pages through natural language with seamless DOM integration; and the stack extends into the terminal and production environments. This makes it a general-purpose agent platform rather than a single-purpose tool: you can ask it to fill out a web form, orchestrate a multi-app desktop workflow, or automate repetitive GUI tasks that resist traditional scripting. A major draw is local deployability. The UI-TARS-1.5-7B model is small enough to run on consumer hardware, so privacy-sensitive users can keep agent execution entirely on their own machine rather than streaming screenshots to a third-party cloud. The project has become one of the most significant open-source agents available, accumulating tens of thousands of GitHub stars shortly after release. UI TARS Desktop is aimed at developers, automation engineers, and power users who want an open, auditable computer-use agent — whether to build on top of the stack, self-host for privacy, or experiment with GUI automation that generalizes across applications.

Our Verdict

The open-source answer to computer-use agents. UI TARS Desktop pairs vision-driven GUI control with local deployability and ByteDance's model firepower — a compelling pick for developers who want an auditable, self-hostable agent that operates real software.

Features

Vision-driven computer use (screenshots)
Controls desktop, browser & terminal
Agent TARS browser control (DOM)
Local deployment (7B model on consumer GPU)
Natural-language task execution
Open-source & auditable

Detailed Ratings

Ease of Use
7.4
Value for Money
8.4
Features
8.2
Support
7.4
Performance
8.0
Overall Rating
8.0 /10

Pros & Cons

Pros

  • Genuine open-source computer-use agent — an alternative to closed options
  • Vision-based, so it generalizes across arbitrary GUIs
  • Runs locally on consumer hardware for privacy
  • Active ByteDance-backed development, tens of thousands of stars
  • Spans desktop, browser, and terminal from one stack

Cons

  • GUI agents can be slow and occasionally unreliable
  • Requires technical setup and a capable GPU for local runs
  • Early-stage ecosystem compared to mature automation tools

Use Cases

Cross-application desktop automationBrowser task automationRepetitive GUI workflowsPrivacy-preserving local agentsComputer-use agent R&D

Who Is It For?

Developers and power users who want an open-source, self-hostable computer-use agent that controls desktop, browser, and terminal via vision

#computer-use#multimodal#gui-agent#open-source#vision-language#automation#bytedance#self-hosted

Frequently Asked Questions

What is UI TARS Desktop?

UI TARS Desktop is an open-source multimodal AI agent stack by ByteDance. It uses vision-language models to perceive your screen and operate your computer, browser, and terminal through natural language — a self-hostable 'computer use' agent.

Is UI TARS Desktop free?

Yes. It is free and open-source. The UI-TARS-1.5-7B model can run on consumer hardware, so you can self-host the full agent locally. You only pay for compute if you use cloud GPUs or a remote model API.

How does UI TARS differ from Claude Computer Use?

Both are vision-driven computer-use agents. UI TARS is open-source and self-hostable with ByteDance's own vision-language models, whereas Claude Computer Use is a closed, cloud-based capability. UI TARS also spans desktop, browser, and terminal from one stack.

Can UI TARS run locally on my own machine?

Yes. The UI-TARS-1.5-7B model is sized to run on most consumer hardware, so you can keep screenshots and agent execution entirely on your own device for privacy.

Related Agents

Top Alternatives

Compare with these similar tools

Links & Resources

AR Researched by AgentRadar Editorial Team · Our methodology