Midscene.js

Official Website: https://midscenejs.com/

Open-source, vision-driven UI testing — write tests in natural language, automate any platform.

📣 Midscene Skills is here!

Use Midscene Skills to control any platform with OpenClaw

Showcases

💡 Why Midscene

Most UI automation — including AI tools that read the DOM or the accessibility tree — depends on page structure. That structure is fragile and incomplete: selectors break on every refactor, elements without semantic markup (icon-only buttons, custom controls, <canvas>) are invisible to it, native apps and cross-origin iframes are out of reach, and it cannot tell whether something actually looks right. Midscene works from the screenshot alone, and you describe each step in natural language:

Less maintenance — no selectors to chase when the UI changes.
Reach every element and surface — if a human can see it, Midscene can target it, even with no semantic annotations, on <canvas>, native apps, and cross-origin iframes.
Assert what users actually see — verify colors, highlights, layout, and rendered state, not just whether a DOM node exists.
Two ways to test — add Midscene to your Playwright / Vitest suite, or let an AI agent test autonomously via Skills and MCP.

Midscene is built for UI testing first, but the same vision-driven engine handles any UI automation task.
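The maintenance point above can be illustrated with a toy model. This is a minimal sketch, not Midscene code: `RenderedElement`, `findByClass`, and `findByAppearance` are hypothetical names, and plain string matching stands in for the vision model, which in reality works from a screenshot.

```typescript
// Toy model of an element: one structural hook plus what a person would see.
interface RenderedElement {
  cssClass: string; // structural detail that changes with refactors
  visibleLabel: string; // what is drawn on screen
}

// Structure-based lookup: only works while the class name stays the same.
function findByClass(
  page: RenderedElement[],
  cssClass: string,
): RenderedElement | undefined {
  return page.find((el) => el.cssClass === cssClass);
}

// Placeholder for a vision-driven lookup: keyed on what is visible, not on markup.
// Assumption: string matching here only imitates "describe what you see".
function findByAppearance(
  page: RenderedElement[],
  description: string,
): RenderedElement | undefined {
  const wanted = description.toLowerCase();
  return page.find((el) => wanted.includes(el.visibleLabel.toLowerCase()));
}

const beforeRefactor: RenderedElement[] = [
  { cssClass: "btn-primary", visibleLabel: "Sign in" },
];

// Same button on screen, different class name after a styling refactor.
const afterRefactor: RenderedElement[] = [
  { cssClass: "css-1x2y3z", visibleLabel: "Sign in" },
];

// The selector no longer matches; the description still does.
const selectorSurvives =
  findByClass(afterRefactor, "btn-primary") !== undefined;
const descriptionSurvives =
  findByAppearance(afterRefactor, "the Sign in button") !== undefined;
```

In this toy, the class-based lookup stops matching once the class name changes, while the description-based lookup keeps matching because the visible label is unchanged.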

💡 What you can automate

Midscene works anywhere you can take a screenshot — web browsers, Android, iOS, HarmonyOS, desktop apps, and any custom interface — all through one API. Write automation with the JavaScript SDK or in YAML, hand it to AI agents via Skills and MCP, and look up every method (aiAct, aiQuery, aiAssert, and more) in the API reference.
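As a rough sketch of how the three methods named above might fit together in one flow, the TypeScript below defines a hypothetical `UiAgent` interface. The method names come from this README; the parameter and return types, the `RecordingAgent` stand-in, and the `searchFlow` helper are assumptions made for illustration and are not Midscene's actual signatures, so check the API reference before relying on them.

```typescript
// Hypothetical interface: method names are from the README, signatures are assumed.
interface UiAgent {
  aiAct(instruction: string): Promise<void>;
  aiQuery<T>(dataDemand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
}

// Stand-in agent that only records the prompts it receives.
// It talks to no model and no browser, so the example runs anywhere.
class RecordingAgent implements UiAgent {
  readonly log: string[] = [];
  private readonly cannedResult: unknown;

  constructor(cannedResult: unknown) {
    this.cannedResult = cannedResult;
  }

  async aiAct(instruction: string): Promise<void> {
    this.log.push(`act: ${instruction}`);
  }

  async aiQuery<T>(dataDemand: string): Promise<T> {
    this.log.push(`query: ${dataDemand}`);
    // A real agent would extract this from the screen; here it is canned data.
    return this.cannedResult as T;
  }

  async aiAssert(assertion: string): Promise<void> {
    this.log.push(`assert: ${assertion}`);
  }
}

// One flow written only against the interface: act, then extract, then verify.
async function searchFlow(agent: UiAgent): Promise<string[]> {
  await agent.aiAct('type "headphones" in the search box and press Enter');
  const titles = await agent.aiQuery<string[]>(
    "string[], titles of the products in the result list",
  );
  await agent.aiAssert("the result list shows at least one product");
  return titles;
}
```

Because `searchFlow` depends only on the interface, the same natural-language steps could in principle be handed to whichever platform-specific agent the SDK provides.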

🚀 Get started

Write your first script in a few minutes — Quick start.
No code? Try Midscene on any web page with the Chrome extension.
Other platforms — getting-started guides for Android, iOS, HarmonyOS, and desktop.

✨ Driven by Multimodal Models

Midscene is all-in on pure vision for UI actions: element localization is based on screenshots only. It runs on multimodal models with strong UI localization, such as Qwen3.x, Doubao-Seed-2.0, GLM-4.6V, gemini-3.5-flash, and UI-TARS, including open-source options you can self-host. For data extraction and page understanding, you can still opt in to include DOM when needed.

📄 Resources

Documentation: https://midscenejs.com
Sample projects: midscene-example
API reference: https://midscenejs.com/api

🤝 Community

🌟 Awesome Midscene

Community projects that extend Midscene.js capabilities:

midscene-ios - iOS Mirror automation support for Midscene
midscene-pc - PC operation device for Windows, macOS, and Linux
midscene-pc-docker - Docker image with Midscene-PC server pre-installed
Midscene-Python - Python SDK for Midscene automation
midscene-java by @Master-Frank - Java SDK for Midscene automation
midscene-java by @alstafeev - Java SDK for Midscene automation

📝 Credits

We would like to thank the following projects:

Rsbuild and Rslib for the build tools.
UI-TARS for the open-source agent model UI-TARS.
Qwen-VL for the open-source multimodal model Qwen-VL.
scrcpy and yume-chan, which allow us to control Android devices from the browser.
appium-adb for the JavaScript bridge to adb.
appium-webdriveragent for operating XCTest from JavaScript.
YADB for the yadb tool which improves the performance of text input.
libnut-core for the cross-platform native keyboard and mouse control.
Puppeteer for browser automation and control.
Playwright for browser automation, control, and testing.

📖 Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

✨ Star History

📝 License

Midscene.js is MIT licensed.

If this project helps or inspires you, please give us a star.
