Windows 365 for Agents MCP server reference

Windows 365 for Agents is an MCP server that gives you full operational control of a Windows 365 cloud PC. Use this MCP server to drive a real Windows environment through desktop interaction (mouse, keyboard, screen capture, command execution), browser automation via Microsoft Edge, and semantic UI inspection via Windows UI Automation.

Note

Browser automation works on Microsoft Edge. Edge launches automatically on the first browser tool call. focus_browser can also target Chrome or Firefox, but DOM-level browser tools only work on the Edge instance.

To learn more about Windows 365 for Agents, see Windows 365 for Agents documentation.

Overview

  • Server ID: mcp_W365ComputerUse
  • Tenant-level URL: https://agent365.svc.cloud.microsoft/agents/tenants/{tenantId}/servers/mcp_W365ComputerUse
  • Display name: Windows 365 for Agents MCP server
  • Description: Full operational control of a Windows 365 cloud PC, including desktop interaction, browser automation, and UI inspection.

Available tools

mcp_W365ComputerUse_StartSession

Starts a Windows 365 Computer Use session, establishing a connection to a cloud PC and allocating a cloud PC resource. Returns the sessionId, which you use to manage the session.

No required parameters.

mcp_W365ComputerUse_EndSession

Ends an active Windows 365 Computer Use session and releases the associated cloud PC resources. Pass the sessionId returned by mcp_W365ComputerUse_StartSession.

Required parameters: sessionId

mcp_W365ComputerUse_GetSessionDetails

Returns metadata for one Windows 365 Computer Use session identified by the sessionId. Pass the sessionId returned by mcp_W365ComputerUse_StartSession. Doesn't list multiple sessions.

Required parameters: sessionId
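The three session tools are typically called in sequence: start, inspect, end. The following sketch builds the corresponding MCP tools/call requests as JSON-RPC 2.0 messages. The helper name build_tool_call and the placeholder sessionId are hypothetical and aren't part of the server; how you send the messages depends on your MCP client.

```python
import json

def build_tool_call(request_id, name, arguments=None):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }

# Placeholder only: the real value comes from the StartSession result.
session_id = "<sessionId from StartSession>"

lifecycle = [
    build_tool_call(1, "mcp_W365ComputerUse_StartSession"),
    build_tool_call(2, "mcp_W365ComputerUse_GetSessionDetails", {"sessionId": session_id}),
    build_tool_call(3, "mcp_W365ComputerUse_EndSession", {"sessionId": session_id}),
]

for message in lifecycle:
    print(json.dumps(message))
```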

move_mouse

Moves the cursor to a screen position. Use click instead if you intend to click at the destination.

Required parameters:

  • x: X coordinate in screen pixels
  • y: Y coordinate in screen pixels

click

Clicks at a position, or at the current cursor location if coordinates are omitted. Supports single-click, double-click, and all five mouse buttons.

Optional parameters:

  • x: X coordinate in screen pixels (omit for current position)
  • y: Y coordinate in screen pixels (omit for current position)
  • button: Left, Right, Middle, Forward, or Backward (default Left)
  • clickCount: 1 = single click, 2 = double click (default 1)
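As an illustration, the following sketch assembles the arguments for click and checks them against the documented values before sending. The helper name click_arguments is hypothetical. Requiring x and y together is a conservative assumption of this sketch, not a documented rule.

```python
VALID_BUTTONS = {"Left", "Right", "Middle", "Forward", "Backward"}

def click_arguments(x=None, y=None, button="Left", click_count=1):
    """Build arguments for the click tool (hypothetical helper)."""
    if button not in VALID_BUTTONS:
        raise ValueError(f"unknown button: {button}")
    if click_count not in (1, 2):
        raise ValueError("clickCount must be 1 (single) or 2 (double)")
    # Assumption: treat a lone x or y as a mistake rather than guessing the other axis.
    if (x is None) != (y is None):
        raise ValueError("provide both x and y, or neither")
    arguments = {"button": button, "clickCount": click_count}
    if x is not None:
        arguments.update({"x": x, "y": y})
    return arguments

double_click = click_arguments(400, 300, click_count=2)
right_click_here = click_arguments(button="Right")  # clicks at the current cursor position
```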

get_cursor_position

Returns the current cursor coordinates. No parameters. Returns {cursorX, cursorY}.

drag_mouse

Drags from one position to another. Useful for moving objects, resizing windows, or pixel-precise scrolling.

Required parameters:

  • startX: Start X coordinate.
  • startY: Start Y coordinate.
  • endX: End X coordinate.
  • endY: End Y coordinate.

Optional parameters:

  • button: Left, Right, or Middle (default is Left)

scroll

Scrolls at a position by using notch units, not pixels. Three notches are approximately one page.

Required parameters:

  • x: Scroll position X
  • y: Scroll position Y

Optional parameters:

  • deltaX: Horizontal notches, positive = right (default 0)
  • deltaY: Vertical notches, positive = down (default 0)

Note

Values are clamped to the range [-20, 20].
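Because scroll takes notches rather than pixels, it can help to think in pages and convert. The following sketch applies the documented approximation (three notches per page) and the documented clamp. The helper name scroll_arguments is hypothetical, and the server clamps values on its own; the local clamp only makes the effective value visible.

```python
MAX_NOTCHES = 20       # documented clamp: [-20, 20]
NOTCHES_PER_PAGE = 3   # documented approximation: three notches is about one page

def scroll_arguments(x, y, pages_down=0, pages_right=0):
    """Convert page counts into clamped notch deltas for the scroll tool."""
    def to_notches(pages):
        notches = round(pages * NOTCHES_PER_PAGE)
        return max(-MAX_NOTCHES, min(MAX_NOTCHES, notches))
    return {
        "x": x,
        "y": y,
        "deltaX": to_notches(pages_right),
        "deltaY": to_notches(pages_down),
    }

two_pages_down = scroll_arguments(640, 400, pages_down=2)
far_up = scroll_arguments(640, 400, pages_down=-10)  # -30 notches, clamped to -20
```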

type_text

Types text by simulating keyboard input. For keyboard shortcuts, use press_keys. For web form fields, use browser_type.

Required parameters:

  • text: Text to type.

Optional parameters:

  • usePaste: Paste text from clipboard instead of typing.

press_keys

Presses a key combination simultaneously. Supports modifier keys, function keys, and standard keys.

Required parameters:

  • keys: Array of key names to press together (for example, ["ctrl","c"], ["alt","tab"], ["ctrl","shift","s"])

take_screenshot

Captures the full screen or a cropped region as a PNG image (base64-encoded).

Optional parameters:

  • x: Crop region left edge
  • y: Crop region top edge
  • width: Crop region width
  • height: Crop region height

Note

Provide all four crop parameters together, or omit all four for a full-screen capture.
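The all-or-nothing rule for crop parameters is easy to check before the call. A minimal sketch, with a hypothetical helper name screenshot_arguments:

```python
def screenshot_arguments(x=None, y=None, width=None, height=None):
    """Return arguments for take_screenshot, enforcing the all-four-or-none crop rule."""
    crop = {"x": x, "y": y, "width": width, "height": height}
    provided = [key for key, value in crop.items() if value is not None]
    if not provided:
        return {}  # full-screen capture
    if len(provided) != len(crop):
        missing = sorted(set(crop) - set(provided))
        raise ValueError(f"crop needs all four values; missing: {missing}")
    return crop
```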

zoom_region

Captures a screen region at native resolution as a PNG image (base64-encoded). Use this feature to inspect small text or dense UI elements that are hard to read in a downscaled full-screen screenshot.

Required parameters:

  • x: Left edge X coordinate in screen pixels
  • y: Top edge Y coordinate in screen pixels
  • width: Region width in pixels
  • height: Region height in pixels

Note

The maximum region size is 1920x1080 pixels.
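To inspect an area larger than the maximum region size, one option is to split it into tiles and call zoom_region once per tile. The following sketch computes the tiles; the helper name zoom_tiles is hypothetical.

```python
def zoom_tiles(x, y, width, height, max_width=1920, max_height=1080):
    """Split a region into tiles no larger than the documented 1920x1080 limit."""
    tiles = []
    for top in range(y, y + height, max_height):
        for left in range(x, x + width, max_width):
            tiles.append({
                "x": left,
                "y": top,
                "width": min(max_width, x + width - left),
                "height": min(max_height, y + height - top),
            })
    return tiles

# Each dictionary can be passed as the arguments of one zoom_region call.
tiles = zoom_tiles(0, 0, 2560, 1440)
```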

analyze_screen

Performs OCR on the entire screen. No parameters. Returns {fullText, averageConfidence, boxes[{text, confidence, x, y, width, height}], width, height}.
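The boxes array can be used to turn recognized text into a click target. The following sketch finds the first box that contains a search string and returns its center. The helper name find_text_center is hypothetical, and the scale of the confidence value isn't documented here, so the threshold defaults to 0 (no filtering).

```python
def find_text_center(ocr_result, needle, min_confidence=0.0):
    """Return the (x, y) center of the first OCR box containing needle, or None."""
    for box in ocr_result["boxes"]:
        matches = needle.lower() in box["text"].lower()
        if matches and box["confidence"] >= min_confidence:
            return (box["x"] + box["width"] // 2, box["y"] + box["height"] // 2)
    return None
```

The returned pair can be passed as x and y to click, because OCR boxes and click share the same screen coordinate space.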

get_screen_size

Returns the screen resolution. No parameters. Returns {width, height}.

list_windows

Lists all visible windows with their titles, positions, and dimensions. No parameters. Returns an array of {title, processName, handle, x, y, width, height}.

activate_window

Brings a window to the foreground by using a fuzzy title match.

Required parameters:

  • title: Partial window title (case-insensitive substring)

focus_browser

Focuses a browser window (Edge, Chrome, or Firefox), optionally filtered by URL or title.

Optional parameters:

  • pattern: URL or title substring to match (omit for any browser window)

close_window

Closes a window gracefully by using a fuzzy title match. The system protects critical processes and you can't close them.

Required parameters:

  • title: Partial window title (80% match threshold).

Returns {matchedTitle, processName, closed}.

resize_window

Resizes, moves, maximizes, minimizes, or restores a window by using a fuzzy title match.

Required parameters:

  • title: Window title to match (case-insensitive fuzzy match)
  • action: Action to perform - Resize, Move, Maximize, Minimize, or Restore

Optional parameters:

  • x: Left edge X coordinate (used with Resize or Move)
  • y: Top edge Y coordinate (used with Resize or Move)
  • width: Width in pixels (used with Resize)
  • height: Height in pixels (used with Resize)

execute_shell_command

Runs a shell command in a sandbox environment. The command is checked against an allow list, and dangerous patterns are blocked.

Required parameters:

  • command: Command to run

Optional parameters:

  • cwd: Working directory. Use forward slashes (for example, C:/Users/me/project).
  • timeoutMs: Timeout in milliseconds (default 30000, max 120000)

Note

  • Allowed commands: git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type, and notepad.
  • Blocked patterns include shell metacharacters (|, ;, &, <, >), environment variable expansion (%VAR%), interpreter eval flags (python -c or node -e), git config --global, npm -g, path-prefixed executables, rm -rf, sudo, and disk or system commands.
  • The command's stdout and stderr each truncate at 32 KB. For arbitrary computation, use execute_python_code. The command returns {stdout, stderr, exitCode, success, timedOut, resourceLimitsApplied}.
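A local pre-check can catch commands that are likely to be rejected before you spend a tool call on them. The following sketch mirrors only the rules listed in the note above; the server's actual checks are authoritative and may be stricter. The helper name precheck_command is hypothetical.

```python
import re

ALLOWED_COMMANDS = {
    "git", "npm", "dotnet", "python", "cargo", "node", "pip", "dir", "mkdir",
    "del", "copy", "move", "robocopy", "findstr", "where", "type", "notepad",
}
BLOCKED_CHARACTERS = "|;&<>"
ENV_EXPANSION = re.compile(r"%[^%\s]+%")  # matches patterns such as %VAR%

def precheck_command(command):
    """Return reasons the command would likely be rejected (empty list if none found)."""
    tokens = command.split()
    if not tokens:
        return ["empty command"]
    problems = []
    if tokens[0].lower() not in ALLOWED_COMMANDS:
        problems.append(f"'{tokens[0]}' is not on the documented allow list")
    found = [char for char in BLOCKED_CHARACTERS if char in command]
    if found:
        problems.append("blocked characters: " + " ".join(found))
    if ENV_EXPANSION.search(command):
        problems.append("environment variable expansion")
    if tokens[0].lower() in {"python", "node"} and {"-c", "-e"} & set(tokens[1:]):
        problems.append("interpreter eval flag")
    return problems
```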

execute_python_code

Executes Python code in a sandbox environment with resource limits. This function is ideal for data processing, calculations, file I/O, and any computation that goes beyond simple shell commands.

Required parameters:

  • code: Python code (max 262,144 characters).

Optional parameters:

  • cwd: Working directory. Use forward slashes.
  • timeoutMs: Timeout in milliseconds (default 30000, max 120000).

Returns the same schema as execute_shell_command.

Note

The sandbox enforces a 512 MB memory limit and a 30-second timeout.
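As an example of the file I/O use case, the following sketch prepares arguments for execute_python_code that write a small file and read it back. The helper name python_code_arguments, the file name, and the working directory are hypothetical. The helper checks the documented code length limit and converts backslashes in cwd to forward slashes.

```python
MAX_CODE_CHARS = 262_144  # documented limit for the code parameter

def python_code_arguments(code, cwd=None):
    """Build arguments for execute_python_code (hypothetical helper)."""
    if len(code) > MAX_CODE_CHARS:
        raise ValueError(f"code is {len(code)} characters; the limit is {MAX_CODE_CHARS}")
    arguments = {"code": code}
    if cwd is not None:
        arguments["cwd"] = cwd.replace("\\", "/")  # the tool expects forward slashes
    return arguments

# Code that would run inside the sandbox, not locally.
snippet = (
    "from pathlib import Path\n"
    "Path('notes.txt').write_text('hello', encoding='utf-8')\n"
    "print(Path('notes.txt').read_text(encoding='utf-8'))\n"
)
arguments = python_code_arguments(snippet, cwd="C:\\Users\\me\\project")
```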

wait_milliseconds

Pauses execution to allow animations or transitions to complete. Don't use this function in polling loops. Instead, use browser_wait_for for DOM polling.

Required parameters:

  • ms: Wait duration in milliseconds (clamped to [0, 5000])

clipboard_read

Reads the current content of the system clipboard. This command doesn't require any parameters. It returns a JSON object that describes the clipboard format and payload, which can be either a text string or a base64-encoded image.

clipboard_write

Writes text to the system clipboard, replacing the current content.

Required parameters:

  • text: Text to write to the clipboard

Returns a confirmation that includes the character count.

list_processes

Lists running processes in the current session. Each entry includes the PID, process name, memory usage, window title (if any), and startTimeTicks. Pair startTimeTicks with kill_process to prevent killing a recycled PID.

Optional parameters:

  • maxCount: Maximum number of processes to return (default 200)

Returns a JSON array of process info objects.

kill_process

Terminates a process by PID. Supply the startTimeTicks value from list_processes as startTime to protect against PID recycling.

Required parameters:

  • pid: Process ID returned by list_processes
  • startTime: Process start time ticks returned by list_processes

Optional parameters:

  • force: Force-kill without a graceful shutdown (default false)

Returns a JSON result describing the outcome.
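The following sketch shows how a list_processes entry might be turned into kill_process arguments so that the PID and start time always travel together. Only startTimeTicks is named in this reference; the pid and name keys on the process entries are assumptions, as is the helper name kill_arguments.

```python
def kill_arguments(processes, process_name, force=False):
    """Build kill_process arguments for each entry whose name matches process_name.

    Assumption: entries expose "pid" and "name" keys alongside "startTimeTicks".
    """
    return [
        {"pid": entry["pid"], "startTime": entry["startTimeTicks"], "force": force}
        for entry in processes
        if entry["name"].lower() == process_name.lower()
    ]
```

Pairing each PID with its own start time means a process that exited and had its PID reused won't be terminated by mistake.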

launch_application

Launches a GUI application from an allowed directory. Use execute_shell_command for CLI commands instead.

Required parameters:

  • path: Absolute path to the executable. Use forward slashes (for example, C:/Program Files/app.exe).

Optional parameters:

  • args: Array of command-line arguments

Returns {path, pid}.

get_system_info

Returns the OS version, CPU, RAM, available disk space, and display resolution. No parameters. Returns a JSON object containing the system information.

browser_navigate

Navigates to a URL and waits for the page to load.

Required parameters:

  • url: Full URL including protocol (for example, https://example.com)

browser_back

Navigates back in browser history. No parameters.

browser_forward

Navigates forward in browser history. No parameters.

browser_reload

Reloads the current page. No parameters.

browser_get_url

Returns the current page URL as a plain string. No parameters.

browser_get_title

Returns the current page title as a plain string. No parameters.

browser_get_text

Returns the visible page text content as a plain string. No parameters. Truncated at 512 KB.

browser_get_html

Returns the full page HTML source as a plain string. No parameters. Truncated at 512 KB.

browser_get_page_state

Retrieves multiple page state fields in a single call. Useful for capturing several signals at once without issuing separate tool calls.

Required parameters:

  • fields: Array of fields to return. Allowed values: url, title, dom, screenshot, tabs

Returns a JSON object containing only the requested fields.

browser_click

Clicks a DOM element by CSS selector. More reliable than coordinate-based clicking for web content.

Required parameters:

  • selector: CSS selector (for example, #submit-btn or a.nav-link)

browser_type

Types text into a form element by using a CSS selector.

Required parameters:

  • selector: CSS selector of the input element.
  • text: Text to type.

browser_query_text

Gets the text content of the first element that matches a CSS selector.

Required parameters:

  • selector: CSS selector.

browser_wait_for

Waits for a DOM element to appear. This function is useful for dynamic content that loads asynchronously.

Required parameters:

  • selector: CSS selector to wait for.

Optional parameters:

  • timeoutMs: Timeout in milliseconds. The default is 5,000 and the maximum is 30,000.

browser_eval_js

Evaluates a JavaScript expression in the page context and returns the result as a string.

Required parameters:

  • expression: JavaScript expression that returns a string

Note

If your expression returns an object or number, convert it to a string explicitly (for example, JSON.stringify(obj) or .toString()).
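One way to follow this guidance consistently is to wrap every expression in JSON.stringify and parse the returned string on the client side. A minimal sketch, with hypothetical helper names eval_js_arguments and parse_eval_result:

```python
import json

def eval_js_arguments(js_expression):
    """Wrap an expression so that browser_eval_js returns JSON text."""
    return {"expression": f"JSON.stringify({js_expression})"}

def parse_eval_result(result_text):
    """Decode the JSON string returned by a wrapped expression."""
    return json.loads(result_text)

arguments = eval_js_arguments("document.links.length")
```

Note that JSON.stringify(undefined) evaluates to undefined rather than a string, so expressions that can be undefined need a fallback value (for example, value ?? null).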

browser_list_tabs

Lists all open tabs with their index, title, and URL. No parameters required. Returns an array of {index, title, url}.

Optional parameters:

  • tabId: Unique tab identifier

browser_switch_tab

Switches to a tab by index.

Required parameters:

  • tabIndex: 0-based tab index

Optional parameters:

  • tabId: Unique tab identifier

browser_new_tab

Opens a new tab, optionally navigating to a URL.

Optional parameters:

  • url: URL to open (blank tab if omitted)

Returns {index, title, url}.

browser_create_tabs

Opens multiple tabs at once. Optionally bring one of them to the foreground.

Required parameters:

  • urls: Array of URLs to open, one tab per URL

Optional parameters:

  • foregroundIndex: Index of the tab to bring to the foreground after creation (omit to keep the current tab focused)

Returns a text confirmation.

browser_close_tab

Closes a tab by index.

Required parameters:

  • tabIndex: 0-based tab index

Optional parameters:

  • tabId: Unique tab identifier

browser_screenshot

Captures a PNG screenshot of the browser viewport only (not the full screen). No parameters. Returns a base64-encoded PNG.

browser_select_option

Selects one or more options in a <select> element by their value attribute.

Required parameters:

  • selector: CSS selector for the <select> element
  • values: Array of option value(s) to select

Returns a confirmation with the count of selected options.

browser_fill_form

Fills multiple form fields in a single call. Each entry is a {selector, value} pair. The operation stops on the first failure and reports which fields succeeded.

Required parameters:

  • fields: Array of {selector, value} pairs

Returns a confirmation with the count of filled fields.
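For illustration, the fields array can be generated from a mapping of selectors to values. The helper name fill_form_arguments and the selectors are hypothetical.

```python
def fill_form_arguments(values_by_selector):
    """Build the fields array for browser_fill_form from a selector-to-value mapping."""
    return {
        "fields": [
            {"selector": selector, "value": value}
            for selector, value in values_by_selector.items()
        ]
    }

# Dictionaries keep insertion order, so fields are filled in the order written here.
arguments = fill_form_arguments({"#first-name": "Ada", "#last-name": "Lovelace"})
```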

browser_drag

Drags a source element onto a target element. Both elements are identified by CSS selector.

Required parameters:

  • sourceSelector: CSS selector of the drag source
  • targetSelector: CSS selector of the drop target

browser_pdf_save

Saves the current page as a PDF file. Destination paths are restricted to %USERPROFILE% or %TEMP%.

Required parameters:

  • filePath: Destination file path under %USERPROFILE% or %TEMP%. Use forward slashes.

Returns a confirmation including the saved file path.

browser_handle_dialog

Accepts or dismisses a pending browser dialog (alert, confirm, prompt, or beforeunload). Returns "No dialog pending" if no dialog is active.

Required parameters:

  • action: accept or dismiss

Optional parameters:

  • promptText: Text to supply to a prompt dialog (ignored for alert and confirm)

browser_get_cookies

Gets cookies for the current page, or for a specified set of URLs. Cookie values are always redacted for security; names, domains, paths, and flags are returned.

Optional parameters:

  • urls: Array of URLs to get cookies for (omit for the current page)

Returns an array of cookie objects with redacted values.

browser_set_cookies

Sets cookies on the current page's domain. This action adds or overwrites cookies but doesn't clear existing cookies.

Required parameters:

  • cookies: Array of cookie objects. Each entry requires name and value. Optional fields: domain, path, secure, httpOnly, sameSite.

Returns a text confirmation.

browser_execute_batch

Executes multiple browser actions sequentially in a single call. This action stops on the first failure and returns the results collected up to that point.

Required parameters:

  • actions: Array of {action, params} objects. Allowed actions: navigate, snapshot, click_ref, type_ref, hover_ref, scroll_ref, keypress_ref, wait_for, eval_js.

Returns an array of results, one per executed action.
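The following sketch assembles an actions array and rejects action names that aren't in the allowed list before the call is made. The helper name batch_arguments is hypothetical. The shape of each params object isn't specified here; the example assumes it mirrors the parameters of the corresponding standalone tool.

```python
ALLOWED_BATCH_ACTIONS = {
    "navigate", "snapshot", "click_ref", "type_ref", "hover_ref",
    "scroll_ref", "keypress_ref", "wait_for", "eval_js",
}

def batch_arguments(steps):
    """Build browser_execute_batch arguments from (action, params) pairs."""
    actions = []
    for action, params in steps:
        if action not in ALLOWED_BATCH_ACTIONS:
            raise ValueError(f"unsupported batch action: {action}")
        actions.append({"action": action, "params": params})
    return {"actions": actions}

# Assumed params shapes, modeled on browser_navigate, browser_wait_for, and browser_eval_js.
arguments = batch_arguments([
    ("navigate", {"url": "https://example.com"}),
    ("wait_for", {"selector": "h1"}),
    ("eval_js", {"expression": "document.title"}),
])
```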

browser_snapshot

Captures the page's accessibility tree with stable ref IDs (for example, e5) that map to DOM nodes. Use the refs with browser_click_ref, browser_type_ref, and browser_hover_ref. Refs expire when the page navigates—retake a snapshot after navigation.

Optional parameters:

  • maxDepth: Maximum tree depth, 1-10 (default 5)
  • includeIframes: Include cross-origin iframes (default true)

Returns a JSON object containing the accessibility snapshot and ref IDs.

browser_click_ref

Clicks an element by ref ID from browser_snapshot. A hit-test verifies that no other element overlays the target. Fails if the snapshot expires—retake the snapshot in that case.

Required parameters:

  • snapshotId: Snapshot ID returned by browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes

Optional parameters:

  • button: Left, Right, or Middle (default Left)
  • clickCount: 1 = single click, 2 = double click (default 1)

Returns a confirmation including the clicked coordinates.
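Because refs are tied to a snapshot, a click that fails on an expired snapshot needs a new snapshot and a new ref lookup. The following sketch shows one possible retry loop. It rests on several assumptions that this reference doesn't confirm: call_tool is a client function you supply, a failed tool call surfaces as RuntimeError, and the snapshot result exposes its ID under the key snapshotId. The pick_ref callback locates the target ref in each new snapshot.

```python
def click_with_fresh_snapshot(call_tool, pick_ref, max_attempts=2):
    """Take a snapshot, click the chosen ref, and retake the snapshot on failure."""
    last_error = None
    for _ in range(max_attempts):
        snapshot = call_tool("browser_snapshot", {})
        ref = pick_ref(snapshot)  # refs must be looked up again in every new snapshot
        try:
            return call_tool(
                "browser_click_ref",
                {"snapshotId": snapshot["snapshotId"], "ref": ref},  # assumed key name
            )
        except RuntimeError as error:  # assumed failure signal from the client
            last_error = error
    raise last_error
```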

browser_type_ref

Types text into an element by using the ref ID from browser_snapshot. The element is focused first, and existing text is cleared by default. The operation fails if the snapshot expires.

Required parameters:

  • snapshotId: Snapshot ID returned by browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes
  • text: Text to type

Optional parameters:

  • clear: Clear existing text first (default true)

Returns a confirmation that includes the character count.

browser_hover_ref

Hovers over an element by using the ref ID from browser_snapshot. Returns immediately. The operation fails if the snapshot expires - retake the snapshot in that case.

Required parameters:

  • snapshotId: Snapshot ID returned by browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes

Returns a confirmation including the hover coordinates.

get_accessibility_tree

Retrieves the UI element tree for the foreground window. Each element includes its role, name, value, and screen coordinates.

Optional parameters:

  • maxDepth: Maximum tree traversal depth, 1-10 (default 3)
  • maxElements: Maximum elements to return, 1-2000 (default 500)

Returns a hierarchical tree of {role, name, value, x, y, width, height, children[...]}.
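The returned tree can be walked recursively to collect elements of interest. The following sketch yields the name and center point of every element with a given role; the helper name find_by_role is hypothetical.

```python
def find_by_role(node, role):
    """Yield (name, center_x, center_y) for every element in the tree with the given role."""
    if node.get("role") == role:
        center_x = node["x"] + node["width"] // 2
        center_y = node["y"] + node["height"] // 2
        yield (node.get("name"), center_x, center_y)
    for child in node.get("children", []):
        yield from find_by_role(child, role)
```

The center coordinates can be passed to click, because the tree reports positions in screen pixels.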

browser_keypress_ref

Presses a single key on an element by ref ID from browser_snapshot. The element is focused first. Supports modifier keys. Fails if the snapshot has expired — retake the snapshot in that case.

Required parameters:

  • snapshotId: Snapshot ID returned by browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes
  • key: Key name (for example, Enter, Escape, Tab, ArrowUp, ArrowDown, or F1 through F12)

Optional parameters:

  • modifiers: Array of modifier keys to hold during the press — Ctrl, Shift, Alt, or Meta

Returns a text confirmation.

browser_scroll_ref

Scrolls an element into view by ref ID from browser_snapshot. Optionally, scrolls by a pixel delta within the element. Fails if the snapshot expires.

Required parameters:

  • snapshotId: Snapshot ID returned by browser_snapshot
  • ref: Element ref (for example, e5) from the snapshot nodes

Optional parameters:

  • deltaX: Horizontal scroll delta in pixels (default 0)
  • deltaY: Vertical scroll delta in pixels (default 0)

Returns a text confirmation.

browser_set_file_input_ref

Sets files on a file input element by ref ID from browser_snapshot. File paths are restricted to the user's Documents, Downloads, Desktop, or %TEMP% directories.

Required parameters:

  • snapshotId: Snapshot ID returned by browser_snapshot
  • ref: Element ref for the file input
  • filePaths: Array of file paths to upload

Returns a text confirmation.

find_ui_element

Searches for UI elements by text content, accessibility role, or name (case-insensitive substring). Returns matching elements with their clickable screen coordinates.

Optional parameters:

  • text: Text to search for (used as name if name omitted)
  • role: UI role filter - Button, TextBox, CheckBox, MenuItem, ComboBox, and more
  • name: Accessible name (takes precedence over text if both provided)
  • windowHandle: Target window handle (null = foreground window)

Key features

Desktop interaction

  • Click, double-click, right-click, and five-button mouse control.
  • Pixel-precise drag and drop.
  • Notch-based scrolling (three notches ≈ one page).
  • Keyboard typing and multi-key shortcut combos.
  • Cursor position tracking.
  • Screen resolution detection.

Screen capture and analysis

  • Full-screen or cropped PNG screenshots.
  • OCR of the full screen with per-region confidence scores and bounding boxes.
  • Browser-viewport-only screenshots for web content.

Window management

  • Enumerate all visible windows with positions and dimensions.
  • Activate windows by fuzzy title match.
  • Focus browser windows (Edge, Chrome, Firefox) optionally filtered by URL or title.
  • Graceful window close with protection for system-critical processes.

Command execution

  • Sandbox shell commands with an allow list (git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type).
  • Sandbox Python execution up to 262,144 characters of code.
  • Working-directory and per-call timeout control (max 30 seconds).
  • Resource limits and hardened block list against shell metacharacters, eval flags, privilege escalation, and destructive operations.

Browser automation

  • Navigate, back, forward, reload, and configurable wait conditions on navigation (load, networkidle0, networkidle2).
  • Read page URL, title, visible text (512 KB cap), and full HTML (512 KB cap).
  • Consolidated page state retrieval — URL, title, DOM, screenshot, and tab list in a single call.
  • DOM-level click, type, form fill, drag, and <select> option selection by CSS selector.
  • Accessibility-snapshot-based interaction by ref ID — click, type, hover, keypress with modifiers, scroll, and file-input upload.
  • Wait for dynamic elements with configurable timeout, optionally requiring visibility.
  • Evaluate JavaScript expressions in the page context.
  • Multi-tab management: list, switch, open one or many at once, and close.
  • Cookie inspection (values redacted) and assignment on the current domain.
  • Batched action execution — sequence multiple browser steps in one call, stopping on first failure.
  • Save the current page as a PDF under %USERPROFILE% or %TEMP%.
  • Dialog handling for alert, confirm, prompt, and beforeunload.
  • Runs on Microsoft Edge, launched automatically on first use.

UI accessibility

  • Retrieve the Windows UI Automation tree for the foreground window with configurable depth and element count.
  • Find UI elements by text, role, or accessible name.
  • Returns clickable screen coordinates for precise targeting of buttons, text boxes, checkboxes, menu items, and combo boxes.

Timing and synchronization

  • Use wait_milliseconds for short one-shot pauses (up to five seconds).
  • Use browser_wait_for for DOM-level polling (up to 30 seconds).

Notes

  • All coordinates are in screen pixels with (0,0) at the top-left corner. Coordinates from take_screenshot, analyze_screen, find_ui_element, and list_windows all share the same coordinate space.
  • A cursor failsafe is active: If the cursor moves within five pixels of any screen corner, mouse operations are canceled. Avoid targeting the extreme edges of the screen.
  • Shell pipe operators (|), semicolons (;), ampersands (&), and output redirection (>, <) are blocked. To transform command output, capture it and process it with execute_python_code.
  • Interpreter eval flags are blocked, so python -c "..." and node -e "..." are rejected. Use execute_python_code for Python code, or write the code to a file first.
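  • Command stdout/stderr is truncated at 32 KB each. Use flags to limit verbose output (for example, git log --oneline -20), or write the output to a file and read it separately. Because shell redirection (>, >>) is blocked, the file must be written by the command itself or by execute_python_code.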
  • Maximum timeout for execute_shell_command and execute_python_code is 30 seconds. For longer work, break it into smaller steps or launch a background process from Python and poll.
  • There's no dedicated file read/write tool. Read files with execute_shell_command using the type command. Write files with execute_python_code using Python's built-in file I/O. Shell output redirection (>, >>) is blocked.
  • browser_eval_js always returns a string. Convert objects or numbers explicitly before returning.
  • Browser DOM tools (browser_click, browser_type, browser_eval_js, and others) operate only on the Microsoft Edge instance. focus_browser can focus Chrome or Firefox windows, but DOM tools don't target them.
  • take_screenshot requires all four crop parameters (x, y, width, height) together, or none for a full-screen capture.
  • scroll uses notch units (clamped to [-20, 20]), not pixels. Three notches is approximately one page.
  • find_ui_element requires at least one of text, role, or name. When both text and name are provided, name takes precedence.
  • browser_snapshot refs expire on navigation. If a _ref tool (click, type, hover, keypress, scroll, or set file input) fails because the snapshot is stale, retake the snapshot and retry.
  • browser_set_file_input_ref only accepts file paths under the user's Documents, Downloads, Desktop, or %TEMP% directories. Files outside those locations are rejected.
  • browser_get_cookies always returns redacted cookie values. Use it for inspection—names, domains, paths, and flags are returned in full, but values aren't exposed.
  • browser_set_cookies only adds or overwrites cookies. It doesn't clear existing cookies. To remove a cookie, overwrite it with an expired expires value via this tool, or clear it through the page itself.
  • browser_execute_batch stops on the first failed action and returns only the results collected up to that point. Subsequent actions in the array aren't attempted. Allowed batch actions are limited to: navigate, snapshot, click_ref, type_ref, hover_ref, scroll_ref, keypress_ref, wait_for, and eval_js.
  • browser_create_tabs opens tabs in the order provided. If foregroundIndex is omitted, focus stays on the currently active tab.
  • browser_get_page_state only returns the fields listed in the fields array. Request only what you need – including dom or screenshot can produce large payloads.
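To stay clear of the cursor failsafe described in the notes above, you can test a target point before moving or clicking. The following sketch treats a point as near a corner when it's within the margin on both axes. That geometry is an assumption, because the exact failsafe region isn't specified here, and the helper name near_screen_corner is hypothetical.

```python
def near_screen_corner(x, y, screen_width, screen_height, margin=5):
    """Return True if (x, y) lies within margin pixels of any screen corner on both axes."""
    near_x_edge = x <= margin or x >= screen_width - 1 - margin
    near_y_edge = y <= margin or y >= screen_height - 1 - margin
    return near_x_edge and near_y_edge
```

The screen dimensions can come from get_screen_size.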

Common use cases

Fill out a web form

  • Call browser_navigate to open the target page.
  • Call browser_wait_for to wait for the form to load.
  • Call browser_type to fill each field by CSS selector.
  • Call browser_click to submit the form.
  • Call browser_wait_for to wait for the confirmation element.
  • Call browser_get_text to read and verify the result.
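The steps above can be sketched as an ordered plan of tool names and arguments. The URL, selectors, and field values are hypothetical; a real client would send each entry as a separate tool call and check the result before continuing.

```python
# Hypothetical form values, keyed by CSS selector.
form_values = {"#name": "Ada Lovelace", "#email": "ada@example.com"}

plan = [
    ("browser_navigate", {"url": "https://example.com/signup"}),
    ("browser_wait_for", {"selector": "form", "timeoutMs": 10000}),
]
# One browser_type call per field, in the order the fields are listed.
plan += [("browser_type", {"selector": s, "text": t}) for s, t in form_values.items()]
plan += [
    ("browser_click", {"selector": "#submit-btn"}),
    ("browser_wait_for", {"selector": "#confirmation"}),
    ("browser_get_text", {}),
]

for tool, arguments in plan:
    print(tool, arguments)
```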

Automate a desktop application

  • Call activate_window to bring the application to the foreground.
  • Call take_screenshot to capture the current state.
  • Call find_ui_element to locate a button or field by name.
  • Call click on the element's reported coordinates.
  • Call type_text to enter data.
  • Call press_keys for shortcuts (for example, ["ctrl","s"] to save).
  • Call take_screenshot to verify the result.

Extract data from a web page

  • Call browser_navigate to open the page.
  • Call browser_get_text to extract visible text content.
  • Call execute_python_code to parse and process the extracted data.
  • Call browser_eval_js to query specific values via JavaScript when text extraction isn't enough.

Run development tasks

  • Call execute_shell_command for git pull, npm install, and dotnet build.
  • Call take_screenshot to capture build output.
  • Call execute_python_code to analyze logs or test results.
  • Call browser_navigate to open a local dev server in the browser.
  • Call browser_screenshot to capture the rendered page.

Read and write files

  • Read a file by using execute_shell_command with type C:\path\to\file.txt.
  • Write a file by using execute_python_code with Python's open(...) and write(...).
  • Verify by using execute_shell_command with dir C:\path\to\output.txt.

Navigate a complex UI

  • Call get_accessibility_tree to understand the full UI structure.
  • Call find_ui_element to find a specific control (for example, role: "MenuItem", name: "Settings").
  • Call click using the element's reported coordinates.
  • Call find_ui_element again to find the next control in the dialog.
  • Call type_text or click to interact with it.

Keep a long-running session alive

  • Send any MCP request at least once every 30 minutes to prevent idle eviction.
  • get_screen_size is lightweight and works well as a heartbeat.
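A heartbeat can be as simple as tracking the time of the last request and sending get_screen_size when the session has been quiet for too long. The following sketch uses a 25-minute interval to leave a margin below the 30-minute window. The Heartbeat class and the send_request callback are hypothetical; send_request stands in for whatever function your client uses to call a tool.

```python
import time

HEARTBEAT_INTERVAL_SECONDS = 25 * 60  # margin below the documented 30-minute idle window

class Heartbeat:
    """Sends a lightweight request when no other request was made recently."""

    def __init__(self, send_request, interval=HEARTBEAT_INTERVAL_SECONDS, clock=time.monotonic):
        self._send_request = send_request
        self._interval = interval
        self._clock = clock
        self._last_activity = clock()

    def mark_activity(self):
        """Call after every real tool request so the idle timer restarts."""
        self._last_activity = self._clock()

    def tick(self):
        """Send get_screen_size if the session has been idle; return True if sent."""
        if self._clock() - self._last_activity >= self._interval:
            self._send_request("get_screen_size", {})
            self._last_activity = self._clock()
            return True
        return False
```

Call tick periodically (for example, once a minute) from your agent loop. Injecting the clock keeps the class easy to test.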

Learn more

Windows 365 for Agents