“Ambient and Multi-modal”: Windows Head Outlines a Radical AI-Powered Future for the OS
Microsoft has released a new video interview with Pavan Davuluri, head of Windows, in which he outlined the company’s vision for the platform’s evolution and the transformative role artificial intelligence will play.
When asked how AI will change the way people interact with computers, Davuluri described a future in which computing becomes “more ambient, more all-encompassing, spanning diverse form factors and, undoubtedly, more multi-modal.” He emphasized that voice will assume an increasingly significant role, while the system itself will be able to “look at the screen” and remain contextually aware—ushering in a new paradigm of interaction.
He noted that users will be able to converse with their computers while writing, drawing, or communicating, with the operating system “semantically understanding” their intent. Voice, he said, will become a full-fledged input method alongside the mouse, keyboard, and touch.
The subject of voice interaction has surfaced before—just last week, Microsoft’s Corporate Vice President for Enterprise & Security hinted at similar changes in the Windows 2030 Vision video. Davuluri added that over the next five years, Windows interfaces could change dramatically with the advent of agentic AI, making the operating system more “agent-like” and multi-modal—an area poised for substantial investment and innovation.
A core element of the new architecture will be the fusion of local and cloud computing. “Computing will become ubiquitous: the Windows experience will rely on a blend of on-device and cloud capabilities. Our mission is to make that seamless for customers,” Davuluri said.
Today, assistants like Copilot on Windows, Gemini on Android, or Siri on macOS operate as overlays or separate windows atop the OS. Microsoft, however, is preparing a version of Windows built with AI integrated at its core. These changes are expected within the next five years, possibly debuting with Windows 12.
Microsoft is not alone in this approach; according to rumors, iOS 26 will make voice control a central feature, enabling users to manage applications through intent-based commands.
On Windows, voice will complement—rather than replace—traditional input methods. Users will still be able to type, click, or speak, with voice serving as an optional yet powerful tool to streamline workflows.
However, enabling these capabilities will require processing significant amounts of personal data. Microsoft maintains that balancing local and cloud processing will help safeguard privacy, though the company expects data protection to spark considerable debate.