Large language models (LLMs) are the talk of the town, but small language models are also important for certain tasks, especially on power-limited devices like phones and laptops. Microsoft just revealed its new Mu language model, and it’s already powering some Windows 11 features.
Microsoft already uses a small language model called Phi Silica in Windows 11, allowing Copilot+ PC features to work without slowdowns on chipsets like the Snapdragon X Plus. Popular AI chatbots like ChatGPT, Copilot, and Gemini use more advanced LLMs that require powerful GPUs, but smaller models like Phi Silica and Mu can achieve similar results with a fraction of the processing power, at the cost of less versatility.
Mu is a “micro-sized, task-specific language model” designed to run efficiently on a Neural Processing Unit, or NPU, like the ones found in recent Copilot+ PCs. Microsoft used several optimization techniques to achieve high performance within a limited power budget, including a transformer encoder–decoder architecture, weight sharing in certain components to reduce the total parameter count, and using only hardware-accelerated operations. Microsoft says Mu can run at more than 200 tokens per second on a Surface Laptop 7, which is a faster response than you’d typically get from the free versions of ChatGPT or Gemini in a web browser.
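Weight sharing (often called parameter tying) is easy to see in a toy sketch: the same matrix serves as both the input embedding and the output projection, so those parameters are stored once instead of twice. The class, sizes, and values below are illustrative only and are not from the actual Mu model.

```python
# Hypothetical illustration of weight sharing (parameter tying).
# One matrix serves as both input embedding and output projection,
# halving the parameter count for those two components.

class TiedEmbedding:
    def __init__(self, vocab_size, dim):
        self.vocab_size = vocab_size
        self.dim = dim
        # A single shared weight matrix (toy values).
        self.weight = [[0.01 * (i + j) for j in range(dim)]
                       for i in range(vocab_size)]

    def embed(self, token_id):
        # Input side: look up the embedding row for a token.
        return self.weight[token_id]

    def project(self, hidden):
        # Output side: score every vocabulary entry with the SAME matrix.
        return [sum(h * w for h, w in zip(hidden, row))
                for row in self.weight]

tied = TiedEmbedding(vocab_size=8, dim=4)
untied_params = 2 * 8 * 4   # separate embedding + projection matrices
tied_params = 8 * 4         # one shared matrix
print(tied_params, untied_params)  # 32 64
```

The same idea scales to real models: tying the embedding and output layers of a small transformer can remove millions of parameters without touching the architecture elsewhere.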
The Mu model is being used first for the search bar in the Windows 11 settings app, which rolled out recently to Windows Insiders on Snapdragon PCs. It can understand prompts like “how to control my PC by voice” or “my mouse pointer is too small” and locate the correct setting. It’s not clear if Mu will be used for other Copilot+ PC features.
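The task has a simple input/output shape: a free-form query goes in, and the most relevant Settings page comes out. Mu does this with a trained language model; the keyword-overlap stub below only demonstrates that shape, and the page names and keyword lists are made up for illustration.

```python
# Toy stand-in for natural-language settings search. A real model like Mu
# learns this mapping; this stub just matches on keyword overlap.
# Page names and keywords are hypothetical.

SETTINGS = {
    "Accessibility > Mouse pointer and touch": {"mouse", "pointer", "size", "small", "big"},
    "Accessibility > Speech": {"voice", "control", "speech", "talk"},
    "System > Display": {"brightness", "monitor", "display", "resolution"},
}

def match_setting(query):
    words = set(query.lower().split())
    # Return the page whose keyword set overlaps the query the most.
    return max(SETTINGS, key=lambda page: len(SETTINGS[page] & words))

print(match_setting("my mouse pointer is too small"))
# -> Accessibility > Mouse pointer and touch
```

A learned model handles what this stub cannot: paraphrases with no keyword overlap at all, like “I can’t see my cursor.”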
Microsoft said in a blog post, “Managing the extensive array of Windows settings posed its own challenges, particularly with overlapping functionalities. For instance, even a simple query like ‘Increase brightness’ could refer to multiple settings changes – if a user has dual monitors, does that mean increasing brightness to the primary monitor or a secondary monitor? To address this, we refined our training data to prioritize the most used settings as we continue to refine the experience for more complex tasks.”
Lightweight language models that run locally are among the best uses for generative AI, since responsiveness and data privacy are much easier to deliver when no cloud servers are involved. That didn’t stop Recall from nearly being a security disaster, though.
Source: Windows Blog