Run an OpenAI Model on Your Mac: The Simple Guide to gpt-oss
This week, OpenAI released its long-anticipated open-weight language model, gpt-oss. Its most compelling feature is that it can run locally on personal machines, including Macs powered by Apple Silicon. Here’s how it works, and what to expect.
The model comes in two variants: gpt-oss-20b and gpt-oss-120b. The former is a “mid-tier” model that can run on high-end Macs with sufficient memory; the latter is a “heavyweight” model that demands considerably more robust hardware. Predictably, the smaller version is more prone to “hallucinations” (fabricating facts), since its lower parameter count means it simply stores less knowledge. In exchange, it is faster and realistically deployable on consumer-grade hardware.
Even in its streamlined form, gpt-oss is a compelling tool for anyone curious about running a large language model directly on their laptop. Keep in mind, though, that unlike ChatGPT, this model operates entirely offline and lacks many of the refinements found in more advanced chatbots. For instance, it does not verify responses via search engines, which increases the likelihood of inaccuracies.
To run gpt-oss-20b, OpenAI recommends a minimum of 16 GB of RAM—though in practice, this is more of a baseline just to see it in action. It’s no surprise that Apple has ceased selling Macs with 8 GB of RAM; artificial intelligence is swiftly becoming a standard computing task.
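If you want to check where your machine stands against that 16 GB baseline before downloading anything, a short Python sketch can do it. The threshold and the `meets_ram_baseline` helper are just illustrations of the figure quoted above, not an official requirement checker:

```python
import os

GIB = 1024 ** 3

def meets_ram_baseline(total_bytes: int, minimum_gib: int = 16) -> bool:
    """Return True if total RAM meets the 16 GiB baseline suggested for gpt-oss-20b."""
    return total_bytes >= minimum_gib * GIB

if __name__ == "__main__":
    # On macOS and Linux, these sysconf values multiply out to physical memory.
    total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    print(f"Total RAM: {total / GIB:.1f} GiB; "
          f"meets 16 GiB baseline: {meets_ram_baseline(total)}")
```

Remember that passing this check only means the model will start, not that it will be pleasant to use.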
Getting started is refreshingly simple. First, download the Ollama application, which manages the model, from ollama.com/download. Then, open the Terminal and enter the following commands:
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
The model requires approximately 15 GB of disk space. Once downloaded, it will appear in Ollama’s interface. If you prefer complete offline functionality, you can activate “Airplane Mode” in the settings—no internet connection or registration required.
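Since the download weighs in at roughly 15 GB, it is worth confirming you have the room first. Here is a minimal sketch using Python’s standard library; the `has_room` helper and the 15 GB figure (taken from the paragraph above) are illustrative, and the true download size may vary:

```python
import shutil

def has_room(free_bytes: int, required_gb: float = 15.0) -> bool:
    """Check whether free_bytes covers the model's approximate download size."""
    return free_bytes >= required_gb * 1e9

if __name__ == "__main__":
    # shutil.disk_usage works the same way on macOS and Linux.
    free = shutil.disk_usage("/").free
    print(f"{free / 1e9:.1f} GB free; enough for gpt-oss-20b: {has_room(free)}")
```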
From there, it’s as easy as typing your prompt and observing the output. However, be aware that the model will consume all available system resources—your Mac may noticeably slow down. On a MacBook Air with an M4 chip and 16 GB of RAM, for example, generating a response to “hello” took over five minutes. A query like “Who was the 13th president of the United States?” took roughly 43 minutes. So, if you’re planning to use the model seriously, 16 GB is, to put it mildly, insufficient.
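Beyond typing prompts interactively, Ollama also serves a local HTTP API on port 11434, which lets you script queries. The sketch below assumes that API and its default `/api/generate` endpoint; the `build_request` helper is a hypothetical name for illustration. Given the timings above, expect the actual request to take a long time on a 16 GB machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

if __name__ == "__main__":
    # Requires Ollama to be running with gpt-oss:20b pulled; this will be slow.
    req = build_request("gpt-oss:20b", "Who was the 13th president of the United States?")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```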
If you decide you no longer need the model and wish to reclaim disk space, run the following command:
ollama rm gpt-oss:20b
Additional information is available on Ollama’s official site, or you may explore alternative macOS applications like LM Studio.