Swama
Machine-learning runtime
Swama is a pure‑Swift machine‑learning runtime that runs locally on macOS 15.0+ with Apple Silicon. It leverages Apple’s MLX framework to provide fast inference for large language models and vision‑language models, exposing standard OpenAI‑compatible endpoints for chat, embeddings, audio transcription and experimental speech synthesis. The runtime includes a menu‑bar application, a full command‑line interface, and a modular library (SwamaKit) that handles model downloading, caching, versioning and streaming responses.
The tool is aimed at developers and power users who need on‑device AI without relying on cloud services. It supports multimodal input, allowing text and images to be processed together, and integrates Whisper for local audio transcription. Model management is automated: aliases such as “qwen3” or “llama3.2” trigger an automatic download from the Hugging Face Hub, and the model is stored in a local cache for reuse.
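Because the server speaks the OpenAI wire format, any generic OpenAI-style client can talk to it. The sketch below builds and posts a standard chat-completions request in Python; the base URL and port are hypothetical placeholders (check Swama’s documentation for the actual listen address), and “qwen3” is the model alias mentioned above.

```python
import json
import urllib.request

# Hypothetical default; Swama's actual host/port may differ.
BASE_URL = "http://localhost:8080"

def build_chat_payload(model, prompt, stream=False):
    """Build an OpenAI-style /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def post_chat(payload, base_url=BASE_URL):
    """POST the payload to the OpenAI-compatible chat endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_payload("qwen3", "Summarize MLX in one sentence.")
```

Calling post_chat(payload) with the server running returns the familiar OpenAI response object, so existing SDKs and tooling work unchanged by pointing their base URL at the local server.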
Swama’s architecture separates core logic, CLI utilities and the macOS graphical app, enabling flexible integration into scripts, terminal workflows, or a native UI. Installation is available via Homebrew, pre‑built DMG, or source compilation with Xcode 16 and Swift 6.2. The runtime is released under the MIT license.
Similar apps
Window & Desktop Management
LlamaBarn
Menu bar app for running local LLMs

AI Coding Agents
Osaurus
LLM server built on MLX

System Monitoring & Maintenance
Stability Matrix
Package manager and inference UI for Stable Diffusion

AI Chat & Voice Agents
HuggingChat
Chat client for models on HuggingFace

AI Coding Agents
LM Studio
Discover, download, and run local LLMs

Terminals & CLI
Awal Terminal
AI-native terminal emulator with multi-provider profiles and voice input.