Unraid + Open WebUI + Desktop Ollama: Remote GPU Inference Setup (Simple Guide)


Think of it like this:

  • Your desktop (with the GPU) is the engine that actually does the AI work (inference).
  • Your Unraid server runs Open WebUI (the dashboard) and just talks to the engine over the network.

So Unraid doesn’t need the big GPU. It just needs network access to the desktop’s Ollama API.


1) Get Ollama running on the desktop (the “engine”)

  1. Install Ollama on the desktop.
  2. Make sure it can use the GPU (NVIDIA drivers installed, etc.).
  3. Pull at least one model on the desktop:
    • ollama pull llama3.1 (example)

Key point: the models live on the desktop, and inference happens there.
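
A minimal sketch of those three steps, assuming a Linux desktop with an NVIDIA card (on Windows or macOS, use the installer from ollama.com instead of the install script):

    # Install Ollama via the official install script
    curl -fsSL https://ollama.com/install.sh | sh

    # Confirm the NVIDIA driver can see the GPU
    nvidia-smi

    # Pull a model, then run a quick local test chat
    ollama pull llama3.1
    ollama run llama3.1 "Say hello in five words."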


2) Make Ollama reachable from your Unraid box

By default, Ollama listens on port 11434.

You need Ollama to listen on the desktop's LAN interface (not just localhost), and the desktop's firewall needs to allow inbound connections on that port.

Desktop settings (conceptual)

  • Ensure Ollama is listening on 0.0.0.0:11434 (LAN-accessible).
  • Allow inbound firewall access to TCP 11434 from your Unraid server’s IP (best: only from Unraid, not the whole LAN).
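
Roughly what that looks like on a Linux desktop that runs Ollama as a systemd service and uses ufw as its firewall (assumptions on my part; on Windows you'd set the OLLAMA_HOST environment variable and add an inbound Windows Firewall rule instead). UNRAID_IP is a placeholder for your server's address:

    # Make Ollama listen on all interfaces, not just localhost
    sudo systemctl edit ollama
    #   add under [Service]:
    #   Environment="OLLAMA_HOST=0.0.0.0:11434"
    sudo systemctl restart ollama

    # Allow TCP 11434 only from the Unraid server
    sudo ufw allow from UNRAID_IP to any port 11434 proto tcp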

Quick sanity test (from Unraid terminal)

Run:

  • curl http://DESKTOP_IP:11434/api/tags

If that returns JSON listing models/tags, Unraid can “see” Ollama.
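
A healthy reply is a small JSON blob listing the models you pulled on the desktop, roughly along these lines (exact fields vary by Ollama version):

    curl http://DESKTOP_IP:11434/api/tags
    # {"models":[{"name":"llama3.1:latest","modified_at":"...","size":...}]}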

If it fails, the usual culprits are:

  • Wrong IP
  • Firewall blocking port 11434
  • Ollama still bound only to localhost
  • Desktop and Unraid on different networks/VLANs (routing rules in the way)
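
Two quick checks that narrow down which of those it is (assuming a Linux desktop where the ss tool is available):

    # On the desktop: what address is Ollama actually bound to?
    sudo ss -ltnp | grep 11434
    #   0.0.0.0:11434   → reachable from the LAN
    #   127.0.0.1:11434 → localhost only; fix the OLLAMA_HOST binding

    # From the Unraid terminal: is the desktop reachable at all?
    ping -c 3 DESKTOP_IP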

3) Run Open WebUI on Unraid (the “dashboard”)

On Unraid, install the Open WebUI container and point it at the desktop Ollama URL.

The one setting that matters

Set:

  • OLLAMA_BASE_URL = http://DESKTOP_IP:11434

That’s it. WebUI will send prompts to that URL, and the desktop GPU will do the inference.

Typical container values (FYI)

  • Container port: 8080
  • Host port: 3000 (common choice)
  • Then you open: http://UNRAID_IP:3000
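
Most people will install Open WebUI from Unraid's Community Applications template, but for reference the whole setup boils down to a docker run along these lines (the appdata path and host port are common choices, not requirements):

    docker run -d --name open-webui \
      -p 3000:8080 \
      -e OLLAMA_BASE_URL=http://DESKTOP_IP:11434 \
      -v /mnt/user/appdata/open-webui:/app/backend/data \
      --restart unless-stopped \
      ghcr.io/open-webui/open-webui:main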

4) Verify inside Open WebUI

  1. Open WebUI in your browser.
  2. Go to model selection.
  3. You should see the models that are installed on the desktop (because WebUI is reading them through Ollama’s API).
  4. Run a test chat.

If you see tokens streaming and the desktop’s GPU usage spikes → you’re done.
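
An easy way to catch that spike is to leave a GPU monitor running on the desktop while you send the test prompt (NVIDIA example):

    # Refresh GPU utilization once per second while the chat runs
    watch -n 1 nvidia-smi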


5) What this gives you (and what it doesn’t)

✅ Desktop GPU is used for inference, while Unraid hosts the web app
✅ Desktop can still be your normal PC + primary graphics card
✅ Works great over 10GbE (latency matters more than bandwidth, so 10GbE is more than enough)

⚠️ Unraid is not “using the GPU directly.” It’s just calling a remote API.


6) Security “don’t regret it later” checklist (still ELI5)

  • Do not expose 11434 to the internet.
  • Allow port 11434 only on your LAN, ideally only from Unraid’s IP.
  • If you need remote access, put WebUI behind a VPN or a reverse proxy with auth—don’t publish Ollama raw.
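
To double-check the firewall scoping, a quick probe from another LAN machine that is not the Unraid server should show the port as closed or filtered (this assumes nmap is installed on that machine):

    nmap -p 11434 DESKTOP_IP
    # expected from anything other than the Unraid server: 11434/tcp closed or filtered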

7) Optional: the “it just works” stability upgrades

  • Give the desktop a static IP or DHCP reservation.
  • Make sure Ollama starts on boot (service/autostart).
  • Keep WebUI and Ollama on the same VLAN/subnet for simplest routing.
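
On a Linux desktop, the Ollama install script already sets up a systemd service, so "starts on boot" usually just means making sure it's enabled (on Windows, check that the Ollama app runs at login instead):

    # Start Ollama now and on every boot
    sudo systemctl enable --now ollama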
