Run AI Locally: Install Ollama on Proxmox with Ubuntu and Tailscale | IT HomeLab

Ollama is a lightweight, open-source tool that lets you run large language models locally on your own hardware — completely offline, no API keys, no subscription fees. In this guide I’ll walk through creating a dedicated Ubuntu VM in Proxmox, installing Ollama, configuring it to accept connections from other machines on your network, connecting it through Tailscale, and pulling down your first model to query from the command line. The next video puts a web interface on top so you can chat with your models from a browser — but this guide gets the foundations right first.

🎥Watch the Video Tutorial

https://youtu.be/kVAViWD19ZU

💡What Is Ollama and Why Run It in a Home Lab?

Ollama is essentially a local runtime for large language models. You pull down a model — the same kinds of models powering commercial AI tools — and run it entirely on your own hardware. No data leaves your machine, no API costs, no rate limits, and no dependency on any external service being available. For a home lab it’s a great fit. You can experiment with different models, use it in automations, plug it into a web UI for a proper chat interface, or just use it to learn how LLMs actually work under the hood. You don’t need a GPU — Ollama runs on CPU just fine, which means any reasonable home lab machine can run it. If you do have a compatible Nvidia or AMD GPU, Ollama detects it automatically and uses it, which gives you significantly faster responses.
ℹ️Note: Hardware used: Dell Latitude 5411 — Core i7 (12 cores), 32GB RAM, 512GB NVMe running Proxmox VE. No GPU in this setup — everything runs on CPU. The demo model used is llama3.2 (3B parameters), a small model well-suited to CPU-only hardware.

🛠What You’ll Need

  • A Proxmox VE host with capacity to spare (no GPU required; CPU-only works fine)
  • An Ubuntu Server ISO uploaded to your Proxmox storage
  • An SSH client on your workstation (built into Windows, macOS, and Linux)
  • A free Tailscale account (optional, but recommended for secure remote access)

📋Step-by-Step Setup

1. Create the Ubuntu Server VM in Proxmox

In the Proxmox web UI click Create VM. The key specs for an Ollama VM:
  • OS: Ubuntu Server 26.04 LTS ISO. Type: Linux
  • System: Machine: q35, BIOS: SeaBIOS
  • Disk: 50 GB minimum — the LLMs you download take up space. A small model like llama3.2 is around 2 GB; larger models can be 8–40 GB or more. Give yourself room
  • CPU: 2 cores. Ollama supports multi-threading so more cores will help, but 2 is enough to get started
  • Memory: 8 GB. RAM is the key constraint with Ollama on CPU — the more RAM you give it, the larger the models you can run. 8 GB works for small models. For larger, higher-capability models you’d want 16 GB or more
  • Network: leave as default VirtIO bridge
Click Finish, start the VM, open the Console, and install Ubuntu Server. Key choices during install: skip LVM groups (easier to expand disks later without them), enable OpenSSH, give the server a hostname of ollama or similar. Skip any additional snaps.
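If you’d rather script the VM creation from the Proxmox host shell instead of clicking through the web UI, qm can build the same spec. This is a rough sketch, not a definitive recipe: the VM ID (200), storage names, and ISO filename below are assumptions — substitute your own values.
# Run on the Proxmox host. VM ID, storage names, and ISO filename are assumptions.
qm create 200 --name ollama --machine q35 --ostype l26 \
  --cores 2 --memory 8192 \
  --scsihw virtio-scsi-pci --scsi0 local-lvm:50 \
  --net0 virtio,bridge=vmbr0 \
  --ide2 local:iso/ubuntu-26.04-live-server-amd64.iso,media=cdrom \
  --boot order='scsi0;ide2'
qm start 200
From there the Ubuntu installer runs exactly as in the console flow above.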

2. SSH in and update

After install and reboot, check the VM’s IP address from the Proxmox console, then SSH in from your Windows machine:
ssh username@<ollama-ip-address>
The reason to work over SSH rather than the Proxmox web console is simple: copy and paste works properly. Pasting commands into the Proxmox web console is unreliable; SSH gives you a clean terminal where you can paste commands directly from this guide.
⚠️Warning: If you get a host key conflict error when SSH-ing (this happens when a previous machine used the same IP), open the known_hosts file in Notepad from C:\Users\YourName\.ssh\known_hosts, find and delete the lines for that IP address, save the file, and retry. SSH will then prompt you to confirm the new fingerprint.
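Alternatively, one command removes the stale entry for you. It works in PowerShell too, since Windows ships the same OpenSSH tools:
ssh-keygen -R <ollama-ip-address>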
Once connected, run updates:
sudo apt update && sudo apt upgrade -y
On a fresh 26.04 install released just days ago there may be no updates available yet — that’s expected.

3. Install Ollama

The official one-line install script from ollama.com handles everything — it downloads the binary, sets up the systemd service, and starts Ollama automatically:
curl -fsSL https://ollama.com/install.sh | sh
💡Tip: If you prefer to review scripts before running them — which is always a good habit — download it first and inspect it: curl -fsSL https://ollama.com/install.sh -o install.sh && nano install.sh. Running it from an HTTPS source on a VM you control is low risk, but scrutinising install scripts is a good practice to build.
Once the install completes, verify Ollama is running:
systemctl status ollama
You should see active (running). Press q to exit the status view. If you have an Nvidia or AMD GPU in the machine, Ollama detects it automatically at this point — you’ll see it reported in the install output. On CPU-only hardware it runs fine, just slower.
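If the service isn’t running, or you want to see exactly what Ollama detected at startup (including any GPU), the service logs are the first place to look:
# Follow the Ollama service logs live; Ctrl+C to stop
journalctl -u ollama -f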

4. Configure Ollama to accept connections from other machines

By default Ollama only listens on localhost — it won’t accept connections from other machines on your network. To change this, edit the Ollama systemd service file:
sudo nano /etc/systemd/system/ollama.service
Find the [Service] section and add an Environment line under it, so it reads:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
This tells Ollama to listen on all network interfaces rather than just localhost. Save the file (Ctrl+X → Y → Enter), then reload the daemon and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Ollama now listens on port 11434 on all interfaces.
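A quick way to confirm the change took effect is to hit the API root from another machine on your LAN, substituting the VM’s IP address:
curl http://<ollama-ip-address>:11434
If everything is working you’ll get back the plain-text response: Ollama is running.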

5. Install Tailscale for secure remote access (optional but recommended)

Tailscale creates a private encrypted network between your devices. Rather than exposing Ollama’s API port to your whole local network or the internet, Tailscale keeps access limited to devices you’ve explicitly added to your Tailscale account — a clean security boundary for a home lab.
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
Tailscale outputs a URL after tailscale up. Copy it, open it in a browser, and log into your Tailscale account to authenticate the machine. Once authenticated, the Ollama VM appears in your Tailscale device list. Get the VM’s Tailscale IPv4 address:
tailscale ip -4
Then verify Ollama is accessible over Tailscale from another machine on your tailnet — open a browser and go to the VM’s MagicDNS name (or use the Tailscale IP directly):
http://ollama.<your-tailnet>.ts.net:11434
You should see: Ollama is running.
⚠️Warning: Ollama has no built-in authentication on its API by default. Routing access through Tailscale rather than exposing the port directly to your local network or the internet is the right approach for a home lab. Don’t expose port 11434 directly to the internet.

6. Pull your first model and run it

Pull a model from the Ollama model library. For CPU-only hardware, start with a small model — llama3.2 at the 3B parameter size is a good first choice. It’s capable enough to demonstrate what’s possible without being too slow on modest hardware:
ollama pull llama3.2
Once downloaded, run it interactively:
ollama run llama3.2
This drops you into an Ollama prompt where you can ask questions directly. Try a couple of test queries:
>>> What is the capital of France?

>>> In one sentence, explain what Proxmox is.
On CPU-only hardware the responses will be noticeably slower than on a GPU-accelerated setup — seconds to tens of seconds depending on the query length and your hardware. That’s the honest trade-off. For experimentation, learning, overnight automations, or anything where you’re not waiting at the keyboard for an instant response, it’s perfectly functional. For interactive use with acceptable latency, a compatible GPU makes a significant difference.

You can download and switch between multiple models. To see what you’ve pulled:
ollama list
To exit the interactive prompt:
/bye
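The interactive prompt is only one way in. Everything also goes through Ollama’s HTTP API on port 11434, which is what automations and web UIs use under the hood. A minimal example against the generate endpoint, run from any device on your Tailscale network (substitute your VM’s address):
curl http://<ollama-ip-address>:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "In one sentence, explain what Proxmox is.",
  "stream": false
}'
Setting stream to false returns the whole answer as a single JSON response rather than token-by-token chunks, which is usually easier to handle in scripts.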
💡Tip: Model sizes and RAM requirements roughly line up like this: small 1–3B parameter models need around 2–4 GB RAM, 7B models need 6–8 GB, 13B models need 10–12 GB, and 70B models need 40 GB or more. Match the model size to the RAM you’ve allocated to the VM — running a model that exceeds available RAM will cause it to swap to disk, making it extremely slow.
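To check what’s actually loaded into memory at any moment, and roughly how much RAM a running model is using, Ollama has a ps subcommand:
ollama ps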

✅Conclusion

You now have Ollama running on a dedicated Proxmox VM, accessible from any device on your Tailscale network, with your first model downloaded and queryable from the command line. No GPU needed, no API costs, no data leaving your network. The next step is putting a proper web interface on top so you can interact with models through a browser chat interface — I’ll be using Open WebUI in a Docker container for that, which keeps the footprint small. That’s the next video in this series.

Related guides: Proxmox VE Home Lab Setup · Docker Home Lab Setup

📺Watch the full video guide here: https://youtu.be/kVAViWD19ZU

If you found this helpful, like and subscribe to IT HomeLab Online on YouTube for more tutorials.

☕Support the channel: Patreon · Buy Me a Coffee

Enjoyed this guide?

Subscribe to the channel for more homelab builds, Raspberry Pi projects, and AI automation tutorials.

▶ Watch on YouTube