Your company just banned ChatGPT. You aren’t alone—27% of organizations have temporarily banned GenAI due to privacy risks according to Cisco. Now you have gigabytes of PDF documentation, a need for semantic search, and a strict “no data leaves the laptop” policy.
The solution? PrivateGPT.
The problem? Most installation guides are broken. They rely on Docker (which eats your RAM) or outdated commands that fail instantly in 2025. If you’ve tried following other tutorials and hit “Group not found” errors or “BLAS=0” CPU bottlenecks, you know the frustration. The fan noise on your laptop spins up, but the model just won’t load.
In this guide, we’re ditching Docker. We are going “bare metal” on Windows Subsystem for Linux (WSL). We will fix the broken dependency syntax and set up a Hybrid Ollama configuration that actually works.
Why Ditch Docker? The Privacy & Performance Case
Strictly speaking, Docker is easier. But “easier” isn’t “better.”
On Windows, Docker Desktop runs inside a lightweight VM. While efficient, it still introduces overhead. It reserves a chunk of your RAM for the Vmmem process, often 2GB to 4GB, regardless of what the container is actually using. When you are trying to squeeze a 70B parameter model into a consumer GPU, every megabyte of VRAM and System RAM counts.
Here is the reality of the trade-off:
| Feature | Docker Setup | Native WSL (This Guide) |
|---|---|---|
| Memory Overhead | High (VM Reservation) | Minimal (Shared Kernel Resources) |
| GPU Access | Requires Nvidia Container Toolkit | Native (DirectX 12 / CUDA) |
| Disk Usage | High (Duplicated Libraries) | Low (Shared System Libs) |
| Setup Complexity | Low (One Command) | Medium (Requires dependency mgmt) |
Phase 1: The Hard Prerequisites
Before you copy-paste a single command, we need to address two critical traps.
1. The Virtualization Check (Don’t Skip This!)
WSL will fail to launch if your BIOS settings are incorrect.
- Open Task Manager (Ctrl+Shift+Esc).
- Click on the Performance tab -> CPU.
- Look for “Virtualization” in the bottom right. It MUST say “Enabled”.
If it says “Disabled,” restart your computer, enter BIOS, and enable SVM (AMD) or VT-x (Intel).
2. The Driver Separation Rule
There is a misconception that you need to install NVIDIA Drivers inside Linux (WSL). Do not do this. It will break your installation.
- Windows Host: Install the standard NVIDIA GeForce/Studio Drivers here. WSL2 inherits these drivers automatically.
- WSL (Ubuntu): Install the CUDA Toolkit here.
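A quick way to confirm the driver hand-off is working, assuming an NVIDIA GPU and a recent Windows driver (the guard is there so the snippet degrades gracefully on machines without a GPU):

```shell
# Run inside WSL (Ubuntu). nvidia-smi is injected by the Windows driver;
# if this works without installing anything inside Linux, the setup is correct.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
else
    echo "nvidia-smi not found: check the Windows-side driver install"
fi
```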
Hardware Reality Check: According to the Stanford AI Index 2025, small models are getting efficient, but they still demand VRAM. If you have less than 6GB of VRAM, you will be forced to use quantized 4-bit models or suffer slow CPU inference.
Phase 2: Solving the Python Version Trap
Here is the trap: If you install Ubuntu 24.04 LTS on WSL today, it comes with Python 3.12. PrivateGPT strictly requires Python 3.11. If you try to run it on 3.12, it crashes. If you try to downgrade the system Python, you might break Ubuntu.
The only clean way out is pyenv. But wait—if you just install pyenv, the build will fail because you are missing C compilers.
Execute these commands in order to build a perfect environment:
# 1. Update and install the "Build Essentials" (Crucial Step)
sudo apt update && sudo apt upgrade -y
sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncurses5-dev libncursesw5-dev xz-utils tk-dev \
libffi-dev liblzma-dev git
# 2. Install Pyenv
curl https://pyenv.run | bash
# 3. Add Pyenv to your shell (Run these lines one by one)
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
# 4. Restart your shell
exec "$SHELL"
# 5. Install Python 3.11.9 (This takes a few minutes—grab a coffee)
pyenv install 3.11.9
pyenv global 3.11.9
Verify it: Run python --version. If it says 3.11.9, you are safe.
Phase 3: The “Ollama Hybrid” Setup (Recommended)
Most guides tell you to compile llama-cpp-python from scratch. I’ve done it; it’s brittle. One CUDA update and your environment breaks.
The smart move for 2025 is the “Ollama Hybrid” method. We use Ollama as the backend engine (because it handles the GPU hardware abstraction perfectly) and PrivateGPT as the frontend UI and RAG (Retrieval-Augmented Generation) logic.
Step 1: Install the Backend
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a standard model (e.g., Llama 3)
ollama pull llama3
# Pull a text embedding model (Required for RAG)
ollama pull nomic-embed-text
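To confirm both models actually landed, you can query Ollama's local REST API (the /api/tags endpoint is part of Ollama's documented API; it needs the server from the next step to be running):

```shell
# Lists every model Ollama has stored locally, as JSON.
# Falls back to a hint if the server isn't up yet.
curl -s --max-time 3 http://localhost:11434/api/tags \
  || echo 'Ollama is not running yet -- start it with: ollama serve'
```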
Ollama needs to run in the background. Open a new, separate terminal window and type:
ollama serve
Keep that window open! If you close it, the backend will die.
Phase 4: Installing PrivateGPT (The “Poetry” Fix)
This is where the other ranking articles fail. They give you a command that worked in 2023 but throws a Group not found error today.
The Issue: PrivateGPT changed its pyproject.toml file. They moved dependencies from “Groups” to “Extras.”
The Fix:
# 1. Clone the Repo
git clone https://github.com/zylon-ai/private-gpt
cd private-gpt
# 2. Install Poetry (Dependency Manager)
pip install poetry
# 3. The CORRECT Install Command (Do NOT use --with ui)
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
Pro Tip: If the installation hangs on “Resolving dependencies,” run poetry lock --no-update first to refresh the lock file. Note: Dependency names change frequently; if the command above fails, check the `pyproject.toml` file in the repo for updated “extras” names.
Phase 5: Configuration & Launch
You have the software installed. Now you need to tell PrivateGPT to talk to Ollama instead of trying to run the model itself.
- Open the settings.yaml file (or create a settings-ollama.yaml).
- Ensure the connection port matches your Ollama instance (usually localhost:11434).
- Launch the stack:
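For reference, a minimal settings-ollama.yaml sketch might look like this (key names follow the file shipped in the PrivateGPT repo at the time of writing; verify against your checkout, since the schema changes between releases):

```yaml
server:
  env_name: ${APP_ENV:ollama}

llm:
  mode: ollama

embedding:
  mode: ollama

ollama:
  llm_model: llama3
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
```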
# Set the profile to Ollama and Run
PGPT_PROFILES=ollama make run
If you see a URL like http://0.0.0.0:8001, you are live. Open that in your Windows browser (localhost:8001). You now have a fully private, air-gapped ChatGPT competitor running on your machine.
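If the browser shows nothing, you can poke the API directly from WSL (the endpoint name is assumed from PrivateGPT's FastAPI service; browse http://localhost:8001/docs for the authoritative route list):

```shell
# A 200 response with a small JSON body means the stack is up.
curl -s --max-time 3 http://localhost:8001/health \
  || echo 'PrivateGPT is not responding -- check the "make run" terminal for errors'
```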
Alternative: The “Pure Native” Build (Advanced)
If you refuse to use Ollama—maybe you want to run a specific GGUF model that isn’t on the Ollama registry—you have to do it the hard way. This requires compiling the C++ bindings manually.
The secret ingredient here is CMAKE_ARGS. Without this flag, pip installs a CPU-only version of the library.
# 1. Install standard dependencies
poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
# 2. Force-Reinstall Llama-cpp with CUDA support
# Note: Updated flag for 2025 compatibility (GGML_CUDA instead of LLAMA_CUBLAS)
CMAKE_ARGS='-DGGML_CUDA=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
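To check whether the rebuild actually produced a CUDA-enabled wheel, one option is to ask the bindings directly (the function name comes from llama-cpp-python's low-level API; treat it as an assumption, and fall back to watching the startup logs for BLAS = 1 if it is missing in your version):

```shell
# Prints True if the compiled library can offload layers to the GPU
poetry run python -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"
```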
Troubleshooting the “No-Docker” Setup
1. The “BLAS = 0” Error
If you start the app and the logs say BLAS = 0, your GPU is not being used. The model is running on your CPU (and it will be painfully slow). This usually means the CUDA Toolkit path isn’t in your WSL environment variables. Add /usr/local/cuda/bin to your PATH.
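Concretely, that PATH fix looks like this (paths assume the default CUDA Toolkit location under /usr/local/cuda; adjust if your install created a versioned directory such as /usr/local/cuda-12.4):

```shell
# Fix for the current session:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Persist it for future WSL sessions:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
```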
2. The “Disk Full” Panic
WSL uses a virtual disk (`ext4.vhdx`) that grows but never shrinks. If you download 50GB of models and then delete them, Windows won’t reclaim that space automatically. You have to run the Optimize-VHD PowerShell command to shrink the file manually. Keep an eye on this if your C: drive starts looking red.
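From an elevated PowerShell on the Windows side, the shrink looks roughly like this (the .vhdx path below is a placeholder, so locate your distro's file under %LOCALAPPDATA%\Packages first; note that Optimize-VHD ships with the Hyper-V module, so Windows Home users may need diskpart's compact vdisk instead):

```powershell
# Shut WSL down first so the virtual disk isn't in use
wsl --shutdown

# Path is an example -- substitute your distro's actual ext4.vhdx location
Optimize-VHD -Path "C:\path\to\ext4.vhdx" -Mode Full
```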
Conclusion
By bypassing Docker, you’ve saved about 4GB of RAM and avoided a complex licensing layer. You’ve built a system that aligns with the growing trend of data sovereignty, ensuring your documents never touch a public cloud.
It’s not just a cool project; for many, it’s a compliance necessity. Now, go load up some PDFs and ask questions without looking over your shoulder.
⚖️ Liability Waiver & Technical Disclaimer
1. “AS IS” Basis: The instructions, code snippets, and configurations provided in this article (“Bare Metal WSL Setup”) are provided “as is” without warranty of any kind, express or implied.
2. Hardware & Data Risk: Running Large Language Models (LLMs) locally places significant stress on hardware (GPU/CPU/RAM). The author (MyAngle) is not responsible for hardware overheating, component failure, or data loss resulting from the execution of these commands.
3. Third-Party Dependencies: Software repositories (Pyenv, Poetry, Ollama) change frequently. Commands that work today (February 2026) may require modification in future versions. Always verify commands against official documentation.


