# RFCP 3.6.0 — Production GPU Build (Claude Code Task)

## Goal

Build `rfcp-server.exe` (PyInstaller) with CuPy GPU support so the production RFCP
build detects the NVIDIA GPU without a manual `pip install`.

Currently the production exe shows "CPU (NumPy)" because CuPy is not bundled.

## Current Environment (CONFIRMED WORKING)

```
Windows 10 (10.0.26200)
Python 3.11.8 (C:\Python311)
NVIDIA GeForce RTX 4060 Laptop GPU (8 GB VRAM)
CUDA Toolkit 13.1 (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1)
CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1

Packages:
cupy-cuda13x 13.6.0   ← NOT cuda12x!
numpy 1.26.4
scipy 1.17.0
fastrlock 0.8.3
pyinstaller 6.18.0

GPU compute verified:
python -c "import cupy; a = cupy.array([1,2,3]); print(a.sum())" → 6 ✅
```

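As an extra pre-build sanity check, a snippet along these lines (a sketch; it only assumes `cupy` is importable in the build environment) reports which CUDA runtime CuPy sees, so the spec bundles matching DLLs:

```python
# Pre-build sanity check: report the CUDA runtime version visible to CuPy.
# Falls back to a NumPy note when CuPy is not importable.
def cuda_runtime_summary():
    try:
        import cupy
        # runtimeGetVersion() encodes major*1000 + minor*10 (e.g. 13010 for 13.1)
        return f"cupy {cupy.__version__}, CUDA runtime {cupy.cuda.runtime.runtimeGetVersion()}"
    except ImportError:
        return "cupy not importable; server will fall back to NumPy"

print(cuda_runtime_summary())
```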
## What We Already Tried (And Why It Failed)

### Attempt 1: ONEFILE spec with collect_all('cupy')

- `collect_all('cupy')` returns 1882 datas, **0 binaries** — the CuPy pip wheel doesn't bundle DLLs on Windows
- CUDA DLLs come from two separate sources:
  - **nvidia pip packages** (14 DLLs in `C:\Python311\Lib\site-packages\nvidia\*/bin/`)
  - **CUDA Toolkit** (13 DLLs in `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64\`)
- We manually collected these 27 DLLs in the spec
- The build succeeded (a 3 GB exe!) but crashed on launch:

  ```
  [PYI-10456:ERROR] Failed to extract cufft64_12.dll: decompression resulted in return code -1!
  ```

- Root cause: `cufft64_12.dll` is 297 MB — PyInstaller's zlib decompression fails on it in ONEFILE mode

### Attempt 2: ONEDIR (not yet built)

We were about to try ONEDIR but haven't built it yet.

### Key Insight: Duplicate DLLs from two sources

The nvidia pip packages ship CUDA 12.x DLLs (`cublas64_12.dll`, etc.), while CUDA Toolkit 13.1 ships CUDA 13.x DLLs (`cublas64_13.dll`, etc.). cupy-cuda13x needs the 13.x versions; the 12.x copies from pip may conflict.

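One way to see the overlap concretely is a small audit script (illustrative; the glob patterns are taken from the paths listed above and may need adjusting for a different layout):

```python
# Hypothetical audit: list the CUDA DLLs each source provides, so the spec can
# keep the Toolkit 13.x copies and skip the pip-installed 12.x copies.
import glob
import os

# Patterns from the environment section; adjust if the layout differs.
SOURCES = {
    "nvidia pip packages": r"C:\Python311\Lib\site-packages\nvidia\*\bin\*.dll",
    "CUDA Toolkit 13.1": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64\*.dll",
}

def audit(sources):
    """Map each source label to the sorted basenames of the DLLs it provides."""
    return {label: sorted(os.path.basename(p) for p in glob.glob(pattern))
            for label, pattern in sources.items()}

if __name__ == "__main__":
    for label, names in audit(SOURCES).items():
        print(f"{label}: {len(names)} DLLs")
```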
## What Needs To Happen

1. **Build rfcp-server as ONEDIR** (a folder with the exe plus DLLs, not a single exe)
   - This avoids the decompression crash with large CUDA DLLs
   - Output: `backend/dist/rfcp-server/rfcp-server.exe` with all DLLs alongside

2. **Include ONLY the correct CUDA DLLs**
   - Prefer CUDA Toolkit 13.1 DLLs (they match cupy-cuda13x)
   - The nvidia pip packages carry cuda12x DLLs — may cause version conflicts
   - Key DLLs needed: cublas, cusparse, cusolver, curand, cufft, nvrtc, cudart

3. **Exclude bloat** — the previous build pulled in tensorflow, grpc, opentelemetry, etc.,
   making it 3 GB. The real size should be ~600-800 MB.

4. **Test the built exe** — run it standalone and verify:
   - `curl http://localhost:8090/api/health` returns `"build": "gpu"`
   - `curl http://localhost:8090/api/gpu/status` returns `"available": true`
   - Or at minimum: the exe starts without errors and CuPy imports successfully

5. **Update Electron integration** if needed:
   - The current Electron code expects a single `rfcp-server.exe` file
   - With ONEDIR, it becomes a folder: `rfcp-server/rfcp-server.exe`
   - File: `desktop/main.js` or `desktop/src/main.ts` — look for where it spawns the backend
   - The path needs to change from `resources/backend/rfcp-server.exe`
     to `resources/backend/rfcp-server/rfcp-server.exe`

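For step 2, a helper along these lines could sit at the top of `rfcp-server-gpu.spec` (a sketch, not the final spec: the library stems come from the list above, the exact DLL filenames and the `_12` exclusion are assumptions based on the duplicate-DLL note, and should be verified against what CuPy actually loads):

```python
# Hypothetical spec helper: build PyInstaller (source, dest) binary tuples for
# the CUDA 13.x DLLs only, skipping any 12.x copies from the pip packages.
import glob
import os

# Library stems named in step 2; exact DLL filenames are assumptions.
CUDA_LIB_STEMS = ("cublas", "cusparse", "cusolver", "curand", "cufft", "nvrtc", "cudart")

def collect_cuda13_binaries(toolkit_bin):
    """Return [(dll_path, '.')] tuples for the CUDA 13 DLLs to bundle."""
    binaries = []
    for path in sorted(glob.glob(os.path.join(toolkit_bin, "*.dll"))):
        name = os.path.basename(path).lower()
        # Keep only the named libraries, and drop any CUDA 12.x variants.
        if any(name.startswith(stem) for stem in CUDA_LIB_STEMS) and "_12" not in name:
            binaries.append((path, "."))  # '.' places the DLL next to the exe
    return binaries
```

In the spec these tuples would then be passed to `Analysis(..., binaries=collect_cuda13_binaries(r"...\CUDA\v13.1\bin\x64"))` alongside the CPU spec's existing arguments.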
## File Locations

```
D:\root\rfcp\
├── backend\
│   ├── run_server.py             ← PyInstaller entry point
│   ├── app\
│   │   ├── main.py               ← FastAPI app
│   │   ├── services\
│   │   │   ├── gpu_backend.py    ← GPU detection (CuPy/NumPy fallback)
│   │   │   └── coverage_service.py ← Uses get_array_module()
│   │   └── api\routes\gpu.py     ← /api/gpu/status, /api/gpu/diagnostics
│   ├── dist\                     ← PyInstaller output goes here
│   └── build\                    ← PyInstaller build cache
├── installer\
│   ├── rfcp-server-gpu.spec      ← GPU spec (needs fixing)
│   ├── rfcp-server.spec          ← CPU spec (working, don't touch)
│   ├── rfcp.ico                  ← Icon (exists)
│   └── build-gpu.bat             ← Build script
├── desktop\
│   ├── main.js or src/main.ts    ← Electron main process
│   └── resources\backend\        ← Where production exe lives
└── frontend\                     ← React frontend (no changes needed)
```

## Existing CPU spec for reference

The working CPU-only spec is at `installer/rfcp-server.spec`. Use it as the base
and ADD CuPy + CUDA on top. Don't reinvent the wheel.

## Build Command

```powershell
cd D:\root\rfcp\backend
pyinstaller ..\installer\rfcp-server-gpu.spec --clean --noconfirm
```

## Success Criteria

- [ ] `dist/rfcp-server/rfcp-server.exe` starts without errors
- [ ] CuPy imports successfully inside the exe (no missing DLL errors)
- [ ] `/api/gpu/status` returns `"available": true, "device": "RTX 4060"`
- [ ] Total folder size < 1 GB (ideally 600-800 MB)
- [ ] No tensorflow/grpc/opentelemetry bloat
- [ ] Electron can find and launch the backend (path updated if needed)

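The endpoint checks above could be scripted as a small smoke test (a sketch; it assumes the port and endpoint names from step 4, and the JSON keys shown in the criteria):

```python
# Hypothetical smoke test against a running rfcp-server build.
import json
import urllib.request

def get_json(base, path):
    """Fetch and decode a JSON endpoint from the server."""
    with urllib.request.urlopen(base + path, timeout=5) as resp:
        return json.load(resp)

def check_gpu_build(base="http://localhost:8090"):
    """Raise AssertionError unless the server reports a working GPU build."""
    health = get_json(base, "/api/health")
    status = get_json(base, "/api/gpu/status")
    assert health.get("build") == "gpu", f"expected gpu build, got {health}"
    assert status.get("available") is True, f"GPU not available: {status}"
    return status.get("device")
```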
## Important Notes

- Do NOT use cupy-cuda12x — we migrated to cupy-cuda13x
- Do NOT try ONEFILE mode — cufft64_12.dll (297 MB) crashes decompression
- The nvidia pip packages (nvidia-cublas-cu12, etc.) are still installed but may
  conflict with CUDA Toolkit 13.1 — prefer the Toolkit DLLs
- `collect_all('cupy')` gives 0 binaries on Windows — the DLLs must be listed manually in the spec
- gpu_backend.py already handles CuPy's absence gracefully (falls back to NumPy)