# RFCP 3.6.0 — Production GPU Build (Claude Code Task)

## Goal

Build `rfcp-server.exe` (PyInstaller) with CuPy GPU support so the production RFCP
build detects the NVIDIA GPU without a manual `pip install`.

Currently the production exe shows "CPU (NumPy)" because CuPy is not bundled.

## Current Environment (CONFIRMED WORKING)

```
Windows 10 (10.0.26200)
Python 3.11.8 (C:\Python311)
NVIDIA GeForce RTX 4060 Laptop GPU (8 GB VRAM)
CUDA Toolkit 13.1 (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1)
CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1

Packages:
cupy-cuda13x 13.6.0   ← NOT cuda12x!
numpy 1.26.4
scipy 1.17.0
fastrlock 0.8.3
pyinstaller 6.18.0

GPU compute verified:
python -c "import cupy; a = cupy.array([1,2,3]); print(a.sum())" → 6 ✅
```

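As an extra pre-build sanity check, a snippet along these lines (a sketch; it only assumes `cupy` is importable in the build environment) reports which CUDA runtime CuPy sees, so the spec bundles matching DLLs:

```python
# Pre-build sanity check: report the CUDA runtime version visible to CuPy.
# Falls back to a NumPy note when CuPy is not importable.
def cuda_runtime_summary():
    try:
        import cupy
        # runtimeGetVersion() encodes major*1000 + minor*10 (e.g. 13010 for 13.1)
        return f"cupy {cupy.__version__}, CUDA runtime {cupy.cuda.runtime.runtimeGetVersion()}"
    except ImportError:
        return "cupy not importable; server will fall back to NumPy"

print(cuda_runtime_summary())
```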
## What We Already Tried (And Why It Failed)

### Attempt 1: ONEFILE spec with collect_all('cupy')

- `collect_all('cupy')` returns 1882 datas, **0 binaries** — the CuPy pip wheel doesn't bundle DLLs on Windows
- CUDA DLLs come from two separate sources:
  - **nvidia pip packages** (14 DLLs in `C:\Python311\Lib\site-packages\nvidia\*/bin/`)
  - **CUDA Toolkit** (13 DLLs in `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64\`)
- We manually collected these 27 DLLs in the spec
- The build succeeded (a 3 GB exe!) but crashed on launch:

  ```
  [PYI-10456:ERROR] Failed to extract cufft64_12.dll: decompression resulted in return code -1!
  ```

- Root cause: `cufft64_12.dll` is 297 MB — PyInstaller's zlib decompression fails on it in ONEFILE mode

### Attempt 2: ONEDIR (not yet built)

We were about to try ONEDIR but haven't built it yet.

### Key Insight: Duplicate DLLs from two sources

The nvidia pip packages ship CUDA 12.x DLLs (`cublas64_12.dll`, etc.), while CUDA Toolkit 13.1 ships CUDA 13.x DLLs (`cublas64_13.dll`, etc.). cupy-cuda13x needs the 13.x versions; the 12.x copies from pip may conflict.

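One way to see the overlap concretely is a small audit script (illustrative; the glob patterns are taken from the paths listed above and may need adjusting for a different layout):

```python
# Hypothetical audit: list the CUDA DLLs each source provides, so the spec can
# keep the Toolkit 13.x copies and skip the pip-installed 12.x copies.
import glob
import os

# Patterns from the environment section; adjust if the layout differs.
SOURCES = {
    "nvidia pip packages": r"C:\Python311\Lib\site-packages\nvidia\*\bin\*.dll",
    "CUDA Toolkit 13.1": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64\*.dll",
}

def audit(sources):
    """Map each source label to the sorted basenames of the DLLs it provides."""
    return {label: sorted(os.path.basename(p) for p in glob.glob(pattern))
            for label, pattern in sources.items()}

if __name__ == "__main__":
    for label, names in audit(SOURCES).items():
        print(f"{label}: {len(names)} DLLs")
```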
## What Needs To Happen

1. **Build rfcp-server as ONEDIR** (a folder with the exe plus DLLs, not a single exe)
   - This avoids the decompression crash with large CUDA DLLs
   - Output: `backend/dist/rfcp-server/rfcp-server.exe` with all DLLs alongside

2. **Include ONLY the correct CUDA DLLs**
   - Prefer CUDA Toolkit 13.1 DLLs (they match cupy-cuda13x)
   - The nvidia pip packages carry cuda12x DLLs — may cause version conflicts
   - Key DLLs needed: cublas, cusparse, cusolver, curand, cufft, nvrtc, cudart

3. **Exclude bloat** — the previous build pulled in tensorflow, grpc, opentelemetry, etc.,
   making it 3 GB. The real size should be ~600-800 MB.

4. **Test the built exe** — run it standalone and verify:
   - `curl http://localhost:8090/api/health` returns `"build": "gpu"`
   - `curl http://localhost:8090/api/gpu/status` returns `"available": true`
   - Or at minimum: the exe starts without errors and CuPy imports successfully

5. **Update Electron integration** if needed:
   - The current Electron code expects a single `rfcp-server.exe` file
   - With ONEDIR, it becomes a folder: `rfcp-server/rfcp-server.exe`
   - File: `desktop/main.js` or `desktop/src/main.ts` — look for where it spawns the backend
   - The path needs to change from `resources/backend/rfcp-server.exe`
     to `resources/backend/rfcp-server/rfcp-server.exe`

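For step 2, a helper along these lines could sit at the top of `rfcp-server-gpu.spec` (a sketch, not the final spec: the library stems come from the list above, the exact DLL filenames and the `_12` exclusion are assumptions based on the duplicate-DLL note, and should be verified against what CuPy actually loads):

```python
# Hypothetical spec helper: build PyInstaller (source, dest) binary tuples for
# the CUDA 13.x DLLs only, skipping any 12.x copies from the pip packages.
import glob
import os

# Library stems named in step 2; exact DLL filenames are assumptions.
CUDA_LIB_STEMS = ("cublas", "cusparse", "cusolver", "curand", "cufft", "nvrtc", "cudart")

def collect_cuda13_binaries(toolkit_bin):
    """Return [(dll_path, '.')] tuples for the CUDA 13 DLLs to bundle."""
    binaries = []
    for path in sorted(glob.glob(os.path.join(toolkit_bin, "*.dll"))):
        name = os.path.basename(path).lower()
        # Keep only the named libraries, and drop any CUDA 12.x variants.
        if any(name.startswith(stem) for stem in CUDA_LIB_STEMS) and "_12" not in name:
            binaries.append((path, "."))  # '.' places the DLL next to the exe
    return binaries
```

In the spec these tuples would then be passed to `Analysis(..., binaries=collect_cuda13_binaries(r"...\CUDA\v13.1\bin\x64"))` alongside the CPU spec's existing arguments.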
## File Locations

```
D:\root\rfcp\
├── backend\
│   ├── run_server.py             ← PyInstaller entry point
│   ├── app\
│   │   ├── main.py               ← FastAPI app
│   │   ├── services\
│   │   │   ├── gpu_backend.py    ← GPU detection (CuPy/NumPy fallback)
│   │   │   └── coverage_service.py ← Uses get_array_module()
│   │   └── api\routes\gpu.py     ← /api/gpu/status, /api/gpu/diagnostics
│   ├── dist\                     ← PyInstaller output goes here
│   └── build\                    ← PyInstaller build cache
├── installer\
│   ├── rfcp-server-gpu.spec      ← GPU spec (needs fixing)
│   ├── rfcp-server.spec          ← CPU spec (working, don't touch)
│   ├── rfcp.ico                  ← Icon (exists)
│   └── build-gpu.bat             ← Build script
├── desktop\
│   ├── main.js or src/main.ts    ← Electron main process
│   └── resources\backend\        ← Where production exe lives
└── frontend\                     ← React frontend (no changes needed)
```

## Existing CPU spec for reference

The working CPU-only spec is at `installer/rfcp-server.spec`. Use it as the base
and ADD CuPy + CUDA on top. Don't reinvent the wheel.

## Build Command

```powershell
cd D:\root\rfcp\backend
pyinstaller ..\installer\rfcp-server-gpu.spec --clean --noconfirm
```

## Success Criteria

- [ ] `dist/rfcp-server/rfcp-server.exe` starts without errors
- [ ] CuPy imports successfully inside the exe (no missing DLL errors)
- [ ] `/api/gpu/status` returns `"available": true, "device": "RTX 4060"`
- [ ] Total folder size < 1 GB (ideally 600-800 MB)
- [ ] No tensorflow/grpc/opentelemetry bloat
- [ ] Electron can find and launch the backend (path updated if needed)

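The endpoint checks above could be scripted as a small smoke test (a sketch; it assumes the port and endpoint names from step 4, and the JSON keys shown in the criteria):

```python
# Hypothetical smoke test against a running rfcp-server build.
import json
import urllib.request

def get_json(base, path):
    """Fetch and decode a JSON endpoint from the server."""
    with urllib.request.urlopen(base + path, timeout=5) as resp:
        return json.load(resp)

def check_gpu_build(base="http://localhost:8090"):
    """Raise AssertionError unless the server reports a working GPU build."""
    health = get_json(base, "/api/health")
    status = get_json(base, "/api/gpu/status")
    assert health.get("build") == "gpu", f"expected gpu build, got {health}"
    assert status.get("available") is True, f"GPU not available: {status}"
    return status.get("device")
```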
## Important Notes

- Do NOT use cupy-cuda12x — we migrated to cupy-cuda13x
- Do NOT try ONEFILE mode — cufft64_12.dll (297 MB) crashes decompression
- The nvidia pip packages (nvidia-cublas-cu12, etc.) are still installed but may
  conflict with CUDA Toolkit 13.1 — prefer the Toolkit DLLs
- `collect_all('cupy')` gives 0 binaries on Windows — the DLLs must be listed manually in the spec
- gpu_backend.py already handles CuPy's absence gracefully (falls back to NumPy)