# RFCP 3.7.0 — GPU-Accelerated Coverage Calculations

## Context

Iteration 3.6.0 is complete: CuPy (`cupy-cuda13x`) works in the production PyInstaller
build, the RTX 4060 is detected, and the ONEDIR build includes the CUDA DLLs. However,
coverage calculations still run on the CPU because `coverage_service.py` uses
`import numpy as np` directly instead of the GPU backend.

The GPU infrastructure is ready:
- `app/services/gpu_backend.py` has `GPUManager.get_array_module()` → returns cupy or numpy
- `/api/gpu/status` confirms `"active_backend": "cuda"`
- CuPy is imported and GPU detected in the frozen exe

## Goal

Replace direct `np.` calls in `coverage_service.py` with calls through `xp`, where
`xp = gpu_manager.get_array_module()`, so calculations run on the GPU when available,
with automatic NumPy fallback.

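The approach works because CuPy mirrors most of NumPy's array API, so the same function body can run against either module. The sketch below only illustrates that idea; it is not the contents of `gpu_backend.py`, and `pick_array_module` and `scaled_norm` are hypothetical names introduced here.

```python
import numpy as np


def pick_array_module(prefer_gpu=True):
    """Hypothetical stand-in for gpu_manager.get_array_module()."""
    if prefer_gpu:
        try:
            import cupy as cp  # only importable when CuPy is installed

            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp
        except Exception:
            # ImportError, missing driver, or no device: fall through to NumPy
            pass
    return np


def scaled_norm(values, xp):
    """Backend-agnostic code: uses only functions present in both modules."""
    arr = xp.asarray(values, dtype=xp.float64)
    return float(xp.sqrt(xp.sum(arr * arr)))


xp = pick_array_module()
print(xp.__name__, scaled_norm([3.0, 4.0], xp))
```

The caller never needs to know which backend was chosen, which is the property the rest of this iteration depends on.
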
## Files to Modify

### `app/services/coverage_service.py`

**Line 7**: `import numpy as np`. Keep this import, and also import `gpu_manager`.

Add near top:
```python
from app.services.gpu_backend import gpu_manager
```

**Key sections to GPU-accelerate** (section 3 has by far the highest impact):

#### 1. Grid array creation (lines 549-550, 922-923)
```python
# BEFORE:
grid_lats = np.array([lat for lat, lon in grid])
grid_lons = np.array([lon for lat, lon in grid])

# AFTER:
xp = gpu_manager.get_array_module()
grid_lats = xp.array([lat for lat, lon in grid])
grid_lons = xp.array([lon for lat, lon in grid])
```

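Both list comprehensions walk `grid` on the CPU and trigger two separate host-to-device copies. One possible alternative, assuming `grid` is a list of `(lat, lon)` tuples as the comprehensions above imply, is to convert once and slice the columns. `split_grid` is a hypothetical helper name, not an existing function in the service.

```python
import numpy as np


def split_grid(grid, xp=np):
    """Convert a list of (lat, lon) pairs into two 1-D arrays with one conversion."""
    # Shape (N, 2); the reshape keeps the shape valid even for an empty grid.
    pts = xp.asarray(grid, dtype=xp.float64).reshape(-1, 2)
    return pts[:, 0], pts[:, 1]


grid = [(50.45, 30.52), (50.46, 30.53), (50.47, 30.54)]
# In the service, pass xp=gpu_manager.get_array_module() instead of the default.
grid_lats, grid_lons = split_grid(grid)
print(grid_lats.shape, grid_lons.shape)
```

The returned columns are views into one array, so no extra copies are made on either backend.
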
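#### 2. Trig calculations (lines 468, 1031, 1408-1415, 1442)
These use `np.cos`, `np.radians`, `np.sin`, `np.degrees`, and `np.arctan2`, all of which have CuPy equivalents.
On the GPU path a scalar result typically comes back as a 0-d CuPy array, so wrap it in `float()`
before mixing it with plain Python numbers. Each `float()` call forces a device-to-host copy, so
purely scalar expressions like the ones below gain no speed on the GPU (see rule 3 under Important Rules).
```python
# BEFORE:
lon_delta = settings.radius / (111000 * np.cos(np.radians(center_lat)))
cos_lat = np.cos(np.radians(center_lat))

# AFTER:
xp = gpu_manager.get_array_module()
lon_delta = settings.radius / (111000 * float(xp.cos(xp.radians(center_lat))))
cos_lat = float(xp.cos(xp.radians(center_lat)))
```
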
#### 3. The heavy calculation loop: `_run_point_loop` (line 1070) and `_calculate_point_sync` (line 1112)
This is where 90% of the time is spent. The code currently processes points one by one.
The GPU win comes from vectorizing the path loss calculation across ALL grid points at once.

**Strategy**: Instead of looping through points, create arrays of all distances/angles
and compute path loss for all points in one vectorized operation.

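To make the strategy concrete, here is a minimal sketch of the loop-to-array rewrite. It uses free-space path loss purely as a placeholder, since the real propagation model math must stay unchanged; all function names and the haversine distance step are assumptions for illustration, not code from `coverage_service.py`.

```python
import math

import numpy as np

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(site_lat, site_lon, lats, lons, xp=np):
    """Great-circle distance in metres from one site to every grid point."""
    phi0 = math.radians(site_lat)  # scalars stay on the CPU
    phi = xp.radians(lats)
    dlam = xp.radians(lons) - math.radians(site_lon)
    a = xp.sin((phi - phi0) / 2) ** 2 + math.cos(phi0) * xp.cos(phi) * xp.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * xp.arcsin(xp.sqrt(xp.minimum(a, 1.0)))


def fspl_db(dist_m, freq_mhz, xp=np):
    """Free-space path loss in dB (placeholder for the project's real model)."""
    d_km = xp.maximum(dist_m, 1.0) / 1000.0  # clamp to 1 m to avoid log10(0)
    return 20 * xp.log10(d_km) + 20 * math.log10(freq_mhz) + 32.44


def path_loss_all_points(site_lat, site_lon, grid, freq_mhz, xp=np):
    """One vectorized pass over the whole grid instead of a per-point loop."""
    pts = xp.asarray(grid, dtype=xp.float64).reshape(-1, 2)
    dist = haversine_m(site_lat, site_lon, pts[:, 0], pts[:, 1], xp)
    loss = fspl_db(dist, freq_mhz, xp)
    # Hand a NumPy array back to the non-GPU caller.
    return loss.get() if hasattr(loss, "get") else loss


grid = [(50.46, 30.52), (50.45, 30.54), (50.47, 30.55)]
print(path_loss_all_points(50.45, 30.52, grid, 1800.0))
```

The important property is that every operation inside `haversine_m` and `fspl_db` acts on the full array, so on the GPU path the whole grid is handled by a handful of kernel launches rather than one Python iteration per point.
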
#### 4. `_calculate_bearing` (line 1402): already vectorizable
```python
# All np.* functions here have direct CuPy equivalents
# Just replace np → xp
```

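For reference, a backend-agnostic bearing function might look like the sketch below. It uses the standard initial great-circle bearing formula, which may differ from what `_calculate_bearing` actually implements; `bearing_deg` and its signature are assumptions.

```python
import numpy as np


def bearing_deg(lat1, lon1, lat2, lon2, xp=np):
    """Initial great-circle bearing, degrees clockwise from north, in [0, 360)."""
    phi1 = xp.radians(xp.asarray(lat1, dtype=xp.float64))
    phi2 = xp.radians(xp.asarray(lat2, dtype=xp.float64))
    dlam = xp.radians(
        xp.asarray(lon2, dtype=xp.float64) - xp.asarray(lon1, dtype=xp.float64)
    )
    y = xp.sin(dlam) * xp.cos(phi2)
    x = xp.cos(phi1) * xp.sin(phi2) - xp.sin(phi1) * xp.cos(phi2) * xp.cos(dlam)
    # arctan2 returns (-180, 180]; shift into [0, 360)
    return (xp.degrees(xp.arctan2(y, x)) + 360.0) % 360.0


# One site, four targets: due north, east, south, west
lats = np.array([1.0, 0.0, -1.0, 0.0])
lons = np.array([0.0, 1.0, 0.0, -1.0])
print(bearing_deg(0.0, 0.0, lats, lons))
```

Because the site coordinates broadcast against the target arrays, the same function serves both the single-point and the whole-grid case.
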
## Important Rules

1. **Always get xp at function scope**, not module scope, so a backend switch (for example to CPU via the API) takes effect on the next call:
```python
def my_function(self, *args, **kwargs):
    xp = gpu_manager.get_array_module()
    # use xp instead of np
```

2. **Convert GPU arrays back to CPU** before returning to non-GPU code:
```python
# Only apply this check to arrays: dicts also have a .get method
if hasattr(result, 'get'):  # CuPy array
    result = result.get()  # → numpy array
```

3. **Keep np for small/scalar operations**: GPU overhead isn't worth it for single values.
Only use xp for array operations on 100+ elements.

4. **Don't break the fallback**: if CuPy isn't available, `get_array_module()` returns numpy,
so `xp.array()` etc. work identically.

5. **Test both paths**: run with GPU and verify same results as CPU.

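Rules 2 and 3 can be wrapped in two small helpers, sketched below under stated assumptions. `to_cpu` and `choose_module` are hypothetical names; the 100-element threshold comes from rule 3, and the extra `ndim` check is an added safeguard so that dicts, which also have `.get`, pass through untouched.

```python
import numpy as np

GPU_MIN_ELEMENTS = 100  # threshold taken from rule 3


def to_cpu(result):
    """Return host data for array-like input; pass every other object through."""
    if isinstance(result, np.ndarray) or np.isscalar(result):
        return result
    # CuPy arrays expose both .get() and .ndim; dicts only have .get.
    if hasattr(result, "get") and hasattr(result, "ndim"):
        return result.get()
    return result


def choose_module(n_elements, gpu_module, cpu_module=np):
    """Use the GPU module only when the array is large enough to pay for the transfer."""
    return gpu_module if n_elements >= GPU_MIN_ELEMENTS else cpu_module


print(to_cpu({"rsrp": -95.0}), choose_module(12, gpu_module=None).__name__)
```

Calling `to_cpu` at the boundary of each GPU-accelerated function keeps CuPy arrays from leaking into serialization or other NumPy-only code.
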
## Testing

After changes:
```powershell
# Rebuild
cd D:\root\rfcp\backend
pyinstaller ..\installer\rfcp-server-gpu.spec --noconfirm

# Run
.\dist\rfcp-server\rfcp-server.exe

# Test calculation via frontend and watch Task Manager GPU utilization
# Should see GPU Compute spike during coverage calculation
# Total time should be well under the ~10s CPU baseline for 1254 points
```

Compare before/after:
- Current (CPU): ~10s for 1254 points, 5km radius
- Expected (GPU): 1-3s for same calculation

Also test GPU diagnostics:
```
curl http://localhost:8888/api/gpu/diagnostics
```

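When timing the two paths from Python rather than from the frontend, keep in mind that CuPy launches kernels asynchronously, so a timer can stop before the GPU has finished. A minimal timing sketch that accounts for this follows; `timed` is a hypothetical helper, not part of the codebase.

```python
import time

import numpy as np


def timed(fn, *args, xp=np, repeats=3):
    """Run fn(*args) several times; return (last result, best wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        if xp is not np:
            # Wait for queued GPU work so the timer measures the real cost.
            xp.cuda.Stream.null.synchronize()
        best = min(best, time.perf_counter() - start)
    return result, best


total, seconds = timed(np.sum, np.arange(1_000_000))
print(f"sum={total}, best of 3: {seconds:.6f}s")
```

Taking the best of several runs also hides the one-time kernel compilation that CuPy performs on the first call.
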
## What NOT to Change

- Don't modify `gpu_backend.py`; it's working correctly
- Don't change the API endpoints or response format
- Don't remove the NumPy import; keep it for non-array operations
- Don't change propagation model math, only the array operations
- Don't change `_filter_buildings_to_bbox` or OSM functions; they use lists, not arrays

## Success Criteria

- [ ] Coverage calculation uses GPU (visible in Task Manager)
- [ ] Calculation time reduced for 1000+ point grids
- [ ] CPU fallback still works (test by setting active_backend to cpu via API)
- [ ] Same coverage results (heatmap should look identical; values should agree within floating-point tolerance)
- [ ] No regression in tiled processing mode
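
The "same coverage results" criterion can be checked numerically rather than by eye. The sketch below compares two result arrays; `assert_same_coverage` and the default tolerance are assumptions, and the tolerance would need loosening if the GPU path uses float32.

```python
import numpy as np


def assert_same_coverage(cpu_values, gpu_values, atol_db=1e-6):
    """Raise AssertionError if the two backends differ by more than atol_db."""
    cpu = np.asarray(cpu_values, dtype=np.float64)
    # .get() copies a CuPy array to host memory; NumPy input passes straight through.
    gpu = gpu_values.get() if hasattr(gpu_values, "get") else gpu_values
    gpu = np.asarray(gpu, dtype=np.float64)
    if cpu.shape != gpu.shape:
        raise AssertionError(f"shape mismatch: {cpu.shape} vs {gpu.shape}")
    worst = float(np.max(np.abs(cpu - gpu))) if cpu.size else 0.0
    if worst > atol_db:
        raise AssertionError(f"max difference {worst:.3e} dB exceeds {atol_db:.1e} dB")
    return worst


cpu_run = [-92.4, -101.7, -110.2]
gpu_run = np.array([-92.4, -101.7, -110.2])
print("max difference (dB):", assert_same_coverage(cpu_run, gpu_run))
```

Running the same request once per backend and feeding both outputs to this check covers the fallback and parity criteria in one step.
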