# RFCP 3.7.0: GPU-Accelerated Coverage Calculations

## Context

Iteration 3.6.0 completed: CuPy-cuda13x works in production PyInstaller build, RTX 4060 detected, ONEDIR build with CUDA DLLs.

BUT coverage calculations still run on CPU because coverage_service.py uses `import numpy as np` directly instead of the GPU backend.

The GPU infrastructure is ready:
- `app/services/gpu_backend.py` has `GPUManager.get_array_module()` → returns cupy or numpy
- `/api/gpu/status` confirms `"active_backend": "cuda"`
- CuPy is imported and GPU detected in the frozen exe

## Goal

Replace direct `np.` calls in coverage_service.py with `xp = gpu_manager.get_array_module()` so calculations run on GPU when available, with automatic NumPy fallback.

## Files to Modify

### `app/services/coverage_service.py`

**Line 7**: `import numpy as np`. Keep this, but also import gpu_manager.

Add near top:
```python
from app.services.gpu_backend import gpu_manager
```

**Key sections to GPU-accelerate** (highest impact first):

#### 1. Grid array creation (lines 549-550, 922-923)

```python
# BEFORE:
grid_lats = np.array([lat for lat, lon in grid])
grid_lons = np.array([lon for lat, lon in grid])

# AFTER:
xp = gpu_manager.get_array_module()
grid_lats = xp.array([lat for lat, lon in grid])
grid_lons = xp.array([lon for lat, lon in grid])
```

#### 2. Trig calculations (lines 468, 1031, 1408-1415, 1442)

These use np.cos, np.radians, np.sin, np.degrees, np.arctan2. All have CuPy equivalents.

```python
# BEFORE:
lon_delta = settings.radius / (111000 * np.cos(np.radians(center_lat)))
cos_lat = np.cos(np.radians(center_lat))

# AFTER:
xp = gpu_manager.get_array_module()
# float() pulls the 0-d result back to a plain Python float
lon_delta = settings.radius / (111000 * float(xp.cos(xp.radians(center_lat))))
cos_lat = float(xp.cos(xp.radians(center_lat)))
```

`center_lat` is a scalar in both lines, so rule 3 under Important Rules applies: leaving these two lines on `np` is also acceptable. The switch to `xp` pays off only where the argument is an array.

#### 3. The heavy calculation loop: `_run_point_loop` (line 1070) and `_calculate_point_sync` (line 1112)

This is where 90% of time is spent. Currently processes points one-by-one.
The GPU win comes from vectorizing the path loss calculation across ALL grid points at once.

**Strategy**: Instead of looping through points, create arrays of all distances/angles and compute path loss for all points in one vectorized operation.

#### 4. `_calculate_bearing` (line 1402): already vectorizable

```python
# All np.* functions here have direct CuPy equivalents
# Just replace np → xp
```

## Important Rules

1. **Always get xp at function scope**, not module scope:
   ```python
   def my_function(self, ...):
       xp = gpu_manager.get_array_module()
       # use xp instead of np
   ```

2. **Convert GPU arrays back to CPU** before returning to non-GPU code:
   ```python
   if hasattr(result, 'get'):  # CuPy array
       result = result.get()   # → numpy array
   ```

3. **Keep np for small/scalar operations**: GPU overhead isn't worth it for single values. Only use xp for array operations on 100+ elements.

4. **Don't break the fallback**: if CuPy isn't available, `get_array_module()` returns numpy, so `xp.array()` etc. work identically.

5. **Test both paths**: run with GPU and verify same results as CPU.
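The vectorization strategy and rules 1 to 4 can be sketched together as follows. This is a minimal illustration, not the code in coverage_service.py: `get_array_module()` is a hypothetical stand-in for `gpu_manager.get_array_module()` so the snippet runs on its own, `path_loss_all_points` is an invented name, and free-space path loss is used only as a placeholder for the project's actual propagation models, which must stay unchanged. The 6371 km Earth radius and the 1 m distance clamp are also assumptions.

```python
import math

import numpy as np


def get_array_module():
    """Hypothetical stand-in for gpu_manager.get_array_module()."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        pass  # CuPy missing or no usable GPU: fall back to NumPy
    return np


def to_cpu(arr):
    """Rule 2: hand NumPy arrays back to non-GPU code."""
    return arr.get() if hasattr(arr, "get") else arr


def path_loss_all_points(site_lat, site_lon, grid, freq_mhz):
    """Free-space path loss (dB) from one site to every grid point at once."""
    xp = get_array_module()  # rule 1: resolve xp at function scope

    # One array per coordinate instead of a Python loop over points.
    lats = xp.radians(xp.asarray([lat for lat, _ in grid], dtype=xp.float64))
    lons = xp.radians(xp.asarray([lon for _, lon in grid], dtype=xp.float64))

    # Rule 3: scalars stay on the CPU (plain math, no GPU round trip).
    lat0 = math.radians(site_lat)
    lon0 = math.radians(site_lon)

    # Haversine great-circle distance, vectorized over all points.
    a = (xp.sin((lats - lat0) / 2) ** 2
         + math.cos(lat0) * xp.cos(lats) * xp.sin((lons - lon0) / 2) ** 2)
    dist_km = 2 * 6371.0 * xp.arcsin(xp.sqrt(a))
    dist_km = xp.maximum(dist_km, 0.001)  # clamp to 1 m to avoid log10(0)

    # FSPL(dB) = 20 log10(d_km) + 20 log10(f_MHz) + 32.44
    loss_db = 20 * xp.log10(dist_km) + 20 * math.log10(freq_mhz) + 32.44
    return to_cpu(loss_db)
```

Calling `path_loss_all_points(50.0, 30.0, grid, 1800.0)` with `grid` as a list of `(lat, lon)` tuples returns one NumPy array of losses regardless of which backend did the work, so callers need no GPU awareness.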
## Testing

After changes:
```powershell
# Rebuild
cd D:\root\rfcp\backend
pyinstaller ..\installer\rfcp-server-gpu.spec --noconfirm

# Run
.\dist\rfcp-server\rfcp-server.exe

# Test calculation via frontend: watch Task Manager GPU utilization
# Should see GPU Compute spike during coverage calculation
# Time should be significantly faster than 10s for 1254 points
```

Compare before/after:
- Current (CPU): ~10s for 1254 points, 5km radius
- Expected (GPU): 1-3s for same calculation

Also test GPU diagnostics:
```
curl http://localhost:8888/api/gpu/diagnostics
```

## What NOT to Change

- Don't modify gpu_backend.py: it's working correctly
- Don't change the API endpoints or response format
- Don't remove the NumPy import: keep it for non-array operations
- Don't change propagation model math: only the array operations
- Don't change _filter_buildings_to_bbox or OSM functions: they use lists not arrays

## Success Criteria

- [ ] Coverage calculation uses GPU (visible in Task Manager)
- [ ] Calculation time reduced for 1000+ point grids
- [ ] CPU fallback still works (test by setting active_backend to cpu via API)
- [ ] Same coverage results (heatmap should look identical)
- [ ] No regression in tiled processing mode
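The "same coverage results" criterion can also be checked numerically rather than by eye. The sketch below runs one computation on NumPy and, when a GPU is usable, on CuPy, then compares the outputs within a tolerance. All names here (`available_backends`, `max_backend_difference`, `sample_bearing`) are hypothetical, the bearing formula is the standard initial-bearing expression and not a copy of `_calculate_bearing`, and the 1e-6 tolerance is an assumption to tune.

```python
import math

import numpy as np


def available_backends():
    """Return the array modules usable on this machine (NumPy always)."""
    backends = {"numpy": np}
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            backends["cupy"] = cp
    except Exception:
        pass  # no CuPy or no GPU: only the CPU path is compared
    return backends


def to_cpu(arr):
    """Convert a CuPy array to NumPy; pass NumPy arrays through."""
    return arr.get() if hasattr(arr, "get") else np.asarray(arr)


def max_backend_difference(compute, rtol=1e-6, atol=1e-6):
    """Run compute(xp) on every backend and compare against NumPy.

    Raises AssertionError if any backend disagrees beyond the tolerance;
    otherwise returns the largest absolute difference observed.
    """
    reference = to_cpu(compute(np))
    worst = 0.0
    for name, xp in available_backends().items():
        result = to_cpu(compute(xp))
        np.testing.assert_allclose(
            result, reference, rtol=rtol, atol=atol,
            err_msg=f"backend {name} disagrees with NumPy",
        )
        worst = max(worst, float(np.max(np.abs(result - reference))))
    return worst


def sample_bearing(xp):
    """Bearings (degrees) from a fixed site to 1254 synthetic points."""
    lat1, lon1 = math.radians(50.0), math.radians(30.0)
    lat2 = xp.radians(xp.linspace(49.95, 50.05, 1254))
    lon2 = xp.radians(xp.linspace(29.95, 30.05, 1254))
    dlon = lon2 - lon1
    y = xp.sin(dlon) * xp.cos(lat2)
    x = (math.cos(lat1) * xp.sin(lat2)
         - math.sin(lat1) * xp.cos(lat2) * xp.cos(dlon))
    return (xp.degrees(xp.arctan2(y, x)) + 360.0) % 360.0
```

On a machine without a GPU, `max_backend_difference(sample_bearing)` compares NumPy against itself and so exercises only the fallback path. On the RTX 4060 build it compares both.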