# RFCP 3.8.0 — Vectorize Per-Point Coverage Calculations ## Context Iteration 3.7.0 added GPU precompute for distances + base path loss (Phase 2.5). But Phase 3 (per-point loop) still runs on CPU, one point at a time across workers. This is where 95% of time goes on Full preset (195s for 6,642 points). Current pipeline: ``` Phase 2.5 (GPU, 0.01s): distances + base path_loss → precomputed arrays Phase 3 (CPU, 195s): per-point terrain_loss, building_loss, reflections, vegetation ``` Goal: Vectorize the heavy per-point calculations so GPU handles them in bulk. ## Architecture The key insight: `_calculate_point_sync` (line ~1127) does these steps per point: 1. **Terrain LOS check** — get elevation profile between site and point, check clearance 2. **Diffraction loss** — knife-edge based on Fresnel zone clearance 3. **Building obstruction** — find buildings between site and point, calculate penetration loss 4. **Materials penalty** — add loss based on building material type 5. **Dominant path analysis** — LOS vs reflection vs diffraction 6. **Street canyon** — check if point is in urban canyon 7. **Reflections** — find reflection paths off buildings (most expensive!) 8. **Vegetation loss** — check vegetation between site and point 9. **Final RSRP** — tx_power - path_loss - terrain_loss - building_loss - veg_loss + gains ## Strategy: Vectorize in Stages NOT everything can be vectorized equally. Prioritize by time spent: ### Stage 1: Terrain LOS + Diffraction (HIGH IMPACT) Currently: For each point, sample ~50-100 elevation values along radial path, find min clearance, compute knife-edge diffraction. **Vectorize**: Create 2D elevation profiles for ALL points at once. - All points share the same site location - For N points, create N terrain profiles (each M samples) - Compute Fresnel clearance for all profiles vectorized - Compute diffraction loss vectorized ```python # Instead of per-point: for point in grid: profile = get_terrain_profile(site, point, num_samples=50) clearance = min_clearance(profile) loss = diffraction_loss(clearance, freq) # Vectorized: xp = gpu_manager.get_array_module() # all_profiles shape: (N_points, M_samples) all_profiles = get_terrain_profiles_batch(site, all_points, num_samples=50) all_clearances = compute_clearances_batch(all_profiles, site_elev, point_elevs, distances) all_terrain_loss = diffraction_loss_batch(all_clearances, freq) ``` ### Stage 2: Building Obstruction (HIGH IMPACT) Currently: For each point, find nearby buildings, check if they obstruct path. **Vectorize**: Use spatial indexing but batch the geometry checks. - Pre-compute building bounding boxes as GPU arrays - For each point, ray-building intersection can be done as matrix operation - Building penetration loss is simple lookup after intersection NOTE: This is harder to vectorize because each point has different number of nearby buildings. Options: a) Pad to max buildings per point (wastes memory but simple) b) Use sparse representation c) Keep per-point but use GPU for the geometry math Recommend option (c) initially — keep the spatial query on CPU but move the trig/geometry calculations to GPU. ### Stage 3: Reflections (MEDIUM IMPACT, only on Full preset) Currently: For each point with buildings, compute reflection paths. This is the most complex calculation and hardest to vectorize. **Approach**: Keep reflections per-point for now, but optimize the inner math with vectorized operations. ### Stage 4: Vegetation Loss (LOW IMPACT) Simple lookup — not worth GPU overhead. ## Implementation Plan ### Step 1: Batch terrain profiling Add to coverage_service.py a new method: ```python def _batch_terrain_profiles(self, site_lat, site_lon, site_elev, grid_lats, grid_lons, grid_elevs, distances, frequency, num_samples=50): """Compute terrain LOS and diffraction loss for all points at once.""" xp = gpu_manager.get_array_module() N = len(grid_lats) # Interpolate terrain profiles for all points # Each profile: site → point, num_samples elevation values # Use terrain tile data directly # Compute Fresnel zone clearance for each profile # Compute knife-edge diffraction loss return terrain_losses # shape (N,) ``` ### Step 2: Batch building check Add method: ```python def _batch_building_obstruction(self, site_lat, site_lon, grid_lats, grid_lons, distances, buildings_spatial_index, all_buildings): """Compute building loss for all points at once.""" # For each point, query spatial index (CPU) # Batch the geometry intersection math (GPU) # Return losses return building_losses # shape (N,) ``` ### Step 3: Replace _run_point_loop Instead of ProcessPool workers, do: ```python # In calculate_coverage, after Phase 2.5: terrain_losses = self._batch_terrain_profiles(...) building_losses = self._batch_building_obstruction(...) # Final RSRP is now fully vectorized: rsrp = tx_power - precomputed_path_loss - terrain_losses - building_losses - veg_losses # + antenna_gains + reflection_gains ``` ### Step 4: Keep worker fallback If GPU not available or for very complex calculations (reflections), fall back to the existing per-point ProcessPool approach. ## Important Notes 1. **GPU code only in main process** — learned from 3.7.0, never import gpu_manager in workers 2. **Terrain data access** — terrain tiles are in memory, need efficient sampling for batch profiles 3. **CuPy ↔ NumPy bridge** — use `xp.asnumpy()` or `.get()` to convert back to CPU 4. **Memory** — 6,642 points × 50 terrain samples = 332,100 floats = 2.5 MB on GPU, no problem 5. **Accuracy** — results must match existing per-point calculation within 1 dB ## Testing ```powershell cd D:\root\rfcp\backend pyinstaller ..\installer\rfcp-server-gpu.spec --noconfirm .\dist\rfcp-server\rfcp-server.exe ``` Compare Full preset: - Before (3.7.0): ~195s for 6,642 points - Target (3.8.0): <30s for same calculation - Stretch goal: <10s Verify accuracy: - Run same location with GPU and CPU backend - Compare RSRP values — should be within 1 dB - Coverage percentages (Excellent/Good/Fair/Weak) should be very close ## What NOT to Change - Don't modify propagation model math (Okumura-Hata, COST-231, Free-Space formulas) - Don't change API endpoints or response format - Don't remove the ProcessPool fallback — keep it for CPU-only mode - Don't change OSM fetching or caching - Don't modify the frontend ## Success Criteria - [ ] Full preset completes in <30s (was 195s) - [ ] Standard preset completes in <5s (was 7.2s) - [ ] No CuPy errors in worker processes - [ ] CPU fallback still works - [ ] Results match within 1 dB accuracy - [ ] GPU utilization visible in Task Manager during calculation