mytec 81e078e92a @mytec : iter3.10 start, baseline rc ready

2026-02-04 15:56:09 +02:00

RFCP 3.8.0 — Vectorize Per-Point Coverage Calculations

Context

Iteration 3.7.0 added GPU precompute for distances and base path loss (Phase 2.5). Phase 3 (the per-point loop) still runs on the CPU, one point at a time across workers. On the Full preset this loop accounts for about 95% of the run time (195s for 6,642 points).

Current pipeline:

Phase 2.5 (GPU, 0.01s): distances + base path_loss → precomputed arrays
Phase 3 (CPU, 195s): per-point terrain_loss, building_loss, reflections, vegetation

Goal: Vectorize the heavy per-point calculations so GPU handles them in bulk.

Architecture

The starting point: _calculate_point_sync (line ~1127) performs these steps for every point:

Terrain LOS check — get elevation profile between site and point, check clearance
Diffraction loss — knife-edge based on Fresnel zone clearance
Building obstruction — find buildings between site and point, calculate penetration loss
Materials penalty — add loss based on building material type
Dominant path analysis — LOS vs reflection vs diffraction
Street canyon — check if point is in urban canyon
Reflections — find reflection paths off buildings (most expensive!)
Vegetation loss — check vegetation between site and point
Final RSRP — tx_power - path_loss - terrain_loss - building_loss - veg_loss + gains

Strategy: Vectorize in Stages

Not every step vectorizes equally well. Prioritize by time spent:

Stage 1: Terrain LOS + Diffraction (HIGH IMPACT)

Currently: For each point, sample ~50-100 elevation values along the radial path from the site, find the minimum clearance, and compute knife-edge diffraction.

Vectorize: Create 2D elevation profiles for ALL points at once.

All points share the same site location
For N points, create N terrain profiles (each M samples)
Compute Fresnel clearance for all profiles vectorized
Compute diffraction loss vectorized

# Instead of per-point:
for point in grid:
    profile = get_terrain_profile(site, point, num_samples=50)
    clearance = min_clearance(profile)
    loss = diffraction_loss(clearance, freq)

# Vectorized:
xp = gpu_manager.get_array_module()
# all_profiles shape: (N_points, M_samples)
all_profiles = get_terrain_profiles_batch(site, all_points, num_samples=50)
all_clearances = compute_clearances_batch(all_profiles, site_elev, point_elevs, distances)
all_terrain_loss = diffraction_loss_batch(all_clearances, freq)
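The batched profile and clearance steps above could be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the project's implementation: terrain is assumed to be a single in-memory 2D height grid, points are given in grid (row, col) coordinates, sampling is nearest-neighbour, and Earth curvature is ignored. The function names `terrain_profiles_batch` and `min_clearance_batch` are hypothetical. Because only array operations are used, the same code should run on CuPy by swapping `np` for `xp`.

```python
import numpy as np

def terrain_profiles_batch(heights, site_rc, points_rc, num_samples=50):
    """Sample elevations along straight lines from the site to every point.

    heights   : (H, W) elevation grid in metres (assumed in-memory tile)
    site_rc   : (row, col) of the site
    points_rc : (N, 2) array of (row, col) targets
    returns   : (N, num_samples) elevation profiles
    """
    t = np.linspace(0.0, 1.0, num_samples)        # (M,) fractions along the path
    site = np.asarray(site_rc, dtype=float)       # (2,)
    pts = np.asarray(points_rc, dtype=float)      # (N, 2)
    # Broadcast (N, 1, 2) against (1, M, 1) to get all sample coordinates: (N, M, 2)
    coords = site + (pts[:, None, :] - site) * t[None, :, None]
    rows = np.clip(np.rint(coords[..., 0]).astype(int), 0, heights.shape[0] - 1)
    cols = np.clip(np.rint(coords[..., 1]).astype(int), 0, heights.shape[1] - 1)
    return heights[rows, cols]                    # fancy indexing, no Python loop

def min_clearance_batch(profiles, tx_height, rx_height):
    """Smallest gap (m) between the straight LOS line and terrain, per point.

    Negative values mean the terrain rises above the line of sight.
    """
    m = profiles.shape[1]
    t = np.linspace(0.0, 1.0, m)
    start = profiles[:, 0] + tx_height            # antenna height above ground at site
    end = profiles[:, -1] + rx_height             # receiver height above ground at point
    los = start[:, None] + (end - start)[:, None] * t[None, :]
    return (los - profiles)[:, 1:-1].min(axis=1)  # endpoints excluded

# Tiny example: flat terrain with one 50 m obstacle on the first path
heights = np.zeros((10, 10))
heights[0, 4] = 50.0
profiles = terrain_profiles_batch(heights, (0, 0), np.array([[0, 9], [9, 0]]), num_samples=10)
clearances = min_clearance_batch(profiles, tx_height=10.0, rx_height=10.0)
```

In this example the first path is blocked (clearance of -40 m) and the second is clear (10 m). A real version would need bilinear interpolation and lat/lon to grid conversion, which are left out here.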

Stage 2: Building Obstruction (HIGH IMPACT)

Currently: For each point, find nearby buildings, check if they obstruct path.

Vectorize: Use spatial indexing but batch the geometry checks.

Pre-compute building bounding boxes as GPU arrays
Ray-building intersection for each point can be expressed as a matrix operation
Building penetration loss is simple lookup after intersection

NOTE: This is harder to vectorize because each point has a different number of nearby buildings. Options:
a) Pad to max buildings per point (wastes memory but simple)
b) Use sparse representation
c) Keep per-point but use GPU for the geometry math

Recommend option (c) initially — keep the spatial query on CPU but move the trig/geometry calculations to GPU.
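As an illustration of option (c), the geometry for one point can be tested against all of its candidate buildings in a single array operation. The sketch below uses the standard slab test for a 2D segment against axis-aligned bounding boxes. It assumes buildings are reduced to bounding boxes in a local metric coordinate system and that the spatial index has already returned the K candidates; `segment_hits_boxes` and the penetration-loss values are hypothetical, not taken from the codebase.

```python
import numpy as np

def segment_hits_boxes(p0, p1, box_min, box_max):
    """Boolean mask (K,): which boxes does the segment p0 -> p1 cross?

    p0, p1  : (2,) segment endpoints (x, y) in metres
    box_min : (K, 2) lower-left corners of candidate building boxes
    box_max : (K, 2) upper-right corners
    """
    p0 = np.asarray(p0, dtype=float)
    d = np.asarray(p1, dtype=float) - p0
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)

    # Axes where the segment does not move need special handling (no division by zero)
    parallel = d == 0.0
    safe_d = np.where(parallel, 1.0, d)
    t1 = (box_min - p0) / safe_d
    t2 = (box_max - p0) / safe_d
    t_near = np.minimum(t1, t2)
    t_far = np.maximum(t1, t2)

    # Parallel axis: unbounded interval if p0 lies inside the slab, empty otherwise
    inside = (p0 >= box_min) & (p0 <= box_max)
    t_near = np.where(parallel, np.where(inside, -np.inf, np.inf), t_near)
    t_far = np.where(parallel, np.where(inside, np.inf, -np.inf), t_far)

    # Intersect the per-axis intervals with the segment's parameter range [0, 1]
    enter = np.maximum(t_near.max(axis=1), 0.0)
    leave = np.minimum(t_far.min(axis=1), 1.0)
    return enter <= leave

# Three candidate boxes: on the path, beside the path, beyond the receiver
box_min = np.array([[4.0, -1.0], [4.0, 2.0], [12.0, -1.0]])
box_max = np.array([[6.0, 1.0], [6.0, 3.0], [14.0, 1.0]])
hits = segment_hits_boxes((0.0, 0.0), (10.0, 0.0), box_min, box_max)

penetration_db = np.array([12.0, 20.0, 15.0])   # illustrative per-building losses
building_loss = float((hits * penetration_db).sum())
```

Only the first box is crossed here, so the summed loss is 12 dB. Bounding boxes overestimate obstruction for non-rectangular footprints, so an exact polygon check may still be needed for boxes that are hit.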

Stage 3: Reflections (MEDIUM IMPACT, only on Full preset)

Currently: For each point with buildings, compute reflection paths. This is the most complex calculation and hardest to vectorize.

Approach: Keep reflections per-point for now, but optimize the inner math with vectorized operations.

Stage 4: Vegetation Loss (LOW IMPACT)

Simple lookup — not worth GPU overhead.

Implementation Plan

Step 1: Batch terrain profiling

Add to coverage_service.py a new method:

def _batch_terrain_profiles(self, site_lat, site_lon, site_elev,
                             grid_lats, grid_lons, grid_elevs, 
                             distances, frequency, num_samples=50):
    """Compute terrain LOS and diffraction loss for all points at once."""
    xp = gpu_manager.get_array_module()
    N = len(grid_lats)
    
    # Interpolate terrain profiles for all points
    # Each profile: site → point, num_samples elevation values
    # Use terrain tile data directly
    
    # Compute Fresnel zone clearance for each profile
    # Compute knife-edge diffraction loss
    
    return terrain_losses  # shape (N,)
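For the diffraction part of this method, one possible vectorized form is the single knife-edge approximation from ITU-R P.526, which maps the Fresnel-Kirchhoff parameter v to a loss in dB. Whether the existing per-point code uses exactly this approximation is an assumption; if it uses a different formula, that formula should be kept so the results stay within the 1 dB tolerance. The function name `knife_edge_loss_batch` and its arguments are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def knife_edge_loss_batch(h, d1, d2, freq_hz):
    """Single knife-edge diffraction loss in dB for arrays of obstacles.

    h       : obstacle height above the LOS line in metres (negative = clear)
    d1, d2  : distances from the obstacle to transmitter and receiver in metres
    freq_hz : carrier frequency in Hz
    """
    h = np.asarray(h, dtype=float)
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    wavelength = C / freq_hz
    # Fresnel-Kirchhoff diffraction parameter
    v = h * np.sqrt(2.0 * (d1 + d2) / (wavelength * d1 * d2))
    # ITU-R P.526 approximation, valid for v > -0.78; below that the loss is taken as 0
    loss = 6.9 + 20.0 * np.log10(np.sqrt((v - 0.1) ** 2 + 1.0) + v - 0.1)
    return np.where(v > -0.78, loss, 0.0)

# Grazing path, clear path, and obstructed path at 1800 MHz, obstacle at mid-path
losses = knife_edge_loss_batch(
    h=np.array([0.0, -100.0, 10.0]),
    d1=np.array([1000.0, 1000.0, 1000.0]),
    d2=np.array([1000.0, 1000.0, 1000.0]),
    freq_hz=1.8e9,
)
```

A grazing path (h = 0) comes out at roughly 6 dB, a well-cleared path at 0 dB, and the obstructed path higher than the grazing case. Note that h here has the opposite sign to a clearance value: a clearance of -40 m corresponds to h = +40 m.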

Step 2: Batch building check

Add method:

def _batch_building_obstruction(self, site_lat, site_lon,
                                  grid_lats, grid_lons, 
                                  distances, buildings_spatial_index,
                                  all_buildings):
    """Compute building loss for all points at once."""
    # For each point, query spatial index (CPU)
    # Batch the geometry intersection math (GPU)
    # Return losses
    
    return building_losses  # shape (N,)

Step 3: Replace _run_point_loop

Instead of ProcessPool workers, do:

# In calculate_coverage, after Phase 2.5:
terrain_losses = self._batch_terrain_profiles(...)
building_losses = self._batch_building_obstruction(...)

# Final RSRP is now fully vectorized (losses subtracted, gains added):
rsrp = (tx_power - precomputed_path_loss - terrain_losses
        - building_losses - veg_losses
        + antenna_gains + reflection_gains)
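A small worked example of the vectorized link budget, with made-up numbers, shows how the per-point arrays combine. All values below are illustrative assumptions chosen for easy arithmetic, not measurements, and `rsrp_batch` is a hypothetical helper.

```python
import numpy as np

def rsrp_batch(tx_power_dbm, path_loss, terrain_loss, building_loss, veg_loss, gains):
    """Element-wise link budget in dB; every array has shape (N,)."""
    return tx_power_dbm - path_loss - terrain_loss - building_loss - veg_loss + gains

# Two points: one in clear line of sight, one behind terrain and a building
rsrp = rsrp_batch(
    tx_power_dbm=43.0,
    path_loss=np.array([100.0, 120.0]),
    terrain_loss=np.array([0.0, 6.0]),
    building_loss=np.array([0.0, 15.0]),
    veg_loss=np.array([0.0, 2.0]),
    gains=np.array([15.0, 15.0]),
)
# Point 1: 43 - 100 + 15 = -42 dBm
# Point 2: 43 - 120 - 6 - 15 - 2 + 15 = -85 dBm
```

The scalar transmit power broadcasts against the (N,) arrays, so no loop or reshaping is needed.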

Step 4: Keep worker fallback

If the GPU is not available, or for calculations too complex to batch (reflections), fall back to the existing per-point ProcessPool approach.
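The fallback decision can be made once, in the main process, before any batch work starts. The sketch below is one way to do that, assuming CuPy is the GPU backend; `pick_array_module` is a hypothetical stand-in and does not describe how the project's gpu_manager actually works.

```python
import numpy as np

def pick_array_module(prefer_gpu=True):
    """Return (xp, on_gpu): CuPy when importable with a usable device, else NumPy."""
    if prefer_gpu:
        try:
            import cupy as cp  # imported lazily so CPU-only installs still work
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp, True
        except Exception:
            # CuPy missing, driver missing, or no CUDA device: use the CPU path
            pass
    return np, False

xp, on_gpu = pick_array_module(prefer_gpu=False)
# on_gpu is False here, so the caller would keep the per-point ProcessPool path
```

Returning the flag alongside the module lets the caller choose between the batch path and the ProcessPool path without checking the module type.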

Important Notes

GPU code only in main process — learned from 3.7.0, never import gpu_manager in workers
Terrain data access — terrain tiles are in memory, need efficient sampling for batch profiles
CuPy ↔ NumPy bridge — use cupy.asnumpy(arr) or arr.get() to convert back to CPU (xp.asnumpy() only works when xp is CuPy; NumPy has no asnumpy)
Memory — 6,642 points × 50 terrain samples = 332,100 floats ≈ 2.5 MiB as float64 (half that as float32) on GPU, no problem
Accuracy — results must match existing per-point calculation within 1 dB
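The conversion and memory notes above can be checked with a few lines. This is a sketch under the assumption that GPU arrays are CuPy arrays; `to_numpy` is a hypothetical helper that relies only on the fact that cupy.ndarray has a .get() method and numpy.ndarray does not.

```python
import numpy as np

def to_numpy(arr):
    """Return a NumPy array whether `arr` lives on the GPU (CuPy) or on the CPU."""
    get = getattr(arr, "get", None)   # present on cupy.ndarray, absent on numpy.ndarray
    return get() if callable(get) else np.asarray(arr)

# Size of the batch profile array from the memory note, in bytes
profile_bytes_f64 = np.zeros((6642, 50), dtype=np.float64).nbytes
profile_bytes_f32 = np.zeros((6642, 50), dtype=np.float32).nbytes

cpu_copy = to_numpy(np.arange(3))
```

Using one helper at the boundary keeps the rest of the code free of backend checks.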

Testing

cd D:\root\rfcp\backend
pyinstaller ..\installer\rfcp-server-gpu.spec --noconfirm
.\dist\rfcp-server\rfcp-server.exe

Compare Full preset:

Before (3.7.0): ~195s for 6,642 points
Target (3.8.0): <30s for same calculation
Stretch goal: <10s

Verify accuracy:

Run same location with GPU and CPU backend
Compare RSRP values — should be within 1 dB
Coverage percentages (Excellent/Good/Fair/Weak) should be very close
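The RSRP comparison could be automated with a small check like the one below. It assumes both backends return RSRP arrays for the same points in the same order; the function name `compare_rsrp` and the returned keys are hypothetical.

```python
import numpy as np

def compare_rsrp(cpu_rsrp, gpu_rsrp, tol_db=1.0):
    """Summarize point-by-point differences between two RSRP result sets."""
    cpu = np.asarray(cpu_rsrp, dtype=float)
    gpu = np.asarray(gpu_rsrp, dtype=float)
    if cpu.shape != gpu.shape:
        raise ValueError("result sets must cover the same points")
    diff = np.abs(cpu - gpu)
    return {
        "max_abs_diff_db": float(diff.max()),
        "mean_abs_diff_db": float(diff.mean()),
        "num_outside_tol": int((diff > tol_db).sum()),
        "passed": bool((diff <= tol_db).all()),
    }

ok = compare_rsrp([-80.0, -95.2], [-80.4, -95.0])
bad = compare_rsrp([-80.0, -95.0], [-80.0, -97.5])
```

Reporting the maximum and the count of outliers, not just pass or fail, makes it easier to see whether a failure comes from a few edge points or from a systematic offset.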

What NOT to Change

Don't modify propagation model math (Okumura-Hata, COST-231, Free-Space formulas)
Don't change API endpoints or response format
Don't remove the ProcessPool fallback — keep it for CPU-only mode
Don't change OSM fetching or caching
Don't modify the frontend

Success Criteria

Full preset completes in <30s (was 195s)
Standard preset completes in <5s (was 7.2s)
No CuPy errors in worker processes
CPU fallback still works
Results match within 1 dB accuracy
GPU utilization visible in Task Manager during calculation
