rfcp/docs/devlog/installer/compass_artifact_wf-6a6a41ed-a3b3-461d-9341-81ce0ff4260d_text_markdown.md
2026-02-02 21:30:00 +02:00


RF Coverage Planning Software: Performance Optimization and Propagation Models

The performance gap between fast per-point calculations (~1ms) and slow overall throughput (~258ms/point) is caused by pickle serialization overhead in Python multiprocessing, which dominates actual compute time when processing small batches. The solution involves batching 1000+ points per IPC round-trip, using shared memory for terrain data, and leveraging GPU acceleration for workloads exceeding 10,000 points—achieving 10-50x speedups. Modern RF coverage tools like Signal-Server, SPLAT!, and Sionna RT demonstrate that combining radial segment parallelization, multi-resolution terrain tiling, and appropriate propagation model selection (Longley-Rice ITM for terrain-based VHF/UHF, COST-231 Hata for cellular) enables efficient large-area calculations while maintaining accuracy within 6-10 dB standard deviation.


The multiprocessing bottleneck: why per-point speed deceives

The dramatic discrepancy between fast individual point calculations and slow aggregate throughput stems from a classic Python multiprocessing anti-pattern where inter-process communication overhead dominates computation time. When each worker processes a single point or small batch, the system spends more time serializing and deserializing data than performing actual RF calculations.

Python's multiprocessing uses pickle for IPC by default, requiring objects to be serialized twice per task (sending to worker and returning results). For RF calculations involving terrain data, DEM arrays, and GIS features, this serialization cost becomes catastrophic. Research shows that pickling a 40 MB dictionary four times per task can cause a 600% slowdown. The situation worsens because spawning a subprocess takes approximately 50ms (50,000µs) compared to ~100µs for a thread—making process pool initialization per-request extremely expensive.

The solution architecture requires three fundamental changes. First, batch operations must amortize serialization costs by processing 1,000-10,000 points per IPC round-trip rather than individual points. Second, shared memory (multiprocessing.shared_memory or numpy.memmap) should hold terrain data to eliminate pickle overhead entirely. Third, process pools must be pre-initialized at application startup rather than per-request:

# Anti-pattern: single-point processing (slow)
with Pool() as pool:
    results = pool.map(calculate_point, points)  # each point pickled separately

# Optimal pattern: batch processing with shared memory
import numpy as np
from multiprocessing import Pool, shared_memory

shm = shared_memory.SharedMemory(create=True, size=terrain_data.nbytes)
shared = np.ndarray(terrain_data.shape, dtype=terrain_data.dtype, buffer=shm.buf)
shared[:] = terrain_data  # copy terrain once; workers re-attach by shm.name

chunk_size = 1000  # process 1000 points per IPC round-trip
batches = [points[i:i+chunk_size] for i in range(0, len(points), chunk_size)]
with Pool() as pool:
    # calculate_batch attaches to the shared block by name and processes its batch
    results = pool.starmap(calculate_batch, [(shm.name, b) for b in batches])

The target metric is ensuring computation time exceeds serialization time by 10-100x. For a 1ms per-point calculation, this means batching at least 100-1000 points to make serialization overhead negligible.
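As a back-of-envelope check, this minimum batch size can be computed directly. The helper below is illustrative, not part of any library:

```python
import math

def min_batch_size(compute_per_point_s: float,
                   serialize_per_batch_s: float,
                   ratio: float = 10.0) -> int:
    """Smallest batch where total compute time is at least `ratio`
    times the fixed per-batch serialization cost."""
    return math.ceil(ratio * serialize_per_batch_s / compute_per_point_s)
```

For a 1 ms per-point calculation and a 10 ms pickle cost per batch, a 10x safety ratio needs 100 points per batch; a 100x ratio needs 1,000.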


Open-source RF tools reveal proven optimization architectures

Signal-Server, the C++14 multi-threaded engine that powered CloudRF from 2012-2016, demonstrates the foundational architecture for RF coverage calculations. Its primary improvement over the original SPLAT! was multi-threading through radial segment parallelization—splitting the circular coverage area so multiple threads process different azimuth ranges simultaneously. The implementation uses POSIX threads with configurable segment counts (must be even and greater than 4), processing up to 32 terrain tiles simultaneously with support for gzip/bzip2 compressed tiles for faster I/O.
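Radial segmentation can be sketched in a few lines. The even-and-greater-than-4 check mirrors Signal-Server's constraint; the function names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def azimuth_segments(n_segments: int) -> list[tuple[float, float]]:
    # Signal-Server requires an even segment count greater than 4
    if n_segments <= 4 or n_segments % 2 != 0:
        raise ValueError("segment count must be even and greater than 4")
    step = 360.0 / n_segments
    return [(i * step, (i + 1) * step) for i in range(n_segments)]

def run_radials(process_segment, n_segments=8, max_workers=8):
    # each worker sweeps one azimuth range independently
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_segment, azimuth_segments(n_segments)))
```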

Signal-Server supports 12 propagation models through a simple command-line parameter: ITM (Longley-Rice), line-of-sight, Hata, ECC33, SUI, COST-Hata, free-space, ITWOM, Ericsson, Plane Earth, Egli, and Soil models. The terrain tiling system uses SDF format converted from SRTM HGT files, supporting resolutions of 300/600/1200/3600 pixels per tile with automatic multi-tile loading based on calculation bounds.

SPLAT! (Signal Propagation, Loss, And Terrain), the foundational tool started in 1997, uses a radial ray-casting algorithm that projects rays from the transmitter in all azimuths (0-360°), samples terrain elevation along each path, and applies Longley-Rice ITM calculations to the terrain profile. Its Longley-Rice integration handles three prediction ranges (line-of-sight, diffraction, scatter) with terrain irregularity parameter Δh(d) computed from terrain samples. Key parameters include earth dielectric constant (5-80), ground conductivity (0.001-5.0 S/m), atmospheric refractivity (250-400 N-units), and climate zone selection.
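The terrain-sampling step of that ray-casting can be sketched as nearest-neighbour sampling along one azimuth. This is a simplified illustration in grid coordinates; SPLAT! itself works in geographic coordinates with finer path sampling:

```python
import numpy as np

def radial_profile(dem: np.ndarray, tx_row: int, tx_col: int,
                   azimuth_deg: float, n_samples: int,
                   step_cells: float = 1.0) -> np.ndarray:
    """Elevations along one azimuth from the transmitter cell (north = 0 deg)."""
    az = np.radians(azimuth_deg)
    t = np.arange(1, n_samples + 1) * step_cells
    rows = np.clip(np.round(tx_row - t * np.cos(az)).astype(int), 0, dem.shape[0] - 1)
    cols = np.clip(np.round(tx_col + t * np.sin(az)).astype(int), 0, dem.shape[1] - 1)
    return dem[rows, cols]
```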

Sionna RT by NVIDIA represents the state-of-the-art in GPU-accelerated RF simulation, using differentiable ray tracing built on TensorFlow, Mitsuba 3, and Dr.Jit. Its key innovation enables gradient computation through channel impulse responses with respect to material properties, antenna patterns, and transmitter/receiver positions—making it suitable for ML-integrated optimization. The path solver supports both Shooting and Bouncing Rays (SBR) and the Image Method, handling direct LOS paths, reflections, diffractions, and scattering patterns. Memory efficiency improvements in version 1.0 support scenes with 3D building models from OpenStreetMap, while configurable path loss thresholds and angular separation control enable scalable computation.

CloudRF's SLEIPNIR engine (replacing Signal-Server in 2019) achieves up to 10x faster performance through multi-resolution modeling that seamlessly merges different resolution data sources, dual CPU/GPU engines (78% speedup with GPU for clutter calculations), and 1m LiDAR resolution support with global 10m land cover integration.


VHF and UHF propagation models differ fundamentally from cellular bands

The Longley-Rice Irregular Terrain Model (ITM) serves as the most comprehensive model for terrain-based VHF/UHF propagation, predicting median attenuation over irregular terrain for frequencies from 20 MHz to 20 GHz across distances of 1-2000 km. The model handles five propagation mechanisms: free-space loss, terrain diffraction (multiple knife-edge), ground reflection, atmospheric refraction (4/3 Earth radius approximation), and tropospheric scatter beyond the horizon. Statistical variables include time, location, and situation variability ranging from 0.01 to 0.99, with typical accuracy of ±6-10 dB standard deviation for point-to-point mode.

Critical ITM parameters require careful selection based on environment:

Ground Type Permittivity Conductivity (S/m)
Average Ground 15 0.005
Poor Ground 4 0.001
Good Ground 25 0.020
Fresh Water 81 0.010
Sea Water 81 5.0
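Encoded as a lookup table for ITM parameter setup (the key names are illustrative):

```python
# (relative permittivity, conductivity in S/m) per the table above
ITM_GROUND = {
    "average": (15, 0.005),
    "poor": (4, 0.001),
    "good": (25, 0.020),
    "fresh_water": (81, 0.010),
    "sea_water": (81, 5.0),
}

def ground_constants(ground_type: str) -> tuple[float, float]:
    return ITM_GROUND[ground_type]
```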

ITU-R P.1546 provides empirical field-strength curves for 30 MHz to 4 GHz based on extensive Northern Hemisphere measurements, covering distances of 1-1000 km with time percentages of 1%, 10%, and 50%. The model uses reference frequencies of 100, 600, and 2000 MHz with interpolation for other frequencies, applying corrections for terrain clearance angle, receiving antenna height, clutter losses, and mixed land/sea paths.

For UHF and cellular bands, the Okumura-Hata model (150-1500 MHz, 1-20 km distance) and its COST-231 extension (1500-2000 MHz) provide rapid empirical calculations with 6-8 dB standard deviation in urban environments. The urban path loss formula is:

L_urban = 69.55 + 26.16*log10(f) - 13.82*log10(h_b) - a(h_m) 
          + (44.9 - 6.55*log10(h_b))*log10(d)

Where a(h_m) is the mobile antenna correction factor varying by city size and frequency. Suburban and rural corrections reduce urban loss by 2*(log10(f/28))² + 5.4 dB and 4.78*(log10(f))² - 18.33*log10(f) + 40.94 dB respectively.
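These formulas transcribe directly into code. The sketch below uses the standard small/medium-city form of a(h_m); the large-city variant differs:

```python
import math

def hata_urban_db(f_mhz: float, h_b: float, h_m: float, d_km: float) -> float:
    """Okumura-Hata urban path loss (150-1500 MHz, small/medium city)."""
    a_hm = ((1.1 * math.log10(f_mhz) - 0.7) * h_m
            - (1.56 * math.log10(f_mhz) - 0.8))
    return (69.55 + 26.16 * math.log10(f_mhz) - 13.82 * math.log10(h_b)
            - a_hm + (44.9 - 6.55 * math.log10(h_b)) * math.log10(d_km))

def hata_suburban_db(f_mhz, h_b, h_m, d_km):
    return hata_urban_db(f_mhz, h_b, h_m, d_km) - 2 * (math.log10(f_mhz / 28)) ** 2 - 5.4

def hata_rural_db(f_mhz, h_b, h_m, d_km):
    return (hata_urban_db(f_mhz, h_b, h_m, d_km)
            - 4.78 * (math.log10(f_mhz)) ** 2 + 18.33 * math.log10(f_mhz) - 40.94)
```

At 900 MHz with a 30 m base station, 1.5 m mobile, and 5 km range this gives roughly 151 dB of urban path loss.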

The key propagation differences across frequency bands are dramatic: VHF wavelengths (1-10m) enable strong diffraction around obstacles but poor building penetration, while UHF (0.1-1m wavelength) provides better building penetration but weaker terrain following. Cellular frequencies (1800+ MHz) have the highest free-space loss baseline, weakest diffraction, and moderate building penetration. Vegetation penetration follows the opposite pattern—VHF penetrates foliage better than higher frequencies where specific attenuation increases significantly.


Terrain diffraction models handle mountainous areas differently

The single knife-edge diffraction model (ITU-R P.526) calculates the Fresnel parameter v and corresponding loss:

v = h * sqrt(2 * (d1 + d2) / (wavelength * d1 * d2))
# For v > -0.78:
if v < 0: loss = 6.02 + 9.0*v + 1.65*v**2
elif v < 2.4: loss = 6.02 + 9.11*v - 1.27*v**2
else: loss = 12.953 + 20*log10(v)

For multiple obstacles, the Deygout method finds the main obstacle (highest Fresnel parameter v between transmitter and receiver), calculates its diffraction loss, then recursively finds secondary obstacles on each side. It provides better accuracy for widely spaced obstacles (2-4 ridges) but tends to overestimate for closely spaced obstacles. The Epstein-Peterson method calculates diffraction loss sequentially from transmitter to receiver, providing better accuracy for closely spaced obstacles but underestimating for widely separated ones.
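The Deygout recursion can be sketched as follows. This is a simplified illustration: obstacle heights are taken relative to the direct Tx-Rx line throughout, whereas a full implementation re-references heights to each sub-path's line; the knife-edge loss uses the compact ITU-R P.526 single-curve approximation rather than the piecewise form above:

```python
import math

def fresnel_v(h: float, d1: float, d2: float, wavelength: float) -> float:
    return h * math.sqrt(2 * (d1 + d2) / (wavelength * d1 * d2))

def knife_edge_db(v: float) -> float:
    # single-curve approximation, valid for v > -0.78; ~0 dB below that
    if v <= -0.78:
        return 0.0
    return 6.9 + 20 * math.log10(math.sqrt((v - 0.1) ** 2 + 1) + v - 0.1)

def deygout_db(obstacles, d_total: float, wavelength: float) -> float:
    """obstacles: [(distance_from_tx_m, height_above_LOS_m)], sorted by distance."""
    if not obstacles:
        return 0.0
    vs = [fresnel_v(h, d, d_total - d, wavelength) for d, h in obstacles]
    i = max(range(len(vs)), key=vs.__getitem__)   # main obstacle: largest v
    main_loss = knife_edge_db(vs[i])
    if main_loss <= 0.0:
        return 0.0
    d_main = obstacles[i][0]
    left = deygout_db(obstacles[:i], d_main, wavelength)
    right = deygout_db([(d - d_main, h) for d, h in obstacles[i + 1:]],
                       d_total - d_main, wavelength)
    return main_loss + left + right
```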

The Bullington equivalent single edge method replaces all obstacles with one equivalent knife edge, providing the simplest and fastest calculation but often underestimating loss (too optimistic)—useful only for initial estimates. Professional tools like CloudRF implement Delta-Bullington as the default for its balance of accuracy and speed, with configurable options including Huygens (basic), sequential multi-obstacle, and Deygout 94 with combining factor.


GPU acceleration delivers 10-50x speedups for appropriate workloads

The RF calculations benefiting most from GPU acceleration are embarrassingly parallel operations: ray tracing (10-100x+ speedup with NVIDIA OptiX), FFT operations (cuFFT highly optimized), viewshed/LOS calculations (CloudRF reports 50x faster than CPU), and batch path loss calculations for many points. Matrix operations in propagation models benefit from cuBLAS, while terrain correlation matrices and large array operations see significant acceleration.

CuPy provides a drop-in NumPy replacement for NVIDIA GPUs with 10-100x speedups for large arrays (>100,000 elements):

import cupy as cp
terrain_gpu = cp.asarray(terrain_data)
distances = cp.sqrt(cp.sum((points_gpu - tx_position)**2, axis=1))
path_loss = 20 * cp.log10(distances) + 20 * cp.log10(frequency_mhz) - 27.55
results = path_loss.get()  # Transfer back to CPU

Numba CUDA enables writing custom GPU kernels in Python for complex propagation models requiring control flow:

from numba import cuda
import math

@cuda.jit
def free_space_path_loss_kernel(distances, frequency, output):
    idx = cuda.grid(1)
    if idx < distances.shape[0]:
        output[idx] = 20 * math.log10(distances[idx]) + 20 * math.log10(frequency) - 27.55

Minimum problem sizes for GPU benefit are: 10,000+ elements for array operations, 1,024+ points for FFT, 512x512+ for matrix multiply, and 5,000+ points for path loss calculations. Memory transfer overhead (PCIe 3.0: ~8 GB/s practical) means the critical formula is GPU_worthwhile = compute_time > (2 × transfer_time). For 100MB terrain data, transfer overhead is approximately 5-12ms.
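That rule of thumb is easy to encode; the default bandwidth here is the practical PCIe 3.0 figure above, and the function name is illustrative:

```python
def gpu_worthwhile(payload_bytes: int, compute_time_s: float,
                   bandwidth_gb_s: float = 8.0) -> bool:
    # one-way host<->device transfer time at the assumed bandwidth
    transfer_s = payload_bytes / (bandwidth_gb_s * 1e9)
    return compute_time_s > 2 * transfer_s
```

For 100 MB of terrain data the one-way transfer is ~12.5 ms at 8 GB/s, so offloading only pays once the kernel takes longer than ~25 ms.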

AMD ROCm/HIP provides cross-platform compatibility through CuPy (pip install cupy-rocm-5-0), with PyTorch and TensorFlow also offering official ROCm builds. Intel integrated graphics support via PyOpenCL achieves 2-10x speedups over CPU (3-6x slower than discrete GPUs), suitable for edge deployments with moderate workloads (10,000-100,000 points).


Environment modeling requires frequency-dependent clutter coefficients

ITU-R P.1812-6 defines default clutter heights and losses by environment type: dense urban (20-25m height, 15-25 dB loss), urban (15-20m, 10-20 dB), suburban (9-12m, 5-15 dB), rural (0-4m, 0-5 dB), and forest (15-20m, 10-25 dB). The 3GPP TR 38.901 path loss models define specific scenarios: UMa (Urban Macro) with 25m base station height, UMi (Urban Micro Street Canyon) with 10m base station, RMa (Rural Macro), and InF (Indoor Factory) variants.

For vegetation, ITU-R P.833-10 specifies excess attenuation using A_ev = A_m * (1 - exp(-d*γ/A_m)) where specific attenuation γ varies by frequency: 0.06 dB/m at 200 MHz, 0.20 dB/m at 1 GHz, and 0.60 dB/m at 5 GHz for in-leaf conditions. Seasonal variation reduces loss by approximately 20% out-of-leaf for deciduous forests, with 2 dB variation at 900 MHz increasing to 8.5 dB at 1800+ MHz.
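The P.833 formula in code. A_m, the path-specific maximum attenuation, is a required input not tabulated here; the 30 dB in the test values is purely illustrative:

```python
import math

def vegetation_excess_db(d_m: float, gamma_db_per_m: float, a_max_db: float) -> float:
    """ITU-R P.833 excess attenuation: A_ev = A_m * (1 - exp(-d*gamma/A_m)).
    Grows like d*gamma for short paths and saturates at A_m for long ones."""
    return a_max_db * (1.0 - math.exp(-d_m * gamma_db_per_m / a_max_db))
```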

Building entry loss per ITU-R P.2109 distinguishes traditional buildings (median 10-16 dB at 100 MHz to 2 GHz) from thermally-efficient modern buildings with metallized glass and foil insulation (25-32 dB). Material-specific losses from 3GPP TR 38.901 show standard glass at 2.4 dB at 2 GHz, concrete at 13 dB at 2 GHz increasing to 117 dB at 28 GHz, and IRR/Low-E glass at 23.6 dB at 2 GHz.


Machine learning and hybrid approaches complement physics-based models

Current ML approaches for path loss prediction rank by accuracy: XGBoost/Gradient Boosting (RMSE: 2.1-3.4 dB, best for small-medium datasets), Neural Network Ensembles (2.5-4.0 dB), Random Forest (3.0-4.5 dB), and Deep Neural Networks (3.0-5.0 dB). Training data requirements scale predictably: <1,000 samples yield RMSE 6-10 dB, 10,000-100,000 samples achieve production-quality RMSE 2-4 dB.

Hybrid physics+ML architectures prove most effective. The ML Correction approach calculates PL_total = PL_empirical(d, f, h_tx, h_rx) + ΔPL_ML(features) where ΔPL_ML learns systematic biases. The LOS/NLOS Ensemble uses a classifier to weight separate LOS and NLOS regressors. Physics-Informed Neural Networks add penalty terms that enforce physical constraints like "path loss should increase with distance" and "FSPL provides a lower bound."
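A minimal sketch of the ML Correction architecture combined with the FSPL lower bound. Here fspl_db is the standard free-space formula, and the residual stands in for the output of a trained regressor, which is elided:

```python
import math

def fspl_db(d_km: float, f_mhz: float) -> float:
    # free-space path loss with d in km and f in MHz
    return 20 * math.log10(d_km) + 20 * math.log10(f_mhz) + 32.44

def hybrid_path_loss_db(empirical_db: float, ml_residual_db: float,
                        d_km: float, f_mhz: float) -> float:
    # apply the learned correction, but never predict below the FSPL floor
    return max(fspl_db(d_km, f_mhz), empirical_db + ml_residual_db)
```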

Pre-computed propagation databases store path loss values at 20-50 bytes per grid cell, enabling sub-millisecond lookups. For a 10km radius at 30m resolution (~349,000 cells), storage is approximately 7 MB compressed. Interpolation techniques range from fast bilinear (1-2 dB error) to kriging (higher accuracy with uncertainty estimates).


Tile-based caching enables responsive coverage map delivery

The optimal caching architecture uses XYZ (Slippy Map) tiles with multi-tier storage: L1 in-memory Redis (sub-millisecond access, ~100GB capacity), L2 disk cache (SQLite/MBTiles format), and L3 cloud storage (S3 for permanent pre-computed tiles). Cache keys should incorporate parameter hashes for instant invalidation when transmitter settings change:

def get_tile_key(z: int, x: int, y: int, params_hash: str) -> str:
    return f"tile:coverage:{params_hash}:{z}:{x}:{y}"

For dynamic coverage, TTL-based expiration (15 minutes to 24 hours) combined with Redis pub/sub channels (map:update:region:*) enables targeted geographic invalidation. The hybrid approach pre-computes base zoom levels (z=6-12) for commonly accessed areas while generating higher zoom levels (z>12) on-demand.

Level of Detail (LOD) techniques adapt computation intensity to distance: Tier 1 (0-500m) uses full 3D building geometry with 1m terrain resolution, Tier 2 (500m-2km) uses simplified buildings with 10m terrain, Tier 3 (2-10km) uses clutter heights only with 30m terrain, and Tier 4 (>10km) uses statistical clutter with 90m SRTM terrain. Adaptive grid generation provides higher resolution near the transmitter (10m) transitioning to coarser resolution (100m) at distance, reducing computation while maintaining visual quality where it matters.
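The tier boundaries above, encoded as a lookup (the function name is illustrative):

```python
def lod_tier(distance_m: float) -> tuple[int, int]:
    """Maps distance from the transmitter to (tier, terrain_resolution_m)."""
    if distance_m <= 500:
        return (1, 1)     # full 3D buildings
    if distance_m <= 2000:
        return (2, 10)    # simplified buildings
    if distance_m <= 10000:
        return (3, 30)    # clutter heights only
    return (4, 90)        # statistical clutter, SRTM
```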


Implementation recommendations for a Python-based stack

The optimal stack combines FastAPI (async API gateway with rate limiting), Celery (distributed task queue for heavy RF calculations), Redis (tile caching and job status), and CuPy/Numba (GPU acceleration). Terrain data should use numpy.memmap for memory-mapped access to large DEMs with STRtree spatial indexing for tile lookups via Shapely.

For the propagation engine, implement Longley-Rice ITM as the primary terrain model (using the itmlogic Python package), COST-231 Hata for quick urban estimates, and Deygout diffraction for multiple terrain obstacles. The model selection logic should consider frequency range (Hata for 150-1500 MHz, COST-231 for 1500-2000 MHz, ITM for terrain-specific), distance (empirical for <20km, ITM for longer paths), and accuracy requirements (ray tracing only for <5km urban scenarios).
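One possible encoding of that selection logic, as a sketch; a real deployment would also gate on DEM availability per region and the accuracy requirement:

```python
def select_model(f_mhz: float, d_km: float, have_terrain: bool) -> str:
    # beyond empirical-model validity, prefer ITM when terrain data exists
    if d_km > 20 and have_terrain:
        return "ITM"
    if 150 <= f_mhz <= 1500:
        return "Okumura-Hata"
    if 1500 < f_mhz <= 2000:
        return "COST-231-Hata"
    return "ITM" if have_terrain else "free-space"
```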

import numpy as np
import cupy as cp

class GPURFEngine:
    def __init__(self, max_points=1_000_000):
        # Pre-allocate GPU memory once at startup, not per request
        self.d_buffer = cp.empty((max_points, 3), dtype=cp.float32)

    async def calculate_coverage(self, points: np.ndarray) -> np.ndarray:
        if len(points) < 1000:
            return self._cpu_fallback(points)  # small workloads stay on CPU
        # GPU path for large workloads
        d_points = cp.asarray(points)
        # ... GPU computation producing a CuPy array `results`
        return results.get()  # transfer back to host memory

Celery configuration should use separate queues for fast (cached), compute (full calculation), and batch operations, with worker_prefetch_multiplier=1 for heavy tasks and task_acks_late=True for reliability. Output formats should include PNG tiles with colormap lookup for web display and Cloud-Optimized GeoTIFF for professional GIS integration.
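A configuration fragment reflecting that queue split. The task module and task names are hypothetical; the Celery settings themselves (task_routes, worker_prefetch_multiplier, task_acks_late) are real configuration keys:

```python
from celery import Celery

app = Celery("rf_engine", broker="redis://localhost:6379/0")
app.conf.task_routes = {
    "rf.tasks.cached_lookup": {"queue": "fast"},     # hypothetical task names
    "rf.tasks.full_coverage": {"queue": "compute"},
    "rf.tasks.batch_export": {"queue": "batch"},
}
app.conf.worker_prefetch_multiplier = 1  # don't let workers hoard heavy tasks
app.conf.task_acks_late = True           # re-deliver if a worker dies mid-task
```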


Conclusion

Building efficient RF coverage planning software requires addressing the fundamental mismatch between fast per-point propagation calculations and the overhead of Python's multiprocessing model. Batch processing (1000+ points per IPC round-trip), shared memory for terrain data, and GPU acceleration for workloads exceeding 10,000 points provide the foundation for achieving throughput within an order of magnitude of commercial tools.

The propagation model selection should follow a tiered approach: Longley-Rice ITM for terrain-based VHF/UHF planning with available DEM data, Okumura-Hata/COST-231 for rapid urban cellular estimates, and Deygout diffraction for mountainous terrain with multiple obstacles. Environment modeling through ITU-R P.2108/P.2109/P.833 provides standardized clutter, building entry, and vegetation loss coefficients that maintain accuracy across diverse deployment scenarios.

The most impactful optimizations in order of implementation priority are: fixing the multiprocessing serialization bottleneck (immediate 100x throughput improvement), implementing tile-based caching with parameter-hash keys (sub-millisecond repeat queries), adding GPU acceleration for large coverage maps (10-50x for >10,000 points), and incorporating LOD techniques (3-10x computation reduction with minimal accuracy impact). This architecture enables a Python/FastAPI backend to compete with commercial tools while maintaining the flexibility for custom propagation models and ML integration.