mytec/rfcp

mytec 81e078e92a @mytec : iter3.10 start, baseline rc ready

2026-02-04 15:56:09 +02:00

5.4 KiB

RFCP Session Summary — February 4, 2026

GPU Acceleration Complete: 195s → 11.2s (17.4x Speedup)

🎯 Session Goal

Complete GPU acceleration pipeline and optimize Full preset performance.

📊 Results

Performance Achievement

Metric	Before (3.7.0)	After (3.8.0)	Improvement
Full preset (6640 pts, 50m)	195s	11.2s	17.4x
Standard preset (1975 pts, 200m)	7.2s	2.3s (cached)	3.1x
Phase 2.5 (distances+path_loss)	0.33s	0.006s	55x
Phase 2.6 (terrain LOS)	7.29s	0.04s	182x
Per-point (workers)	1.1ms	0.1ms	11x

GPU Pipeline (Final Architecture)

Phase 1:   OSM data fetch (Overpass API)          ~6-10s (network)
Phase 2:   Terrain tile download + cache           ~4s first / 0s cached
Phase 2.5: GPU — distances + base path_loss        0.006s ⚡
Phase 2.6: GPU — terrain LOS + diffraction loss    0.04s  ⚡
Phase 2.7: GPU — antenna pattern loss              ~0s    ⚡
Phase 3:   CPU workers — buildings + vegetation     ~2s    
─────────────────────────────────────────────────
TOTAL (cached):                                    ~2.3s (Standard)
TOTAL (cached):                                    ~11.2s (Full)
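Phase 2.7 in the pipeline above is described only as a bearing computation followed by a pattern loss. A minimal NumPy sketch of that idea is shown here. The parabolic horizontal pattern (in the style of 3GPP sector models), the 65° beamwidth, the 25 dB cap and the function name are all assumptions for illustration and are not taken from gpu_service.py. CuPy mirrors most of the NumPy API, so the same array expressions should carry over to the GPU.

```python
import numpy as np


def batch_antenna_pattern_sketch(site_lat, site_lon, azimuth_deg, lats, lons,
                                 beamwidth_deg=65.0, max_atten_db=25.0):
    """Horizontal pattern loss (dB) for every grid point at once.

    Assumed model: loss = min(12 * (offset / beamwidth)^2, max_atten_db).
    """
    phi1 = np.radians(site_lat)
    phi2 = np.radians(np.asarray(lats, dtype=np.float64))
    dlon = np.radians(np.asarray(lons, dtype=np.float64) - site_lon)

    # Initial great-circle bearing from the site to each point, 0..360 degrees.
    y = np.sin(dlon) * np.cos(phi2)
    x = np.cos(phi1) * np.sin(phi2) - np.sin(phi1) * np.cos(phi2) * np.cos(dlon)
    bearing = np.degrees(np.arctan2(y, x)) % 360.0

    # Signed angle between the antenna boresight and the point, -180..180.
    offset = (bearing - azimuth_deg + 180.0) % 360.0 - 180.0

    return np.minimum(12.0 * (offset / beamwidth_deg) ** 2, max_atten_db)


# Sector pointing north: points to the north, south and east of the site.
loss = batch_antenna_pattern_sketch(50.0, 30.0, 0.0,
                                    [51.0, 49.0, 50.0], [30.0, 30.0, 31.0])
```

The point on boresight gets no loss, the point directly behind the antenna hits the cap, and the point to the east falls in between.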

🔧 Changes Made (Iterations 3.7.0 → 3.8.0)

Iteration 3.7.0 — GPU Precompute Foundation

Added gpu_manager import to coverage_service.py
Grid arrays created on GPU (CuPy)
GPU precompute for distances + path_loss (vectorized)
Fixed critical bug: CuPy worker process crashes (CUDA context sharing)
Solution: GPU only in main process, workers use precomputed CPU values
Fixed frontend duplicate calculation guard
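The vectorized distance and path_loss precompute from this iteration can be sketched roughly as follows. This is a NumPy illustration under stated assumptions: the haversine distance, the free-space path loss formula and the function name are stand-ins chosen for the example, since the summary does not say which propagation model the project uses.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def precompute_distance_path_loss(site_lat, site_lon, lats, lons, freq_mhz):
    """Return (distance in metres, free-space path loss in dB) per grid point."""
    phi1 = np.radians(site_lat)
    phi2 = np.radians(np.asarray(lats, dtype=np.float64))
    dphi = phi2 - phi1
    dlam = np.radians(np.asarray(lons, dtype=np.float64) - site_lon)

    # Haversine great-circle distance, evaluated for all points in one pass.
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    dist_m = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

    # Free-space path loss with d in km and f in MHz. Distances are clamped
    # to 1 m so a grid point on top of the site does not produce log10(0).
    d_km = np.maximum(dist_m, 1.0) / 1000.0
    path_loss_db = 20 * np.log10(d_km) + 20 * np.log10(freq_mhz) + 32.44
    return dist_m, path_loss_db


dist_m, path_loss_db = precompute_distance_path_loss(
    50.0, 30.0, [51.0, 50.0], [30.0, 30.0], 1800.0)
```

There is no per-point Python loop, which is the property that makes the step cheap on either NumPy or CuPy arrays.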

Iteration 3.8.0 — Full Vectorization

Phase 2.6: batch_terrain_los() in gpu_service.py
- Vectorized terrain profile sampling for ALL points simultaneously
- Earth curvature correction vectorized
- Fresnel clearance + diffraction loss vectorized
Phase 2.7: batch_antenna_pattern() in gpu_service.py
Workers receive precomputed has_los, terrain_loss, antenna_loss
Workers only compute buildings + reflections + vegetation
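To make the Phase 2.6 steps concrete, here is a hedged NumPy sketch of a batched terrain LOS check. It assumes a 4/3 effective Earth radius for the curvature correction and the single knife-edge approximation from ITU-R P.526 for diffraction loss. The actual model, array layout and signature of batch_terrain_los() are not given in this summary, so every name below is hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0
K_FACTOR = 4.0 / 3.0  # assumed standard-atmosphere refraction factor


def batch_terrain_los_sketch(profiles, dist_m, tx_h, rx_h, freq_mhz):
    """profiles: (N, S) terrain heights in metres from TX (column 0) to RX.
    dist_m: (N,) path lengths. Returns (has_los, diffraction_loss_db)."""
    n, s = profiles.shape
    frac = np.linspace(0.0, 1.0, s)[None, :]
    d1 = frac * dist_m[:, None]          # distance from TX to each sample
    d2 = dist_m[:, None] - d1            # distance from each sample to RX

    # Straight line between the two antennas, in absolute height.
    tx_abs = profiles[:, :1] + tx_h
    rx_abs = profiles[:, -1:] + rx_h
    los = tx_abs + (rx_abs - tx_abs) * frac

    # Earth curvature raises the terrain relative to the straight ray.
    bulge = d1 * d2 / (2.0 * K_FACTOR * EARTH_RADIUS_M)
    h = (profiles + bulge - los)[:, 1:-1]   # > 0 means the ray is blocked
    d1_i, d2_i = d1[:, 1:-1], d2[:, 1:-1]

    # Fresnel-Kirchhoff parameter: clearance scaled by the first Fresnel zone.
    wavelength = 299.792458 / freq_mhz
    v = h * np.sqrt(2.0 * (d1_i + d2_i) / (wavelength * d1_i * d2_i))
    v_max = v.max(axis=1)

    has_los = h.max(axis=1) <= 0.0
    vc = np.maximum(v_max, -0.78)           # formula is only valid above -0.78
    j = 6.9 + 20.0 * np.log10(np.sqrt((vc - 0.1) ** 2 + 1.0) + vc - 0.1)
    loss_db = np.where(v_max > -0.78, j, 0.0)
    return has_los, loss_db


# Two 1 km paths at 1800 MHz: flat ground, and a 50 m ridge at the midpoint.
profiles = np.zeros((2, 11))
profiles[1, 5] = 50.0
has_los, loss_db = batch_terrain_los_sketch(
    profiles, np.array([1000.0, 1000.0]), tx_h=30.0, rx_h=2.0, freq_mhz=1800.0)
```

Only the dominant obstacle per path is considered here, which keeps the whole computation as a handful of (N, S) array operations.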

Critical Fix: `_batch_elevation_lookup` Vectorization

Before: Python for loop over 59,250 coordinates (7.29s)
After: Vectorized NumPy tile indexing, loop only over tiles (0.04s)
Impact: 182x speedup on Phase 2.6 alone
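The pattern behind this fix, looping over tiles rather than over coordinates, can be sketched as below. The tile layout is an assumption made for the example (1° by 1° tiles keyed by the floor of latitude and longitude, row 0 at the north edge, nearest-neighbour sampling); the real `_batch_elevation_lookup` may store and index its tiles differently.

```python
import numpy as np


def batch_elevation_lookup_sketch(lats, lons, tiles, tile_size):
    """tiles: {(lat_floor, lon_floor): (tile_size, tile_size) array}.
    Returns one elevation per coordinate, NaN where no tile is loaded."""
    lats = np.asarray(lats, dtype=np.float64)
    lons = np.asarray(lons, dtype=np.float64)

    # Tile key and in-tile pixel index for every coordinate, all vectorized.
    lat0 = np.floor(lats).astype(np.int64)
    lon0 = np.floor(lons).astype(np.int64)
    rows = np.rint((lat0 + 1 - lats) * (tile_size - 1)).astype(np.int64)
    cols = np.rint((lons - lon0) * (tile_size - 1)).astype(np.int64)
    rows = np.clip(rows, 0, tile_size - 1)
    cols = np.clip(cols, 0, tile_size - 1)

    out = np.full(lats.shape, np.nan)
    keys = np.stack([lat0, lon0], axis=1)
    # The only Python loop runs once per distinct tile, not once per point.
    for key_lat, key_lon in np.unique(keys, axis=0):
        tile = tiles.get((int(key_lat), int(key_lon)))
        if tile is None:
            continue
        mask = (lat0 == key_lat) & (lon0 == key_lon)
        out[mask] = tile[rows[mask], cols[mask]]   # fancy indexing per tile
    return out


tiles = {(50, 30): np.arange(9, dtype=np.float64).reshape(3, 3)}
elev = batch_elevation_lookup_sketch(
    [50.5, 50.0, 49.5], [30.5, 30.0, 30.5], tiles, tile_size=3)
```

With tens of thousands of sample coordinates typically falling into a handful of tiles, the loop body runs only a few times.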

Critical Fix: Vegetation Bbox Pre-filter

Before: Each sample point checked ALL 683 vegetation polygons
After: Bounding box pre-filter skips 95%+ of polygons
Impact: Full preset 156s → 11.2s
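A bounding box pre-filter of this kind can be sketched in plain Python as follows. The function names, the ray-casting polygon test and the `stats` counter are illustrative assumptions; the summary only states that `_point_in_vegetation` gained a bbox check.

```python
def polygon_bbox(poly):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(x, y, poly):
    """Exact (and comparatively slow) ray-casting test."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def point_in_vegetation_sketch(x, y, polygons, bboxes, stats):
    """Run the exact test only for polygons whose bbox contains the point."""
    for poly, (min_x, min_y, max_x, max_y) in zip(polygons, bboxes):
        if x < min_x or x > max_x or y < min_y or y > max_y:
            continue                      # cheap rejection: four comparisons
        stats["exact_tests"] += 1
        if point_in_polygon(x, y, poly):
            return True
    return False


polygons = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(5, 5), (6, 5), (6, 6), (5, 6)],
]
bboxes = [polygon_bbox(p) for p in polygons]   # computed once, reused per point
stats = {"exact_tests": 0}
hit_far = point_in_vegetation_sketch(5.5, 5.5, polygons, bboxes, stats)
miss = point_in_vegetation_sketch(3.0, 3.0, polygons, bboxes, stats)
```

The boxes are built once per polygon set, so each sample point pays four comparisons per polygon and the exact test only for nearby ones.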

📁 Files Modified

Backend

app/services/coverage_service.py — precomputed values passthrough
app/services/parallel_coverage_service.py — 5 worker functions updated
app/services/gpu_service.py — batch_terrain_los, batch_antenna_pattern, batch_final_rsrp
app/services/vegetation_service.py — bbox pre-filter on _point_in_vegetation

Build

PyInstaller ONEDIR build: 1.6 GB dist → 1.2 GB NSIS installer
CUDA DLLs bundled (cublas, cusparse, curand, etc.)
Runtime hook for DLL directory setup

🏗️ Architecture (Final State)

Main Process (asyncio event loop)
├── Phase 2.5: GPU precompute
│   └── CuPy arrays: distances, path_loss (vectorized)
├── Phase 2.6: GPU terrain LOS  
│   ├── Batch elevation lookup (vectorized NumPy)
│   ├── Earth curvature + Fresnel (CuPy)
│   └── Diffraction loss (CuPy)
├── Phase 2.7: GPU antenna pattern
│   └── Bearing + pattern loss (CuPy)
│
└── Phase 3: CPU ProcessPool (3 workers)
    ├── Receive precomputed dict per point
    ├── Skip terrain/antenna (already computed)
    ├── Only: buildings + reflections + vegetation
    └── Pure NumPy + CPU

Key Rule: GPU (CuPy) code ONLY in main process. Workers never import gpu_manager.
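The key rule above can be illustrated with a small standard-library sketch. It assumes the GPU results have already been copied back to host memory and are handed to workers as plain dicts. `TX_POWER_DBM`, `local_clutter_loss` and the other names are hypothetical placeholders, not the project's worker functions.

```python
from concurrent.futures import ProcessPoolExecutor

TX_POWER_DBM = 43.0  # hypothetical transmit power, for illustration only


def build_precomputed(path_loss, has_los, terrain_loss, antenna_loss):
    """Main process: pack host-side results into plain, picklable dicts."""
    return [
        {"idx": i, "path_loss": float(pl), "has_los": bool(los),
         "terrain_loss": float(tl), "antenna_loss": float(al)}
        for i, (pl, los, tl, al)
        in enumerate(zip(path_loss, has_los, terrain_loss, antenna_loss))
    ]


def local_clutter_loss(pre):
    """Placeholder for the buildings + reflections + vegetation step."""
    return 0.0


def worker_point(pre):
    """Runs in a worker process. Nothing here imports CuPy or touches CUDA."""
    rsrp = (TX_POWER_DBM - pre["path_loss"] - pre["terrain_loss"]
            - pre["antenna_loss"] - local_clutter_loss(pre))
    return {"idx": pre["idx"], "rsrp_dbm": rsrp}


def run_phase3(precomputed, max_workers=3):
    """Fan the per-point dicts out to CPU workers and collect the results."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker_point, precomputed, chunksize=64))


pre = build_precomputed([100.0, 120.0], [True, False], [0.0, 12.0], [3.0, 0.0])
```

On platforms that start workers with spawn, `run_phase3(pre)` should be called under an `if __name__ == "__main__":` guard. Because the worker module never imports the GPU layer, starting a worker cannot create a CUDA context.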

🎮 Side Activity: Dwarf Fortress Gamelog Analysis

Analyzed 102,669-line gamelog from fort "Lashderush (Prophethandle)":

8-9 years, 23 migrant waves, 1,943 masterpieces
51,599 combat actions, only 4 deaths (weredeer outbreak)
Top crafter: Momuz Nëkorlibash (201 masterpieces)
Sole survivor transforms between dwarf/weredeer

🔮 Next Steps

Immediate

~~GPU acceleration~~ ✅ COMPLETE
SRTM terrain data integration (higher accuracy than current tiles)
Session history persistence across app restarts

Short Term

Multi-station dashboard
Project export/import (JSON)
Link budget analysis view

Medium Term

LimeSDR hardware integration testing
Real RF validation against field measurements
3D visualization mode

💡 Key Learnings

Python for-loops are the enemy — _batch_elevation_lookup went from 7.3s to 0.04s by replacing enumerate(zip()) with NumPy indexing
Spatial pre-filtering is massive — vegetation bbox check eliminated 95%+ of polygon tests
GPU context can't be shared across processes: in spawn mode every worker that touched CuPy created its own CUDA context, and the workers crashed with out-of-memory errors
Vectorize in main, distribute to workers — best pattern for GPU + multiprocessing
Profile before optimizing — Phase 2.6 bottleneck was invisible until measured

Session duration: ~4 hours
Lines of code changed: ~300
Performance gain: 17.4x
Feeling: 🚀
