RFCP Session Summary — February 4, 2026
GPU Acceleration Complete: 195s → 11.2s (17.4x Speedup)
🎯 Session Goal
Complete GPU acceleration pipeline and optimize Full preset performance.
📊 Results
Performance Achievement
| Metric | Before (3.7.0) | After (3.8.0) | Improvement |
|---|---|---|---|
| Full preset (6640 pts, 50m) | 195s | 11.2s | 17.4x |
| Standard preset (1975 pts, 200m) | 7.2s | 2.3s (cached) | 3.1x |
| Phase 2.5 (distances+path_loss) | 0.33s | 0.006s | 55x |
| Phase 2.6 (terrain LOS) | 7.29s | 0.04s | 182x |
| Per-point (workers) | 1.1ms | 0.1ms | 11x |
GPU Pipeline (Final Architecture)
Phase 1: OSM data fetch (Overpass API) ~6-10s (network)
Phase 2: Terrain tile download + cache ~4s first / 0s cached
Phase 2.5: GPU — distances + base path_loss 0.006s ⚡
Phase 2.6: GPU — terrain LOS + diffraction loss 0.04s ⚡
Phase 2.7: GPU — antenna pattern loss ~0s ⚡
Phase 3: CPU workers — buildings + vegetation ~2s
─────────────────────────────────────────────────
TOTAL (cached): ~2.3s (Standard)
TOTAL (cached): ~11.2s (Full)
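Phase 2.7 in the pipeline above costs almost nothing because bearing and pattern loss are pure element-wise math over the whole grid. Below is a minimal NumPy sketch of that idea. CuPy provides the same array functions used here, so the module can be swapped to run it on the GPU. The pattern model is an assumption: a 3GPP-style parabolic sector pattern with a 65° beamwidth and a 30 dB cap. The function name only echoes `batch_antenna_pattern` from these notes; its signature is hypothetical.

```python
import math

import numpy as np


def batch_antenna_pattern(site_lat, site_lon, lats, lons, azimuth_deg,
                          beamwidth_deg=65.0, max_atten_db=30.0):
    """Bearing + horizontal pattern loss (dB, positive = loss) for every point at once.

    Hypothetical sketch: assumes a parabolic sector pattern, not the app's real model.
    """
    phi1 = math.radians(site_lat)
    phi2 = np.radians(np.asarray(lats, dtype=np.float64))
    dlon = np.radians(np.asarray(lons, dtype=np.float64) - site_lon)
    # Initial great-circle bearing from the site to each point, in [0, 360)
    y = np.sin(dlon) * np.cos(phi2)
    x = math.cos(phi1) * np.sin(phi2) - math.sin(phi1) * np.cos(phi2) * np.cos(dlon)
    bearing = np.degrees(np.arctan2(y, x)) % 360.0
    # Angle off boresight, wrapped to [-180, 180)
    off_axis = (bearing - azimuth_deg + 180.0) % 360.0 - 180.0
    # Parabolic pattern: 12 * (off / beamwidth)^2 dB, capped at max_atten_db
    return np.minimum(12.0 * (off_axis / beamwidth_deg) ** 2, max_atten_db)
```

A point straight down boresight gets 0 dB. A point behind the antenna hits the 30 dB cap.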
🔧 Changes Made (Iterations 3.7.0 → 3.8.0)
Iteration 3.7.0 — GPU Precompute Foundation
- Added `gpu_manager` import to `coverage_service.py`
- Grid arrays created on GPU (CuPy)
- GPU precompute for distances + path_loss (vectorized)
- Fixed critical bug: CuPy worker process crashes (CUDA context sharing)
- Solution: GPU only in main process, workers use precomputed CPU values
- Fixed frontend duplicate calculation guard
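The 3.7.0 rule (GPU in the main process only, plain CPU values for the workers) can be sketched as a precompute step that uses CuPy when it is available and always hands back NumPy arrays. The notes do not name the propagation model, so this sketch makes two assumptions: haversine distance on a spherical Earth, and free-space path loss as a stand-in. `precompute_distance_pathloss` is a hypothetical name.

```python
import math

import numpy as np

try:
    import cupy as cp  # only ever imported in the main process
except ImportError:  # no CuPy installed: the same code runs on NumPy
    cp = None

EARTH_RADIUS_M = 6_371_000.0


def precompute_distance_pathloss(site_lat, site_lon, lats, lons, freq_mhz):
    """Distances (m) and base path loss (dB) for all grid points in one vectorized pass.

    Always returns NumPy arrays so the results can be pickled to CPU workers.
    """
    xp = cp if cp is not None else np
    lat1 = math.radians(site_lat)
    lat2 = xp.radians(xp.asarray(lats, dtype=xp.float64))
    dlat = lat2 - lat1
    dlon = xp.radians(xp.asarray(lons, dtype=xp.float64) - site_lon)
    # Haversine great-circle distance on a spherical Earth
    a = xp.sin(dlat / 2) ** 2 + math.cos(lat1) * xp.cos(lat2) * xp.sin(dlon / 2) ** 2
    dist_m = 2.0 * EARTH_RADIUS_M * xp.arcsin(xp.sqrt(a))
    # Free-space path loss as a stand-in model; clamp to 1 m to avoid log10(0)
    d_km = xp.maximum(dist_m, 1.0) / 1000.0
    path_loss_db = 20.0 * xp.log10(d_km) + 20.0 * math.log10(freq_mhz) + 32.44
    if xp is not np:
        # Copy back to host memory: workers must never receive CuPy arrays
        dist_m, path_loss_db = cp.asnumpy(dist_m), cp.asnumpy(path_loss_db)
    return dist_m, path_loss_db
```

The key design choice is returning NumPy arrays. They pickle cleanly into worker processes, so no worker ever needs a CUDA context.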
Iteration 3.8.0 — Full Vectorization
- Phase 2.6: `batch_terrain_los()` in `gpu_service.py`
- Vectorized terrain profile sampling for ALL points simultaneously
- Earth curvature correction vectorized
- Fresnel clearance + diffraction loss vectorized
- Phase 2.7: `batch_antenna_pattern()` in `gpu_service.py`
- Workers receive precomputed `has_los`, `terrain_loss`, `antenna_loss`
- Workers only compute buildings + reflections + vegetation
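The Phase 2.6 idea (sample every profile at once, correct for Earth curvature, then turn Fresnel clearance into diffraction loss) can be sketched in NumPy as follows. Several pieces are assumptions, not the app's confirmed method:

- `elev_fn` is a hypothetical vectorized elevation callable.
- Profiles use straight-line lat/lon interpolation.
- Curvature uses a 4/3 effective-Earth-radius factor.
- Loss comes from the single dominant obstacle per path, using the ITU-R P.526 knife-edge approximation J(v).

The signature is hypothetical even though the name matches `batch_terrain_los`.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0
K_FACTOR = 4.0 / 3.0  # assumed standard-atmosphere effective-Earth-radius factor
SPEED_OF_LIGHT = 299_792_458.0


def knife_edge_loss_db(v):
    """ITU-R P.526 single knife-edge approximation J(v); 0 dB when v <= -0.78."""
    j = 6.9 + 20.0 * np.log10(np.sqrt((v - 0.1) ** 2 + 1.0) + v - 0.1)
    return np.where(v > -0.78, j, 0.0)


def batch_terrain_los(site_lat, site_lon, lats, lons, dist_m, elev_fn,
                      tx_height_m=30.0, rx_height_m=1.5, freq_mhz=1800.0, n_samples=64):
    """Return (has_los, diffraction_loss_db) for every point, sampling all profiles at once."""
    lats = np.asarray(lats, dtype=np.float64)
    lons = np.asarray(lons, dtype=np.float64)
    t = np.linspace(0.0, 1.0, n_samples)[1:-1]          # interior fractions along each path, (S,)
    # (P, S) sample grids; straight-line lat/lon interpolation is a short-range approximation
    s_lat = site_lat + np.outer(lats - site_lat, t)
    s_lon = site_lon + np.outer(lons - site_lon, t)
    terrain = elev_fn(s_lat, s_lon)                      # (P, S) ground heights
    h_tx = elev_fn(np.array([site_lat]), np.array([site_lon]))[0] + tx_height_m
    h_rx = elev_fn(lats, lons) + rx_height_m             # (P,)
    d = np.maximum(np.asarray(dist_m, dtype=np.float64), 1.0)[:, None]
    d1 = d * t                                           # distance from TX, (P, S)
    d2 = d - d1                                          # distance to RX, (P, S)
    bulge = d1 * d2 / (2.0 * K_FACTOR * EARTH_RADIUS_M)  # Earth-curvature correction
    ray = h_tx + (h_rx[:, None] - h_tx) * t              # straight TX->RX ray height
    obstruction = terrain + bulge - ray                  # > 0 where ground pokes through the ray
    rows = np.arange(len(lats))
    worst = np.argmax(obstruction, axis=1)               # dominant edge per path
    h = obstruction[rows, worst]
    d1w, d2w = d1[rows, worst], d2[rows, worst]
    wavelength = SPEED_OF_LIGHT / (freq_mhz * 1e6)
    # Fresnel-Kirchhoff parameter v = h * sqrt(2 (d1 + d2) / (lambda d1 d2))
    v = h * np.sqrt(2.0 * (d1w + d2w) / (wavelength * d1w * d2w))
    return h <= 0.0, knife_edge_loss_db(v)
```

No Python loop touches individual points. Every step is a whole-array operation, which is what lets the same code move to CuPy unchanged.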
Critical Fix: _batch_elevation_lookup Vectorization
- Before: Python `for` loop over 59,250 coordinates (7.29s)
- After: Vectorized NumPy tile indexing, loop only over tiles (0.04s)
- Impact: 182x speedup on Phase 2.6 alone
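The "loop only over tiles" fix can be sketched as below. It rests on four assumptions:

- Tiles are square 1°×1° arrays.
- They are keyed by their south-west corner `(floor(lat), floor(lon))`.
- Row 0 of each tile is the north edge.
- Sampling is nearest-neighbour.

The `tiles` dict and its layout are hypothetical, and the real `_batch_elevation_lookup` may differ.

```python
import numpy as np


def batch_elevation_lookup(lats, lons, tiles):
    """Nearest-neighbour elevation for any number of coordinates, looping only over tiles."""
    lats = np.asarray(lats, dtype=np.float64)
    lons = np.asarray(lons, dtype=np.float64)
    shape = lats.shape
    lats, lons = lats.ravel(), lons.ravel()
    out = np.zeros(lats.size, dtype=np.float64)        # points in missing tiles stay at 0 m
    lat_f = np.floor(lats).astype(np.int64)
    lon_f = np.floor(lons).astype(np.int64)
    # Distinct (lat, lon) tile keys touched by this batch; usually only a handful
    keys = np.unique(np.stack([lat_f, lon_f], axis=1), axis=0)
    for la, lo in keys:
        tile = tiles.get((int(la), int(lo)))
        if tile is None:
            continue
        n = tile.shape[0]                               # samples per side, e.g. 3601
        m = (lat_f == la) & (lon_f == lo)               # every point in this tile, one vector op
        row = np.rint((la + 1 - lats[m]) * (n - 1)).astype(np.int64)  # row 0 = north edge
        col = np.rint((lons[m] - lo) * (n - 1)).astype(np.int64)
        out[m] = tile[np.clip(row, 0, n - 1), np.clip(col, 0, n - 1)]
    return out.reshape(shape)
```

The Python loop now runs once per tile rather than once per coordinate. Inside it, fancy indexing handles every point in that tile in a single step, and the output keeps the input's shape, so 2-D profile grids work directly.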
Critical Fix: Vegetation Bbox Pre-filter
- Before: Each sample point checked ALL 683 vegetation polygons
- After: Bounding box pre-filter skips 95%+ of polygons
- Impact: Full preset 156s → 11.2s
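A bounding-box pre-filter of this kind can be sketched as follows. The cheap box test runs against all polygons at once, and the exact point-in-polygon test runs only on the survivors. The sketch assumes each polygon is an `(N, 2)` array of planar `(x, y)` vertices and uses even-odd ray casting. The function names are hypothetical stand-ins for `_point_in_vegetation`.

```python
import numpy as np


def polygon_bboxes(polygons):
    """(P, 4) array of [min_x, min_y, max_x, max_y], computed once per run."""
    return np.array([[p[:, 0].min(), p[:, 1].min(), p[:, 0].max(), p[:, 1].max()]
                     for p in polygons])


def point_in_polygon(x, y, poly):
    """Even-odd ray casting, vectorized over the polygon's edges."""
    xi, yi = poly[:, 0], poly[:, 1]
    xj, yj = np.roll(xi, 1), np.roll(yi, 1)
    crosses = (yi > y) != (yj > y)  # edge straddles the horizontal line through the point
    with np.errstate(divide="ignore", invalid="ignore"):  # horizontal edges never cross
        x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
    return bool(np.count_nonzero(crosses & (x < x_cross)) % 2)


def point_in_vegetation(x, y, polygons, bboxes):
    """Index of the first polygon containing (x, y), or -1."""
    # Cheap pre-filter: one vectorized comparison against every bounding box
    candidates = np.nonzero((bboxes[:, 0] <= x) & (x <= bboxes[:, 2]) &
                            (bboxes[:, 1] <= y) & (y <= bboxes[:, 3]))[0]
    for i in candidates:  # usually a few polygons instead of all of them
        if point_in_polygon(x, y, polygons[i]):
            return int(i)
    return -1
```

Computing the boxes once and reusing them for every sample point is what keeps the pre-filter almost free.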
📁 Files Modified
Backend
- `app/services/coverage_service.py`: precomputed values passthrough
- `app/services/parallel_coverage_service.py`: 5 worker functions updated
- `app/services/gpu_service.py`: `batch_terrain_los`, `batch_antenna_pattern`, `batch_final_rsrp`
- `app/services/vegetation_service.py`: bbox pre-filter on `_point_in_vegetation`
Build
- PyInstaller ONEDIR build: 1.6 GB dist → 1.2 GB NSIS installer
- CUDA DLLs bundled (cublas, cusparse, curand, etc.)
- Runtime hook for DLL directory setup
🏗️ Architecture (Final State)
Main Process (asyncio event loop)
├── Phase 2.5: GPU precompute
│ └── CuPy arrays: distances, path_loss (vectorized)
├── Phase 2.6: GPU terrain LOS
│ └── Batch elevation lookup (vectorized NumPy)
│ └── Earth curvature + Fresnel (CuPy)
│ └── Diffraction loss (CuPy)
├── Phase 2.7: GPU antenna pattern
│ └── Bearing + pattern loss (CuPy)
│
└── Phase 3: CPU ProcessPool (3 workers)
└── Receive precomputed dict per point
└── Skip terrain/antenna (already computed)
└── Only: buildings + reflections + vegetation
└── Pure NumPy + CPU
Key Rule: GPU (CuPy) code ONLY in main process. Workers never import gpu_manager.
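The "precomputed dict per point" contract can be sketched with `concurrent.futures`. Everything below is CPU-only and never imports CuPy. `local_loss_db` is a hypothetical placeholder for the buildings + reflections + vegetation model. The dict keys are assumptions modelled on the `has_los`, `terrain_loss` and `antenna_loss` values named in these notes.

```python
from concurrent.futures import ProcessPoolExecutor

# Worker-side code: CPU only. Nothing here imports cupy or gpu_manager.


def build_payloads(dist_m, path_loss, has_los, terrain_loss, antenna_loss):
    """Main process: convert precomputed CPU arrays into one small, picklable dict per point."""
    return [
        {"distance_m": float(d), "path_loss": float(pl), "has_los": bool(los),
         "terrain_loss": float(tl), "antenna_loss": float(al)}
        for d, pl, los, tl, al in zip(dist_m, path_loss, has_los, terrain_loss, antenna_loss)
    ]


def local_loss_db(pre):
    """Hypothetical placeholder for the worker's buildings + reflections + vegetation model."""
    return 0.0


def worker_point(pre, tx_power_dbm):
    """Worker: terrain and antenna terms arrive precomputed; only local losses are added here."""
    total_loss = pre["path_loss"] + pre["terrain_loss"] + pre["antenna_loss"] + local_loss_db(pre)
    return tx_power_dbm - total_loss


def run_phase3(payloads, tx_power_dbm, workers=3):
    """Fan the per-point dicts out to a CPU process pool (3 workers, as in the notes)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker_point, payloads,
                             [tx_power_dbm] * len(payloads), chunksize=64))
```

Call `run_phase3(...)` from under an `if __name__ == "__main__":` guard. That way spawn-based start methods (the default on Windows) can re-import the module without re-running the pipeline.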
🎮 Side Activity: Dwarf Fortress Gamelog Analysis
Analyzed 102,669-line gamelog from fort "Lashderush (Prophethandle)":
- 8-9 years, 23 migrant waves, 1,943 masterpieces
- 51,599 combat actions, only 4 deaths (weredeer outbreak)
- Top crafter: Momuz Nëkorlibash (201 masterpieces)
- Sole survivor transforms between dwarf/weredeer
🔮 Next Steps
Immediate
- GPU acceleration ✅ COMPLETE
- SRTM terrain data integration (higher accuracy than current tiles)
- Session history persistence across app restarts
Short Term
- Multi-station dashboard
- Project export/import (JSON)
- Link budget analysis view
Medium Term
- LimeSDR hardware integration testing
- Real RF validation against field measurements
- 3D visualization mode
💡 Key Learnings
- Python for-loops are the enemy: `_batch_elevation_lookup` went from 7.3s to 0.04s by replacing `enumerate(zip())` with NumPy indexing
- Spatial pre-filtering is massive: the vegetation bbox check eliminated 95%+ of polygon tests
- GPU context can't be shared across processes: spawn mode creates new CUDA contexts that OOM
- Vectorize in main, distribute to workers: the best pattern for GPU + multiprocessing
- Profile before optimizing: the Phase 2.6 bottleneck was invisible until measured
- Session duration: ~4 hours
- Lines of code changed: ~300
- Performance gain: 17.4x
- Feeling: 🚀