# RFCP Session Summary — February 4, 2026
## GPU Acceleration Complete: 195s → 11.2s (17.4x Speedup)
---
## 🎯 Session Goal
Complete GPU acceleration pipeline and optimize Full preset performance.
## 📊 Results
### Performance Achievement
| Metric | Before (3.7.0) | After (3.8.0) | Improvement |
|--------|----------------|---------------|-------------|
| **Full preset** (6640 pts, 50m) | 195s | **11.2s** | **17.4x** |
| **Standard preset** (1975 pts, 200m) | 7.2s | **2.3s** (cached) | **3.1x** |
| Phase 2.5 (distances + path_loss) | 0.33s | **0.006s** | 55x |
| Phase 2.6 (terrain LOS) | 7.29s | **0.04s** | 182x |
| Per-point (workers) | 1.1ms | **0.1ms** | 11x |
### GPU Pipeline (Final Architecture)
```
Phase 1:   OSM data fetch (Overpass API)          ~6-10s (network)
Phase 2:   Terrain tile download + cache          ~4s first / 0s cached
Phase 2.5: GPU — distances + base path_loss       0.006s ⚡
Phase 2.6: GPU — terrain LOS + diffraction loss   0.04s ⚡
Phase 2.7: GPU — antenna pattern loss             ~0s ⚡
Phase 3:   CPU workers — buildings + vegetation   ~2s
──────────────────────────────────────────────────
TOTAL (cached): ~2.3s (Standard)
TOTAL (cached): ~11.2s (Full)
```
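The Phase 2.5 numbers come from replacing a per-point loop with one broadcasted array computation. A minimal sketch of the idea in NumPy (the GPU version is the same arithmetic on CuPy arrays; the equirectangular conversion and the 868 MHz default here are illustrative assumptions, not the app's exact propagation model):

```python
import numpy as np

def precompute_path_loss(tx_lat, tx_lon, lats, lons, freq_mhz=868.0):
    """Vectorized distances (m) and free-space path loss (dB) for all grid points."""
    # Equirectangular approximation: adequate at coverage-map scales.
    m_per_deg = 111_320.0
    dx = (lons - tx_lon) * m_per_deg * np.cos(np.radians(tx_lat))
    dy = (lats - tx_lat) * m_per_deg
    dist_m = np.hypot(dx, dy)
    # FSPL(dB) = 20*log10(d_km) + 20*log10(f_MHz) + 32.44
    d_km = np.maximum(dist_m / 1000.0, 1e-6)  # avoid log(0) at the TX point
    fspl_db = 20 * np.log10(d_km) + 20 * np.log10(freq_mhz) + 32.44
    return dist_m, fspl_db
```

One call covers all 6,640 Full-preset points; no Python-level loop ever runs per point.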
---
## 🔧 Changes Made (Iterations 3.7.0 → 3.8.0)
### Iteration 3.7.0 — GPU Precompute Foundation
- Added `gpu_manager` import to `coverage_service.py`
- Grid arrays created on the GPU (CuPy)
- GPU precompute for distances + path_loss (vectorized)
- Fixed critical bug: CuPy worker-process crashes (CUDA context sharing)
  - Solution: GPU only in the main process; workers use precomputed CPU values
- Fixed the frontend duplicate-calculation guard
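The context-sharing fix above reduces to one pattern: do array math on the GPU in the main process, then materialize plain NumPy arrays before anything crosses the process boundary. A hedged sketch (the loss formula and function name are placeholders, not the service's real code):

```python
import numpy as np

try:
    import cupy as cp  # GPU path; only ever imported in the main process
    xp = cp
except ImportError:
    xp = np  # CPU fallback keeps the code path identical

def precompute_for_workers(distances_m):
    """Run array math on the GPU (if present), return host-side NumPy arrays.

    Workers receive only these plain arrays, so no CUDA context ever
    crosses the fork/spawn boundary.
    """
    d = xp.asarray(distances_m)
    loss = 20 * xp.log10(xp.maximum(d, 1.0))  # placeholder loss model
    # CuPy arrays expose .get() to copy device -> host; NumPy arrays do not.
    return np.asarray(loss.get() if hasattr(loss, "get") else loss)
```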
### Iteration 3.8.0 — Full Vectorization
- **Phase 2.6**: `batch_terrain_los()` in `gpu_service.py`
  - Vectorized terrain-profile sampling for ALL points simultaneously
  - Earth-curvature correction vectorized
  - Fresnel clearance + diffraction loss vectorized
- **Phase 2.7**: `batch_antenna_pattern()` in `gpu_service.py`
- Workers receive precomputed `has_los`, `terrain_loss`, `antenna_loss`
- Workers now compute only buildings + reflections + vegetation
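Phase 2.7's bearing-plus-pattern lookup can be sketched as below, assuming a 360-entry pattern table and a small-angle bearing approximation near the TX (both assumptions; the real `batch_antenna_pattern()` may differ):

```python
import numpy as np

def batch_antenna_loss(tx_lat, tx_lon, lats, lons, boresight_deg, pattern_db):
    """Antenna pattern loss for all points at once.

    pattern_db: 360-entry loss table indexed by off-boresight angle (deg).
    """
    # Bearing from TX to each point (degrees clockwise from north),
    # using a flat-earth approximation valid at coverage-map scales.
    dx = (lons - tx_lon) * np.cos(np.radians(tx_lat))
    dy = lats - tx_lat
    bearing = np.degrees(np.arctan2(dx, dy)) % 360.0
    # Off-boresight angle folded into [0, 360), rounded to the nearest 1° bin.
    rel = (bearing - boresight_deg) % 360.0
    idx = np.rint(rel).astype(np.intp) % 360
    return pattern_db[idx]
```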
### Critical Fix: `_batch_elevation_lookup` Vectorization
- **Before**: Python `for` loop over 59,250 coordinates (7.29s)
- **After**: vectorized NumPy tile indexing; the loop runs only over tiles (0.04s)
- **Impact**: 182x speedup on Phase 2.6 alone
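The before/after above can be illustrated with a toy version: bucket the coordinates by tile, then answer each tile's lookups with one fancy-indexing operation, so Python only loops over the handful of tiles. A self-contained sketch assuming 1° tiles and a simple row/col mapping (real elevation-tile geometry differs):

```python
import numpy as np

def batch_elevation_lookup(lats, lons, tiles, res=3600):
    """Elevation for many coordinates, looping only over tiles, not points.

    tiles: dict mapping the (lat, lon) integer SW corner of each 1° tile
           to a (res, res) elevation array.
    """
    out = np.empty(len(lats))
    tlat = np.floor(lats).astype(int)
    tlon = np.floor(lons).astype(int)
    for key in set(zip(tlat.tolist(), tlon.tolist())):  # a handful of tiles
        mask = (tlat == key[0]) & (tlon == key[1])
        grid = tiles[key]
        # Fractional position inside the tile -> row/col indices, vectorized.
        rows = ((lats[mask] - key[0]) * (res - 1)).astype(np.intp)
        cols = ((lons[mask] - key[1]) * (res - 1)).astype(np.intp)
        out[mask] = grid[rows, cols]
    return out
```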
### Critical Fix: Vegetation Bbox Pre-filter
- **Before**: each sample point was checked against ALL 683 vegetation polygons
- **After**: a bounding-box pre-filter skips 95%+ of the polygons
- **Impact**: Full preset 156s → 11.2s
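The pre-filter is a classic cheap-rejection pattern: test the point against each polygon's precomputed axis-aligned bounding box, and run the exact point-in-polygon test only on the survivors. A sketch using a ray-casting test (the real `_point_in_vegetation` signature may differ):

```python
def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        (xi, yi), (xj, yj) = poly[i], poly[j]
        # Count edge crossings of a horizontal ray from (x, y) to the right.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def in_vegetation(x, y, polygons, bboxes):
    """bboxes[i] = (minx, miny, maxx, maxy), precomputed once per polygon."""
    for poly, (minx, miny, maxx, maxy) in zip(polygons, bboxes):
        # Cheap rejection: four comparisons skip most polygons before the
        # O(vertices) exact test ever runs.
        if minx <= x <= maxx and miny <= y <= maxy and point_in_polygon(x, y, poly):
            return True
    return False
```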
---
## 📁 Files Modified
### Backend
- `app/services/coverage_service.py` — precomputed-values passthrough
- `app/services/parallel_coverage_service.py` — 5 worker functions updated
- `app/services/gpu_service.py` — `batch_terrain_los`, `batch_antenna_pattern`, `batch_final_rsrp`
- `app/services/vegetation_service.py` — bbox pre-filter in `_point_in_vegetation`
### Build
- PyInstaller ONEDIR build: 1.6 GB dist → 1.2 GB NSIS installer
- CUDA DLLs bundled (cublas, cusparse, curand, etc.)
- Runtime hook for DLL-directory setup
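A runtime hook of this kind only needs to put the bundled CUDA DLL folders on the Windows loader path before CuPy is imported. A minimal sketch (the `cuda_dlls` subfolder name is an assumption about this build's layout, not the actual hook):

```python
# pyi_rth_cuda_dlls.py — PyInstaller runtime hook (sketch)
import os
import sys

def register_cuda_dll_dirs():
    """Add bundled CUDA DLL folders to the Windows DLL search path."""
    added = []
    # sys._MEIPASS is set by PyInstaller at runtime; fall back for dev runs.
    base = getattr(sys, "_MEIPASS", os.getcwd())
    for sub in ("", "cuda_dlls"):  # assumed bundle layout
        path = os.path.join(base, sub) if sub else base
        # os.add_dll_directory exists on Windows (Python 3.8+); guard elsewhere.
        if os.path.isdir(path) and hasattr(os, "add_dll_directory"):
            os.add_dll_directory(path)
            added.append(path)
    return added

register_cuda_dll_dirs()
```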
---
## 🏗️ Architecture (Final State)
```
Main Process (asyncio event loop)
├── Phase 2.5: GPU precompute
│   └── CuPy arrays: distances, path_loss (vectorized)
├── Phase 2.6: GPU terrain LOS
│   ├── Batch elevation lookup (vectorized NumPy)
│   ├── Earth curvature + Fresnel (CuPy)
│   └── Diffraction loss (CuPy)
├── Phase 2.7: GPU antenna pattern
│   └── Bearing + pattern loss (CuPy)
│
└── Phase 3: CPU ProcessPool (3 workers)
    ├── Receive precomputed dict per point
    ├── Skip terrain/antenna (already computed)
    ├── Only: buildings + reflections + vegetation
    └── Pure NumPy + CPU
```
**Key Rule**: GPU (CuPy) code runs ONLY in the main process; workers never import `gpu_manager`.
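The worker side of that rule can be sketched as below: each worker receives a plain dict of precomputed floats and never touches CuPy. The `-30` dBm TX level and the field names here are illustrative assumptions:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def worker_point(args):
    """Runs in a worker process: no GPU imports, only plain Python/NumPy data."""
    idx, pre = args
    # Buildings/reflections/vegetation would be computed here; terrain and
    # antenna losses arrive precomputed from the main process.
    rsrp = -30.0 - pre["path_loss"] - pre["terrain_loss"] - pre["antenna_loss"]
    return idx, rsrp

def distribute(path_loss, terrain_loss, antenna_loss, workers=3):
    """Main process: fan per-point dicts of precomputed values out to CPU workers."""
    jobs = [
        (i, {"path_loss": float(path_loss[i]),
             "terrain_loss": float(terrain_loss[i]),
             "antenna_loss": float(antenna_loss[i])})
        for i in range(len(path_loss))
    ]
    out = np.empty(len(jobs))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for idx, rsrp in pool.map(worker_point, jobs):
            out[idx] = rsrp
    return out
```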
---
## 🎮 Side Activity: Dwarf Fortress Gamelog Analysis
Analyzed a 102,669-line gamelog from the fort "Lashderush (Prophethandle)":
- 8-9 years, 23 migrant waves, 1,943 masterpieces
- 51,599 combat actions, only 4 deaths (weredeer outbreak)
- Top crafter: Momuz Nëkorlibash (201 masterpieces)
- The sole survivor transforms between dwarf and weredeer forms
---
## 🔮 Next Steps
### Immediate
- [x] ~~GPU acceleration~~ ✅ COMPLETE
- [ ] SRTM terrain-data integration (higher accuracy than the current tiles)
- [ ] Session-history persistence across app restarts
### Short Term
- [ ] Multi-station dashboard
- [ ] Project export/import (JSON)
- [ ] Link-budget analysis view
### Medium Term
- [ ] LimeSDR hardware-integration testing
- [ ] Real RF validation against field measurements
- [ ] 3D visualization mode
---
## 💡 Key Learnings
1. **Python for-loops are the enemy** — `_batch_elevation_lookup` went from 7.3s to 0.04s by replacing `enumerate(zip())` with NumPy indexing
2. **Spatial pre-filtering is massive** — the vegetation bbox check eliminated 95%+ of polygon tests
3. **A GPU context can't be shared across processes** — spawn mode creates new CUDA contexts that run out of memory
4. **Vectorize in the main process, distribute to workers** — the best pattern for combining GPU work with multiprocessing
5. **Profile before optimizing** — the Phase 2.6 bottleneck was invisible until measured
---
*Session duration: ~4 hours*

*Lines of code changed: ~300*

*Performance gain: 17.4x*

*Feeling: 🚀*