@mytec: feat: Phase 3.0 Architecture Refactor

Major refactoring of RFCP backend:
- Modular propagation models (8 models)
- SharedMemoryManager for terrain data
- ProcessPoolExecutor parallel processing
- WebSocket progress streaming
- Building filtering pipeline (351k → 15k)
- 82 unit tests

Performance: Standard preset 38s → 5s (7.6x speedup)

Known issue: Detailed preset timeout (fix in 3.1.0)
2026-02-01 23:12:26 +02:00
parent 1dde56705a
commit defa3ad440
71 changed files with 7134 additions and 256 deletions

@@ -0,0 +1,233 @@
# RFCP Development Session Summary
## Date: February 1, 2026
## Status: Phase 3.0 Complete, Performance Optimization Ongoing
---
## 🎯 Project Overview
**RFCP (Radio Frequency Coverage Planning)** is a desktop application for tactical LTE network planning, part of the UMTC (Ukrainian Military Tactical Communications) project.
**Tech Stack:**
- Backend: Python/FastAPI + NumPy + ProcessPoolExecutor
- Frontend: React + TypeScript + Vite
- Desktop: Electron
- Build: PyInstaller (backend), electron-builder (desktop)
**Goal:** Calculate RF coverage maps with terrain, building, and vegetation analysis.
---
## ✅ What Works (Phase 3.0 Achievements)
### Performance
| Preset | Before | After | Status |
|--------|--------|-------|--------|
| Standard (100-200m res) | 38s | **~5s** | ✅ EXCELLENT |
| Detailed (300m, 5km) | timeout | timeout | ❌ Still broken |
### Architecture (48 new files, 82 tests)
- ✅ Modular propagation models (8 models: FreeSpace, Okumura-Hata, COST-231, ITU-R P.1546, etc.)
- ✅ SharedMemoryManager for terrain data (zero-copy, 25 MB)
- ✅ Building filtering (351k → 27k bbox → 15k cap)
- ✅ WebSocket progress streaming (backend side works; the UI display is still stuck, see Current Blockers)
- ✅ Clean model selection by frequency/environment
- ✅ Worker cleanup on shutdown
- ✅ Overpass API retry with failover (3 attempts, mirror endpoint)
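
The two-stage building filter (351k → 27k bbox → 15k cap) could be sketched roughly as below. This is not the project's code: `Building`, `filter_buildings`, and in particular the rule "keep the buildings nearest the site when capping" are assumptions, since the notes do not say how the cap picks its 15k.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Building:
    """Hypothetical minimal building record: centroid in degrees."""
    lat: float
    lon: float


def filter_buildings(
    buildings: List[Building],
    bbox: Tuple[float, float, float, float],  # (min_lat, min_lon, max_lat, max_lon)
    site: Tuple[float, float],                # (lat, lon) of the transmitter
    cap: int = 15_000,
) -> List[Building]:
    """Stage 1: keep centroids inside bbox. Stage 2: keep the `cap` nearest to site."""
    min_lat, min_lon, max_lat, max_lon = bbox
    inside = [
        b for b in buildings
        if min_lat <= b.lat <= max_lat and min_lon <= b.lon <= max_lon
    ]
    if len(inside) <= cap:
        return inside
    # Squared degree offset is only a ranking key here, not a real distance.
    site_lat, site_lon = site
    inside.sort(key=lambda b: (b.lat - site_lat) ** 2 + (b.lon - site_lon) ** 2)
    return inside[:cap]
```

Capping by distance to the site is only one possible policy; capping by footprint area or height would keep a different subset.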
### New Files Structure
```
backend/app/
├── propagation/ # 8 model files
├── geometry/ # 5 files (haversine, intersection, reflection, diffraction, los)
├── core/ # 4 files (engine, grid, calculator, result)
├── parallel/ # 3 files (manager, worker, pool)
├── services/ # cache.py, osm_client.py
├── utils/ # logging.py, progress.py, units.py
└── api/websocket.py
frontend/src/
├── hooks/useWebSocket.ts
├── services/websocket.ts
└── components/FrequencyBandPanel.tsx
```
---
## ❌ Current Blockers
### 1. Detailed Preset Timeout (CRITICAL)
**Symptom:** 300s timeout, only 194/868 points calculated
**Latest test results:**
```
[DOMINANT_PATH_VEC] Point #1: buildings=30, walls=214, dist=4887m
302.8ms/point × 868 points = 262 seconds
```
**Root Cause Analysis:**
- The early-return fix (from the Claude Code round) only covers the `buildings=[]` case
- In practice buildings ARE present (15,000 after the cap), so the early return does not trigger
- Each point finds 17-30 nearby buildings
- Each building has 100-295 wall segments
- **dominant_path_service** geometry calculations are expensive
**The real problem is NOT "buildings=0 is slow"**
**The real problem IS "dominant_path with buildings is inherently slow"**
**Potential solutions:**
1. Simplify building geometry (reduce wall count)
2. Use spatial indexing more aggressively
3. Skip dominant_path for distant points (>3km?)
4. Reduce building query radius
5. Use simpler path loss model when buildings present
6. GPU acceleration (CuPy) for geometry
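
For solution 2, a minimal sketch of a uniform grid index is shown here, assuming building centroids have already been projected to local metres. `GridIndex` and its methods are hypothetical names; the notes do not say which index the backend uses today.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in local metres, an assumption of this sketch


class GridIndex:
    """Uniform grid over building centroids; cell_size is in metres."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        self.points: List[Point] = []

    def _cell(self, p: Point) -> Tuple[int, int]:
        return (floor(p[0] / self.cell_size), floor(p[1] / self.cell_size))

    def add(self, p: Point) -> int:
        """Store a centroid and return its index."""
        idx = len(self.points)
        self.points.append(p)
        self.cells[self._cell(p)].append(idx)
        return idx

    def query(self, center: Point, radius: float) -> List[int]:
        """Indices of centroids within `radius` of `center`."""
        cx0, cy0 = self._cell((center[0] - radius, center[1] - radius))
        cx1, cy1 = self._cell((center[0] + radius, center[1] + radius))
        hits: List[int] = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty cells in the defaultdict
                for idx in self.cells.get((cx, cy), ()):
                    px, py = self.points[idx]
                    if (px - center[0]) ** 2 + (py - center[1]) ** 2 <= radius ** 2:
                        hits.append(idx)
        return hits
```

Only the cells overlapping the query square are visited, so the cost per point depends on local building density rather than on the full 15k set.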
### 2. Progress Bar Stuck at "Initializing 5%"
**Symptom:** UI shows "Initializing 5%" forever
**Fix attempted:** `await asyncio.sleep(0)` after progress_fn() — not working
**Likely cause:** Frontend WebSocket connection or state update issue
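
The notes point at the frontend, but one backend pattern worth ruling out is a calculation that blocks the event loop, in which case `await asyncio.sleep(0)` gives queued sends very little room to run. A minimal sketch of the alternative, assuming the calculation can run in a worker thread: `blocking_calculation` and `run_with_progress` are hypothetical stand-ins, not the project's functions.

```python
import asyncio
import time
from typing import Callable, List


def blocking_calculation(total: int, report: Callable[[int, int], None]) -> None:
    """Stand-in for the coverage calculation; runs in a worker thread."""
    for done in range(1, total + 1):
        time.sleep(0.001)  # pretend to compute one point
        report(done, total)


async def run_with_progress(total: int) -> List[float]:
    """Collect progress fractions the way a WebSocket sender would receive them.

    Assumes total >= 1, otherwise no progress event ever arrives.
    """
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def report(done: int, total_points: int) -> None:
        # Called from the worker thread, so hand off to the loop thread safely.
        loop.call_soon_threadsafe(queue.put_nowait, done / total_points)

    task = loop.run_in_executor(None, blocking_calculation, total, report)
    seen: List[float] = []
    while not seen or seen[-1] < 1.0:
        # A real handler would send this value over the socket here.
        seen.append(await queue.get())
    await task
    return seen
```

If a loop like this delivers every update and the UI still shows 5%, the fault is more likely in the frontend connection or store, as suspected above.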
### 3. App Close Broken
**Symptom:** Clicking X kills backend but frontend stays open
**Partial fix:** Worker cleanup works, but Electron window doesn't close
### 4. Memory Not Released
**Symptom:** 1328 MB not freed after calculation
```
Before: 3904 MB free
After: 2576 MB free
```
---
## 📊 Performance Analysis
### Why Detailed is slow (the math):
```
Points: 868
Buildings nearby per point: ~25 average
Walls per building: ~150 average
Wall intersection checks: 868 × 25 × 150 = 3,255,000
At 0.1ms per check = 325 seconds
```
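
The same estimate as a small function, handy for checking what-if scenarios against the 300 s timeout. The 0.1 ms per check comes from the estimate above; `estimate_seconds` is a hypothetical helper, and the what-if of 5 buildings per point is an illustration, not a measurement.

```python
from typing import Tuple


def estimate_seconds(
    points: int,
    buildings_per_point: int,
    walls_per_building: int,
    ms_per_check: float = 0.1,
) -> Tuple[int, float]:
    """Return (wall intersection checks, estimated single-threaded seconds)."""
    checks = points * buildings_per_point * walls_per_building
    return checks, checks * ms_per_check / 1000.0


# Numbers from the estimate above
checks, seconds = estimate_seconds(868, 25, 150)
print(f"{checks:,} checks, about {seconds:.0f} s")

# What-if: a filter that leaves 5 buildings per point (hypothetical figure)
_, filtered_seconds = estimate_seconds(868, 5, 150)
print(f"with 5 buildings per point: about {filtered_seconds:.0f} s")
```

The cost is linear in each factor, so cutting buildings per point or walls per building by 5x cuts the estimate by the same 5x.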
### Why Standard is fast:
- Lower resolution = fewer points (~200 vs 868)
- Likely skips some detailed calculations
- Buildings still processed but fewer points to check
---
## 🔧 Key Files to Review
### Backend (performance critical)
```
backend/app/services/
├── dominant_path_service.py # THE BOTTLENECK
├── coverage_service.py # Orchestration, progress
├── parallel_coverage_service.py # Worker management
└── buildings_service.py # OSM fetch, caching
```
### Frontend (UI bugs)
```
frontend/src/
├── App.tsx # Progress display
├── store/coverage.ts # WebSocket state
└── services/websocket.ts # WS connection
```
### Desktop (close bug)
```
desktop/main.js # Electron lifecycle
```
---
## 🎯 Recommended Next Steps
### Priority 1: Fix Detailed Performance
**Option A: Aggressive spatial filtering**
```python
# In dominant_path_service.py
# Only check buildings within line-of-sight corridor
# Not all buildings within radius
```
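
A minimal sketch of Option A, assuming transmitter, receiver, and building centroids are already in local metres. `distance_to_segment`, `buildings_in_corridor`, and the `half_width` parameter are hypothetical names; nothing here is taken from `dominant_path_service.py`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in local metres, an assumption of this sketch


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Projection parameter, clamped so the closest point stays on the segment
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5


def buildings_in_corridor(
    centroids: Sequence[Point], tx: Point, rx: Point, half_width: float
) -> List[int]:
    """Indices of buildings whose centroid lies within half_width of the TX-RX line."""
    return [
        i for i, c in enumerate(centroids)
        if distance_to_segment(c, tx, rx) <= half_width
    ]
```

Testing centroids ignores footprint size, so a real version would probably widen the corridor by each building's radius.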
**Option B: LOD (Level of Detail)**
```python
# Distance > 2km: skip dominant path entirely
# Distance 1-2km: simplified model
# Distance < 1km: full calculation
```
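
Option B as a small selector, using the 1 km and 2 km thresholds from the comment above. `Detail` and `select_detail` are hypothetical names, and which simplified model to use is left open.

```python
from enum import Enum


class Detail(Enum):
    SKIP = "skip"              # no dominant path at all
    SIMPLIFIED = "simplified"  # cheaper model
    FULL = "full"              # full dominant path calculation


def select_detail(
    distance_m: float,
    full_below_m: float = 1000.0,
    skip_above_m: float = 2000.0,
) -> Detail:
    """Pick the level of detail for one grid point from its distance to the site."""
    if distance_m > skip_above_m:
        return Detail.SKIP
    if distance_m >= full_below_m:
        return Detail.SIMPLIFIED
    return Detail.FULL
```

With these thresholds, the logged point at `dist=4887m` would skip the dominant path entirely.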
**Option C: Building simplification**
```python
# Reduce wall count per building
# Merge adjacent buildings
# Use bounding boxes instead of polygons for far buildings
```
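
A minimal sketch of the bounding-box idea from Option C: far buildings contribute 4 walls instead of their full outline. It assumes footprints are open vertex lists in local metres (first vertex not repeated); `bounding_box_walls`, `walls_for`, and the 1 km threshold are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Wall = Tuple[Point, Point]


def polygon_walls(polygon: Sequence[Point]) -> List[Wall]:
    """One wall per polygon edge, closing the ring back to the first vertex."""
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]


def bounding_box_walls(polygon: Sequence[Point]) -> List[Wall]:
    """Four walls of the axis-aligned bounding box around the footprint."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    corners = [
        (min(xs), min(ys)),
        (max(xs), min(ys)),
        (max(xs), max(ys)),
        (min(xs), max(ys)),
    ]
    return polygon_walls(corners)


def walls_for(
    polygon: Sequence[Point], distance_m: float, far_threshold_m: float = 1000.0
) -> List[Wall]:
    """Full outline for near buildings, bounding box for far ones."""
    if distance_m > far_threshold_m:
        return bounding_box_walls(polygon)
    return polygon_walls(polygon)
```

The bounding box overestimates the footprint, so this trades some accuracy at distance for a fixed wall count.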
### Priority 2: Fix UI Bugs
- Debug WebSocket in browser DevTools
- Check Electron close handler
### Priority 3: Memory
- Explicit cleanup after calculation
- Check for leaked references
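
One way to check for leaked references is to compare `tracemalloc` snapshots taken around a calculation. The sketch below is a generic helper, not project code: `retained_after` is a hypothetical name, and it only sees Python-level allocations in the current process, so memory held by worker processes or OS-level shared memory would not show up.

```python
import gc
import tracemalloc
from typing import Callable, List, Tuple


def retained_after(run: Callable[[], None], top: int = 5) -> Tuple[int, List[str]]:
    """Return (net bytes still allocated after `run`, top allocation sites by growth)."""
    gc.collect()
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    run()         # the return value is discarded on purpose
    gc.collect()  # drop anything that is only garbage
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    diffs = after.compare_to(before, "lineno")
    return sum(d.size_diff for d in diffs), [str(d) for d in diffs[:top]]
```

A large positive total after the result has been dropped suggests something (a cache, a module-level list, a closure) still references the data; the returned lines show where it was allocated.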
---
## 📝 Session Timeline
1. **Phase 2.4-2.5.1** — Vectorization attempt (didn't help)
2. **Decision** — Full Phase 3.0 architecture refactor
3. **Architecture Doc** — 1719 lines specification
4. **Claude Code Round 1** — 48 files, 82 tests (35 min)
5. **Integration Round** — WebSocket, progress, model selection (20 min)
6. **Bug Fix Round** — Memory, workers, app close (15 min)
7. **Claude Code Fix** — Dominant path early return, Overpass retry, progress (13 min)
8. **Current** — Still timeout, need different approach
---
## 💡 Key Insights
1. **Vectorization alone doesn't help** — problem is algorithmic, not just numpy
2. **SharedMemory works** — terrain in shared memory is efficient
3. **Building count matters** — 351k→15k filtering helps but not enough
4. **dominant_path is the bottleneck** — consistently 200-300ms/point
5. **Standard preset proves architecture works** — fast when less work needed
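
To illustrate insight 2, here is a minimal stdlib sketch of sharing terrain samples through `multiprocessing.shared_memory`. It is not the project's SharedMemoryManager: `publish`, `read_sample`, and `release` are hypothetical helpers, and `struct` stands in for NumPy to keep the example dependency-free. With NumPy, the usual approach is to wrap the same buffer via `np.ndarray(shape, dtype=..., buffer=shm.buf)`.

```python
import struct
from multiprocessing import shared_memory
from typing import Sequence

FLOAT = struct.Struct("f")  # one 32-bit height sample


def publish(heights: Sequence[float]) -> shared_memory.SharedMemory:
    """Owner side: allocate a block and write the samples into it once."""
    shm = shared_memory.SharedMemory(create=True, size=len(heights) * FLOAT.size)
    struct.pack_into(f"{len(heights)}f", shm.buf, 0, *heights)
    return shm


def read_sample(name: str, index: int) -> float:
    """Worker side: attach by name, read one sample, detach."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return FLOAT.unpack_from(shm.buf, index * FLOAT.size)[0]
    finally:
        shm.close()


def release(shm: shared_memory.SharedMemory) -> None:
    """Owner side: close the mapping and remove the block."""
    shm.close()
    shm.unlink()
```

Workers only need the block name, so nothing terrain-sized is pickled per task. The owner must call `unlink()` exactly once, which is also relevant to the unreleased-memory blocker.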
---
## 🔗 Related Documents
- `/mnt/project/RFCP-Phase-3.0-Architecture-Refactor.md` — Full architecture spec
- `/mnt/project/SESSION-2025-01-30-Iteration-10_1-Complete.md` — Previous session
- `/mnt/transcripts/2026-02-01-19-06-32-phase-3.0-refactor-implementation-results.txt` — Detailed transcript
---
## 🎮 Side Project
During this session, also designed **DF Diplomacy Expanded** mod:
- Design doc: `DF-Diplomacy-Expanded-Design-Doc.md` (1202 lines)
- MVP: War score, peace negotiation, tribute, reputation
- Motto: *"Losing is fun, but sometimes you want to lose diplomatically."*
---
*"Standard preset works beautifully. Detailed preset needs love. The architecture is solid — now we optimize."*