# RFCP Development Session Summary
## Date: February 1, 2026 (file name says 2025)
## Status: Phase 3.0 Complete, Performance Optimization Ongoing
---
## 🎯 Project Overview
**RFCP (Radio Frequency Coverage Planning)** is a desktop application for tactical LTE network planning, part of the UMTC (Ukrainian Military Tactical Communications) project.

**Tech Stack:**
- Backend: Python/FastAPI + NumPy + ProcessPoolExecutor
- Frontend: React + TypeScript + Vite
- Desktop: Electron
- Build: PyInstaller (backend), electron-builder (desktop)

**Goal:** Calculate RF coverage maps with terrain, building, and vegetation analysis.

---
## ✅ What Works (Phase 3.0 Achievements)
### Performance
| Preset | Before Phase 3.0 | After Phase 3.0 | Status |
|--------|--------|-------|--------|
| Standard (100-200m res) | 38s | **~5s** | ✅ EXCELLENT |
| Detailed (300m, 5km) | timeout | timeout | ❌ Still broken |
### Architecture (48 new files, 82 tests)
- ✅ Modular propagation models (8 models: FreeSpace, Okumura-Hata, COST-231, ITU-R P.1546, etc.)
- ✅ SharedMemoryManager for terrain data (zero-copy, 25 MB)
- ✅ Building filtering (351k → 27k bbox → 15k cap)
- ✅ WebSocket progress streaming (backend works)
- ✅ Clean model selection by frequency/environment
- ✅ Worker cleanup on shutdown
- ✅ Overpass API retry with failover (3 attempts, mirror endpoint)
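
The frequency-based model selection listed above could look roughly like the following. This is a minimal sketch and not the actual RFCP selector. The `MODEL_RANGES` table, the `select_model` name, and the `FreeSpace` fallback are hypothetical, and environment handling is left out. The ranges themselves are the standard validity limits of Okumura-Hata (150-1500 MHz) and COST-231 Hata (1500-2000 MHz).

```python
from typing import List, Tuple

# Hypothetical table: (model name, min MHz, max MHz). The ranges are the
# published validity limits of Okumura-Hata and COST-231 Hata; the first
# matching entry wins, so exactly 1500 MHz resolves to Okumura-Hata.
MODEL_RANGES: List[Tuple[str, float, float]] = [
    ("Okumura-Hata", 150.0, 1500.0),
    ("COST-231", 1500.0, 2000.0),
]


def select_model(freq_mhz: float) -> str:
    """Pick a propagation model by carrier frequency (environment ignored here)."""
    for name, lo, hi in MODEL_RANGES:
        if lo <= freq_mhz <= hi:
            return name
    return "FreeSpace"  # hypothetical fallback outside both ranges
```

With this table, LTE Band 20 (800 MHz) maps to Okumura-Hata and Band 3 (1800 MHz) maps to COST-231. Band 7 (2600 MHz) falls through to the fallback, whereas a real selector would likely route it to another of the eight models.
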
### New Files Structure
```
backend/app/
├── propagation/ # 8 model files
├── geometry/ # 5 files (haversine, intersection, reflection, diffraction, los)
├── core/ # 4 files (engine, grid, calculator, result)
├── parallel/ # 3 files (manager, worker, pool)
├── services/ # cache.py, osm_client.py
├── utils/ # logging.py, progress.py, units.py
└── api/websocket.py
frontend/src/
├── hooks/useWebSocket.ts
├── services/websocket.ts
└── components/FrequencyBandPanel.tsx
```
---
## ❌ Current Blockers
### 1. Detailed Preset Timeout (CRITICAL)
**Symptom:** The run hits the 300 s timeout with only 194 of 868 points calculated.

**Latest test results:**
```
[DOMINANT_PATH_VEC] Point #1: buildings=30, walls=214, dist=4887m
302.8ms/point × 868 points = 262 seconds
```
**Root Cause Analysis:**
- The early-return fix (from Claude Code) only covered the `buildings=[]` case
- In reality, buildings ARE present (15,000 after the cap)
- Each point finds 17-30 nearby buildings
- Each building has 100-295 wall segments
- The geometry calculations in **dominant_path_service** are expensive

**The real problem is NOT "buildings=0 is slow".**

**The real problem IS "dominant_path with buildings is inherently slow".**

**Potential solutions:**
1. Simplify building geometry (reduce wall count)
2. Use spatial indexing more aggressively
3. Skip dominant_path for distant points (>3km?)
4. Reduce building query radius
5. Use simpler path loss model when buildings present
6. GPU acceleration (CuPy) for geometry
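
For solution 2, one common way to make per-point building lookups cheaper is a uniform grid (bucket) index over building centroids. The sketch below assumes coordinates are already in a local metric projection. `GridIndex`, the cell size, and the use of centroids instead of full footprints are all hypothetical choices; an R-tree would be a reasonable alternative.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class GridIndex:
    """Hypothetical uniform-grid index over 2D building centroids (metres)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.buckets: DefaultDict[Tuple[int, int], List[Tuple[int, float, float]]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, building_id: int, x: float, y: float) -> None:
        self.buckets[self._cell(x, y)].append((building_id, x, y))

    def query_radius(self, x: float, y: float, radius: float) -> List[int]:
        """Return ids within `radius` of (x, y), scanning only overlapping cells."""
        cx0, cy0 = self._cell(x - radius, y - radius)
        cx1, cy1 = self._cell(x + radius, y + radius)
        r2 = radius * radius
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get() avoids creating empty buckets as a side effect
                for bid, px, py in self.buckets.get((cx, cy), ()):
                    if (px - x) ** 2 + (py - y) ** 2 <= r2:
                        hits.append(bid)
        return sorted(hits)
```

Each query touches only a handful of cells instead of all 15k buildings. Because the index stores centroids, a large building whose centroid lies just outside the radius will be missed, so the query radius may need padding by a typical footprint size.
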
### 2. Progress Bar Stuck at "Initializing 5%"
**Symptom:** The UI shows "Initializing 5%" forever.

**Fix attempted:** `await asyncio.sleep(0)` after `progress_fn()`; it did not help.

**Likely cause:** A frontend WebSocket connection or state-update issue.
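
`await asyncio.sleep(0)` yields to the event loop for a single iteration only. If the CPU-bound calculation itself runs on the loop thread, progress messages can still be held up. If the backend turns out to be part of the problem, one pattern that keeps the loop free is to run the work in an executor and pass progress values back through an `asyncio.Queue` using `loop.call_soon_threadsafe`.

In this sketch, `compute_points`, `run_with_progress`, and `send` are hypothetical names. A thread-pool executor stands in for the real process pool. In FastAPI, `send` could wrap `WebSocket.send_json`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


def compute_points(n: int, report: Callable[[float], None]) -> int:
    """Hypothetical stand-in for the CPU-bound coverage loop."""
    for i in range(1, n + 1):
        time.sleep(0.001)  # simulate per-point work
        report(i / n)
    return n


async def run_with_progress(n: int, send: Callable[[float], Awaitable[None]]) -> int:
    loop = asyncio.get_running_loop()
    queue: "asyncio.Queue[Optional[float]]" = asyncio.Queue()

    def report(fraction: float) -> None:
        # Runs in the worker thread: schedule the put on the event-loop thread.
        loop.call_soon_threadsafe(queue.put_nowait, fraction)

    def work() -> int:
        try:
            return compute_points(n, report)
        finally:
            # Sentinel: no more progress updates will arrive.
            loop.call_soon_threadsafe(queue.put_nowait, None)

    future = loop.run_in_executor(None, work)  # default thread pool
    # The loop stays free, so each update is forwarded as soon as it arrives.
    while (fraction := await queue.get()) is not None:
        await send(fraction)
    return await future
```

With a real `ProcessPoolExecutor`, the `report` callback would need a cross-process channel, such as a `multiprocessing` queue, instead of a closure.
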
### 3. App Close Broken
**Symptom:** Clicking the window's X button kills the backend, but the frontend stays open.

**Partial fix:** Worker cleanup works, but the Electron window does not close.
### 4. Memory Not Released
**Symptom:** About 1328 MB is not released after a calculation:
```
Before: 3904 MB free
After: 2576 MB free
```
---
## 📊 Performance Analysis
### Why Detailed is slow (the math):
```
Points: 868
Buildings nearby per point: ~25 average
Walls per building: ~150 average
Wall intersection checks: 868 × 25 × 150 = 3,255,000
At ~0.1 ms per check: 3,255,000 × 0.1 ms ≈ 325 seconds (over the 300 s timeout)
```
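
The same back-of-the-envelope estimate can be written as a small helper. This makes it easy to see how each proposed fix (fewer points, fewer buildings per point, fewer walls) moves the total. `estimate_detailed_cost` is a hypothetical name, and the 0.1 ms per check is the assumption from above, not a measurement.

```python
def estimate_detailed_cost(points, buildings_per_point, walls_per_building, ms_per_check):
    """Estimate the wall-intersection workload for one coverage run.

    Returns (number_of_checks, estimated_seconds). ms_per_check is an assumed
    average cost per check, not a measured value.
    """
    checks = points * buildings_per_point * walls_per_building
    seconds = checks * ms_per_check / 1000.0
    return checks, seconds


checks, seconds = estimate_detailed_cost(868, 25, 150, 0.1)
print(f"{checks:,} checks, ~{seconds:.1f} s")  # 3,255,000 checks, ~325.5 s
```

For example, cutting walls per building from ~150 to ~30 (a hypothetical simplification target) gives 651,000 checks, or about 65 s, under the same per-check assumption.
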
### Why Standard is fast:
- Lower resolution = fewer points (~200 vs 868)
- Likely skips some detailed calculations
- Buildings still processed but fewer points to check
---
## 🔧 Key Files to Review
### Backend (performance critical)
```
backend/app/services/
├── dominant_path_service.py # THE BOTTLENECK
├── coverage_service.py # Orchestration, progress
├── parallel_coverage_service.py # Worker management
└── buildings_service.py # OSM fetch, caching
```
### Frontend (UI bugs)
```
frontend/src/
├── App.tsx # Progress display
├── store/coverage.ts # WebSocket state
└── services/websocket.ts # WS connection
```
### Desktop (close bug)
```
desktop/main.js # Electron lifecycle
```
---
## 🎯 Recommended Next Steps
### Priority 1: Fix Detailed Performance
**Option A: Aggressive spatial filtering**
```python
# In dominant_path_service.py
# Only check buildings within line-of-sight corridor
# Not all buildings within radius
```
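
Option A could be prototyped as a vectorized corridor test: keep only the buildings whose centroid lies within a fixed half-width of the transmitter-receiver segment. This sketch makes several assumptions. Coordinates are taken to be in a local metric projection. The `corridor_mask` name and the centroid simplification are hypothetical. Large footprints near the corridor edge would need a footprint-aware test.

```python
import numpy as np


def corridor_mask(tx, rx, centroids, half_width_m):
    """Boolean mask of centroids within half_width_m of the segment tx -> rx."""
    a = np.asarray(tx, dtype=float)
    b = np.asarray(rx, dtype=float)
    p = np.asarray(centroids, dtype=float)  # shape (N, 2)
    ab = b - a
    denom = float(ab @ ab)
    if denom == 0.0:
        # Degenerate path (tx == rx): reduces to a plain radius test around tx.
        t = np.zeros(len(p))
    else:
        # Projection parameter of each centroid onto the segment, clamped to [0, 1].
        t = np.clip((p - a) @ ab / denom, 0.0, 1.0)
    closest = a + t[:, None] * ab
    dist = np.linalg.norm(p - closest, axis=1)
    return dist <= half_width_m
```

Used as `buildings[corridor_mask(tx, rx, centroids, 50.0)]`, this would replace the "all buildings within radius" set with the much smaller set along the path. The 50 m half-width is an illustrative value only.
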
**Option B: LOD (Level of Detail)**
```python
# Distance > 2km: skip dominant path entirely
# Distance 1-2km: simplified model
# Distance < 1km: full calculation
```
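
Option B amounts to a per-point dispatch on Tx-Rx distance. This is a minimal sketch using the thresholds above. The tier names and the `lod_tier` / `group_by_tier` helpers are hypothetical, and the definition of the 1-2 km "simplified model" is still open.

```python
from typing import Dict, List, Sequence

FULL_MAX_M = 1000.0        # below 1 km: full dominant-path calculation
SIMPLIFIED_MAX_M = 2000.0  # 1-2 km: simplified model; beyond this, skip dominant path


def lod_tier(distance_m: float) -> str:
    """Map a Tx-Rx distance in metres to a (hypothetical) level-of-detail tier."""
    if distance_m < FULL_MAX_M:
        return "full"
    if distance_m <= SIMPLIFIED_MAX_M:
        return "simplified"
    return "skip"


def group_by_tier(distances_m: Sequence[float]) -> Dict[str, List[int]]:
    """Bucket point indices by tier so each group can go to its own model."""
    groups: Dict[str, List[int]] = {"full": [], "simplified": [], "skip": []}
    for i, d in enumerate(distances_m):
        groups[lod_tier(d)].append(i)
    return groups
```

Point #1 from the log above (dist=4887m) would land in the `skip` tier. Whether a 2 km cutoff (or the >3 km floated under Potential solutions) is acceptable for accuracy still needs checking against the full model.
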
**Option C: Building simplification**
```python
# Reduce wall count per building
# Merge adjacent buildings
# Use bounding boxes instead of polygons for far buildings
```
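
For Option C, one standard way to reduce wall counts is Ramer-Douglas-Peucker (RDP) simplification of each footprint ring, with a bounding-box fallback for far buildings. This is a sketch, not existing RFCP code. The function names, the open-ring convention (first vertex not repeated at the end), and the use of the segment-distance variant of RDP are assumptions. `math.dist` requires Python 3.8+.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _seg_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment a-b (handles a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def rdp(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _seg_dist(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every interior vertex is within tolerance
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # the split vertex appears only once


def simplify_ring(ring: Sequence[Point], epsilon: float) -> List[Point]:
    """Simplify a closed footprint given as an open ring (hypothetical helper)."""
    ring = list(ring)
    if len(ring) < 4:
        return ring
    start = ring[0]
    # Split at the vertex farthest from the start so both halves are open chains.
    k = max(range(1, len(ring)), key=lambda i: math.dist(start, ring[i]))
    first = rdp(ring[: k + 1], epsilon)
    second = rdp(ring[k:] + [start], epsilon)
    return first[:-1] + second[:-1]


def bbox_ring(ring: Sequence[Point]) -> List[Point]:
    """Four-wall axis-aligned bounding box, e.g. for far-away buildings."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return [(min(xs), min(ys)), (max(xs), min(ys)), (max(xs), max(ys)), (min(xs), max(ys))]
```

The tolerance `epsilon` (in metres) trades wall count against geometric accuracy, so it would need tuning against coverage results rather than being picked once.
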
### Priority 2: Fix UI Bugs
- Inspect the WebSocket frames in the browser DevTools
- Check the Electron window-close handler in `desktop/main.js`
### Priority 3: Memory
- Explicit cleanup after calculation
- Check for leaked references
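
To look for leaked references in the backend process, the standard-library `tracemalloc` module can diff allocation snapshots taken around one calculation. This is a minimal sketch, and `top_growth` is a hypothetical helper. It only sees allocations in the current process (NumPy buffers included), not memory held by pool workers.

```python
import gc
import tracemalloc
from typing import Callable, List, Tuple


def top_growth(fn: Callable[[], object], limit: int = 5) -> List[Tuple[str, int]]:
    """Run fn() and return the source lines whose retained allocations grew most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        fn()
        gc.collect()  # drop collectable garbage so only still-referenced memory remains
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    diffs = after.compare_to(before, "lineno")  # largest |size_diff| first
    return [(str(d.traceback[0]), d.size_diff) for d in diffs[:limit]]
```

Wrapping one Detailed-preset run in `top_growth` would point at the lines still holding memory afterwards, such as caches or module-level lists.
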
---
## 📝 Session Timeline
1. **Phase 2.4-2.5.1** — Vectorization attempt (didn't help)
2. **Decision** — Full Phase 3.0 architecture refactor
3. **Architecture Doc** — 1719 lines specification
4. **Claude Code Round 1** — 48 files, 82 tests (35 min)
5. **Integration Round** — WebSocket, progress, model selection (20 min)
6. **Bug Fix Round** — Memory, workers, app close (15 min)
7. **Claude Code Fix** — Dominant path early return, Overpass retry, progress (13 min)
8. **Current** — Still timeout, need different approach
---
## 💡 Key Insights
1. **Vectorization alone doesn't help** — problem is algorithmic, not just numpy
2. **SharedMemory works** — terrain in shared memory is efficient
3. **Building count matters** — 351k→15k filtering helps but not enough
4. **dominant_path is the bottleneck** — consistently 200-300ms/point
5. **Standard preset proves architecture works** — fast when less work needed
---
## 🔗 Related Documents
- `/mnt/project/RFCP-Phase-3.0-Architecture-Refactor.md` — Full architecture spec
- `/mnt/project/SESSION-2025-01-30-Iteration-10_1-Complete.md` — Previous session
- `/mnt/transcripts/2026-02-01-19-06-32-phase-3.0-refactor-implementation-results.txt` — Detailed transcript
---
## 🎮 Side Project
This session also produced a design for the **DF Diplomacy Expanded** mod:
- Design doc: `DF-Diplomacy-Expanded-Design-Doc.md` (1202 lines)
- MVP: War score, peace negotiation, tribute, reputation
- Motto: *"Losing is fun, but sometimes you want to lose diplomatically."*
---
*"Standard preset works beautifully. Detailed preset needs love. The architecture is solid — now we optimize."*