550 lines
14 KiB
Markdown
550 lines
14 KiB
Markdown
# RFCP Phase 2.4.1: Critical Fixes
|
||
|
||
**Date:** February 1, 2025
|
||
**Type:** Bug Fixes + Performance
|
||
**Priority:** CRITICAL
|
||
**Depends on:** Phase 2.4
|
||
|
||
---
|
||
|
||
## 🎯 Goals
|
||
|
||
1. Fix memory leak — worker processes not terminating
|
||
2. Fix app close — хрестик не вбиває backend
|
||
3. Optimize dominant path — 600 buildings занадто багато
|
||
4. Fix GPU detection in PyInstaller build
|
||
|
||
---
|
||
|
||
## 🔴 Bug 2.4.1a: Memory Leak — Workers Not Terminated
|
||
|
||
**Symptoms:**
|
||
- After timeout (300s), worker processes stay running
|
||
- Task Manager shows 8× rfcp-server.exe using 7.8 GB RAM
|
||
- CPU stays loaded after calculation cancelled
|
||
- Only manual kill or reboot frees resources
|
||
|
||
**Root cause:**
|
||
ProcessPoolExecutor workers ignore cancellation token — they're separate processes that don't share memory with main process.
|
||
|
||
**File:** `backend/app/services/parallel_coverage_service.py`
|
||
|
||
**Fix:**
|
||
|
||
```python
|
||
import psutil
|
||
import os
|
||
|
||
def _kill_worker_processes():
|
||
"""Kill all child processes of current process."""
|
||
current = psutil.Process(os.getpid())
|
||
children = current.children(recursive=True)
|
||
|
||
for child in children:
|
||
try:
|
||
child.terminate()
|
||
except psutil.NoSuchProcess:
|
||
pass
|
||
|
||
# Wait briefly, then force kill survivors
|
||
gone, alive = psutil.wait_procs(children, timeout=3)
|
||
for p in alive:
|
||
try:
|
||
p.kill()
|
||
except psutil.NoSuchProcess:
|
||
pass
|
||
|
||
return len(children)
|
||
|
||
|
||
async def _calculate_with_process_pool(..., cancel_token=None):
|
||
"""ProcessPool calculation with proper cleanup."""
|
||
pool = None
|
||
try:
|
||
pool = ProcessPoolExecutor(max_workers=num_workers)
|
||
futures = [pool.submit(_process_chunk, chunk, ...) for chunk in chunks]
|
||
|
||
for future in as_completed(futures):
|
||
if cancel_token and cancel_token.is_cancelled():
|
||
_clog("Cancellation detected — terminating pool")
|
||
break
|
||
|
||
result = future.result(timeout=1)
|
||
results.extend(result)
|
||
|
||
except Exception as e:
|
||
_clog(f"ProcessPool error: {e}")
|
||
|
||
finally:
|
||
# CRITICAL: Always cleanup
|
||
if pool:
|
||
pool.shutdown(wait=False, cancel_futures=True)
|
||
|
||
# Kill any orphaned workers
|
||
killed = _kill_worker_processes()
|
||
if killed > 0:
|
||
_clog(f"Killed {killed} orphaned worker processes")
|
||
|
||
return results
|
||
```
|
||
|
||
**Also add cleanup on timeout in coverage.py:**
|
||
|
||
```python
|
||
@router.post("/calculate")
|
||
async def calculate_coverage(request: CoverageRequest):
|
||
cancel_token = CancellationToken()
|
||
|
||
try:
|
||
result = await asyncio.wait_for(
|
||
coverage_service.calculate_coverage(
|
||
sites=request.sites,
|
||
settings=request.settings,
|
||
cancel_token=cancel_token
|
||
),
|
||
timeout=300
|
||
)
|
||
return result
|
||
|
||
except asyncio.TimeoutError:
|
||
cancel_token.cancel()
|
||
|
||
# Force cleanup
|
||
from app.services.parallel_coverage_service import _kill_worker_processes
|
||
killed = _kill_worker_processes()
|
||
|
||
raise HTTPException(
|
||
status_code=408,
|
||
detail=f"Calculation timeout (5 min). Cleaned up {killed} workers."
|
||
)
|
||
```
|
||
|
||
**Add psutil to requirements.txt:**
|
||
```
|
||
psutil>=5.9.0
|
||
```
|
||
|
||
---
|
||
|
||
## 🔴 Bug 2.4.1b: App Close Not Working
|
||
|
||
**Symptoms:**
|
||
- Clicking X closes window but processes stay
|
||
- Multiple rfcp-server.exe in Task Manager
|
||
- Have to manually End Task
|
||
|
||
**Root cause:**
|
||
- Electron's killBackend() uses PID that may be wrong
|
||
- Child processes (workers) not killed
|
||
- taskkill may fail silently
|
||
|
||
**File:** `desktop/main.js`
|
||
|
||
**Fix — nuclear option (kill ALL rfcp-server.exe):**
|
||
|
||
```javascript
|
||
const { execSync } = require('child_process');
|
||
|
||
function killAllBackendProcesses() {
|
||
console.log('[KILL] Killing all rfcp-server processes...');
|
||
|
||
if (process.platform === 'win32') {
|
||
try {
|
||
// Kill ALL rfcp-server.exe processes
|
||
execSync('taskkill /F /IM rfcp-server.exe /T', {
|
||
stdio: 'ignore',
|
||
timeout: 5000
|
||
});
|
||
console.log('[KILL] taskkill completed');
|
||
} catch (e) {
|
||
// Error means no processes found — that's OK
|
||
console.log('[KILL] No rfcp-server processes found or already killed');
|
||
}
|
||
} else {
|
||
// Unix: pkill
|
||
try {
|
||
execSync('pkill -9 -f rfcp-server', {
|
||
stdio: 'ignore',
|
||
timeout: 5000
|
||
});
|
||
} catch (e) {
|
||
console.log('[KILL] No rfcp-server processes found');
|
||
}
|
||
}
|
||
}
|
||
|
||
// Replace killBackend() calls with killAllBackendProcesses()
|
||
|
||
// In close handler:
|
||
mainWindow.on('close', (event) => {
|
||
console.log('[CLOSE] Window close event');
|
||
killAllBackendProcesses();
|
||
});
|
||
|
||
// In app quit handlers:
|
||
app.on('before-quit', () => {
|
||
console.log('[QUIT] before-quit');
|
||
killAllBackendProcesses();
|
||
});
|
||
|
||
app.on('will-quit', () => {
|
||
console.log('[QUIT] will-quit');
|
||
killAllBackendProcesses();
|
||
});
|
||
|
||
// Last resort:
|
||
process.on('exit', () => {
|
||
console.log('[EXIT] process.exit');
|
||
killAllBackendProcesses();
|
||
});
|
||
|
||
// Also add SIGINT/SIGTERM handlers:
|
||
process.on('SIGINT', () => {
|
||
console.log('[SIGNAL] SIGINT received');
|
||
killAllBackendProcesses();
|
||
process.exit(0);
|
||
});
|
||
|
||
process.on('SIGTERM', () => {
|
||
console.log('[SIGNAL] SIGTERM received');
|
||
killAllBackendProcesses();
|
||
process.exit(0);
|
||
});
|
||
```
|
||
|
||
**Also add cleanup endpoint to backend:**
|
||
|
||
```python
|
||
# backend/app/api/routes/system.py
|
||
|
||
@router.post("/shutdown")
|
||
async def shutdown():
|
||
"""Graceful shutdown endpoint."""
|
||
from app.services.parallel_coverage_service import _kill_worker_processes
|
||
|
||
killed = _kill_worker_processes()
|
||
|
||
# Schedule shutdown
|
||
import asyncio
|
||
asyncio.get_event_loop().call_later(0.5, lambda: os._exit(0))
|
||
|
||
return {"status": "shutting down", "workers_killed": killed}
|
||
```
|
||
|
||
**Call shutdown from Electron before killing:**
|
||
|
||
```javascript
|
||
async function gracefulShutdown() {
|
||
console.log('[SHUTDOWN] Requesting graceful shutdown...');
|
||
|
||
try {
|
||
await fetch('http://127.0.0.1:8888/api/system/shutdown', {
|
||
method: 'POST',
|
||
timeout: 2000
|
||
});
|
||
console.log('[SHUTDOWN] Backend acknowledged');
|
||
} catch (e) {
|
||
console.log('[SHUTDOWN] Backend did not respond, force killing');
|
||
}
|
||
|
||
// Wait a moment, then force kill
|
||
await new Promise(r => setTimeout(r, 500));
|
||
killAllBackendProcesses();
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 🟡 Bug 2.4.1c: Dominant Path Too Slow
|
||
|
||
**Symptoms:**
|
||
- Detailed preset: 351ms/point (should be <50ms)
|
||
- Log shows: `line_bldgs=646, refl_bldgs=302`
|
||
- 600+ buildings checked per point = too slow
|
||
|
||
**Root cause:**
|
||
Spatial index returns all buildings on path, no distance limit.
|
||
|
||
**File:** `backend/app/services/dominant_path_service.py`
|
||
|
||
**Fix — limit buildings by distance:**
|
||
|
||
```python
|
||
# At the start of find_dominant_paths() or equivalent:
|
||
|
||
MAX_BUILDINGS_FOR_REFLECTION = 100
|
||
MAX_DISTANCE_FROM_PATH = 500 # meters
|
||
|
||
def _filter_buildings_by_distance(buildings, tx_point, rx_point, max_count=100, max_distance=500):
|
||
"""
|
||
Filter buildings to only those close to the TX-RX path.
|
||
Sort by distance to path midpoint, take top N.
|
||
"""
|
||
if len(buildings) <= max_count:
|
||
return buildings
|
||
|
||
# Calculate midpoint
|
||
mid_lat = (tx_point[0] + rx_point[0]) / 2
|
||
mid_lon = (tx_point[1] + rx_point[1]) / 2
|
||
|
||
# Calculate distance to midpoint for each building
|
||
def distance_to_midpoint(building):
|
||
blat = building.get('centroid_lat', building.get('lat', mid_lat))
|
||
blon = building.get('centroid_lon', building.get('lon', mid_lon))
|
||
# Simple Euclidean approximation (fast)
|
||
dlat = (blat - mid_lat) * 111000
|
||
dlon = (blon - mid_lon) * 111000 * 0.7 # rough cos correction
|
||
return dlat*dlat + dlon*dlon # squared distance, no sqrt needed
|
||
|
||
# Sort by distance
|
||
buildings_with_dist = [(b, distance_to_midpoint(b)) for b in buildings]
|
||
buildings_with_dist.sort(key=lambda x: x[1])
|
||
|
||
# Filter by max distance (squared)
|
||
max_dist_sq = max_distance * max_distance
|
||
filtered = [b for b, d in buildings_with_dist if d <= max_dist_sq]
|
||
|
||
# Take top N
|
||
return filtered[:max_count]
|
||
|
||
|
||
# In the main calculation:
|
||
def calculate_dominant_path(tx, rx, buildings, spatial_idx, ...):
|
||
# Get buildings from spatial index
|
||
line_buildings = spatial_idx.query_line(tx.lat, tx.lon, rx.lat, rx.lon)
|
||
|
||
# FILTER to reduce count
|
||
line_buildings = _filter_buildings_by_distance(
|
||
line_buildings,
|
||
(tx.lat, tx.lon),
|
||
(rx.lat, rx.lon),
|
||
max_count=MAX_BUILDINGS_FOR_REFLECTION,
|
||
max_distance=MAX_DISTANCE_FROM_PATH
|
||
)
|
||
|
||
# Same for reflection buildings
|
||
refl_buildings = spatial_idx.query_point(mid_lat, mid_lon, buffer_cells=5)
|
||
refl_buildings = _filter_buildings_by_distance(
|
||
refl_buildings,
|
||
(tx.lat, tx.lon),
|
||
(rx.lat, rx.lon),
|
||
max_count=MAX_BUILDINGS_FOR_REFLECTION,
|
||
max_distance=MAX_DISTANCE_FROM_PATH
|
||
)
|
||
|
||
_clog(f"Filtered: {len(line_buildings)} line, {len(refl_buildings)} refl (from {original_count})")
|
||
|
||
# Continue with calculation...
|
||
```
|
||
|
||
**Expected improvement:**
|
||
- Before: 600-700 buildings → 351ms/point
|
||
- After: 100 buildings → ~50ms/point
|
||
- **7x speedup** for Detailed preset
|
||
|
||
---
|
||
|
||
## 🟡 Bug 2.4.1d: GPU Not Detected in PyInstaller
|
||
|
||
**Symptoms:**
|
||
```json
|
||
"gpu": {"available": false, "name": null, "memory_mb": null}
|
||
```
|
||
- RTX 4060 present but not detected
|
||
- CuPy not bundled in exe
|
||
|
||
**Root cause:**
|
||
CuPy has complex CUDA dependencies that PyInstaller doesn't auto-detect.
|
||
|
||
**Option A: Document manual CuPy install (RECOMMENDED for now)**
|
||
|
||
```python
|
||
# backend/app/services/gpu_service.py
|
||
|
||
# At module level, add clear message:
|
||
GPU_INSTALL_INSTRUCTIONS = """
|
||
GPU acceleration requires manual CuPy installation:
|
||
|
||
1. Check your CUDA version:
|
||
nvidia-smi
|
||
|
||
2. Install matching CuPy:
|
||
# For CUDA 12.x:
|
||
pip install cupy-cuda12x
|
||
|
||
# For CUDA 11.x:
|
||
pip install cupy-cuda11x
|
||
|
||
3. Restart RFCP
|
||
|
||
Note: GPU is optional. CPU calculations work fine.
|
||
"""
|
||
|
||
try:
|
||
import cupy as cp
|
||
# ... detection code ...
|
||
except ImportError:
|
||
print("[GPU] CuPy not installed — using CPU/NumPy")
|
||
print("[GPU] To enable GPU acceleration:")
|
||
print("[GPU] pip install cupy-cuda12x")
|
||
```
|
||
|
||
**Option B: Add CuPy to PyInstaller (complex, large file size)**
|
||
|
||
If we want CuPy bundled, add to `installer/rfcp-server.spec`:
|
||
|
||
```python
|
||
hiddenimports=[
|
||
# ... existing ...
|
||
'cupy',
|
||
'cupy._core',
|
||
'cupy._core._kernel',
|
||
'cupy.cuda',
|
||
'cupy.cuda.runtime',
|
||
'cupy.cuda.driver',
|
||
# Many more cupy submodules...
|
||
],
|
||
|
||
# Also need to include CUDA DLLs
|
||
binaries=[
|
||
# This gets complicated — need CUDA toolkit DLLs
|
||
],
|
||
```
|
||
|
||
**Recommendation:** Start with Option A (manual install), add Option B later if needed. GPU is nice-to-have, not critical.
|
||
|
||
---
|
||
|
||
## 🟢 Feature 2.4.1e: Elevation Layer Toggle
|
||
|
||
**Current state:**
|
||
- Elevation layer exists but always visible as green overlay
|
||
- No toggle in UI
|
||
|
||
**File:** `frontend/src/App.tsx` or settings panel
|
||
|
||
**Fix:**
|
||
|
||
```tsx
|
||
// In settings/layer controls section:
|
||
|
||
<div className="layer-section">
|
||
<h4>Map Layers</h4>
|
||
|
||
<label className="toggle-row">
|
||
<input
|
||
type="checkbox"
|
||
checked={showElevation}
|
||
onChange={(e) => setShowElevation(e.target.checked)}
|
||
/>
|
||
<span>Elevation Colors</span>
|
||
</label>
|
||
|
||
{showElevation && (
|
||
<div className="slider-row">
|
||
<label>Opacity</label>
|
||
<input
|
||
type="range"
|
||
min="0.2"
|
||
max="0.8"
|
||
step="0.1"
|
||
value={elevationOpacity}
|
||
onChange={(e) => setElevationOpacity(parseFloat(e.target.value))}
|
||
/>
|
||
<span>{Math.round(elevationOpacity * 100)}%</span>
|
||
</div>
|
||
)}
|
||
</div>
|
||
```
|
||
|
||
**In Map.tsx:**
|
||
|
||
```tsx
|
||
{showElevation && (
|
||
<ElevationLayer
|
||
opacity={elevationOpacity}
|
||
bounds={mapBounds}
|
||
/>
|
||
)}
|
||
```
|
||
|
||
---
|
||
|
||
## 📁 Files to Modify
|
||
|
||
| File | Changes |
|
||
|------|---------|
|
||
| `backend/app/services/parallel_coverage_service.py` | Add `_kill_worker_processes()`, cleanup in finally block |
|
||
| `backend/app/api/routes/coverage.py` | Force cleanup on timeout |
|
||
| `backend/app/api/routes/system.py` | Add `/shutdown` endpoint |
|
||
| `backend/requirements.txt` | Add `psutil>=5.9.0` |
|
||
| `desktop/main.js` | Replace `killBackend()` with `killAllBackendProcesses()`, add graceful shutdown |
|
||
| `backend/app/services/dominant_path_service.py` | Add `_filter_buildings_by_distance()`, limit to 100 buildings |
|
||
| `backend/app/services/gpu_service.py` | Add install instructions in log |
|
||
| `frontend/src/App.tsx` | Add elevation toggle if missing |
|
||
| `frontend/src/components/map/Map.tsx` | Conditional elevation layer |
|
||
|
||
---
|
||
|
||
## 🧪 Testing
|
||
|
||
### Test 1: Memory Cleanup
|
||
```bash
|
||
# Start app
|
||
# Run Detailed preset (will timeout)
|
||
# Check Task Manager — should be only 1 rfcp-server.exe after timeout
|
||
# RAM should return to normal (~200MB)
|
||
```
|
||
|
||
### Test 2: App Close
|
||
```bash
|
||
# Start RFCP via installer (not debug)
|
||
# Click X
|
||
# Check Task Manager — NO rfcp-server.exe should remain
|
||
```
|
||
|
||
### Test 3: Dominant Path Speed
|
||
```bash
|
||
# Run test-coverage.bat
|
||
# Detailed preset should complete in <120s (was 300s timeout)
|
||
# Log should show "Filtered: 100 line, 100 refl"
|
||
```
|
||
|
||
### Test 4: Elevation Toggle
|
||
```bash
|
||
# Open app
|
||
# Find elevation checkbox in settings
|
||
# Toggle on/off — layer should appear/disappear
|
||
# Adjust opacity — should change transparency
|
||
```
|
||
|
||
---
|
||
|
||
## ✅ Success Criteria
|
||
|
||
- [ ] After timeout: only 1 rfcp-server.exe, RAM < 500MB
|
||
- [ ] After close (X): 0 rfcp-server.exe processes
|
||
- [ ] Detailed preset: completes in <120s (not timeout)
|
||
- [ ] Detailed preset: ~50ms/point (not 350ms)
|
||
- [ ] Elevation layer: toggleable on/off
|
||
- [ ] GPU message: clear install instructions in console
|
||
|
||
---
|
||
|
||
## 📈 Expected Performance After Fixes
|
||
|
||
| Preset | Before | After | Improvement |
|
||
|--------|--------|-------|-------------|
|
||
| Fast | 0.03s | 0.03s | — |
|
||
| Standard | 37s | 35s | 1.1x |
|
||
| Detailed | 300s (timeout) | ~90s | 3x+ |
|
||
|
||
---
|
||
|
||
## 🔜 Next Steps
|
||
|
||
After 2.4.1 fixes:
|
||
- Phase 2.5: Fun facts loading screen
|
||
- Phase 2.5: LOS visualization (rays showing blocked paths)
|
||
- Phase 2.6: Multi-site support improvements
|