@mytec: iter2.4.1 start

2026-02-01 11:14:04 +02:00
parent 5488633e43
commit fa7378cf3f
4 changed files with 552 additions and 0 deletions

# RFCP Phase 2.4.1: Critical Fixes
**Date:** February 1, 2026
**Type:** Bug Fixes + Performance
**Priority:** CRITICAL
**Depends on:** Phase 2.4
---
## 🎯 Goals
1. Fix memory leak — worker processes not terminating
2. Fix app close — the X button doesn't kill the backend
3. Optimize dominant path — 600 buildings per point is too many
4. Fix GPU detection in PyInstaller build
---
## 🔴 Bug 2.4.1a: Memory Leak — Workers Not Terminated
**Symptoms:**
- After timeout (300s), worker processes stay running
- Task Manager shows 8× rfcp-server.exe using 7.8 GB RAM
- CPU stays loaded after calculation cancelled
- Only manual kill or reboot frees resources
**Root cause:**
ProcessPoolExecutor workers ignore cancellation token — they're separate processes that don't share memory with main process.
**File:** `backend/app/services/parallel_coverage_service.py`
**Fix:**
```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed

import psutil

def _kill_worker_processes():
    """Kill all child processes of the current process."""
    current = psutil.Process(os.getpid())
    children = current.children(recursive=True)
    for child in children:
        try:
            child.terminate()
        except psutil.NoSuchProcess:
            pass
    # Wait briefly, then force-kill survivors
    gone, alive = psutil.wait_procs(children, timeout=3)
    for p in alive:
        try:
            p.kill()
        except psutil.NoSuchProcess:
            pass
    return len(children)

async def _calculate_with_process_pool(..., cancel_token=None):
    """ProcessPool calculation with proper cleanup."""
    pool = None
    results = []
    try:
        pool = ProcessPoolExecutor(max_workers=num_workers)
        futures = [pool.submit(_process_chunk, chunk, ...) for chunk in chunks]
        for future in as_completed(futures):
            if cancel_token and cancel_token.is_cancelled():
                _clog("Cancellation detected — terminating pool")
                break
            result = future.result(timeout=1)
            results.extend(result)
    except Exception as e:
        _clog(f"ProcessPool error: {e}")
    finally:
        # CRITICAL: always clean up
        if pool:
            pool.shutdown(wait=False, cancel_futures=True)
        # Kill any orphaned workers
        killed = _kill_worker_processes()
        if killed > 0:
            _clog(f"Killed {killed} orphaned worker processes")
    return results
```
**Also add cleanup on timeout in coverage.py:**
```python
import asyncio

from fastapi import HTTPException

@router.post("/calculate")
async def calculate_coverage(request: CoverageRequest):
    cancel_token = CancellationToken()
    try:
        result = await asyncio.wait_for(
            coverage_service.calculate_coverage(
                sites=request.sites,
                settings=request.settings,
                cancel_token=cancel_token
            ),
            timeout=300
        )
        return result
    except asyncio.TimeoutError:
        cancel_token.cancel()
        # Force cleanup of orphaned workers
        from app.services.parallel_coverage_service import _kill_worker_processes
        killed = _kill_worker_processes()
        raise HTTPException(
            status_code=408,
            detail=f"Calculation timeout (5 min). Cleaned up {killed} workers."
        )
```
**Add psutil to requirements.txt:**
```
psutil>=5.9.0
```
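`CancellationToken` comes from Phase 2.4 and isn't shown here; if it doesn't already exist in the codebase, a minimal thread-safe sketch (name and API assumed from the usage above) would be:

```python
import threading

class CancellationToken:
    """Minimal cooperative cancellation flag (hypothetical sketch —
    matches the cancel()/is_cancelled() usage in the handlers above)."""

    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    def is_cancelled(self):
        return self._event.is_set()

token = CancellationToken()
print(token.is_cancelled())  # False
token.cancel()
print(token.is_cancelled())  # True
```

Note this flag only works within one process; ProcessPool workers can't see it, which is exactly why the `_kill_worker_processes()` fallback above is needed.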
---
## 🔴 Bug 2.4.1b: App Close Not Working
**Symptoms:**
- Clicking X closes window but processes stay
- Multiple rfcp-server.exe in Task Manager
- Have to manually End Task
**Root cause:**
- Electron's killBackend() uses PID that may be wrong
- Child processes (workers) not killed
- taskkill may fail silently
**File:** `desktop/main.js`
**Fix — nuclear option (kill ALL rfcp-server.exe):**
```javascript
const { execSync } = require('child_process');

function killAllBackendProcesses() {
  console.log('[KILL] Killing all rfcp-server processes...');
  if (process.platform === 'win32') {
    try {
      // Kill ALL rfcp-server.exe processes (and their child trees)
      execSync('taskkill /F /IM rfcp-server.exe /T', {
        stdio: 'ignore',
        timeout: 5000
      });
      console.log('[KILL] taskkill completed');
    } catch (e) {
      // A non-zero exit usually means no matching processes — that's OK
      console.log('[KILL] No rfcp-server processes found or already killed');
    }
  } else {
    // Unix: pkill
    try {
      execSync('pkill -9 -f rfcp-server', {
        stdio: 'ignore',
        timeout: 5000
      });
    } catch (e) {
      console.log('[KILL] No rfcp-server processes found');
    }
  }
}

// Replace killBackend() calls with killAllBackendProcesses()

// In the close handler:
mainWindow.on('close', (event) => {
  console.log('[CLOSE] Window close event');
  killAllBackendProcesses();
});

// In the app quit handlers:
app.on('before-quit', () => {
  console.log('[QUIT] before-quit');
  killAllBackendProcesses();
});
app.on('will-quit', () => {
  console.log('[QUIT] will-quit');
  killAllBackendProcesses();
});

// Last resort (must stay synchronous — async work won't run in 'exit'):
process.on('exit', () => {
  console.log('[EXIT] process.exit');
  killAllBackendProcesses();
});

// Also add SIGINT/SIGTERM handlers:
process.on('SIGINT', () => {
  console.log('[SIGNAL] SIGINT received');
  killAllBackendProcesses();
  process.exit(0);
});
process.on('SIGTERM', () => {
  console.log('[SIGNAL] SIGTERM received');
  killAllBackendProcesses();
  process.exit(0);
});
```
**Also add cleanup endpoint to backend:**
```python
# backend/app/api/routes/system.py
import asyncio
import os

@router.post("/shutdown")
async def shutdown():
    """Graceful shutdown endpoint."""
    from app.services.parallel_coverage_service import _kill_worker_processes
    killed = _kill_worker_processes()
    # Schedule a hard exit shortly after the response is sent
    asyncio.get_running_loop().call_later(0.5, lambda: os._exit(0))
    return {"status": "shutting down", "workers_killed": killed}
```
**Call shutdown from Electron before killing:**
```javascript
async function gracefulShutdown() {
  console.log('[SHUTDOWN] Requesting graceful shutdown...');
  try {
    // Note: fetch() has no `timeout` option — use AbortSignal.timeout() instead
    await fetch('http://127.0.0.1:8888/api/system/shutdown', {
      method: 'POST',
      signal: AbortSignal.timeout(2000)
    });
    console.log('[SHUTDOWN] Backend acknowledged');
  } catch (e) {
    console.log('[SHUTDOWN] Backend did not respond, force killing');
  }
  // Wait a moment, then force kill
  await new Promise(r => setTimeout(r, 500));
  killAllBackendProcesses();
}
```
---
## 🟡 Bug 2.4.1c: Dominant Path Too Slow
**Symptoms:**
- Detailed preset: 351ms/point (should be <50ms)
- Log shows: `line_bldgs=646, refl_bldgs=302`
- 600+ buildings checked per point = too slow
**Root cause:**
Spatial index returns all buildings on path, no distance limit.
**File:** `backend/app/services/dominant_path_service.py`
**Fix — limit buildings by distance:**
```python
# At the start of find_dominant_paths() or equivalent:
MAX_BUILDINGS_FOR_REFLECTION = 100
MAX_DISTANCE_FROM_PATH = 500  # meters

def _filter_buildings_by_distance(buildings, tx_point, rx_point,
                                  max_count=100, max_distance=500):
    """
    Filter buildings to only those close to the TX-RX path.
    Sort by distance to the path midpoint, take the top N.
    """
    if len(buildings) <= max_count:
        return buildings
    # Midpoint of the TX-RX path
    mid_lat = (tx_point[0] + rx_point[0]) / 2
    mid_lon = (tx_point[1] + rx_point[1]) / 2

    # Squared distance from each building to the midpoint
    def distance_to_midpoint(building):
        blat = building.get('centroid_lat', building.get('lat', mid_lat))
        blon = building.get('centroid_lon', building.get('lon', mid_lon))
        # Simple flat-earth approximation (fast)
        dlat = (blat - mid_lat) * 111000
        dlon = (blon - mid_lon) * 111000 * 0.7  # rough cos(latitude) correction
        return dlat * dlat + dlon * dlon  # squared distance, no sqrt needed

    # Sort by distance
    buildings_with_dist = [(b, distance_to_midpoint(b)) for b in buildings]
    buildings_with_dist.sort(key=lambda x: x[1])
    # Drop anything beyond max_distance (compare squared values)
    max_dist_sq = max_distance * max_distance
    filtered = [b for b, d in buildings_with_dist if d <= max_dist_sq]
    # Take the top N
    return filtered[:max_count]

# In the main calculation:
def calculate_dominant_path(tx, rx, buildings, spatial_idx, ...):
    # Get buildings from the spatial index
    line_buildings = spatial_idx.query_line(tx.lat, tx.lon, rx.lat, rx.lon)
    original_count = len(line_buildings)
    # FILTER to reduce the count
    line_buildings = _filter_buildings_by_distance(
        line_buildings,
        (tx.lat, tx.lon),
        (rx.lat, rx.lon),
        max_count=MAX_BUILDINGS_FOR_REFLECTION,
        max_distance=MAX_DISTANCE_FROM_PATH
    )
    # Same for reflection buildings
    mid_lat = (tx.lat + rx.lat) / 2
    mid_lon = (tx.lon + rx.lon) / 2
    refl_buildings = spatial_idx.query_point(mid_lat, mid_lon, buffer_cells=5)
    original_count += len(refl_buildings)
    refl_buildings = _filter_buildings_by_distance(
        refl_buildings,
        (tx.lat, tx.lon),
        (rx.lat, rx.lon),
        max_count=MAX_BUILDINGS_FOR_REFLECTION,
        max_distance=MAX_DISTANCE_FROM_PATH
    )
    _clog(f"Filtered: {len(line_buildings)} line, {len(refl_buildings)} refl (from {original_count})")
    # Continue with the calculation...
```
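The fixed `0.7` longitude factor is a stand-in for cos(latitude), so the flat-earth distance is only rough, but within the 500 m filter radius at mid-latitudes the error stays at a few percent, which is fine for a coarse pre-filter. A quick check against haversine (coordinates are arbitrary test values near 45°N):

```python
import math

def approx_sq_dist(lat1, lon1, lat2, lon2):
    # Same flat-earth approximation as _filter_buildings_by_distance
    dlat = (lat2 - lat1) * 111000
    dlon = (lon2 - lon1) * 111000 * 0.7  # rough cos(latitude) correction
    return dlat * dlat + dlon * dlon

def haversine_m(lat1, lon1, lat2, lon2):
    # Reference great-circle distance in meters
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# Two points at 45°N, about 500 m apart in longitude
d_approx = math.sqrt(approx_sq_dist(45.0, 30.0, 45.0, 30.00637))
d_true = haversine_m(45.0, 30.0, 45.0, 30.00637)
print(round(d_approx), round(d_true))  # both near 500 m, within a few percent
```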
**Expected improvement:**
- Before: 600-700 buildings → 351ms/point
- After: 100 buildings → ~50ms/point
- **7x speedup** for Detailed preset
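The ~50 ms figure is a proportional estimate from the logged numbers, assuming per-point cost scales linearly with the buildings checked:

```python
# Back-of-envelope check of the speedup estimate
ms_per_point_before = 351   # measured, Detailed preset
buildings_before = 646      # from the log line: line_bldgs=646
buildings_after = 100       # MAX_BUILDINGS_FOR_REFLECTION

ms_per_building = ms_per_point_before / buildings_before  # ≈ 0.54 ms
print(round(ms_per_building * buildings_after))  # ≈ 54 ms/point
```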
---
## 🟡 Bug 2.4.1d: GPU Not Detected in PyInstaller
**Symptoms:**
```json
"gpu": {"available": false, "name": null, "memory_mb": null}
```
- RTX 4060 present but not detected
- CuPy not bundled in exe
**Root cause:**
CuPy has complex CUDA dependencies that PyInstaller doesn't auto-detect.
**Option A: Document manual CuPy install (RECOMMENDED for now)**
```python
# backend/app/services/gpu_service.py
# At module level, add a clear message:
GPU_INSTALL_INSTRUCTIONS = """
GPU acceleration requires manual CuPy installation:

1. Check your CUDA version:
   nvidia-smi

2. Install the matching CuPy:
   # For CUDA 12.x:
   pip install cupy-cuda12x
   # For CUDA 11.x:
   pip install cupy-cuda11x

3. Restart RFCP

Note: GPU is optional. CPU calculations work fine.
"""

try:
    import cupy as cp
    # ... detection code ...
except ImportError:
    print("[GPU] CuPy not installed — using CPU/NumPy")
    print("[GPU] To enable GPU acceleration:")
    print("[GPU]   pip install cupy-cuda12x")
```
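With Option A, the system-info endpoint could still report the GPU even when CuPy isn't installed, by shelling out to `nvidia-smi` directly. A hedged sketch (function name and the response shape are assumptions based on the `"gpu"` JSON above; the `nvidia-smi --query-gpu` flags are real):

```python
import subprocess

def detect_gpu_via_nvidia_smi():
    """Best-effort GPU detection without CuPy, by querying nvidia-smi.
    Hypothetical helper — returns the same shape as the "gpu" info field."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=5,
        )
        if out.returncode == 0 and out.stdout.strip():
            # CSV output, e.g. "NVIDIA GeForce RTX 4060, 8188"
            name, mem = out.stdout.strip().splitlines()[0].split(", ")
            return {"available": True, "name": name, "memory_mb": int(mem)}
    except (FileNotFoundError, subprocess.TimeoutExpired, ValueError):
        pass  # no driver installed, hung tool, or unparseable output
    return {"available": False, "name": None, "memory_mb": None}

print(detect_gpu_via_nvidia_smi())
```

This only reports presence; actual GPU computation would still need CuPy installed per the instructions above.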
**Option B: Add CuPy to PyInstaller (complex, large file size)**
If we want CuPy bundled, add to `installer/rfcp-server.spec`:
```python
hiddenimports=[
    # ... existing ...
    'cupy',
    'cupy._core',
    'cupy._core._kernel',
    'cupy.cuda',
    'cupy.cuda.runtime',
    'cupy.cuda.driver',
    # Many more cupy submodules...
],
# Also need to include CUDA DLLs
binaries=[
    # This gets complicated — need CUDA toolkit DLLs
],
```
**Recommendation:** Start with Option A (manual install), add Option B later if needed. GPU is nice-to-have, not critical.
---
## 🟢 Feature 2.4.1e: Elevation Layer Toggle
**Current state:**
- Elevation layer exists but always visible as green overlay
- No toggle in UI
**File:** `frontend/src/App.tsx` or settings panel
**Fix:**
```tsx
// In the settings/layer controls section:
<div className="layer-section">
  <h4>Map Layers</h4>
  <label className="toggle-row">
    <input
      type="checkbox"
      checked={showElevation}
      onChange={(e) => setShowElevation(e.target.checked)}
    />
    <span>Elevation Colors</span>
  </label>
  {showElevation && (
    <div className="slider-row">
      <label>Opacity</label>
      <input
        type="range"
        min="0.2"
        max="0.8"
        step="0.1"
        value={elevationOpacity}
        onChange={(e) => setElevationOpacity(parseFloat(e.target.value))}
      />
      <span>{Math.round(elevationOpacity * 100)}%</span>
    </div>
  )}
</div>
```
**In Map.tsx:**
```tsx
{showElevation && (
  <ElevationLayer
    opacity={elevationOpacity}
    bounds={mapBounds}
  />
)}
```
---
## 📁 Files to Modify
| File | Changes |
|------|---------|
| `backend/app/services/parallel_coverage_service.py` | Add `_kill_worker_processes()`, cleanup in finally block |
| `backend/app/api/routes/coverage.py` | Force cleanup on timeout |
| `backend/app/api/routes/system.py` | Add `/shutdown` endpoint |
| `backend/requirements.txt` | Add `psutil>=5.9.0` |
| `desktop/main.js` | Replace `killBackend()` with `killAllBackendProcesses()`, add graceful shutdown |
| `backend/app/services/dominant_path_service.py` | Add `_filter_buildings_by_distance()`, limit to 100 buildings |
| `backend/app/services/gpu_service.py` | Add install instructions in log |
| `frontend/src/App.tsx` | Add elevation toggle if missing |
| `frontend/src/components/map/Map.tsx` | Conditional elevation layer |
---
## 🧪 Testing
### Test 1: Memory Cleanup
```bash
# Start app
# Run Detailed preset (will timeout)
# Check Task Manager — should be only 1 rfcp-server.exe after timeout
# RAM should return to normal (~200MB)
```
### Test 2: App Close
```bash
# Start RFCP via installer (not debug)
# Click X
# Check Task Manager — NO rfcp-server.exe should remain
```
### Test 3: Dominant Path Speed
```bash
# Run test-coverage.bat
# Detailed preset should complete in <120s (was 300s timeout)
# Log should show "Filtered: 100 line, 100 refl"
```
### Test 4: Elevation Toggle
```bash
# Open app
# Find elevation checkbox in settings
# Toggle on/off — layer should appear/disappear
# Adjust opacity — should change transparency
```
---
## ✅ Success Criteria
- [ ] After timeout: only 1 rfcp-server.exe, RAM < 500MB
- [ ] After close (X): 0 rfcp-server.exe processes
- [ ] Detailed preset: completes in <120s (not timeout)
- [ ] Detailed preset: ~50ms/point (not 350ms)
- [ ] Elevation layer: toggleable on/off
- [ ] GPU message: clear install instructions in console
---
## 📈 Expected Performance After Fixes
| Preset | Before | After | Improvement |
|--------|--------|-------|-------------|
| Fast | 0.03s | 0.03s | — |
| Standard | 37s | 35s | 1.1x |
| Detailed | 300s (timeout) | ~90s | 3x+ |
---
## 🔜 Next Steps
After 2.4.1 fixes:
- Phase 2.5: Fun facts loading screen
- Phase 2.5: LOS visualization (rays showing blocked paths)
- Phase 2.6: Multi-site support improvements
