# RFCP Phase 2.4.1: Critical Fixes

**Date:** February 1, 2025
**Type:** Bug Fixes + Performance
**Priority:** CRITICAL
**Depends on:** Phase 2.4

---

## 🎯 Goals

1. Fix memory leak — worker processes not terminating
2. Fix app close — the X button does not kill the backend
3. Optimize dominant path — 600 buildings per point is far too many
4. Fix GPU detection in the PyInstaller build

---

## 🔴 Bug 2.4.1a: Memory Leak — Workers Not Terminated

**Symptoms:**
- After the timeout (300 s), worker processes stay running
- Task Manager shows 8× rfcp-server.exe using 7.8 GB RAM
- CPU stays loaded after the calculation is cancelled
- Only a manual kill or reboot frees the resources

**Root cause:** ProcessPoolExecutor workers ignore the cancellation token — they are separate processes that don't share memory with the main process.

**File:** `backend/app/services/parallel_coverage_service.py`

**Fix:**

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed

import psutil


def _kill_worker_processes():
    """Kill all child processes of the current process."""
    current = psutil.Process(os.getpid())
    children = current.children(recursive=True)
    for child in children:
        try:
            child.terminate()
        except psutil.NoSuchProcess:
            pass
    # Wait briefly, then force-kill survivors
    gone, alive = psutil.wait_procs(children, timeout=3)
    for p in alive:
        try:
            p.kill()
        except psutil.NoSuchProcess:
            pass
    return len(children)


async def _calculate_with_process_pool(..., cancel_token=None):
    """ProcessPool calculation with proper cleanup."""
    pool = None
    results = []
    try:
        pool = ProcessPoolExecutor(max_workers=num_workers)
        futures = [pool.submit(_process_chunk, chunk, ...)
                   for chunk in chunks]
        for future in as_completed(futures):
            if cancel_token and cancel_token.is_cancelled():
                _clog("Cancellation detected — terminating pool")
                break
            result = future.result(timeout=1)
            results.extend(result)
    except Exception as e:
        _clog(f"ProcessPool error: {e}")
    finally:
        # CRITICAL: always clean up
        if pool:
            pool.shutdown(wait=False, cancel_futures=True)
        # Kill any orphaned workers
        killed = _kill_worker_processes()
        if killed > 0:
            _clog(f"Killed {killed} orphaned worker processes")
    return results
```

**Also add cleanup on timeout in coverage.py:**

```python
@router.post("/calculate")
async def calculate_coverage(request: CoverageRequest):
    cancel_token = CancellationToken()
    try:
        result = await asyncio.wait_for(
            coverage_service.calculate_coverage(
                sites=request.sites,
                settings=request.settings,
                cancel_token=cancel_token
            ),
            timeout=300
        )
        return result
    except asyncio.TimeoutError:
        cancel_token.cancel()
        # Force cleanup
        from app.services.parallel_coverage_service import _kill_worker_processes
        killed = _kill_worker_processes()
        raise HTTPException(
            status_code=408,
            detail=f"Calculation timeout (5 min). Cleaned up {killed} workers."
        )
```

**Add psutil to requirements.txt:**

```
psutil>=5.9.0
```

---

## 🔴 Bug 2.4.1b: App Close Not Working

**Symptoms:**
- Clicking X closes the window but the processes stay
- Multiple rfcp-server.exe entries in Task Manager
- The user has to End Task manually

**Root cause:**
- Electron's killBackend() uses a PID that may be wrong
- Child processes (workers) are not killed
- taskkill may fail silently

**File:** `desktop/main.js`

**Fix — nuclear option (kill ALL rfcp-server.exe):**

```javascript
const { execSync } = require('child_process');

function killAllBackendProcesses() {
  console.log('[KILL] Killing all rfcp-server processes...');
  if (process.platform === 'win32') {
    try {
      // Kill ALL rfcp-server.exe processes
      execSync('taskkill /F /IM rfcp-server.exe /T', {
        stdio: 'ignore',
        timeout: 5000
      });
      console.log('[KILL] taskkill completed');
    } catch (e) {
      // An error means no processes were found — that's OK
      console.log('[KILL] No rfcp-server processes found or already killed');
    }
  } else {
    // Unix: pkill
    try {
      execSync('pkill -9 -f rfcp-server', { stdio: 'ignore', timeout: 5000 });
    } catch (e) {
      console.log('[KILL] No rfcp-server processes found');
    }
  }
}

// Replace killBackend() calls with killAllBackendProcesses()

// In the close handler:
mainWindow.on('close', (event) => {
  console.log('[CLOSE] Window close event');
  killAllBackendProcesses();
});

// In the app quit handlers:
app.on('before-quit', () => {
  console.log('[QUIT] before-quit');
  killAllBackendProcesses();
});

app.on('will-quit', () => {
  console.log('[QUIT] will-quit');
  killAllBackendProcesses();
});

// Last resort:
process.on('exit', () => {
  console.log('[EXIT] process.exit');
  killAllBackendProcesses();
});

// Also add SIGINT/SIGTERM handlers:
process.on('SIGINT', () => {
  console.log('[SIGNAL] SIGINT received');
  killAllBackendProcesses();
  process.exit(0);
});

process.on('SIGTERM', () => {
  console.log('[SIGNAL] SIGTERM received');
  killAllBackendProcesses();
  process.exit(0);
});
```

**Also add a cleanup endpoint to the backend:**

```python
# backend/app/api/routes/system.py
import asyncio
import os


@router.post("/shutdown")
async def shutdown():
    """Graceful shutdown endpoint."""
    from app.services.parallel_coverage_service import _kill_worker_processes
    killed = _kill_worker_processes()
    # Schedule shutdown after the response is sent
    asyncio.get_running_loop().call_later(0.5, lambda: os._exit(0))
    return {"status": "shutting down", "workers_killed": killed}
```

**Call shutdown from Electron before killing:**

```javascript
async function gracefulShutdown() {
  console.log('[SHUTDOWN] Requesting graceful shutdown...');
  try {
    // fetch has no `timeout` option — use AbortSignal.timeout() instead
    await fetch('http://127.0.0.1:8888/api/system/shutdown', {
      method: 'POST',
      signal: AbortSignal.timeout(2000)
    });
    console.log('[SHUTDOWN] Backend acknowledged');
  } catch (e) {
    console.log('[SHUTDOWN] Backend did not respond, force killing');
  }
  // Wait a moment, then force kill
  await new Promise(r => setTimeout(r, 500));
  killAllBackendProcesses();
}
```

---

## 🟡 Bug 2.4.1c: Dominant Path Too Slow

**Symptoms:**
- Detailed preset: 351 ms/point (should be <50 ms)
- Log shows: `line_bldgs=646, refl_bldgs=302`
- 600+ buildings checked per point is too slow

**Root cause:** The spatial index returns all buildings on the path, with no distance limit.

**File:** `backend/app/services/dominant_path_service.py`

**Fix — limit buildings by distance:**

```python
# At the start of find_dominant_paths() or equivalent:
MAX_BUILDINGS_FOR_REFLECTION = 100
MAX_DISTANCE_FROM_PATH = 500  # meters


def _filter_buildings_by_distance(buildings, tx_point, rx_point,
                                  max_count=100, max_distance=500):
    """
    Filter buildings to only those close to the TX-RX path.
    Sort by distance to the path midpoint, take the top N.
    """
    if len(buildings) <= max_count:
        return buildings

    # Calculate the path midpoint
    mid_lat = (tx_point[0] + rx_point[0]) / 2
    mid_lon = (tx_point[1] + rx_point[1]) / 2

    # Squared distance to the midpoint for each building
    def distance_to_midpoint(building):
        blat = building.get('centroid_lat', building.get('lat', mid_lat))
        blon = building.get('centroid_lon', building.get('lon', mid_lon))
        # Simple flat-earth Euclidean approximation (fast)
        dlat = (blat - mid_lat) * 111000
        dlon = (blon - mid_lon) * 111000 * 0.7  # rough cos(latitude) correction
        return dlat * dlat + dlon * dlon  # squared distance, no sqrt needed

    # Sort by distance
    buildings_with_dist = [(b, distance_to_midpoint(b)) for b in buildings]
    buildings_with_dist.sort(key=lambda x: x[1])

    # Filter by max distance (squared)
    max_dist_sq = max_distance * max_distance
    filtered = [b for b, d in buildings_with_dist if d <= max_dist_sq]

    # Take the top N
    return filtered[:max_count]


# In the main calculation:
def calculate_dominant_path(tx, rx, buildings, spatial_idx, ...):
    # Get buildings from the spatial index
    line_buildings = spatial_idx.query_line(tx.lat, tx.lon, rx.lat, rx.lon)
    original_count = len(line_buildings)

    # FILTER to reduce the count
    line_buildings = _filter_buildings_by_distance(
        line_buildings,
        (tx.lat, tx.lon), (rx.lat, rx.lon),
        max_count=MAX_BUILDINGS_FOR_REFLECTION,
        max_distance=MAX_DISTANCE_FROM_PATH
    )

    # Same for reflection buildings
    mid_lat = (tx.lat + rx.lat) / 2
    mid_lon = (tx.lon + rx.lon) / 2
    refl_buildings = spatial_idx.query_point(mid_lat, mid_lon, buffer_cells=5)
    refl_buildings = _filter_buildings_by_distance(
        refl_buildings,
        (tx.lat, tx.lon), (rx.lat, rx.lon),
        max_count=MAX_BUILDINGS_FOR_REFLECTION,
        max_distance=MAX_DISTANCE_FROM_PATH
    )

    _clog(f"Filtered: {len(line_buildings)} line, {len(refl_buildings)} refl "
          f"(from {original_count})")

    # Continue with calculation...
```

**Expected improvement:**
- Before: 600-700 buildings → 351 ms/point
- After: 100 buildings → ~50 ms/point
- **7× speedup** for the Detailed preset

---

## 🟡 Bug 2.4.1d: GPU Not Detected in PyInstaller

**Symptoms:**

```json
"gpu": {"available": false, "name": null, "memory_mb": null}
```

- RTX 4060 present but not detected
- CuPy not bundled in the exe

**Root cause:** CuPy has complex CUDA dependencies that PyInstaller doesn't auto-detect.

**Option A: Document manual CuPy install (RECOMMENDED for now)**

```python
# backend/app/services/gpu_service.py
# At module level, add a clear message:

GPU_INSTALL_INSTRUCTIONS = """
GPU acceleration requires manual CuPy installation:

1. Check your CUDA version:
   nvidia-smi

2. Install matching CuPy:
   # For CUDA 12.x:
   pip install cupy-cuda12x
   # For CUDA 11.x:
   pip install cupy-cuda11x

3. Restart RFCP

Note: GPU is optional. CPU calculations work fine.
"""

try:
    import cupy as cp
    # ... detection code ...
except ImportError:
    print("[GPU] CuPy not installed — using CPU/NumPy")
    print("[GPU] To enable GPU acceleration:")
    print("[GPU]   pip install cupy-cuda12x")
```

**Option B: Add CuPy to PyInstaller (complex, large file size)**

If we want CuPy bundled, add to `installer/rfcp-server.spec`:

```python
hiddenimports=[
    # ... existing ...
    'cupy',
    'cupy._core',
    'cupy._core._kernel',
    'cupy.cuda',
    'cupy.cuda.runtime',
    'cupy.cuda.driver',
    # Many more cupy submodules...
],
# Also need to include CUDA DLLs
binaries=[
    # This gets complicated — need CUDA toolkit DLLs
],
```

**Recommendation:** Start with Option A (manual install); add Option B later if needed. GPU is nice-to-have, not critical.

---

## 🟢 Feature 2.4.1e: Elevation Layer Toggle

**Current state:**
- The elevation layer exists but is always visible as a green overlay
- No toggle in the UI

**File:** `frontend/src/App.tsx` or the settings panel

**Fix:**

```tsx
// In the settings/layer controls section:
// NOTE: reconstructed sketch — the original JSX markup was lost;
// the setter name (setShowElevation) and class name are assumed.
<div className="layer-controls">
  <h4>Map Layers</h4>
  <label>
    <input
      type="checkbox"
      checked={showElevation}
      onChange={(e) => setShowElevation(e.target.checked)}
    />
    Elevation
  </label>
  {showElevation && (
    <label>
      Opacity
      <input
        type="range"
        min="0"
        max="1"
        step="0.05"
        value={elevationOpacity}
        onChange={(e) => setElevationOpacity(parseFloat(e.target.value))}
      />
      <span>{Math.round(elevationOpacity * 100)}%</span>
    </label>
  )}
</div>
```

**In Map.tsx:**

```tsx
{showElevation && (
  // Elevation overlay — component name assumed; the original tag was lost
  <ElevationLayer opacity={elevationOpacity} />
)}
```

---

## 📁 Files to Modify

| File | Changes |
|------|---------|
| `backend/app/services/parallel_coverage_service.py` | Add `_kill_worker_processes()`, cleanup in finally block |
| `backend/app/api/routes/coverage.py` | Force cleanup on timeout |
| `backend/app/api/routes/system.py` | Add `/shutdown` endpoint |
| `backend/requirements.txt` | Add `psutil>=5.9.0` |
| `desktop/main.js` | Replace `killBackend()` with `killAllBackendProcesses()`, add graceful shutdown |
| `backend/app/services/dominant_path_service.py` | Add `_filter_buildings_by_distance()`, limit to 100 buildings |
| `backend/app/services/gpu_service.py` | Add install instructions in log |
| `frontend/src/App.tsx` | Add elevation toggle if missing |
| `frontend/src/components/map/Map.tsx` | Conditional elevation layer |

---

## 🧪 Testing

### Test 1: Memory Cleanup

```bash
# Start the app
# Run the Detailed preset (will time out)
# Check Task Manager — only 1 rfcp-server.exe should remain after the timeout
# RAM should return to normal (~200 MB)
```

### Test 2: App Close

```bash
# Start RFCP via the installer build (not debug)
# Click X
# Check Task Manager — NO rfcp-server.exe should remain
```

### Test 3: Dominant Path Speed

```bash
# Run test-coverage.bat
# The Detailed preset should complete in <120 s (previously hit the 300 s timeout)
# The log should show "Filtered: 100 line, 100 refl"
```

### Test 4: Elevation Toggle

```bash
# Open the app
# Find the elevation checkbox in settings
# Toggle on/off — the layer should appear/disappear
# Adjust opacity — transparency should change
```

---

## ✅ Success Criteria

- [ ] After timeout: only 1 rfcp-server.exe, RAM < 500 MB
- [ ] After close (X): 0 rfcp-server.exe processes
- [ ] Detailed preset: completes in <120 s (no timeout)
- [ ] Detailed preset: ~50 ms/point (not 350 ms)
- [ ] Elevation layer: toggleable on/off
- [ ] GPU message: clear install instructions in console

---

## 📈 Expected Performance After Fixes

| Preset | Before | After | Improvement |
|--------|--------|-------|-------------|
| Fast | 0.03s | 0.03s | — |
| Standard | 37s | 35s | 1.1× |
| Detailed | 300s (timeout) | ~90s | 3×+ |

---

## 🔜 Next Steps

After the 2.4.1 fixes:
- Phase 2.5: Fun facts loading screen
- Phase 2.5: LOS visualization (rays showing blocked paths)
- Phase 2.6: Multi-site support improvements
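
The snippets in 2.4.1a call `cancel_token.is_cancelled()` and `cancel_token.cancel()`, but the token class itself is defined elsewhere in the codebase. For reference, a minimal thread-safe sketch of the assumed interface (only the two methods used above; the real class may differ):

```python
import threading


class CancellationToken:
    """Minimal cooperative cancellation flag (sketch of the assumed interface)."""

    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        # Signal all consumers that the calculation should stop
        self._event.set()

    def is_cancelled(self):
        return self._event.is_set()


# Usage mirrors the coverage.py snippet in 2.4.1a
token = CancellationToken()
assert not token.is_cancelled()
token.cancel()
assert token.is_cancelled()
```

Note that an `Event` only keeps the flag consistent across threads in one process; as 2.4.1a explains, it does not propagate into ProcessPool workers, which is exactly why `_kill_worker_processes()` is needed.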