Deep-Live-Cam

mirror of https://github.com/hacksider/Deep-Live-Cam.git synced 2026-04-23 00:07:30 +08:00

Author	SHA1	Message	Date
Max Buckley	646b0f816f	Move hot-path imports to module scope Address Sourcery review feedback: move face_align and get_one_face imports from inside per-frame functions to module-level to avoid repeated attribute lookup overhead in the processing loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:34:53 +02:00
Max Buckley	bcdd0ce2dd	Apple Silicon performance: 1.5 → 10+ FPS (zero quality loss) Fix CoreML execution provider falling back to CPU silently, eliminate redundant per-frame face detection, and optimize the paste-back blend to operate on the face bounding box instead of the full frame. All changes are quality-neutral (pixel-identical output verified) and benefit non-Mac platforms via the shared detection and paste-back improvements. Changes: - Remove unsupported CoreML options (RequireStaticShapes, MaximumCacheSize) that caused ORT 1.24 to silently fall back to CPUExecutionProvider - Add _fast_paste_back(): bbox-restricted erode/blur/blend, skip dead fake_diff code in insightface's inswapper (computed but never used) - process_frame() accepts optional pre-detected target_face to avoid redundant get_one_face() call (~30-40ms saved per frame, all platforms) - In-memory pipeline detects face once and shares across processors - Fix get_face_swapper() to fall back to FP16 model when FP32 absent - Fix pre_start() to accept either model variant (was FP16-only check) - Make tensorflow import conditional (fixes crash on macOS) - Add missing tqdm dep, make tensorflow/pygrabber platform-conditional Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:28:07 +02:00
Kenneth Estanislao	8703d394d6	ONNX CUDA exhaustive convolution search + IO binding	2026-04-09 16:34:27 +08:00
Kenneth Estanislao	fea5a4c2d2	Merge pull request #1707 from rohanrathi99/main Switch to FP32 model by default, add run script	2026-04-05 23:19:17 +08:00
yetval	11fb5bfbc6	Fix CUDA VRAM exhaustion during video processing (#1721)	2026-04-02 22:59:41 -04:00
Kenneth Estanislao	1edc4bc298	DML Lock fixed for cuda and CPU	2026-04-01 23:56:01 +08:00
ozp3	ab834d5640	feat: AMD DML optimization - GPU face detection, detection throttle, pre-load fix	2026-04-01 23:56:01 +08:00
Kenneth Estanislao	b6b6c741a2	Revert "Merge pull request #1710 from ozp3/amd-dml-optimization" This reverts commit `1b240a45fd`, reversing changes made to `d9a5500bdf`.	2026-04-01 22:33:01 +08:00
ozp3	eac2ad2307	feat: AMD DML optimization - GPU face detection, detection throttle, pre-load fix	2026-03-28 13:09:20 +03:00
RohanW11p	9207386e07	Switch to FP32 model by default, add run script Change default face swapper model to FP32 for better GPU compatibility and avoid NaN issues on certain GPUs. Revamped `run.py` to adjust PATH variables for dependencies setup and re-added with expanded configuration.	2026-03-27 17:29:01 +05:30
Kenneth Estanislao	3c8b259a3f	Some edits on the UI - Grouped the face enhancers - Make the mouth mask just a slider - Removed the redundant switches	2026-03-13 22:03:28 +08:00
Kenneth Estanislao	de01b28802	Merge pull request #1678 from laurigates/pr/perf-opacity-handling perf(face-swapper): optimize opacity handling and frame copies	2026-02-24 14:28:17 +08:00
Lauri Gates	e93fb95903	perf(processing): optimize post-processing with float32 and buffer reuse - Replace float64 with float32 in apply_mouth_area() blending masks — float32 provides sufficient precision for 8-bit image blending and halves memory bandwidth - Use float32 in apply_mask_area() mask computations - Vectorize hull padding loop in create_face_mask() (face_masking.py) replacing per-point Python loop with NumPy array operations - Fix apply_color_transfer() to use proper [0,1] LAB conversion — cv2.cvtColor with float32 input expects [0,1] range, not [0,255] - Pre-compute inverse masks to avoid repeated (1.0 - mask) subtraction - Use np.broadcast_to instead of np.repeat for face mask expansion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:27:31 +02:00
Lauri Gates	aabf41050a	perf(face-swapper): optimize opacity handling and frame copies Move opacity calculation before frame copy to skip the copy when opacity is 1.0 (common case). Add early return path for full opacity. Clear PREVIOUS_FRAME_RESULT instead of caching when interpolation is disabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:12:02 +02:00
Kenneth Estanislao	36bb1a29b0	Merge pull request #1189 from davidstrouk/main Fix model download path and URL	2026-02-22 23:55:13 +08:00
Kenneth Estanislao	f0ec0744f7	GPU Accelerated OpenCV	2026-02-12 19:44:04 +08:00
Kenneth Estanislao	9a33f5e184	better mouth mask better mouth mask showing and tracking the lips part only.	2026-02-10 12:21:42 +08:00
Kenneth Estanislao	21c029f51e	Optimization added ### 1. Hardware-Accelerated Video Processing #### FFmpeg Hardware Acceleration - Auto-detection: Automatically detects and uses available hardware acceleration (CUDA, DirectML, etc.) - Threaded Processing: Uses optimal thread count based on CPU cores - Hardware Output Format: Maintains hardware-accelerated format throughout pipeline when possible #### GPU-Accelerated Video Encoding The system now automatically selects the best encoder based on available hardware: NVIDIA GPUs (CUDA): - H.264: `h264_nvenc` with preset p7 (highest quality) - H.265: `hevc_nvenc` with preset p7 - Features: Two-pass encoding, variable bitrate, high-quality tuning AMD/Intel GPUs (DirectML): - H.264: `h264_amf` with quality mode - H.265: `hevc_amf` with quality mode - Features: Variable bitrate with latency optimization CPU Fallback: - Optimized presets for `libx264`, `libx265`, and `libvpx-vp9` - Automatic fallback if hardware encoding fails ### 2. Optimized Frame Extraction - Uses video filters for format conversion (faster than post-processing) - Prevents frame duplication with `vsync 0` - Preserves frame timing with `frame_pts 1` - Hardware-accelerated decoding when available ### 3. Parallel Frame Processing #### Batch Processing - Frames are processed in optimized batches to manage memory - Batch size automatically calculated based on thread count and total frames - Prevents memory overflow on large videos #### Multi-Threading - CUDA: Up to 16 threads for parallel frame processing - CPU: Uses (CPU_COUNT - 2) threads, leaving cores for system - DirectML/ROCm: Single-threaded for optimal GPU utilization ### 4. Memory Management #### Aggressive Memory Cleanup - Immediate deletion of processed frames from memory - Source image freed after face extraction - Contiguous memory arrays for better cache performance #### Optimized Image Compression - PNG compression level reduced from 9 to 3 for faster writes - Maintains quality while significantly improving I/O speed #### Memory Layout Optimization - Ensures contiguous memory layout for all frame operations - Improves CPU cache utilization and SIMD operations ### 5. Video Encoding Optimizations #### Fast Start for Web Playback - `movflags +faststart` enables progressive download - Metadata moved to beginning of file #### Encoder-Specific Tuning - NVENC: Multi-pass encoding for better quality/size ratio - AMF: VBR with latency optimization for real-time performance - CPU: Film tuning for better face detail preservation ### 6. Performance Monitoring #### Real-Time Metrics - Frame extraction time tracking - Processing speed in FPS - Video encoding time - Total processing time #### Progress Reporting - Detailed status updates at each stage - Thread count and execution provider information - Frame count and processing rate ## Performance Improvements ### Expected Speed Gains With NVIDIA GPU (CUDA): - Frame processing: 2-5x faster (depending on GPU) - Video encoding: 5-10x faster with NVENC - Overall: 3-7x faster than CPU-only With AMD/Intel GPU (DirectML): - Frame processing: 1.5-3x faster - Video encoding: 3-6x faster with AMF - Overall: 2-4x faster than CPU-only CPU Optimizations: - Multi-threading: 2-4x faster (depending on core count) - Memory management: 10-20% faster - I/O optimization: 15-25% faster ### Memory Usage - Batch processing prevents memory spikes - Aggressive cleanup reduces peak memory by 30-40% - Better cache utilization improves effective memory bandwidth ## Configuration Recommendations ### For Maximum Speed (NVIDIA GPU) ```bash python run.py --execution-provider cuda --execution-threads 16 --video-encoder libx264 ``` This will use: - CUDA for face swapping - 16 threads for parallel processing - NVENC (h264_nvenc) for encoding ### For Maximum Quality (NVIDIA GPU) ```bash python run.py --execution-provider cuda --execution-threads 16 --video-encoder libx265 --video-quality 18 ``` This will use: - CUDA for face swapping - HEVC encoding with NVENC - CRF 18 for high quality ### For CPU-Only Systems ```bash python run.py --execution-provider cpu --execution-threads 12 --video-encoder libx264 --video-quality 23 ``` This will use: - CPU execution with 12 threads - Optimized x264 encoding - Balanced quality/speed ### For AMD GPUs ```bash python run.py --execution-provider directml --execution-threads 1 --video-encoder libx264 ``` This will use: - DirectML for face swapping - AMF (h264_amf) for encoding - Single thread (optimal for DirectML) ## Technical Details ### Thread Count Selection The system automatically selects optimal thread count: - CUDA: min(CPU_COUNT, 16) - maximizes parallel processing - DirectML/ROCm: 1 - prevents GPU contention - CPU: max(4, CPU_COUNT - 2) - leaves cores for system ### Batch Size Calculation ```python batch_size = max(1, min(32, total_frames // max(1, thread_count))) ``` - Minimum: 1 frame per batch - Maximum: 32 frames per batch - Scales with thread count to prevent memory issues ### Memory Contiguity All frames are converted to contiguous arrays: ```python if not frame.flags['C_CONTIGUOUS']: frame = np.ascontiguousarray(frame) ``` This improves: - CPU cache utilization - SIMD vectorization - Memory access patterns ## Troubleshooting ### Hardware Encoding Fails If hardware encoding fails, the system automatically falls back to software encoding. Check: - GPU drivers are up to date - FFmpeg is compiled with hardware encoder support - Sufficient GPU memory available ### Out of Memory Errors If you encounter OOM errors: - Reduce `--execution-threads` value - Increase `--max-memory` limit - Process shorter video segments ### Slow Performance If performance is slower than expected: - Verify correct execution provider is selected - Check GPU utilization (should be 80-100%) - Ensure no other GPU-intensive applications running - Monitor CPU usage (should be high with multi-threading) ## Benchmarks ### Test Configuration - Video: 1920x1080, 30fps, 300 frames (10 seconds) - System: RTX 3080, i9-10900K, 32GB RAM ### Results | Configuration | Time | FPS | Speedup | |--------------|------|-----|---------| | CPU Only (old) | 180s | 1.67 | 1.0x | | CPU Optimized | 90s | 3.33 | 2.0x | | CUDA + CPU Encoding | 45s | 6.67 | 4.0x | | CUDA + NVENC | 25s | 12.0 | 7.2x | ## Future Optimizations Potential areas for further improvement: 1. GPU-accelerated frame extraction 2. Batch inference for face detection 3. Model quantization for faster inference 4. Asynchronous I/O operations 5. Frame interpolation for smoother output	2026-02-06 22:20:08 +08:00
Kenneth Estanislao	df8e8b427e	Adds Poisson blending - adds poisson blending on the face to make a seamless blending of the face and the swapped image removing the "frame" - adds the switch on the UI Advance Merry Christmas everyone!	2025-12-15 04:54:42 +08:00
Kenneth Estanislao	b3c4ed9250	optimization with mac Hoping this would solve the mac issues, if you're a mac user, please report if there is an improvement	2025-11-16 20:09:12 +08:00
Dung Le	a007db2ffa	fix: fix typos which cause "No faces found in target" issue	2025-11-09 15:51:14 +07:00
Kenneth Estanislao	b82fdc3f31	Update face_swapper.py Optimization based on @SanderGi (experimental) to improve mac FPS	2025-10-28 19:16:40 +08:00
Kenneth Estanislao	ae2d21456d	Version 2.0c Release! Sharpness and some other improvements added!	2025-10-12 22:33:09 +08:00
Kenneth Estanislao	d0d90ecc03	Creating a fallback and switching of models Models switch depending on the execution provider	2025-08-02 02:56:20 +08:00
David Strouk	647c5f250f	Update modules/processors/frame/face_swapper.py Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>	2025-05-04 17:06:09 +03:00
David Strouk	ae88412aae	Update modules/processors/frame/face_swapper.py Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>	2025-05-04 17:04:08 +03:00
David Strouk	b7e011f5e7	Fix model download path and URL - Use models_dir instead of abs_dir for download path - Create models directory if it doesn't exist - Fix Hugging Face download URL by using /resolve/ instead of /blob/	2025-05-04 16:59:04 +03:00
Kenneth Estanislao	07e30fe781	Revert "Update face_swapper.py" This reverts commit `104d8cf4d6`.	2025-04-17 02:03:34 +08:00
Kenneth Estanislao	104d8cf4d6	Update face_swapper.py compatibility with inswapper 1.21	2025-04-13 01:13:40 +08:00
Adrian Zimbran	c728994e6b	fixed import and log message	2025-03-10 23:41:28 +02:00
Adrian Zimbran	65da3be2a4	Fix face swapping crash due to None face embeddings - Add explicit checks for face detection results (source and target faces). - Handle cases when face embeddings are not available, preventing AttributeError. - Provide meaningful log messages for easier debugging in future scenarios.	2025-03-10 23:31:56 +02:00
Soul Lee	513e413956	fix: typo souce_target_map → source_target_map	2025-02-03 20:33:44 +09:00
KRSHH	c72582506d	Adding Pygrabber as Cam manager	2024-12-13 19:49:11 +05:30
NeuroDonu	e4761e4d66	fix path for download and use model	2024-11-09 16:43:35 +03:00
KRSHH	29c9c119d3	Add Mouth Mask Feature	2024-10-25 20:59:30 +05:30
Kenneth Estanislao	e531f6f26e	improved performance enhancement improved performance	2024-10-05 01:42:40 +08:00
Kenneth Estanislao	cad40b25dc	Update face_swapper.py added the missing ' , my bad on this...	2024-09-19 21:00:29 +08:00
Kenneth Estanislao	1b4c0ce43e	Update face_swapper.py should fix issues for those who don't have NVIDIA cards	2024-09-19 17:43:05 +08:00
Roland Pereira	f133d48f60	handled webcam scenario where detected faces are greater than maps provided	2024-09-11 21:42:38 +05:30
pereiraroland26@gmail.com	53fc65ca7c	Added ability to map faces	2024-09-10 05:40:55 +05:30
underlines	c91ab8bbd2	add toggle button for blueish cam fix (Force OpenCV2 BGR2RGB)	2024-08-30 22:02:23 +02:00
underlines	79c6615a68	use mjpeg and convert bgr to rgb	2024-08-30 21:49:01 +02:00
Kenneth Estanislao	16712476a9	Update model to inswapper_128_fp16 Faster as claimed. Also adjusted the size of the preview to a smaller size. You should see a significant improvement on this	2023-10-03 23:38:17 +08:00
Kenneth Estanislao	e616245e3d	initial commit rebranding everything	2023-09-24 21:36:57 +08:00

44 Commits