FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-22 16:07:51 +08:00

Author	SHA1	Message	Date
YuBaoku	b7b4fe6a69	[Docs][CI] Fix prebuilt wheel installation and update Docs (#7289 ) * [CI] Fix prebuilt wheel installation and update Docs * [CI] Update Dockerfile.gpu to restrict SM80/86/89/90, CUDA 12.6 and Python 3.10 * Update nvidia_gpu.md * Update nvidia_gpu.md * Revise NVIDIA GPU installation instructions Updated installation instructions for PaddlePaddle and FastDeploy to remove specific CUDA version mentions and clarify support for multiple GPU architectures. --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-04-10 10:31:12 +08:00
gongweibao	f6adcc00e2	Remove BUILD_WHEEL=2 (Python-only quick install) mode from build.sh (#6781 ) This mode used rsync to sync .py files to site-packages, bypassing the standard wheel build process. Remove it to simplify the build script and reduce maintenance burden. Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 11:46:18 +08:00
gongweibao	2541462f7e	[Feature][Docs] Add Python-only quick install mode (BUILD_WHEEL=2) to build.sh (#6503 ) * add pythononly func * add * add more feature * add safe check * add rsync check * add * add * refine docs * add installation * add installation	2026-02-26 16:17:41 +08:00
wangyifei	84a1780814	[build] support build sm 80,86,89,90 to one whl package (#6173 ) * support build sm 80,86,89,90 to one whl package * create tmp dir before build custom ops in FD_UNIFY_BUILD mode * typo fix * ignore exceptions in xpu ..	2026-01-26 11:30:02 +08:00
xiaozude	c06a6234b9	[Metax] optimize mla attention (#5258 )	2025-12-09 11:18:19 +08:00
Nyakku Shigure	f88c159de1	[BugFix] Exit if neither modern nor legacy wheel dir not found (#5367 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FD Image Build (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details Publish Job / Run Stable Tests (push) Has been cancelled Details CI Images Build / FD-Clone-Linux (push) Has been cancelled Details CI Images Build / Show Code Archive Output (push) Has been cancelled Details CI Images Build / CI Images Build (push) Has been cancelled Details CI Images Build / BUILD_SM8090 (push) Has been cancelled Details CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled Details CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details CI Images Build / Run Base Tests (push) Has been cancelled Details CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled Details --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-04 16:45:48 +08:00
megemini	c06cfe2447	【Hackathon 9th No.109】[CppExtension] 添加 `fastdeploy_ops` 目录到 `package_data` 以支持现代打包方式 (#5156 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FD Image Build (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details Publish Job / Run Stable Tests (push) Has been cancelled Details CI Images Build / FD-Clone-Linux (push) Has been cancelled Details CI Images Build / Show Code Archive Output (push) Has been cancelled Details CI Images Build / CI Images Build (push) Has been cancelled Details CI Images Build / BUILD_SM8090 (push) Has been cancelled Details CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled Details CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details CI Images Build / Run Base Tests (push) Has been cancelled Details CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled Details --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: SigureMo <sigure.qaq@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-22 01:32:06 +08:00
megemini	c35e540c18	【Hackathon 9th No.109】[CppExtension] Support build Custom OP in setuptools 80+ (#4977 ) * 添加对现代Python打包方法的兼容性支持 * [CppExtension] 优化构建脚本逻辑并更新.gitignore	2025-11-17 11:46:27 +08:00
Echo-Nie	a5e949d9d0	[Feature] Enhance build script, add pre_wheel logic (#4729 ) * Enhance build script, add pre_wheel logic Updated copyright year and added precompiled wheel installation logic. * update the nvidia_gpu.md, add pre_wheel description * fix zh .md * update the url, automatically detect CUDA and SM * Fix GPU architecture string formatting in build.sh * Change default for FD_USE_PRECOMPILED to 0 * fix build.sh * add ./dist, pre-wheel path * simplify the process,just save the whl * del pre_wheel dir * fix function name, extract_ops_from_precompiled_wheel * fix docs * add default commitID in docs --------- Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-13 19:03:52 +08:00
fmiao2372	f1b5392e20	[Intel HPU] Support intel hpu platform (#4161 ) * [Intel HPU] Support intel hpu platform * fix some issues * apply precommit and move AttentionBackend_HPU * fix format issue * correct ops import * fix ci issue * update code in layers * fix code style issue * remove dense tp moe ep mode * fix enc_dec_block_num * fix rebase issue * rename hpu to gaudi in readme * rename ForwardMeta_HPU to HPUForwardMeta	2025-09-24 12:27:50 +08:00
zhupengyang	9d0074a91a	[xpu] add ep custom ops (#3911 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-09-10 12:22:50 +08:00
chen	f0f00a6025	[OPs] Universal optimization and Fix early_stop cuda 700 (#3375 ) Deploy GitHub Pages / deploy (push) Has been cancelled Details * delete nonzero * delete setup_ops_base.py * check if * check gcp infer_seed.cpu() * fix repetition_early_stopper_kernel cuda 700	2025-08-14 22:40:44 +08:00
Kane2011	b4fef2cf29	[MetaxGPU] Support FastDeploy on metax gpu (#3241 ) * [MetaxGPU] Support FastDeploy on metax gpu * Update metax_worker.py 1. change worker log; 2. remove custom allreduce, adapt it later; 3. remove cuda graph; * Update __init__.py 1. remove metax's key work comment * Update __init__.py 1. remove metax's key word comment; 2. add fused_moe_kernel_paddle import --------- Co-authored-by: yongqiangma <xing.wo@163.com>	2025-08-13 11:11:54 +08:00
freeliuzc	2e81792d64	[fix] fix 'force-reinstall all-depe-packages in build' (#2837 )	2025-07-14 16:50:54 +08:00
lifulll	1f28bdf994	dcu adapter ernie45t (#2756 ) Co-authored-by: lifu <lifu@sugon.com> Co-authored-by: yongqiangma <xing.wo@163.com>	2025-07-09 18:56:27 +08:00
celsowm	771e71a24d	Feat/blackwell sm100 support (#2670 ) * Add initial support for NVIDIA Blackwell (SM100) architecture This change introduces initial support for the NVIDIA Blackwell GPU architecture, specifically targeting SM100 (Compute Capability 10.x) with '100a' architecture-specific features (e.g., for CUTLASS). Key changes: - Updated custom_ops/setup_ops.py to generate appropriate gencode flags (arch=compute_100a,code=sm_100a) when '100' is specified in FD_BUILDING_ARCS. Requires CUDA 12.9+. - Updated custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h: - Added CutlassTileConfigSM100 enum (with placeholder tile shapes). - Added BLACKWELL to CandidateConfigTypeParam. - Updated CutlassGemmConfig struct with is_sm100 flag, tile_config_sm100, and new constructor for SM100. - Modified toString() and fromString() for SM100 support. - Updated custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu: - Added get_candidate_tiles_sm100() (with placeholder tiles). - Added placeholder mcast support functions for SM100. - Updated get_candidate_configs() to include SM100 paths using the BLACKWELL flag and new SM100 config types. - Updated build.sh with comments to guide users on specifying '100' for Blackwell in FD_BUILDING_ARCS. Further work: - Optimal CUTLASS tile configurations for SM100 need to be researched and updated in cutlass_heuristic.cu. - Kernel auto-generation scripts in custom_ops/utils/ may need SM100-specific versions if Blackwell's hardware features for FP8/TMA differ significantly from SM90. - Compatibility of third-party libraries (CUTLASS v3.8.0, DeepGEMM) with Blackwell should be fully verified. * Feat: Implement detailed Blackwell (SM100) CUTLASS heuristics This change integrates specific, expert-provided CUTLASS heuristic configurations for the NVIDIA Blackwell (SM100) GPU architecture, replacing previous placeholders. This includes: - Updated `custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h`: - Populated `CutlassTileConfigSM100` enum with specific tile shapes (e.g., CtaShape64x64x128B, CtaShape128x128x128B) suitable for SM100. - Added `FP4_ONLY` to `CandidateConfigTypeParam` for new FP4 paths. - Updated `custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu`: - Implemented `get_candidate_tiles_sm100` with detailed logic for selecting tile configurations based on GROUPED_GEMM and FP4_ONLY flags, using the new SM100 tile enums. - Implemented `supports_mcast_along_m_sm100` and `supports_mcast_along_n_sm100` with specific tile checks for Blackwell. - Updated the `sm == 100` (Blackwell) block in `get_candidate_configs` to use these new helper functions and accurately populate candidate kernel configurations for various cluster shapes. - `custom_ops/setup_ops.py` remains configured to compile for `arch=compute_100a,code=sm_100a` with CUDA 12.9+ for these features. This aligns the codebase with heuristic configurations similar to those in upstream TensorRT-LLM / CUTLASS for Blackwell, enabling more performant kernel selection on this new architecture. --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-09 15:29:42 +08:00
EnflameGCU	f4f1d8de44	Support for non-CUDA builds (#2750 ) Co-authored-by: yongqiangma <xing.wo@163.com>	2025-07-09 11:48:40 +08:00
EnflameGCU	d0f4d6ba3a	[GCU] Support gcu platform (#2702 ) baseline: `e7fa57ebae` Co-authored-by: yongqiangma <xing.wo@163.com>	2025-07-08 13:00:52 +08:00
liddk1121	1b54a2831e	Adapt for iluvatar gpu (#2684 )	2025-07-07 16:53:14 +08:00
Yuanle Liu	91528f1af9	remove redundant install whl of fastdeploy (#2726 ) * remove redundant install * remove redundant install	2025-07-06 23:49:37 -07:00
YuBaoku	b37585e693	[BugFix] fix paddle_git_commit_id error (#2714 ) Deploy GitHub Pages / deploy (push) Has been cancelled Details * set git identity to avoid merge failure in CI * add ci cases * [CI] Add validation for MTP and CUDAGraph * [BugFix] fix paddle_git_commit_id error	2025-07-04 22:16:37 +08:00
gaoziyuan	a5ae88ded9	[feature]add fd whl version info (#2698 )	2025-07-04 14:12:42 +08:00
Jiang-Jia-Jun	05c670e593	[Sync] Update to latest code (#2679 ) * [Sync] Update to latest code * Add new code files * Add new code files * update code * Try to fix build.sh * Try to fix build.sh * Update code * Update requirements.txt * Update code --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-03 15:43:53 +08:00
Yuanle Liu	9a14ab6572	add --force-reinstall --no-cache-dir when pip install fastdeploy*.whl (#2682 ) Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-07-02 05:32:20 -07:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00
jiangjiajun	684703fd72	[LLM] First commit the llm deployment code	2025-06-09 19:20:15 +08:00

26 Commits