Commit Graph

5082 Commits

Author SHA1 Message Date
AIbin 1fb8194191 [OP][Models][Optimization] Optimize RoPE CUDA kernel and update DeepSeek V3 configuration (#7359)
* dsk del prefill mask

* dsk support 1M+ seq_len rope

* update rope tests

* Replace max_position_embeddings with max_model_len

* 1D grid: gridDim.x has a maximum size of 2^31-1, far exceeding the actual number of tokens.
2026-04-13 19:12:36 +08:00
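The 1D-grid remark above can be sanity-checked with a quick back-of-the-envelope calculation (this is an illustrative sketch, not code from the PR; `blocks_needed` is a hypothetical helper): even a 1M+ token sequence needs only a few thousand blocks, far under the gridDim.x ceiling.

```python
# gridDim.x on modern NVIDIA GPUs is limited to 2**31 - 1 blocks,
# so a flat 1D launch comfortably covers very long sequences.
MAX_GRID_DIM_X = 2**31 - 1


def blocks_needed(num_tokens: int, threads_per_block: int = 256) -> int:
    """One thread per token: ceil-divide token count by block size."""
    return (num_tokens + threads_per_block - 1) // threads_per_block


# A 1M-token sequence uses a tiny fraction of the grid limit.
print(blocks_needed(1_048_576))                     # 4096
print(blocks_needed(1_048_576) <= MAX_GRID_DIM_X)   # True
```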
Zhang Yulong 738c658c54 [Benchmark] Update seed argument handling in benchmark_serving.py (#7356) 2026-04-13 16:05:50 +08:00
周周周 a6f0055d51 add ips check (#7352)
* commit

* commit

---------

Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-13 15:24:22 +08:00
liuruyan b34708604c [TI-consistent] support quant use pow2scale (#7308)
* support quant use pow2scale

* fix

* fix
2026-04-13 00:01:53 -07:00
AIbin 6213ad5340 [Docs][BugFix] fix mla log (#7243)
* [Docs] Fix Chinese punctuation issues
2026-04-13 12:15:43 +08:00
Nyako Shigure d659099415 [Cleanup] Replace torch proxy alias with public compat API (#7348) 2026-04-13 11:43:26 +08:00
Jiajun Ji cb03958b52 [XPU] Refactor get_padding_offset to single kernel. (#7029)
* [XPU] Refactor get_padding_offset to single kernel.

* add unittest.

* fix codestyle.

* remove cum_offsets_now.

* remove max_len.
2026-04-13 11:04:50 +08:00
Jiang-Jia-Jun 26d6a20c2f [Optim] Remove IPCLock between CacheManager and WorkerProcess (#7299)
* [Optim] Remove IPCLock between CacheManager and WorkerProcess

* Update envs.py

* Update worker_process.py

---------

Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
2026-04-12 13:59:34 +08:00
周周周 225fc8d222 use self.hidden_size instead of self.fd_config.model_config.hidden_size (#7340) 2026-04-11 22:39:43 +08:00
chen 4982aa000e [RL]moe bf16 ep support paddle batch_gemm (#7337)
* moe bf16 ep support paddle batch_gemm
2026-04-11 21:51:12 +08:00
AIbin ba01d7a823 [Optimization] [OP] [Models] dsk del prefill mask (#7313)
* dsk del prefill mask

* dsk support 1M+ seq_len rope

* update rope tests
2026-04-11 19:32:27 +08:00
JYChen 076ab07528 [RL] change glm rope_emb calculation (#7316)
* change glm rope_emb calculation

* glm without EnforceFmulRN

* fix ci
2026-04-11 18:36:28 +08:00
YuBaoku fcf8b1336d [CI] Fix nightly test error and add container cleanup in build_rl (#7335)
* [CI] Fix nightly test error and add container cleanup in build_rl
2026-04-11 12:14:46 +08:00
Jiaxin Sui 6e5de2fd6d [XPU][CI]Update xtdk version in download_dependencies.sh (#7320) 2026-04-11 00:26:48 +08:00
YuBaoku 1269eda2f9 [CI] Ensure container cleanup after job to avoid resource leakage (#7315)
* [CI] Ensure container cleanup after job to avoid resource leakage

* [CI] Use prebuilt wheels to install xgrammar==0.1.19 and torch==2.6.0
2026-04-10 22:32:18 +08:00
sunxin 00005c92e0 [BugFix] Fix mtp empty run issue in overlap schedule and EP model (#7300) 2026-04-10 03:29:45 -07:00
zhangbo9674 627f0d9cc8 [RL] change rms norm for glm (#7269)
* change rms norm for glm

* refine code

* refine code

* refine code
2026-04-10 01:02:37 -07:00
K11OntheBoat 870dbac370 Use triton qk_norm both in Prefill and Decode (#7213)
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-10 15:44:01 +08:00
YuBaoku 5c9fa43150 [Docs] Update Release Note (#7302) 2026-04-10 15:26:53 +08:00
yinwei 4aecaa70ba [XPU][Docs] Update Release Note (#7262)
* update

* update docs

* update docs

* update commit

* update commit

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-10 15:22:16 +08:00
bukejiyu 14d46181b8 [Loader] add multi-thread model loading (#6877)
* multi-thread-loader

* fix ut
2026-04-09 23:40:15 -07:00
GoldPancake c1fb3112f8 [FDConfig] Support CLI args for quantization params and add cudagraph validation (#7281)
* refactor quant cli param
2026-04-10 14:13:42 +08:00
Zhang Yulong 7614175e13 Disable fixed random seed in benchmark_dataset.py (#7263)
Commented out the random seed initialization to allow for varied randomness in benchmarks.
2026-04-10 13:56:14 +08:00
Jiang-Jia-Jun e327673737 Update nvidia_gpu.md 2026-04-10 13:53:04 +08:00
ming1753 734fbcffde [BugFix] Fix Async D2H copy bug & flash mask attn cache V out-of-bound bug (#7221) 2026-04-10 11:31:51 +08:00
AIbin 3c54a41131 [Docs][Feature]add fastdeploy-llm-integration skill & research-report skill (#7287)
* add fastdeploy-llm-integration skill & research-report skill
2026-04-10 11:24:23 +08:00
YuBaoku b7b4fe6a69 [Docs][CI] Fix prebuilt wheel installation and update Docs (#7289)
* [CI] Fix prebuilt wheel installation and update Docs

* [CI] Update Dockerfile.gpu to restrict SM80/86/89/90, CUDA 12.6 and Python 3.10

* Update nvidia_gpu.md

* Update nvidia_gpu.md

* Revise NVIDIA GPU installation instructions

Updated installation instructions for PaddlePaddle and FastDeploy to remove specific CUDA version mentions and clarify support for multiple GPU architectures.

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-04-10 10:31:12 +08:00
YuBaoku ee73623c76 [CI] Set high-risk OOM tests for sequential execution (#7268) 2026-04-09 22:22:57 +08:00
YuBaoku 924690b791 [CI] Add no_proxy configuration for docker execution (#7283) 2026-04-09 19:20:33 +08:00
lizexu123 613f92ee8f [Feature] support nvfp4 tbo (#7259) 2026-04-09 17:29:39 +08:00
AIbin fcaf614133 [Docs]add dsk-3.2 doc (#7278)
* add dsk-3.2 doc
2026-04-09 17:28:25 +08:00
周周周 1782872d61 add deep_ep hopper test (#7206)
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-09 17:23:54 +08:00
fxyfxy777 39ff38aba1 [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) 2026-04-09 16:17:56 +08:00
Jiang-Jia-Jun 33682c6749 [Docs] Update docs for release/2.5 (#7267)
* Update docs for release/2.5

* Update English docs for release/2.5

- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
  - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
  - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
  - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
  - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Clarify --extra-index-url usage in installation docs

Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update nvidia_gpu.md

---------

Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-09 16:07:18 +08:00
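The `--extra-index-url` note in the commit above can be sketched as a pip invocation (the index URL below is a placeholder for illustration, not the real Paddle source):

```shell
# -i sets the primary index that serves the fastdeploy-gpu wheel itself;
# --extra-index-url only supplies its dependencies (e.g. from PyPI).
# Placeholder index URL -- substitute the Paddle source from the docs.
pip install fastdeploy-gpu \
    -i https://example.com/paddle-wheel-index/ \
    --extra-index-url https://pypi.org/simple
```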
cloudforge1 85c6773e6c [CI]【Hackathon 10th Spring No.33】Add config unit tests (#6730)
* [CI]【Hackathon 10th Spring No.33】Add config unit tests

* fix test_commit_config: reset fields before partial-file test

* [CI]【Hackathon 10th Spring No.33】boost delta coverage for architecture helper branches

* [CI]【Hackathon 10th Spring No.33】add version attr to model config mock

* [CI]【Hackathon 10th Spring No.33】add mrope, runner validation, tail_layer coverage

* [CI]【Hackathon 10th Spring No.33】boost: cover 96 more lines (FDConfig assertions, guided decoding, env branches)

* [CI]【Hackathon 10th Spring No.33】config unit test

* [CI]【Hackathon 10th Spring No.33】cover expert parallel branch

* fix: reset commit hash before _load_from_version_file test; block cuda import via setitem(None)

* refactor: convert to unittest.TestCase style per reviewer request

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Tao Luo <luotao02@baidu.com>
2026-04-09 14:28:54 +08:00
cloudforge1 cefc724607 [CI]【Hackathon 10th Spring No.29】engine unit test (#6771)
* [CI]【Hackathon 10th Spring No.29】engine unit test

Merge with upstream test_engine.py (PR #7083) and add comprehensive
coverage for LLMEngine: lifecycle, worker signals, requests, utils,
stop_profile, and start error handling.

* fix: add deploy_modality to _make_cfg() — Copilot review

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-04-09 13:45:59 +08:00
Jiaxin Sui 80d5d9fd32 [XPU][CI] lock xvllm version to fix a bug (#7264)
* Remove duplicate NICs from environment variables

* Update version for xvllm in download_dependencies.sh
2026-04-09 12:44:27 +08:00
Bingoo 3d2326c1b9 [BugFix] detection jinja2 (#7251)
* detection jinja2

* format
2026-04-09 11:30:16 +08:00
xiaoxiaohehe001 51efe27d76 [BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn (#7210)
* [BugFix] fix_flash_mask_attn_sm90

* [BugFix] fix_flash_mask_attn_sm90

* [BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn

* [BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn
2026-04-09 11:05:10 +08:00
JYChen 43ace7af25 [RL] support moe-topk use topk_reduce_func (#7218)
* support moe-topk use topk_reduce_func

* fix ep error

* fix ut

* fix ut
2026-04-09 11:01:03 +08:00
ShaneGZhu 7005404ce3 [DeepSeekV3.2][Graph Optimization]Remove synchronous operation to avoid capture fail and unnecessary contiguous in DSA Backend (#7253)
* Delete contiguous ops.

* fix scale

* Delete unnecessary comments

* fix style
2026-04-09 11:00:13 +08:00
AIbin 48d2bbeb74 fix dsa (#7252) 2026-04-08 20:21:38 +08:00
Longzhi Wang b262419db1 Revert "[Other] support video_fps args for video bench (#7077)" (#7254)
This reverts commit 938e7dd881.

Co-authored-by: TBD1 <798934910@qq.com>
2026-04-08 20:13:57 +08:00
chenjian 427efadaee [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159)
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* fix
2026-04-08 19:30:54 +08:00
Jiajun Ji 9b970de029 [XPU] Add TP broadcast after sampling in XPU model runner to ensure consistent results across ranks. (#7096) 2026-04-08 19:26:53 +08:00
3em0 3749457476 [BugFix] fix multimodal hasher hash collision risk when ndarray shape or dtype differs (#7185)
numpy tobytes() only serializes raw element bytes without encoding shape
or dtype metadata. This means arrays with identical raw bytes but
different shapes (e.g. (6,4) vs (4,6)) or different dtypes (e.g.
float32 vs uint8 reinterpretation of same memory) produce the same
SHA-256 digest, leading to silent cache collisions in
ProcessorCacheManager / EncoderCacheManager / PrefixCacheManager.

Prepend a "{shape}|{dtype}|" header to the byte payload before hashing
so that shape and dtype participate in the digest.

Added test cases for shape and dtype sensitivity.
2026-04-08 04:26:02 -07:00
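The collision described in the commit above can be reproduced in a few lines (an illustrative sketch; `hash_with_header` is a hypothetical helper, the actual FastDeploy function name may differ):

```python
import hashlib

import numpy as np

a = np.arange(24, dtype=np.float32).reshape(6, 4)
b = a.reshape(4, 6)  # same raw bytes, different shape

# Plain tobytes() hashing collides, because shape/dtype are not serialized.
h1 = hashlib.sha256(a.tobytes()).hexdigest()
h2 = hashlib.sha256(b.tobytes()).hexdigest()
assert h1 == h2  # the silent cache collision


def hash_with_header(arr: np.ndarray) -> str:
    """Prepend a "{shape}|{dtype}|" header so both participate in the digest."""
    header = f"{arr.shape}|{arr.dtype}|".encode()
    return hashlib.sha256(header + arr.tobytes()).hexdigest()


assert hash_with_header(a) != hash_with_header(b)  # collision resolved
```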
Jiaxin Sui fbc3aa93de [XPU][CI] Remove duplicate NICs from environment variables (#7244) 2026-04-08 19:14:15 +08:00
RichardWooSJTU 771d42c90b [TBO] Apply tbo to gpu_model_runner (#7165)
* apply tbo in gpu_model_runner

* fix
2026-04-08 16:55:17 +08:00
YuBaoku 4cd574cf90 [CI] Reduce execution time for ngram kernel tests (#7242) 2026-04-08 16:54:46 +08:00
Bingoo 043f2a16e3 support moe for sm103 (#7238) 2026-04-08 15:52:39 +08:00