YuBaoku
1e08ee74e5
[CI] Modify 4-card container startup config and move test case ( #7363 )
2026-04-13 05:23:49 -07:00
freeliuzc
31e2a8bbad
[Speculative Decoding] Support mtp super ultra overlap in pd-split mode with insert_task overlap ( #7323 )
...
* support mtp overlap in pd-split mode with insert_task overlap
2026-04-13 19:41:17 +08:00
JYChen
5ddd1af756
remove fa4 requirements ( #7143 )
2026-04-13 19:24:20 +08:00
AIbin
1fb8194191
[OP][Models][Optimization] Optimize RoPE CUDA kernel and update DeepSeek V3 config ( #7359 )
...
* dsk del prefill mask
* dsk support 1M+ seq_len rope
* update rope tests
* Replace max_position_embeddings with max_model_len
* 1D grid: gridDim.x has a maximum size of 2^31-1, far exceeding the actual number of tokens.
2026-04-13 19:12:36 +08:00
Zhang Yulong
738c658c54
[Benchmark] Update seed argument handling in benchmark_serving.py ( #7356 )
2026-04-13 16:05:50 +08:00
周周周
a6f0055d51
add ips check ( #7352 )
...
* commit
* commit
---------
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-13 15:24:22 +08:00
liuruyan
b34708604c
[TI-consistent] support quant use pow2scale ( #7308 )
...
* support quant use pow2scale
* fix
* fix
2026-04-13 00:01:53 -07:00
AIbin
6213ad5340
[Docs][BugFix] fix mla log ( #7243 )
...
* [Docs] Fix Chinese punctuation issues
2026-04-13 12:15:43 +08:00
Nyako Shigure
d659099415
[Cleanup] Replace torch proxy alias with public compat API ( #7348 )
2026-04-13 11:43:26 +08:00
Jiajun Ji
cb03958b52
[XPU] Refactor get_padding_offset to single kernel. ( #7029 )
...
* [XPU] Refactor get_padding_offset to single kernel.
* add unittest.
* fix codestyle.
* remove cum_offsets_now.
* remove max_len.
2026-04-13 11:04:50 +08:00
Jiang-Jia-Jun
26d6a20c2f
[Optim] Remove IPCLock between CacheManager and WorkerProcess ( #7299 )
...
* [Optim] Remove IPCLock between CacheManager and WorkerProcess
* Update envs.py
* Update worker_process.py
---------
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
2026-04-12 13:59:34 +08:00
周周周
225fc8d222
use self.hidden_size not use self.fd_config.model_config.hidden_size ( #7340 )
2026-04-11 22:39:43 +08:00
chen
4982aa000e
[RL] moe bf16 ep support paddle batch_gemm ( #7337 )
...
* moe bf16 ep support paddle batch_gemm
2026-04-11 21:51:12 +08:00
AIbin
ba01d7a823
[Optimization] [OP] [Models] dsk del prefill mask ( #7313 )
...
* dsk del prefill mask
* dsk support 1M+ seq_len rope
* update rope tests
2026-04-11 19:32:27 +08:00
JYChen
076ab07528
[RL] change glm rope_emb calculation ( #7316 )
...
* change glm rope_emb calculation
* glm without EnforceFmulRN
* fix ci
2026-04-11 18:36:28 +08:00
YuBaoku
fcf8b1336d
[CI] Fix nightly test error and add container cleanup in build_rl ( #7335 )
...
* [CI] Fix nightly test error and add container cleanup in build_rl
2026-04-11 12:14:46 +08:00
Jiaxin Sui
6e5de2fd6d
[XPU][CI] Update xtdk version in download_dependencies.sh ( #7320 )
2026-04-11 00:26:48 +08:00
YuBaoku
1269eda2f9
[CI] Ensure container cleanup after job to avoid resource leakage ( #7315 )
...
* [CI] Ensure container cleanup after job to avoid resource leakage
* [CI] Use prebuilt wheels to install xgrammar==0.1.19 and torch==2.6.0
2026-04-10 22:32:18 +08:00
sunxin
00005c92e0
[BugFix] Fix mtp empty run issue in overlap schedule and EP model ( #7300 )
2026-04-10 03:29:45 -07:00
zhangbo9674
627f0d9cc8
[RL] change rms norm for glm ( #7269 )
...
* change rms norm for glm
* refine code
* refine code
* refine code
2026-04-10 01:02:37 -07:00
K11OntheBoat
870dbac370
Use triton qk_norm both in Prefill and Decode ( #7213 )
...
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-10 15:44:01 +08:00
YuBaoku
5c9fa43150
[Docs] Update Release Note ( #7302 )
2026-04-10 15:26:53 +08:00
yinwei
4aecaa70ba
[XPU][Docs] Update Release Note ( #7262 )
...
* update
* update docs
* update docs
* update commit
* update commit
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-10 15:22:16 +08:00
bukejiyu
14d46181b8
[Loader] add multi-thread model loading ( #6877 )
...
* multi-thread-loader
* fix ut
2026-04-09 23:40:15 -07:00
GoldPancake
c1fb3112f8
[FDConfig] Support CLI args for quantization params and add cudagraph validation ( #7281 )
...
* refactor quant cli param
2026-04-10 14:13:42 +08:00
Zhang Yulong
7614175e13
Disable fixed random seed in benchmark_dataset.py ( #7263 )
...
Commented out the random seed initialization to allow for varied randomness in benchmarks.
2026-04-10 13:56:14 +08:00
Jiang-Jia-Jun
e327673737
Update nvidia_gpu.md
2026-04-10 13:53:04 +08:00
ming1753
734fbcffde
[BugFix] Fix Async D2H copy bug & flash mash atten cache V out of bound bug ( #7221 )
2026-04-10 11:31:51 +08:00
AIbin
3c54a41131
[Docs][Feature] add fastdeploy-llm-integration skill & research-report skill ( #7287 )
...
* add fastdeploy-llm-integration skill & research-report skill
2026-04-10 11:24:23 +08:00
YuBaoku
b7b4fe6a69
[Docs][CI] Fix prebuilt wheel installation and update Docs ( #7289 )
...
* [CI] Fix prebuilt wheel installation and update Docs
* [CI] Update Dockerfile.gpu to restrict SM80/86/89/90, CUDA 12.6 and Python 3.10
* Update nvidia_gpu.md
* Update nvidia_gpu.md
* Revise NVIDIA GPU installation instructions
Updated installation instructions for PaddlePaddle and FastDeploy to remove specific CUDA version mentions and clarify support for multiple GPU architectures.
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-04-10 10:31:12 +08:00
YuBaoku
ee73623c76
[CI] Set high-risk OOM tests for sequential execution ( #7268 )
2026-04-09 22:22:57 +08:00
YuBaoku
924690b791
[CI] Add no_proxy configuration for docker execution ( #7283 )
2026-04-09 19:20:33 +08:00
lizexu123
613f92ee8f
[Feature] support nvfp4 tbo ( #7259 )
2026-04-09 17:29:39 +08:00
AIbin
fcaf614133
[Docs] add dsk-3.2 doc ( #7278 )
...
* add dsk-3.2 doc
2026-04-09 17:28:25 +08:00
周周周
1782872d61
add deep_ep hopper test ( #7206 )
...
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-09 17:23:54 +08:00
fxyfxy777
39ff38aba1
[OP]Unify MoE op with moe_permute path for bf16 GLM ( #7164 )
2026-04-09 16:17:56 +08:00
Jiang-Jia-Jun
33682c6749
[Docs] Update docs for release/2.5 ( #7267 )
...
* Update docs for release/2.5
* Update English docs for release/2.5
- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
- Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
- paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
- fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
- Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Clarify --extra-index-url usage in installation docs
Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Update nvidia_gpu.md
---------
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-09 16:07:18 +08:00
cloudforge1
85c6773e6c
[CI][Hackathon 10th Spring No.33] add config unit tests ( #6730 )
...
* [CI][Hackathon 10th Spring No.33] add config unit tests
* fix test_commit_config: reset fields before partial-file test
* [CI][Hackathon 10th Spring No.33] boost delta coverage for architecture helper branches
* [CI][Hackathon 10th Spring No.33] add version attr to model config mock
* [CI][Hackathon 10th Spring No.33] add mrope, runner validation, tail_layer coverage
* [CI][Hackathon 10th Spring No.33] boost: cover 96 more lines (FDConfig assertions, guided decoding, env branches)
* [CI][Hackathon 10th Spring No.33] config unit test
* [CI][Hackathon 10th Spring No.33] cover expert parallel branch
* fix: reset commit hash before _load_from_version_file test; block cuda import via setitem(None)
* refactor: convert to unittest.TestCase style per reviewer request
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Tao Luo <luotao02@baidu.com>
2026-04-09 14:28:54 +08:00
cloudforge1
cefc724607
[CI][Hackathon 10th Spring No.29] engine unit test ( #6771 )
...
* [CI][Hackathon 10th Spring No.29] engine unit test
Merge with upstream test_engine.py (PR #7083 ) and add comprehensive
coverage for LLMEngine: lifecycle, worker signals, requests, utils,
stop_profile, and start error handling.
* fix: add deploy_modality to _make_cfg() — Copilot review
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-04-09 13:45:59 +08:00
Jiaxin Sui
80d5d9fd32
[XPU][CI] lock xvllm version to fix bug ( #7264 )
...
* Remove duplicate NICs from environment variables
* Update version for xvllm in download_dependencies.sh
2026-04-09 12:44:27 +08:00
Bingoo
3d2326c1b9
[BugFix] detection jinja2 ( #7251 )
...
* detection jinja2
* format
2026-04-09 11:30:16 +08:00
xiaoxiaohehe001
51efe27d76
[BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn ( #7210 )
...
* [BugFix] fix_flash_mask_attn_sm90
* [BugFix] fix_flash_mask_attn_sm90
* [BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn
* [BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn
2026-04-09 11:05:10 +08:00
JYChen
43ace7af25
[RL] support moe-topk use topk_reduce_func ( #7218 )
...
* support moe-topk use topk_reduce_func
* fix ep error
* fix ut
* fix ut
2026-04-09 11:01:03 +08:00
ShaneGZhu
7005404ce3
[DeepSeekV3.2][Graph Optimization] Remove synchronous operation to avoid capture failure and unnecessary contiguous in DSA Backend ( #7253 )
...
* Delete contiguous ops.
* fix scale
* Delete unnecessary comments
* fix style
2026-04-09 11:00:13 +08:00
AIbin
48d2bbeb74
fix dsa ( #7252 )
2026-04-08 20:21:38 +08:00
Longzhi Wang
b262419db1
Revert "[Other] support video_fps args for video bench ( #7077 )" ( #7254 )
...
This reverts commit 938e7dd881.
Co-authored-by: TBD1 <798934910@qq.com>
2026-04-08 20:13:57 +08:00
chenjian
427efadaee
[Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 ( #7159 )
...
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1
* fix
2026-04-08 19:30:54 +08:00
Jiajun Ji
9b970de029
[XPU] Add TP broadcast after sampling in XPU model runner to ensure consistent results across ranks. ( #7096 )
2026-04-08 19:26:53 +08:00
3em0
3749457476
[BugFix] fix multimodal hasher hash collision risk when ndarray shape or dtype differs ( #7185 )
...
numpy tobytes() only serializes raw element bytes without encoding shape
or dtype metadata. This means arrays with identical raw bytes but
different shapes (e.g. (6,4) vs (4,6)) or different dtypes (e.g.
float32 vs uint8 reinterpretation of same memory) produce the same
SHA-256 digest, leading to silent cache collisions in
ProcessorCacheManager / EncoderCacheManager / PrefixCacheManager.
Prepend a "{shape}|{dtype}|" header to the byte payload before hashing
so that shape and dtype participate in the digest.
Added test cases for shape and dtype sensitivity.
2026-04-08 04:26:02 -07:00
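The collision described in the entry above can be reproduced with a short standalone sketch. The digest helpers below are hypothetical illustrations of the technique, not FastDeploy's actual cache-manager API:

```python
import hashlib

import numpy as np


def naive_digest(arr: np.ndarray) -> str:
    # tobytes() serializes only the raw element bytes; shape and dtype are lost,
    # so arrays with identical memory but different metadata collide.
    return hashlib.sha256(arr.tobytes()).hexdigest()


def safe_digest(arr: np.ndarray) -> str:
    # Prepend a "{shape}|{dtype}|" header so the metadata participates in the digest.
    header = f"{arr.shape}|{arr.dtype}|".encode()
    return hashlib.sha256(header + arr.tobytes()).hexdigest()


a = np.zeros((6, 4), dtype=np.float32)
b = np.zeros((4, 6), dtype=np.float32)  # same 96 raw bytes, different shape

assert naive_digest(a) == naive_digest(b)  # silent collision
assert safe_digest(a) != safe_digest(b)    # shape now distinguishes them
```

The same header also separates dtype reinterpretations of identical memory (e.g. 24 float32 zeros vs 96 uint8 zeros), matching the dtype-sensitivity cases the commit says it added tests for.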
Jiaxin Sui
fbc3aa93de
[XPU][CI] Remove duplicate NICs from environment variables ( #7244 )
2026-04-08 19:14:15 +08:00