YuBaoku
cae2709eff
[CI] Update stable test workflow and run.sh script ( #6352 )
2026-02-05 11:01:15 +08:00
GoldPancake
183b8d325a
[RL] Support GLM MTP RL Model ( #6267 )
2026-02-04 20:14:35 +08:00
luukunn
765df94e6c
[Optimization]update prompt & prompt_token_ids ( #6334 )
...
* fix prompt
* add unit test
* add unit test
* fix
2026-02-04 20:08:01 +08:00
JYChen
bf78a48eb3
[Others] add mock unittest for sm100 FP8 inference ( #6273 )
...
* add unittest
* use new file
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
sunxin
ef47e6eb46
[Others]skip to_tensor ( #6342 )
2026-02-04 17:25:19 +08:00
Zhang Yulong
26ba019e66
Update README.md ( #6343 )
2026-02-04 15:57:18 +08:00
MingkunZhang
43e3886ef9
[Metax][CI] fix run_ci_metax.sh error ( #6341 )
2026-02-04 15:43:48 +08:00
MingkunZhang
e109fb9a0e
[Metax][Fix] fix issues based #6259 ( #6338 )
2026-02-03 23:21:35 -08:00
chenjian
90db0bdd0d
[Optimize] Optimize ttft for ep ( #6098 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
2026-02-04 15:03:29 +08:00
mouxin
6e96bd0bd2
[Feature] Fix counter release logic & update go-router download URL ( #6280 )
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Fix counter release logic
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update token counter logic and docs
* [Feature] Update token counter logic and docs
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
fxyfxy777
36547cfdb3
[Feature] FD_USE_PHI_FP8_QUANT ( #6320 )
...
* add ut
* add use_fd_quant env
* rm mask_per_token_quant
* add make ops list
* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, default is true
* modify comments
* use bool type
* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang
2ffcb3d9ed
[Metax][CI] update ci test files ( #6340 )
2026-02-04 13:58:07 +08:00
sunxin
9b0a82cfa9
[Model Runner] Support overlap schedule ( #6259 )
2026-02-04 10:49:44 +08:00
周周周
6225439778
add PADDLE_ENFORCE ( #6321 )
2026-02-04 10:47:19 +08:00
xunyoyo
8225e694c9
[CI] [Hackathon 10th Spring No.37] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py ( #6286 )
...
* Add wint2 MoE backend tests
* Align wint2 test dtypes for cutlass apply
* Use bfloat16 input in wint2 test
* Stub moe_expert_reduce in wint2 test
* Use 2 experts in wint2 test
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
Zhang Yulong
16d03c3127
update ( #6335 )
2026-02-03 21:59:32 +08:00
Jiang-Jia-Jun
793dac0f9d
Modify Nightly Build installation commands for fastdeploy
...
Update the installation instructions for the Nightly Build of fastdeploy to use the cu126 index for both SM86/89 and SM80/90 architectures.
2026-02-03 20:24:27 +08:00
Jiang-Jia-Jun
829139a5e5
Fix Nightly build installation URLs for fastdeploy-gpu
...
Updated installation instructions for the latest Nightly build of fastdeploy-gpu to use the correct URLs for CUDA 12.6.
2026-02-03 20:24:19 +08:00
RAM
5b22e5dfe7
[RL] R3 Support Fused Put the Routing of All Layers ( #6099 )
...
* fused put routing
* fix bug
* [draft commit]dynamic dtype
* fix async put & numpy bug
* fix uint8 test case
2026-02-03 04:13:16 -08:00
CSWYF3634076
722ca87db6
[Others] lazy write log when writing ( #6323 )
2026-02-03 20:11:13 +08:00
xiegegege
51c6fa8afc
[CE] add 21b cpu cache, glm mtp, glm for rl config ( #6328 )
2026-02-03 20:10:47 +08:00
ddchenhao66
faade7d0ab
[BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled ( #6309 )
2026-02-03 19:49:01 +08:00
JYChen
c745a22420
[Feature] Support Ernie FP8 on sm100 (the fixed version) ( #6304 )
2026-02-03 17:47:38 +08:00
kesmeey
73952a3b67
add tests ( #6243 )
...
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu
12d4b4cb87
[Feature]Support reorder ids to split prefill and decodes ( #5779 )
...
* support reorder ids
* polish code
* fix
* fix unittest
* delete code
* fix
* add python api
* delete custom op
* update algorithm
* fix swap
* support condense
* support condense
* support mtp
* delete code
* update
* update
* update
* update
* update for other platform
* update
* fix
* fix mtp
* fix ut
* update
* fix ut
* update ut
* fix
* fix encoder_cache
* fix ci
* fix
* fix vl
* Fix performance regression
* fix
* fix
* fix mtp
* fix index->req_id mapping
* fix ut
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周
cbdb2462ea
cp 1131 tbo to develop ( #6281 )
2026-02-03 15:23:23 +08:00
周周周
8277b95fa6
remove speculate_get_padding_offset op ( #6308 )
2026-02-03 15:18:12 +08:00
Moonchild1227
39dc4b0c2e
[Feature] [KVCache] support file_store kv cache backend ( #6188 )
...
* fix(examples): comment out stop.sh to avoid error when script is missing
* feat: add file_store support for cache manager
* [fix] fix multi gpu transfer
* [fix] fix global kvcache transfer
* [Feature] [KVCache] support file_store kv cache backend
* chore: update FileStore according to PR comments
* fix: remove comments
* fix: add swap_cache_layout for file store
* fix: remove rank key
* fix: Switch KV cache storage to pure file mode
* Temporarily disable support for Tensor types
* fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR
* fix: Simplify cache_transfer_manager.py
* fix: fix syntax bug
* fix: Simplify file_store.py
* fix: Use the key directly as the filename
* fix: Simplify set()
* fix: Simplify cache_transfer_manager.py & file_store.py
* fix: Only support load to cpu buffer
* feat: add FileStore backend for cache transfer
* fix: guard zmq import
2026-02-03 14:37:58 +08:00
zccjjj
ee77ff9ebe
[config] fix assert message ( #6310 )
2026-02-03 14:37:46 +08:00
Jingfeng Wu
4760835789
Fix heartbeat signal's sleeptime error ( #6241 )
2026-02-03 14:28:51 +08:00
xjkmfa
e27a7cc5b0
[Benchmark] Ce qwen3 vl ( #6288 )
...
* [CE]qwen3-vl
2026-02-03 14:17:28 +08:00
fxyfxy777
f3413c4caa
[BugFix] fix fused_mask_swiglu_fp8_quant bug ( #6316 )
...
* optimize mask_quant op speed up 1.5
* fix calculate sequence
* add fused
* rm log
* push kernel code
* add ut
* accuracy ok
* add ue8m0
* add ut
* add merge develop
* rm ut of mask_per_token_quant
* Revert "[Optimize] optimize mask_quant & swiglu (#6222)"
This reverts commit 2ada119a38.
* add block_size
* pre-commit
2026-02-03 13:54:12 +08:00
ApplEOFDiscord
6563b8307c
[Bug Fix] fix tokenizer oom ( #6287 )
...
* fix tokenizer oom
* fix unit test
2026-02-03 11:27:11 +08:00
GoldPancake
fb374238e1
Revert "[RL] Support GLM MTP RL Model ( #6223 )" ( #6301 )
...
This reverts commit af6c84d48d.
2026-02-02 14:08:13 +08:00
fxyfxy777
2ada119a38
[Optimize] optimize mask_quant & swiglu ( #6222 )
...
* optimize mask_quant op speed up 1.5
* fix calculate sequence
* add fused
* rm log
* push kernel code
* add ut
* accuracy ok
* add ue8m0
* add ut
* add merge develop
* rm ut of mask_per_token_quant
2026-02-02 13:52:38 +08:00
xunyoyo
25656455ee
[CI] [Hackathon 10th Spring No.38] Add unit tests for module fastdeploy/entrypoints/openai/serving_completion.py ( #6227 )
...
* Add serving completion tests
* test: tighten serving completion coverage
2026-02-02 12:53:04 +08:00
chenjian
af1b1d2d56
[Feature] Support report token index by attention store ( #6285 )
...
* [Feature] Support report token index by attention store
* fix format
2026-02-02 10:41:11 +08:00
kesmeey
afee0b9c5e
[CI] [Hackathon 10th Spring No.30] Add unit tests for module fastdeploy/inter_communicator/engine_worker_queue.py ( #6102 )
...
* test: add comprehensive tests for EngineWorkerQueue to improve code coverage
* style: format tests/inter_communicator/test_e2w_queue.py with black
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-30 21:37:29 +08:00
xiaozude
030647521a
[Metax] adapt to the latest develop ( #6282 )
2026-01-29 23:21:20 -08:00
xunyoyo
18ebce9dec
[CI] [Hackathon 10th Spring No.41] Add unit tests for module fastdeploy/entrypoints/llm.py ( #6108 )
...
* Add LLM entrypoint tests for coverage
* test: streamline llm entrypoint coverage
* test: format llm tests
2026-01-30 12:58:10 +08:00
JYChen
6c685c9474
Revert "[Feature] Support Ernie FP8 on sm100 ( #5593 )" ( #6275 )
...
This reverts commit eb80724b71.
2026-01-30 11:22:01 +08:00
chenjian
292bab7e6d
[BugFix] Fix bug for enable output caching ( #6226 )
...
* [BugFix] Fix bug for enable output caching
* fix
* Fix
* fix
* fix ci
2026-01-30 10:55:36 +08:00
mouxin
506f1545cd
[Feature] Enhance Router with /v1/completions, docs, scripts, and version info ( #5966 )
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-30 10:28:48 +08:00
MingkunZhang
c4abb01f9c
[Metax][Fix] fix 'get_token_penalty_multi_scores' input error based (PaddlePaddle#6069) ( #6266 )
2026-01-29 19:24:36 +08:00
Zhang Yulong
f3c12be4d2
Update _build_linux_rl.yml ( #6274 )
2026-01-29 19:10:47 +08:00
YuBaoku
bb7c1d13e1
[CI] Remove --ipc=host and --pid=host from _stable_test.yml ( #6270 )
2026-01-29 17:06:06 +08:00
Ryan
5e78c1ac87
[Graph Optimization] Support CUDAGraph for P/PD mixed Batch using SOT subgraph splitting mode ( #6196 )
...
* refine comment && refine variable name
* replace comment
2026-01-29 16:29:54 +08:00
周周周
e237313797
[BugFix] allow return code 250 in tests/distributed/test_fusedmoe_ep_entry.py ( #6269 )
2026-01-29 16:00:03 +08:00
yuxuan
44b52701f6
[Feature] Support NVFP4 MoE on SM100 ( #6003 )
...
* fp4 dense
* [WIP] support nvfp4, dense part
* [wip] developing loading qwen model
* loading
* update
* dense fp4 OK, cudagraph error
* [WIP] moe forward part
* with flashinfer-backend
* qwen3_moe_fp4
* update
* support flashinfer-cutlass moe, qwen3-moe-fp4 OK
* support ernie4.5-fp4
* fix load error
* add some ut
* add docs
* fix CLA, test
* fix the apply() in ModelOptNvFp4FusedMoE
* fix CodeStyle
* del the PADDLE_COMPATIBLE_API
* fix broken url: nvidia_gpu.md
* fix docs
* fix token_ids
* fix CI in Hopper
* move flashinfer imports inside the function
* fix model_runner
Removed the logic for generating random padding IDs.
* Remove skip condition for CUDA version in nvfp4 test
* add test for nvfp4
* fix according to review
* Add Chinese translation link to NVFP4 documentation
* del flashinfer.py
* fix unittest
---------
Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>
2026-01-29 14:16:07 +08:00
JYChen
eb80724b71
[Feature] Support Ernie FP8 on sm100 ( #5593 )
...
* temporarily usable Deepgemm version
* dense part: e8m0 OK
* version in which the EB model runs end to end with E8M0
* code check
* support 21b-tp2, dev_paddle
* version in which single-node 4.5T ep works
* restore deleted code; single-node 4.5T ep (non-cudagraph)
* eb tp
* Support SM100 block-wise FP8 inference
* refine codes, support deepgemm on sm100
* add thirdparty PFCC/DeepGEMM
* fix ep decode
* use deepep ue8m0 to fix the accuracy issue
* fix FP8 TP accuracy
* upgrade Deepgemm and adapt the Hopper logic
* add ue8m0 kernel
* add ue8m0 kernel
* fix custom_ops/gpu_ops/cpp_extensions.cc
* eb output is normal
* eb5 text is right
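* accuracy looks consistent on visual inspection
* accuracy aligned in self-testing
* replace masked_per_token_quant; ep accuracy OK
* performance improved by about 30%
* ep runs for now but still has issues
* self-test results consistent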
* rm test fun
* fix ep event
* update graph optimization operators for Deepgemm
* fix build
* temporarily work around the deepgemm CI build issue
* select the deepgemm version based on SM
* remove useless code
---------
Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
2026-01-29 13:49:54 +08:00