FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
YuBaoku	cae2709eff	[CI] Update stable test workflow and run.sh script (#6352 )	2026-02-05 11:01:15 +08:00
GoldPancake	183b8d325a	[RL] Support GLM MTP RL Model (#6267 )	2026-02-04 20:14:35 +08:00
luukunn	765df94e6c	[Optimization]update prompt & prompt_token_ids (#6334 ) * fix prompt * add unit test * add unit test * fix	2026-02-04 20:08:01 +08:00
JYChen	bf78a48eb3	[Others] add mock unittest for sm100 FP8 inference (#6273 ) * add unittest * use new file --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-04 17:39:15 +08:00
sunxin	ef47e6eb46	[Others]skip to_tensor (#6342 )	2026-02-04 17:25:19 +08:00
Zhang Yulong	26ba019e66	Update README.md (#6343 )	2026-02-04 15:57:18 +08:00
MingkunZhang	43e3886ef9	[Metax][CI] fix run_ci_metax.sh error (#6341 )	2026-02-04 15:43:48 +08:00
MingkunZhang	e109fb9a0e	[Metax][Fix] fix issues based #6259 (#6338 )	2026-02-03 23:21:35 -08:00
chenjian	90db0bdd0d	[Optimize] Optimize ttft for ep (#6098 ) * optimize ttft * fix * fix * fix ci * fix ci * fix * fix bug * fix * add comments * fix ci * fix	2026-02-04 15:03:29 +08:00
mouxin	6e96bd0bd2	[Feature] Fix counter release logic & update go-router download URL (#6280 ) * [Doc] Update prerequisites in the documentation * [Feature] Enhance Router with /v1/completions, docs, scripts, and version info * [Feature] Enhance Router with /v1/completions, docs, scripts, and version info * [Feature] Enhance Router with /v1/completions, docs, scripts, and version info * [Feature] Fix counter release logic * [Feature] Update go-router download URL * [Feature] Update go-router download URL * [Feature] Update go-router download URL * [Feature] Update go-router download URL * [Feature] Update token counter logic and docs * [Feature] Update token counter logic and docs --------- Co-authored-by: mouxin <mouxin@baidu.com>	2026-02-04 15:02:38 +08:00
fxyfxy777	36547cfdb3	[Feature] FD_USE_PHI_FP8_QUANT (#6320 ) * add ut * add use_fd_quant env * rm mask_per_token_quant * add make ops list * USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, defaults to true * modify comments * use bool type * Add function declaration	2026-02-03 22:33:03 -08:00
MingkunZhang	2ffcb3d9ed	[Metax][CI] update ci test files (#6340 )	2026-02-04 13:58:07 +08:00
sunxin	9b0a82cfa9	[Model Runner] Support overlap schedule (#6259 )	2026-02-04 10:49:44 +08:00
周周周	6225439778	add PADDLE_ENFORCE (#6321 )	2026-02-04 10:47:19 +08:00
xunyoyo	8225e694c9	[CI] [Hackathon 10th Spring No.37] Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py (#6286 ) * Add wint2 MoE backend tests * Align wint2 test dtypes for cutlass apply * Use bfloat16 input in wint2 test * Stub moe_expert_reduce in wint2 test * Use 2 experts in wint2 test --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-04 10:46:26 +08:00
Zhang Yulong	16d03c3127	update (#6335 )	2026-02-03 21:59:32 +08:00
Jiang-Jia-Jun	793dac0f9d	Modify Nightly Build installation commands for fastdeploy Update the installation instructions for the Nightly Build of fastdeploy to use the cu126 index for both SM86/89 and SM80/90 architectures.	2026-02-03 20:24:27 +08:00
Jiang-Jia-Jun	829139a5e5	Fix Nightly build installation URLs for fastdeploy-gpu Updated installation instructions for the latest Nightly build of fastdeploy-gpu to use the correct URLs for CUDA 12.6.	2026-02-03 20:24:19 +08:00
RAM	5b22e5dfe7	[RL] R3 Support Fused Put the Routing of All Layers (#6099 ) * fused put routing * fix bug * [draft commit]dynamic dtype * fix async put & numpy bug * fix unit8 test case	2026-02-03 04:13:16 -08:00
CSWYF3634076	722ca87db6	[Others] lazy write log when writing (#6323 )	2026-02-03 20:11:13 +08:00
xiegegege	51c6fa8afc	[CE]add 21b cpu cache ,glm mtp,glm for rl config (#6328 )	2026-02-03 20:10:47 +08:00
ddchenhao66	faade7d0ab	[BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled (#6309 )	2026-02-03 19:49:01 +08:00
JYChen	c745a22420	[Feature] Support Ernie FP8 on sm100 ( the fixed version) (#6304 )	2026-02-03 17:47:38 +08:00
kesmeey	73952a3b67	add tests (#6243 ) Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-03 17:02:36 +08:00
bukejiyu	12d4b4cb87	[Feature]Support reorder ids to split prefill and decodes (#5779 ) * support reorder ids * perfect code * fix * fix unittest * delete code * fix * add python api * delete custom op * update algorithm * fix swap * support condense * support condense * support mtp * delete code * update * update * update * update * update for other platfrom * update * fix * fix mtp * fix ut * update * fix ut * update ut * fix * fix encoder_cache * fix ci * fix * fix vl * Fix performance regression * fix * fix * fix mtp * fix index->req_id mapping * fix ut --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com> Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-03 00:28:02 -08:00
周周周	cbdb2462ea	cp 1131 tbo to develop (#6281 )	2026-02-03 15:23:23 +08:00
周周周	8277b95fa6	remove speculate_get_padding_offset op (#6308 )	2026-02-03 15:18:12 +08:00
Moonchild1227	39dc4b0c2e	[Feature] [KVCache] support file_store kv cache backend (#6188 ) * fix(examples): comment out stop.sh to avoid error when script is missing * feat: add file_store support for cache manager * [fix] fix multi gpu transfer * [fix] fix global kvcache transfer * [Feature] [KVCache] support file_store kv cache backend * chore: update FileStore according to PR comments * fix: remove comments * fix: add swap_cache_layout for file store * fix: remove rank key * fix: Switch KV cache storage to pure file mode * Temporarily disable support for Tensor types * fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR * fixx: Simplify cache_transfer_manager.py * fix: fix syntax bug * fix: Simplify file_store.py * fix: Use the key directly as the filename * fix: Simplify set() * fix: Simplify cache_transfer_manager.py & file_store.py * fix: Only support load to cpu buffer * feat: add FileStore backend for cache transfer * fix: guard zmq import	2026-02-03 14:37:58 +08:00
zccjjj	ee77ff9ebe	[config] fix assert message (#6310 )	2026-02-03 14:37:46 +08:00
Jingfeng Wu	4760835789	Fix heartbeat signal's sleeptime error (#6241 )	2026-02-03 14:28:51 +08:00
xjkmfa	e27a7cc5b0	[Benchmark] Ce qwen3 vl (#6288 ) * [CE]qwen3-vl	2026-02-03 14:17:28 +08:00
fxyfxy777	f3413c4caa	[BugFix] fix fused_mask_swiglu_fp8_quant bug (#6316 ) * optimize mask_quant op speed up 1.5 * fix calculate sequence * add fused * rm log * push kernel code * add ut * accuracy ok * add ue8m0 * add ut * add merge develop * rm ut of mask_per_token_quant * Revert "[Optimize] optimize mask_quant & swiglu (#6222)" This reverts commit `2ada119a38`. * add block_size * pre-commit	2026-02-03 13:54:12 +08:00
ApplEOFDiscord	6563b8307c	[Bug Fix] fix tokenizer oom (#6287 ) * fix tokenizer oom * fix unit test	2026-02-03 11:27:11 +08:00
GoldPancake	fb374238e1	Revert "[RL] Support GLM MTP RL Model (#6223 )" (#6301 ) This reverts commit `af6c84d48d`.	2026-02-02 14:08:13 +08:00
fxyfxy777	2ada119a38	[Optimize] optimize mask_quant & swiglu (#6222 ) * optimize mask_quant op speed up 1.5 * fix calculate sequence * add fused * rm log * push kernel code * add ut * accuracy ok * add ue8m0 * add ut * add merge develop * rm ut of mask_per_token_quant	2026-02-02 13:52:38 +08:00
xunyoyo	25656455ee	[CI] [Hackathon 10th Spring No.38] Add unit tests for functional module fastdeploy/entrypoints/openai/serving_completion.py (#6227 ) * Add serving completion tests * test: tighten serving completion coverage	2026-02-02 12:53:04 +08:00
chenjian	af1b1d2d56	[Feature] Support report token index by attention store (#6285 ) * [Feature] Support report token index by attention store * fix format	2026-02-02 10:41:11 +08:00
kesmeey	afee0b9c5e	[CI] [Hackathon 10th Spring No.30] Add unit tests for functional module fastdeploy/inter_communicator/engine_worker_queue.py (#6102 ) * test: add comprehensive tests for EngineWorkerQueue to improve code coverage * style: format tests/inter_communicator/test_e2w_queue.py with black --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-01-30 21:37:29 +08:00
xiaozude	030647521a	[Metax] adapt to the latest develop (#6282 )	2026-01-29 23:21:20 -08:00
xunyoyo	18ebce9dec	[CI] [Hackathon 10th Spring No.41] Add unit tests for functional module fastdeploy/entrypoints/llm.py (#6108 ) * Add LLM entrypoint tests for coverage * test: streamline llm entrypoint coverage * test: format llm tests	2026-01-30 12:58:10 +08:00
JYChen	6c685c9474	Revert "[Feature] Support Ernie FP8 on sm100 (#5593 )" (#6275 ) This reverts commit `eb80724b71`.	2026-01-30 11:22:01 +08:00
chenjian	292bab7e6d	[BugFix] Fix bug for enable output caching (#6226 ) * [BugFix] Fix bug for enable output caching * fix * Fix * fix * fix ci	2026-01-30 10:55:36 +08:00
mouxin	506f1545cd	[Feature] Enhance Router with /v1/completions, docs, scripts, and version info (#5966 ) * [Doc] Update prerequisites in the documentation * [Feature] Enhance Router with /v1/completions, docs, scripts, and version info * [Feature] Enhance Router with /v1/completions, docs, scripts, and version info --------- Co-authored-by: mouxin <mouxin@baidu.com>	2026-01-30 10:28:48 +08:00
MingkunZhang	c4abb01f9c	[Metax][Fix] fix 'get_token_penalty_multi_scores' input error based (PaddlePaddle#6069) (#6266 )	2026-01-29 19:24:36 +08:00
Zhang Yulong	f3c12be4d2	Update _build_linux_rl.yml (#6274 )	2026-01-29 19:10:47 +08:00
YuBaoku	bb7c1d13e1	[CI] Remove --ipc=host and --pid=host from _stable_test.yml (#6270 )	2026-01-29 17:06:06 +08:00
Ryan	5e78c1ac87	[Graph Optimization] Support CUDAGraph for P/PD mixed Batch using SOT subgraph splitting mode (#6196 ) * refine comment && refine variable name * replace comment	2026-01-29 16:29:54 +08:00
周周周	e237313797	[BugFix] allow return code 250 in tests/distributed/test_fusedmoe_ep_entry.py (#6269 )	2026-01-29 16:00:03 +08:00
yuxuan	44b52701f6	[Feature] Support NVFP4 MoE on SM100 (#6003 ) * fp4 dense * [WIP] support nvfp4, dense part * [wip] developing loading qwen model * loading * update * dense fp4 OK, cudagraph error * [WIP] moe forward part * with flashinfer-backend * qwen3_moe_fp4 * update * support flashinfer-cutlass moe, qwen3-moe-fp4 OK * support ernie4.5-fp4 * fix load error * add some ut * add docs * fix CLA, test * fix the apply() in ModelOptNvFp4FusedMoE * fix CodeStyle * del the PADDLE_COMPATIBLE_API * fix broken url: nvidia_gpu.md * fix docs * fix token_ids * fix CI in Hopper * move flashinfer imports inside the function * fix model_runner Removed the logic for generating random padding IDs. * Remove skip condition for CUDA version in nvfp4 test * add test for nvfp4 * fix according to review * Add Chinese translation link to NVFP4 documentation * del flashinfer.py * fix unittest --------- Co-authored-by: zoooo0820 <zoooo0820@qq.com> Co-authored-by: bukejiyu <395822456@qq.com>	2026-01-29 14:16:07 +08:00
JYChen	eb80724b71	[Feature] Support Ernie FP8 on sm100 (#5593 ) * temporarily working Deepgemm version * dense part e8m0 ok * version where the EB model runs with E8M0 * code check * support 21b-tp2, dev_paddle * version with single-node 4.5T ep OK * restore deleted code, single-node 4.5T ep (non-cudagraph) * eb tp * Support SM100 block-wise FP8 inference * refine codes, support deepgemm on sm100 * add thirdparty PFCC/DeepGEMM * fix ep decode * use deepep ue8m0 to fix accuracy issue * fix FP8 TP accuracy * upgrade Deepgemm and adapt Hopper logic * add ue8m0 kernel * add ue8m0 kernel * fix custom_ops/gpu_ops/cpp_extensions.cc * eb output is normal * eb5 text is right * accuracy looks consistent by visual inspection * accuracy aligned in self-test * replace masked_per_token_quant, ep accuracy OK * performance improved by about 30% * ep runs for now but has issues * self-test consistent * rm test fun * fix ep event * update graph optimization operators for Deepgemm * fix build * temporarily work around deepgemm CI build issue * select deepgemm version by SM * remove useless code --------- Co-authored-by: ckl117 <ckl117@163.com> Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”> Co-authored-by: fxyfxy777 <fxyfxy777@163.com>	2026-01-29 13:49:54 +08:00

4571 Commits