Zhang Yulong
30db3e9d8f
[benchmark] update tools ( #7512 )
2026-04-20 19:40:17 +08:00
Zhang Yulong
738c658c54
[Benchmark] Update seed argument handling in benchmark_serving.py ( #7356 )
2026-04-13 16:05:50 +08:00
Zhang Yulong
7614175e13
Disable fixed random seed in benchmark_dataset.py ( #7263 )
...
Commented out the random seed initialization to allow for varied randomness in benchmarks.
2026-04-10 13:56:14 +08:00
Zhang Yulong
f422f835e8
[benchmark] update tools ( #7211 )
2026-04-07 16:25:44 +08:00
xiegegege
209e5cf7f4
[CE]add 21b mooncake yaml ( #7033 )
...
* [CE]add 21b cpu cache ,glm mtp,glm for rl config
* [CE]add 21b tp2 yaml
* [CE]add 21b mooncake yaml
* add fastdeploy benchmark,paddletest-155
* [CE] adjust vl wint4 config
* [CE]add glm mtp with updatemodel config
* [CE]fix
* fix
* test
* test
* test
---------
Co-authored-by: xiegegege <>
2026-03-26 20:01:05 +08:00
Zhang Yulong
6f5aa883f7
[benchmark] update benchmark tools ( #6991 )
...
* [benchmark] update benchmark tools
* [benchmark] update benchmark tools
2026-03-24 20:56:27 +08:00
Zhang Yulong
2b10ebc1f1
[benchmark] Refactor debug logging and payload handling ( #6949 )
...
* Refactor debug logging and payload handling
* Update backend_request_func.py
2026-03-20 15:04:10 +08:00
Zhang Yulong
3a4e139f65
[Benchmark] fix multi turn ( #6948 )
2026-03-20 13:22:30 +08:00
xjkmfa
3b203994e2
[Benchmark] Update Qwen3 vl 32k yaml ( #6946 )
2026-03-20 11:48:53 +08:00
xjkmfa
a81116ad90
[Benchmark] Update Qwen3 vl dense yaml ( #6945 )
2026-03-20 11:26:47 +08:00
Zhang Yulong
051bbbeead
[Benchmark] Update backend_request_func.py ( #6575 )
2026-02-28 19:51:55 +08:00
Zhang Yulong
ce8123cb7f
[Benchmark] Update backend_request_func.py ( #6566 )
2026-02-28 14:54:30 +08:00
Zhang Yulong
ff20a3cc02
[benchmark] update tool call ( #6519 )
2026-02-26 17:06:54 +08:00
Zhang Yulong
96bfa0d5b9
[benchmark] Update benchmark_serving.py ( #6467 )
2026-02-11 20:10:46 +08:00
Zhang Yulong
02c61f8346
[Benchmark] Update backend_request_func.py ( #6441 )
2026-02-10 19:58:50 +08:00
Zhang Yulong
66c9e11998
[benchmark] update tools ( #6437 )
2026-02-10 17:48:55 +08:00
Zhang Yulong
26ba019e66
Update README.md ( #6343 )
2026-02-04 15:57:18 +08:00
Zhang Yulong
16d03c3127
update ( #6335 )
2026-02-03 21:59:32 +08:00
xiegegege
51c6fa8afc
[CE]add 21b cpu cache ,glm mtp,glm for rl config ( #6328 )
2026-02-03 20:10:47 +08:00
xjkmfa
e27a7cc5b0
[Benchmark] Ce qwen3 vl ( #6288 )
...
* [CE]qwen3-vl
2026-02-03 14:17:28 +08:00
ophilia-lee
1705d0af7a
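[benchmark] Support retrieving cached tokens from SGLang/VLLM ( #6240 )
...
* Benchmark tool: support specifying response_format for constrained-decoding scenarios
* Update backend_request_func.py
Make the output.success check handle the case where the reply content is empty because overlong thinking content was truncated
* Update benchmark_serving.py
Update benchmark_metrics
* Support the Completions API
* Support the Completions API
* Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] async_request_eb_openai_completions: increase the default aiohttp read buffer size to 4M to fix the "Chunk too big" error raised when a streaming response returns an oversized chunk
* [Benchmark] Increase the default aiohttp read buffer size to 10M to fix the "Chunk too big" error raised when a streaming response returns an oversized chunk
* [Benchmark] Support retrieving vLLM/SGLang cached_tokens
[Benchmark] Support retrieving vLLM/SGLang cached_tokens
* [benchmark] Support retrieving cached tokens from SGLang/VLLM
[benchmark] Support retrieving cached tokens from SGLang/VLLM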
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-01-27 14:57:20 +08:00
xiegegege
e22c4e29bb
[CE]add paddleocr config yaml ( #6097 )
2026-01-19 20:07:42 +08:00
jc
e911ac2ce7
[BugFix] Refine the preparation of cpu and storage cache ( #5777 )
...
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
2026-01-05 10:13:30 +08:00
Zhang Yulong
2da32f2a35
Update benchmark_serving.py ( #5861 )
2026-01-04 20:07:56 +08:00
ophilia-lee
d5f5dc4f6e
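[Benchmark] Increase the default aiohttp read buffer size to 10M to fix the "Chunk too big" error raised when a streaming response returns an oversized chunk ( #5771 )
...
* Benchmark tool: support specifying response_format for constrained-decoding scenarios
* Update backend_request_func.py
Make the output.success check handle the case where the reply content is empty because overlong thinking content was truncated
* Update benchmark_serving.py
Update benchmark_metrics
* Support the Completions API
* Support the Completions API
* Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] async_request_eb_openai_completions: increase the default aiohttp read buffer size to 4M to fix the "Chunk too big" error raised when a streaming response returns an oversized chunk
* [Benchmark] Increase the default aiohttp read buffer size to 10M to fix the "Chunk too big" error raised when a streaming response returns an oversized chunk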
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-25 19:36:11 +08:00
Juncai
412867fd99
[Feature] Support KV Cache Storage ( #5571 )
...
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-25 16:30:35 +08:00
ophilia-lee
99258e19c8
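[Benchmark] Support the Completions API ( #5700 )
...
* Benchmark tool: support specifying response_format for constrained-decoding scenarios
* Update backend_request_func.py
Make the output.success check handle the case where the reply content is empty because overlong thinking content was truncated
* Update benchmark_serving.py
Update benchmark_metrics
* Support the Completions API
* Support the Completions API
* Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] Support the Completions API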
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-23 19:46:23 +08:00
Zhang Yulong
48f3e9797e
Update backend_request_func.py ( #5633 )
2025-12-18 16:21:34 +08:00
Zhang Yulong
c89a62e550
Update backend_request_func.py ( #5631 )
2025-12-18 14:20:17 +08:00
Zhang Yulong
f45c131ddf
update ( #5625 )
2025-12-17 21:38:14 +08:00
xiegegege
97e340eb14
[CE]add pd router and wint4 tp4 config ( #5554 )
2025-12-15 15:25:14 +08:00
tianlef
13cc7dacfd
[Doc]add text/vl cinn ce config ( #5532 )
2025-12-12 16:16:06 +08:00
Zhang Yulong
510b82173a
[Benchmark] Update benchmark ( #5496 )
...
* update benchmark
* update benchmark
2025-12-11 11:53:12 +08:00
SunLei
5fb93d84f5
[Feature] [Benchmark]: add ZMQ-based FMQ implementation and benchmark tools ( #5418 )
...
* feat(fmq): add ZMQ-based FMQ implementation and benchmark tools
* move FMQ_CONFIG_JSON to envs
* fix top_p_candidates (#5400 )
Co-authored-by: freeliuzc <lzc842650834@gmail.com >
* [RL] Support Rollout Routing Replay (#5321 )
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
* [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374 )
* fix multi-inputs
* fix threshold
* fix threshold
* fix
* [BugFix]remove _execute_empty_input (#5396 )
* Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 )
This reverts commit 96d2d4877b .
* [New][RL] Support Rollout Routing Replay (#5405 )
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 )"
This reverts commit c45e064f3d .
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
* bf16 deepseek (#5379 )
* fix deepseek (#5410 )
* Update tests/inter_communicator/test_fmq_factory.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update benchmarks/benchmark_fmq.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/inter_communicator/fmq.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: GoldPancake <56388518+Deleter-D@users.noreply.github.com >
Co-authored-by: freeliuzc <lzc842650834@gmail.com >
Co-authored-by: RAM <gstian5555@outlook.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com >
Co-authored-by: 周周周 <39978853+zhoutianzi666@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
2025-12-08 22:04:49 +08:00
xiegegege
b7e1e6c953
[CE]change yaml name
2025-12-04 19:14:11 +08:00
tianlef
04d35ace5e
[CE]add wint4 ep ( #5355 )
2025-12-03 15:17:47 +08:00
Zhang Yulong
5b49142988
update ( #5298 )
2025-11-28 18:29:16 +08:00
xiegegege
eae34a416c
[benchmark]add qwen3-235b pd+ep yaml ( #5225 )
2025-11-25 19:53:30 +08:00
tianlef
de43577a7c
[Docs] add ebvlthinking yaml ( #5120 )
2025-11-19 15:27:46 +08:00
Zhang Yulong
83532e1d01
[Benchmark] Enhance benchmark output logging ( #4682 )
...
* Enhance benchmark output logging
Add print statements to display the number of discarded outputs before and after filtering.
* Update benchmark_serving.py
2025-11-06 16:53:31 +08:00
Juncai
08ca0f6aea
[Feature] [PD] add simple router and refine splitwise deployment ( #4709 )
...
* add simple router and refine splitwise deployment
* fix
2025-11-06 14:56:02 +08:00
zhang-prog
4c2ad15258
add paddleocr_vl benchmark ( #4833 )
...
* add paddleocr_vl benchmark
* fix
* fix
* fix
* fix
2025-11-05 19:37:45 +08:00
ophilia-lee
412097c1b8
Benchmark tool: support specifying response_format for constrained-decoding scenarios ( #4718 )
2025-10-31 12:26:24 +08:00
Ryan
28de91b50f
[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B ( #4645 )
...
* 45TVL support sot+CUDAGraph
* mv unitest from ce_deploy 2 e2e
* add test_EB_VL_Lite_sot_serving
* rm useless line
* add openai_client
* fix unitest && reduce computing resources
2025-10-31 11:38:43 +08:00
kxz2002
a2870ed4a9
[Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” ( #4668 )
...
* parser register name unify
* change ernie_x1 to ernie-x1
* change ernie4_5_vl to ernie-45-vl
* fix unit test
2025-10-31 10:45:27 +08:00
xjkmfa
19df1aec2b
[Docs] add Qwen25vl yaml ( #4662 )
...
* Add ci case for min token and max token
* [CI case] include total_tokens in the last packet of completion interface stream output
* [CE] add qwen25-vl
* [CE] add qwen25-vl
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2025-10-29 17:39:40 +08:00
RAM
86d5006a57
[Graph Optimization][Speculative Decoding] Update yaml and fix typo ( #4612 )
2025-10-28 11:43:26 +08:00
ophilia-lee
70aa7423f8
Benchmark tool: adapt to the SGLang framework ( #4607 )
...
* Benchmark tool: adapt to the SGLang framework
* Benchmark tool: adapt to the SGLang framework
* Benchmark tool: adapt to the SGLang framework
2025-10-27 18:52:56 +08:00
tianlef
2676a918f0
[Doc]fix deepseek ce ( #4560 )
2025-10-23 14:09:11 +08:00
tianlef
153f15db39
[Doc]add deepseek wint4 ce ( #4517 )
2025-10-21 16:41:51 +08:00