Commit Graph

78 Commits

Author SHA1 Message Date
Zhang Yulong 30db3e9d8f [benchmark] update tools (#7512) 2026-04-20 19:40:17 +08:00
Zhang Yulong 738c658c54 [Benchmark] Update seed argument handling in benchmark_serving.py (#7356) 2026-04-13 16:05:50 +08:00
Zhang Yulong 7614175e13 Disable fixed random seed in benchmark_dataset.py (#7263)
Commented out the random seed initialization to allow for varied randomness in benchmarks.
2026-04-10 13:56:14 +08:00
Zhang Yulong f422f835e8 [benchmark] update tools (#7211) 2026-04-07 16:25:44 +08:00
xiegegege 209e5cf7f4 [CE]add 21b mooncake yaml (#7033)
* [CE]add 21b cpu cache, glm mtp, glm for rl config

* [CE]add 21b tp2 yaml

* [CE]add 21b mooncake yaml

* add fastdeploy benchmark,paddletest-155

* [CE] adjust vl wint4 config

* [CE]add glm mtp with updatemodel config

* [CE]fix

* fix

* test

* test

* test

---------

Co-authored-by: xiegegege <>
2026-03-26 20:01:05 +08:00
Zhang Yulong 6f5aa883f7 [benchmark] update benchmark tools (#6991)
* [benchmark] update benchmark tools

* [benchmark] update benchmark tools
2026-03-24 20:56:27 +08:00
Zhang Yulong 2b10ebc1f1 [benchmark] Refactor debug logging and payload handling (#6949)
* Refactor debug logging and payload handling

* Update backend_request_func.py
2026-03-20 15:04:10 +08:00
Zhang Yulong 3a4e139f65 [Benchmark] fix multi turn (#6948) 2026-03-20 13:22:30 +08:00
xjkmfa 3b203994e2 [Benchmark] Update Qwen3 vl 32k yaml (#6946) 2026-03-20 11:48:53 +08:00
xjkmfa a81116ad90 [Benchmark] Update Qwen3 vl dense yaml (#6945) 2026-03-20 11:26:47 +08:00
Zhang Yulong 051bbbeead [Benchmark] Update backend_request_func.py (#6575) 2026-02-28 19:51:55 +08:00
Zhang Yulong ce8123cb7f [Benchmark] Update backend_request_func.py (#6566) 2026-02-28 14:54:30 +08:00
Zhang Yulong ff20a3cc02 [benchmark] update tool call (#6519) 2026-02-26 17:06:54 +08:00
Zhang Yulong 96bfa0d5b9 [benchmark] Update benchmark_serving.py (#6467) 2026-02-11 20:10:46 +08:00
Zhang Yulong 02c61f8346 [Benchmark] Update backend_request_func.py (#6441) 2026-02-10 19:58:50 +08:00
Zhang Yulong 66c9e11998 [benchmark] update tools (#6437) 2026-02-10 17:48:55 +08:00
Zhang Yulong 26ba019e66 Update README.md (#6343) 2026-02-04 15:57:18 +08:00
Zhang Yulong 16d03c3127 update (#6335) 2026-02-03 21:59:32 +08:00
xiegegege 51c6fa8afc [CE]add 21b cpu cache, glm mtp, glm for rl config (#6328) 2026-02-03 20:10:47 +08:00
xjkmfa e27a7cc5b0 [Benchmark] Ce qwen3 vl (#6288)
* [CE]qwen3-vl
2026-02-03 14:17:28 +08:00
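ophilia-lee 1705d0af7a [benchmark] Support retrieving cached tokens from SGLang/VLLM (#6240)
* Benchmark tool: support specifying response_format for constrained-decoding scenarios

* Update backend_request_func.py

Make the output.success check tolerate an empty reply when overlong reasoning content is truncated

* Update benchmark_serving.py

Update benchmark_metrics

* Support the Completions API

* Support the Completions API

* Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] async_request_eb_openai_completions: raise aiohttp's default read buffer size to 4M to fix the "Chunk too big" error on oversized streaming chunks

* [Benchmark] Raise aiohttp's default read buffer size to 10M to fix the "Chunk too big" error on oversized streaming chunks

* [Benchmark] Support retrieving vLLM/SGLang cached_tokens

[Benchmark] Support retrieving vLLM/SGLang cached_tokens

* [benchmark] Support retrieving cached tokens from SGLang/VLLM

[benchmark] Support retrieving cached tokens from SGLang/VLLM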

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-27 14:57:20 +08:00
xiegegege e22c4e29bb [CE]add paddleocr config yaml (#6097) 2026-01-19 20:07:42 +08:00
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
Zhang Yulong 2da32f2a35 Update benchmark_serving.py (#5861) 2026-01-04 20:07:56 +08:00
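ophilia-lee d5f5dc4f6e [Benchmark] Raise aiohttp's default read buffer size to 10M to fix the "Chunk too big" error on oversized streaming chunks (#5771)
* Benchmark tool: support specifying response_format for constrained-decoding scenarios

* Update backend_request_func.py

Make the output.success check tolerate an empty reply when overlong reasoning content is truncated

* Update benchmark_serving.py

Update benchmark_metrics

* Support the Completions API

* Support the Completions API

* Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] async_request_eb_openai_completions: raise aiohttp's default read buffer size to 4M to fix the "Chunk too big" error on oversized streaming chunks

* [Benchmark] Raise aiohttp's default read buffer size to 10M to fix the "Chunk too big" error on oversized streaming chunks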

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 19:36:11 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
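ophilia-lee 99258e19c8 [Benchmark] Support the Completions API (#5700)
* Benchmark tool: support specifying response_format for constrained-decoding scenarios

* Update backend_request_func.py

Make the output.success check tolerate an empty reply when overlong reasoning content is truncated

* Update benchmark_serving.py

Update benchmark_metrics

* Support the Completions API

* Support the Completions API

* Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] Support the Completions API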

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-23 19:46:23 +08:00
Zhang Yulong 48f3e9797e Update backend_request_func.py (#5633)
2025-12-18 16:21:34 +08:00
Zhang Yulong c89a62e550 Update backend_request_func.py (#5631) 2025-12-18 14:20:17 +08:00
Zhang Yulong f45c131ddf update (#5625)
2025-12-17 21:38:14 +08:00
xiegegege 97e340eb14 [CE]add pd router and wint4 tp4 config (#5554) 2025-12-15 15:25:14 +08:00
tianlef 13cc7dacfd [Doc]add text/vl cinn ce config (#5532) 2025-12-12 16:16:06 +08:00
Zhang Yulong 510b82173a [Benchmark] Update benchmark (#5496)
* update benchmark

* update benchmark
2025-12-11 11:53:12 +08:00
SunLei 5fb93d84f5 [Feature] [Benchmark]: add ZMQ-based FMQ implementation and benchmark tools (#5418)
* feat(fmq): add ZMQ-based FMQ implementation and benchmark tools

* move FMQ_CONFIG_JSON to envs

* fix top_p_candidates (#5400)

Co-authored-by: freeliuzc <lzc842650834@gmail.com>

* [RL] Support Rollout Routing Replay (#5321)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374)

* fix multi-inputs

* fix threshold

* fix threshold

* fix

* [BugFix]remove _execute_empty_input (#5396)

* Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)

This reverts commit 96d2d4877b.

* [New][RL] Support Rollout Routing Replay (#5405)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* bf16 deepseek (#5379)

* fix deepseek (#5410)

* Update tests/inter_communicator/test_fmq_factory.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/benchmark_fmq.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/inter_communicator/fmq.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: GoldPancake <56388518+Deleter-D@users.noreply.github.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: RAM <gstian5555@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: 周周周 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-12-08 22:04:49 +08:00
xiegegege b7e1e6c953 [CE]change yaml name 2025-12-04 19:14:11 +08:00
tianlef 04d35ace5e [CE]add wint4 ep (#5355) 2025-12-03 15:17:47 +08:00
Zhang Yulong 5b49142988 update (#5298) 2025-11-28 18:29:16 +08:00
xiegegege eae34a416c [benchmark]add qwen3-235b pd+ep yaml (#5225) 2025-11-25 19:53:30 +08:00
tianlef de43577a7c [Docs] add ebvlthinking yaml (#5120) 2025-11-19 15:27:46 +08:00
Zhang Yulong 83532e1d01 [Benchmark] Enhance benchmark output logging (#4682)
* Enhance benchmark output logging

Add print statements to display the number of discarded outputs before and after filtering.

* Update benchmark_serving.py
2025-11-06 16:53:31 +08:00
Juncai 08ca0f6aea [Feature] [PD] add simple router and refine splitwise deployment (#4709)
* add simple router and refine splitwise deployment

* fix
2025-11-06 14:56:02 +08:00
zhang-prog 4c2ad15258 add paddleocr_vl benchmark (#4833)
* add paddleocr_vl benchmark

* fix

* fix

* fix

* fix
2025-11-05 19:37:45 +08:00
ophilia-lee 412097c1b8 Benchmark tool: support specifying response_format for constrained-decoding scenarios (#4718) 2025-10-31 12:26:24 +08:00
Ryan 28de91b50f [Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B (#4645)
* 45TVL support sot+CUDAGraph

* mv unittest from ce_deploy to e2e

* add test_EB_VL_Lite_sot_serving

* rm useless line

* add openai_client

* fix unittest && reduce computing resources
2025-10-31 11:38:43 +08:00
kxz2002 a2870ed4a9 [Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668)
* parser register name unify

* change ernie_x1 to ernie-x1

* change ernie4_5_vl to ernie-45-vl

* fix unit test
2025-10-31 10:45:27 +08:00
xjkmfa 19df1aec2b [Docs] add Qwen25vl yaml (#4662)
* Add ci case for min token and max token

* [CI case] include total_tokens in the last packet of completion interface stream output

* [CE] add qwen25-vl

* [CE] add qwen25-vl

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2025-10-29 17:39:40 +08:00
RAM 86d5006a57 [Graph Optimization][Speculative Decoding] Update yaml and fix typo (#4612) 2025-10-28 11:43:26 +08:00
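ophilia-lee 70aa7423f8 Adapt the benchmark tool to the SGLang framework (#4607)
* Adapt the benchmark tool to the SGLang framework

* Adapt the benchmark tool to the SGLang framework

* Adapt the benchmark tool to the SGLang framework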
2025-10-27 18:52:56 +08:00
tianlef 2676a918f0 [Doc]fix deepseek ce (#4560) 2025-10-23 14:09:11 +08:00
tianlef 153f15db39 [Doc]add deepseek wint4 ce (#4517) 2025-10-21 16:41:51 +08:00