Zero Rains
72bf3dbdfd
[KSM] support keep sampling mask ( #7146 )
...
* [KSM] support keep sampling mask
* Remove Comments
* remove logz_per_batch
* fix the description and checking
2026-04-02 20:30:54 -07:00
qwes5s5
ab185f07f8
test_abort ( #6890 )
2026-03-17 20:50:28 +08:00
Yonghua Li
59ae9576d2
[Cherry-Pick] [BugFix] Fix inaccurate cache hit rate and TTFT after request preemption ( #6626 )
...
* [chore] add has_been_rescheduled flag for requests
* [refactor] rename reschedule to preempted for accuracy and fix cache hit metrics
* [chore] add ttft_s
2026-03-05 16:25:07 +08:00
wangyifei
b7c5daa316
[RL] add pause, update_weights, resume interface for async RL ( #6052 )
...
* support dynamic run_control_request through zmq from apiserver to common_engine
* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method
* change /is_puased from HTTP POST method to GET method
* add pause、resume、is_paused implementation
* support engine <==> worker communication(request&response)
* support sync weights through RDMA from checkpoint_transfer
* support specified version, rsync_config in update_weights rpc call
* add pause, update_weights, resume interface for async RL
* bug fix: update_weights support using default arguments
* fix typo
* typo fix
* typo fix
* typo fix
* add unitest for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all
* add "rsync" to LoadConfig.load_strategy Literal type hints
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* typo fix
* typo fix
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* check version/rsync params
* add error log when version.txt not exists
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* raise specified ValueError when paramters check failed
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* tp barrier after run_control_method
* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue
* typo fix
* typo fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-01-23 10:18:07 +08:00
kxz2002
6e416c62dd
[Optimization] The pre- and post-processing pipeline do not perform dict conversion ( #5494 )
...
* to_request_for_infer initial commit
* refact to from_chat_completion_request
* preprocess use request initial commit
* bugfix
* processors refact to using request
* bug fix
* refact Request from_generic_request
* post process initial commit
* bugfix
* postprocess second commit
* bugfix
* serving_embedding initial commit
* serving_reward initial commit
* bugfix
* replace function name
* async_llm initial commit
* offline initial commit and fix bug
* bugfix
* fix async_llm
* remove add speculate_metrics into data
* fix logprobs bug
* fix echo bug
* fix bug
* fix reasoning_max_tokens
* bugfix
* bugfix and modify unittest
* bugfix and modify unit test
* bugfix
* bugfix
* bugfix
* modify unittest
* fix error when reasong_content is none for text_processor
* remove some unnessary logic
* revert removed logic
* implement add and set method for RequestOutput and refact code
* modify unit test
* modify unit test
* union process_request and process_request_obj
* remove a unit test
* union process_response and process_response_obj
* support qwen3_vl_processor
* modify unittest and remove comments
* fix prompt_logprobs
* fix codestyle
* add v1
* v1
* fix unit test
* fix unit test
* fix pre-commit
* fix
* add process request
* add process request
* fix
* fix
* fix unit test
* fix unit test
* fix unit test
* fix unit test
* fix unit test
* remove file
* add unit test
* add unit test
* add unit test
* fix unit test
* fix unit test
* fix
* fix
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
Co-authored-by: luukunn <981429396@qq.com >
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
2026-01-22 00:50:52 +08:00
qwes5s5
b2a2e11551
[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. ( #5320 )
...
* request disconnect
* request disconnect
* fix bug
* fix bug--amend
---------
Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com >
2026-01-16 11:46:13 +08:00
jc
e911ac2ce7
[BugFix] Refine the preparation of cpu and storage cache ( #5777 )
...
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
2026-01-05 10:13:30 +08:00
Juncai
412867fd99
[Feature] Support KV Cache Storage ( #5571 )
...
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-25 16:30:35 +08:00
memoryCoderC
be3be4913a
[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM ( #5195 )
...
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM
* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
ming1753
85db9d5e56
[Others] reschedule preempt task support optional func ( #5649 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* [Others] reschedule preempt task support optional func
* fix bug
* fix bug
2025-12-23 20:45:52 +08:00
xiaolei373
a30b4da260
[Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 ( #5458 )
2025-12-16 16:36:09 +08:00
GoldPancake
909059c60a
[Feature] Support for request-level speculative decoding metrics monitoring. ( #5518 )
...
* support spec metrics monitor per request
* fix bug
* remove debug log
* fix ut bugs
2025-12-12 12:22:18 +08:00
chen
76649b45c1
[Optimization] compulte real max_logprobs in batch ( #5430 )
2025-12-09 14:15:05 +08:00
zhouchong
5d9b5e4a5b
[Engine] [Feature] Refactor async_llm:cross-process with EngineService,based on zmq communication ( #4868 )
...
* Refactor async_llm:cross-process with EngineService
* fix: async_llm output process
* fix: return prompt_token_ids and prompt_tokens in first res
* optimize common_engine start func
2025-12-09 10:53:40 +08:00
Daci
2f208db4e9
[Feature] Multimodal Model P / D Separation ( #5323 )
...
* RouterArgs port str -> int
* fix race condition [is_fetching] causing multiple fetch requests
* bugfix: Delete duplicate input_ids tensor creation
* mm pd splitwise json -> pickle5; multimodal_inputs only pos id;
debuglog f to %s
* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...
* update cr
* Apply suggestions from code review
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* pre-commit fix
* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 10:47:42 +08:00
Juncai
a8ffc22032
[BugFix] fix init RequestOutput ( #5419 )
...
* fix init RequestOutput
* up
* fix
* fix
2025-12-09 10:20:22 +08:00
Juncai
80efe98f8d
[PD Disaggregation] Add timestamp for analyzing splitwise deployment ( #5317 )
...
* Add timestamp for analyzing splitwise deployment
* up
* up
* up
* up
* up
* up
* fix format
* fix
2025-12-08 10:08:44 +08:00
kevin
c9d7f9e7c3
[BugFix] fix async download bug ( #5349 )
...
* fix async download bug
* update log
* Revert "update log"
This reverts commit 5816e602f4 .
* update code
* fix mtp bug
2025-12-05 18:59:12 +08:00
ming1753
5f8d4aedea
[Feature] support audio tts ( #5333 )
2025-12-03 21:06:48 +08:00
lizexu123
c563eca791
[Feature] support reward model ( #5301 )
...
* Your commit message here
* add test
* update develop
* support reward
* support enable_chunk_prefill
* support bingfa
* support convert is reward
* update test
* delete print
* fix enable_thinking
* add document
* fix place
* fix test
* fix
* support enable_prefix_caching
* add no-enable_prefix-caching test
* fix
* support enable_prefix_caching
* delete print
* fix document
* fix
* fix test
* fix document and delete chinese
* udpate
* enable_thinking
* fix test
2025-12-02 14:55:31 +08:00
qwes5s5
117980dd4e
[LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. ( #5089 )
...
* add prompt logprobs
* Merge prompt_logprobs_tensors and prompt_logprobs
* fix param check
* trigger ci
* fix unitest
* fix logprobs bug
2025-12-02 13:49:51 +08:00
kevin
8aec3acc8c
fix mm type bug ( #5300 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-11-29 20:48:14 +08:00
SunLei
c424e08dc5
[Speculative Decoding] split draft_tokens into standalone post-processing path ( #5205 )
...
* refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs
* Restore Request.__repr__ implementation
* ci
* add envs
* fix unittest
2025-11-27 11:22:41 +08:00
Yonghua Li
cead6b26fa
[Metrics] Update time_to_first_token to include tokenization & queue time, and remove redundant metrics ( #4993 )
...
* [update] update time_to_first_tokens to include queue time, and remove first_token_latency and infer_latency
* [doc] update docs
* [ci] fix test
* [chore] delete redundant code
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-11-26 14:42:17 +08:00
kevin
df2be1cf16
[BugFix] fix mm_positions type error ( #5182 )
...
* fix mm_positions type error
* update code
* update code
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-11-25 19:28:18 +08:00
chenjian
3ea1b44a58
[Optimization] Improve perf for fd response token with internal adapter ( #4992 )
...
* [Optimize] Improve perf for fd response token with internal adapter
* fix
* fix bug
* fix ci
* fix ci
* fix ci
* fix ci
2025-11-21 19:02:03 +08:00
Jiang-Jia-Jun
d2298dcb0c
[Polish] Simplify __repr__ method in Request class ( #5153 )
...
Remove detailed string representation for Request class.
2025-11-21 17:21:06 +08:00
kevin
109d48e456
[Feature] support async download features ( #5003 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support async download features
* add test case
* update code
2025-11-19 22:23:36 +08:00
Juncai
36822fa49c
[PD Disaggregation] remove splitwise deployment on single node and refine the code ( #4891 )
...
* remove splitwise deployment on single node and refine the code
* up
* up
* up
* add test
* up
2025-11-14 09:56:53 +08:00
qwes5s5
a2d06118e1
[Logprobs]Support prompt_logprobs and max_logprobs ( #4897 )
...
* add prompt logprobs
* trigger ci
* fix unitest
* Update fastdeploy/config.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/entrypoints/llm.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/engine/sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/engine/test_sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/engine/test_sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix max_logprobs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-12 19:29:48 +08:00
chenjian
78895e2c7d
[Bug Fix] fix bug for PD EP ( #4823 )
...
* fix bug for PD EP
* fix
* optimize perf for engine worker queue
* fix bug
* fix internode ll two stage
* fix for ci
* fix bug
2025-11-10 15:33:29 +08:00
Juncai
08ca0f6aea
[Feature] [PD] add simple router and refine splitwise deployment ( #4709 )
...
* add simple router and refine splitwise deployment
* fix
2025-11-06 14:56:02 +08:00
chenjian
cc8f5312f5
[Feature] Add timestamp for profiler ( #4726 )
...
* [Feature] Add timestamp for profiler
* fix bug for offine inference
* fix for ci
* fix
* fix ci
2025-11-05 12:04:59 +08:00
kxz2002
7dc9d9885e
[BugFix] fix offline llm chat "enable_thinking" is always "False" ( #4686 )
...
* fix enable_thinking
* recover ernie4_5_vl_processor
2025-10-30 19:45:41 +08:00
ApplEOFDiscord
14f8cddaf1
[Feature] add mm token usage ( #4570 )
...
* add mm token usage
* fix unit test
* fix unit test
* fix unit test
* fix model path
* fix unit test
* fix unit test
* fix unit test
* remove uncomment
* change var name
* fix code style
* fix code style
* fix code style
* fix code style
* fix unit test
2025-10-29 14:37:12 +08:00
xiaolei373
14e7d88ea4
[feature] support reward api ( #4518 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Co-authored-by: SunLei <sunlei5788@gmail.com >
2025-10-29 00:20:28 +08:00
kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latecy cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normlize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move steam usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
SunLei
dc1a9c7287
perf: Optimize task queue communication from engine to worker ( #4531 )
...
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-25 22:45:38 +08:00
RichardWooSJTU
5a8c60454e
[BugFix] Fix decode_type which has been deleted in req and optimize token client retry scheme ( #4564 )
2025-10-23 05:08:10 -07:00
SunLei
809c1ac7ec
feat: add post-processing step for pool_output ( #4462 )
...
* feat: add post-processing step for pool_output
* bugfix
* fix: test_serving_embedding
* fix test_request_to_batch_dicts
* fix: code style
2025-10-21 20:24:26 +08:00
SunLei
ee915220bd
[Speculative Decoding] Add draft_logprobs Support for Speculative Decode MTP ( #4467 )
...
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* fix: postprocess for speculative decode
* test: test_speculative_decoding_use_logprobs
* fix: test_completion_echo
* fix test_max_streaming_tokens
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 14:57:50 +08:00
chenjian
670aaa3f83
[Bug fix] Fix pd for x1 thinking ( #4433 )
2025-10-16 12:03:45 +08:00
SunLei
b4b579a7ed
Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. ( #4344 )
...
* feat: add OpenAIServing
* feat: add ZmqOpenAIServing & OpenAIServingEmbedding
* feat: Refine the basic ServingEngine class and introduce ServingContext
* fix: codestyle
* fix: request
* fix: pooling_params
* feat: _process_chat_template_kwargs
* feat: support batch request
* feat: pooling_params verify & default parameters
---------
Co-authored-by: sunlei1024 <sunlei1024@example.com >
2025-10-15 19:42:59 +08:00
chenjian
918ccdb123
[Feature] Support pd ep deployment with yiyan adapter ( #4029 )
...
* [Feature] Support mixed deployment with yiyan adapter in release2.2
* fix metrics
* add unit test
* add unit test
* add unit test
* Support pd ep deployment with yiyan adapter
* Support pd ep deployment with yiyan adapter
* refactor cache messager
* support scheduler v1 in PD
* suppport pd v1 + chunk prefill
* suppport pd v1 + chunk prefill
* add eplb
* support eplb
* support eplb
* support eplb
* support v1
* fix
* fix
* fix bug
* remove eplb support
* support prefix cache in P
* fix bug
* fix bug
* support one stop in V1
* fix bug
* fix ci
* fix ci
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-22 16:41:38 +08:00
RichardWooSJTU
91912cc2e1
fix t2i ( #4163 )
...
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-19 18:07:13 +08:00
RichardWooSJTU
f36a388ffe
fix response processsors ( #3826 )
...
* fix response processsors
* fix ci
* fix ut
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-04 16:01:25 +08:00
RichardWooSJTU
0989788b29
support extend block tables ( #3824 )
2025-09-04 14:39:04 +08:00
kevin
1908465542
[Feature] mm and thinking model support structred output ( #2749 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* mm support structured output
* update code
* update code
* update format
* update code
* update code
* add enable_thinking default
* update code
* add structured_outputs test case
* add ci install xgrammar
* add ci timeout time
* update test for structured_outputs
* update code
* add error traceback info
* update error msg
* update structred output code
* update code
* update code
* update config
* update torch version
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-02 16:21:09 +08:00
Yuanle Liu
4957908275
add input_processor plugin ( #3657 )
...
* add input_processor plugin
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2025-08-28 22:53:57 +08:00
luukunn
9c129813f9
[Feature] add custom chat template ( #3251 )
...
* add custom chat_template
* add custom chat_template
* add unittest
* fix
* add docs
* fix comment
* add offline chat
* fix unit test
* fix unit test
* fix
* fix pre commit
* fix unit test
* add unit test
* add unit test
* add unit test
* fix pre_commit
* fix enable_thinking
* fix pre commit
* fix pre commit
* fix unit test
* add requirements
2025-08-18 16:34:08 +08:00