FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
jackyYang6	634d23a38a	[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow (#6934 ) * [Bugfix] Align thinking_budget behavior with ERNIE reasoning flow * [Docs] Fix thinking_budget markdown formatting * [Test] Align ernie thinking budget test with process_request_dict	2026-03-23 14:15:55 +08:00
luukunn	33e79f922a	[Optimization]Optimize CPU utilization (#6950 ) * Optimize CPU utilization	2026-03-22 23:02:39 +08:00
luukunn	f4a79d4c00	[Optimization]Unified data processing for online and offline (#6891 ) * remove process_request * fix chat * fix unit test * remove process response * fix unit test * fix offline decode * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix sampling_params --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-19 21:56:09 +08:00
sunxin	33e01f22a8	[Feature][Sampling] Extend top-k_top-p sampling to all backends and unify greedy decoding with top_k=1 (#6894 ) * update sampling * fix * fix * fix mtp * fix test	2026-03-19 01:43:10 -07:00
gongweibao	fb6c56dfd5	[BugFix][DataProcessor] Force top_k=1 for greedy decoding when temperature=0 (#6748 ) * [BugFix] Force top_k=1 for greedy decoding when temperature=0 When temperature is set to 0 (greedy decoding), only setting temperature to a small epsilon is insufficient — the sampling kernel may still pick non-top-1 tokens. Explicitly set top_k=1 in all processors to guarantee argmax behavior. Additionally, add argmax fast-path in top_k_top_p_sampling() under FD_DETERMINISTIC_MODE to handle non-rejection sampling backends that ignore top_k parameter. * Extract greedy decoding from FD_DETERMINISTIC_MODE guard top_k=1 → argmax is a correctness optimization, not deterministic-specific. Remove the FD_DETERMINISTIC_MODE guard so all-greedy fast-path and mixed-batch override work unconditionally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update test_torch_model.py --------- Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-03-18 17:36:43 +08:00
luukunn	fe8d58a094	[Optimization]update request in tool parser&reasoning parser (#6858 ) * update request in tool parser&reasoning parser	2026-03-17 11:51:12 +08:00
gongweibao	a6351dea0b	[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 ) * init * init * fix format * add * add files * add ut * fix some * add ut * add more * add * fix pre-commit * fix pre-commit * fix cover * skip long seq * add * add * fix * remove not need * fix set attr * fix comments * fix comments * fix failed tests --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-16 21:32:43 +08:00
CSWYF3634076	97a4b3631e	[Processor]add qwen3vl prompt_token_ids support (#6764 ) * [Processor]add qwen3vl prompt_token_ids support * [Processor]add qwen3vl prompt_token_ids support unittest * [Processor]add qwen3vl prompt_token_ids support precommit	2026-03-11 15:08:56 +08:00
bukejiyu	cffa8c246c	[Others]update paddleformer 1.0.0 (#6496 ) * update paddleformer 1.0.0 * update	2026-03-11 15:06:29 +08:00
gongweibao	1e49855b0f	[BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network (#6713 ) * fix * add more endpoint * fix some --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-08 12:36:32 +08:00
luukunn	caf73e8131	[Feature]add reasoning effort (#6656 ) * add reasoning_effort * fix log * fix reasoning_effort * add reasoning_effort level * fix valid_parameters * fix valid_parameters * fix * fix unit test * add unit test * add unit test	2026-03-06 14:16:02 +08:00
Yuanle Liu	6d3fede240	[OP][Feature] Unify the limit_thinking_content_length CUDA operator; support response length limits and injected sequences (#6493 ) * Initial plan * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix * update * fix * fix ci * fix ci * Initial plan * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add disable-thinking case to test_chat_with_response_max_tokens Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add both reasoning_max_tokens and response_max_tokens case Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix ci * fix ci * fix ci --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>	2026-02-25 21:36:50 +08:00
Wanglongzhi2001	14ea7243e1	[Feature] support mm_processor_kwargs for flexible model	2026-02-25 14:34:33 +08:00
jackyYang6	a29ee57e15	[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367 ) * feat: add thinking budget logits processor * add unittest * fix pre-commit * add unittest * docs: clarify operator-level vs logits processor usage and conflict guidance --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-25 14:17:09 +08:00
jackyYang6	38c3e02470	fix paddleformers fallback (#6465 )	2026-02-23 15:29:13 +08:00
kevin	d60daca4a8	[Feature] consider multimodal model when dummy run (#6045 ) * add mm do profile * updata code * update code * update code * update code * update test case * update code * update code * fix xpu bug * update code * add mm do profile * update test case * update code	2026-02-09 17:49:55 +08:00
luukunn	765df94e6c	[Optimization]update prompt & prompt_token_ids (#6334 ) * fix prompt * add unit test * add unit test * fix	2026-02-04 20:08:01 +08:00
ApplEOFDiscord	6563b8307c	[Bug Fix] fix tokenizer oom (#6287 ) * fix tokenizer oom * fix unit test	2026-02-03 11:27:11 +08:00
luukunn	8635d8880d	bug fix tool_calls (#6166 )	2026-01-23 10:49:27 +08:00
luukunn	6b968a76f1	【Optimization】update data_processor & add tool parser plugins (#6096 ) * update data_processor * fix unit test * fix unit test * add unit test * add tool parser plugins * fix tool call * fix tool call * fix tool call * fix unit test * fix unit test * add unit test * fix unit test * fix unit test * fix unit test	2026-01-22 17:17:32 +08:00
kxz2002	6e416c62dd	[Optimization] The pre- and post-processing pipeline do not perform dict conversion (#5494 ) * to_request_for_infer initial commit * refact to from_chat_completion_request * preprocess use request initial commit * bugfix * processors refact to using request * bug fix * refact Request from_generic_request * post process initial commit * bugfix * postprocess second commit * bugfix * serving_embedding initial commit * serving_reward initial commit * bugfix * replace function name * async_llm initial commit * offline initial commit and fix bug * bugfix * fix async_llm * remove add speculate_metrics into data * fix logprobs bug * fix echo bug * fix bug * fix reasoning_max_tokens * bugfix * bugfix and modify unittest * bugfix and modify unit test * bugfix * bugfix * bugfix * modify unittest * fix error when reasong_content is none for text_processor * remove some unnessary logic * revert removed logic * implement add and set method for RequestOutput and refact code * modify unit test * modify unit test * union process_request and process_request_obj * remove a unit test * union process_response and process_response_obj * support qwen3_vl_processor * modify unittest and remove comments * fix prompt_logprobs * fix codestyle * add v1 * v1 * fix unit test * fix unit test * fix pre-commit * fix * add process request * add process request * fix * fix * fix unit test * fix unit test * fix unit test * fix unit test * fix unit test * remove file * add unit test * add unit test * add unit test * fix unit test * fix unit test * fix * fix --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com> Co-authored-by: luukunn <981429396@qq.com> Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com> Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>	2026-01-22 00:50:52 +08:00
CSWYF3634076	e6cdea4492	[Models] Qwen3VL and Qwen3VL-Moe CUDA graph Support (#5962 ) * [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support * [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v2 * [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v3	2026-01-09 17:09:02 +08:00
CSWYF3634076	d8fcb7c07d	[Models] Add Qwen3-VL Moe Model Support (#5913 ) * [Model] add Qwen3vl moe model support * [Model] add Qwen3vl moe model support remove log * [Model] add Qwen3vl moe model support unittest	2026-01-08 11:36:42 +08:00
CSWYF3634076	deb9698ac5	remove invalid elif branch (#5821 )	2025-12-29 19:21:28 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
memoryCoderC	be3be4913a	[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195 ) * [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM * [Optimization] refactor(chat_handler,completion_handler): rename class	2025-12-25 16:28:15 +08:00
megemini	111955ec0c	[BugFix] Remove duplicate PaddleOCRVLProcessor initialization code	2025-12-17 18:58:02 +08:00
luukunn	fbc9bce1e9	[Feature]Optimization of Thinking Pattern Framework (#4302 ) * add model status in vl * add x1 parser * add model_status * fix parser * fix parser * fix parser * fix parser * Revert "fix parser" This reverts commit `300f446d8a`. * fix parser * fix * fix * fix * fix * fix parser * fix unit test * fix unit test * add unit test * fix * fix * add unit test * fix unit test * add unit test * add unit test * fix unit test * fix unit test * fix bug * fix unit test * x1 tool parser * fix unit test * fix unit test * fix unit test * fix n * fix unit test * add unit test * add unit test * remove pring	2025-12-10 16:17:06 +08:00
ming1753	7c72383efa	[BugFix] fix decode time sleep bug (#5461 ) * [BugFix] fix decode time sleep bug * format	2025-12-10 15:48:48 +08:00
lizexu123	95eab9f9ee	[Feature] support stop_token_ids (#5399 ) * support stop_token_ids * fix * delete chinese * support both * delete print	2025-12-09 17:49:12 +08:00
zhouchong	5d9b5e4a5b	[Engine] [Feature] Refactor async_llm:cross-process with EngineService, based on zmq communication (#4868 ) * Refactor async_llm:cross-process with EngineService * fix: async_llm output process * fix: return prompt_token_ids and prompt_tokens in first res * optimize common_engine start func	2025-12-09 10:53:40 +08:00
lizexu123	d4979347ca	[Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374 ) * fix multi-inputs * fix threshold * fix threshold * fix	2025-12-05 20:18:17 +08:00
lizexu123	946025480e	[Bug fix] fix pooling models (#5358 ) * fix * fix * fix test * fix gpu_model_runner --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-04 11:06:30 +08:00
ming1753	70ec1e17c1	[Features] add audio request & fix embedding bug (#5201 ) * [Features] add audio request & fix embedding bug * fix bug	2025-12-01 11:12:17 +08:00
kxz2002	bc118c3d2d	fix prompt_token_ids is None in request dict (#5241 )	2025-11-26 17:10:45 +08:00
kxz2002	2d787590c4	[Feature] The 45VL supports prompt_token_ids + messages input. (#5148 ) * support prompt_token_ids + messages * fix bug * refact code structure * support cache mm items * refact code structure * delete test cases * modify unit test * add unit test * add unit test * fix append * add check for messages	2025-11-25 23:11:44 +08:00
yangjianfengo1	af715db763	[Scheduler] Support chunk prefill for video input (#5107 ) * add video chunk prefill * add vit_merge=True for test_tokenizer_client.py --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-11-20 16:29:13 +08:00
LiqinruiG	a5cd7c9039	[BugFix] rollback max_tokens and min_tokens when continue to infer (#5082 ) * [BugFix] rollback max_tokens and min_tokens when continue to infer * [BugFix] rollback max_tokens and min_tokens when continue to infer * [fix] add more logger info: max_tokens --------- Co-authored-by: liqinrui <liqinrui@baidu.com>	2025-11-19 18:43:42 +08:00
LiqinruiG	33f96ff93a	[BugFix] rollback max_tokens and min_tokens when continue to infer (#5052 ) Co-authored-by: liqinrui <liqinrui@baidu.com>	2025-11-17 14:31:26 +08:00
kxz2002	9703108c28	[BugFix] adjust max_tokens and min_tokens when continue to generate tokens (#5010 ) * fix max and min tokens initial commit * fix double subtraction * add unit tests	2025-11-13 23:52:54 +08:00
Yuanle Liu	3dc0ffa46d	[TSP] Support qwen3 moe tsp + cudagraph (#4871 ) * support qwen3_moe tsp mode * fix * fix * update * update * update * fix * support external_rmsnorm * update * fix	2025-11-10 23:37:51 +08:00
Haonan Luo	2c281e617c	Update Unit Test for PaddleOCR-VL (#4802 ) * fix paddleocr prefix cache bug * add test for paddleocr_vl * disable prefix-caching in ocr * add test for paddleocr_vl * Fix top_p for rejection sampling * add test for ocr processor; fix top_p for rejection sampling * add test for ocr processor; fix top_p for rejection sampling * add test for ocr processor; fix top_p for rejection sampling * add test for ocr processor; fix top_p for rejection sampling * add test for ocr processor; fix top_p for rejection sampling --------- Co-authored-by: ming1753 <ideaminghp@163.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>	2025-11-04 22:40:15 +08:00
kxz2002	8a40374bfe	[BugFix] Fix ernie4_5_vl_processor.py and qwen_vl_processor.py can not disable thinking (#4762 ) * fix ernie4_5_vl_processor.py and qwen_vl_processor.py * add unit test	2025-11-04 16:00:32 +08:00
luukunn	7b35488779	【DataProcessor】add options thinking_mode (#4735 ) * add thinking_mode * add thinking_mode * add thinking_mode * add thinking_mode * add thinking_mode * add thinking_mode * add unit test	2025-11-03 14:30:07 +08:00
kxz2002	7dc9d9885e	[BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686 ) * fix enable_thinking * recover ernie4_5_vl_processor	2025-10-30 19:45:41 +08:00
Haonan Luo	d7d0112bbf	[CI] Add test for paddleocr_vl (#4627 )	2025-10-30 13:40:04 +08:00
ApplEOFDiscord	14f8cddaf1	[Feature] add mm token usage (#4570 ) * add mm token usage * fix unit test * fix unit test * fix unit test * fix model path * fix unit test * fix unit test * fix unit test * remove uncomment * change var name * fix code style * fix code style * fix code style * fix code style * fix unit test	2025-10-29 14:37:12 +08:00
ming1753	561b9f38d3	[BugFix] fix paddleocr prefix cache bug (#4625 ) * fix paddleocr prefix cache bug * disable prefix-caching in ocr	2025-10-28 21:38:12 +08:00
ming1753	7681375a19	[BugFix] PaddleOCR-VL fix FD_DEBUG type and support v1 loader (#4605 ) * [Bug Fix] PaddleOCRVL fix FD_DEBUG type and support HF model * fix bug * fix bug * fix bug	2025-10-28 09:47:47 +08:00
kevin	8aab4e367f	[Feature] mm support prefix cache (#4134 ) * support mm prefix caching * update code * fix mm_hashes * support encoder cache * add encoder cache * update code * update encoder cache * fix features bug * fix worker bug * support processor cache, need to optimize yet * refactor multimodal data cache * update code * update code * update v1 scheduler * update code * update code * update codestyle * support turn off processor cache and encoder cache * update pre-commit * fix code * solve review * update code * update code * update test case * set processor cache in GiB * update test case * support mm prefix caching for qwen model * fix code style check * update pre-commit * fix unit test * fix unit test * add ci test case * fix rescheduled bug * change text_after_process to prompt_tokens * fix unit test * fix chat template * change model path * [EP] fix adapter bugs (#4572) * Update expert_service.py * Update common_engine.py * Update expert_service.py * fix v1 hang bug (#4573) * fix import image_ops error on some platforms (#4559) * [CLI]Update parameters in bench latecy cli tool and fix collect-env cli tool (#4558) * add collect-env * del files * [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578) * add new branch for sot * reorder * fix batch bug * [XPU]Moe uses a new operator (#4585) * [XPU]Moe uses a new operator * [XPU]Moe uses a new operator * update response * [Feature] Support Paddle-OCR (#4396) * init * update code * fix code style & disable thinking * adapt for common_engine.update_mm_requests_chunk_size * use 3d rope * use flash_attn_unpadded * opt siglip * update to be compatible with the latest codebase * fix typo * optim OCR performance * fix bug * fix bug * fix bug * fix bug * normlize name * modify xpu rope * revert logger * fix bug * fix bug * fix bug * support default_v1 * optim performance * fix bug --------- Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com> 
Co-authored-by: zhangyue66 <zhangyue66@baidu.com> * [DataProcessor] add reasoning_tokens into usage info (#4520) * add reasoning_tokens into usage info initial commit * add unit tests * modify unit test * modify and add unit tests * fix unit test * move steam usage to processor * modify processor * modify test_logprobs * modify test_logprobs.py * modify stream reasoning tokens accumulation * fix unit test * perf: Optimize task queue communication from engine to worker (#4531) * perf: Optimize task queue communication from engine to worker * perf: get_tasks to numpy * perf: get_tasks remove to_numpy * fix: request & replace ENV * remove test_e2w_perf.py * fix code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Clean up ports after processing results (#4587) * [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593) * [Others] api server exits when worker process is dead (#3271) * [fix] fix terminal hangs when worker process is dead * [chore] change sleep time of monitor * [chore] remove redundant comments * update docs --------- Co-authored-by: ApplEOFDiscord <wwy640130@163.com> Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: yinwei <yinwei_hust@163.com> Co-authored-by: JYChen <zoooo0820@qq.com> Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com> Co-authored-by: Ryan <zihaohuang@aliyun.com> Co-authored-by: yyssys <atyangshuang@foxmail.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com> Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com> Co-authored-by: zhangyue66 <zhangyue66@baidu.com> Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com> Co-authored-by: SunLei <sunlei5788@gmail.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Zhang Yulong 
<35552275+ZhangYulongg@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com>	2025-10-27 17:39:51 +08:00


107 Commits