[OP][Feature] Unify the limit_thinking_content_length CUDA operators, supporting response length limits and injected sequences (#6493)


* Initial plan

* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix

* update

* fix

* fix ci

* fix ci

* Initial plan

* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add disable-thinking case to test_chat_with_response_max_tokens

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add both reasoning_max_tokens and response_max_tokens case

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
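
For orientation, here is a minimal pure-Python sketch of the kind of per-token logic the title describes: cap the thinking phase, inject a fixed token sequence when the cap is hit, then cap the response. It is not the CUDA operator from this commit. Every name in it (LimitState, limit_token, apply_limits, and all parameters) is hypothetical, and the exact semantics (for example, whether the end-of-thinking token counts toward the budget) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LimitState:
    """Hypothetical per-request state; not taken from the FastDeploy sources."""
    think_steps: int = 0      # tokens emitted while still in the thinking phase
    inject_pos: int = 0       # next index into the injected sequence
    thinking: bool = True     # becomes False once the end-of-thinking token is emitted
    response_steps: int = 0   # tokens emitted after thinking ended


def limit_token(token: int, st: LimitState, *, max_think_len: int,
                max_response_len: int, inject_ids: List[int],
                think_end_id: int, eos_id: int) -> int:
    """Return the token to emit, possibly overriding the sampled one."""
    if st.thinking:
        if st.think_steps >= max_think_len:
            # Thinking budget exhausted: emit the injected sequence first,
            # then force the end-of-thinking token.
            if st.inject_pos < len(inject_ids):
                token = inject_ids[st.inject_pos]
                st.inject_pos += 1
            else:
                token = think_end_id
        st.think_steps += 1
        if token == think_end_id:
            st.thinking = False
        return token
    # Response phase: once the response budget is used up, force EOS.
    if st.response_steps >= max_response_len:
        return eos_id
    st.response_steps += 1
    return token


def apply_limits(sampled: List[int], **limits) -> List[int]:
    """Run limit_token over a list of sampled tokens for a single request."""
    st = LimitState()
    return [limit_token(t, st, **limits) for t in sampled]


if __name__ == "__main__":
    out = apply_limits([7] * 8, max_think_len=2, max_response_len=2,
                       inject_ids=[100, 101], think_end_id=9, eos_id=0)
    print(out)
```

In this sketch the model keeps sampling token 7, the thinking budget is two tokens, and the injected sequence [100, 101] is emitted before the forced end-of-thinking token 9. A real operator would apply the same idea per batch row on the GPU; this version only illustrates the control flow under the assumptions stated above.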

Yuanle Liu, 2026-02-25 21:36:50 +08:00, committed by GitHub
parent e18397134a
commit 6d3fede240

38 changed files with 771 additions and 1690 deletions

									
tests/distributed/chunked_moe.py (+1)
												
@@ -58,6 +58,7 @@ class MockModelConfig:
     rope_theta = 1000
     partial_rotary_factor = 0.5
     architectures = ["mock"]
+    think_truncate_prompt_ids = [-1]

 class MockCacheConfig: