Support keep sampling mask (#6725)

* naive version

* return list(int)

* fix bug: first_token's sampling mask was missing

* pre-commit

* support mtp

* pre-commit

* fix ut

* fix zmq name conflicts

* fix ut

* add ut

* fix ut timeout

* optimize performance

* fix

* support top_k mask

* Potential fix for pull request finding

* update comment

* update comment

* update comment

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Yuanle Liu
2026-03-18 11:07:31 +08:00
committed by GitHub
parent a714c1f8d4
commit 7f5f2113c2
24 changed files with 502 additions and 9 deletions
+1 -1
@@ -255,7 +255,7 @@ jobs:
curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ERNIE-4.5-VL-28B-A3B-Thinking\", \"--reasoning-parser\": \"ernie-45-vl-thinking\", \"--tool-call-parser\": \"ernie-45-vl-thinking\", \"--tensor-parallel-size\": 1, \"--quantization\": \"wint4\", \"--max-model-len\": 131072, \"--max-num-seqs\": 32, \"--no-enable-prefix-caching\": true}"
-check_service 90
+check_service 180
python -m pytest -sv test_prompt_ids.py || TEST_EXIT_CODE=1
popd