FastDeploy/fastdeploy/engine at bd57b1e2a79fd18a1cf7a7c217fcb4e7d9c5c22a - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Jiang-Jia-Jun bd57b1e2a7 Update args_utils.py

2026-04-22 11:02:26 +08:00

..

[Cherry-Pick][BugFix] Fix real token exceeding max_batched_tokens limit(#7438) (#7440)

2026-04-17 16:18:03 +08:00

__init__.py

polish code with new pre-commit rule (#2923)

2025-07-19 23:19:27 +08:00

args_utils.py

Update args_utils.py

2026-04-22 11:02:26 +08:00

async_llm.py

[Optimization] Enable text-only deployment for multimodal models (#7234)

2026-04-08 16:32:19 +08:00

common_engine.py

[BugFix] Remove ipc lock to avoid nan (#7312)

2026-04-12 13:58:19 +08:00

engine.py

[KSM] support keep sampling mask (#7146)

2026-04-02 20:30:54 -07:00

expert_service.py

[Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)

2026-02-09 20:38:38 +08:00

kv_cache_interface.py

bug: fix list to List (#4818)

2025-11-06 16:13:12 +08:00

pooling_params.py

[Feature] support reward model (#5301)

2025-12-02 14:55:31 +08:00

request.py

[KSM] support keep sampling mask (#7146)

2026-04-02 20:30:54 -07:00

resource_manager.py

[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)

2026-01-16 11:46:13 +08:00

sampling_params.py

[Cherry-Pick][OP][Feature] Unify the limit_thinking_content_length CUDA operator; support response length limits and injected sequences (#6511)

2026-02-26 13:29:38 +08:00

tasks.py

[feature] support reward api (#4518)

2025-10-29 00:20:28 +08:00