FastDeploy/fastdeploy at 18ae6aa4d6c403f651bebee810fe62b0b503d5ea - FastDeploy - 子说镜像小站 (Zishuo Mirror Site)

apps/FastDeploy

Mirror of https://github.com/PaddlePaddle/FastDeploy.git, last synced 2026-04-23 00:17:25 +08:00


google-labs-jules[bot] 18ae6aa4d6 perf: avoid unnecessary dtype casting in RMSNorm

Added checks before calling `.astype` in `fastdeploy/model_executor/layers/normalization.py`. In PaddlePaddle, calling `.astype` allocates a new tensor even if the input already has the target dtype; skipping these redundant casts avoids extra memory allocations and kernel launches on the hot path.

2026-04-19 15:16:05 +00:00
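The guard pattern this commit describes can be sketched as follows. This is a minimal, framework-free illustration, not the code in `normalization.py`: the `Tensor` class is a hypothetical stand-in whose `astype` always returns a new object (mirroring the behavior the commit attributes to PaddlePaddle), and `maybe_cast`, `rms_norm`, `cast_calls`, and the float32 compute dtype are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import ClassVar, List


@dataclass
class Tensor:
    """Hypothetical tensor stand-in: `astype` always allocates a copy."""
    data: List[float]
    dtype: str
    cast_calls: ClassVar[int] = 0  # counts every astype call (illustrative)

    def astype(self, dtype: str) -> "Tensor":
        # Like the behavior the commit describes, this copies even when
        # `dtype` already equals `self.dtype`.
        Tensor.cast_calls += 1
        return Tensor(list(self.data), dtype)


def maybe_cast(x: Tensor, dtype: str) -> Tensor:
    """Return `x` itself when it already has `dtype`; otherwise cast it."""
    return x if x.dtype == dtype else x.astype(dtype)


def rms_norm(x: Tensor, weight: List[float], eps: float = 1e-6,
             compute_dtype: str = "float32") -> Tensor:
    """RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i."""
    orig_dtype = x.dtype
    xc = maybe_cast(x, compute_dtype)  # upcast only if needed
    mean_sq = sum(v * v for v in xc.data) / len(xc.data)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    out = Tensor([v * inv_rms * w for v, w in zip(xc.data, weight)],
                 compute_dtype)
    return maybe_cast(out, orig_dtype)  # downcast only if needed


if __name__ == "__main__":
    Tensor.cast_calls = 0
    rms_norm(Tensor([3.0, 4.0], "float32"), [1.0, 1.0])
    print("float32 input, casts:", Tensor.cast_calls)

    Tensor.cast_calls = 0
    rms_norm(Tensor([3.0, 4.0], "bfloat16"), [1.0, 1.0])
    print("bfloat16 input, casts:", Tensor.cast_calls)
```

With a float32 input both guards short-circuit and no cast runs; a bfloat16 input still pays for one upcast and one downcast. Calling `astype` unconditionally would incur two casts in both cases.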


[CLI] Update parameters in bench latency cli tool and fix collect-env cli tool (#4558 )

2025-10-24 16:46:45 +08:00

Mooncake storage register local buffer by chunk (#7416 )

2026-04-17 10:39:34 +08:00

fix typo (#7147 )

2026-04-07 16:30:32 +08:00

[Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126 )

2026-04-01 19:13:34 +08:00

[BugFix] Fix real token exceeding max_batched_tokens limit (#7438 )

2026-04-17 16:18:07 +08:00

[Bugfix][RL] fix control request timeout in async update weights pipeline (#7430 )

2026-04-17 16:45:33 +08:00

[Others] Fix typo (#7280 )

2026-04-14 17:28:22 +08:00

[Feature] Fix mixed cache-aware (#7129 )

2026-04-01 19:29:29 +08:00

[Feature] implement log channel separation and request log level system (#7190 )

2026-04-16 15:13:05 +08:00

inter_communicator

[Optim] Remove IPCLock between CacheManager and WorkerProcess (#7299 )

2026-04-12 13:59:34 +08:00

[Feature] implement log channel separation and request log level system (#7190 )

2026-04-16 15:13:05 +08:00

[BugFix] fix speculative gauge metrics in multi api server (#7082 )

2026-03-31 10:52:50 +08:00

perf: avoid unnecessary dtype casting in RMSNorm

2026-04-19 15:16:05 +00:00

[BugFix] fix multimodal hasher hash collision risk when ndarray shape or dtype differs (#7185 )

2026-04-08 04:26:02 -07:00

[Feature] implement log channel separation and request log level system (#7190 )

2026-04-16 15:13:05 +08:00

[Iluvatar] refactor attn and moe code (#6887 )

2026-03-18 10:31:00 +08:00

[Optimization] update data_processor & add tool parser plugins (#6096 )

2026-01-22 17:17:32 +08:00

[Feature] Optimization of Thinking Pattern Framework (#4302 )

2025-12-10 16:17:06 +08:00

fix rl moe gate type (#7393 )

2026-04-14 20:04:04 +08:00

abort requests (#6992 )

2026-03-31 11:02:26 +08:00

[Feature] implement log channel separation and request log level system (#7190 )

2026-04-16 15:13:05 +08:00

[Speculative Decoding] Add MTP logprob support for PD disaggregation (#7442 )

2026-04-17 21:37:38 +08:00

[Optimization] Optimize ttft for prefill pd (#6680 )

2026-03-30 20:36:23 +08:00

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 )

2026-03-16 21:32:43 +08:00

transformer_utils

[Feature] support pool (#3827 )

2025-09-22 14:09:09 +08:00

[Feature]Report FD statistical information (#5646 )

2026-01-14 17:54:01 +08:00

[Speculative Decoding] Add MTP logprob support for PD disaggregation (#7442 )

2026-04-17 21:37:38 +08:00

__init__.py

[BugFix] fix speculative gauge metrics in multi api server (#7082 )

2026-03-31 10:52:50 +08:00

collect_env.py

feat: add support for API usage with multimodal models (#4548 )

2025-10-28 20:23:46 +08:00

config.py

[Speculative Decoding][BugFix] Fix apply repeat times penalty kernel and change spec default verify strategy (#7467 )

2026-04-18 00:38:01 +08:00

envs.py

[Feature] implement log channel separation and request log level system (#7190 )

2026-04-16 15:13:05 +08:00

import_ops.py

[build] support build sm 80,86,89,90 to one whl package (#6173 )

2026-01-26 11:30:02 +08:00

stop.sh

[PD Disaggregation] Support Qwen3-MoE use PD + EP inference. (#4691 )

2025-11-06 10:32:15 +08:00

test.yaml

[Sync] Update to latest code (#2679 )

2025-07-03 15:43:53 +08:00

utils.py

[Feature] implement log channel separation and request log level system (#7190 )

2026-04-16 15:13:05 +08:00