FastDeploy/fastdeploy at ddec1b07f862142262a4e24170d2e8f46ac8e128 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Files

T

History

google-labs-jules[bot] ddec1b07f8 ⚡ Bolt: [performance improvement] Pre-allocate np.full array for padding lists instead of using slow list concatenations in pad_batch_data

The old implementation uses `[[pad_id] * (max_len - len(inst)) + list(inst) for inst in insts]` to pad list sequences. This performs an $O(N \times \text{max\_len})$ list concatenation, creating many intermediate Python lists and stressing the garbage collector, before finally passing the result to `np.array(..., dtype=np.int64)`.

This change updates it to pre-allocate an empty numpy array (`np.full`) and safely populates it using numpy slicing (`padded_insts[i, :l] = inst`). The change results in a ~2x faster performance. This has been verified to be completely logically equivalent to the original un-modified processor output on a comprehensive set of test cases.

2026-04-13 15:14:37 +00:00

..

…

[Optim] Remove IPCLock between CacheManager and WorkerProcess (#7299 )

2026-04-12 13:59:34 +08:00

fix typo (#7147 )

2026-04-07 16:30:32 +08:00

[Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126 )

2026-04-01 19:13:34 +08:00

[Optim] Remove IPCLock between CacheManager and WorkerProcess (#7299 )

2026-04-12 13:59:34 +08:00

add ips check (#7352 )

2026-04-13 15:24:22 +08:00

[Speculative Decoding] Support mtp super ultra overlap in pd-split mode with insert_task overlap (#7323 )

2026-04-13 19:41:17 +08:00

[Feature] Fix mixed cache-aware (#7129 )

2026-04-01 19:29:29 +08:00

⚡ Bolt: [performance improvement] Pre-allocate np.full array for padding lists instead of using slow list concatenations in pad_batch_data

2026-04-13 15:14:37 +00:00

inter_communicator

[Optim] Remove IPCLock between CacheManager and WorkerProcess (#7299 )

2026-04-12 13:59:34 +08:00

[Feature] Add logging parameters and error output to terminal (#7098 )

2026-04-01 13:18:42 +08:00

[BugFix] fix speculative gauge metrics in multi api server (#7082 )

2026-03-31 10:52:50 +08:00

[Feature] 为 FusedMoE 添加 hidden_size 显式参数支持 (#7361 )

2026-04-13 20:24:58 +08:00

[BugFix] fix multimodal hasher hash collision risk when ndarray shape or dtype differs (#7185 )

2026-04-08 04:26:02 -07:00

Split enable_mm (#7183 )

2026-04-08 11:25:41 +08:00

[Iluvatar] refactor attn and moe code (#6887 )

2026-03-18 10:31:00 +08:00

【Optimization】update data_processor & add tool parser plugins (#6096 )

2026-01-22 17:17:32 +08:00

[Feature]Optimization of Thinking Pattern Framework (#4302 )

2025-12-10 16:17:06 +08:00

fix oom bug, optimize async weight loading and update read step by yaml (#7171 )

2026-04-03 11:05:24 +08:00

abort requests (#6992 )

2026-03-31 11:02:26 +08:00

[Optimization] Optimize ttft for prefill pd (#6680 )

2026-03-30 20:36:23 +08:00

[Speculative Decoding] Support mtp super ultra overlap in pd-split mode with insert_task overlap (#7323 )

2026-04-13 19:41:17 +08:00

[Optimization] Optimize ttft for prefill pd (#6680 )

2026-03-30 20:36:23 +08:00

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 )

2026-03-16 21:32:43 +08:00

transformer_utils

…

[Feature]Report FD statistical information (#5646 )

2026-01-14 17:54:01 +08:00

[Optimization] 移除 num_blocks 上限限制 (#7241 )

2026-04-13 07:07:41 -07:00

__init__.py

[BugFix] fix speculative gauge metrics in multi api server (#7082 )

2026-03-31 10:52:50 +08:00

collect_env.py

…

config.py

[Loader] add multi-thread model loading (#6877 )

2026-04-09 23:40:15 -07:00

envs.py

[TI-consistent] support quant use pow2scale (#7308 )

2026-04-13 00:01:53 -07:00

import_ops.py

[build] support build sm 80,86,89,90 to one whl package (#6173 )

2026-01-26 11:30:02 +08:00

stop.sh

…

test.yaml

…

utils.py

[Feature] support v1 update/clear api for RL (#6761 )

2026-03-25 19:18:46 +08:00