FastDeploy/fastdeploy at 48cfb608aa96e78b717876398c5deb3eeacb41b2 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

gongweibao 48cfb608aa [FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 (#6997)

Most single-GPU and small-model deployments do not need 64MB custom
all-reduce buffers. Lowering the default to 8MB reduces unnecessary
shared memory allocation. Tests that require larger buffers now
explicitly set the value.

Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-25 17:40:01 +08:00
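
The commit message above describes an environment-variable default dropping from 64 to 8 MB, with tests that need more space setting the value explicitly. A minimal sketch of that pattern follows. The helper name `custom_ar_max_size_bytes` and its validation are assumptions made for illustration, not FastDeploy's actual code; only the variable name `FD_CUSTOM_AR_MAX_SIZE_MB` and the two default values come from the commit message.

```python
import os

# Default after this commit, in MB (the previous default was 64).
DEFAULT_CUSTOM_AR_MAX_SIZE_MB = 8


def custom_ar_max_size_bytes(environ=None):
    """Hypothetical helper: resolve the custom all-reduce buffer size in bytes.

    Reads FD_CUSTOM_AR_MAX_SIZE_MB from the given mapping (os.environ by
    default) and falls back to the 8 MB default when it is unset.
    """
    env = os.environ if environ is None else environ
    size_mb = int(env.get("FD_CUSTOM_AR_MAX_SIZE_MB", DEFAULT_CUSTOM_AR_MAX_SIZE_MB))
    if size_mb <= 0:
        raise ValueError("FD_CUSTOM_AR_MAX_SIZE_MB must be a positive integer")
    return size_mb * 1024 * 1024


# Unset: the smaller default applies.
print(custom_ar_max_size_bytes({}))  # 8388608

# A test that needs a larger buffer sets the value explicitly.
print(custom_ar_max_size_bytes({"FD_CUSTOM_AR_MAX_SIZE_MB": "64"}))  # 67108864
```

Passing the environment as a mapping keeps the sketch testable without mutating the process environment; a real deployment would more likely export the variable before launching the service.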

[CLI] Update parameters in bench latency CLI tool and fix collect-env CLI tool (#4558)

2025-10-24 16:46:45 +08:00

[PD Disaggregation] pd + cache_storage support vl model (#6906)

2026-03-23 15:35:20 +08:00

[PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317)

2025-12-08 10:08:44 +08:00

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

[PD Disaggregation] pd + cache_storage support vl model (#6906)

2026-03-23 15:35:20 +08:00

[Optimization] Optimize CPU utilization (#6950)

2026-03-22 23:02:39 +08:00

more eplb offline load dtypes (#6435)

2026-03-02 14:34:20 +08:00

[Feature] Update Counter Release (#6943)

2026-03-20 10:51:37 +08:00

[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow (#6934)

2026-03-23 14:15:55 +08:00

inter_communicator

[Optimization] Update ZMQ server (#6735)

2026-03-19 21:53:16 +08:00

fix debug log (#6766)

2026-03-12 14:46:01 +08:00

[Speculative Decoding] Unify Spec and non-spec branch (#6685)

2026-03-10 23:58:44 -07:00

[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005)

2026-03-25 01:52:06 -07:00

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

test_abort (#6743)

2026-03-17 14:06:40 +08:00

[Iluvatar] refactor attn and moe code (#6887)

2026-03-18 10:31:00 +08:00

[Optimization] Update data_processor and add tool parser plugins (#6096)

2026-01-22 17:17:32 +08:00

[Feature] Optimization of Thinking Pattern Framework (#4302)

2025-12-10 16:17:06 +08:00

[RL][BugFix][Optimization] Support chunked part files loading and fix model path format in IPC snapshot strategy (#6852)

2026-03-23 16:17:41 +08:00

[PD Disaggregation][RL] Register to router with version and support rdma eager connect for pd (#6718)

2026-03-17 14:43:35 +08:00

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005)

2026-03-25 01:52:06 -07:00

test_abort (#6743)

2026-03-17 14:06:40 +08:00

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

transformer_utils

[Feature] support pool (#3827)

2025-09-22 14:09:09 +08:00

[Feature] Report FD statistical information (#5646)

2026-01-14 17:54:01 +08:00

[Speculative Decoding] refactor MTP and optimize spec-decoding postprocess (#6973)

2026-03-24 10:19:01 +08:00

__init__.py

[Optimization] Use a separate driver when using Triton with Paddle (#6897)

2026-03-24 10:56:00 +08:00

collect_env.py

feat: add support for API usage with multimodal models (#4548)

2025-10-28 20:23:46 +08:00

config.py

[PD Disaggregation][RL] Register to router with version and support rdma eager connect for pd (#6718)

2026-03-17 14:43:35 +08:00

envs.py

[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 (#6997)

2026-03-25 17:40:01 +08:00

import_ops.py

[build] Support building SM 80, 86, 89, 90 into one whl package (#6173)

2026-01-26 11:30:02 +08:00

stop.sh

[PD Disaggregation] Support PD + EP inference for Qwen3-MoE (#4691)

2025-11-06 10:32:15 +08:00

test.yaml

[Sync] Update to latest code (#2679)

2025-07-03 15:43:53 +08:00

utils.py

[Optimization] Use a separate driver when using Triton with Paddle (#6897)

2026-03-24 10:56:00 +08:00