Commit Graph

5054 Commits

Author SHA1 Message Date
fxyfxy777 9f3b3ce7f5 [Optimization] merge_allreduce (#7039) 2026-04-02 19:52:13 +08:00
bukejiyu f142b486c9 update (#7101) 2026-04-02 16:07:26 +08:00
Longzhi Wang 938e7dd881 [Other] support video_fps args for video bench (#7077) 2026-04-02 10:40:15 +08:00
YuBaoku 7aa213bba9 [CI] Replace ipc=host with shm-size and sysctl configuration (#7138) 2026-04-02 10:33:55 +08:00
YuBaoku db808f2080 [CI] Optimize log cleanup and isolation in unittest (#7132) 2026-04-01 22:07:55 +08:00
Yuanle Liu 1af7f80811 Revert "[BugFix][Speculative Decoding] Correct index calculation in speculate…" (#7133)
This reverts commit ba1aa1edff.
2026-04-01 06:54:23 -07:00
luukunn fa7a84926d [Optimization] Fix tool parser (#7079)
* fix tool parser
2026-04-01 21:20:34 +08:00
Bingoo 410988d9ec [OP] support deepgemm for sm103 (#7073)
* support deepgemm for sm103

* add assert

* modify code style

* add assert

* modify sm version condition

* remove assert
2026-04-01 21:01:09 +08:00
lonelygsh ba1aa1edff [BugFix][Speculative Decoding] Correct index calculation in speculate decoding operators (#7121)
- Fix accept_idx calculation in spec_set_value_by_stop_seqs
- Fix condition check from < to <= for token matching
- Fix accept_tokens indexing logic
- Remove unnecessary -1 in current_step comparison for max_think_len

Co-authored-by: guanshihui <guanshihui@baidu.com>
2026-04-01 05:36:53 -07:00
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix code style

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
mouxin fba8a51ad1 [Feature] Fix mixed cache-aware (#7129)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Fix mixed cache-aware

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 19:29:29 +08:00
Jingfeng Wu 3b564116d5 [Docs] Add docs for disaggregated deployment (#6700)
* add docs for disaggregated deployment

* pre-commit run for style check

* update docs
2026-04-01 19:27:09 +08:00
yzwu ceaf5df350 [Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126) 2026-04-01 19:13:34 +08:00
luukunn fdfc908e2f [Others] reuse unit test (#7127) 2026-04-01 18:36:00 +08:00
mouxin 6cae9b1f50 [Feature] Config eviction_duration (#7125)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 16:46:21 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
YuBaoku c6f0c5c3a6 [CI] Optimize test execution with single-GPU parallelism (#7085)
* [CI] Optimize test execution with single-GPU parallelism and log collection

* remove export CUDA_VISIBLE_DEVICES

* fix path error

* fix log_* path and debug

* [CI] Optimize test execution with single-GPU parallelism and log collection
2026-04-01 14:18:40 +08:00
zhouchong 91c832f607 [Feature] Add logging parameters and error output to terminal (#7098) 2026-04-01 13:18:42 +08:00
jc af51fc46d6 [PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation (#7107)
* Write the cache of preempted req to storage

* up

* fix
2026-04-01 13:15:52 +08:00
luukunn 3651113ee5 [DataProcessor] Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5 ee2b965f5f adjust config info (#7054) 2026-03-31 21:26:05 +08:00
Yonghua Li a3cc3aa777 [BugFix] reset exist tasks signal in clear_data (#7111)
* [BugFix] reset exist tasks signal in clear_data

* [Fix] fix stale exist tasks signal after weight update

* [Chore] downgrade detected new requests log to DEBUG level

* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
周周周 fd44bb7cbf cpmmot (#7105)
Co-authored-by: liuruian <liuruian@baidu.com>
2026-03-31 16:13:44 +08:00
cloudforge1 5c5dc66aa7 [CI] [Hackathon 10th Spring No.34] Add unit tests for async_expert_loader (#6731)
* [CI] [Hackathon 10th Spring No.34] Add unit tests for async_expert_loader

* [CI] [Hackathon 10th Spring No.34] Add unit tests for async_expert_loader
---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-31 15:29:35 +08:00
YilongGuo dd61e7e421 [Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration (#7086)
Add a clear_grpah_opt_backend method that delegates to the underlying model
to clear the CUDA graph optimization backend.

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-31 13:48:25 +08:00
YuBaoku db6e637f4f [CI] Remove skip logic for *.txt-only changes (#7104) 2026-03-31 13:24:50 +08:00
huicongyao dd2aa10ed4 fix cuda graph capture failure in CI test (#7094) 2026-03-31 11:05:51 +08:00
qwes5s5 daa95244f7 abort requests (#6992) 2026-03-31 11:02:26 +08:00
Yonghua Li 6d9739f360 [BugFix] fix speculative gauge metrics in multi api server (#7082) 2026-03-31 10:52:50 +08:00
chenjian 6727df8286 [Optimization] Optimize ttft for prefill pd (#6680)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix

* fix ci

* fix format

* update according to review

* add comment

* fix

* fix format
2026-03-30 20:36:23 +08:00
jackyYang6 05f2d95729 [RL] Adapt async rollout checkpoint update flow (#7042)
* update checkpoint-transfer flow and control update_weights params

* test: add update_weights route validation
2026-03-30 19:19:34 +08:00
yzwu 8789329457 [Iluvatar] Support wi4a16 group_gemm (#7078) 2026-03-30 19:03:51 +08:00
kevin 18062c55bb [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys (#6929)
* [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions

## Motivation

In the test case `test_get_block_hash_extra_keys_boundary_cases`, the call for block [4,8)
incorrectly passed `mm_idx=1`, skipping img0[2,5); but img0 covers token 4, and token 4
belongs to block [4,8), so it should be included in hash_keys. In addition, every
assertEqual only verified hash_keys and never checked the returned mm_idx cursor.

## Modifications

- `test_get_block_hash_extra_keys_boundary_cases`:
  - Chain the calls, feeding the mm_idx returned by each call into the next one, to mimic the real calling loop
  - For block [4,8), pass the previously returned `mm_idx=0` instead of `mm_idx=1`, and change the expected value from `[]` to `["hash-0"]`
  - Rewrite all assertions as `assertEqual((mm_idx, hash_keys), (...))` so the cursor is verified as well
- `test_get_block_hash_extra_keys_no_overlap_at_boundaries`:
  - In Case B, pass `mm_idx=0` instead of `mm_idx=1` (scan from the start; img-a takes the continue branch)
  - Add mm_idx checks to all assertions
- `test_get_block_hash_extra_keys_image_crosses_block_boundary`:
  - Add mm_idx checks to all assertions
- `test_get_block_hash_extra_keys_no_mm_inputs`:
  - Add an mm_idx check to the assertion
- `test_get_block_hash_extra_keys_handles_multimodal_segments`:
  - Add mm_idx checks to the call2 and call3 assertions

## Usage or Command

```bash
python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys"
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: chengyanfu <chengyanfu@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 17:13:31 +08:00
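The chained-cursor behavior that the fixed tests assert can be sketched as follows. `get_block_hash_extra_keys` here is a hypothetical stand-in for the FastDeploy implementation (the real signature lives in the prefix cache manager and is not shown in the commit); the `(start, end, key)` tuple layout is an assumption, while the image layout mirrors the commit's example (img0 covering tokens [2,5)).

```python
def get_block_hash_extra_keys(mm_inputs, block_start, block_end, mm_idx):
    """Illustrative stand-in: collect hashes of mm inputs overlapping
    [block_start, block_end), resuming the scan at cursor mm_idx.

    Returns (new_mm_idx, hash_keys). The cursor only advances past an
    input once that input ends at or before block_end, so an image that
    crosses the block boundary is revisited by the next call.
    """
    hash_keys = []
    idx = mm_idx
    while idx < len(mm_inputs):
        start, end, key = mm_inputs[idx]
        if start >= block_end:    # input begins after this block: stop
            break
        if end > block_start:     # overlaps this block: include its hash
            hash_keys.append(key)
        if end <= block_end:      # fully consumed: advance the cursor
            idx += 1
        else:                     # crosses the boundary: revisit next call
            break
    return idx, hash_keys


# img0 covers tokens [2, 5), as in the commit's example.
mm_inputs = [(2, 5, "hash-0")]

# Chained calls, feeding each returned cursor into the next call:
mm_idx, keys = get_block_hash_extra_keys(mm_inputs, 0, 4, 0)
# img0 overlaps block [0,4) but crosses its boundary -> cursor stays 0
mm_idx, keys = get_block_hash_extra_keys(mm_inputs, 4, 8, mm_idx)
# token 4 of img0 lies in block [4,8) -> "hash-0" again, cursor moves to 1
```

This is exactly the bug the commit describes: passing a hard-coded `mm_idx=1` into the second call would skip img0 even though its last token falls inside block [4,8).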
周周周 76cf5e9496 [append attention] clean code (#7062) 2026-03-30 15:07:53 +08:00
luukunn b9f8873367 [Optimization] Merge Text processor (#7030)
* merge text processor

* update

* fix unit test

* merge messages2ids

* fix unit test

* remove duplicate code

* remove redundant code

* delete code

* fix unit test
2026-03-30 15:02:35 +08:00
Jiang-Jia-Jun 1670b011a5 Revert "[BugFix] Add lock to avoid generating nan when using storage cache (#…" (#7075)
This reverts commit 6d2ab8f2c0.
2026-03-30 14:52:05 +08:00
jc 6d2ab8f2c0 [BugFix] Add lock to avoid generating nan when using storage cache (#7046)
* Add lock to avoid generating nan

* up
2026-03-30 14:50:32 +08:00
zhangbo9674 5c60e2fc6f fix bug in cudagraph (#7069) 2026-03-30 14:24:23 +08:00
mpgemm 1a1d048774 [Feature] Support NVFP4 Flashinfer-cutedsl MoE on SM100 (#6963) 2026-03-30 11:37:04 +08:00
mouxin 61a9079c60 [Feature] Update logging (#7072) 2026-03-30 11:20:27 +08:00
Longzhi Wang 2eea6fa97a [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend (#7028)
* [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend

* add constexpr and code style clean

* add test

* fix code style

* fix test
2026-03-30 11:17:15 +08:00
mpgemm 7a20eaebe8 [Feature] Support cute cpp Encoder FA4 (#7016)
* add cute cpp fa4

* remove comments

* fix merge errors

* move sm_version inside the function

* fix CI errors
2026-03-30 10:54:56 +08:00
kevin 9765fa7313 [Refactor] Replace --skip-mm-profiling with --deploy-modality text (#7048)
* [Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling

## Motivation

When deploying multimodal models (e.g. Qwen2.5-VL, ERNIE4.5-VL), `get_max_chunk_tokens`
adds the mm token count on top of the base token count to reserve GPU memory during profiling.

In some scenarios (e.g. the image token count is known to be small, or GPU memory needs to
be saved), users want to skip this extra multimodal token overhead and run profiling with
the text token count alone.

## Modifications

- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False` field to
  `EngineArgs` and a `--skip-mm-profiling` flag to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`;
  add a `not self.model_config.skip_mm_profiling` check in `FDConfig.get_max_chunk_tokens`
  so that, when enabled, the mm token overhead is skipped and the base `num_tokens` is
  returned directly

## Usage or Command

Add the flag when launching the service:
```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature is a simple config-parameter passthrough; the existing config unit tests already cover it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [Refactor] Replace skip_mm_profiling with deploy_modality=text to skip mm profiling

## Motivation

The original `--skip-mm-profiling` flag semantically overlaps with the existing
`deploy_modality` option: when deploying in text-only mode (`deploy_modality=text`),
there is no need to reserve GPU memory for multimodal tokens in the first place.
Introducing a separate flag adds configuration complexity; reusing `deploy_modality`
is more intuitive and consistent.

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and
  the `--skip-mm-profiling` flag
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`;
  change the condition in `FDConfig.get_max_chunk_tokens` to
  `self.deploy_modality != DeployModality.TEXT`,
  so that when deploy_modality is text it returns `max_num_batched_tokens` directly,
  skipping the mm token overhead

## Usage or Command

```bash
# Deploy in text mode to skip the mm token profiling overhead (replaces the old --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
  --deploy-modality text \
  --model /path/to/model \
  ...
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This is a parameter refactor with logically equivalent behavior; the existing config unit tests already cover it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 19:40:27 -07:00
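A minimal sketch of the condition this refactor ends up with: names and the return-value shape are illustrative approximations of the FastDeploy config code described above, not the actual implementation.

```python
from enum import Enum


class DeployModality(Enum):
    TEXT = "text"
    MULTIMODAL = "multimodal"


def get_max_chunk_tokens(num_tokens, mm_tokens, deploy_modality):
    """Illustrative: only reserve the multimodal token overhead when the
    deployment is not text-only, mirroring the post-refactor condition
    `self.deploy_modality != DeployModality.TEXT`."""
    if deploy_modality != DeployModality.TEXT:
        return num_tokens + mm_tokens  # multimodal: stack mm tokens on top
    return num_tokens                  # text-only: base token count suffices


print(get_max_chunk_tokens(4096, 1024, DeployModality.TEXT))        # 4096
print(get_max_chunk_tokens(4096, 1024, DeployModality.MULTIMODAL))  # 5120
```

The design point of the refactor is visible here: instead of a dedicated boolean flag, the existing modality setting already encodes whether mm memory needs to be reserved.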
YuBaoku a7cbe3ff91 [CI] Adapt to codecov action changes for Node.js 24 (#7064) 2026-03-29 16:49:44 +08:00
YuBaoku 842c60809a [CI] Align with Paddle layer_norm kernel update (#7056) 2026-03-27 22:58:01 +08:00
Zhang Yulong f25760f4e6 [CI] Update docker run command in unit test coverage workflow (#7050)
Removed the --ipc=host option from the docker run command.
2026-03-27 19:53:09 +08:00
cmcamdy bf8e9bf81d [XPU] Fix speculate schedule (#7049)
* [BugFix] xpu fix speculate schedule cache kernel

* fix code style
2026-03-27 18:28:17 +08:00
cloudforge1 11ad95ba91 [CI] [Hackathon 10th Spring No.43] Add unit tests for ernie4_5_mtp (#6738)
* [CI] [Hackathon 10th Spring No.43] Add unit tests for ernie4_5_mtp

* [CI] [Hackathon 10th Spring No.43] add mapping and forward branch coverage

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-27 17:15:53 +08:00
fxyfxy777 8ff8236a6f [Optimization] optimize fused_swiglu_fp8_quant_kernel (#7007)
* use shared memory

* B card test

* fix acc error
2026-03-27 16:10:16 +08:00
GoldPancake 6693bcd0e4 [BugFix] fix clear_parameters in draft cudagraph (#7035) 2026-03-27 15:28:50 +08:00