* fix xpu ci bug
* Remove unnecessary blank line in conftest.py
* Update upload-artifact action to version 6
* Update _xpu_8cards_case_test.yml
* fix ci bug
* Change exit code on test failure to 1
* fix ci bug
* fix ci bug
* fix ci bug
* fix ci bug
* Update conftest.py
* [Feature] support v1 update/clear api for RL
* [fix] fix execute_model and add sleep/wakeup api
* [fix] fix mtp and key_prefix
* [chore] move _update_key_prefix to resume method
* [fix] make the interface safe to call multiple times
* [fix] fix some tiny bugs
* [chore] make small changes to address PR review
* [docs] add docs for weight update
* [test] add some tests and update docs
* [style] fix code style check
* [test] fix ci
* [fix] fix stale control responses when a control method times out
* [chore] remove unused code
* [chore] fix code style
* [chore] optimize tags and key_prefix
* [test] fix ci
* [chore] fix code style
* [test] fix ci
* [fix] fix ep control
* [fix] fix ep control for engine cache queue
Most single-GPU and small-model deployments do not need 64MB custom
all-reduce buffers. Lowering the default to 8MB reduces unnecessary
shared memory allocation. Tests that require larger buffers now
explicitly set the value.
Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* support new mtp
* refactor(speculate_decoding and mtp): optimize mtp structure logic. Update spec-branch status processing
* fix cuda-graph for spec-decoding
* fix xpu mtp and fix some notes
* fix unittest and optimize notes
* fix model status update in eos-branch
* [RL] Support chunked part files loading in IPC snapshot strategy
## Motivation
When IPC snapshots are used for elastic recovery in RL training, loading a single large pdparams file causes a significant memory spike. This PR refactors `_update_ipc_snapshot` to support loading chunked part files, which avoids that spike.
## Modifications
Refactored `_update_ipc_snapshot` in `fastdeploy/rl/dynamic_weight_manager.py` with a three-level loading priority:
1. **Chunked part files** (`model_state.tpR{id}.part{N}.pdparams`): Load multiple smaller shards sequentially, calling `gc.collect()` after each chunk to free memory and avoid a memory spike.
2. **Single full file** (`model_state.tpR{id}.pdparams`): Legacy single-file loading path (preserved for backward compatibility).
3. **Shared fallback directory** (`/shared_ipc_meta/...`): Oldest legacy fallback path (preserved for backward compatibility).
Also replaced the hardcoded `tp0` rank in the file name pattern with the dynamic rank from `paddle.distributed.get_rank()`.
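The three-level lookup and the per-chunk release described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `dynamic_weight_manager.py`: `resolve_snapshot_files`, `load_snapshot`, `load_fn`, and `apply_fn` are hypothetical names, the caller supplies the file stem (for example `model_state.tp0123`), the shared fallback directory is passed in as a parameter because its full path is elided here, and the shared directory is assumed to use the same file name as the single full file.

```python
import gc
import glob
import os
import re
from typing import Any, Callable, List, Tuple

# Matches the trailing ".part{N}.pdparams" of a chunk file name.
_PART_RE = re.compile(r"\.part(\d+)\.pdparams$")


def resolve_snapshot_files(model_dir: str, fallback_dir: str, stem: str) -> Tuple[str, List[str]]:
    """Return (source, paths) for the highest-priority snapshot layout found (hypothetical helper)."""
    # Priority 1: chunked part files, ordered by their numeric part index.
    parts = [
        p
        for p in glob.glob(os.path.join(model_dir, f"{stem}.part*.pdparams"))
        if _PART_RE.search(os.path.basename(p))
    ]
    if parts:
        parts.sort(key=lambda p: int(_PART_RE.search(os.path.basename(p)).group(1)))
        return "chunked", parts
    # Priority 2: a single full file (legacy layout).
    full = os.path.join(model_dir, f"{stem}.pdparams")
    if os.path.isfile(full):
        return "full", [full]
    # Priority 3: the shared fallback directory (oldest legacy layout; file name assumed).
    shared = os.path.join(fallback_dir, f"{stem}.pdparams")
    if os.path.isfile(shared):
        return "shared", [shared]
    raise FileNotFoundError(f"no snapshot found for {stem!r}")


def load_snapshot(
    model_dir: str,
    fallback_dir: str,
    stem: str,
    load_fn: Callable[[str], Any],
    apply_fn: Callable[[Any], None],
) -> str:
    """Load each file in turn, releasing it before reading the next one."""
    source, paths = resolve_snapshot_files(model_dir, fallback_dir, stem)
    for path in paths:
        state = load_fn(path)  # e.g. paddle.load(path) in a Paddle setup (assumption)
        apply_fn(state)        # copy the loaded values into the live model
        del state              # drop this reference before reading the next chunk
        gc.collect()
    return source
```

Dropping the reference and calling `gc.collect()` after each chunk is intended to keep peak memory near the size of one shard rather than the whole checkpoint, assuming `apply_fn` copies values into existing parameters instead of holding on to the loaded dict.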
## Checklist
- [ ] Add at least one tag in the PR title.
- [ ] Format your code and run `pre-commit` before committing.
- [ ] Add unit tests. If no unit tests are added, explain why in this PR.
- [ ] Provide accuracy results.
- [ ] If the current PR is being submitted to the `release` branch, make sure it has first been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Co-Authored-By: lishuaihui <lishuaihui@baidu.com>
* [RL][BugFix] Fix ambiguous model path format and add legacy fallback in IPC snapshot
## Motivation
The previous snapshot file naming `model_state.tp{rank}{id}` concatenated
rank and id without a separator, causing ambiguity (e.g., rank=1, id=234
and rank=12, id=34 both produce `tp1234`). Additionally, once the naming
format changes, checkpoints already saved in the old format would fail
to load during elastic recovery.
## Modifications
- Add dot separator between rank and id in snapshot file name:
`model_state.tp{rank}{id}` → `model_state.tp{rank}.{id}`
- Add a Priority 3 legacy fallback that loads old-format files
(`model_state.tp0{id}.pdparams`) for backward compatibility during
rolling upgrades
- Update docstring and error message to reflect the new 4-level priority
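The ambiguity and the fixed naming scheme can be checked directly. The sketch below assumes integer ranks and snapshot ids, and assumes the chunked and single-file levels use the new dotted stem; `old_stem`, `new_stem`, `parse_new_stem`, and `local_candidates` are hypothetical helpers. Only the first three priority levels are listed, because the path of the fourth (the shared directory) is elided in this message.

```python
import re
from typing import List, Tuple


def old_stem(rank: int, snapshot_id: int) -> str:
    # Previous format: rank and id concatenated with no separator.
    return f"model_state.tp{rank}{snapshot_id}"


def new_stem(rank: int, snapshot_id: int) -> str:
    # Fixed format: a dot separates rank from id.
    return f"model_state.tp{rank}.{snapshot_id}"


_NEW_STEM_RE = re.compile(r"model_state\.tp(\d+)\.(\d+)")


def parse_new_stem(stem: str) -> Tuple[int, int]:
    """Recover (rank, id) from a new-format stem; the dot makes this unambiguous."""
    m = _NEW_STEM_RE.fullmatch(stem)
    if m is None:
        raise ValueError(f"not a new-format snapshot stem: {stem!r}")
    return int(m.group(1)), int(m.group(2))


def local_candidates(rank: int, snapshot_id: int) -> List[Tuple[int, str]]:
    """File-name patterns for priorities 1-3, in lookup order."""
    stem = new_stem(rank, snapshot_id)
    return [
        (1, f"{stem}.part*.pdparams"),                  # chunked part files (glob pattern)
        (2, f"{stem}.pdparams"),                        # single full file
        (3, f"model_state.tp0{snapshot_id}.pdparams"),  # legacy name with rank hardcoded to 0
    ]
```

With the separator, `parse_new_stem` recovers `(rank, id)` from the name alone, which the concatenated form cannot do: `model_state.tp1234` could be either `(1, 234)` or `(12, 34)`.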
Co-Authored-By: lishuaihui <lishuaihui@baidu.com>
* [RL][Test] Add unit tests for DynamicWeightManager._update_ipc_snapshot
Cover all 4 loading priority branches (chunked part files, single full
pdparams, legacy format, shared directory fallback) with mock-based
tests to verify correct behavior without filesystem or GPU dependencies.
Co-Authored-By: lishuaihui <lishuaihui@baidu.com>
* [RL][Test] Remove unused import 'call' in test_update_ipc_snapshot.py
Co-Authored-By: lishuaihui <lishuaihui@baidu.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* [RL] Fix snapshot part index to match filename numbering
Parse the part index from the filename (e.g. `.part0.`) instead of using
the `enumerate` index, so that logs and src_type stay consistent with
the actual file naming convention.
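A sketch of reading the part index from the file name rather than from the `enumerate` position. `part_index` and `indexed_parts` are hypothetical helpers, and the regex assumes the `.part{N}.` infix described above. If the shards on disk start at a number other than 0, or one is missing, the `enumerate` position and the file's own number diverge, and only the parsed value matches the name.

```python
import os
import re
from typing import Iterable, List, Tuple

# Captures N from the ".part{N}." infix of a snapshot chunk name.
_PART_RE = re.compile(r"\.part(\d+)\.")


def part_index(path: str) -> int:
    """Return N from a file name like 'model_state.tp0.7.part{N}.pdparams'."""
    m = _PART_RE.search(os.path.basename(path))
    if m is None:
        raise ValueError(f"no .partN. infix in {path!r}")
    return int(m.group(1))


def indexed_parts(paths: Iterable[str]) -> List[Tuple[int, str]]:
    """Pair each path with its parsed index, ordered numerically (not lexically)."""
    return sorted((part_index(p), p) for p in paths)
```

A caller would then iterate `for idx, path in indexed_parts(paths)` and use `idx` in its log lines, so a message about part 10 refers to the file actually named `.part10.`.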
Co-Authored-By: wikilsh <wiki_hui@qq.com>
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* 【Hackathon 10th Spring No.32】Unit test for load_weight_utils.py
* [CI]【Hackathon 10th Spring No.32】rewrite load_weight_utils unit test
* [CI]【Hackathon 10th Spring No.32】improve load_weight_utils coverage to 83%
- Add test_load_ep_checkpoint_basic: exercises EP checkpoint loading with a minimal fixture
- Add test_composite_ep_branch: covers EP path in load_composite_checkpoint
- Add test_get_weight_iterator_unordered: covers unordered sharded safetensors path
* [CI]【Hackathon 10th Spring No.32】align load_weight_utils test with gold standard (tmp_path, split tests)
* [CI]【Hackathon 10th Spring No.32】add coverage tests for load_weight_utils
- Add test_is_layers_grouped: test layers_are_grouped() with grouped, interleaved, and no-layer keys
- Add test_save_model_bf16_cache: exercise save_model decorator with is_checkpoint_bf16=True
- Add test_composite_checkpoint_ep: test load_composite_checkpoint use_ep=True branch
- Add test_composite_checkpoint_rank_mismatch: test tp_size != rank_dirs ValueError
- Add test_composite_checkpoint_kv_quant: test float8_e4m3fn kv_cache path
- Add __main__ block for direct execution
* [CI]【Hackathon 10th Spring No.32】raise load_weight_utils test delta
* [CI]【Hackathon 10th Spring No.32】cover TP sequence-parallel MoE load branches
* test: add load_reordered_experts, pre-sharded, and empty-state tests
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
* remove process_request
* fix chat
* fix unit test
* remove process response
* fix unit test
* fix offline decode
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix sampling_params
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* add batch zmq send response
* update
* Revert "update"
This reverts commit 0234a25b47.
* update
* remove lock
* fix unit test
* add unit test
* add unit test
* pre commit
* add unit test
* fix unit test
* add unit test
* fix the workers > 1 case
* update zmq_worker_pid
* fix unit test
* fix unit test
* fix unit test
* add unit test
* fix unit test
* fix first token time
* fix logprobs
* add unit test
* op
* remove debug log
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>