* 【Hackathon 10th Spring No.32】Unit test for load_weight_utils.py
* [CI]【Hackathon 10th Spring No.32】rewrite load_weight_utils unit test
* [CI]【Hackathon 10th Spring No.32】improve load_weight_utils coverage to 83%
- Add test_load_ep_checkpoint_basic: exercises EP checkpoint loading with minimal fixture
- Add test_composite_ep_branch: covers EP path in load_composite_checkpoint
- Add test_get_weight_iterator_unordered: covers unordered sharded safetensors path
* [CI]【Hackathon 10th Spring No.32】align load_weight_utils test with gold standard (tmp_path, split tests)
* [CI]【Hackathon 10th Spring No.32】add coverage tests for load_weight_utils
- Add test_is_layers_grouped: test layers_are_grouped() with grouped, interleaved, and no-layer keys
- Add test_save_model_bf16_cache: exercise save_model decorator with is_checkpoint_bf16=True
- Add test_composite_checkpoint_ep: test load_composite_checkpoint use_ep=True branch
- Add test_composite_checkpoint_rank_mismatch: test tp_size != rank_dirs ValueError
- Add test_composite_checkpoint_kv_quant: test float8_e4m3fn kv_cache path
- Add __main__ block for direct execution
* [CI]【Hackathon 10th Spring No.32】raise load_weight_utils test delta
* [CI]【Hackathon 10th Spring No.32】cover TP sequence-parallel MoE load branches
* test: add load_reordered_experts, pre-sharded, and empty-state tests
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
* remove process_request
* fix chat
* fix unit test
* remove process response
* fix unit test
* fix offline decode
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix sampling_params
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* add batch zmq send reaponse
* update
* Revert "update"
This reverts commit 0234a25b47.
* update
* remove lock
* fix unit test
* add unit test
* add unit test
* pre commit
* add unit test
* fix unit test
* add unit test
* fix worker>1
* update zmq_worker_pid
* fix unit test
* fix unit test
* fix unit test
* add unit test
* fix unit test
* fix first token time
* fix logprobs
* add unit test
* op
* remore debug log
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [BugFix] Force top_k=1 for greedy decoding when temperature=0
When temperature is set to 0 (greedy decoding), only setting temperature
to a small epsilon is insufficient — the sampling kernel may still pick
non-top-1 tokens. Explicitly set top_k=1 in all processors to guarantee
argmax behavior.
Additionally, add argmax fast-path in top_k_top_p_sampling() under
FD_DETERMINISTIC_MODE to handle non-rejection sampling backends that
ignore top_k parameter.
* Extract greedy decoding from FD_DETERMINISTIC_MODE guard
top_k=1 → argmax is a correctness optimization, not deterministic-specific.
Remove the FD_DETERMINISTIC_MODE guard so all-greedy fast-path and
mixed-batch override work unconditionally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update test_torch_model.py
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* [BugFix] Cap nvcc -t threads to avoid compilation failures on high-core machines
On machines with many cores (e.g. 192), the nvcc -t flag was set to
os.cpu_count(), causing each nvcc process to spawn that many internal
threads. Combined with Paddle's ThreadPoolExecutor launching parallel
compilations (also based on cpu_count), this leads to ~28K+ threads,
resource exhaustion, and silent compilation failures. The linker then
cannot find the missing .o files, but a second build succeeds because
already-compiled objects are cached.
Cap nvcc -t at 4 to keep total parallelism reasonable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* [Feature] Register to router with version info for PD disaggregation
Add RegisterManager for PD (Prefill-Decode) disaggregated deployment:
- All instances (Prefill/Decode) register to Router with heartbeat
- Prefill instances fetch Decode instance list from Router
- Prefill instances establish eager RDMA connections to Decode instances
- Register info includes: host_ip, port, role, version, is_paused, connected_decodes
Changes:
- Add RegisterManager class for managing PD registration and RDMA connections
- Add version field to ModelConfig for model version tracking
- Add connected_decodes to register_info for tracking connected Decode instances
- Add FD_ENABLE_PD_RDMA_EAGER_CONNECT environment variable
Test fixes:
- Add None checks for load_config in FDConfig.__init__
- Add version attribute to test mock model configs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refine
* remove test
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>