Commit Graph

15 Commits

Author SHA1 Message Date
luukunn fdfc908e2f [Others] reuse unit test (#7127) 2026-04-01 18:36:00 +08:00
luukunn 3651113ee5 [DataProcessor] Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
jc 950366e58d [PD Disaggregation][RL] Register to router with version and support rdma eager connect for pd (#6718)
* [Feature] Register to router with version info for PD disaggregation

Add RegisterManager for PD (Prefill-Decode) disaggregated deployment:
- All instances (Prefill/Decode) register to Router with heartbeat
- Prefill instances fetch Decode instance list from Router
- Prefill instances establish eager RDMA connections to Decode instances
- Register info includes: host_ip, port, role, version, is_paused, connected_decodes

Changes:
- Add RegisterManager class for managing PD registration and RDMA connections
- Add version field to ModelConfig for model version tracking
- Add connected_decodes to register_info for tracking connected Decode instances
- Add FD_ENABLE_PD_RDMA_EAGER_CONNECT environment variable

Test fixes:
- Add None checks for load_config in FDConfig.__init__
- Add version attribute to test mock model configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refine

* remove test

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 14:43:35 +08:00
ddchenhao66 a502dda1fe [BugFix] fix multi-step mtp bug (#6754) 2026-03-11 10:16:04 +08:00
Yonghua Li e2332a1112 [BugFix] fix num_cpu_blocks computation (#6438)
* [BugFix] fix num_cpu_blocks computation

* [fix] fix syntax and log

* [fix] pre-commit

* [fix] use getattr

* [fix] ci test
2026-02-13 11:05:14 +08:00
kevin d60daca4a8 [Feature] consider multimodal model when dummy run (#6045)
* add mm do profile

* update code

* update code

* update code

* update code

* update test case

* update code

* update code

* fix xpu bug

* update code

* add mm do profile

* update test case

* update code
2026-02-09 17:49:55 +08:00
kesmeey 73952a3b67 add tests (#6243)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
kevin 0e0eaa1c57 [BugFix] fix mm revert bug (#6061)
* fix mm revert bug

* update code
2026-01-16 08:13:34 -08:00
kevin 2d2b156252 [BugFix] fix dyc8 cache bug (#5958)
* fix dyc8 cache bug

* update code
2026-01-08 19:25:47 -08:00
kevin eabd01cd21 [BugFix] fix eb5 prefix bug (#5879)
* fix eb5 prefix bug

* update ci test

* update code

* update code

* update code

* update code

* update code

* update code

* update code
2026-01-06 23:50:39 -08:00
kevin 894f4e312b [FDConfig] disable chunked_mm_input in ernie5 (#5774)
* disable chunked_mm_input in ernie5

* update code

* update code

* update test case

* update testcase

* update case
2025-12-26 15:31:27 +08:00
Echo-Nie 1b1bfab341 [CI] Add unittest (#5328)
* add test_worker_eplb

* remove tensor_wise_fp8

* add copyright
2025-12-09 19:19:42 +08:00
kevin c9d7f9e7c3 [BugFix] fix async download bug (#5349)
* fix async download bug

* update log

* Revert "update log"

This reverts commit 5816e602f4.

* update code

* fix mtp bug
2025-12-05 18:59:12 +08:00
kevin 7454480e07 [Feature] support bos download retry (#5137)
* support bos download retry

* update code

* update code
2025-11-21 10:18:32 +08:00
kevin 109d48e456 [Feature] support async download features (#5003)
* support async download features

* add test case

* update code
2025-11-19 22:23:36 +08:00