FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
luukunn	c3d8db85c4	[Optimization] Update ZMQ server (#6735 ) * add batch zmq send reaponse * update * Revert "update" This reverts commit `0234a25b47`. * update * remove lock * fix unit test * add unit test * add unit test * pre commit * add unit test * fix unit test * add unit test * fix worker>1 * update zmq_worker_pid * fix unit test * fix unit test * fix unit test * add unit test * fix unit test * fix first token time * fix logprobs * add unit test * op * remore debug log --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-19 21:53:16 +08:00
Jiang-Jia-Jun	12eb001d0c	Remove comments on multi-mode request handling Removed comments about multi-mode scenarios and request pulling.	2026-03-17 14:49:00 +08:00
jc	950366e58d	[PD Disaggregation][RL] Register to router with version and support rdma eager connect for pd (#6718 ) * [Feature] Register to router with version info for PD disaggregation Add RegisterManager for PD (Prefill-Decode) disaggregated deployment: - All instances (Prefill/Decode) register to Router with heartbeat - Prefill instances fetch Decode instance list from Router - Prefill instances establish eager RDMA connections to Decode instances - Register info includes: host_ip, port, role, version, is_paused, connected_decodes Changes: - Add RegisterManager class for managing PD registration and RDMA connections - Add version field to ModelConfig for model version tracking - Add connected_decodes to register_info for tracking connected Decode instances - Add FD_ENABLE_PD_RDMA_EAGER_CONNECT environment variable Test fixes: - Add None checks for load_config in FDConfig.__init__ - Add version attribute to test mock model configs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refine * remove test --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 14:43:35 +08:00
qwes5s5	3b7507a4c2	test_abort (#6743 )	2026-03-17 14:06:40 +08:00
Jiang-Jia-Jun	d113397b09	Simplify available_blocks assignment logic (#6819 )	2026-03-16 20:12:30 +08:00
freeliuzc	cf7934a4b2	[Speculative Decoding] Unify Spec and non-spec branch (#6685 ) * optimize spec-inference architecture * delete debug log * optimize spec_method usage && fix unit_test * add claude unit-test skill * fix some ugly bug * enhance robustness and bounds check * unify method & spec_method to method to avoid bug * activate CI * fix unit test * Unify logprobs computation for naive and speculative decoding, fix CUDA kernel * fix logprob bug && optimize verify kernel * fix exist_decode() judge	2026-03-10 23:58:44 -07:00
Jiang-Jia-Jun	b05a6c4206	[BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724 ) * [BugFix] Support to fix NaN bug in EP * Optimze notion for all the funs * Fix potential lock contention failure issues * Update fastdeploy/inter_communicator/ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update envs.py * Update default value for USE_KVCACHE_LOCK Change default value of USE_KVCACHE_LOCK from 1 to 0. * Update worker_process.py * Fix suffix wrong * Update test_prefix_cache_manager.py --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-10 21:55:32 +08:00
1	3a85ecf3bc	[Others] Fix typos in log messages and comments (#6707 ) Fix spelling errors in log messages, docstrings, and comments: - 'occured' -> 'occurred' (8 instances) - 'Recieve'/'recieved' -> 'Receive'/'received' (7 instances) - 'happend' -> 'happened' (3 instances) - 'expet_servic' -> 'expert_service' (2 instances) - 'meas' -> 'means' (1 instance) No functional changes. Only log strings, docstrings, and comments are affected. Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>	2026-03-09 10:26:25 +08:00
Yonghua Li	fa1906bd6f	[BugFix] Fix inaccurate cache hit rate and TTFT after request preemption (#6620 ) * [chore] add has_been_rescheduled flag for requests * [refactor] rename reschedule to preempted for accuracy and fix cache hit metrics * [chore] add ttft_s	2026-03-05 16:25:02 +08:00
ddchenhao66	fa4815b93a	[BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server (#6598 ) * [BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server * [BugFix] modify request_queue and result_queue of dp scheduler	2026-03-05 10:04:12 +08:00
qwes5s5	375b5b7b21	[Feature]Log Format Normalization and Trace Log Optimization (#6370 ) * log refactor * log refactor 2 * log refactor 3	2026-03-03 11:31:45 +08:00
sunxin	53aaac69da	[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457 ) * gate bf16 * add gate-fp32 * fix * update baseline * update * update * fix	2026-02-26 21:08:46 -08:00
jackyYang6	a29ee57e15	[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367 ) * feat: add thinking budget logits processor * add unittest * fix pre-commit * add unittest * docs: clarify operator-level vs logits processor usage and conflict guidance --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-25 14:17:09 +08:00
CSWYF3634076	7380bfb476	[BugFix]fix console log metrics waitting queue count (#6432 ) * [BugFix]fix console log metrics waitting queue count * [BugFix]fix console log metrics waitting queue count unittest	2026-02-11 10:51:49 +08:00
chenjian	35c24f3f71	Revert "[Optimize] Optimize ttft for ep (#6098 )" (#6402 ) This reverts commit `90db0bdd0d`.	2026-02-09 19:01:23 +08:00
kevin	d60daca4a8	[Feature] consider multimodal model when dummy run (#6045 ) * add mm do profile * updata code * update code * update code * update code * update test case * update code * update code * fix xpu bug * update code * add mm do profile * update test case * update code	2026-02-09 17:49:55 +08:00
CSWYF3634076	eb8d639fe3	[Engine] apiserver&engine exit when work failed to start (#6322 )	2026-02-09 15:07:40 +08:00
Yonghua Li	5ac5ecd0b0	[BugFix] fix cache transfer tasks failure after cache cleared (#6202 ) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task * [fix] fix cache manager hang when clearing prefix cache * [fix] fix list_proxy has no clear method * [fix] fix barrier * [fix] add barrier0 * [fix] add cache_task_is_paused_signal * [fix] fix condition * [fix] fix cache transfer sync and delay prefix cache tree clearing * [fix] fix typo * [chore] polish code * [fix] revert only rank0 write kv_cache_status_signal * [fix] fix thread pool and prefix cache manager hang * [fix] add timeout for task_swapping_event * [fix] tolerate prefix cache manager error while prefix tree is cleared * [chore] add more log * [fix] fix test_prefix_cache_manager * [fix] fix prefix_cache_status_signal usage	2026-02-08 15:33:56 +08:00
CSWYF3634076	1c0a2b055f	[Feature] console print statistical metrics (#6339 ) * [Feature] console print statistical data * [Feature] console print statistical data v2 dp_rank * [Feature] console print statistical data v2 unittest * [Feature] console print statistical data v3 unittest	2026-02-05 19:20:36 +08:00
chenjian	90db0bdd0d	[Optimize] Optimize ttft for ep (#6098 ) * optimize ttft * fix * fix * fix ci * fix ci * fix * fix bug * fix * add comments * fix ci * fix	2026-02-04 15:03:29 +08:00
sunxin	9b0a82cfa9	[Model Runner] Support overlap schedule (#6259 )	2026-02-04 10:49:44 +08:00
Jingfeng Wu	4760835789	Fix heartbeat signal's sleeptime error (#6241 )	2026-02-03 14:28:51 +08:00
chenjian	6e9a57b7c1	[Bug fix] Fix multi modal fetch feature (#6095 )	2026-01-28 18:02:26 +08:00
jc	b1698a79cb	[RL] add version to the key of cache storage && refine raising error (#6160 ) * Waiting for cache transfer manager inited * up * up * up * up * up * fix according comments * fix unittest * fix * fix unittest * fix error * pass storage_backend to worker	2026-01-27 10:47:46 +08:00
wangyifei	b7c5daa316	[RL] add pause, update_weights, resume interface for async RL (#6052 ) * support dynamic run_control_request through zmq from apiserver to common_engine * support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method * change /is_puased from HTTP POST method to GET method * add pause、resume、is_paused implementation * support engine <==> worker communication(request&response) * support sync weights through RDMA from checkpoint_transfer * support specified version, rsync_config in update_weights rpc call * add pause, update_weights, resume interface for async RL * bug fix: update_weights support using default arguments * fix typo * typo fix * typo fix * typo fix * add unitest for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all * add "rsync" to LoadConfig.load_strategy Literal type hints Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * typo fix * typo fix * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * check version/rsync params * add error log when version.txt not exists Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * raise specified ValueError when paramters check failed Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * tp barrier after run_control_method * encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue * typo fix * typo fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-23 10:18:07 +08:00
Yonghua Li	8d27a523e7	[Feature] [KVCache] support attention_store kv cache backend (#5823 ) * [feat] support attention_store kv cache backend * [fix] fix codestyle * [chore] optimize log * [fix] fix write storage task * [fix] fix read storage * [fix] fix code conflict after merge develop * [fix] fix cache bytes and read task token ids * [chore] add model for cache transfer manager * [chore] add some log * [chore] remove launched_cache_manager_signal * [fix] fix write_back_storage_task match_block_num condition * [fix] fix swap_cost_time * [ci] fix ci * Update fastdeploy/engine/sched/resource_manager_v1.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/cache_transfer_manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-22 21:01:23 +08:00
kxz2002	6e416c62dd	[Optimization] The pre- and post-processing pipeline do not perform dict conversion (#5494 ) * to_request_for_infer initial commit * refact to from_chat_completion_request * preprocess use request initial commit * bugfix * processors refact to using request * bug fix * refact Request from_generic_request * post process initial commit * bugfix * postprocess second commit * bugfix * serving_embedding initial commit * serving_reward initial commit * bugfix * replace function name * async_llm initial commit * offline initial commit and fix bug * bugfix * fix async_llm * remove add speculate_metrics into data * fix logprobs bug * fix echo bug * fix bug * fix reasoning_max_tokens * bugfix * bugfix and modify unittest * bugfix and modify unit test * bugfix * bugfix * bugfix * modify unittest * fix error when reasong_content is none for text_processor * remove some unnessary logic * revert removed logic * implement add and set method for RequestOutput and refact code * modify unit test * modify unit test * union process_request and process_request_obj * remove a unit test * union process_response and process_response_obj * support qwen3_vl_processor * modify unittest and remove comments * fix prompt_logprobs * fix codestyle * add v1 * v1 * fix unit test * fix unit test * fix pre-commit * fix * add process request * add process request * fix * fix * fix unit test * fix unit test * fix unit test * fix unit test * fix unit test * remove file * add unit test * add unit test * add unit test * fix unit test * fix unit test * fix * fix --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com> Co-authored-by: luukunn <981429396@qq.com> Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com> Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>	2026-01-22 00:50:52 +08:00
周周周	97f96e34ca	only update self.exist_prefill_task_signal in v0 (#6064 ) * commit * commit * commit --------- Co-authored-by: xiaoluomi <1037819816@qq.com>	2026-01-16 20:11:55 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
Yonghua Li	456637002d	[BugFix] fix cache transfer manager updating/clearing (#5930 ) * [fix] fix cache transfer manager updating/clearing * [fix] fix code style * [fix] fix config * [fix] fix engine client * [fix] let worker update kv cache status signal * [fix] update worker process * [fix] fix clear/update for case if comm group is shutdown * [fix] update dynamic weight manager * [fix] fix port * [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting	2026-01-13 05:09:29 -08:00
ming1753	9c559d02d3	[BugFix] Fix insert_zmq_task_to_scheduler break bug (#5960 ) * [BugFix] fix zmq bug * fix bug * formate * fix test bug * fix bug	2026-01-12 19:21:01 -08:00
jc	8d384f9fd8	[PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742 ) * Update usage of pd disaggregation * up * up * up * up * up * up * up * up * up * up dp docs * up * up * up * fix unittest	2026-01-05 17:51:29 +08:00
GoldPancake	23d488c488	[Feature] Entropy calculation support (#5692 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details * support entropy * fix bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-23 21:19:47 +08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
Jiang-Jia-Jun	2ad3bff4ff	[Optim] Optimize costtime in checking tasks in engine-worker-queue (#5580 ) * [Optim] Optimize costtime in checking tasks in engine-worker-queue * Update fastdeploy/engine/common_engine.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-16 19:27:31 +08:00
xiaolei373	a30b4da260	[Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 (#5458 )	2025-12-16 16:36:09 +08:00
Jiang-Jia-Jun	4b3e41c665	[Optim] Improve task-checking performance in engine-worker-queue (#5376 ) * [Optim] Optimize costtime in checking tasks in engine-worker-queue * Update fastdeploy/engine/common_engine.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/inter_communicator/engine_worker_queue.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Docs] Add docstring to set_exist_tasks method (#5382) * Initial plan * Add docstring to set_exist_tasks method Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * [Docs] Add docstring documentation to exist_tasks() method (#5381) * Initial plan * Add comprehensive docstring to exist_tasks() method Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * [Optimization] Conditionally initialize shared memory for single-node deployments only (#5383) * Initial plan * Conditionally initialize exist_tasks_intra_signal for single-node deployments Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Use is_single_node flag for consistent deployment type checking Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Remove redundant None checks in exist_tasks methods Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * format code --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>	2025-12-11 10:33:32 +08:00
Juncai	83ea9646f9	[PD Disaggregation] Unify the disaggregation info and the pd communication (#5438 ) * Unify the disaggregation info and the pd communication * up * up * fix * fix conflict * fix unittest	2025-12-09 14:44:59 +08:00
Nyakku Shigure	e1c4a12e34	[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part (#5223 ) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-09 14:37:00 +08:00
zhouchong	5d9b5e4a5b	[Engine] [Feature] Refactor async_llm:cross-process with EngineService，based on zmq communication (#4868 ) * Refactor async_llm:cross-process with EngineService * fix: async_llm output process * fix: return prompt_token_ids and prompt_tokens in first res * optimize common_engine start func	2025-12-09 10:53:40 +08:00
Juncai	a8ffc22032	[BugFix] fix init RequestOutput (#5419 ) * fix init RequestOutput * up * fix * fix	2025-12-09 10:20:22 +08:00
Juncai	02df3c5097	FD registers to the Router only once. (#5431 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-12-08 22:07:11 +08:00
Juncai	80efe98f8d	[PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317 ) * Add timestamp for analyzing splitwise deployment * up * up * up * up * up * up * fix format * fix	2025-12-08 10:08:44 +08:00
kevin	c9d7f9e7c3	[BugFix] fix async download bug (#5349 ) * fix async download bug * update log * Revert "update log" This reverts commit `5816e602f4`. * update code * fix mtp bug	2025-12-05 18:59:12 +08:00
Yonghua Li	35846909c7	[fix] fix scheduler hang when input length is very close to max_model_len (#5393 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FD Image Build (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details Publish Job / Run Stable Tests (push) Has been cancelled Details CI Images Build / FD-Clone-Linux (push) Has been cancelled Details CI Images Build / Show Code Archive Output (push) Has been cancelled Details CI Images Build / CI Images Build (push) Has been cancelled Details CI Images Build / BUILD_SM8090 (push) Has been cancelled Details CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled Details CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details CI Images Build / Run Base Tests (push) Has been cancelled Details CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled Details	2025-12-05 18:23:42 +08:00
qwes5s5	1aefbef0b3	fix trace log (#5386 )	2025-12-05 14:45:52 +08:00
Yonghua Li	f4119d51b4	[PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197 ) * [fix] support DP via v1 router and decouple DP and EP * [fix] fix scripts * [fix] reset model path * [fix] dp use get_output_ep, fix router port type, update scripts * [merge] merge with latest code * [chore] remove some debug log * [fix] fix code style check * [fix] fix test_multi_api_server for log_dir name * [chore] reduce logs * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-04 15:38:43 +08:00
Daci	7dc06cac6e	[BugFix] race condition [is_fetching] causing multiple fetch requests (#5238 ) * RouterArgs port str -> int * fix race condition [is_fetching] causing multiple fetch requests * bugfix: Delete duplicate input_ids tensor creation	2025-11-28 13:41:36 +08:00
kevin	bf30f45738	[BugFix] fix vl performance bug (#5181 ) * fix vl performance bug * update code * update code --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-11-26 21:06:52 +08:00
freeliuzc	ba915e03e1	[BugFix]Fix attention mask bug in D-Node of PD-split mode (#5245 )	2025-11-26 17:56:28 +08:00

1 2 3

103 Commits