Commit Graph

2775 Commits

Author SHA1 Message Date
freeliuzc a39a67334c fix mtp bug in pd-split mode (#2970)
2025-07-23 15:31:16 +08:00
YuBaoku 6c4cfd9359 [CI] add codestyle_check action (#2972)
* [CI] add codestyle_check action

* [CI] Integrate codestyle check via pre-commit in GitHub Actions
2025-07-23 15:21:56 +08:00
lizexu123 9b22b8d2c3 delete max-len (#2959) 2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun 5b59a97030 Update README.md 2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun 475dc6d84e Update README.md 2025-07-23 13:47:31 +08:00
chen ad202272ed [Infer] Improve the performance of block_wise_fp8 in triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
lizhenyun01 e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
Ryan 95b5af24db [SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
* add sot warmup

* fix code style

* change batch_size list

* add param to config

* rm free_list settings && set sot_warmup_sizes

* finish debug with dynamic dims by type annotations

* add profile_run guard

* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1 7c5e34e72d [FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967)
* fix rejection sampling when topp=0

* fix
2025-07-22 05:53:37 -07:00
gaoziyuan dbe6225b33 fix rl config local rank (#2957) 2025-07-22 04:39:54 -07:00
GoldPancake 9b84d51e25 [MTP Fix] Fix code and register cpp operators (#2965) 2025-07-22 19:36:24 +08:00
K11OntheBoat 93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 18:11:15 +08:00
GoldPancake dc67c10a7e [Feature][MTP]Support multi-step MTP (#2952) 2025-07-22 16:26:29 +08:00
luukunn 920e6b3f60 [Fix]fix empty prompt_token_ids,update the parser's triggering condit… (#2891) 2025-07-22 16:13:05 +08:00
Zero Rains 89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* remove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure 48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
K11OntheBoat e991777757 [Feature] DeepseekV3 use pd_build_static_op (#2948)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 15:03:41 +08:00
李泳桦 2a8a2c06de [fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951) 2025-07-22 14:35:56 +08:00
lifulll 2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
gaoziyuan 0eedbdaee0 fix import error (#2944) 2025-07-22 14:06:01 +08:00
K11OntheBoat 8020927f50 [BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun 56102e91e1 [Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink 0262ef7eb3 custom all reduce support cuda graph (#2938)
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周 ff4569f135 remove some code in ep.py (#2947) 2025-07-21 22:44:57 +08:00
李泳桦 8a619e9db5 [Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body

* [fix] return_token_ids not working in curl request

* [test] improve some test cases of return_token_ids and prompt_token_ids

* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
littledgg 2845bde964 [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph (#2936)
* [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph

* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu 2f74e93d7e use dist.all_reduce(min) to sync num_blocks_local (#2933)
* pre-commit all files check

* reduce min num_blocks_local

* fix nranks=1

* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123 67990e0572 [Feature] support min_p_sampling (#2872)
* Fastdeploy support min_p

* add test_min_p

* fix

* min_p_sampling

* update

* delete vl_gpu_model_runner.py

* fix

* Align usage of min_p with vLLM

* fix

* modified unit test

* fix test_min_sampling

* pre-commit all files

* fix

* fix

* fix

* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
gaoziyuan 95a214ae43 support trainer_degree in name_mapping (#2935) 2025-07-20 23:12:55 -07:00
YuanRisheng bce2c6cd7c rename test dir (#2934) 2025-07-21 14:05:45 +08:00
ltd0924 cc4cec0a74 Update engine_client.py (#2931) 2025-07-21 11:42:16 +08:00
liddk1121 17c5d3a241 [Iluvatar GPU] Add CI scripts (#2876) 2025-07-21 09:44:42 +08:00
周周周 8c5407d9e4 remove cum_offsets from ForwardMeta (#2925)
2025-07-19 23:57:27 +08:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ZhangYulongg b8676d71a8 update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg 43976138de update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg e546e6b1b0 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 9c8292fb19 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg a5e95013b5 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 93481a5478 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg eb77b1be6d update ci cases 2025-07-18 21:44:07 +08:00
ming1753 5328daa333 [Bug Fix] fix ep config bug (#2920) 2025-07-18 19:12:56 +08:00
xiaoxiaohehe001 a42fc3f40b [Feature] Support 45tVL EP FP8 Infer. (#2909)
* support_mm_ep_fp8

* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun fbe3547c95 [Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion

* Add ci test for include_stop_str_in_output

* Update version of openai

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
gaoziyuan 6efad14b95 support vl ori_vacab_size (#2900) 2025-07-18 16:26:14 +08:00
周周周 d306944f4f remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove cum_offsets from get_block_shape_and_split_kv_block

* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE e81137e581 fix ci workflow (#2896) 2025-07-18 16:01:00 +08:00
RAM cd52dc0f65 [Executor] Fix set capture sizes bug (#2902) 2025-07-18 15:12:19 +08:00
周周周 1339e56282 [XPU] Remove padding_offsets from get_padding_offset.cu (#2911) 2025-07-18 14:16:44 +08:00
YuanRisheng 0eb5dc18d3 [BugFix]Fix sample rejection (#2908)
* fix config

* fix rejection
2025-07-18 13:44:30 +08:00