Commit Graph

2775 Commits

Author SHA1 Message Date
freeliuzc a39a67334c fix mtp bug in pd-split mode (#2970)
2025-07-23 15:31:16 +08:00
YuBaoku 6c4cfd9359 [CI] add codestyle_check action (#2972)
* [CI] add codestyle_check action

* [CI] Integrate codestyle check via pre-commit in GitHub Actions
2025-07-23 15:21:56 +08:00
lizexu123 9b22b8d2c3 delete max-len (#2959) 2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun 5b59a97030 Update README.md 2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun 475dc6d84e Update README.md 2025-07-23 13:47:31 +08:00
chen ad202272ed [Infer] Improve the performance of block_wise_fp8 in triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
lizhenyun01 e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
Ryan 95b5af24db [SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
* add sot warmup

* fix code style

* change batch_size list

* add param to config

* rm free_list settings && set sot_warmup_sizes

* finish debug with dynamic dims by type annotations

* add profile_run guard

* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1 7c5e34e72d [FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967)
* fix rejection sampling when topp=0

* fix
2025-07-22 05:53:37 -07:00
gaoziyuan dbe6225b33 fix rl config local rank (#2957) 2025-07-22 04:39:54 -07:00
GoldPancake 9b84d51e25 [MTP Fix] Fix code and register cpp operators (#2965) 2025-07-22 19:36:24 +08:00
K11OntheBoat 93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 18:11:15 +08:00
GoldPancake dc67c10a7e [Feature][MTP]Support multi-step MTP (#2952) 2025-07-22 16:26:29 +08:00
luukunn 920e6b3f60 [Fix]fix empty prompt_token_ids,update the parser's triggering condit… (#2891) 2025-07-22 16:13:05 +08:00
Zero Rains 89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* remove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure 48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
K11OntheBoat e991777757 [Feature] DeepseekV3 use pd_build_static_op (#2948)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 15:03:41 +08:00
李泳桦 2a8a2c06de [fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951) 2025-07-22 14:35:56 +08:00
lifulll 2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
gaoziyuan 0eedbdaee0 fix import error (#2944) 2025-07-22 14:06:01 +08:00
K11OntheBoat 8020927f50 [BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun 56102e91e1 [Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink 0262ef7eb3 custom all reduce support cuda graph (#2938)
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周 ff4569f135 remove some code in ep.py (#2947) 2025-07-21 22:44:57 +08:00
李泳桦 8a619e9db5 [Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body

* [fix] return_token_ids not working in curl request

* [test] improve some test cases of return_token_ids and prompt_token_ids

* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
littledgg 2845bde964 [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph (#2936)
* [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph

* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu 2f74e93d7e use dist.all_reduce(min) to sync num_blocks_local (#2933)
* pre-commit all files check

* reduce min num_blocks_local

* fix nranks=1

* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123 67990e0572 [Feature] support min_p_sampling (#2872)
* Fastdeploy support min_p

* add test_min_p

* fix

* min_p_sampling

* update

* delete vl_gpu_model_runner.py

* fix

* Align usage of min_p with vLLM

* fix

* modified unit test

* fix test_min_sampling

* pre-commit all files

* fix

* fix

* fix

* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
gaoziyuan 95a214ae43 support trainer_degree in name_mapping (#2935) 2025-07-20 23:12:55 -07:00
YuanRisheng bce2c6cd7c rename test dir (#2934) 2025-07-21 14:05:45 +08:00
ltd0924 cc4cec0a74 Update engine_client.py (#2931) 2025-07-21 11:42:16 +08:00
liddk1121 17c5d3a241 [Iluvatar GPU] Add CI scripts (#2876) 2025-07-21 09:44:42 +08:00
周周周 8c5407d9e4 remove cum_offsets from ForwardMeta (#2925)
2025-07-19 23:57:27 +08:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ZhangYulongg b8676d71a8 update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg 43976138de update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg e546e6b1b0 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 9c8292fb19 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg a5e95013b5 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 93481a5478 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg eb77b1be6d update ci cases 2025-07-18 21:44:07 +08:00
ming1753 5328daa333 [Bug Fix] fix ep config bug (#2920) 2025-07-18 19:12:56 +08:00
xiaoxiaohehe001 a42fc3f40b [Feature] Support 45tVL EP FP8 Infer. (#2909)
* support_mm_ep_fp8

* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun fbe3547c95 [Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion

* Add ci test for include_stop_str_in_output

* Update version of openai

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
gaoziyuan 6efad14b95 support vl ori_vacab_size (#2900) 2025-07-18 16:26:14 +08:00
周周周 d306944f4f remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove cum_offsets from get_block_shape_and_split_kv_block

* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE e81137e581 fix ci workflow (#2896) 2025-07-18 16:01:00 +08:00
RAM cd52dc0f65 [Executor] Fix set capture sizes bug (#2902) 2025-07-18 15:12:19 +08:00
周周周 1339e56282 [XPU] Remove padding_offsets from get_padding_offset.cu (#2911) 2025-07-18 14:16:44 +08:00
YuanRisheng 0eb5dc18d3 [BugFix]Fix sample rejection (#2908)
* fix config

* fix rejection
2025-07-18 13:44:30 +08:00