FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
gongweibao	a6351dea0b	[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 ) * init * init * fix format * add * add files * add ut * fix some * add ut * add more * add * fix pre-commit * fix pre-commit * fix cover * skip long seq * add * add * fix * remove not need * fix set attr * fix comments * fix comments * fix failed tests --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-16 21:32:43 +08:00
Jiang-Jia-Jun	d113397b09	Simplify available_blocks assignment logic (#6819 )	2026-03-16 20:12:30 +08:00
Longzhi Wang	5c92f4d0cd	[Feature] Add deepgemm bias epilogue for SM100 (#6857 ) * [Feature] Add deepgemm bias epilogue for SM100 * fix	2026-03-16 20:12:00 +08:00
Jiang-Jia-Jun	bd4b6092dd	Update title and activity section in README_CN.md	2026-03-16 19:21:50 +08:00
Jiang-Jia-Jun	c5f402e7aa	Update title and release note in README_CN.md	2026-03-16 19:17:38 +08:00
AIbin	c9f7f5234e	[Optimization][BugFix]Optimize Deepseek networking code (#6861 ) * update dsk model * update dsk model	2026-03-16 16:52:43 +08:00
ming1753	bb925c605f	[Other] Adjust GPUModelRunner to enhance compatibility (#6851 )	2026-03-16 14:49:19 +08:00
jc	04fde3b227	[PD Disaggregation] Prefill and decode support cache storage (#6768 ) * Prefill and decode support cache storage * up * up * update docs and refine mooncake store * up	2026-03-16 14:44:49 +08:00
mayang002	72ff7bf4cd	[XPU] Fix wrapper files (#6830 ) - Add WRAPPER_CHECK_PTR for pointer validity checks - Add WRAPPER_ASSERT_GT/GE/LE for parameter range validation - Simplify wrapper function calls to direct return pattern	2026-03-16 14:39:40 +08:00
gongweibao	3fabba0dc7	[Feature] Add Triton unified attention kernel for deterministic inference (#6795 ) * [Feature] Add Triton unified attention kernel for deterministic inference Add a Triton-based unified extend attention kernel that processes both prefix (cached) and extend (new) KV tokens through a single kernel with unified kv_indices, ensuring identical accumulation order regardless of cache hit/miss patterns. Key components: - _fwd_kernel_unified: Triton JIT kernel with online softmax, paged KV cache support, and causal masking for prefix+extend - Index building utilities: triton_cumsum_with_zero_prefix, build_kv_indices_from_block_tables, build_unified_kv_indices, _scatter_extend_kv_indices_kernel (all CUDA Graph compatible) - pre_cache_len_concat_triton: GPU-only replacement for C++ op - Reference implementations (_ref variants) for correctness validation - Comprehensive tests: kernel correctness, split invariance, determinism, production-scale, cross-validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Vectorize causal mask in test references for ~26x speedup Replace triple Python for-loop with paddle.where vectorized mask in naive_attention and _build_causal_mask. seq4096 test: 2m39s -> 6s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix cover --------- Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 14:29:45 +08:00
Yonghua Li	7c8c0a3c02	[BugFix] replace ftok with custom_ftok in get_output/save_output ops (#6822 ) * [BugFix] replace ftok with custom_ftok in get_output/save_output ops * [Test] add unit test for custom_ftok * [Chore] create custom_ftok.h * [Chore] reorganize header file * [Fix] fix cache messager msg_queue_id+rank_id conflict	2026-03-16 14:22:18 +08:00
fxyfxy777	4d39232553	[BugFix] add ut for fused_moe_degemm (#6840 ) * add ut * add skip	2026-03-16 12:22:18 +08:00
周周周	091e3c815d	Dsa clean code, add dsk_attn_write_cache baseline (#6855 )	2026-03-16 11:01:14 +08:00
周周周	820eb60ec6	[Others] clean code (#6839 ) Co-authored-by: “liuruian” <liuruian@baidu.com>	2026-03-14 11:09:28 +08:00
yinwei	3f4441b4b7	[XPU]add mtp cudagraph support (#6831 )	2026-03-13 19:46:53 +08:00
cmcamdy	7591e0d6bc	fix eb5 mtp(mix) (#6800 )	2026-03-13 17:36:57 +08:00
周周周	8c1a2827d3	DSA clean code (#6827 )	2026-03-13 16:39:47 +08:00
mouxin	49fe68a518	[Docs] Update Golang Router FAQ (#6829 )	2026-03-13 15:48:36 +08:00
freeliuzc	12f412448b	[Speculative Decoding] Fix speculate stop_seqs and fix accept_num in eos branch (#6825 )	2026-03-12 23:48:24 -07:00
gongweibao	8906e09e0f	[Feature][OP] Add batch-invariant RMSNorm kernel and TP embedding Custom AR path (#6749 ) * [Feature] Add batch-invariant RMSNorm kernel and TP embedding Custom AR path - Add Triton-based rms_norm_batch_invariant kernel for M-invariant RMSNorm - Add linear/linear_v2 tracking wrappers in batch_invariant_mode - Route TP VocabParallelEmbedding through Custom AR instead of NCCL - Increase FD_CUSTOM_AR_MAX_SIZE_MB default from 8 to 64 - Add unit tests for RMSNorm and TP embedding invariance * [Fix] Fix test tolerances for bfloat16 RMSNorm and custom AR buffer size - Relax bfloat16 atol from 1e-3 to 1e-2 for D=3584 in RMSNorm numerical correctness test (0.0078125 diff is expected at bfloat16 precision) - Update test_communication expected buffer size from 8MB to 64MB to match FD_CUSTOM_AR_MAX_SIZE_MB default change in envs.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add RMSNorm layer batch_invariant_mode unit test for coverage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add pragma no cover for Triton kernel and multi-GPU embedding path Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 14:34:44 +08:00
fxyfxy777	8eb177147c	[BugFix]rm draft code for glm (#6810 ) * rm draft code for glm * fix baseline * fix baseline 2	2026-03-12 23:26:05 -07:00
AIbin	2b8a5b0d81	update indexer model (#6791 )	2026-03-13 14:11:39 +08:00
kesmeey	d935752be7	[CI] [Hackathon 10th Spring No.20] Add unit tests for functional module fastdeploy/engine/common_engine.py (#6292 ) * style: format tests/engine/test_common_engine.py with black * test: expand common engine coverage * test: add coverage helper for common_engine * style: format test_common_engine with pre-commit * Remove test_force_coverage_for_common_engine test * Update common engine coverage tests Expand common engine tests and helpers while aligning setup and cleanup behavior. * Fix test_schedule_request_to_worker_v1 by mocking num_tasks to return 0 * Sync test_common_engine with branch 26 * chore: fix codestyle in common engine tests --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-03-13 13:16:07 +08:00
liufengwei0103	62110045f3	[RL] add stream guard (#6814 ) * add stream guard * format	2026-03-13 11:22:26 +08:00
bukejiyu	586e6f38b1	[Others]Limit transformers version (#6806 )	2026-03-12 20:20:15 -07:00
MingkunZhang	cb5a742298	[Metax][Test] enable paddleocr using cudagraph (#6820 )	2026-03-13 10:47:25 +08:00
mayang002	1f9f889e37	[XPU] refactor: XPU plugin namespace migration (#6799 ) * [XPU] refactor: XPU plugin namespace migration - Migrate wrapper layer namespace from baidu::xpu::api::plugin to fastdeploy::plugin - Migrate kernel layer namespace from xpu3::plugin to fd_xpu3 - Add api:: prefix for types (Context, SUCCESS, XPUIndexType, ctx_guard) - Remove XPU2 support, keep only XPU3 - Update ops/ directory to use new namespace Total: 137 files changed * [XPU] fix: add return value check and correct error messages - Add PADDLE_ENFORCE_XDNN_SUCCESS check for speculate_get_logits and update_attn_mask_offsets - Fix empty error message in draft_model_postprocess - Correct function name in speculate_schedule_cache error message - Update error messages from 'xpu::plugin::' to 'fastdeploy::plugin::'	2026-03-13 10:21:51 +08:00
YuBaoku	d73fd876ba	[CI] Add daily build_linux jobs for CUDA 13.0 (#6809 )	2026-03-12 22:04:58 +08:00
YuBaoku	ab0eacb1ab	[CI] Update _build_linux_rl.yml to use Paddle installation method with URL	2026-03-12 20:37:51 +08:00
huicongyao	2e63d88f7a	[Optimization][Speculative Decoding]Fuse padding sampling params (#6765 ) * optimize speculate pre process unit test * Add CUDA kernel for building sampling params in speculative decoding * init infer seed in device * format code * add unittest & fix * fix * format-code * format-code * fix rebase * . * fix unittest	2026-03-12 05:05:15 -07:00
MingkunZhang	a9ace998db	[Metax][Fix] fix ci error based pr#6805 caused by pr#6685 (#6807 )	2026-03-12 19:30:16 +08:00
yzwu	901b38c936	[Iluvatar] Optimize decode group_gemm and Support cuda graph for ernie (#6803 )	2026-03-12 19:21:17 +08:00
fxyfxy777	250ce40b40	[Feature] use phi permute/unpermute & rm swiglu (#6361 ) * tp text output normal * B eb5 mini text output normal * eb5mini ep B card text output normal * default use phi moe op * stash * tp H card normal * ep ok * rm debug * rm debug tool * rm del ffn_out * rm swiglu * add envs to swiglu * merge dev * fix ci baseline Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix ci baseline 2 --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 02:01:57 -07:00
Jiaxin Sui	a3d7979711	[XPU][CI]Rename test_ep4tp1_online.py to run_ep4tp1_online.py (#6805 )	2026-03-12 16:16:20 +08:00
RAM	cdaf6dd400	[RL][Cherry-Pick] Support Fully Async and PrefixCache (#6599 ) * cherry-pick Support Fully Async and PrefixCache step 1 * copy routing_indices_cache.py from 2.4 * cherry-pick [RL] R3 Fix the bug for determining the end of a request (#6388) * cherry-pick [RL] Clear Requests status of R3 (#6569) * delete code * fix rename bug * fix status shape bug * fix ci	2026-03-12 01:13:30 -07:00
mouxin	1ed6073d94	[Feature] Update logging for Golang Router (#6801 )	2026-03-12 15:18:31 +08:00
qwes5s5	e0febf36be	fix debug log (#6766 )	2026-03-12 14:46:01 +08:00
cmcamdy	3543088d3e	[XPU] rm stop nums (#6651 ) * rm stop nums * fix conflict --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-03-12 14:05:58 +08:00
yinwei	7d31a728d1	Add PD+EP cudagraph Support	2026-03-12 13:20:59 +08:00
Jiang-Jia-Jun	1fef825997	Fix environment variable name for KV cache lock	2026-03-12 11:24:07 +08:00
YuBaoku	deff121a5f	[CI] Update _build_linux_rl.yml to use cu129 nightly	2026-03-11 23:58:07 +08:00
yzwu	f0ab8ee793	[Iluvatar][CI] add triton in requirements_iluvatar.txt (#6788 )	2026-03-11 20:39:03 +08:00
Jiajun Ji	88c4fbf8e1	[XPU] Add speculate_limit_thinking_content_length Op. (#6627 ) * [XPU] Add speculate_limit_thinking_content_length OP for xpu. * add unittest. * format codes. * format codes. * format codes. * Fix unused kernel launch return value. --------- Co-authored-by: cmcamdy <1027740945@qq.com>	2026-03-11 17:30:17 +08:00
RichardWooSJTU	9f0778f991	[Feature] Support EP prefill with num_worst_tokens (#6574 ) * support num worst tokens * support num worst tokens * fix build error * support num worst tokens: fix errors * support num worst tokens: fix field * support num worst tokens: delete requirements * replace permute and depermute op by pure cuda * replace permute and depermute op by pure cuda * fix ci * fix op * fix nan * fix code style --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-03-11 17:09:07 +08:00
jc	0466c7e8a8	Set MC_TCP_BIND_ADDRESS for mooncake store (#6782 )	2026-03-11 16:56:39 +08:00
AIbin	1118351b27	[Optimization] Update Deepseekv3.2 model and dsa-indexer networking and add some unit tests (#6762 ) * add deepseek model doc * update deepseek model doc * update deepseek model doc * update deepseek model doc * cwb support DSK_V32 Model * update DSK_V32_DSA modeling * Ibin Support DSK_DSA * update kernel * update yaml * update requirements * update pre_commit * update model-runner * fix CI bug * del start.sh * fix iluvatar_model_runner * update DSA & add unit test * update import deep_gemm	2026-03-11 15:52:54 +08:00
CSWYF3634076	97a4b3631e	[Processor]add qwen3vl prompt_token_ids support (#6764 ) * [Processor]add qwen3vl prompt_token_ids support * [Processor]add qwen3vl prompt_token_ids support unittest * [Processor]add qwen3vl prompt_token_ids support precommit	2026-03-11 15:08:56 +08:00
bukejiyu	cffa8c246c	[Others]update paddleformer 1.0.0 (#6496 ) * update paddleformer 1.0.0 * update	2026-03-11 15:06:29 +08:00
Yonghua Li	7811eeccaa	[fix] resolve get_save_output_v1 socket name conflicts between multiple instances (#6758 )	2026-03-11 15:02:32 +08:00
freeliuzc	cf7934a4b2	[Speculative Decoding] Unify Spec and non-spec branch (#6685 ) * optimize spec-inference architecture * delete debug log * optimize spec_method usage && fix unit_test * add claude unit-test skill * fix some ugly bug * enhance robustness and bounds check * unify method & spec_method to method to avoid bug * activate CI * fix unit test * Unify logprobs computation for naive and speculative decoding, fix CUDA kernel * fix logprob bug && optimize verify kernel * fix exist_decode() judge	2026-03-10 23:58:44 -07:00

4818 Commits