Commit Graph

30 Commits

Author SHA1 Message Date
gongweibao a6351dea0b [BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)
* init

* init

* fix format

* add

* add files

* add ut

* fix some

* add ut

* add more

* add

* fix pre-commit

* fix pre-commit

* fix cover

* skip long seq

* add

* add

* fix

* remove not need

* fix set attr

* fix comments

* fix comments

* fix failed tests

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-16 21:32:43 +08:00
AIbin 54581b8653 [BugFix]fix iluvatar_model_runner about dsa_cache (#6733)
* fix iluvatar_model_runner
2026-03-10 16:10:35 +08:00
yzwu 6674131b0b [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553) 2026-03-02 14:07:17 +08:00
yzwu 60e75ea8e8 [Iluvatar][CI] Fix cannot import get_stop (#6165) 2026-02-10 16:57:23 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
yzwu dc7facaa7f [Iluvatar GPU] fix ci error caused by rebuild_padding param and cuda graph (#4504) 2025-10-21 21:41:41 +08:00
yzwu 4b661512ca [Iluvatar GPU] Adapt VL model (#4313) 2025-10-17 16:13:38 +08:00
yzwu fbdd6b0663 [Iluvatar GPU] Optimze attention and moe performance (#3234) 2025-08-08 10:51:24 +08:00
lizexu123 afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
lizexu123 b01cfd6007 [BugFix] support real batch_size (#3109)
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
2025-08-05 16:33:54 +08:00
Sunny-bot1 72ef5a9c93 [FIX]fix bad_words when sending requests consecutively (#3197)
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix bad_words

* fix log

* fix log
2025-08-04 05:59:41 -07:00
RAM d850660872 [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer

* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch

* update decode_max_tile_size

* fix pre-commit

* update block_multihead_attn_backend

* update flas attn backend

* update MLA Attention

* update XPU Attention

* update gcu,iluvatar model runner

* Update MTP

* fix MTP bug
2025-07-31 00:09:31 +08:00
bukejiyu db698bda01 qwen loader (#3057) 2025-07-30 19:09:38 +08:00
Sunny-bot1 74aa31d15b [Feature] support bad_words (#3055)
* support bad_words

* support online infer bad_words

* update

* add CI test

* update

* update

* update

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
Ryan 73cfe1fd37 [SOT] Extend SOT warmup support to new hardware (#3032)
* add new hardware

* add_sot_warmup4new_hardware

* fix conflict

* rm Optional
2025-07-29 22:45:20 +08:00
YuanRisheng 6ccc10ad47 Unify server-side and model-side Config (Part1) (#3018)
* move cache config

* fix mtp
2025-07-28 10:51:52 +08:00
ltd0924 3792345c3a [LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
lizhenyun01 29c3292f02 support c4 attn && fix cache 2025-07-24 12:00:52 +08:00
Zero Rains 89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* reomove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ming1753 1f15ca21e4 [Feature] support prompt repetition_penalty (#2806)
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-17 12:05:52 +08:00
YuanRisheng 101ad33332 [BugFix] Fix Configs (#2849)
* fix config

* fix config
2025-07-15 19:50:36 -07:00
RAM 0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
YuanRisheng 4c7b8bc458 Simplify the Config code (#2770)
* simplify the code

* fix vl

* delete config

* fix

* perfect code

* fix ci

* fix xpu

* fix xpu

* fix server

* resolve conflict

* fix mtp

* resolve conflict

* fix xpu

* fix xpu

* fix vl

* fix log

* fix qwen moe

* fix qwen moe

* fix qwen moe
2025-07-14 19:50:05 +08:00
Sunny-bot1 f6ad26fc08 fix topp default value (#2814)
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-11 17:10:21 +08:00
Sunny-bot1 240d6236bc [Fix]fix top_k_top_p sampling (#2801)
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix topk-topp

* update

* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg 59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00
Sunny-bot1 e45050cae3 [Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling

* fix

* add api param

* add api para

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-07-09 20:58:58 -07:00
RAM 03a74995b8 Clear dead code And supplementary notes (#2757)
Deploy GitHub Pages / deploy (push) Has been cancelled
* 1.supplementary notes 2.delete dead code

* fix bug of forward meta

* Global modification of forward meta

* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
liddk1121 1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00