27 Commits

Author SHA1 Message Date
gongweibao a6351dea0b [BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)
* init

* init

* fix format

* add

* add files

* add ut

* fix some

* add ut

* add more

* add

* fix pre-commit

* fix pre-commit

* fix cover

* skip long seq

* add

* add

* fix

* remove not need

* fix set attr

* fix comments

* fix comments

* fix failed tests

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-16 21:32:43 +08:00
zccjjj a2072fe20c [XPU] support warmup with ep & remove apply_tp_fused_op (#6289) 2026-02-28 15:40:36 +08:00
yinwei 1e3c35496c [XPU][Graph Optimization] XPU Support CUDAGraph (#6152)
* support cuda graph
2026-01-22 14:41:56 +08:00
zhupengyang 45ebb2efb4 [XPU] support plugin model (#6092) 2026-01-20 13:00:09 +08:00
luukunn 93b7675a64 [Feature]Report FD statistical information (#5646)
* add usage commit

* update envs and xpu

* add requirements

* fix quantization value

* add unit test

* add unit test

* fix unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* fix FD_USAGE_STATS_SERVER

* fix

* fix

* add doc

* add doc

* add doc

* add doc

* add doc

* fix file name
2026-01-14 17:54:01 +08:00
cmcamdy 9f4977eb74 [xpu] support mtp for xpu(mix) (#5274)
* [XPU] support kernel for mtp(base)

* [XPU] support kernel for mtp(base)

* format

* format

* format

* fix gather next token

* fix step && add test

* fix

* mv pre/post process

* add adjust batch / gather next token for mtp

* fix code style

* fix mtp kernel name

* fix mtp kernel test

* mv xpu pre/post process

* mv xpu pre/post process

* [xpu] support mtp

* fix code style
2025-12-01 11:03:14 +08:00
zhupengyang 2fd254e5b7 support ep+tp at op layer (#4688) 2025-11-05 11:15:57 +08:00
yyssys cd9195d54c [XPU]Modify the xpu memory display unit of log (#4534) 2025-10-22 12:46:01 +08:00
ddchenhao66 14785eb65d [XPU] abstract a hardware-agnostic operator wrapper for prefix cache and specify xpu device id definition (#4455)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-17 14:05:33 +08:00
YuanRisheng 0355235fb9 [FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400)
* delete some attr in parallel config

* delete comment

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-16 20:00:37 +08:00
ddchenhao66 8e392f0ea6 [XPU] support prefix cache (#4423)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-16 11:27:41 +08:00
Lucas 87179cb744 [XPU] support XPU VL model inference (#4030)
* [XPU] support XPU VL model inference

* fix image op import and device check

* rebase develop

* fix perf
2025-09-25 14:34:15 +08:00
zhupengyang 9409665713 [xpu] support ep (#4067)
2025-09-15 13:53:11 +08:00
co63oc d6369b4d51 fix typos (#3684) 2025-09-01 17:50:17 +08:00
qw86972190 c83381d650 revert pr (#3481)
Co-authored-by: iosmers <yinwei_hust@163.com>
2025-08-21 14:19:50 +08:00
lizexu123 afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to default

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
lizexu123 b01cfd6007 [BugFix] support real batch_size (#3109)
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
2025-08-05 16:33:54 +08:00
yinwei 3a4db15765 Fix out-of-memory issue during single-XPU deployment (#3133) 2025-08-01 17:12:03 +08:00
Ryan 73cfe1fd37 [SOT] Extend SOT warmup support to new hardware (#3032)
* add new hardware

* add_sot_warmup4new_hardware

* fix conflict

* rm Optional
2025-07-29 22:45:20 +08:00
yinwei f2a528f9ae [XPU] Support kvblock centralized management (#3017) 2025-07-29 10:40:55 +08:00
YuanRisheng 6ccc10ad47 Unify server-side and model-side Config (Part1) (#3018)
* move cache config

* fix mtp
2025-07-28 10:51:52 +08:00
ltd0924 3792345c3a [LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Zero Rains 89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* remove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
YuanRisheng 101ad33332 [BugFix] Fix Configs (#2849)
* fix config

* fix config
2025-07-15 19:50:36 -07:00
yulangz 0350831c2b fix xpu offline demo garbled output (#2763) 2025-07-09 14:51:20 +08:00
Jiang-Jia-Jun 92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00