Zhang Yulong
0eb32bb9c8
add cases ( #3155 )
2025-08-01 18:38:57 +08:00
yangjianfengo1
64d7a3194d
Support fa3 in centralized deployment ( #3112 )
2025-08-01 18:03:36 +08:00
YUNSHEN XIE
bdb83e007d
fix ci ( #3141 )
2025-08-01 17:42:26 +08:00
Divano
50db0d7ba9
add case ( #3150 )
...
* add test base class
* fix codestyle
* fix codestyle
* add base chat
2025-08-01 17:30:58 +08:00
Ryan
94264bbf60
[Code Simplification] Refactor Post-processing in VL Model Forward Method ( #2937 )
...
* rm sth useless
* refactor model forward
* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765
Fix out-of-memory issue during single-XPU deployment ( #3133 )
2025-08-01 17:12:03 +08:00
JYChen
c34088b0fd
fix stop seq unittest ( #3126 )
2025-08-01 16:50:05 +08:00
ming1753
fc5f43c6bc
[Docs] Optimal Deployment ( #2768 )
2025-08-01 11:56:27 +08:00
chen
a2f5cc54f8
moe preprocess op support 160 experts and fused_moe triton kernel name add K ( #3121 )
2025-08-01 10:46:20 +08:00
Divano
1d93565082
[CE] Add base test class for web server testing ( #3120 )
...
* add test base class
* fix codestyle
* fix codestyle
2025-07-31 23:28:50 +08:00
YUNSHEN XIE
e1011e92d9
disable test_cuda_graph.py ( #3124 )
2025-07-31 22:03:48 +08:00
plusNew001
8c63237cfa
[CI] add xpu ci case ( #3111 )
...
* [CI] add xpu ci case
* [CI]Update run_ci_xpu.sh
2025-07-31 22:03:34 +08:00
YUNSHEN XIE
ff6a109b4d
Describe PR diff coverage using JSON file ( #3114 )
...
* Refactored ci pipeline
* update
* Describe PR diff coverage using JSON file
* remove pip cache setting from Approve
* fix
* update
2025-07-31 21:59:20 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c
[BUG] Fix bug for pd in fd ( #3034 )
...
* Fix bug for pd in fd
* Fix bug for pd in fd
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:17:27 +08:00
Zhang Yulong
1a543bca29
Fix test_EB_Lite_serving.py ( #3119 )
...
* Fix test_EB_Lite_serving.py
* fix test_EB_Lite_serving.py
2025-07-31 20:15:25 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwagrs and update params docs ( #3103 )
...
* add chat_template_kwagrs and update params docs
* add chat_template_kwagrs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching ( #3061 )
...
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
chenjian
32307283f1
Fix bug for offline inference in scheduler v1 ( #3117 )
2025-07-31 17:54:24 +08:00
YUNSHEN XIE
583eae2fd1
fix ci ( #3106 )
...
* fix ci
* disable test_non_streaming_chat_with_min_tokens
2025-07-31 17:25:08 +08:00
JYChen
1ef38b1563
[doc] best practice for eb45 text models ( #3002 )
...
* [doc] best practice for eb45 text models
* fix docs
2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun
4498058722
Update README.md
2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun
66304cf921
Update sampling.md
2025-07-31 15:02:57 +08:00
yinwei
5b9aec1f10
xpu release 2.0.3 ( #3105 )
2025-07-31 14:26:07 +08:00
YUNSHEN XIE
66c3835a46
add approve ci ( #3093 )
...
* add approve ci
* fix
* fix
2025-07-31 10:10:10 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )
...
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
Jiang-Jia-Jun
998968f1e8
[Doc] Update parameters of serving
2025-07-30 22:35:01 +08:00
chenjian
fe0e3f508b
[BUG FIX] Fix bug when preempted request rescheduled ( #3080 )
...
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api ( #3096 )
...
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 22:18:48 +08:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode ( #3001 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
JYChen
bd29b2aaca
add stop_seqs doc ( #3090 )
2025-07-30 20:36:18 +08:00
Jiang-Jia-Jun
6ead7a3a49
Update setup.py
2025-07-30 20:21:41 +08:00
YUNSHEN XIE
e4ba9a0dde
debug use ( #3095 )
2025-07-30 20:18:36 +08:00
Zhida Hu
3f8a41e68c
[*] fix the memory leak when modify qp to rts failed ( #3051 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-30 19:49:07 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3058 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
bukejiyu
db698bda01
qwen loader ( #3057 )
2025-07-30 19:09:38 +08:00
AIbin
28fff1b035
Revert "Add uinttest for moe_ffn_wint2. ( #3037 )" ( #3085 )
...
This reverts commit 327e1943fa.
2025-07-30 19:04:07 +08:00
YuanRisheng
acc5c0aa85
add ci for custom op approve ( #3079 )
2025-07-30 16:50:20 +08:00
zhink
d89b6dd43f
adapter qwen3 moe attr for init ( #3066 )
...
adapter qwen3 moe attr for init
2025-07-30 16:49:28 +08:00
bukejiyu
8e203666d9
w4a8 offline ( #3074 )
...
* w4a8 offline
* update
* update
* update
2025-07-30 16:33:30 +08:00
ming1753
5acde4eb43
[Feature] Multimodal Scheduler V1 ( #3019 )
...
* [Feature] Support multimodal scheduler v1
* remove debug log
* fix bug
* fix format
* modify code
* fix bug
* fix bug
* fix bug
* modify code
2025-07-30 16:05:55 +08:00
Jiang-Jia-Jun
ffa0f4d99b
[Fix] Fix version function ( #3076 )
...
* [Fix] Fix version function
* Fix commit
* Fix commit
* fix code sync
* Update coverage_run.sh
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 16:05:24 +08:00
ltd0924
ecf2fd5b9a
[BugFix] vl encoder tokens dtype problem ( #3069 )
2025-07-30 15:20:53 +08:00
YuanRisheng
eeadbf332a
delete unused unittest ( #3065 )
2025-07-30 15:11:58 +08:00
Yiqun Liu
327e1943fa
Add uinttest for moe_ffn_wint2. ( #3037 )
...
Change-Id: Ifd452527eaf87ea96c3fa4fa9aeb17729b33c2de
2025-07-30 15:03:09 +08:00
Yuan Xiaolan
35935da9e5
support W4A8 EPLB ( #3075 )
2025-07-30 14:34:12 +08:00
Yzc216
159767717d
[Feature] multi source download ( #3072 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
2025-07-30 14:10:13 +08:00
Zero Rains
4dc130c5a9
[Doc] add repetition early stopping doc ( #3078 )
...
* add repetition early stop doc
* add the early_stop.md
2025-07-29 22:01:57 -07:00