Commit Graph

2742 Commits

Author SHA1 Message Date
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ZhangYulongg b8676d71a8 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 43976138de update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg e546e6b1b0 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 9c8292fb19 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg a5e95013b5 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg 93481a5478 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg eb77b1be6d update ci cases 2025-07-18 21:44:07 +08:00
ming1753 5328daa333 [Bug Fix] fix ep config bug (#2920) 2025-07-18 19:12:56 +08:00
xiaoxiaohehe001 a42fc3f40b [Feature] Support 45tVL EP FP8 Infer. (#2909)
* support_mm_ep_fp8

* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun fbe3547c95 [Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion

* Add ci test for include_stop_str_in_output

* Update version of openai

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
gaoziyuan 6efad14b95 support vl ori_vacab_size (#2900) 2025-07-18 16:26:14 +08:00
周周周 d306944f4f remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove cum_offsets from get_block_shape_and_split_kv_block

* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE e81137e581 fix ci workflow (#2896) 2025-07-18 16:01:00 +08:00
RAM cd52dc0f65 [Executor] Fix set capture sizes bug (#2902) 2025-07-18 15:12:19 +08:00
周周周 1339e56282 [XPU] Remove padding_offsets from get_padding_offset.cu (#2911) 2025-07-18 14:16:44 +08:00
YuanRisheng 0eb5dc18d3 [BugFix]Fix sample rejection (#2908)
* fix config

* fix rejection
2025-07-18 13:44:30 +08:00
sg263 e679567d59 [Trace]fix opentelemetry can not work in uvicorn (#2906)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix opentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
RAM bbe2c5c968 Update GraphOptimizationBackend docs (#2898) 2025-07-17 21:38:18 +08:00
ltd0924 4b14dca1d6 [LLM] delete fixed slots (#2893) 2025-07-17 19:19:54 +08:00
yulangz c8c280c4d3 [XPU][Doc] fix typo (#2892) 2025-07-17 19:13:54 +08:00
周周周 ddb10ac509 [Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
freeliuzc d49f8fb30a [Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)
* support chunk_prefill both normal and speculative_decoding(mtp)

* optimize pd-disaggregation config

* fix bug
2025-07-17 17:58:08 +08:00
ming1753 67180c1ff9 [Bug Fix] fix bug of prompt penalty (#2888) 2025-07-17 17:21:37 +08:00
Xintong Yu 273efba76f [Fix] remove misleading variables (#2841)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
YUNSHEN XIE 1cfba5ba3e enable CI workflow for pull requests targeting release/* branches (#2887) 2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun 31cab9f87b Update test_openai.py 2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun d3dfa1446c Update test_openai.py 2025-07-17 16:07:07 +08:00
ltd0924 b630031414 [LLM] fix several bugs (#2878) 2025-07-17 14:21:05 +08:00
LokeZhou f50c25178b [MM_PROCESS] add _extract_labels (#2879) 2025-07-17 14:20:01 +08:00
Yuanle Liu dbb9e2506b Fix rollout_model init (#2881) 2025-07-16 22:36:21 -07:00
ming1753 1f15ca21e4 [Feature] support prompt repetition_penalty (#2806) 2025-07-17 12:05:52 +08:00
yulangz 7dfd2ea052 [XPU][doc] Update minimal fastdeploy required (#2863)
* [XPU][doc] update minimal fastdeploy required
2025-07-17 11:33:22 +08:00
GoldPancake 42d4001400 [Features] Add speculative metrics (#2857) 2025-07-17 11:08:55 +08:00
sg263 52aca233e8 [Trace] fix annotation when add opentelemetry (#2869)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix opentelemetry-bootstrap

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924 9c25dcca0b [LLM] Update Multinode Deployment (#2830)
* [LLM] fix multinode bugs

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] fix ci bugs

* Update fastdeploy/engine/args_utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [LLM] update random port

* [LLM] update random port

* [LLM] fix ci bugs

* fix ci bugs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924 d245d1ca6c [LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] update
2025-07-16 23:42:20 +08:00
Yuanle Liu 63d6e7ce06 fix and refine vl (#2866)
* refine vl config

* delete attn_sep

* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周 aa76085d1f [Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870) 2025-07-16 20:10:57 +08:00
sg263 42b80182e0 [Trace] add opentelemetry (#2852)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu dda4a9f848 rl update (#2861) 2025-07-16 00:33:10 -07:00
yangjianfengo1 a83a3eea5f Get FLAGS_max_partition_size from an environment variable (#2854) 2025-07-16 14:14:21 +08:00
xiaoxiaohehe001 0d0340392f [Fix] Fix mm ep weight init. (#2855)
* fix_45t_mm

* Update load_weight_utils.py

* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00
YuanRisheng 0253381fb9 fix config (#2858) 2025-07-16 11:40:10 +08:00
freeliuzc 2d1184aefe [Fix] fix expert_parallel bug in decoder stage (#2848) 2025-07-16 11:08:18 +08:00
yulangz 17314ee126 [XPU] Update doc and add scripts for downloading dependencies (#2845)
* [XPU] update xvllm download

* update supported models

* fix xpu model runner in huge memory with small model

* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng 101ad33332 [BugFix] Fix Configs (#2849)
* fix config

* fix config
2025-07-15 19:50:36 -07:00
RAM 0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
Yuanle Liu 61b3997b85 refactor rl get_name_mappings_to_training (#2847)
* refactor rl get_name_mappings_to_training

* fix tp>1

* change variable name(ffn1->up_gate_proj/ffn2->down_proj)

* change variable name(linear_weight->weight/linear_bias->bias)

* add rl names mapping for vl

* fix ernie 0.3B error

* fix develop code

* fix
2025-07-15 07:31:42 -07:00
Zero Rains e7bcbbab52 Merge vl execution path into normal execution path (#2829)
* merge vl model into gpu_model runner

Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6

* fix chinese

Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a

* fix the parse parameter

Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe

* fix the bug in online_inference

Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559

* polish code

Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00