* add xpu ci case
* Change runner from XPU-P800-8Card to XPU-P800
* Remove cache queue port from test_pd_03b_tp1.py
Removed cache queue port arguments from test cases.
* Remove cache queue port from test_pd_21b_tp2.py
Removed cache queue port arguments from test cases.
* Update README with PYTHONPATH setup instructions
Added instructions for setting PYTHONPATH in CI scripts.
* fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode
- Set seq_lens_encoder to 0 when splitwise_role is 'decode' during prefill processing
- This ensures decoding continues correctly after the prefill (P) instance generates the first token in the PD disaggregated architecture
- Fixes potential sequence length inconsistency in PD splitwise deployment scenarios
* format
* support tag phase token enforce generation
* optimize notes and some features
* fix sampler unit test
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* Remove cache queue port from test configuration
Removed cache queue port configuration from test.
* Remove cache queue port from test_vl_model.py
Removed cache queue port argument from test configuration.
* Update test_w4a8.py
* Remove cache queue port from test_mtp.py
Removed cache queue port configuration from test.
* Remove cache queue port from test_logprobs_21b_tp4
Removed cache queue port configuration from test.
* Remove cache queue port from test configuration
Removed cache queue port configuration from test.
* Update test_ep4tp4_online.py
* Update run_xpu_ci_pytest.sh to comment out installations
Comment out PaddlePaddle installation and XVLLM download steps.
* add bs1 r3 test case
* async put
* r3 test case 1.0
* successfully run eb5
* refine test case
* pre-commit
* add eb45 & glm testcase
* format code
* add p2pstore requirements
* support only last turn
* R3 use worker log
* refine code & fix ci bug
* refine error message
* fix empty input bug
* Successfully set acc ci for eb45 and glm45
* refine code
* fix bug
* [XPU] fix multi-batch bug in VL model
* Add command to kill additional port processes
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization
* add usage commit
* update envs and xpu
* add requirements
* fix quantization value
* add unit test
* fix unit test
* add unit test
* fix FD_USAGE_STATS_SERVER
* fix
* add doc
* fix file name
* [fix] fix cache transfer manager updating/clearing
* [fix] fix code style
* [fix] fix config
* [fix] fix engine client
* [fix] let worker update kv cache status signal
* [fix] update worker process
* [fix] fix clear/update for the case where the comm group is shut down
* [fix] update dynamic weight manager
* [fix] fix port
* [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting