* Enhance run_ci_xpu.sh with caching and prefill options
* Update model path and configuration in run_ci_xpu.sh
* Add '北朝' keyword to assertion in run_45vl.py
* Enhance process termination logic in run_ci_xpu.sh
* Set timeout for CI_XPU job to 60 minutes
* Remove extra newline in stop_processes function
* Update paddlepaddle-xpu installation command
Comment out the previous paddlepaddle-xpu installation command and install a specific pinned version instead, to work around an EP parallel error.
* Update PaddlePaddle installation command
* Remove max_tokens from model response configuration
Removed max_tokens parameter from the model response call.
* Add XPU logprobs test case
* Fix formatting and improve setup_logprobs_env
Add newline at end of file and update setup_logprobs_env function.
* Refactor test_logprobs_21b_tp4.py for clarity
* Change top_p value from 1.0 to 0
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
* support v1 loader
* remove useless code
* remove useless
* [Model] support Qwen3VL images success
* [Model] support Qwen3VL rope_3d
* [Model] support Qwen3VL remove log
* [Model] support Qwen3VL RL
* [Model] support Qwen3VL tp
* [Model] support Qwen3VL video
* [Model] support Qwen3VL fix ernievl
* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds
* [Model] support Qwen3VL fix multi card
* [Model] support Qwen3VL file close
* [Model] support Qwen3VL fix ce
* [Model] support Qwen3VL fix unittest
* [Model] support Qwen3VL add unittest
---------
Co-authored-by: Ayakouji <yuhongh@qq.com>
* Add comprehensive unit tests for data type conversion functionality
* fix
* Fix unit test failures in test_local_scheduler.py
* update
* fix code
* update mock
* add ut
* rm file
* update test
* Remove test cases that are already covered
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM
* [Optimization] refactor(chat_handler,completion_handler): rename class
* Add tests for openai api_server coverage
* update
* Update tests for openai api_server
* fix bugs
* test: disable some api_server lifespan/controller tests for local env
* Format test_api_server with black
* update
* update
* test: narrow envs patch in api_server tests to avoid side effects
* fix: separate MagicMock creation to avoid missing req argument
* fix: patch TRACES_ENABLE env var in api_server tests
* fix: use os.environ patch for TRACES_ENABLE
* test: use fake fastdeploy.envs in api_server tests
* test: pass fake Request into chat/completion routes
* test: increase coverage for tracing and scheduler control
* fix: set dynamic_load_weight in tracing headers test
* ci: add retry and validation for FastDeploy.tar.gz download
* ci: fix indentation in _base_test.yml
* refactor: simplify test_api_server.py (807->480 lines, ~40% reduction)
* fix: restore missing args attributes (revision, etc.) in _build_args
* fix: patch sys.argv to prevent SystemExit: 2 in api_server tests
* improve coverage
* Remove docstring from test_api_server.py
Removed unnecessary docstring from test_api_server.py
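
The `sys.argv` and `TRACES_ENABLE` fixes listed above follow a common test-isolation pattern, which might look roughly like the sketch below. This is an illustration only, not the code in test_api_server.py: `parse_args`, `traces_enabled`, the `--port` flag, and the `"true"`/`"false"` values are hypothetical stand-ins.

```python
import argparse
import os
import sys
from unittest.mock import patch


def parse_args():
    # Hypothetical stand-in for a parser that reads sys.argv implicitly.
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args()


def traces_enabled():
    # Hypothetical helper; the accepted values are an assumption.
    return os.environ.get("TRACES_ENABLE", "false") == "true"


def run_with_clean_argv():
    # Replace sys.argv so test-runner flags never reach argparse, which
    # would otherwise reject them and raise SystemExit with code 2.
    with patch.object(sys, "argv", ["api_server"]):
        return parse_args()


def run_with_traces(value):
    # patch.dict restores os.environ on exit, so the override
    # does not leak into other tests.
    with patch.dict(os.environ, {"TRACES_ENABLE": value}):
        return traces_enabled()
```

Scoping both patches to a `with` block keeps the change local to a single test, which is the same goal as narrowing the envs patch to avoid side effects.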
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
* Add unit tests for TokenProcessor functionality
* Add trace stubs for token processor tests
* Increase token processor test coverage
* Clean up imports in test_token_processor.py
Remove unnecessary path manipulation in test file.
* Cleanup: Remove unused imports in test_token_processor
Removed unused imports from the test file.
* Add trace_carrier to task in test cases
Added trace_carrier attribute to task in multiple test cases to ensure proper handling of trace information.
* Refine token processor tests for safe coverage
* Expand postprocess coverage
* Add ZMQ logprob parsing test
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Tao Luo <luotao02@baidu.com>