* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
---------
Co-authored-by: mouxin <mouxin@baidu.com>
* fp4 dense
* [WIP] support nvfp4, dense part
* [wip] developing loading qwen model
* loading
* update
* dense fp4 OK, cudagraph error
* [WIP] moe forward part
* with flashinfer-backend
* qwen3_moe_fp4
* update
* support flashinfer-cutlass moe, qwen3-moe-fp4 OK
* support ernie4.5-fp4
* fix load error
* add some ut
* add docs
* fix CLA, test
* fix the apply() in ModelOptNvFp4FusedMoE
* fix CodeStyle
* del the PADDLE_COMPATIBLE_API
* fix broken url: nvidia_gpu.md
* fix docs
* fix token_ids
* fix CI in Hopper
* move flashinfer imports inside the function
* fix model_runner
Removed the logic for generating random padding IDs.
* Remove skip condition for CUDA version in nvfp4 test
* add test for nvfp4
* fix according to review
* Add Chinese translation link to NVFP4 documentation
* del flashinfer.py
* fix unittest
---------
Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>
* Waiting for cache transfer manager inited
* up
* up
* up
* up
* up
* fix according comments
* fix unittest
* fix
* fix unittest
* fix error
* pass storage_backend to worker
* support build sm 80,86,89,90 to one whl package
* create tmp dir before build custom ops in FD_UNIFY_BUILD mode
* typo fix
* ignore exceptions in xpu ..
* update data_processor
* fix unit test
* fix unit test
* add unit test
* add tool parser plugins
* fix tool call
* fix tool call
* fix tool call
* fix unit test
* fix unit test
* add unit test
* fix unit test
* fix unit test
* fix unit test
* support mxfp4 in gpt-oss
* support mxfp4 in gpt-oss
* add scope for flashinfer
* remove torch code
* update envs.FD_MXFP4_BACKEND
* update process_weights_after_loading
* update env name
* support tp in gpt-oss, add e2e test
* add flashinfer-python-paddle in requirements
* fix import error
* add test
* add test
* add test
* add test
* to_request_for_infer initial commit
* refact to from_chat_completion_request
* preprocess use request initial commit
* bugfix
* processors refact to using request
* bug fix
* refact Request from_generic_request
* post process initial commit
* bugfix
* postprocess second commit
* bugfix
* serving_embedding initial commit
* serving_reward initial commit
* bugfix
* replace function name
* async_llm initial commit
* offline initial commit and fix bug
* bugfix
* fix async_llm
* remove add speculate_metrics into data
* fix logprobs bug
* fix echo bug
* fix bug
* fix reasoning_max_tokens
* bugfix
* bugfix and modify unittest
* bugfix and modify unit test
* bugfix
* bugfix
* bugfix
* modify unittest
* fix error when reasong_content is none for text_processor
* remove some unnessary logic
* revert removed logic
* implement add and set method for RequestOutput and refact code
* modify unit test
* modify unit test
* union process_request and process_request_obj
* remove a unit test
* union process_response and process_response_obj
* support qwen3_vl_processor
* modify unittest and remove comments
* fix prompt_logprobs
* fix codestyle
* add v1
* v1
* fix unit test
* fix unit test
* fix pre-commit
* fix
* add process request
* add process request
* fix
* fix
* fix unit test
* fix unit test
* fix unit test
* fix unit test
* fix unit test
* remove file
* add unit test
* add unit test
* add unit test
* fix unit test
* fix unit test
* fix
* fix
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
Co-authored-by: luukunn <981429396@qq.com>
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com>
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>