[Feature] Fix counter release logic & update go-router download URL (#6280)

* [Doc] Update prerequisites in the documentation

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Fix counter release logic

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update token counter logic and docs

* [Feature] Update token counter logic and docs

---------

Co-authored-by: mouxin <mouxin@baidu.com>
mouxin
2026-02-04 15:02:38 +08:00
committed by GitHub
parent 36547cfdb3
commit 6e96bd0bd2
16 changed files with 115 additions and 51 deletions
+1
@@ -83,6 +83,7 @@ FastDeploy supports **NVIDIA GPU**, **Kunlunxin XPU
- [Speculative Decoding](./docs/zh/features/speculative_decoding.md) - [Speculative Decoding](./docs/zh/features/speculative_decoding.md)
- [Prefix Caching](./docs/zh/features/prefix_caching.md) - [Prefix Caching](./docs/zh/features/prefix_caching.md)
- [Chunked Prefill](./docs/zh/features/chunked_prefill.md) - [Chunked Prefill](./docs/zh/features/chunked_prefill.md)
- [Load-Balancing Scheduling Router](./docs/zh/online_serving/router.md)
## Acknowledgement ## Acknowledgement
+1
@@ -81,6 +81,7 @@ Learn how to download models, enable using the torch format, and more:
- [Speculative Decoding](./docs/features/speculative_decoding.md) - [Speculative Decoding](./docs/features/speculative_decoding.md)
- [Prefix Caching](./docs/features/prefix_caching.md) - [Prefix Caching](./docs/features/prefix_caching.md)
- [Chunked Prefill](./docs/features/chunked_prefill.md) - [Chunked Prefill](./docs/features/chunked_prefill.md)
- [Load-Balancing Scheduling Router](./docs/online_serving/router.md)
## Acknowledgement ## Acknowledgement
+2 -2
@@ -70,11 +70,11 @@ The usage and request scheduling workflow is as follows:
### Launching the Router ### Launching the Router
Start the Router service. Logs are written to `log_router/router.log`. Start the Router service. Logs are written to `log_router/router.log`. `fd-router` installation instructions can be found in the [Router documentation](../online_serving/router.md).
```shell ```shell
export FD_LOG_DIR="log_router" export FD_LOG_DIR="log_router"
python -m fastdeploy.router.launch \ /usr/local/bin/fd-router \
--host 0.0.0.0 \ --host 0.0.0.0 \
--port 30000 --port 30000
``` ```
+2 -2
@@ -67,11 +67,11 @@ bash build.sh
**Quick Start** **Quick Start**
Start the Router service. The `--splitwise` parameter specifies the scheduling mode as disaggregated deployment. Log information is output to `log_router/router.log`. Start the Router service. The `--splitwise` parameter specifies the scheduling mode as disaggregated deployment. Log information is output to `log_router/router.log`. `fd-router` installation instructions can be found in the [Router documentation](../online_serving/router.md).
```bash ```bash
export FD_LOG_DIR="log_router" export FD_LOG_DIR="log_router"
python -m fastdeploy.router.launch \ /usr/local/bin/fd-router \
--host 0.0.0.0 \ --host 0.0.0.0 \
--port 30000 \ --port 30000 \
--splitwise --splitwise
Binary file not shown (image, 163 KiB).

+11 -3
@@ -4,12 +4,20 @@
FastDeploy provides a Golang-based [Router](https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/golang_router) for request scheduling. The Router supports both centralized deployment and Prefill/Decode (PD) disaggregated deployment. FastDeploy provides a Golang-based [Router](https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/golang_router) for request scheduling. The Router supports both centralized deployment and Prefill/Decode (PD) disaggregated deployment.
![go-router](images/go-router-workflow.png)
## Installation ## Installation
### 1. Prebuilt Binaries ### 1. Prebuilt Binaries
Starting from FastDeploy v2.5.0, the official Docker images include the Go language environment required to build the Golang Router and also provide a precompiled Router binary. The Router binary is located by default in the `/usr/local/bin` directory and can be used directly without additional compilation. For installation details, please refer to the [FastDeploy Installation Guide](../get_started/installation/nvidia_gpu.md). Starting from FastDeploy v2.5.0, the official Docker images include the Go language environment required to build the Golang Router and also provide a precompiled Router binary. The Router binary is located by default in the `/usr/local/bin` directory and can be used directly without additional compilation. For installation details, please refer to the [FastDeploy Installation Guide](../get_started/installation/nvidia_gpu.md).
If you need to download the Golang-based router binary separately, it can be installed using the following steps:
```
wget https://paddle-qa.bj.bcebos.com/paddle-pipeline/FastDeploy_ActionCE/develop/latest/fd-router
mv fd-router /usr/local/bin/fd-router
```
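The launch scripts in this commit drop the fixed SHA-256 check and instead sanity-check the downloaded file with `file`, `grep`, and `stat` before installing it. The same three checks (ELF format, x86-64 architecture, size of at least 1,000,000 bytes) can be sketched in Go with the standard `debug/elf` package. This is only an illustration, not part of the Router; `checkRouterBinary` and `minRouterSize` are hypothetical names.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
)

// minRouterSize is a hypothetical constant mirroring the 1,000,000-byte
// floor used by the launch scripts to reject HTML or empty responses.
const minRouterSize = 1_000_000

// checkRouterBinary (hypothetical helper) applies the scripts' three
// sanity checks to the raw bytes of a downloaded file.
func checkRouterBinary(data []byte) error {
	// 1. Must parse as an ELF binary.
	f, err := elf.NewFile(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("not an ELF binary: %w", err)
	}
	defer f.Close()

	// 2. Must target the x86-64 architecture.
	if f.Machine != elf.EM_X86_64 {
		return fmt.Errorf("architecture mismatch: %v", f.Machine)
	}

	// 3. Must not be suspiciously small.
	if len(data) < minRouterSize {
		return fmt.Errorf("size is too small (%d bytes)", len(data))
	}
	return nil
}

func main() {
	// An HTML error page, the kind of bad download the checks are meant to catch.
	err := checkRouterBinary([]byte("<html>404 Not Found</html>"))
	fmt.Println("rejected:", err != nil)
}
```

Like the shell version, this only guards against obviously wrong downloads; it does not verify integrity the way a pinned hash would.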
### 2. Build from Source ### 2. Build from Source
You need to build the Router from source in the following scenarios: You need to build the Router from source in the following scenarios:
@@ -33,7 +41,7 @@ bash build.sh
Start the Router service. The `--port` parameter specifies the scheduling port for centralized deployment. Start the Router service. The `--port` parameter specifies the scheduling port for centralized deployment.
``` ```
./fd-router --port 30000 /usr/local/bin/fd-router --port 30000
``` ```
Start a mixed inference instance. Compared to standalone deployment, specify the Router endpoint via `--router`. Other parameters remain unchanged. Start a mixed inference instance. Compared to standalone deployment, specify the Router endpoint via `--router`. Other parameters remain unchanged.
@@ -50,7 +58,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
Start the Router service with PD disaggregation enabled using the `--splitwise` flag. Start the Router service with PD disaggregation enabled using the `--splitwise` flag.
``` ```
./fd-router \ /usr/local/bin/fd-router \
--port 30000 \ --port 30000 \
--splitwise --splitwise
``` ```
@@ -105,7 +113,7 @@ popd
Launch the Router with the custom configuration specified via `--config_path`: Launch the Router with the custom configuration specified via `--config_path`:
``` ```
./fd-router \ /usr/local/bin/fd-router \
--port 30000 \ --port 30000 \
--splitwise \ --splitwise \
--config_path examples/run_with_config/config/config.yaml --config_path examples/run_with_config/config/config.yaml
+2 -2
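@@ -55,10 +55,10 @@ FastDeploy provides a [Router](https://github.com/PaddlePaddle/FastDeploy/tree/develop
- The Router receives the instance's generation results and returns them to the user - The Router receives the instance's generation results and returns them to the user
Quick start example: Quick start example:
- Start the Router service. Logs are written to `log_router/router.log`. - Start the Router service. Logs are written to `log_router/router.log`. `fd-router` installation instructions can be found in the [Router documentation](../online_serving/router.md).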
``` ```
export FD_LOG_DIR="log_router" export FD_LOG_DIR="log_router"
python -m fastdeploy.router.launch \ /usr/local/bin/fd-router \
--host 0.0.0.0 \ --host 0.0.0.0 \
--port 30000 \ --port 30000 \
``` ```
+2 -2
@@ -65,10 +65,10 @@ bash build.sh
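**Quick Start** **Quick Start**
Start the Router service. The `--splitwise` parameter specifies the scheduling mode as disaggregated deployment. Logs are written to `log_router/router.log`. Start the Router service. The `--splitwise` parameter specifies the scheduling mode as disaggregated deployment. Logs are written to `log_router/router.log`. `fd-router` installation instructions can be found in the [Router documentation](../online_serving/router.md).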
``` ```
export FD_LOG_DIR="log_router" export FD_LOG_DIR="log_router"
python -m fastdeploy.router.launch \ /usr/local/bin/fd-router \
--host 0.0.0.0 \ --host 0.0.0.0 \
--port 30000 \ --port 30000 \
--splitwise --splitwise
Binary file not shown (image, 163 KiB).

+13 -4
@@ -4,11 +4,19 @@
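FastDeploy provides a Golang-based [Router](https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/golang_router) for request scheduling. The Router supports both centralized deployment and PD disaggregated deployment. FastDeploy provides a Golang-based [Router](https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/golang_router) for request scheduling. The Router supports both centralized deployment and PD disaggregated deployment.
![go-router](images/go-router-workflow.png)
## Installation ## Installation
### 1. Prebuilt Binaries ### 1. Prebuilt Binaries
In FastDeploy v2.5.0 and later, the official Docker images include the Go language environment required to build the Golang Router and provide a precompiled Router binary. The binary is located in the `/usr/local/bin` directory by default and can be used directly. For installation details, see the [FastDeploy installation guide](../get_started/installation/nvidia_gpu.md) In FastDeploy v2.5.0 and later, the official Docker images include the Go language environment required to build the Golang Router and provide a precompiled Router binary. The binary is located in the `/usr/local/bin` directory by default and can be used directly. For installation details, see the [FastDeploy installation guide](../get_started/installation/nvidia_gpu.md)
To download the Golang Router binary separately, use the following commands: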
```
wget https://paddle-qa.bj.bcebos.com/paddle-pipeline/FastDeploy_ActionCE/develop/latest/fd-router
mv fd-router /usr/local/bin/fd-router
```
### 2. Build from Source ### 2. Build from Source
@@ -27,13 +35,14 @@ FastDeploy提供Golang版本[Router](https://github.com/PaddlePaddle/FastDeploy/
git clone https://github.com/PaddlePaddle/FastDeploy.git git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/fastdeploy/golang_router cd FastDeploy/fastdeploy/golang_router
bash build.sh bash build.sh
cp
``` ```
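## Centralized Deployment ## Centralized Deployment
Start the Router service. The `--port` parameter specifies the scheduling port for centralized deployment. Start the Router service. The `--port` parameter specifies the scheduling port for centralized deployment.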
``` ```
./fd-router \ /usr/local/bin/fd-router \
--port 30000 --port 30000
``` ```
@@ -51,7 +60,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
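Start the Router service. The `--splitwise` parameter specifies the scheduling mode as disaggregated deployment. Start the Router service. The `--splitwise` parameter specifies the scheduling mode as disaggregated deployment.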
``` ```
./fd-router \ /usr/local/bin/fd-router \
--port 30000 \ --port 30000 \
--splitwise --splitwise
``` ```
@@ -106,7 +115,7 @@ popd
Start the Router service. The `--config_path` parameter specifies the configuration path. Start the Router service. The `--config_path` parameter specifies the configuration path.
``` ```
./fd-router \ /usr/local/bin/fd-router \
--port 30000 \ --port 30000 \
--splitwise \ --splitwise \
--config_path examples/run_with_config/config/config.yaml --config_path examples/run_with_config/config/config.yaml
+31 -8
@@ -17,8 +17,7 @@ ROUTER_PORT=52600
FD_BIN_DIR="/usr/local/bin" FD_BIN_DIR="/usr/local/bin"
FD_ROUTER_BIN="${FD_BIN_DIR}/fd-router" FD_ROUTER_BIN="${FD_BIN_DIR}/fd-router"
FD_ROUTER_URL="https://paddle-qa.bj.bcebos.com/FastDeploy/fd-router" FD_ROUTER_URL="https://paddle-qa.bj.bcebos.com/paddle-pipeline/FastDeploy_ActionCE/develop/latest/fd-router"
FD_ROUTER_SHA256="67640aaeebdd886826d3534930b2154cd2c1441a26bc3f38c3af5f0aadba7c2d"
ports=( ports=(
$S1_PORT $((S1_PORT + 1)) $((S1_PORT + 2)) $((S1_PORT + 3)) $S1_PORT $((S1_PORT + 1)) $((S1_PORT + 2)) $((S1_PORT + 3))
@@ -37,20 +36,44 @@ if [ ! -x "${FD_ROUTER_BIN}" ]; then
mkdir -p "${FD_BIN_DIR}" mkdir -p "${FD_BIN_DIR}"
TMP_BIN="${FD_ROUTER_BIN}.tmp" TMP_BIN="${FD_ROUTER_BIN}.tmp"
wget -q --no-proxy "${FD_ROUTER_URL}" -O "${TMP_BIN}" || exit 1 wget -q --no-proxy "${FD_ROUTER_URL}" -O "${TMP_BIN}" || {
echo "❌ Download fd-router failed"
echo "${FD_ROUTER_SHA256} ${TMP_BIN}" | sha256sum -c - || {
echo "❌ Integrity check failed"
rm -f "${TMP_BIN}" rm -f "${TMP_BIN}"
exit 1 exit 1
} }
# ------- sanity checks (no fixed hash) -------
# 1. must be ELF binary
file "${TMP_BIN}" | grep -q "ELF" || {
echo "❌ fd-router is not an ELF binary"
rm -f "${TMP_BIN}"
exit 1
}
# 2. must be x86_64 architecture
file "${TMP_BIN}" | grep -q "x86-64" || {
echo "❌ fd-router architecture mismatch"
rm -f "${TMP_BIN}"
exit 1
}
# 3. size check (avoid HTML / empty / error pages)
SIZE=$(stat -c%s "${TMP_BIN}")
if [ "$SIZE" -lt 1000000 ]; then
echo "❌ fd-router size is too small ($SIZE bytes), suspicious"
rm -f "${TMP_BIN}"
exit 1
fi
# -------------------------------------
mv "${TMP_BIN}" "${FD_ROUTER_BIN}" mv "${TMP_BIN}" "${FD_ROUTER_BIN}"
chmod +x "${FD_ROUTER_BIN}" chmod +x "${FD_ROUTER_BIN}"
echo "fd-router installed and verified" echo "fd-router installed with sanity checks"
else else
echo "fd-router already exists" echo "fd-router already exists"
fi fi
# start router # start router
+31 -8
@@ -24,8 +24,7 @@ LOG_DATE=$(date +%Y%m%d_%H%M%S)
FD_BIN_DIR="/usr/local/bin" FD_BIN_DIR="/usr/local/bin"
FD_ROUTER_BIN="${FD_BIN_DIR}/fd-router" FD_ROUTER_BIN="${FD_BIN_DIR}/fd-router"
FD_ROUTER_URL="https://paddle-qa.bj.bcebos.com/FastDeploy/fd-router" FD_ROUTER_URL="https://paddle-qa.bj.bcebos.com/paddle-pipeline/FastDeploy_ActionCE/develop/latest/fd-router"
FD_ROUTER_SHA256="67640aaeebdd886826d3534930b2154cd2c1441a26bc3f38c3af5f0aadba7c2d"
ports=($P_PORT $D_PORT $ROUTER_PORT) ports=($P_PORT $D_PORT $ROUTER_PORT)
check_ports "${ports[@]}" || { check_ports "${ports[@]}" || {
@@ -40,20 +39,44 @@ if [ ! -x "${FD_ROUTER_BIN}" ]; then
mkdir -p "${FD_BIN_DIR}" mkdir -p "${FD_BIN_DIR}"
TMP_BIN="${FD_ROUTER_BIN}.tmp" TMP_BIN="${FD_ROUTER_BIN}.tmp"
wget -q --no-proxy "${FD_ROUTER_URL}" -O "${TMP_BIN}" || exit 1 wget -q --no-proxy "${FD_ROUTER_URL}" -O "${TMP_BIN}" || {
echo "❌ Download fd-router failed"
echo "${FD_ROUTER_SHA256} ${TMP_BIN}" | sha256sum -c - || {
echo "❌ Integrity check failed"
rm -f "${TMP_BIN}" rm -f "${TMP_BIN}"
exit 1 exit 1
} }
# ------- sanity checks (no fixed hash) -------
# 1. must be ELF binary
file "${TMP_BIN}" | grep -q "ELF" || {
echo "❌ fd-router is not an ELF binary"
rm -f "${TMP_BIN}"
exit 1
}
# 2. must be x86_64 architecture
file "${TMP_BIN}" | grep -q "x86-64" || {
echo "❌ fd-router architecture mismatch"
rm -f "${TMP_BIN}"
exit 1
}
# 3. size check (avoid HTML / empty / error pages)
SIZE=$(stat -c%s "${TMP_BIN}")
if [ "$SIZE" -lt 1000000 ]; then
echo "❌ fd-router size is too small ($SIZE bytes), suspicious"
rm -f "${TMP_BIN}"
exit 1
fi
# -------------------------------------
mv "${TMP_BIN}" "${FD_ROUTER_BIN}" mv "${TMP_BIN}" "${FD_ROUTER_BIN}"
chmod +x "${FD_ROUTER_BIN}" chmod +x "${FD_ROUTER_BIN}"
echo "fd-router installed and verified" echo "fd-router installed with sanity checks"
else else
echo "fd-router already exists" echo "fd-router already exists"
fi fi
# start router # start router
@@ -19,5 +19,5 @@ if [ -n "$PID" ]; then
fi fi
echo "Starting new fd-router process..." echo "Starting new fd-router process..."
nohup ./fd-router --config_path ./config/config.yaml --splitwise > fd-router.log 2>&1 & nohup /usr/local/bin/fd-router --config_path ./config/config.yaml --splitwise > fd-router.log 2>&1 &
echo "fd-router started with PID: $!" echo "fd-router started with PID: $!"
@@ -14,5 +14,5 @@ if [ -n "$PID" ]; then
fi fi
echo "Starting new fd-router process..." echo "Starting new fd-router process..."
nohup ./fd-router --config_path ./config/config.yaml --splitwise > fd-router.log 2>&1 & nohup /usr/local/bin/fd-router --config_path ./config/config.yaml --splitwise > fd-router.log 2>&1 &
echo "fd-router started with PID: $!" echo "fd-router started with PID: $!"
@@ -142,7 +142,7 @@ func extractPromptFromCompletionsRequest(rawReq map[string]any) string {
} }
// PostToPD sends requests to both Prefill and Decode instances, only returns Decode node response // PostToPD sends requests to both Prefill and Decode instances, only returns Decode node response
func PostToPD(c *gin.Context, decodeURL, prefillURL string, reqBody []byte, isStream bool, completionEndpoint string) (*http.Response, error) { func PostToPD(c *gin.Context, decodeURL, prefillURL string, reqBody []byte, isStream bool, message string, completionEndpoint string) (*http.Response, error) {
ctx := c.Request.Context() ctx := c.Request.Context()
decodeEndpoint := fmt.Sprintf("%s/v1/%s", decodeURL, completionEndpoint) decodeEndpoint := fmt.Sprintf("%s/v1/%s", decodeURL, completionEndpoint)
@@ -206,13 +206,13 @@ func PostToPD(c *gin.Context, decodeURL, prefillURL string, reqBody []byte, isSt
} }
if prefillRes.resp != nil { if prefillRes.resp != nil {
go readPrefillRecv(ctx, prefillURL, isStream, prefillRes.resp) go readPrefillRecv(ctx, prefillURL, isStream, message, prefillRes.resp)
} }
return decodeRes.resp, nil return decodeRes.resp, nil
} }
func readPrefillRecv(ctx context.Context, url string, isStream bool, backendResp *http.Response) { func readPrefillRecv(ctx context.Context, url string, isStream bool, message string, backendResp *http.Response) {
if backendResp == nil || backendResp.Body == nil { if backendResp == nil || backendResp.Body == nil {
return return
} }
@@ -231,21 +231,22 @@ func readPrefillRecv(ctx context.Context, url string, isStream bool, backendResp
// Fallback to ensure release // Fallback to ensure release
if !released { if !released {
scheduler_handler.Release(ctx, url) scheduler_handler.Release(ctx, url)
scheduler_handler.ReleasePrefillTokens(ctx, url, message)
logger.Debug("[prefill] release in defer (fallback) url=%s", url) logger.Debug("[prefill] release in defer (fallback) url=%s", url)
} }
}() }()
for scanner.Scan() { for scanner.Scan() {
line := scanner.Text() _ = scanner.Text()
// First read that returns data // First read that returns data
if !released { if !released {
scheduler_handler.Release(ctx, url) scheduler_handler.Release(ctx, url)
scheduler_handler.ReleasePrefillTokens(ctx, url, message)
released = true released = true
logger.Debug("[prefill] first chunk received, release scheduler url=%s", url) logger.Debug("[prefill] first chunk received, release scheduler url=%s", url)
} }
logger.Debug("[prefill] recv result: %s", line)
} }
if err := scanner.Err(); err != nil { if err := scanner.Err(); err != nil {
@@ -295,11 +296,12 @@ func CommonCompletions(c *gin.Context, extractor PromptExtractor, completionEndp
requestBodyData []byte requestBodyData []byte
prefillURL string prefillURL string
decodeURL string decodeURL string
message string
) )
if isSplitwise { if isSplitwise {
// PD mode: select instances for Prefill/Decode separately // PD mode: select instances for Prefill/Decode separately
message := extractor(rawReq) message = extractor(rawReq)
prefillURL, decodeURL, err = manager.SelectWorkerPair(ctx, message) prefillURL, decodeURL, err = manager.SelectWorkerPair(ctx, message)
if err != nil { if err != nil {
@@ -313,9 +315,6 @@ func CommonCompletions(c *gin.Context, extractor PromptExtractor, completionEndp
return return
} }
// Prefill node token count was added in SelectWorker, release when request ends
defer scheduler_handler.ReleasePrefillTokens(ctx, prefillURL, message)
// Construct disaggregate_info to ensure selected P/D work in pairs within FastDeploy // Construct disaggregate_info to ensure selected P/D work in pairs within FastDeploy
disagg, err := manager.BuildDisaggregateInfo(ctx, prefillURL, decodeURL) disagg, err := manager.BuildDisaggregateInfo(ctx, prefillURL, decodeURL)
if err != nil { if err != nil {
@@ -340,7 +339,7 @@ func CommonCompletions(c *gin.Context, extractor PromptExtractor, completionEndp
} }
destURL = decodeURL destURL = decodeURL
releaseTargets = []string{prefillURL, decodeURL} releaseTargets = []string{decodeURL}
// Expose scheduling results to caller for debugging/validating scheduling strategy // Expose scheduling results to caller for debugging/validating scheduling strategy
c.Writer.Header().Set("X-Router-Prefill-URL", prefillURL) c.Writer.Header().Set("X-Router-Prefill-URL", prefillURL)
@@ -376,7 +375,7 @@ func CommonCompletions(c *gin.Context, extractor PromptExtractor, completionEndp
// Send request // Send request
var backendResp *http.Response var backendResp *http.Response
if isSplitwise { if isSplitwise {
backendResp, err = PostToPD(c, decodeURL, prefillURL, requestBodyData, isStream, completionEndpoint) backendResp, err = PostToPD(c, decodeURL, prefillURL, requestBodyData, isStream, message, completionEndpoint)
} else { } else {
backendResp, err = GetClientWithRetry(c, requestBodyData, destURL) backendResp, err = GetClientWithRetry(c, requestBodyData, destURL)
} }
@@ -389,7 +388,7 @@ func CommonCompletions(c *gin.Context, extractor PromptExtractor, completionEndp
defer backendResp.Body.Close() defer backendResp.Body.Close()
if isSplitwise { if isSplitwise {
metrics.InferenceRequests.WithLabelValues("", releaseTargets[0], destURL, strconv.Itoa(backendResp.StatusCode)).Inc() metrics.InferenceRequests.WithLabelValues("", prefillURL, decodeURL, strconv.Itoa(backendResp.StatusCode)).Inc()
} else { } else {
metrics.InferenceRequests.WithLabelValues(destURL, "", "", strconv.Itoa(backendResp.StatusCode)).Inc() metrics.InferenceRequests.WithLabelValues(destURL, "", "", strconv.Itoa(backendResp.StatusCode)).Inc()
} }
@@ -203,7 +203,7 @@ func TestPostToPD(t *testing.T) {
})) }))
defer decodeServer.Close() defer decodeServer.Close()
resp, err := PostToPD(c, decodeServer.URL, prefillServer.URL, reqBody, false, "chat/completions") resp, err := PostToPD(c, decodeServer.URL, prefillServer.URL, reqBody, false, "test message", "chat/completions")
assert.NoError(t, err) assert.NoError(t, err)
assert.Equal(t, http.StatusOK, resp.StatusCode) assert.Equal(t, http.StatusOK, resp.StatusCode)
assert.NotNil(t, resp) assert.NotNil(t, resp)
@@ -217,7 +217,7 @@ func TestPostToPD(t *testing.T) {
defer prefillServer.Close() defer prefillServer.Close()
// Use invalid URL to simulate connection error // Use invalid URL to simulate connection error
resp, err := PostToPD(c, "http://invalid-server:9999", prefillServer.URL, reqBody, false, "chat/completions") resp, err := PostToPD(c, "http://invalid-server:9999", prefillServer.URL, reqBody, false, "test message", "chat/completions")
assert.Error(t, err) assert.Error(t, err)
assert.Nil(t, resp) assert.Nil(t, resp)
}) })
@@ -229,7 +229,7 @@ func TestPostToPD(t *testing.T) {
defer decodeServer.Close() defer decodeServer.Close()
// Use invalid URL to simulate connection error // Use invalid URL to simulate connection error
resp, err := PostToPD(c, decodeServer.URL, "http://invalid-server:9999", reqBody, false, "chat/completions") resp, err := PostToPD(c, decodeServer.URL, "http://invalid-server:9999", reqBody, false, "test message", "chat/completions")
assert.Error(t, err) assert.Error(t, err)
assert.Nil(t, resp) assert.Nil(t, resp)
}) })
@@ -336,7 +336,7 @@ func TestReadPrefillRecv(t *testing.T) {
t.Run("nil response handling", func(t *testing.T) { t.Run("nil response handling", func(t *testing.T) {
ctx := context.Background() ctx := context.Background()
// Should handle nil response gracefully without panic // Should handle nil response gracefully without panic
readPrefillRecv(ctx, "test-url", false, nil) readPrefillRecv(ctx, "test-url", false, "test message", nil)
}) })
t.Run("nil response body handling", func(t *testing.T) { t.Run("nil response body handling", func(t *testing.T) {
@@ -347,7 +347,7 @@ func TestReadPrefillRecv(t *testing.T) {
Body: nil, Body: nil,
} }
// Should handle nil body gracefully without panic // Should handle nil body gracefully without panic
readPrefillRecv(ctx, "test-url", false, resp) readPrefillRecv(ctx, "test-url", false, "test message", resp)
}) })
t.Run("mock response without scheduler dependency", func(t *testing.T) { t.Run("mock response without scheduler dependency", func(t *testing.T) {
@@ -361,7 +361,7 @@ func TestReadPrefillRecv(t *testing.T) {
// This test verifies basic error handling and response body consumption // This test verifies basic error handling and response body consumption
// without triggering scheduler initialization requirements // without triggering scheduler initialization requirements
readPrefillRecv(ctx, "test-url", false, resp) readPrefillRecv(ctx, "test-url", false, "test message", resp)
}) })
} }
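
The counter fix in `readPrefillRecv` comes down to releasing the prefill instance's request count and token count together, exactly once: on the first chunk received, or in the deferred fallback if no chunk ever arrives. A minimal single-goroutine sketch of that pattern follows. The `counters` type and the `acquire`, `release`, `drainPrefill`, and `simulate` names are hypothetical stand-ins, not the Router's `scheduler_handler` API.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// counters is a hypothetical stand-in for the per-instance bookkeeping
// kept for a prefill node (not safe for concurrent use in this sketch).
type counters struct {
	requests int // in-flight requests charged to the prefill instance
	tokens   int // prompt tokens charged to the prefill instance
}

func (c *counters) acquire(promptTokens int) { c.requests++; c.tokens += promptTokens }
func (c *counters) release(promptTokens int) { c.requests--; c.tokens -= promptTokens }

// drainPrefill consumes the prefill response and releases both counters
// exactly once: on the first chunk, or in the deferred fallback.
func drainPrefill(body io.Reader, c *counters, promptTokens int) {
	released := false
	defer func() {
		// Fallback: no chunk was ever read, so release here instead.
		if !released {
			c.release(promptTokens)
		}
	}()

	scanner := bufio.NewScanner(body)
	for scanner.Scan() {
		// First chunk means prefill is done; free its capacity now.
		if !released {
			c.release(promptTokens)
			released = true
		}
	}
}

// simulate charges one request, drains the given body, and returns
// the resulting counters.
func simulate(body string, promptTokens int) counters {
	c := &counters{}
	c.acquire(promptTokens)
	drainPrefill(strings.NewReader(body), c, promptTokens)
	return *c
}

func main() {
	streamed := simulate("chunk-1\nchunk-2\n", 12)
	empty := simulate("", 5)
	fmt.Println(streamed.requests, streamed.tokens) // prints "0 0"
	fmt.Println(empty.requests, empty.tokens)       // prints "0 0"
}
```

Both paths end with the counters back at zero, which is the property the diff restores by moving `ReleasePrefillTokens` next to `Release` and dropping `prefillURL` from `releaseTargets`.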