mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
edd31e8849
* add
* [tests] Add Paddle attention determinism tests and refactor resource manager

  Add comprehensive determinism tests for Paddle attention layer and refactor resource manager for deterministic mode support.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* add
* add
* add
* add
* add more
* add more
* fixsome
* fixsome
* fix bugs
* fix bugs
* only in gpu
* add docs
* fix comments
* fix some
* fix some
* fix comments
* add more
* fix potential problem
* remove not need
* remove not need
* remove no need
* fix bug
* fix bugs
* fix comments
* fix comments
* Update tests/ce/deterministic/test_determinism_verification.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/inter_communicator/test_ipc_signal.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/engine/test_sampling_params_determinism.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism_standalone.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix comments
* fix import error
* fix a bug
* fix bugs
* fix bugs
* fix coverage
* refine codes
* refine code
* fix comments
* fix comments
* fix comments
* rm not need
* fix allreduce large tensor bug
* mv log files
* mv log files
* add files

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
30 lines
1.1 KiB
Bash
export FD_MODEL_SOURCE=HUGGINGFACE
export FD_MODEL_CACHE=./models

export CUDA_VISIBLE_DEVICES=0
export ENABLE_V1_KVCACHE_SCHEDULER=1

# FD_DETERMINISTIC_MODE: Toggle deterministic mode
#   0: Disable deterministic mode (non-deterministic)
#   1: Enable deterministic mode (default)
# FD_DETERMINISTIC_LOG_MODE: Toggle determinism logging
#   0: Disable logging (high performance, recommended for production)
#   1: Enable logging with MD5 hashes (debug mode)
# Usage: bash start_fd.sh [deterministic_mode] [log_mode]
# Example:
#   bash start_fd.sh 1 0  # Deterministic mode without logging (fast)
#   bash start_fd.sh 1 1  # Deterministic mode with logging (debug)
export FD_DETERMINISTIC_MODE=${1:-1}
export FD_DETERMINISTIC_LOG_MODE=${2:-0}

python -m fastdeploy.entrypoints.openai.api_server \
    --model ./models/Qwen/Qwen2.5-7B \
    --port 8188 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --enable-logprob \
    --graph-optimization-config '{"use_cudagraph":true}' \
    --no-enable-prefix-caching \
    --no-enable-output-caching
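
The script takes its two positional arguments as-is, so a typo such as `bash start_fd.sh 2` would be exported unchanged. A minimal sketch of how the arguments could be checked before the exports, assuming both toggles only accept `0` or `1` as the comments describe. The helper names `validate_toggle` and `resolve_modes` are hypothetical and not part of `start_fd.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of start_fd.sh.

# Succeed only if the value is exactly 0 or 1; otherwise report on stderr.
validate_toggle() {
    local name="$1" value="$2"
    case "$value" in
        0|1) return 0 ;;
        *)
            echo "error: $name must be 0 or 1, got '$value'" >&2
            return 1
            ;;
    esac
}

# Apply the same defaults as the script (deterministic mode on, logging off),
# validate both values, and print the resolved settings.
resolve_modes() {
    local det="${1:-1}" log="${2:-0}"
    validate_toggle FD_DETERMINISTIC_MODE "$det" || return 1
    validate_toggle FD_DETERMINISTIC_LOG_MODE "$log" || return 1
    echo "FD_DETERMINISTIC_MODE=$det FD_DETERMINISTIC_LOG_MODE=$log"
}

# Example call: deterministic mode with logging (debug).
resolve_modes 1 1
```

Called with no arguments, `resolve_modes` falls back to `1` and `0`, matching the `${1:-1}` and `${2:-0}` defaults in the script.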
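
Once the server is up, one rough way to sanity-check deterministic mode is to send the same request twice and compare the generated text. The sketch below rests on several assumptions that do not come from the script itself: that the server exposes an OpenAI-style `/v1/completions` route on port 8188, that it accepts `seed` and `temperature` in the request body, that `"default"` is an acceptable model name, and that `curl` is installed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed endpoint; adjust to match the actual deployment.
BASE_URL="${BASE_URL:-http://localhost:8188}"

# Send one fixed request and print only the generated text.
# The request fields (model name, seed, temperature) are assumptions.
query_text() {
    curl -sS "$BASE_URL/v1/completions" \
        -H 'Content-Type: application/json' \
        -d '{"model":"default","prompt":"The capital of France is","max_tokens":32,"temperature":0,"seed":42}' |
        python -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["text"])'
}

# Succeed only if the two strings are byte-for-byte identical.
same_output() {
    [ "$1" = "$2" ]
}

# Issue the request twice and report whether the outputs match.
# Returns 2 if a request fails, 1 on mismatch, 0 on match.
check_determinism() {
    local first second
    first="$(query_text)" || return 2
    second="$(query_text)" || return 2
    if same_output "$first" "$second"; then
        echo "outputs match"
    else
        echo "MISMATCH between two identical requests" >&2
        return 1
    fi
}

# Only contact the server when explicitly asked to: bash check.sh run
if [ "${1:-}" = "run" ]; then
    check_determinism
fi
```

Two matching responses do not prove determinism in general; they only show that this one prompt was reproducible across two calls. Comparing the extracted text rather than the raw JSON avoids false mismatches from per-request fields such as IDs or timestamps.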