FastDeploy/tests/ce/deterministic/start_fd.sh
gongweibao edd31e8849 [Feature] Add Deterministic Inference Support (#6476)
* [tests] Add Paddle attention determinism tests and refactor resource manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00

30 lines
1.1 KiB
Bash

export FD_MODEL_SOURCE=HUGGINGFACE
export FD_MODEL_CACHE=./models
export CUDA_VISIBLE_DEVICES=0
export ENABLE_V1_KVCACHE_SCHEDULER=1
# FD_DETERMINISTIC_MODE: Toggle deterministic mode
#   0: Disable deterministic mode (non-deterministic)
#   1: Enable deterministic mode (default)
# FD_DETERMINISTIC_LOG_MODE: Toggle determinism logging
#   0: Disable logging (default; high performance, recommended for production)
#   1: Enable logging with MD5 hashes (debug mode)
# Usage: bash start_fd.sh [deterministic_mode] [log_mode]
# Examples:
#   bash start_fd.sh 1 0   # Deterministic mode without logging (fast)
#   bash start_fd.sh 1 1   # Deterministic mode with logging (debug)
# Positional arguments override the defaults (mode=1, log_mode=0).
export FD_DETERMINISTIC_MODE="${1:-1}"
export FD_DETERMINISTIC_LOG_MODE="${2:-0}"
python -m fastdeploy.entrypoints.openai.api_server \
    --model ./models/Qwen/Qwen2.5-7B \
    --port 8188 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --enable-logprob \
    --graph-optimization-config '{"use_cudagraph":true}' \
    --no-enable-prefix-caching \
    --no-enable-output-caching
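
The script passes whatever is given as the first and second argument straight into FD_DETERMINISTIC_MODE and FD_DETERMINISTIC_LOG_MODE, although the comments document only 0 and 1. A minimal sketch of how those arguments could be checked before export is shown below. The helper names validate_flag and resolve_flags are hypothetical and not part of start_fd.sh or FastDeploy; only the defaults (1 and 0) come from the script itself.

```shell
# Hypothetical helper (not part of start_fd.sh): accept only 0 or 1.
# $1 = variable name, used in the error message; $2 = value to check.
validate_flag() {
    case "$2" in
        0|1)
            printf '%s\n' "$2"
            ;;
        *)
            printf 'error: %s must be 0 or 1, got "%s"\n' "$1" "$2" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: apply the same defaults as start_fd.sh
# (mode defaults to 1, log mode to 0) and print "<mode> <log_mode>".
# Returns non-zero if either value is outside {0, 1}.
resolve_flags() {
    _mode="$(validate_flag FD_DETERMINISTIC_MODE "${1:-1}")" || return 1
    _log="$(validate_flag FD_DETERMINISTIC_LOG_MODE "${2:-0}")" || return 1
    printf '%s %s\n' "$_mode" "$_log"
}
```

One possible use, under the same assumptions, is to call resolve_flags "$@" near the top of the script and exit if it fails, so that a typo such as `bash start_fd.sh true` stops before the server is launched instead of silently exporting an undocumented value.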