mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
[Docs] Add docs for disaggregated deployment (#6700)
* add docs for disaggregated deployment
* pre-commit run for style check
* update docs
@@ -1,5 +1,7 @@
[简体中文](../zh/features/disaggregated.md)

[Best Practice](../best_practices/Disaggregated.md)

# Disaggregated Deployment

Large Language Model (LLM) inference is divided into two phases: **Prefill** and **Decode**, which are compute-intensive and memory-bound, respectively.
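
The compute-intensive versus memory-bound distinction can be illustrated with a rough arithmetic-intensity estimate. The sketch below is not part of FastDeploy; the function name, the hidden size, the prompt length, and the simplification of counting only weight traffic for a single square matmul are all assumptions made for illustration.

```python
def arithmetic_intensity(tokens_per_step: int, hidden_size: int, bytes_per_param: int = 2) -> float:
    """Estimate FLOPs per byte of weight traffic for one matmul.

    Models a [tokens_per_step, hidden_size] x [hidden_size, hidden_size]
    product and ignores activation and KV-cache traffic (a simplification).
    """
    flops = 2 * tokens_per_step * hidden_size * hidden_size  # one multiply-add = 2 FLOPs
    weight_bytes = hidden_size * hidden_size * bytes_per_param  # weights are read once per step
    return flops / weight_bytes


# Hypothetical sizes: a 2048-token prompt and 16-bit weights.
prefill = arithmetic_intensity(tokens_per_step=2048, hidden_size=4096)  # whole prompt in one step
decode = arithmetic_intensity(tokens_per_step=1, hidden_size=4096)      # one new token per step

print(f"prefill: {prefill:.0f} FLOPs/byte, decode: {decode:.0f} FLOPs/byte")
```

Under these assumptions, Prefill reuses each loaded weight across every prompt token, while Decode loads the same weights to produce a single token, so its throughput is limited by memory bandwidth rather than compute.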