mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
[Docx] add language (en/cn) switch links (#4470)
* add install docs * 修改文档 * 修改文档
This commit is contained in:
@@ -1,3 +1,5 @@
|
||||
[简体中文](../zh/features/disaggregated.md)
|
||||
|
||||
# Disaggregated Deployment
|
||||
|
||||
Large model inference consists of two phases: Prefill and Decode, which are compute-intensive and memory access-intensive respectively. Deploying Prefill and Decode separately in certain scenarios can improve hardware utilization, effectively increase throughput, and reduce overall sentence latency.
|
||||
|
||||
Reference in New Issue
Block a user