[docs] add cli uasge to docs (#4569)

2026-04-23 00:17:25 +08:00 · 2025-10-28 10:35:11 +08:00
parent 6426414a0f
commit 0b196d82f3
17 changed files with 1282 additions and 2 deletions
@@ -0,0 +1,60 @@
+# tokenizer
+
+## Description
+
+The **Tokenizer** subcommand provides encoding and decoding functionality between text and token sequences. It also allows viewing or exporting model vocabulary information. Both text and multimodal models are supported.
+
+## Usage
+
+```
+fastdeploy tokenizer --model MODEL (--encode TEXT | --decode TOKENS | --vocab-size | --info)
+```
+
+## Parameters
+
+| Parameter                     | Description                                                                    | Default |
+| ----------------------------- | ------------------------------------------------------------------------------ | ------- |
+| --model, -m                   | Model path or name                                                             | None    |
+| --encode, -e                  | Encode text into a list of tokens                                              | None    |
+| --decode, -d                  | Decode a list of tokens back into text                                         | None    |
+| --vocab-size, -vs             | Display the vocabulary size                                                    | None    |
+| --info, -i                    | Display detailed tokenizer information (special tokens, IDs, max length, etc.) | None    |
+| --vocab-export FILE, -ve FILE | Export the vocabulary to a file                                                | None    |
+
+## Examples
+
+```
+# 1. Encode text into tokens
+# Convert input text into a token sequence recognizable by the model
+fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --encode "Hello, world!"
+
+# 2. Decode tokens into text
+# Convert a token sequence back into readable text
+fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --decode "[1, 2, 3]"
+
+# 3. View vocabulary size
+# Output the total number of tokens in the model’s vocabulary
+fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --vocab-size
+
+# 4. View tokenizer details
+# Includes special symbols, ID mappings, max token length, etc.
+fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --info
+
+# 5. Export vocabulary to a file
+# Save the tokenizer’s vocabulary to a local file
+fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --vocab-export ./vocab.txt
+
+# 6. Support for multimodal models
+# Decode tokens for a multimodal model
+fastdeploy tokenizer --model baidu/EB-VL-Lite-d --decode "[5300, 96382]"
+
+# 7. Combine multiple functions
+# Encode, decode, view vocabulary, and export vocabulary in a single command
+fastdeploy tokenizer \
+    -m baidu/ERNIE-4.5-0.3B-PT \
+    -e "你好哇" \
+    -d "[5300, 96382]" \
+    -i \
+    -vs \
+    -ve vocab.json
+```