mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
[docs] add cli uasge to docs (#4569)
# tokenizer
## Description
The **Tokenizer** subcommand provides encoding and decoding functionality between text and token sequences. It also allows viewing or exporting model vocabulary information. Both text and multimodal models are supported.
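
To make the encode/decode round trip concrete, here is a minimal sketch using a toy whitespace tokenizer. The vocabulary and the `encode`/`decode` helpers are hypothetical and exist only for illustration; they are not FastDeploy's implementation, and real model tokenizers use learned subword vocabularies.

```python
from typing import List

# Toy illustration only: a hand-written vocabulary, not a real model's.
TOY_VOCAB = {"<unk>": 0, "hello": 1, "world": 2, "!": 3}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}


def encode(text: str) -> List[int]:
    """Map each whitespace-separated token to its ID (0 for unknown tokens)."""
    return [TOY_VOCAB.get(tok, TOY_VOCAB["<unk>"]) for tok in text.lower().split()]


def decode(token_ids: List[int]) -> str:
    """Map IDs back to tokens and join them with single spaces."""
    return " ".join(ID_TO_TOKEN.get(i, "<unk>") for i in token_ids)


ids = encode("hello world !")
print(ids)             # [1, 2, 3]
print(decode(ids))     # hello world !
print(len(TOY_VOCAB))  # 4, the toy "vocabulary size"
```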
## Usage
```
fastdeploy tokenizer --model MODEL [--encode TEXT] [--decode TOKENS] [--vocab-size] [--info] [--vocab-export FILE]
```
## Parameters
| Parameter                     | Description                                                                    | Default |
| ----------------------------- | ------------------------------------------------------------------------------ | ------- |
| --model, -m                   | Model path or name                                                             | None    |
| --encode, -e                  | Encode text into a list of tokens                                              | None    |
| --decode, -d                  | Decode a list of tokens back into text                                         | None    |
| --vocab-size, -vs             | Display the vocabulary size                                                    | None    |
| --info, -i                    | Display detailed tokenizer information (special tokens, IDs, max length, etc.) | None    |
| --vocab-export FILE, -ve FILE | Export the vocabulary to a file                                                | None    |
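
As a rough sketch of how the flags in the table fit together, the snippet below declares them with Python's `argparse`. The `build_parser` function is hypothetical and is not taken from the FastDeploy source. Treating `--vocab-size` and `--info` as boolean switches is an assumption based on the examples, where they take no value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameter table (not FastDeploy's code)."""
    parser = argparse.ArgumentParser(prog="fastdeploy tokenizer")
    parser.add_argument("--model", "-m", default=None, help="Model path or name")
    parser.add_argument("--encode", "-e", default=None, metavar="TEXT",
                        help="Encode text into a list of tokens")
    parser.add_argument("--decode", "-d", default=None, metavar="TOKENS",
                        help="Decode a list of tokens back into text")
    # Assumption: these two are value-less switches; absent means None.
    parser.add_argument("--vocab-size", "-vs", action="store_true", default=None,
                        help="Display the vocabulary size")
    parser.add_argument("--info", "-i", action="store_true", default=None,
                        help="Display detailed tokenizer information")
    parser.add_argument("--vocab-export", "-ve", default=None, metavar="FILE",
                        help="Export the vocabulary to a file")
    return parser


args = build_parser().parse_args(
    ["-m", "baidu/ERNIE-4.5-0.3B-PT", "-e", "Hello, world!", "-vs", "-ve", "vocab.json"]
)
print(args.model, args.encode, args.vocab_size, args.vocab_export)
```

Options that are not passed stay at `None`, which matches the Default column above.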
## Examples
```
# 1. Encode text into tokens
# Convert input text into a token sequence recognizable by the model
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --encode "Hello, world!"

# 2. Decode tokens into text
# Convert a token sequence back into readable text
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --decode "[1, 2, 3]"

# 3. View vocabulary size
# Output the total number of tokens in the model’s vocabulary
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --vocab-size

# 4. View tokenizer details
# Includes special tokens, ID mappings, max token length, etc.
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --info

# 5. Export vocabulary to a file
# Save the tokenizer’s vocabulary to a local file
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --vocab-export ./vocab.txt

# 6. Support for multimodal models
# Decode tokens for a multimodal model
fastdeploy tokenizer --model baidu/EB-VL-Lite-d --decode "[5300, 96382]"

# 7. Combine multiple functions
# Encode, decode, show tokenizer info and vocabulary size, and export the vocabulary in a single command
fastdeploy tokenizer \
    -m baidu/ERNIE-4.5-0.3B-PT \
    -e "你好哇" \
    -d "[5300, 96382]" \
    -i \
    -vs \
    -ve vocab.json
```
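
When calling the subcommand from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch (`build_tokenizer_command` is not part of FastDeploy) that uses only the flags documented above. It assumes `--decode` accepts a bracketed, comma-separated list, as in the examples.

```python
import json
import shlex
from typing import List, Optional, Sequence


def build_tokenizer_command(
    model: str,
    encode: Optional[str] = None,
    decode: Optional[Sequence[int]] = None,
    vocab_size: bool = False,
    info: bool = False,
    vocab_export: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `fastdeploy tokenizer` from the documented flags."""
    cmd = ["fastdeploy", "tokenizer", "--model", model]
    if encode is not None:
        cmd += ["--encode", encode]
    if decode is not None:
        # json.dumps renders [1, 2, 3] as the string "[1, 2, 3]".
        cmd += ["--decode", json.dumps(list(decode))]
    if vocab_size:
        cmd.append("--vocab-size")
    if info:
        cmd.append("--info")
    if vocab_export is not None:
        cmd += ["--vocab-export", vocab_export]
    return cmd


cmd = build_tokenizer_command("baidu/ERNIE-4.5-0.3B-Paddle", decode=[1, 2, 3])
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(cmd))
```

To run the command, pass the list to `subprocess.run(cmd, check=True)` on a machine where FastDeploy is installed. Passing a list rather than a single string avoids shell quoting problems with the bracketed token list.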