Commit Graph

893 Commits

Author SHA1 Message Date
lonelygsh e0a1653b26 [Speculate Decoding] Fix bug of reasoning_phase_token_constraint kernel (#7349)
Co-authored-by: guanshihui] <guanshihui@baidu.com>
2026-04-14 20:57:11 +08:00
Echo-Nie 8819a039c9 [Others] Fix typo (#7280)
* typo

* typo

* typo

* typo
2026-04-14 17:28:22 +08:00
Yuanle Liu 0ddb6e461c [Optimization] 移除 num_blocks 上限限制 (#7241) 2026-04-13 07:07:41 -07:00
lonelygsh e83d45833f [Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7166)
- speculate_limit_thinking_content_length: update current_base_step to
  step_idx+1 (step_idx now records history count before current round);
  remove incorrect step_idx decrement on accept_num truncation; mark
  step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
  step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
  formula (remove stale -accept_num offset); use <= condition so accept_idx
  maps directly to the accepted token that ends the stop sequence; fix
  accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.
2026-04-13 20:53:42 +08:00
周周周 73bd4ab318 [Feature] 为 FusedMoE 添加 hidden_size 显式参数支持 (#7361)
[Feature] 为 FusedMoE 添加 hidden_size 显式参数支持
2026-04-13 20:24:58 +08:00
YuBaoku 1e08ee74e5 [CI] Modify 4-card container startup config and move test case (#7363) 2026-04-13 05:23:49 -07:00
freeliuzc 31e2a8bbad [Speculative Decoding] Support mtp super ultra overlap in pd-split mode with insert_task overlap (#7323)
* support mtp overlap in pd-split mode with insert_task overlap
2026-04-13 19:41:17 +08:00
Nyako Shigure d659099415 [Cleanup] Replace torch proxy alias with public compat API (#7348) 2026-04-13 11:43:26 +08:00
周周周 225fc8d222 use self.hidden_size not use self.fd_config.model_config.hidden_size (#7340) 2026-04-11 22:39:43 +08:00
chen 4982aa000e [RL]moe bf16 ep support paddle batch_gemm (#7337)
* moe bf16 ep support paddle batch_gemm
2026-04-11 21:51:12 +08:00
AIbin ba01d7a823 [Optimization] [OP] [Models] dsk del prefill mask (#7313)
* dsk del prefill mask

* dsk support 1M+ seq_len rope

* update rope tests
2026-04-11 19:32:27 +08:00
K11OntheBoat 870dbac370 Use triton qk_norm both in Prefill and Decode (#7213)
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-10 15:44:01 +08:00
bukejiyu 14d46181b8 [Loader] add multi-thread model loading (#6877)
* multi-thread-loader

* fix ut
2026-04-09 23:40:15 -07:00
周周周 1782872d61 add deep_ep hopper test (#7206)
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-09 17:23:54 +08:00
fxyfxy777 39ff38aba1 [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) 2026-04-09 16:17:56 +08:00
cloudforge1 85c6773e6c [CI]【Hackathon 10th Spring No.33】config 单测补充 (#6730)
* [CI]【Hackathon 10th Spring No.33】config 单测补充

* fix test_commit_config: reset fields before partial-file test

* [CI]【Hackathon 10th Spring No.33】boost delta coverage for architecture helper branches

* [CI]【Hackathon 10th Spring No.33】add version attr to model config mock

* [CI]【Hackathon 10th Spring No.33】add mrope, runner validation, tail_layer coverage

* [CI]【Hackathon 10th Spring No.33】boost: cover 96 more lines (FDConfig assertions, guided decoding, env branches)

* [CI]【Hackathon 10th Spring No.33】config unit test

* [CI]【Hackathon 10th Spring No.33】cover expert parallel branch

* fix: reset commit hash before _load_from_version_file test; block cuda import via setitem(None)

* refactor: convert to unittest.TestCase style per reviewer request

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Tao Luo <luotao02@baidu.com>
2026-04-09 14:28:54 +08:00
cloudforge1 cefc724607 [CI]【Hackathon 10th Spring No.29】engine unit test (#6771)
* [CI]【Hackathon 10th Spring No.29】engine unit test

Merge with upstream test_engine.py (PR #7083) and add comprehensive
coverage for LLMEngine: lifecycle, worker signals, requests, utils,
stop_profile, and start error handling.

* fix: add deploy_modality to _make_cfg() — Copilot review

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-04-09 13:45:59 +08:00
JYChen 43ace7af25 [RL] support moe-topk use topk_reduce_func (#7218)
* support moe-topk use topk_reduce_func

* fix ep error

* fix ut

* fix ut
2026-04-09 11:01:03 +08:00
AIbin 48d2bbeb74 fix dsa (#7252) 2026-04-08 20:21:38 +08:00
3em0 3749457476 [BugFix] fix multimodal hasher hash collision risk when ndarray shape or dtype differs (#7185)
numpy tobytes() only serializes raw element bytes without encoding shape
or dtype metadata. This means arrays with identical raw bytes but
different shapes (e.g. (6,4) vs (4,6)) or different dtypes (e.g.
float32 vs uint8 reinterpretation of same memory) produce the same
SHA-256 digest, leading to silent cache collisions in
ProcessorCacheManager / EncoderCacheManager / PrefixCacheManager.

Prepend a "{shape}|{dtype}|" header to the byte payload before hashing
so that shape and dtype participate in the digest.

Added test cases for shape and dtype sensitivity.
2026-04-08 04:26:02 -07:00
Jiaxin Sui fbc3aa93de [XPU][CI] Remove duplicate NICs from environment variables (#7244) 2026-04-08 19:14:15 +08:00
YuBaoku 4cd574cf90 [CI] Reduce execution time for ngram kernel tests (#7242) 2026-04-08 16:54:46 +08:00
guozhuangzhuang 757bafe3bd [Engine][DataProcessor] fix decode token (#7102) 2026-04-08 15:41:32 +08:00
K11OntheBoat bb48bcbaa2 Split enable_mm (#7183)
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
2026-04-08 11:25:41 +08:00
luukunn 8496ec71a6 [DataProcessor] Move image_processor to unified directory and add MultiModalProcessor (#7109)
* first commit

* step 9~10

* update multimodal

* update multimodal

* fix load tokenizer

* add unit test

* fix unit test & AdaptiveImageProcessor

* Delete unused code
2026-04-08 10:16:27 +08:00
cloudforge1 c529c2ad98 [Optimization]【Hackathon 10th Spring No.49】GPU ngram_match: BlockScan Phase 2 -optimized (#7136)
* Port ngram_match and hybrid_mtp_ngram kernels to CUDA

Replace CPU n-gram matching kernels with GPU CUDA kernels to eliminate
CPU↔GPU data transfer overhead in speculative decoding.

Key changes:
- ngram_match.cc → ngram_match.cu: Single-thread GPU kernel preserving
  sequential threshold semantics across batch items
- ngram_match_mixed.cu: Replace CPU function with __global__ kernel
- ngram.py: Remove ~10 .cpu() tensor copies, pass GPU tensors directly
- mtp.py: Remove .cpu()/.cuda() round-trips and CUDAPinnedPlace copies

Design: <<<1,1>>> single-thread kernels (same approach as TensorRT-LLM).
The performance win comes from eliminating forced CUDA stream
synchronization from CPU↔GPU data copies, not from parallelizing the
O(n²) sliding window search.

* Add correctness + latency test for GPU ngram kernels

* Fix test data: step_idx semantics and ngram-matchable patterns

* fix: add CPU fallback path for ngram_match and hybrid_mtp_ngram ops

Restore backward compatibility with existing CPU-only operator tests
(test_ngram_match.py, test_hybrid_mtp_ngram.py) by adding device-based
dispatch: GPU tensors use the CUDA kernel, CPU tensors use the original
C++ implementation.

* fix(test): wrap imported ops with staticmethod to prevent self-binding

Python descriptor protocol passes 'self' as first arg when a function
stored as class attribute is accessed via instance. Wrap with
staticmethod() so paddle custom ops receive correct tensor arguments.

* fix(test): ensure max_model_len >= input_len to prevent broadcast error in latency test

* fix: keep input_ids_len on CPU in __init__, move to GPU in _run_impl

Reverts line 39 to match develop (keeps .cpu()) so diff-cover
no longer flags it as an uncovered changed line. The tensor is
moved to GPU via .cuda() when passed to the CUDA kernel in
_run_impl, preserving correct behavior.

* Extract shared ngram search into __device__ helper (ngram_match_common.cuh)

Per upstream requirement: '两个Kernel逻辑有较为相似部分,Kernel
形式为提取共用的匹配逻辑,外加业务逻辑'

The core ngram sliding-window search + token copy logic is now defined
once in ngram_match_common.cuh as two __device__ __forceinline__
functions:
  - ngram_search_and_copy: single-haystack sliding window match
  - ngram_search_batch_item: two-phase search (input_ids then pre_ids)

Both kernels call ngram_search_batch_item with their business-specific
parameters:
  - ngram_match_kernel: write_offset=1, min_ngram_size=1
  - ngram_match_mixed_kernel: write_offset=ori_seq_len_this_time,
    min_ngram_size=configurable

No functional change. CPU fallback paths unchanged.

* refactor: parallel CUDA kernels for ngram_match (<<<bsz,256>>> search)

Two-phase parallel architecture addressing reviewer feedback:
- Phase 1: <<<bsz, 256>>> — parallel sliding-window ngram search
  using atomicMin64 CAS loop for leftmost-match semantics
- Phase 2: <<<1, 1>>> — serial threshold + token copy (inter-batch
  dependency via running sum of seq_lens_this_time)

Phase 1 is O(bsz × seq_len × ngram_size) distributed across bsz × 256
threads.  Phase 2 is O(bsz × max_draft_tokens) — negligible.

Shared code extracted into ngram_match_common.cuh:
  NgramMatchResult struct, atomicMin64, parallel_ngram_search,
  4 kernel functions (search+gather for both kernel types)

Tests: 6 new large-scale correctness tests with env-var threshold
override — bsz=256/seq_len=128k, bsz=1/seq_len=128k, bsz=256/seq_len=1k
for both ngram_match and hybrid_mtp_ngram.

* fix: move __global__ kernel defs from .cuh to .cu files (fix linker multiple-def error)

Both ngram_match.cu and ngram_match_mixed.cu include ngram_match_common.cuh.
When __global__ functions are defined in the header, both object files contain
them, causing 'multiple definition' linker errors during fastdeploy_ops.so link.

Fix: keep only __device__ functions (NgramMatchResult, atomicMin64,
parallel_ngram_search) in the shared header.  Move __global__ kernel
definitions into each respective .cu file.

Net code change: +304/-304 (zero net lines).

* fix: align mixed kernel signatures with host function tensors

Fix 7 type-mismatch compilation errors in ngram_match_mixed.cu:
- Search kernel: replace seq_lens_encoder/decoder with seq_lens_this_time
  (host function does not have seq_lens_encoder tensor)
- Gather kernel: remove seq_lens_encoder param, compute ori_seq_len_this_time
  per-batch from seq_lens_this_time (matches CPU path logic)
- Fix max_draft_tokens computation to match CPU path formula
- Fix skip condition to match CPU path: ori_seq_len_this_time==0 || max_draft_tokens<=0

* 【Hackathon 9th No.49】Replace serial Phase 2 with CUB BlockScan parallel threshold

Phase 2 gather kernel now launches <<<1, 1024>>> threads with CUB
BlockScan prefix-sum for parallel threshold enforcement, replacing
the serial <<<1,1>>> loop.

Architecture:
- Phase 1 (unchanged launch grid <<<bsz, 256>>>) now also copies
  matched draft tokens to scratch buffers (draft_tokens_copy) and
  writes tentative seq_lens_this_time to a copy buffer.
- Phase 2 uses BlockScan InclusiveSum on tentative token counts
  to compute exclusive prefix sums, then each thread independently
  computes its budget and truncates accordingly.

Both ngram_match.cu and ngram_match_mixed.cu updated.
Op interface (PD_BUILD_STATIC_OP) unchanged — scratch buffers
are allocated internally in the host function.

* fix: resolve Copilot/bot review comments on PR #7136

- Remove dead NgramMatchResult writes from both Phase 1 kernels
- Fix encoder-active init: default seq_lens_this_time_copy=0, set 1 for active
- Add remaining_active budget deduction to mixed gather kernel (parity)
- Add PD_CHECK(max_batch_size <= NGRAM_GATHER_THREADS) to both host functions
- Remove unused match_buf/match_results allocation from both host functions
- Pass seq_lens_encoder to Phase 2 gather for encoder-active skip
- clang-format applied

* test: add multi-scale latency benchmark (batch 32→1024)

Adds test_latency_scaling that benchmarks GPU kernel vs CPU path at
batch sizes 32, 128, 256, 512, 1024 with input_len=512.
Shows Phase 2 BlockScan scaling and per-batch-item amortization.

* cleanup: remove unused kernel params, dead struct, add benchmark env gate

- Remove unused max_draft_tokens_param from ngram_match_search_kernel
  (draft_token_num[batch_idx] already covers the constraint)
- Remove unused seq_lens_decoder from ngram_match_mixed_search_kernel
  (only used in gather kernel, not search kernel)
- Remove dead NgramMatchResult struct from ngram_match_common.cuh
- Add BENCHMARK_NGRAM env gate to test_latency and test_latency_scaling
  (prevents benchmark tests from inflating CI runtime)

* revert: remove benchmark env gate — let CI run benchmarks

* fix: address Copilot review — GPU mirror for input_ids_len, device fix in mtp, benchmark timing isolation

* fix: correct stale comment in mixed gather (at-least-ori → 1-token)

* bench: add 5-group benchmark matching NKNaN methodology

Groups: seq_len, batch_size, ngram hit pattern, threshold, threshold×batch.
Data creation outside timing loop. GPU kernel vs CPU-copy path.

* fix: rename benchmark for CI discovery, bump to 10k iterations

- Renamed benchmark_ngram_kernel.py → test_benchmark_ngram_kernel.py
  so pytest discovers it (test_*.py pattern)
- Bumped NUM_ITERS 10→10000, WARMUP 2→5 for noise-free profiling
- Gated benchmark class with RUN_NGRAM_BENCHMARKS=1 (won't bloat CI)

* fix: correct stale filename in benchmark docstring

* fix: move PD_CHECK before Phase 1 launch (fail-fast)

* bench: remove env-gate from benchmark groups, cut NUM_ITERS to 1000

Benchmark groups 1-5 now run unconditionally in CI (~9s total).
Env-gates moved to separate PR #7170.

* fix: address Copilot review — conditional return, defensive guards, GPU placement

- ngram_match.cu: add remaining<=0 early return, conditional return
  only when tokens produced (matches CPU continue behavior), include
  encoder-active items in Phase 2 threshold-budget scan
- ngram_match_mixed.cu: split max_draft_tokens into explicit steps to
  prevent negative intermediates, conditional return only when tokens
  produced, add seq_lens_decoder invariant comment
- ngram.py: explicit .cuda() on input_ids_len_gpu creation
- test_ngram_gpu_kernel.py: use CPUPlace() in latency benchmark to
  measure actual D2H/H2D roundtrip

* fix: clarify CAS comment, fix negative intermediate in CPU fallback

- Add CAS non-atomic initial read comment in atomicMin64 (#3031826678)
- Split draft_budget into explicit int64_t steps in CPU fallback (#3031240456)

* perf: A1 (1024 threads) + A2 (early-exit) + fix B1 UB in ngram_match

- NGRAM_BLOCK_THREADS 256→1024: 4× thread parallelism per block
- Add early-exit break when position exceeds current best match
- Fix __ballot_sync UB: was inside divergent if(match) + loop break,
  revert to plain atomicMin64 (contention-free since matches are rare)
- Update stale '256 threads' comments in both .cu files

* perf: template-specialize ngram search + cache scratch buffers + fix benchmark

Kernel optimizations:
- Template-specialize parallel_ngram_search for ngram_size 1,2,3:
  register-cached ngram tokens, #pragma unroll, __restrict__ hints
- Cache Phase 1→2 scratch buffers (grow-only static paddle::Tensor)
  to eliminate per-call paddle::empty allocation overhead

Benchmark fix:
- Pre-allocate output tensors once, use fill_() in timing loop
  instead of creating new paddle.zeros/ones each iteration
  (removes ~20-40µs measurement noise per iteration)

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
2026-04-07 01:36:25 -07:00
Nana 367d37b523 fix typo (#7147) 2026-04-07 16:30:32 +08:00
Bingoo 2068656a85 [Optimization] merge matmul and add (#6986)
* merge matmul and add

* modify format

* using paddle.nn.functional.linear

* using _C_ops.linear

* using paddle.nn.functional.linear

* add FLAGS_use_legacy_linear env var in test case

* fix format

* add assert and remove env

* modify format

* using matmul for no bias

* modify accurate baseline
2026-04-03 18:02:03 +08:00
lizexu123 5f612a348d [BugFix] fix flashinfer-cutedsl moe nvfp4 (#7120)
* fix nvfp4

* fix

* add document

* fix nvfp4

* support eb5

* support bka

* support eb5

* support xpu

* fix

* fix

* add import cutedsl

* fix

* fix

* fix test

* fix H卡

* update document

* fix

* update document

* update document

* fix
2026-04-03 15:43:19 +08:00
luukunn 562fa31791 [BugFix]fix extract_tool_calls (#7154)
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li 98f3fc9267 [RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083)
* [test] add a few unit tests

* [feat] update key prefix when model weights are updated

* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777 9f3b3ce7f5 [Optimization] merge_allreduce (#7039) 2026-04-02 19:52:13 +08:00
Yuanle Liu 1af7f80811 Revert "[BugFix][Speculative Decoding] Correct index calculation in speculate…" (#7133)
This reverts commit ba1aa1edff.
2026-04-01 06:54:23 -07:00
luukunn fa7a84926d [Optimization]Fix tool parser (#7079)
* fix tool parser
2026-04-01 21:20:34 +08:00
lonelygsh ba1aa1edff [BugFix][Speculative Decoding] Correct index calculation in speculate decoding operators (#7121)
- Fix accept_idx calculation in spec_set_value_by_stop_seqs
- Fix condition check from < to <= for token matching
- Fix accept_tokens indexing logic
- Remove unnecessary -1 in current_step comparison for max_think_len

Co-authored-by: guanshihui] <guanshihui@baidu.com>
2026-04-01 05:36:53 -07:00
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix codestype

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
luukunn fdfc908e2f [Others] reuse unit test (#7127) 2026-04-01 18:36:00 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
YuBaoku c6f0c5c3a6 [CI] Optimize test execution with single-GPU parallelism (#7085)
* [CI] Optimize test execution with single-GPU parallelism and log collection

* remove export CUDA_VISIBLE_DEVICES

* fix path error

* fix log_* path and debug

* [CI] Optimize test execution with single-GPU parallelism and log collection
2026-04-01 14:18:40 +08:00
zhouchong 91c832f607 [Feature] Add logging parameters and error output to terminal (#7098) 2026-04-01 13:18:42 +08:00
luukunn 3651113ee5 [DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5 ee2b965f5f adjust config info (#7054) 2026-03-31 21:26:05 +08:00
cloudforge1 5c5dc66aa7 [CI]【Hackathon 10th Spring No.34】async_expert_loader 单测补充 (#6731)
* [CI]【Hackathon 10th Spring No.34】async_expert_loader 单测补充

* [CI]【Hackathon 10th Spring No.34】async_expert_loader 单测补充
---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-31 15:29:35 +08:00
qwes5s5 daa95244f7 abort requests (#6992) 2026-03-31 11:02:26 +08:00
Yonghua Li 6d9739f360 [BugFix] fix speculative gauge metrics in multi api server (#7082) 2026-03-31 10:52:50 +08:00
chenjian 6727df8286 [Optimization] Optimize ttft for prefill pd (#6680)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix

* fix ci

* fix format

* update according to review

* add comment

* fix

* fix format
2026-03-30 20:36:23 +08:00
jackyYang6 05f2d95729 [RL] Adapt async rollout checkpoint update flow (#7042)
* update checkpoint-transfer flow and control update_weights params

* test: add update_weights route validation
2026-03-30 19:19:34 +08:00
yzwu 8789329457 [Iluvatar] Support wi4a16 group_gemm (#7078) 2026-03-30 19:03:51 +08:00
kevin 18062c55bb [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys (#6929)
* [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions

## Motivation

测试用例 `test_get_block_hash_extra_keys_boundary_cases` 中,Block [4,8) 的
调用错误地传入了 `mm_idx=1`,跳过了 img0[2,5);但 img0 覆盖 token 4,token 4
属于 block [4,8),应被包含在 hash_keys 中。此外,所有 assertEqual 只校验了
hash_keys,未校验返回的 mm_idx 游标。

## Modifications

- `test_get_block_hash_extra_keys_boundary_cases`:
  - 改为链式调用,用上一次返回的 mm_idx 作为下一次入参,模拟真实调用循环
  - Block [4,8) 入参从 `mm_idx=1` 改为沿用上次返回的 `mm_idx=0`,期望值从 `[]` 改为 `["hash-0"]`
  - 所有断言改为 `assertEqual((mm_idx, hash_keys), (...))` 同时校验游标
- `test_get_block_hash_extra_keys_no_overlap_at_boundaries`:
  - Case B 入参从 `mm_idx=1` 改为 `mm_idx=0`(从头遍历,img-a 走 continue)
  - 所有断言增加 mm_idx 校验
- `test_get_block_hash_extra_keys_image_crosses_block_boundary`:
  - 所有断言增加 mm_idx 校验
- `test_get_block_hash_extra_keys_no_mm_inputs`:
  - 断言增加 mm_idx 校验
- `test_get_block_hash_extra_keys_handles_multimodal_segments`:
  - call2、call3 断言增加 mm_idx 校验

## Usage or Command

```bash
python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys"
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: chengyanfu <chengyanfu@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 17:13:31 +08:00
luukunn b9f8873367 [Optimization]Merge Text processor (#7030)
* merge text processor

* update

* fix unit test

* merge messages2ids

* fix unit test

* 删除重复代码

* remove redundant code

* delete code

* fix unit test
2026-03-30 15:02:35 +08:00