The old implementation pads list sequences with `[[pad_id] * (max_len - len(inst)) + list(inst) for inst in insts]`. This performs an $O(N \times \text{max\_len})$ list concatenation, creating many intermediate Python lists and stressing the garbage collector, before finally passing the result to `np.array(..., dtype=np.int64)`.
This change pre-allocates a pad-filled numpy array with `np.full` and populates it via numpy slicing (`padded_insts[i, :l] = inst`), yielding a roughly 2x speedup. The output has been verified to be logically equivalent to the unmodified processor's on a comprehensive set of test cases.
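A minimal, self-contained sketch of the two strategies (not the actual processor code; `insts`, `pad_id`, and `max_len` values here are illustrative). Note the quoted comprehension pads on the left, so the pre-allocated version below writes each sequence into the tail of its row; the commit's `padded_insts[i, :l]` form corresponds to right-padding instead.

```python
import numpy as np

# Illustrative inputs (hypothetical token-id sequences).
insts = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]
pad_id = 0
max_len = max(len(inst) for inst in insts)

# Old approach: per-sequence list concatenation, then one conversion.
old = np.array(
    [[pad_id] * (max_len - len(inst)) + list(inst) for inst in insts],
    dtype=np.int64,
)

# New approach: pre-allocate a pad-filled array, then fill via slicing.
padded_insts = np.full((len(insts), max_len), pad_id, dtype=np.int64)
for i, inst in enumerate(insts):
    padded_insts[i, max_len - len(inst):] = inst  # left-pad, as above

assert (old == padded_insts).all()
```

The slicing version avoids building an intermediate Python list per sequence; numpy copies each row in a single vectorized assignment.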
- speculate_limit_thinking_content_length: update current_base_step to
step_idx+1 (step_idx now records history count before current round);
remove incorrect step_idx decrement on accept_num truncation; mark
step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
formula (remove stale -accept_num offset); use <= condition so accept_idx
maps directly to the accepted token that ends the stop sequence; fix
accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.
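The corrected gating and stop-sequence matching described above can be sketched in host-side Python (hypothetical names and signature; the real `speculate_set_stop_value_multi_seqs` kernel operates per-sequence on device tensors):

```python
def find_stop(step_idx_now, accept_num, min_token_limit,
              accept_tokens, stop_seq, generated):
    """Hypothetical sketch of the fixed stop-check logic.

    Returns the accept_idx of the accepted token that completes
    stop_seq, or -1 if the sequence may not stop this round.
    """
    # Corrected gate: this round's accepted tokens count toward the
    # minimum-token requirement (step_idx_now + accept_num).
    if step_idx_now + accept_num < min_token_limit:
        return -1
    # Walk the accepted tokens; the inclusive scan (the `<=` fix) lets
    # accept_idx land directly on the token that ends the stop sequence.
    for accept_idx in range(accept_num):
        window = (generated + accept_tokens[:accept_idx + 1])[-len(stop_seq):]
        if window == stop_seq:
            return accept_idx
    return -1
```

For example, with two previously generated tokens, three accepts `[3, 4, 5]`, and stop sequence `[4, 5]`, the match lands on `accept_idx == 2`, the accepted token that ends the stop sequence.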
* dsk del prefill mask
* dsk support 1M+ seq_len rope
* update rope tests
* Replace max_position_embeddings with max_model_len
* 1D grid: gridDim.x supports up to 2^31-1 blocks, far exceeding the actual number of tokens.
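To make the headroom concrete, a quick sketch of the usual ceil-division grid sizing (hypothetical helper; block size of 256 is an assumption, not taken from the kernel):

```python
def grid_size(num_tokens, block_dim=256):
    # One thread per token: blocks needed = ceil(num_tokens / block_dim).
    return (num_tokens + block_dim - 1) // block_dim

# Even a million tokens needs only a few thousand blocks,
# nowhere near CUDA's gridDim.x cap of 2**31 - 1.
assert grid_size(1_000_000) == 3907
assert grid_size(1_000_000) < 2**31 - 1
```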
* [CI] Fix prebuilt wheel installation and update Docs
* [CI] Update Dockerfile.gpu to restrict SM80/86/89/90, CUDA 12.6 and Python 3.10
* Update nvidia_gpu.md
* Update nvidia_gpu.md
* Revise NVIDIA GPU installation instructions
Updated installation instructions for PaddlePaddle and FastDeploy to remove specific CUDA version mentions and clarify support for multiple GPU architectures.
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [CI][Hackathon 10th Spring No.33] supplement config unit tests
* fix test_commit_config: reset fields before partial-file test
* [CI][Hackathon 10th Spring No.33] boost delta coverage for architecture helper branches
* [CI][Hackathon 10th Spring No.33] add version attr to model config mock
* [CI][Hackathon 10th Spring No.33] add mrope, runner validation, tail_layer coverage
* [CI][Hackathon 10th Spring No.33] boost: cover 96 more lines (FDConfig assertions, guided decoding, env branches)
* [CI][Hackathon 10th Spring No.33] config unit test
* [CI][Hackathon 10th Spring No.33] cover expert parallel branch
* fix: reset commit hash before _load_from_version_file test; block cuda import via setitem(None)
* refactor: convert to unittest.TestCase style per reviewer request
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Tao Luo <luotao02@baidu.com>
* [CI][Hackathon 10th Spring No.29] engine unit test
Merge with upstream test_engine.py (PR #7083) and add comprehensive
coverage for LLMEngine: lifecycle, worker signals, requests, utils,
stop_profile, and start error handling.
* fix: add deploy_modality to _make_cfg() — Copilot review
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>