sunxin
|
51f812aaa4
|
fix empty get_padding_offset (#6462)
|
2026-02-12 12:34:23 +08:00 |
|
周周周
|
8277b95fa6
|
remove speculate_get_padding_offset op (#6308)
|
2026-02-03 15:18:12 +08:00 |
|
sunxin
|
adc69c15d0
|
[Model Runner] Prepare token count and move FA3 initialization into the graph (#6170)
* prepare for token num and put FA3 init in graph
|
2026-01-26 12:16:57 +08:00 |
|
yzwu
|
ac013803f3
|
[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555)
|
2025-12-18 02:14:25 -08:00 |
|
xiaozude
|
df67379bc3
|
[Metax] modify wrapSize to WARP_SIZE (#5442)
|
2025-12-09 01:44:02 -08:00 |
|
K11OntheBoat
|
8d99bac532
|
Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
|
2025-12-09 14:17:30 +08:00 |
|
周周周
|
937eb3c6ed
|
[get_padding_offset.] clean get_padding_offset.cu (#4777)
|
2025-11-05 10:47:40 +08:00 |
|
Ayakouji
|
453487d5b0
|
[Feat] ernie4_5_vl_moe support CudaGraph (#3226)
* delete dynamic control flow for decode
* code-style
* fix scatter/gather typos and use input stream instead default stream
* support 0-Size Tensor
* update runner and model
* using static mem address as input
* fix mem leak
* refine code
* update mm_buffer
* fix typo
* fix buffersize
* fix unk token
* refine code
* refine
* support other arch
* open cudagraph in vlci
* fix
* update
* update
* update
* fix cmd
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-09-10 13:11:57 +08:00 |
|
lifulll
|
72094d4d82
|
enable dcu ci (#3402)
|
2025-08-29 10:23:08 +08:00 |
|
lizexu123
|
32b39620bc
|
[Code Simplification] remove cum_offsets (#3410)
|
2025-08-18 20:21:25 +08:00 |
|
周周周
|
1339e56282
|
[XPU] Remove padding_offsets from get_padding_offset.cu (#2911)
|
2025-07-18 14:16:44 +08:00 |
|
周周周
|
ddb10ac509
|
[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
|
2025-07-17 18:41:31 +08:00 |
|
liddk1121
|
1b54a2831e
|
Adapt for iluvatar gpu (#2684)
|
2025-07-07 16:53:14 +08:00 |
|
jiangjiajun
|
684703fd72
|
[LLM] First commit the llm deployment code
|
2025-06-09 19:20:15 +08:00 |
|