[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

* init

* init

* fix format

* add

* add files

* add ut

* fix some

* add ut

* add more

* add

* fix pre-commit

* fix pre-commit

* fix cover

* skip long seq

* add

* add

* fix

* remove not need

* fix set attr

* fix comments

* fix comments

* fix failed tests

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
gongweibao
2026-03-16 21:32:43 +08:00
committed by GitHub
parent d113397b09
commit a6351dea0b
61 changed files with 1595 additions and 171 deletions
+8 -1
@@ -152,6 +152,7 @@ class ExpertService:
 if self.do_profile:
     get_profile_block_num = np.zeros([1], dtype=np.int32)
+    attempt = 0
     while True:
         try:
             self.get_profile_block_num_signal = IPCSignal(
@@ -162,7 +163,13 @@ class ExpertService:
                 create=False,
             )
             break
-        except:
+        except Exception as e:
+            attempt += 1
+            if attempt % 30 == 0:
+                console_logger.warning(
+                    f"Waiting for IPC signal 'get_profile_block_num' to be created, "
+                    f"retried {attempt} times: {e}"
+                )
             time.sleep(1)
     self.reset_kvcache_blocks()
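
Reviewer note: the hunk above swaps a bare `except:` for `except Exception as e:` and logs a warning on every 30th failed attempt. A bare `except:` also catches `KeyboardInterrupt` and `SystemExit`, so the old loop could not be interrupted cleanly and gave no hint about why it was still waiting. The pattern can be sketched in isolation as follows. This is a minimal illustration, not code from this commit: `wait_for`, `connect`, `log_every`, `delay`, and `sleep` are hypothetical names, and the standard `logging` module stands in for `console_logger`.

```python
import logging
import time

logger = logging.getLogger("retry_demo")  # stand-in for console_logger


def wait_for(connect, log_every=30, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, warning every log_every failures.

    Hypothetical helper: connect is any zero-argument callable that raises
    while the resource is not ready yet.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except Exception as exc:  # KeyboardInterrupt/SystemExit still propagate
            attempt += 1
            if attempt % log_every == 0:
                logger.warning(
                    "Still waiting after %d attempts: %s", attempt, exc
                )
            sleep(delay)
```

Like the loop in the diff, this sketch retries indefinitely. A caller that needs an upper bound would have to add a maximum attempt count or a deadline, which this commit hunk does not do.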