all_reduce
* add custom op declaration
* roll back try except
* [Feature] support custom all-reduce
* add vllm adapted