MultiQueryDecoderAttention
* split MultiQueryDecoderAttention template_instantiation
* update comment
* CI
MultiQueryAppendC8Attention
* split MultiQueryAppendC8Attention template_instantiation
* update setup_ops.py
* fix ci
* fix bug