[get_padding_offset] clean up get_padding_offset.cu (#4777)
* add test_get_padding_offset
* fix
* fix
* fix