Human action recognition based on convolutional neural networks and vision transformers