Parameter and Data Sparsity for Efficient Training of Large Neural Networks