Improving GPU programming models through hardware cache coherence