Model-Based Self-Supervision for Fine-Grained Image Understanding