Natural Language as a Scaffold for Visual Recognition