LEARNING BOTH EXPERT AND UNIVERSAL KNOWLEDGE USING TRANSFORMERS