Improving the reliability of language models for summarization