Improving Message-Passing Performance and Scalability in High-Performance Clusters