Shapley value based multi-agent reinforcement learning: theory, method and its application to energy network