Detecting Similar HTML Documents Using A Sentence-Based Copy Detection Approach