Video object segmentation: from single-modality to multi-modality