Sporadic audio-visual embodied navigation for human tracking