Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters.
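The cleaning, filtering and sampling steps named above can be illustrated with a minimal sketch that uses only the Python standard library. This is a hypothetical example, not the project's actual API or heuristics. The function names `clean_url`, `passes_filters` and `sample_urls`, the spam word list, the extension list used as a content-type proxy, the tracking-parameter set, and the "first path segment is a language code" rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter settings, chosen only for this illustration.
SPAM_WORDS = {"casino", "viagra", "porn"}
NON_HTML_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip",
                       ".mp3", ".mp4", ".css", ".js")
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}
LANG_CODES = {"de", "en", "es", "fr", "it", "ja", "nl", "pt", "ru", "zh"}


def clean_url(url: str) -> Optional[str]:
    """Lowercase scheme and host, drop the fragment and tracking parameters."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return None  # reject non-web schemes and relative URLs
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(kept), ""))


def passes_filters(url: str, lang: str = "en") -> bool:
    """Apply a spam, a content-type and a language heuristic to a cleaned URL."""
    parts = urlsplit(url)
    # Spam filter: reject URLs containing a blacklisted word anywhere.
    if any(word in url.lower() for word in SPAM_WORDS):
        return False
    # Content-type filter: reject paths that end with a non-HTML file extension.
    if parts.path.lower().endswith(NON_HTML_EXTENSIONS):
        return False
    # Language filter: reject paths whose first segment is another language code.
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0].lower() in LANG_CODES and segments[0].lower() != lang:
        return False
    return True


def sample_urls(urls: Iterable[str], per_host: int, lang: str = "en",
                seed: int = 0) -> List[str]:
    """Clean and filter URLs, then keep at most `per_host` random URLs per host."""
    rng = random.Random(seed)
    by_host = defaultdict(set)  # a set also removes duplicates after cleaning
    for url in urls:
        cleaned = clean_url(url)
        if cleaned is not None and passes_filters(cleaned, lang):
            by_host[urlsplit(cleaned).netloc].add(cleaned)
    sample: List[str] = []
    for host in sorted(by_host):
        candidates = sorted(by_host[host])
        sample.extend(rng.sample(candidates, min(per_host, len(candidates))))
    return sample


if __name__ == "__main__":
    urls = [
        "HTTPS://Example.org/en/page?utm_source=x#top",
        "https://example.org/de/seite",
        "https://example.org/img/logo.png",
        "https://casino-spam.example.net/win",
        "https://example.org/en/other",
        "ftp://example.org/file",
    ]
    print(sample_urls(urls, per_host=5))
```

Sampling per host rather than over the whole list keeps a few large sites from dominating the collection, and the fixed seed makes the sample reproducible between runs. Real-world filters would need far larger word lists and better language detection than a path-segment check.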