Search the 15.7 million websites in Google’s C4 dataset

April 22, 2023

A new search tool from the Washington Post lets you find out whether a website is part of Google’s C4 dataset. The tool can be found in the Post’s article “Inside the secret list of websites that make AI like ChatGPT sound smart.” Search Engine Land, for example, is in the dataset. @kevinschaul and @dataviz_szuyu did all the hard work of building the tool. As a reminder, C4 (which stands for Colossal Clean Crawled Corpus) is only part of the data used by Google Bard and other large language models.

The source of this news is Search Engine Land.