This page explains what is included in a release, how the site frames collection outcomes, and why sparse or inconsistent records should often be expected rather than treated as errors.
The release is aligned to a Tranco-based domain list rather than an arbitrary crawl frontier. That choice favors reproducibility and broad comparability with Internet measurement workflows. The dataset should therefore be read as "registration metadata for a fixed, ranking-style domain set at collection time," not as an exhaustive census of all registered domains.
Rows may come from WHOIS, RDAP, or a combination of the two, and the collection path is recorded in the source field. The site does not hide those distinctions because source choice matters: some registries disclose more detail via RDAP, some remain WHOIS-centric, and some responses contain policy-driven redaction that cannot be "recovered" by parsing harder.
These choices are conservative on purpose. The project favors transparency over aggressive imputation.
WHOIS and RDAP data are heterogeneous. Registrant details may be blank because of privacy protection, legal policy, registrar defaults, or registry-specific formatting. Date fields are especially prone to variation in timezone handling and formatting. Nameserver fields may include multiple values in a single cell, and registrar strings may represent the same organization with minor textual variation.
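The kinds of variation described above can be handled with small normalization helpers. The following is a minimal sketch, not the project's own pipeline: the function names, the list of date formats, the nameserver delimiters, and the set of stripped corporate suffixes are all assumptions chosen for illustration, and naive timestamps are treated as UTC purely by convention.

```python
import re
from datetime import datetime, timezone

# Hypothetical list of formats; a real release may contain others.
DATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d-%b-%Y",
)


def parse_date(raw):
    """Return an aware UTC datetime, or None if the value cannot be parsed."""
    text = (raw or "").strip()
    if not text:
        return None
    if text.endswith("Z"):
        # Rewrite the "Z" suffix as an explicit offset for %z.
        text = text[:-1] + "+0000"
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are treated as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None  # Leave unparseable or redacted values blank; do not guess.


def split_nameservers(cell):
    """Split a multi-value cell into unique lowercase hostnames, keeping order."""
    hosts = []
    for part in re.split(r"[\s,;|]+", (cell or "").strip()):
        host = part.rstrip(".").lower()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def registrar_key(name):
    """Build a loose grouping key so minor textual variants compare equal."""
    key = re.sub(r"[^a-z0-9 ]+", " ", (name or "").lower())
    key = re.sub(r"\b(inc|llc|ltd|corp|co|gmbh)\b", " ", key)
    return " ".join(key.split())
```

Returning None for values that do not parse, instead of inferring a date, keeps blanks visible as blanks. The registrar key is only a grouping aid; two strings that share a key should still be checked before being treated as the same organization.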
For that reason, the dataset works best as a starting point for analysis, not as an unquestioned source of truth. When a conclusion depends on a small subset of rows, it is often worth manually verifying examples against the relevant registry or registrar service.
Use the build time, CSV checksum, and archive checksum from meta.json when citing a release. Store the exact archive used in a project directory or lab bucket, rather than relying on the assumption that a future rebuild will match. If you enrich rows with DNS, hosting, or abuse data, version those enrichments separately so your pipeline can distinguish drift in the enrichment sources from drift in the WHOIS and RDAP records.
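A stored copy can be checked against meta.json before it is cited or reused. The sketch below assumes the checksums are SHA-256 hex digests stored under the keys `csv_sha256` and `archive_sha256`; both the algorithm and the key names are hypothetical and should be adjusted to whatever the release's meta.json actually contains.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release(meta_path, csv_path, archive_path):
    """Compare local files with the checksums recorded in meta.json."""
    meta = json.loads(Path(meta_path).read_text(encoding="utf-8"))
    # Hypothetical key names; the real meta.json may use different ones.
    checks = [
        ("csv", csv_path, meta.get("csv_sha256")),
        ("archive", archive_path, meta.get("archive_sha256")),
    ]
    # A missing key yields False, never a silent pass.
    return {label: sha256_of(path) == want for label, path, want in checks}
```

Calling `verify_release("meta.json", "release.csv", "release.zip")` (file names are placeholders) returns a dictionary with one boolean each for `csv` and `archive`. Recording that result next to the build time in project notes makes it clear which exact release an analysis used.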