FAQ

Answers to common questions about coverage, interpretation, and practical use of the dataset.

Why are some records sparse?

Because registry disclosure policies differ. Some TLDs expose detailed registrant fields, others redact them heavily, and some responses fail entirely. Empty cells should not automatically be treated as parser errors.

Does this site provide a live WHOIS API?

No. The site publishes snapshot files and documentation. The design goal is reproducible research and bulk analysis, not per-request online lookup.

Why package the file as a tar.gz archive?

The archive is smaller to distribute and easier to checksum as a release artifact. It also avoids treating a very large CSV as if it were a webpage asset to be parsed in-browser.
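As a sketch of that workflow, the snippet below checksums an archive and streams one CSV member out of it without extracting to disk. The member name "snapshot.csv" and the tiny in-memory archive are illustrative stand-ins, not the real release layout.

```python
import hashlib
import io
import tarfile

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes (e.g. a downloaded archive)."""
    return hashlib.sha256(data).hexdigest()

def read_csv_member(archive_bytes: bytes, member_name: str) -> str:
    """Stream a single CSV member out of a tar.gz archive in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        member = tar.extractfile(member_name)
        return member.read().decode("utf-8")

# Build a tiny in-memory archive so the sketch is self-contained;
# a real release would be a file on disk.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"domain,status\nexample.com,ok\n"
    info = tarfile.TarInfo(name="snapshot.csv")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
archive = buf.getvalue()

digest = sha256_of(archive)  # compare against the published checksum
csv_text = read_csv_member(archive, "snapshot.csv")
```

Verifying the digest before analysis is what makes the release artifact reproducible to cite.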

Can I use this for security investigations?

Yes, but usually as enrichment rather than a standalone verdict source. It becomes more valuable when joined with DNS, IP-to-ASN, certificate, or abuse data.
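A minimal sketch of that enrichment pattern, assuming hypothetical column names ("domain", "registrar") and a separate IP-to-ASN lookup table; the real schema may differ.

```python
# Hypothetical WHOIS rows and an enrichment table keyed by domain.
whois_rows = [
    {"domain": "example.com", "registrar": "Example Registrar"},
    {"domain": "example.org", "registrar": ""},
]
asn_by_domain = {
    "example.com": {"asn": "AS64500", "holder": "EXAMPLE-NET"},
}

def enrich(rows, lookup):
    """Left-join rows against an enrichment table, keeping unmatched
    rows so coverage gaps stay visible instead of silently dropping."""
    out = []
    for row in rows:
        extra = lookup.get(row["domain"], {})
        out.append({**row, **extra})
    return out

enriched = enrich(whois_rows, asn_by_domain)
```

Keeping the join a left join matters here: a domain with no ASN match is still evidence, and dropping it would bias the analysis.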

How should I cite a release?

Cite the build time and checksum recorded in meta.json. When a result depends on a specific snapshot, the checksum is the most precise identifier.
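A small sketch of turning that metadata into a citation string. The field names ("built_at", "sha256") are assumptions for illustration; use whatever keys the real meta.json exposes.

```python
import json

# Stand-in for the real meta.json payload.
meta = json.loads('{"built_at": "2024-05-01T00:00:00Z", "sha256": "abc123"}')

def citation(meta: dict) -> str:
    """Format a point-in-time citation from build metadata."""
    return f"snapshot built {meta['built_at']}, sha256 {meta['sha256']}"

line = citation(meta)
```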

What is the sample CSV for?

The sample helps you inspect schema and value patterns quickly without downloading the full archive. It is useful for prototyping queries, tutorials, and parser checks.
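That kind of quick inspection can be as simple as the sketch below, which reads the header and first row with the standard library. The columns shown are placeholders, not the published schema.

```python
import csv
import io

# Stand-in for the published sample file; real column names may differ.
sample = io.StringIO(
    "domain,registrar,created,status\n"
    "example.com,Example Registrar,1995-08-14,ok\n"
)

reader = csv.DictReader(sample)
columns = reader.fieldnames   # schema at a glance
first_row = next(reader)      # value patterns for prototyping parsers
```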

Why keep status and error fields?

Because a row with little metadata is not the same as a row whose collection failed. Preserving that distinction prevents misleading downstream analysis.
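A minimal sketch of acting on that distinction when filtering rows. The column names ("status", "error", "registrant") and status values are assumptions for illustration.

```python
# Hypothetical rows: an empty registrant on an "ok" row means redaction,
# while a "failed" row carries no usable metadata at all.
rows = [
    {"domain": "a.example", "status": "ok", "error": "", "registrant": ""},
    {"domain": "b.example", "status": "failed", "error": "timeout", "registrant": ""},
]

def usable_rows(rows):
    """Keep successfully collected rows; do not drop rows merely
    because their registrant fields are empty."""
    return [r for r in rows if r["status"] == "ok"]

kept = usable_rows(rows)
```

Filtering on status rather than on field emptiness is what keeps redacted-but-valid records in the analysis.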

How often is the dataset updated?

Updates follow the project’s collection schedule and the underlying input list refresh. Each release should be treated as a new point-in-time artifact rather than an in-place mutable file.