About — WHOIS Dataset

About this project

This site publishes a documented, point-in-time WHOIS and RDAP snapshot for a Tranco-aligned domain set. The goal is not to mirror every public response on the Internet, but to offer a research-ready release with stable structure, reproducible packaging, and enough explanation that others can understand what the data means and what it does not mean.

Why this is more than a raw dump

A raw WHOIS export is often only the beginning of analysis. TLDs differ in which fields they expose, registrars vary in formatting conventions, and RDAP and WHOIS collection can succeed or fail in different ways. This project adds value by presenting a consistent CSV schema, preserving collection status, publishing checksums, and documenting each release as a point-in-time snapshot rather than a vague "latest" file.

That packaging matters for research. When two analysts compare results, they need to know whether differences come from methodology, from drift in the input domain set, or from different collection dates. The dataset manifest, archive naming, and fixed release semantics are all designed to reduce that ambiguity.

Who this is for

Researchers

Use it to study registration behavior, disclosure policy differences, or longitudinal metadata change across popular domains.

Security analysts

Join it with DNS, hosting, certificate, or abuse datasets to enrich investigations and cluster infrastructure.

Students and educators

Use the sample CSV and documented columns to teach parsing, filtering, and reproducible data handling workflows.

Tool builders

Prototype lookup, normalization, and bulk analysis logic without first building a full collection pipeline.

What it is not

This dataset is not a traffic meter, a malware feed, or a live registration API. It does not claim to capture the full operational behavior of domains, and it should not be read as a definitive statement about ownership or legitimacy. Some rows are sparse because registry policy, privacy protection, or technical collection failures limit what can be observed. That is why the site keeps collection status and error information visible instead of pretending every record is equally complete.
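
As a minimal sketch of why the status information matters, the snippet below tallies rows by collection status before any analysis and then keeps only the rows that were collected successfully. The column names `domain`, `collection_status`, `registrar`, and `error`, the status value `ok`, and the sample rows are all assumptions made for illustration; the Columns page is the reference for the actual exported schema.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; real column names and status values may differ.
SAMPLE_CSV = """domain,collection_status,registrar,error
example.com,ok,Example Registrar,
example.org,ok,,
example.net,error,,timeout
"""

def status_counts(csv_text, status_column="collection_status"):
    """Count rows per collection status value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[status_column] for row in reader)

def complete_rows(csv_text, status_column="collection_status", ok_value="ok"):
    """Return only the rows whose collection succeeded."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[status_column] == ok_value]

counts = status_counts(SAMPLE_CSV)
usable = complete_rows(SAMPLE_CSV)
print(counts)
print([row["domain"] for row in usable])
```

Note that a row can be collected successfully and still be sparse, as with the hypothetical `example.org` row above, so an empty field is not by itself evidence of a collection failure.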

How to read a release responsibly

Start with the manifest, which records checksums, row counts, and build timestamps. Then read the Methodology page before using the data in a paper or report. If you need a fast preview, use the Sample page to inspect structure and common field patterns. If you are validating an extraction or preparing training material, consult the Columns page for the exported schema.

For external references, cite the build time and checksum. For internal analysis, keep a copy of the archive you actually used rather than assuming you can reconstruct a past state later.
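
One way to confirm that a stored archive is the release you intend to cite is to recompute its checksum and compare it with the manifest. The sketch below assumes a JSON manifest with hypothetical keys `sha256` and `build_time`, and it assumes SHA-256 as the checksum algorithm; the real manifest format, key names, and algorithm may differ. The demo writes throwaway files with a placeholder timestamp so that it runs on its own.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_archive(archive_path, manifest_path):
    """Compare an archive's checksum with the value recorded in the manifest.

    The manifest keys "sha256" and "build_time" are assumed names.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    actual = sha256_of(archive_path)
    return {
        "matches": actual == manifest["sha256"],
        "sha256": actual,
        "build_time": manifest["build_time"],
    }

# Self-contained demo using temporary files and a placeholder build time.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "snapshot.csv"
    archive.write_bytes(b"domain,collection_status\nexample.com,ok\n")
    manifest_file = Path(tmp) / "manifest.json"
    manifest_file.write_text(
        json.dumps({
            "sha256": sha256_of(archive),
            "build_time": "2024-01-01T00:00:00Z",  # placeholder value
        }),
        encoding="utf-8",
    )
    result = verify_archive(archive, manifest_file)
    print(result["matches"], result["build_time"])
```

Recording the returned checksum and build time alongside your analysis gives you the two values needed for a citation.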