Context on what your genome is, how scan types differ, and how Varia scans your file locally.
A human genome is a sequence of approximately 3 billion base pairs, carried on 23 pairs of chromosomes (one copy of the sequence inherited from each parent). Inside that sequence sit around 20,000 protein-coding genes, but their coding sequence accounts for only about 1.5 percent of the total. The remaining 98.5 percent is introns, regulatory sequence that controls when genes turn on, non-coding RNAs that perform their own functions, structural elements that organize the chromosome, repetitive sequence, and regions whose function the literature has not yet resolved. The genome is far more than the genes.
Within those 3 billion bases, roughly 10 million positions are commonly variable across the human population. A variant at one of these positions is what makes one person's biology slightly different from another's at the molecular level. Most variants are biologically silent. A small fraction sit in or near a gene in a way that materially changes how a protein is built, expressed, or regulated, and a smaller fraction still have been studied carefully enough that we can say what the variant means with clinical confidence.
That fraction is what Varia curates. Out of approximately 10 million common variants in the human genome, Varia covers 84 SNPs that we have evaluated against the peer-reviewed literature and judged interpretable enough to act on. Each of those 84 SNPs expands to one finding per genotype, which gives 221 insights across the 12 health domains Varia covers in V1.
This is a deliberately small number. The reason is the gap between what genomics can read and what it can interpret.
In 2003, when the first human genome was completed, sequencing one human genome had cost approximately one billion dollars and taken 13 years. By 2026, sequencing a genome costs under 200 dollars and takes a few days. That is a drop of between six and seven orders of magnitude in cost per genome over 23 years. Over the same period, clinical interpretation of what individual variants mean has grown roughly three orders of magnitude through peer-reviewed publication. The two curves are not parallel. The gap between what we can sequence and what we can interpret has widened every year since the first human genome was published, and there is no reason to expect it to close soon.
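The cost comparison can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two approximate dollar figures quoted above; the variable names are illustrative.

```python
import math

# Approximate figures quoted in the text.
cost_2003_usd = 1_000_000_000  # first human genome, completed 2003
cost_2026_usd = 200            # upper bound quoted for a genome in 2026

# How many times cheaper one genome has become, and the same
# ratio expressed in orders of magnitude (powers of ten).
cost_ratio = cost_2003_usd / cost_2026_usd
orders_of_magnitude = math.log10(cost_ratio)

print(f"cost ratio: {cost_ratio:,.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
```

A ratio of five million sits between 10^6 and 10^7, which is where the six-to-seven-orders figure comes from.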
Varia commits to the slower, harder side of that gap. The product is the editorial discipline that decides which findings cross the bar from "studied" to "actionable," not the count of variants surfaced. Promethease and similar consumer genomics products operate at the other end of the same gap, returning tens of thousands of raw associations with no curation between the user and the literature. Both approaches have their place. Varia's bet is that for most users, most of the time, the question is not "what does my genome say about every variant ever studied" but "what does my genome say that I and my physician can do something about."
The rest of the genome, everything outside those 84 SNPs, is real, and meaningful, and worth study. Varia's V2 and beyond will continue to add curated domains as the literature matures. What Varia will not do is enumerate the unstudied.
Not every scan reads the same fraction of the genome. The three formats Varia accepts cover three very different slices, and the differences matter for what Varia can and cannot tell you.
SNP arrays are the consumer chip-based format. 23andMe, AncestryDNA, and similar services use a chip that reads roughly 600,000 to 900,000 pre-selected positions across the genome. That is approximately 0.02 to 0.03 percent of the 3 billion bases. The selected positions are chosen by the chip designer for population-level variant coverage, ancestry inference, and known clinical relevance, which means an SNP array reads what its designer thought was worth reading. If a variant matters but was not on the chip, the chip cannot detect it. Most consumer genomic findings published before 2020 use chip-based data, and most consumer products today still rely on it.
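The percentage quoted for SNP arrays follows directly from the position counts. A minimal sketch, assuming the round 3-billion-base genome size used throughout this page; the function name is illustrative.

```python
GENOME_BASES = 3_000_000_000  # approximate genome size used in the text


def percent_of_genome(positions: int, genome_bases: int = GENOME_BASES) -> float:
    """Return the share of the genome that `positions` represents, in percent."""
    return 100 * positions / genome_bases


# Range of positions read by a typical consumer chip, as quoted above.
low = percent_of_genome(600_000)
high = percent_of_genome(900_000)

print(f"{low:.2f}% to {high:.2f}% of the genome")
```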
Whole exome sequencing (WES) sequences only the protein-coding portion of the genome. That is approximately 1 to 2 percent of the genome by base count, but it captures the regions where the largest fraction of disease-relevant variants have been characterized. WES sequences exhaustively within the coding regions it targets: in principle, any variant in a targeted exon shows up in the data, whether the variant is rare or common, known or novel. The trade-off is that WES misses variants in regulatory sequence, in introns, and in the vast non-coding regions where the literature is still actively maturing.
Whole genome sequencing (WGS) at 30x coverage reads the full 3 billion bases, with each position covered by an average of 30 independent reads to give statistical confidence in the call. WGS captures common variants, rare variants, structural variants, and novel sites that have never been catalogued. It is the most complete consumer-accessible scan format and is the format Varia is designed around. Clinical-grade WGS labs deliver results as VCF files (Variant Call Format), which Varia ingests directly.
The VCF format has one consequential subtlety: a variants-only VCF lists only positions where the user differs from the reference human genome. An all-positions VCF lists every position, including positions where the user matches reference. Most clinical labs return variants-only files. The implication is that for any position critical to a Varia finding, the absence of the position in a variants-only file is almost always "tested, homozygous reference," not "missing data." Conflating those two is the most common failure mode in naive scanner implementations, and the one Varia is designed to handle correctly. The detail of how this works is in Section 3.
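To make the difference concrete, here is a minimal sketch of reading the genotype from a single VCF data line. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then samples). The two records are illustrative, not taken from a real file, and the function name is hypothetical rather than part of Varia's scanner.

```python
def genotype_alleles(vcf_line: str) -> tuple:
    """Return the called alleles for the first sample in a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, fmt, sample = fields[3], fields[4], fields[8], fields[9]

    # Allele index 0 is REF; indexes 1..n are the comma-separated ALT alleles.
    alleles = [ref] + ([] if alt == "." else alt.split(","))

    # The FORMAT column says where GT sits inside the sample column.
    gt_index = fmt.split(":").index("GT")
    gt = sample.split(":")[gt_index]

    # "/" marks unphased calls, "|" phased calls; "." is a no-call.
    calls = gt.replace("|", "/").split("/")
    return tuple("." if c == "." else alleles[int(c)] for c in calls)


# A heterozygous record: present in both variants-only and all-positions files.
variant_record = "19\t44908684\trs429358\tT\tC\t50\tPASS\t.\tGT:DP\t0/1:31"

# A homozygous-reference record: present only in an all-positions file.
reference_record = "19\t44908822\trs7412\tC\t.\t50\tPASS\t.\tGT:DP\t0/0:30"

print(genotype_alleles(variant_record))
print(genotype_alleles(reference_record))
```

In a variants-only file the second record would simply not exist, which is why its absence has to be interpreted rather than read.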
Coverage is not the same as quality. A higher-coverage scan gives Varia more raw signal to interpret, but it does not change what the literature actually supports. Varia's curation discipline applies the same standard regardless of the scan format: a finding is only surfaced when the underlying variant has been studied carefully enough that the interpretation crosses the editorial bar. Higher coverage means fewer positions get returned as "not tested in your file." It does not change the count of findings the literature supports.
A scan file is not a finished interpretation. The same raw data, processed through different assumptions, can produce different findings. Varia's reading discipline is built around the cases consumer genetic data trips over most often.
Compound diplotype handling. Some clinically meaningful results depend on more than one variant working in combination. APOE genotype is the canonical example: determining which of the three APOE alleles (ε2, ε3, ε4) a chromosome carries requires reading both rs429358 and rs7412, and the clinically relevant genotype (ε3/ε3, ε3/ε4, ε4/ε4, ε2/ε3, etc.) is the pair of alleles, not the individual SNPs. Varia explicitly builds the compound diplotype when both contributing variants are present, and surfaces a partial-confidence warning when only one is. A scanner that returned individual SNP results without composing the diplotype would miss the operationally relevant finding.
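The composition step can be sketched as a lookup over the two genotypes. This is an illustration under stated assumptions, not Varia's implementation: genotypes are unphased, two-letter, plus-strand strings; the rare ε1 allele is ignored; and the function name and the `confidence` labels are hypothetical.

```python
from typing import Optional

# Plus-strand genotypes at (rs429358, rs7412) and the APOE diplotype they imply.
# ε2 = T at rs429358 + T at rs7412, ε3 = T + C, ε4 = C + C.
# The double heterozygote is read as ε2/ε4; without phasing it cannot be
# distinguished from the very rare ε1/ε3, which this sketch ignores.
APOE_DIPLOTYPES = {
    ("TT", "TT"): "ε2/ε2",
    ("TT", "CT"): "ε2/ε3",
    ("TT", "CC"): "ε3/ε3",
    ("CT", "CT"): "ε2/ε4",
    ("CT", "CC"): "ε3/ε4",
    ("CC", "CC"): "ε4/ε4",
}


def _normalize(genotype: str) -> str:
    """Sort alleles so that 'TC' and 'CT' compare equal."""
    return "".join(sorted(genotype.upper()))


def apoe_diplotype(rs429358: Optional[str], rs7412: Optional[str]) -> dict:
    """Compose the APOE diplotype, or flag why it cannot be composed."""
    if rs429358 is None and rs7412 is None:
        return {"diplotype": None, "confidence": "none"}
    if rs429358 is None or rs7412 is None:
        # Only one of the two contributing SNPs is available.
        return {"diplotype": None, "confidence": "partial"}
    key = (_normalize(rs429358), _normalize(rs7412))
    diplotype = APOE_DIPLOTYPES.get(key)
    if diplotype is None:
        # Combination would require the rare ε1 allele or is malformed.
        return {"diplotype": None, "confidence": "unresolved"}
    return {"diplotype": diplotype, "confidence": "full"}


print(apoe_diplotype("TC", "CC"))
print(apoe_diplotype("TT", None))
```

Neither SNP alone separates ε3/ε4 from ε2/ε4, for example; the answer only exists once both calls are read together.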
Reference inference. Clinical-grade WGS labs often return "variants-only" VCF files: the file lists positions where the user differs from the reference human genome, and silently omits positions where the user matches reference. For a position critical to a finding (rs7412 for APOE, for example), a missing position in a variants-only file is almost always homozygous reference, not missing data. Varia distinguishes the two cases at the format-detection layer: variants-only VCFs default absent positions to "tested, homozygous reference"; chip files default absent positions to "not tested in your file." The distinction matters because "you don't have the risk variant" and "we couldn't tell whether you have the risk variant" are different answers. Conflating them is the false-reassurance failure mode consumer genetic testing is most criticized for.
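The format-aware default described above can be sketched as follows. The `FileKind` enum, the `resolve_call` function, and the status strings are hypothetical names chosen for illustration. Treating an absent position in an all-positions VCF as "not tested" is an assumption of this sketch, since the text only specifies the variants-only and chip cases.

```python
from enum import Enum
from typing import Dict


class FileKind(Enum):
    VARIANTS_ONLY_VCF = "variants_only_vcf"
    ALL_POSITIONS_VCF = "all_positions_vcf"
    CHIP_EXPORT = "chip_export"


def resolve_call(rsid: str, ref_allele: str,
                 calls: Dict[str, str], kind: FileKind) -> dict:
    """Look up a genotype, applying the file-format default when it is absent."""
    genotype = calls.get(rsid)
    if genotype is not None:
        return {"status": "called", "genotype": genotype}
    if kind is FileKind.VARIANTS_ONLY_VCF:
        # Absent from a variants-only file: tested, homozygous reference.
        return {"status": "inferred_reference", "genotype": ref_allele * 2}
    # Chip exports (and, by assumption here, all-positions VCFs):
    # an absent position was not tested in this file.
    return {"status": "not_tested", "genotype": None}


calls = {"rs429358": "CT"}  # rs7412 is absent from the file

print(resolve_call("rs7412", "C", calls, FileKind.VARIANTS_ONLY_VCF))
print(resolve_call("rs7412", "C", calls, FileKind.CHIP_EXPORT))
```

The same missing rsID produces two different answers depending on the file kind, which is the distinction between "you don't have the risk variant" and "we couldn't tell."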
Multi-format support. Consumer genetic data comes in several flavors: 23andMe chip exports (v3, v4, v5 generations with different position coverage), AncestryDNA exports, and clinical WGS VCFs (GRCh37 or GRCh38 genome builds, single-sample or multi-sample, variants-only or all-positions). Each format requires its own parsing logic, build detection, position resolution (via rsID where present, coordinate when not), and strand-orientation normalization. Varia's scanner handles each format with format-aware defaults: build detection uses VCF header signals (reference line, contig lengths) when available; rsIDs are the canonical lookup key with coordinate fallback; strand orientation is normalized to plus-strand at parse time. Files that lack both detectable build signals and rsIDs are rejected with a fail-loud message rather than silently miscalled.
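Two of these steps lend themselves to a short sketch: build detection from VCF header contig lengths, and strand normalization. This is one plausible approach, not Varia's actual code. It uses the published chromosome 1 lengths for GRCh37 and GRCh38, and all function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Chromosome 1 length differs between builds, so it works as a fingerprint.
CHR1_LENGTHS = {249250621: "GRCh37", 248956422: "GRCh38"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def detect_build(header_lines: Iterable[str]) -> Optional[str]:
    """Infer the genome build from a '##contig' header line for chromosome 1."""
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        # Match ID=1 or ID=chr1, but not chr10, chr11, and so on.
        if re.search(r"ID=(?:chr)?1[,>]", line):
            match = re.search(r"length=(\d+)", line)
            if match:
                return CHR1_LENGTHS.get(int(match.group(1)))
    return None


def to_plus_strand(genotype: str, reported_on_minus: bool) -> str:
    """Complement each allele when the source reports the minus strand."""
    return genotype.translate(COMPLEMENT) if reported_on_minus else genotype


def check_resolvable(build: Optional[str], has_rsids: bool) -> None:
    """Fail loudly when positions cannot be resolved by build or by rsID."""
    if build is None and not has_rsids:
        raise ValueError("Cannot resolve positions: no build signal and no rsIDs.")


header = ["##fileformat=VCFv4.2", "##contig=<ID=chr1,length=248956422>"]
print(detect_build(header))
print(to_plus_strand("AG", reported_on_minus=True))
```

Raising an error in `check_resolvable` rather than guessing a build mirrors the fail-loud behavior described above: a wrong build silently shifts coordinates and miscalls positions.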
These three mechanics are what separate a defensible variant interpretation from a partial reading. The locked editorial discipline in /editorial-standards describes what Varia chooses to surface; this page describes how it gets there.
How pharmacogenomic variants surface in a Varia scan is covered on Medication Response.
Definitions for the genetics vocabulary Varia uses across the site. These are the same terms surfaced as inline tooltips throughout the site, collected here as a single reference.