
Bustabit RNG Consistency Report: 100 Million Rounds, Two Halves, One Verdict

Most auditors check a casino once and move on. We split Bustabit's entire 100-million-round hash chain into two halves and asked: is the RNG consistent from start to finish? Eleven statistical tests. One answer.

The Ex Pit Boss

18 Jun 2026 • 4 min read

Most provably fair audits check a casino once and call it a day. A green checkmark, a thumbs-up, move on.

We don't work like that.

This is the first published longitudinal consistency report for any crypto casino. We took Bustabit's entire 100-million-round Era 3 hash chain, split it into two halves — 50 million older games vs. 50 million newer games — and asked one question:

Does the RNG behave the same way across the entire chain, or did something change?

If a casino manipulates its RNG partway through a hash chain — even subtly — the statistical fingerprint shifts. Our 11-test comparison framework is designed to catch exactly that.

What We Tested

Detail	Value
Casino	Bustabit (Era 3)
Game	Crash
Total rounds	100,000,000
Algorithm	HMAC-SHA256, hash chain (raw bytes)
Salt	Bitcoin block hash (`...1e08b7fd...`)
Seeding event	BitcoinTalk #5560454
Sample size per half	500,000 (every 100th round)
Phase A	Games 50,000,001–100,000,000 (older half)
Phase B	Games 1–50,000,000 (newer half)

Original audit: Bustabit Provably Fair Audit Report — 100,000,000 Rounds Analyzed

Why This Matters

A standard provably fair audit verifies that individual hashes are correct. That's table stakes — any calculator can do it.

A consistency report answers a harder question: Is the distribution stable over time?

Here's why that matters:

A casino could use a perfectly valid HMAC-SHA256 algorithm but choose a biased salt for one segment of the chain
A hash chain could be partially regenerated after a seeding event if the operator controls the infrastructure
Subtle distribution drift (e.g., 0.5% fewer high multipliers in newer games) would be invisible to single-hash verification but detectable in aggregate statistics

Our comparison module runs 11 statistical tests, each attacking the problem from a different mathematical angle. If the RNG's output distribution shifted materially between the two halves of the chain, at least one test should flag it.

Results: 11/11 Tests Passed

Test	Result	Statistic	p-Value	Interpretation
Two-Sample Kolmogorov-Smirnov	✅ PASS	D = 0.00233	0.132	Distributions are statistically identical
Chi-Square Homogeneity	✅ PASS	χ² = 110.06	0.210	Bin distributions are homogeneous
Jensen-Shannon Divergence	✅ PASS	JSD = 0.0001 bits	—	Distributions are virtually identical
Welch's t-Test	✅ PASS	t = -1.167	0.243	Means are statistically equal
F-Test (Variance Ratio)	✅ PASS	F = 1.002	0.425	Variances are statistically equal
Cohen's d (Effect Size)	✅ PASS	d = -0.002	—	Negligible — no practical difference
Bhattacharyya Distance	✅ PASS	BD = 0.00006	—	Distributions are virtually identical
Mann-Whitney U	✅ PASS	z = -1.166	0.243	Rank distributions are identical
Anderson-Darling	✅ PASS	A² = 1.318	0.100	Tail distributions are identical
Cramér-von Mises	✅ PASS	T = 0.046	0.901	Integral CDF distance negligible
Energy Distance	✅ PASS	E = 0.0000255	—	Distributions are identical

Consistency Score: 9.2 / 10 — CONSISTENT

Verdict: RNG output is statistically indistinguishable across both halves of the chain. No evidence of manipulation, degradation, or behavioral change.

What the Numbers Mean

Kolmogorov-Smirnov (p = 0.132)

The KS test compares the full cumulative distributions of both halves. A p-value of 0.132 means that two samples drawn from the same distribution would show a difference at least this large about 13.2% of the time. That is unremarkable. We'd only flag at p < 0.01.
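To make the mechanics concrete, here is a minimal sketch of what a two-sample KS test computes: the largest gap between the two empirical CDFs, converted to an asymptotic p-value with the Kolmogorov series. The function names `ks_statistic` and `ks_pvalue` are hypothetical and are not taken from our comparator source; this is an illustration of the technique, not the production code.

```python
import math

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both ECDFs past every observation equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.05:
        # The alternating series converges poorly here; p is effectively 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))
```

Plugging in the reported values, `ks_pvalue(0.00233, 500_000, 500_000)` comes out at roughly 0.13, in line with the p-value in the results table.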

Jensen-Shannon Divergence (0.0001 bits)

JSD measures how much information you'd need to distinguish one distribution from the other. At 0.0001 bits, the answer is: virtually none. For reference, our red flag threshold is 0.01 — this result is 100x below the alarm level.
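A rough sketch of a histogram-based JSD in bits follows. The bin count of 100 and the function name `jensen_shannon_bits` are assumptions made for illustration; this report does not state the binning used, and the result depends on it.

```python
import math

def jensen_shannon_bits(a, b, bins=100):
    """Jensen-Shannon divergence (log base 2) between histograms of two
    samples whose values lie in [0, 1)."""
    def histogram(sample):
        counts = [0] * bins
        for u in sample:
            # Clamp so that a value of exactly 1.0 cannot overflow the last bin.
            counts[min(int(u * bins), bins - 1)] += 1
        return [c / len(sample) for c in counts]

    p, q = histogram(a), histogram(b)
    jsd = 0.0
    for pi, qi in zip(p, q):
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd
```

With base-2 logarithms the value is bounded between 0 (identical histograms) and 1 bit (histograms with no overlap at all), which is what makes a fixed threshold such as 0.01 meaningful.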

Cohen's d (-0.002)

Effect size tells you whether a difference is practically meaningful, even if statistically detectable. At d = -0.002, the means of the two halves differ by about 0.2% of one standard deviation. That is negligible: you'd need millions of bets to even notice it in your bankroll.
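As an illustration, Cohen's d is the difference in means divided by the pooled standard deviation. For two equal-sized groups it is also tied to the t statistic by d = t · sqrt(2 / n), which gives a quick cross-check between two rows of the results table. The helper names below are hypothetical.

```python
import math
import statistics

def cohens_d(a, b):
    """Difference in means divided by the pooled standard deviation."""
    n, m = len(a), len(b)
    pooled_var = ((n - 1) * statistics.variance(a)
                  + (m - 1) * statistics.variance(b)) / (n + m - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)

def d_from_t(t, n_per_group):
    """Effect size implied by a t statistic when both groups have n samples."""
    return t * math.sqrt(2.0 / n_per_group)
```

With the reported t = -1.167 and 500,000 samples per half, `d_from_t(-1.167, 500_000)` is about -0.0023, which rounds to the d = -0.002 shown in the table.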

Mann-Whitney U (p = 0.243)

A non-parametric rank-based test that compares whether one distribution tends to produce higher values than the other. Unlike the t-test, it makes no assumptions about the shape of the distribution. p = 0.243 means the ranks are indistinguishable.
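The rank-based idea can be sketched as follows: pool both samples, rank them (averaging ranks across ties), and compare the rank sum of one sample with what chance alone would produce. This is a simplified version using the normal approximation without a tie correction, and the function names are hypothetical.

```python
import math

def mann_whitney_z(a, b):
    """z-score of the Mann-Whitney U statistic for sample a versus sample b
    (normal approximation, no tie correction)."""
    n, m = len(a), len(b)
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n * (n + 1) / 2.0
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    return (u - mean_u) / sd_u

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A negative z means the first sample tends to rank lower than the second. `two_sided_p(-1.166)` evaluates to about 0.24, consistent with the table once rounding of z is taken into account.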

Anderson-Darling (p = 0.100)

Similar to KS but with extra sensitivity at the tails of the distribution, exactly where manipulation would show up first. A² = 1.318 (p = 0.100) gives no evidence of a difference at conventional significance levels.

Cramér-von Mises (p = 0.901)

Measures the integral of the squared difference between the two CDFs. More sensitive than KS to small, distributed shifts. p = 0.901 is about as clean as it gets.
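A minimal sketch of the two-sample statistic, assuming the standard form in which the squared ECDF gap is summed over every pooled observation and scaled by nm / (n + m)². Whether our comparator uses exactly this normalization is not stated here, so treat the function below (hypothetical name) as an illustration of the technique only.

```python
from bisect import bisect_right

def cramer_von_mises_t(a, b):
    """Two-sample Cramer-von Mises statistic: squared gap between the two
    empirical CDFs, summed over all pooled observations."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    total = 0.0
    for z in a + b:
        # ECDF value = fraction of the sample that is <= z.
        gap = bisect_right(a, z) / n - bisect_right(b, z) / m
        total += gap * gap
    return total * n * m / (n + m) ** 2
```

Under this normalization the statistic averages roughly 1/6 when both samples come from the same distribution, so a value well below that, such as 0.046, corresponds to a high p-value.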

Energy Distance (E = 0.0000255)

A modern test (Székely & Rizzo, 2004) based on pairwise distances between samples. It can detect any type of distributional difference: location, scale, or shape. At 0.0000255, the distance is essentially zero.
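For one-dimensional data the statistic can be sketched directly from its definition, E = 2·E|X − Y| − E|X − X'| − E|Y − Y'|. The brute-force version below is quadratic in sample size, so it is only practical on small subsamples, and the function name is hypothetical. Note that some libraries report the square root of this quantity instead; which convention the value above follows is not stated in this report.

```python
def energy_distance(a, b):
    """Squared energy distance between two 1-D samples, computed from
    mean absolute pairwise differences (O(n*m) time)."""
    def mean_abs_diff(x, y):
        return sum(abs(xi - yj) for xi in x for yj in y) / (len(x) * len(y))

    return (2.0 * mean_abs_diff(a, b)
            - mean_abs_diff(a, a)
            - mean_abs_diff(b, b))
```

The result is zero when the two samples are identical and grows as the samples separate in location, spread, or shape.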

How We Did It

Collected the full chain: Starting from the most recent game hash, we walked Bustabit's Era 3 hash chain backward through all 100,000,000 rounds. Each crash point was computed using the exact Bustabit algorithm (HMAC-SHA256 → 52-bit seed → crash formula).
Sampled uniformly: Every 100th round was recorded as a raw float (the uniform [0,1) value before crash conversion), giving us 1,000,000 representative samples.
Split the chain: First 500,000 samples (older 50M games) = Phase A. Last 500,000 samples (newer 50M games) = Phase B.
Ran the comparison: Our 11-test consistency framework compared the statistical fingerprints of both halves.

Total compute time: 410 seconds for collection + under 5 seconds for the consistency analysis.
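The collection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a copy of the collector: it assumes the previous game's hash is the SHA-256 of the current hash's raw bytes, that the salt string is the HMAC key, and that the crash formula takes the commonly published form floor(99 / (1 − u)) / 100 with a floor of 1.00x. The function names, the placeholder hash, and the placeholder salt in the usage note are hypothetical; check the official Bustabit verifier before relying on any of this.

```python
import hashlib
import hmac
import math

def previous_hash(game_hash_hex):
    """Assumed chain step: prior hash = SHA-256 of the current hash's raw bytes."""
    return hashlib.sha256(bytes.fromhex(game_hash_hex)).hexdigest()

def uniform_from_hash(game_hash_hex, salt):
    """HMAC-SHA256 keyed by the salt (assumption); top 52 bits scaled to [0, 1)."""
    digest = hmac.new(salt.encode("utf-8"),
                      bytes.fromhex(game_hash_hex),
                      hashlib.sha256).hexdigest()
    return int(digest[:13], 16) / 2 ** 52  # 13 hex digits = 52 bits

def crash_point(u):
    """Assumed crash formula with a 1% house edge and a 1.00x minimum."""
    return max(1.0, math.floor(99.0 / (1.0 - u)) / 100.0)

def sample_chain(latest_hash_hex, salt, rounds, step=100):
    """Walk backward from the latest hash, keeping the uniform value of
    every step-th round."""
    samples = []
    h = latest_hash_hex
    for i in range(rounds):
        if i % step == 0:
            samples.append(uniform_from_hash(h, salt))
        h = previous_hash(h)
    return samples
```

For example, `sample_chain("00" * 32, "placeholder-salt", 1000)` returns 10 values in [0, 1); feeding any of them to `crash_point` gives the multiplier for that round under the assumed formula.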

Methodology

Full details on our testing framework, including NIST SP 800-22, PractRand, and TestU01 integration: fairplayaudit.com/methodology

Open Source

Every tool we used is publicly available. Verify our results yourself:

Scorecard Tool: github.com/GuidoHam/provably-fair-audit
Consistency Comparator: src/consistency-comparator.js
Bustabit Collector: src/bustabit-collector.js

What This Report Does NOT Tell You

Let's be clear about the limits:

This report confirms statistical consistency across the hash chain. It does not guarantee future behavior if Bustabit switches to a new chain or algorithm.
Provably fair means the outcomes are verifiable. It does not mean the house edge disappears. Bustabit's 1% house edge is mathematically baked in — that's by design, not manipulation.
Consistency ≠ profitability. The RNG is fair. The math still favors the house. Those are two separate facts.
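To see why the edge is built into the math, here is a short worked example. It assumes the commonly published crash formula, crash = max(1, floor(99 / (1 − U)) / 100) with U uniform on [0, 1), which this report does not restate. Under that assumption, the chance of reaching a cash-out target m above 1.00x (in steps of 0.01) is 0.99 / m, so the expected return is 0.99 per unit staked regardless of the target. The function names are hypothetical.

```python
from fractions import Fraction

def survival_probability(target):
    """P(crash >= target) under the assumed formula, for a target that is a
    multiple of 0.01, e.g. Fraction(2) or Fraction(101, 100)."""
    target = Fraction(target)
    if target <= 1:
        return Fraction(1)  # every round reaches at least 1.00x
    return Fraction(99, 100) / target

def expected_return(target):
    """Expected payout per unit staked when always cashing out at target."""
    return Fraction(target) * survival_probability(target)
```

For any target above 1.00x, `expected_return` gives 99/100: a fair RNG and a 1% house edge coexist without contradiction.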

The Bigger Picture: Why Longitudinal Monitoring Matters

Anyone can audit a casino once. We plan to do it continuously.

This Bustabit consistency report is the first in a series. Our roadmap:

Multi-casino baselines: CryptoGames, Wolf.bet, Bitsler, and more
Weekly automated monitoring: Fresh data, fresh comparisons, anomaly alerts
Public watchlist: A live dashboard showing which casinos are consistent — and which aren't

Other auditors tell you a casino was fair on audit day. We tell you it's still fair today.

Report ID: FPA-CONSISTENCY-1781762933375
Generated: June 18, 2026
Auditor: FairPlay Audit (fairplayaudit.com)
Framework: NIST SP 800-22 + PractRand + TestU01 + Custom Consistency Module