Bustabit RNG Consistency Report: 100 Million Rounds, Two Halves, One Verdict
Most auditors check a casino once and move on. We split Bustabit's entire 100-million-round hash chain into two halves and asked: Is the RNG consistent from start to finish? Eleven statistical tests. One answer.
Most provably fair audits check a casino once and call it a day. A green checkmark, a thumbs-up, move on.
We don't work like that.
This is the first published longitudinal consistency report for any crypto casino. We took Bustabit's entire 100-million-round Era 3 hash chain, split it into two halves — 50 million older games vs. 50 million newer games — and asked one question:
Does the RNG behave the same way across the entire chain, or did something change?
If a casino manipulates its RNG partway through a hash chain — even subtly — the statistical fingerprint shifts. Our 11-test comparison framework is designed to catch exactly that.
What We Tested
| Detail | Value |
|---|---|
| Casino | Bustabit (Era 3) |
| Game | Crash |
| Total rounds | 100,000,000 |
| Algorithm | HMAC-SHA256, hash chain (raw bytes) |
| Salt | Bitcoin block hash (...1e08b7fd...) |
| Seeding event | BitcoinTalk #5560454 |
| Sample size per half | 500,000 (every 100th round) |
| Phase A | Games 50,000,001–100,000,000 (older half) |
| Phase B | Games 1–50,000,000 (newer half) |
Original audit: Bustabit Provably Fair Audit Report — 100,000,000 Rounds Analyzed
Why This Matters
A standard provably fair audit verifies that individual hashes are correct. That's table stakes — any calculator can do it.
A consistency report answers a harder question: Is the distribution stable over time?
Here's why that matters:
- A casino could use a perfectly valid HMAC-SHA256 algorithm but choose a biased salt for one segment of the chain
- A hash chain could be partially regenerated after a seeding event if the operator controls the infrastructure
- Subtle distribution drift (e.g., 0.5% fewer high multipliers in newer games) would be invisible to single-hash verification but detectable in aggregate statistics
Our comparison module runs 11 statistical tests, each attacking the problem from a different mathematical angle. If the RNG changed behavior anywhere in the chain by a detectable margin, at least one test should flag it.
Results: 11/11 Tests Passed
| Test | Result | Statistic | p-Value | Interpretation |
|---|---|---|---|---|
| Two-Sample Kolmogorov-Smirnov | ✅ PASS | D = 0.00233 | 0.132 | Distributions are statistically identical |
| Chi-Square Homogeneity | ✅ PASS | χ² = 110.06 | 0.210 | Bin distributions are homogeneous |
| Jensen-Shannon Divergence | ✅ PASS | JSD = 0.0001 bits | — | Distributions are virtually identical |
| Welch's t-Test | ✅ PASS | t = -1.167 | 0.243 | Means are statistically equal |
| F-Test (Variance Ratio) | ✅ PASS | F = 1.002 | 0.425 | Variances are statistically equal |
| Cohen's d (Effect Size) | ✅ PASS | d = -0.002 | — | Negligible — no practical difference |
| Bhattacharyya Distance | ✅ PASS | BD = 0.00006 | — | Distributions are virtually identical |
| Mann-Whitney U | ✅ PASS | z = -1.166 | 0.243 | Rank distributions are identical |
| Anderson-Darling | ✅ PASS | A² = 1.318 | 0.100 | Tail distributions are identical |
| Cramér-von Mises | ✅ PASS | T = 0.046 | 0.901 | Integral CDF distance negligible |
| Energy Distance | ✅ PASS | E = 0.0000255 | — | Distributions are identical |
Consistency Score: 9.2 / 10 — CONSISTENT
Verdict: RNG output is statistically indistinguishable across both halves of the chain. The tests found no evidence of manipulation, degradation, or behavioral change.
What the Numbers Mean
Kolmogorov-Smirnov (p = 0.132)
The KS test compares the full cumulative distributions of both halves. A p-value of 0.132 means that two samples drawn from the same distribution would show a difference at least this large about 13.2% of the time. That is unremarkable. We'd only flag at p < 0.01.
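As a rough cross-check, the reported p-value can be approximately recovered from D and the two sample sizes with the asymptotic Kolmogorov series. The sketch below is a minimal, standard-library-only illustration and not necessarily what the comparator does internally. The function names `ks_statistic` and `ks_p_value` are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value for a two-sample KS statistic."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The alternating series converges poorly here; the true value is ~1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


# Plugging in the reported D with 500,000 samples per half.
print(round(ks_p_value(0.00233, 500_000, 500_000), 3))
```

With the rounded D from the table, this lands at roughly 0.132, in line with the reported p-value. The asymptotic formula is an approximation, so small differences from an exact implementation are expected.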
Jensen-Shannon Divergence (0.0001 bits)
JSD measures how much information you'd need to distinguish one distribution from the other. At 0.0001 bits, the answer is: virtually none. For reference, our red flag threshold is 0.01 — this result is 100x below the alarm level.
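For readers who want to see the calculation, here is a minimal sketch of Jensen-Shannon divergence in bits between two histograms that share the same bins. The binning itself (how many bins, where the edges sit) is not specified in this report, so the inputs here are assumed to be pre-binned counts, and `jensen_shannon_bits` is a hypothetical name.

```python
import math


def jensen_shannon_bits(counts_p, counts_q):
    """JSD in bits between two histograms defined over the same bins."""
    total_p, total_q = sum(counts_p), sum(counts_q)
    p = [c / total_p for c in counts_p]
    q = [c / total_q for c in counts_q]
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Empty bins contribute nothing; y_i > 0 whenever x_i > 0 because
        # y is the mixture of both histograms.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


print(jensen_shannon_bits([5, 5], [5, 5]))  # identical histograms
print(jensen_shannon_bits([1, 0], [0, 1]))  # completely disjoint histograms
```

Identical histograms give 0 bits and fully disjoint ones give 1 bit, which is the scale on which 0.0001 should be read.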
Cohen's d (-0.002)
Effect size tells you whether a difference is practically meaningful, even if statistically detectable. At d = -0.002, the difference is negligible — you'd need millions of bets to even notice it in your bankroll.
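A minimal sketch of the effect-size calculation, assuming the usual pooled-standard-deviation definition of Cohen's d (the report does not state which variant the comparator uses). For two equal-sized samples, d also follows directly from the Welch t statistic as d = t · sqrt(2/n), which gives a quick consistency check against the results table. Both function names are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    n, m = len(a), len(b)
    pooled_var = ((n - 1) * variance(a) + (m - 1) * variance(b)) / (n + m - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def d_from_welch_t(t, n_per_group):
    """For equal group sizes, d = t * sqrt(2 / n)."""
    return t * math.sqrt(2 / n_per_group)


print(cohens_d([1, 2, 3], [2, 3, 4]))               # toy data, one SD apart
print(round(d_from_welch_t(-1.167, 500_000), 4))    # from the reported t
```

Using the reported t = -1.167 with 500,000 samples per half gives d of about -0.0023, consistent with the rounded -0.002 in the table.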
Mann-Whitney U (p = 0.243)
A non-parametric rank-based test that compares whether one distribution tends to produce higher values than the other. Unlike the t-test, it makes no assumptions about the shape of the distribution. p = 0.243 means the ranks are indistinguishable.
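The rank comparison can be sketched as follows, assuming the standard normal approximation for U without a tie correction (reasonable for continuous values, where ties are rare). This is an illustration only; `mann_whitney_z` and `two_sided_p` are hypothetical names.

```python
import math


def mann_whitney_z(a, b):
    """z score of the Mann-Whitney U statistic (normal approximation)."""
    n, m = len(a), len(b)
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n * (n + 1) / 2
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z):
    """Two-sided p-value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


print(mann_whitney_z([1, 4], [2, 3]))  # perfectly interleaved samples
print(two_sided_p(-1.166))             # the reported z score
```

Feeding the reported z = -1.166 into `two_sided_p` gives a value of roughly 0.24, in line with the table.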
Anderson-Darling (p = 0.100)
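Similar to KS but with extra sensitivity at the tails of the distribution, which is where manipulation of rare outcomes would likely show up first. A² = 1.318 (p = 0.100) is the lowest p-value in the table, but it is still not significant at the conventional 0.05 level.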
Cramér-von Mises (p = 0.901)
Measures the integral of the squared difference between the two CDFs. More sensitive than KS to small, distributed shifts. p = 0.901 is about as clean as it gets.
Energy Distance (E = 0.0000255)
A modern test (Székely & Rizzo, 2004) based on pairwise distances between samples. Can detect any type of distributional difference — location, scale, or shape. At 0.0000255, the distance is essentially zero.
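In one dimension the statistic reduces to three averages of absolute differences. The sketch below uses the plain definition 2·E|X−Y| − E|X−X'| − E|Y−Y'| with a naive quadratic-time loop, so it is only practical for small samples. The normalization used by the comparator is not stated in this report, and some libraries report the square root of this quantity instead, so treat the scale as an assumption. `energy_distance` is a hypothetical name.

```python
def energy_distance(a, b):
    """Energy distance between two 1-D samples (naive O(n*m) version)."""

    def mean_abs_diff(x, y):
        # Average |x_i - y_j| over every pair, including i == j.
        return sum(abs(xi - yj) for xi in x for yj in y) / (len(x) * len(y))

    return 2 * mean_abs_diff(a, b) - mean_abs_diff(a, a) - mean_abs_diff(b, b)


print(energy_distance([0, 1], [0, 1]))  # identical samples
print(energy_distance([0, 0], [1, 1]))  # fully separated samples
```

Identical samples give exactly zero, and the value grows as the samples separate in location, scale, or shape.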
How We Did It
- Collected the full chain: Starting from the most recent game hash, we walked Bustabit's Era 3 hash chain backward through all 100,000,000 rounds. Each crash point was computed using the exact Bustabit algorithm (HMAC-SHA256 → 52-bit seed → crash formula).
- Sampled uniformly: Every 100th round was recorded as a raw float (the uniform [0,1) value before crash conversion), giving us 1,000,000 representative samples.
- Split the chain: First 500,000 samples (older 50M games) = Phase A. Last 500,000 samples (newer 50M games) = Phase B.
- Ran the comparison: Our 11-test consistency framework compared the statistical fingerprints of both halves.
Total compute time: 410 seconds for collection + under 5 seconds for the consistency analysis.
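The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the collector itself:

- Each link in the chain is taken to be SHA-256 over the raw hash bytes, following the "raw bytes" note in the table.
- The HMAC is keyed with the salt string, with the hash bytes as the message.
- The crash formula is assumed to be floor(99 / (1 − X)) / 100 with a floor of 1.00.

The starting hash, the salt string, and the sampling offset below are placeholders, and all function names are hypothetical.

```python
import hashlib
import hmac
import math

N_BITS = 52  # number of most significant HMAC bits kept as the seed


def previous_hash(game_hash_hex):
    """Step one game back (assumed link rule: SHA-256 over the raw hash bytes)."""
    return hashlib.sha256(bytes.fromhex(game_hash_hex)).hexdigest()


def uniform_from_hash(game_hash_hex, salt):
    """HMAC-SHA256 keyed with the salt, then the top 52 bits scaled into [0, 1)."""
    digest = hmac.new(salt.encode(), bytes.fromhex(game_hash_hex),
                      hashlib.sha256).hexdigest()
    r = int(digest[: N_BITS // 4], 16)  # first 13 hex characters = 52 bits
    return r / 2 ** N_BITS


def crash_point(x):
    """Assumed crash formula: floor(99 / (1 - x)) / 100, never below 1.00."""
    return max(1.0, math.floor(99 / (1 - x)) / 100)


def collect_samples(latest_hash_hex, salt, rounds, step=100):
    """Walk the chain backward, keeping the raw uniform of every step-th round."""
    samples = []
    current = latest_hash_hex
    for i in range(rounds):
        if i % step == 0:
            samples.append(uniform_from_hash(current, salt))
        current = previous_hash(current)
    return samples


def split_halves(samples):
    """Split the samples, in the order collected, into two equal halves."""
    half = len(samples) // 2
    return samples[:half], samples[half:]


# Placeholder inputs for illustration only; substitute the real hash and salt.
demo = collect_samples("00" * 32, "hypothetical-salt", rounds=1_000)
first_half, second_half = split_halves(demo)
print(len(first_half), len(second_half))  # 5 5
```

Recording the uniform value rather than the crash point keeps the comparison on a bounded [0, 1) scale, where the heavy tail of the crash distribution cannot dominate the mean and variance tests.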
Methodology
Full details on our testing framework, including NIST SP 800-22, PractRand, and TestU01 integration: fairplayaudit.com/methodology
Open Source
Every tool we used is publicly available. Verify our results yourself:
- Scorecard Tool: github.com/GuidoHam/provably-fair-audit
- Consistency Comparator: src/consistency-comparator.js
- Bustabit Collector: src/bustabit-collector.js
What This Report Does NOT Tell You
Let's be clear about the limits:
- This report confirms statistical consistency across the hash chain. It does not guarantee future behavior if Bustabit switches to a new chain or algorithm.
- Provably fair means the outcomes are verifiable. It does not mean the house edge disappears. Bustabit's 1% house edge is mathematically baked in — that's by design, not manipulation.
- Consistency ≠ profitability. The RNG is fair. The math still favors the house. Those are two separate facts.
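To make the house-edge point concrete, here is a small sketch that assumes the crash formula floor(99 / (1 − X)) / 100 with X uniform on [0, 1); that formula is an assumption and is not spelled out in this report. Under it, the chance of the round reaching a cash-out target m (for m ≥ 1.01) is 0.99 / m, so the expected return per unit staked is m × 0.99 / m = 0.99 whatever target is chosen. The function names are hypothetical.

```python
import math


def crash_point(x):
    """Assumed crash formula: floor(99 / (1 - x)) / 100, never below 1.00."""
    return max(1.0, math.floor(99 / (1 - x)) / 100)


def win_probability(target):
    """Closed form under the assumed formula: P(crash point >= target)."""
    return 0.99 / target


def expected_return(target):
    """Expected payout per unit staked when always cashing out at target."""
    return target * win_probability(target)


def empirical_win_rate(target, grid=100_000):
    """Check the closed form by sweeping X over an even grid on [0, 1)."""
    hits = sum(crash_point(i / grid) >= target for i in range(grid))
    return hits / grid


for target in (1.5, 2.0, 10.0):
    print(target, round(expected_return(target), 4), empirical_win_rate(target))
```

Every target returns about 0.99 per unit staked, which is the 1% edge. A fair RNG guarantees that this edge is applied honestly, not that it goes away.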
The Bigger Picture: Why Longitudinal Monitoring Matters
Anyone can audit a casino once. We plan to do it continuously.
This Bustabit consistency report is the first in a series. Our roadmap:
- Multi-casino baselines: CryptoGames, Wolf.bet, Bitsler, and more
- Weekly automated monitoring: Fresh data, fresh comparisons, anomaly alerts
- Public watchlist: A live dashboard showing which casinos are consistent — and which aren't
Other auditors tell you a casino was fair on audit day. We tell you it's still fair today.
Report ID: FPA-CONSISTENCY-1781762933375
Generated: June 18, 2026
Auditor: FairPlay Audit (fairplayaudit.com)
Framework: NIST SP 800-22 + PractRand + TestU01 + Custom Consistency Module