Our Methodology

TL;DR: We don't trust casinos. We don't trust claims. We test raw cryptographic output against government-grade statistical standards — NIST SP 800-22, PractRand, and TestU01 — using 100,000+ rounds per game. Every dataset is published. Every test is reproducible. If you can do math, you can check our work.

Why This Page Exists

I spent 23 years as a pit boss in land-based casinos. I watched players lose money, sure — but they could see the wheel spin. They could watch the cards being dealt. Trust wasn't blind. It was built into the physical process.

Online crypto casinos removed all of that. They replaced it with a promise: "It's provably fair."

Here's the problem — provably fair proves integrity, not fairness. A casino can pass every hash check and still rig outcomes through seed timing exploits. That's not theory. That's a documented vulnerability with working proof-of-concept code.

So we built something different. A testing framework that doesn't care what the casino says. It only cares what the numbers show.

What We Actually Test

Raw Floats, Not Game Results

This is the most important architectural decision we made — and it's the same one used by GLI and eCOGRA, the gold standard in regulated gambling.

Every provably fair game works the same way under the hood:

HMAC-SHA256(server_seed, client_seed:nonce:round) → 32 bytes → float [0,1) → game result

Dice, Crash, Limbo, CoinFlip, Roulette: they all start from the same raw uniform float. The game result is just a deterministic transformation of that float. Uniformly distributed source bytes are a necessary condition for game fairness, though not a sufficient one: the transformation and the seed handling also have to be correct.
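The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not any specific casino's implementation: the message layout follows the formula shown, while reading the first 4 bytes as base-256 fractional digits and the `dice_roll` mapping are assumptions made for the example.

```python
import hashlib
import hmac


def raw_float(server_seed: str, client_seed: str, nonce: int, round_index: int = 0) -> float:
    """Derive one float in [0, 1) from the seeds, nonce, and round."""
    message = f"{client_seed}:{nonce}:{round_index}".encode()
    digest = hmac.new(server_seed.encode(), message, hashlib.sha256).digest()  # 32 bytes
    # Assumption: the first 4 bytes are treated as base-256 fractional digits.
    return sum(byte / 256 ** (i + 1) for i, byte in enumerate(digest[:4]))


def dice_roll(u: float) -> float:
    """Hypothetical Dice mapping: scale the raw float onto 0.00 to 100.00."""
    return int(u * 10_001) / 100


u = raw_float("example-server-seed", "example-client-seed", nonce=1)
print(u, dice_roll(u))
```

The statistical tests run on values like `u`, before any game-specific mapping such as `dice_roll` is applied.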

We test the source. Not the transformation.

Why? Because testing game results introduces noise from the transformation function itself. A crash multiplier distribution should look skewed; that's by design. But the underlying bytes should be statistically indistinguishable from uniform. Testing at the byte level is cleaner, more powerful, and catches manipulation that game-level tests would miss.

Scope: What Statistical Testing Proves — and What It Doesn't

Transparency matters. Before diving into our test suites, here's exactly what our statistical analysis can and cannot prove.

What Our Tests Prove

The RNG output is statistically random. No detectable patterns, bias, or predictability in the output stream.
The observed distribution matches the expected distribution. If Dice claims uniform outcomes across 0–100, we verify that mathematically.
No evidence of outcome shaving. If a casino subtly reduces payouts by 0.5%, our sample sizes are large enough to detect it.
No drift or degradation over time. The RNG performs consistently across the full dataset, not just in short bursts.

What Our Tests Do Not Prove

We do not verify game logic implementation. A game could have a perfect RNG but a broken or manipulated payout formula. Game logic verification is a separate discipline.
We do not prove that client seeds are used in the calculation. A casino could accept your client seed and silently ignore it. Verifying this requires code-level or cryptographic auditing, not statistical testing.
We do not prove that server seeds aren't pre-selected. A casino could theoretically generate multiple server seeds and pick favorable ones. Our tests detect the statistical fingerprint of this if done at scale, but not single-instance manipulation.
We do not audit the seed commitment process itself. Whether the pre-committed hash was genuinely locked before your bet is a cryptographic verification question, not a statistical one.

Bottom line: Statistical testing and game logic verification are two fundamentally different protection layers. They are complementary — not competing, not redundant. We focus on the mathematical side: is the RNG output genuinely random and unbiased? For a complete picture of casino fairness, both approaches are needed. We are transparent about this because intellectual honesty is non-negotiable.

Why Statistical Testing Is a Separate Protection Layer

Game logic verification confirms that the code does what the specification says. Statistical testing confirms that the output is genuinely random. These are different questions with different answers — and a casino can fail one while passing the other.

Consider a concrete example: a casino implements its game logic perfectly — every payout formula is correct, every hash matches, every verifier confirms. But the server seeds were not generated randomly. They were selected from a pre-computed pool that slightly favors the house. The code is flawless. The game logic passes every review. But the outcomes are biased. Only statistical testing on large sample sizes catches this.

The reverse is also true: a perfect RNG does not prove correct game logic. A casino could feed genuinely random numbers into a broken payout formula. That is why we explicitly state our scope — and why both disciplines exist.

Manipulation scenarios that only statistical testing detects:

Seed pre-selection at scale. Generating thousands of seed candidates and keeping only those that produce house-favorable sequences. The code is correct — the input is rigged.
Outcome shaving. Subtly shifting the distribution by 0.3–0.5% — invisible to individual players, invisible to code review, but detectable across 100,000+ rounds through chi-square and distribution tests.
RNG degradation over time. A system that starts fair but develops patterns after millions of rounds — detectable through autocorrelation and spectral analysis, not through code inspection.
Nonce manipulation. Holding the nonce at zero or cycling through a small set of nonces to reuse favorable outcomes. The hash chain looks correct — but the statistical fingerprint reveals the repetition.

No code review catches these. No game logic reconstruction catches these. Only math on large datasets does. That is our lane — and we stay in it because doing one thing at government-grade depth is more valuable than doing everything at surface level.

Three Testing Frameworks, One Verdict

We don't rely on a single test or a single framework. We use three independent, complementary test suites: NIST SP 800-22 on every dataset, plus PractRand and TestU01 BigCrush for Deep Audits.

Framework 1: NIST SP 800-22 Rev. 1a — Complete Suite

The National Institute of Standards and Technology published SP 800-22 as the standard for evaluating random and pseudorandom number generators. It's what governments, military contractors, and financial institutions use to certify their cryptographic systems.

We run the complete battery — all 15 tests. Not a subset. Not a "simplified version." The full thing.

#	Test	NIST Section	What It Catches
1	Monobit (Frequency)	§ 2.1	Overall bias — are there more 0s than 1s in the bitstream?
2	Block Frequency	§ 2.2	Local bias — does the balance hold in smaller sub-sequences?
3	Runs Test	§ 2.3	Patterns in consecutive values — too many or too few streaks?
4	Longest Run of Ones	§ 2.4	Suspicious clustering — are the longest streaks within normal range?
5	Binary Matrix Rank	§ 2.5	Linear dependencies — hidden structure in the bit matrix?
6	DFT Spectral	§ 2.6	Periodic patterns — Fourier analysis reveals hidden cycles
7	Non-overlapping Template	§ 2.7	Specific bit patterns appearing too often or too rarely
8	Overlapping Template	§ 2.8	Same as above, but with overlapping pattern windows
9	Maurer's Universal	§ 2.9	Compressibility — can the output be compressed? (If yes: not random)
10	Linear Complexity	§ 2.10	Predictability — could a linear feedback shift register reproduce this?
11	Serial Test	§ 2.11	Pair and triplet uniformity — are bit combinations evenly distributed?
12	Approximate Entropy	§ 2.12	Entropy in overlapping patterns — is the output truly unpredictable?
13	Cumulative Sums	§ 2.13	Drift over time — does the output trend in one direction?
14	Random Excursions	§ 2.14	Cycle analysis — abnormal patterns in cumulative sum walks
15	Random Excursions Variant	§ 2.15	State visit frequency — does the random walk visit states evenly?

Each test produces a p-value. We use a significance level of α = 0.01. A p-value below 0.01 means a truly random source would produce a deviation this large less than 1% of the time, so the test is flagged as failed.
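To make the p-value decision concrete, here is a minimal sketch of the first test in the table, the Monobit (Frequency) test, following the formula in NIST SP 800-22 § 2.1. The function names are illustrative and are not taken from our repository.

```python
import math


def monobit_p_value(bits: list[int]) -> float:
    """NIST SP 800-22 frequency test: p-value for the balance of 0s and 1s."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # +1 for each 1, -1 for each 0
    s_obs = abs(s) / math.sqrt(n)           # normalized excess
    return math.erfc(s_obs / math.sqrt(2))


def passes(p_value: float, alpha: float = 0.01) -> bool:
    """A test passes when its p-value is at or above the significance level."""
    return p_value >= alpha


# Worked example from the NIST document: the sequence 1011010101.
p = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
print(round(p, 6), passes(p))
```

The other 14 tests follow the same pattern: compute a statistic, convert it to a p-value, compare against α.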

Additional Statistical Tests

Beyond NIST, we run four more tests from standard statistics — different mathematical lenses on the same data:

#	Test	What It Catches
16	Chi-Square Goodness of Fit	Are outcomes distributed as uniformly as they should be?
17	Kolmogorov-Smirnov	Does the empirical distribution match the theoretical one?
18	Serial Correlation (Lag-1)	Can you predict the next value from the previous one?
19	Runs Up/Down (Wald-Wolfowitz)	Are there suspicious trends — too many ups or downs in a row?
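Two of these additional tests are simple enough to sketch with the standard library alone. The snippet below computes the chi-square statistic for floats in [0, 1) and the lag-1 serial correlation. The bin count and function names are assumptions for illustration; converting the chi-square statistic to a p-value needs a chi-square distribution function, for example from a statistics library.

```python
def chi_square_uniform(values: list[float], bins: int = 10) -> tuple[float, int]:
    """Chi-square statistic and degrees of freedom against a uniform distribution."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, bins - 1


def lag1_correlation(values: list[float]) -> float:
    """Correlation between each value and its successor (close to 0 if independent)."""
    n = len(values)
    mean = sum(values) / n
    numerator = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    denominator = sum((v - mean) ** 2 for v in values)
    return numerator / denominator


# An alternating low/high sequence is perfectly balanced but fully predictable.
alternating = [0.25, 0.75] * 500
print(chi_square_uniform(alternating, bins=2), lag1_correlation(alternating))
```

The alternating example shows why several lenses are needed: the distribution test sees nothing wrong, while the correlation test flags it immediately.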

Framework 2: PractRand

NIST is the industry standard. PractRand is the industry nightmare.

Developed by Chris Doty-Humphrey, PractRand is widely regarded as the most demanding PRNG test suite in existence. Where NIST tests might pass a mediocre generator, PractRand will tear it apart.

PractRand works differently from NIST. It consumes a raw binary stream and runs progressively harder tests at increasing data volumes — from kilobytes to terabytes. It doesn't just check for bias. It hunts for subtle correlations, periodicities, and structural weaknesses that standard tests miss entirely.

If NIST is a medical check-up, PractRand is an MRI. It finds things you didn't know were there.

We convert casino outcomes into raw binary streams and feed them directly into PractRand. A generator that passes both NIST and PractRand is, for all practical purposes, indistinguishable from true randomness.
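A minimal sketch of that conversion step, assuming each outcome is a float in [0, 1) carrying 32 bits of resolution. The word size and byte order are choices made for this example, and the helper name is hypothetical.

```python
import struct


def floats_to_bytes(floats: list[float]) -> bytes:
    """Pack each float in [0, 1) into one little-endian unsigned 32-bit word."""
    words = (min(int(u * 2**32), 2**32 - 1) for u in floats)
    return b"".join(struct.pack("<I", w) for w in words)


stream = floats_to_bytes([0.0, 0.5, 0.999999])
print(len(stream), stream.hex())
```

Writing the result to standard output lets it be piped into PractRand's command-line tester, which can read 32-bit words from stdin (its `stdin32` mode).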

Framework 3: TestU01 (BigCrush)

TestU01 is the academic gold standard, developed at the Université de Montréal. Its BigCrush battery runs 106 statistical tests over 3–4 hours — the most comprehensive single-run analysis of a random number generator that exists in peer-reviewed literature.

Where NIST gives you the government stamp and PractRand hunts for subtle structural flaws, BigCrush throws everything academia has developed over decades at your data. If a generator survives all three, no widely used statistical battery can distinguish its output from true randomness.

Audit Tiers

Not every audit needs the same depth. We run two tiers:

Standard Audit (Every Report)

Every published audit report runs through our 25-test battery:

15 NIST SP 800-22 tests (complete suite)
4 additional statistical tests (Chi-Square, K-S, Serial Correlation, Runs Up/Down)
6 game-specific validation tests

This already exceeds what any competitor runs. It covers everything a well-implemented provably fair system should pass.

Deep Audit (On Request)

For casinos that want to prove they’re beyond reproach — or players who need absolute certainty — we go further:

PractRand — progressive binary stream analysis, from kilobytes to terabytes
TestU01 BigCrush — 106 academic-grade tests, 3–4 hour runtime

A Deep Audit is available on request. We run it when the stakes are high, the dataset is large, or someone challenges our findings. Three independent scientific frameworks, three different methodologies, one verdict.

If NIST is the medical check-up, PractRand is the MRI, and BigCrush is the full autopsy. Most patients only need the check-up. But we have the operating room ready.

Game-Specific Validation

On top of the raw-float analysis, we run game-specific tests on the actual outcomes. These verify that the transformation from raw float to game result is implemented correctly — a casino could have a perfect RNG but a broken game formula.

#	Test	Game	What It Verifies
20	Crash Instant Rate (Stake)	Crash	~4.0% of rounds bust at 1.00x (matches Stake's house edge)
21	Crash Instant Rate (Roobet)	Crash	~5.95% of rounds bust at 1.00x (matches Roobet's house edge)
22	Crash Instant Rate (Bustabit)	Crash	~4.0% of rounds bust at 1.00x (matches Bustabit's house edge)
23	Coin Fairness	CoinFlip	50/50 split between heads and tails within expected variance
24	Roulette Distribution	Roulette	Chi-square across all 37 slots (0–36)
25	Dice Distribution	Dice	Uniform distribution across the 0–100 range
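The instant-bust checks in the table come down to comparing an observed rate with an expected one. A minimal sketch using a normal-approximation z-test for a proportion follows; the function name and the counts in the example are made up for illustration, and this is not necessarily the exact test our tooling runs.

```python
import math


def proportion_z_test(hits: int, n: int, expected_rate: float) -> tuple[float, float]:
    """Two-sided z-test: does hits/n differ from expected_rate?"""
    standard_error = math.sqrt(expected_rate * (1 - expected_rate) / n)
    z = (hits / n - expected_rate) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical sample: 4,500 instant busts in 100,000 rounds against an expected 4.0%.
z, p = proportion_z_test(hits=4_500, n=100_000, expected_rate=0.04)
print(round(z, 2), p < 0.01)
```

The same function covers the Coin Fairness check by setting `expected_rate` to 0.5.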

Total: 25 individual tests per audit — 15 NIST + 4 additional statistical + 6 game-specific. For Deep Audits, add PractRand and TestU01 BigCrush (106 additional tests) on top.

Show me another casino review site that runs even five of these.

Sample Sizes

We don't do spot checks. Our minimum sample size is 100,000 rounds per game. For major audits, we go to 250,000 or more. Our Bustabit audit analyzed 100 million rounds.

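Why does sample size matter? Because small samples hide manipulation. A rigged coin that lands heads 52% of the time looks normal after 100 flips: the excess is only 0.4 standard errors. After 100,000 flips, the same bias sits more than 12 standard errors from fair. Statistical power increases with sample size. At 100,000 rounds we can reliably detect deviations of about 0.5%, and our largest datasets resolve deviations as small as 0.1%.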
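The coin example can be checked with a short calculation. The sketch below uses the normal approximation for a proportion; the 2.576 threshold corresponds to a two-sided 99% level, and the function names are illustrative.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion when the true rate is p."""
    return math.sqrt(p * (1 - p) / n)


def z_score(observed: float, expected: float, n: int) -> float:
    """How many standard errors the observed rate lies from the expected rate."""
    return (observed - expected) / standard_error(expected, n)


def min_detectable_shift(expected: float, n: int, z_crit: float = 2.576) -> float:
    """Smallest shift in the rate that reaches the z_crit threshold at sample size n."""
    return z_crit * standard_error(expected, n)


for n in (100, 100_000, 100_000_000):
    print(n, round(z_score(0.52, 0.5, n), 2), f"{min_detectable_shift(0.5, n):.5f}")
```

The detectable shift shrinks with the square root of the sample size, which is why larger datasets expose smaller biases.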

Data Integrity

Every audit report includes:

SHA-256 dataset hash — cryptographic proof that the data hasn't been altered after testing
Complete seed parameters — server seed, client seed, nonce range
Reproducibility instructions — step-by-step guide so anyone can regenerate our results
Raw data download — the actual outcomes as JSON, available for independent verification
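Checking a dataset hash yourself takes only a few lines. A minimal sketch, assuming the dataset is a single downloaded file; the file name in the usage comment is a placeholder.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (placeholder file name): compare the result with the hash in the report.
# print(sha256_file("dataset.json"))
```

If the digest differs from the published one by even a single character, the file is not the one that was tested.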

We don't ask you to trust us. We give you the tools to verify us. That's the difference between an audit and an opinion.

What We Don't Do

Transparency means being honest about limitations too:

We don't verify game logic or code implementation. Our scope is statistical analysis of RNG output. Whether the game code correctly implements its published specification is a separate audit discipline that requires source code review or independent game reconstruction.
We can't test live server behavior in real-time. We audit historical data. A casino could theoretically behave differently for specific players or time periods. Statistical analysis catches systematic manipulation, not targeted single-round rigging.
We don't audit smart contracts. On-chain games with published Solidity code are a different beast. Our focus is HMAC-SHA256 based provably fair systems.
We don't guarantee future fairness. An audit is a snapshot. That's why we advocate for continuous monitoring and regular re-audits.
We don't test withdrawal speed or customer support. Our scope is mathematical fairness. For business practices, read why math alone isn't enough.

The Scoring System — RNG Audit Score

Each audit produces a FairPlay Score from 0 to 10:

Score	Rating	Meaning (RNG Analysis)
9.0–10.0	EXCELLENT	All tests passed. No statistical anomalies detected.
7.0–8.9	GOOD	Minor deviations within acceptable variance. No evidence of manipulation.
5.0–6.9	MARGINAL	Some tests show borderline results. Warrants closer monitoring.
3.0–4.9	CONCERNING	Multiple statistical anomalies. Expanded testing recommended.
0.0–2.9	FAILED	Systematic deviations detected. Data inconsistent with fair RNG.

Important: The FairPlay Score reflects RNG randomness and statistical integrity only. It does not cover game logic implementation, payout formula correctness, or the seed commitment process. A casino can score 10/10 on RNG quality and still have issues elsewhere. This score answers one question: is the random number generator producing genuinely unbiased output? For the full picture, RNG auditing and game logic verification are both needed — we do the math side.

The score is calculated from the pass/fail ratio across all applicable tests, weighted by severity. A failed NIST Monobit test (fundamental bias) weighs heavier than a marginal Runs test result.
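As a rough illustration of a severity-weighted pass/fail ratio, consider the sketch below. The weights, test names, and rounding are hypothetical and chosen only to show the idea; they are not the published scoring parameters.

```python
def rng_audit_score(results: dict[str, bool], weights: dict[str, float],
                    default_weight: float = 1.0) -> float:
    """Weighted share of passed tests, scaled to 0-10 (illustrative weights only)."""
    total = sum(weights.get(name, default_weight) for name in results)
    passed = sum(weights.get(name, default_weight)
                 for name, ok in results.items() if ok)
    return round(10 * passed / total, 1)


# Hypothetical weights: a Monobit failure counts three times as much as other tests.
weights = {"monobit": 3.0}
print(rng_audit_score({"monobit": True, "runs": True, "serial": False}, weights))
print(rng_audit_score({"monobit": False, "runs": True, "serial": True}, weights))
```

With these example weights, failing Monobit costs far more than failing a single lower-severity test.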

Longitudinal RNG Monitoring — The Consistency Comparator

A single audit tells you one thing: the RNG was fair on that day. But what about next week? Next month? What if a casino passes our 25-test battery in June, then quietly changes something in July?

This is the gap that nobody in the industry addresses. Traditional provably fair verification — including our own standard audit — is a snapshot. It tells you the state of the RNG at one point in time. It cannot tell you whether things changed after the audit.

The Consistency Comparator solves this.

How It Works — In Plain English

Imagine you fill two jars with marbles from the same machine. If the machine is consistent, both jars should look roughly the same — similar colors, similar ratios, similar randomness. If you open the second jar and suddenly find twice as many red marbles, something changed.

That’s exactly what the Consistency Comparator does with casino data:

Phase A: We collect a large sample of game outcomes — say, 100,000 rounds of Crash. We run our full test battery. We save the results.

Phase B: Weeks or months later, we collect another 100,000 rounds from the same game. Same casino, same game type — but fresh data from a new anonymous account.

Then we compare.

The Comparator runs seven independent statistical tests between Phase A and Phase B. Each test looks at the data from a different angle — distribution shape, average values, spread, information content. If all seven agree the data looks consistent, the casino gets a high Consistency Score. If the tests detect a shift, the score drops — and we know something changed.

The Seven Comparison Tests

You don’t need a math degree to understand what each test checks:

1. Kolmogorov-Smirnov Test — Do the two samples follow the same overall pattern? Think of it as overlaying two graphs and checking if they match.

2. Chi-Square Homogeneity — If you sort outcomes into buckets (0-10%, 10-20%, etc.), do both samples fill the buckets the same way?

3. Jensen-Shannon Divergence — An information theory measure. How surprised would you be if you thought the data came from Phase A, but it actually came from Phase B? Low surprise = consistent.

4. Welch’s t-Test — Are the average outcomes the same? A rigged game that shaves 1% off high payouts will shift the average.

5. F-Test — Is the spread (variance) the same? A tighter or looser distribution over time signals a change in the RNG.

6. Cohen’s d — Even if a difference is statistically real, is it big enough to matter? This separates meaningful changes from noise.

7. Bhattacharyya Distance — A geometric measure of how much two distributions overlap. Perfect overlap = identical behavior.
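Three of these measures can be sketched with the standard library. The example below bins two samples of floats in [0, 1) into histograms and computes the Jensen-Shannon divergence (base 2, so it ranges from 0 to 1), the Bhattacharyya distance, and Cohen's d. The bin count and function names are assumptions for illustration.

```python
import math


def histogram(values: list[float], bins: int = 10) -> list[float]:
    """Relative frequencies of values in [0, 1) across equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint histograms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def bhattacharyya(p: list[float], q: list[float]) -> float:
    """Bhattacharyya distance: 0 for full overlap, infinite for no overlap."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(coefficient) if coefficient > 0 else math.inf


def cohens_d(a: list[float], b: list[float]) -> float:
    """Difference in means expressed in pooled standard deviations."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    pooled = math.sqrt(((len(a) - 1) * var_a + (len(b) - 1) * var_b)
                       / (len(a) + len(b) - 2))
    return (mean_a - mean_b) / pooled
```

In a comparison run, `histogram` would be applied to the Phase A and Phase B outcomes and the two histograms passed to the distance functions.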

The Consistency Score

All seven tests feed into a weighted Consistency Score from 0 to 10:

9.0–10.0: CONSISTENT. No detectable change. The RNG behaves the same as before.

8.0–8.9: STABLE. Minor statistical fluctuations within normal range.

6.0–7.9: DRIFTING. Some tests flag differences. Could be natural variance, could be early signs of a change. Warrants closer monitoring.

4.0–5.9: CONCERNING. Multiple tests detect shifts. Something likely changed in the RNG or game configuration.

0.0–3.9: DIVERGENT. The data looks fundamentally different. Strong evidence of RNG manipulation or replacement.

Detection Sensitivity

In our validation tests:

A 0.5% outcome shave — barely noticeable to any individual player — drops the Consistency Score from 9.4 to 5.6 (CONCERNING). A 2% manipulation drops it to 2.6 (DIVERGENT). With 100,000+ samples, there is nowhere to hide.

Why This Changes Everything

Traditional audits have a fundamental weakness: the casino knows it’s being tested. If you announce an inspection, you’ll find a clean kitchen. Our approach uses anonymous accounts — the casino doesn’t know which player is collecting data for a monitoring run.

This means:

A casino can’t “perform” for the auditor and then change behavior afterward. If the RNG drifts, degrades, or gets manipulated between monitoring periods — we catch it. Not because someone reported a problem. Not because we got lucky. Because the math doesn’t lie, and we’re always watching.

The Consistency Comparator is open source, like the rest of our tools: github.com/GuidoHam/provably-fair-audit.

Why One Audit Isn't Enough

A single audit proves fairness on one day. That's it.

Think about it from the casino's perspective. They know an auditor tested their Crash game in June. The report is public. They scored 10/10. Great marketing material. But what happens in July? August? December?

Nothing stops a casino from changing seed generation parameters, swapping RNG implementations, or introducing account-specific behavior after the audit is published. A one-time audit creates a perverse incentive: perform well for the test, then do whatever you want afterward.

This isn't paranoia. It's basic game theory. If the inspection only happens once, the cost of cheating after the inspection is zero.

Longitudinal monitoring removes that safety net. We audit the same casino multiple times over weeks and months, using anonymous accounts the casino can't identify. Each monitoring run is unannounced. The casino doesn't know when we're collecting data, which account is ours, or which game we're testing.

If anything changes between audits — a shifted distribution, a new pattern in the output, a subtle increase in house edge — the Consistency Comparator flags it. Not because a player complained. Because the math caught it.

One audit is a photograph. Longitudinal monitoring is a surveillance camera. Both have their place. But only one catches what happens when nobody's looking.

What You Get

Provably fair verification works on two levels. Both matter. Neither is sufficient alone.

Level 1: Individual Round Verification

Every player can verify their own bets. Export your seeds, paste them into our free verifier, and confirm that the casino calculated your result correctly. This proves cryptographic integrity — the outcome matches the committed hash.

This is necessary. But it only proves that the casino followed its own formula for that specific round. It tells you nothing about the quality of the randomness feeding into that formula, and nothing about whether the casino treats all players equally over thousands of rounds.
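The commitment part of a round check can be sketched as follows. This assumes the casino publishes the SHA-256 hex digest of the server seed before play, which is a common scheme but not universal; the function name is illustrative.

```python
import hashlib
import hmac


def commitment_matches(revealed_server_seed: str, committed_hash_hex: str) -> bool:
    """Check that the revealed server seed hashes to the value committed before play."""
    actual = hashlib.sha256(revealed_server_seed.encode()).hexdigest()
    return hmac.compare_digest(actual, committed_hash_hex.lower())


# Example with a made-up seed: the commitment is computed first, the seed revealed later.
commitment = hashlib.sha256(b"example-server-seed").hexdigest()
print(commitment_matches("example-server-seed", commitment))
print(commitment_matches("a-different-seed", commitment))
```

A match shows the seed was not swapped after the commitment was published. It says nothing about how the seed was chosen, which is where statistical auditing comes in.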

Level 2: Statistical Audit Over Time

We verify the big picture. 100,000+ rounds per game, analyzed against 25 statistical tests, repeated over time with the Consistency Comparator. This catches what individual verification cannot:

Subtle outcome shaving (0.3–0.5% bias invisible to any single player)
RNG degradation or replacement between audit periods
Account-specific treatment differences
Post-audit manipulation

Both Together

Individual verification confirms: my bet was calculated correctly.
Longitudinal statistical auditing confirms: the underlying randomness is genuine and hasn't changed.

One without the other leaves gaps. A player who only checks individual rounds can't detect systemic bias. An auditor who only tests once can't detect post-audit changes. The combination — self-verification plus ongoing independent statistical monitoring — is the most complete protection against manipulated seeds that currently exists in crypto gambling.

We're not claiming perfection. We've listed what our tests don't prove above. But within the domain of RNG statistical analysis, this is as thorough as it gets.

The Deal

FairPlay Audit is funded through affiliate partnerships with the casinos we audit. Here's how that works in practice:

You register at a casino through one of our links. You play normally. The casino pays us a commission on your activity. Standard affiliate model — every casino review site works this way.

The difference: when you sign up through FairPlay, you're playing at a casino under ongoing independent statistical review. Not a one-time check. Not a paid placement. Continuous, unannounced monitoring with published data.

What it costs you: Nothing. Zero. Your odds, payouts, and house edge are identical whether you sign up through us or directly. Affiliate tracking doesn't change game mechanics.

What you get: Access to the same casino you'd use anyway — but with a third party running 25+ statistical tests on 100,000+ rounds, publishing the results, and flagging any changes over time. For free.

What the casino gets: New players and a public, verifiable fairness certification that no marketing budget can replicate.

What we get: Commission that funds more audits, deeper testing, and broader casino coverage.

The incentive alignment is straightforward: we only earn if you play at casinos we've audited. We only maintain credibility if our audits are honest. A rigged audit would destroy the only asset we have. The business model enforces the integrity — not despite the affiliate structure, but because of it.

Full disclosure of all affiliate relationships is published in our Independence & Affiliate Disclosure.

How Verification Methods Compare

	Self-Verification	One-Time Audit	FairPlay Longitudinal Monitoring
What it checks	Single round, single bet	RNG quality at one point in time	RNG quality over weeks/months, repeated
Sample size	1 round	1,000–100,000 rounds	100,000+ rounds per monitoring period
Detects bias	No	Yes, at time of audit	Yes, including changes over time
Detects post-audit manipulation	No	No	Yes — unannounced repeat testing
Detects outcome shaving	No	Yes, if sample is large enough	Yes, with high sensitivity (0.5% detectable)
Detects account-specific rigging	No	Only for the tested account	Multiple anonymous accounts over time
Casino knows it's being tested	N/A	Usually yes	No — anonymous data collection
Data published	Player keeps their own	Varies	Always — SHA-256 hashed, downloadable
Cost to player	Free (DIY)	N/A (not player-facing)	Free
Reproducible	Yes	Sometimes	Always — open source tools, raw data

None of these methods alone is complete. Self-verification confirms individual rounds. One-time audits confirm RNG quality at a snapshot. Longitudinal monitoring confirms consistency over time. The strongest protection uses all three — and that's exactly what playing through FairPlay gives you.

Open Source Commitment

Repository: github.com/GuidoHam/provably-fair-audit — Community Edition with 25 statistical tests. MIT licensed. 116 clones in the first two weeks.

Our testing tools are published as open source on GitHub. You can read the code. You can run it yourself. You can file issues if you find a bug.

This isn't generosity — it's strategy. Open source means our methodology is under permanent peer review. If our tests are flawed, someone will find it. That pressure keeps us honest. And honestly? That's exactly how it should work.

Not every casino appreciates this level of scrutiny. We’ve documented the seven most common excuses casinos give when asked about independent audits — and why none of them hold up.

Challenge Us

If you think our methodology has a gap, our math is wrong, or our conclusions don't follow from the data — tell us. We publish a standing invitation to challenge any audit we've ever produced. Bring data, not opinions, and we'll respond in kind.

That's the whole point. We're not asking you to believe us. We're asking you to check.