xref: /xnu-10002.81.5/tools/entropy_health_test_bounds.py (revision 5e3eaea39dcf651e66cb99ba7d70e32cc4a99587)
1*5e3eaea3SApple OSS Distributions#!/usr/bin/env python3
2*5e3eaea3SApple OSS Distributions
3*5e3eaea3SApple OSS Distributionsfrom fractions import Fraction
4*5e3eaea3SApple OSS Distributionsfrom math import ceil
5*5e3eaea3SApple OSS Distributionsfrom math import comb
6*5e3eaea3SApple OSS Distributions
7*5e3eaea3SApple OSS Distributions
8*5e3eaea3SApple OSS Distributions# The inverse of 2, i.e. 2^-1. To be used as a base in exponentiations
9*5e3eaea3SApple OSS Distributions# representing probabilities.
10*5e3eaea3SApple OSS DistributionsINV2 = Fraction(1, 2)
11*5e3eaea3SApple OSS Distributions
12*5e3eaea3SApple OSS Distributions
13*5e3eaea3SApple OSS Distributions# The probability of a false positive health test failure expressed as
14*5e3eaea3SApple OSS Distributions# the negative logarithm of the *actual* probability. In simpler
15*5e3eaea3SApple OSS Distributions# terms, the actual probability is:
16*5e3eaea3SApple OSS Distributions#
17*5e3eaea3SApple OSS Distributions# INV2 ** A
18*5e3eaea3SApple OSS Distributions#
19*5e3eaea3SApple OSS Distributions# It is simpler to keep this representation when computing the bound
20*5e3eaea3SApple OSS Distributions# of the Repetition Count Test (below).
21*5e3eaea3SApple OSS DistributionsA = 40
22*5e3eaea3SApple OSS Distributions
23*5e3eaea3SApple OSS Distributions
24*5e3eaea3SApple OSS Distributions# The estimated min-entropy per sample in bits. Min-entropy is the
25*5e3eaea3SApple OSS Distributions# negative logarithm of the probability of the *most likely* outcome.
26*5e3eaea3SApple OSS Distributions#
27*5e3eaea3SApple OSS Distributions# We consider this estimate to be conservative.
28*5e3eaea3SApple OSS DistributionsH = 1
29*5e3eaea3SApple OSS Distributions
30*5e3eaea3SApple OSS Distributions
31*5e3eaea3SApple OSS Distributions# The probability of the most likely outcome occurring in a given
32*5e3eaea3SApple OSS Distributions# sample. This derives from the definition of min-entropy (see above).
33*5e3eaea3SApple OSS DistributionsP = INV2 ** H
34*5e3eaea3SApple OSS Distributions
35*5e3eaea3SApple OSS Distributions
36*5e3eaea3SApple OSS Distributions# 4.4.1 Repetition Count Test
37*5e3eaea3SApple OSS Distributions#
38*5e3eaea3SApple OSS Distributions# The Repetition Count Test (RCT) detects catastrophic failures in the
39*5e3eaea3SApple OSS Distributions# noise source when it becomes "stuck" generating a single value over
40*5e3eaea3SApple OSS Distributions# many consecutive samples.
41*5e3eaea3SApple OSS Distributions#
42*5e3eaea3SApple OSS Distributions# The probability of generating C consecutive identical samples is:
43*5e3eaea3SApple OSS Distributions#
44*5e3eaea3SApple OSS Distributions# P^(C-1)
45*5e3eaea3SApple OSS Distributions#
46*5e3eaea3SApple OSS Distributions# Or equivalently:
47*5e3eaea3SApple OSS Distributions#
48*5e3eaea3SApple OSS Distributions# 2^(-H * (C-1))
49*5e3eaea3SApple OSS Distributions#
50*5e3eaea3SApple OSS Distributions# To keep this under our rate of acceptable false positives, we need
51*5e3eaea3SApple OSS Distributions# to satisfy this inequality:
52*5e3eaea3SApple OSS Distributions#
53*5e3eaea3SApple OSS Distributions# 2^-A >= 2^(-H * (C-1))
54*5e3eaea3SApple OSS Distributions#
55*5e3eaea3SApple OSS Distributions# Taking the logarithm of both sides, we have:
56*5e3eaea3SApple OSS Distributions#
57*5e3eaea3SApple OSS Distributions# -A >= -H * (C-1)
58*5e3eaea3SApple OSS Distributions#
59*5e3eaea3SApple OSS Distributions# Solving for C, we have:
60*5e3eaea3SApple OSS Distributions#
61*5e3eaea3SApple OSS Distributions# (A / H) + 1 >= C
62*5e3eaea3SApple OSS Distributionsdef repetition_count_bound():
63*5e3eaea3SApple OSS Distributions    return 1 + ceil(Fraction(A, H))
64*5e3eaea3SApple OSS Distributions
65*5e3eaea3SApple OSS Distributions
66*5e3eaea3SApple OSS Distributions# 4.4.2 Adaptive Proportion Test
67*5e3eaea3SApple OSS Distributions#
68*5e3eaea3SApple OSS Distributions# The Adaptive Proportion Test (APT) tries to detect more subtle noise
69*5e3eaea3SApple OSS Distributions# source failures causing certain values to occur with unexpected
70*5e3eaea3SApple OSS Distributions# frequency. It does this by taking a sample from the noise source and
71*5e3eaea3SApple OSS Distributions# counting how many times the same sample occurs within a fixed-size
72*5e3eaea3SApple OSS Distributions# window.
73*5e3eaea3SApple OSS Distributions
74*5e3eaea3SApple OSS Distributions
75*5e3eaea3SApple OSS Distributions# The size of the window for non-binary alphabets for the APT.
76*5e3eaea3SApple OSS DistributionsW = 512
77*5e3eaea3SApple OSS Distributions
78*5e3eaea3SApple OSS Distributions
79*5e3eaea3SApple OSS Distributions# The probability mass function measuring the probability of exactly k
80*5e3eaea3SApple OSS Distributions# occurrences of a given value within the observation window of size
81*5e3eaea3SApple OSS Distributions# W. We use the probability of the most likely event (as above).
82*5e3eaea3SApple OSS Distributions#
83*5e3eaea3SApple OSS Distributions# There are three terms:
84*5e3eaea3SApple OSS Distributions#
85*5e3eaea3SApple OSS Distributions# 1. The binomial coefficient of k, i.e. W-choose-k. Simply, how many
86*5e3eaea3SApple OSS Distributions# ways are there to get exactly k outcomes given W chances.
87*5e3eaea3SApple OSS Distributions#
88*5e3eaea3SApple OSS Distributions# 2. The probability of each of those k events occurring.
89*5e3eaea3SApple OSS Distributions#
90*5e3eaea3SApple OSS Distributions# 3. The probability that the other W-k events have some other
91*5e3eaea3SApple OSS Distributions# outcome.
92*5e3eaea3SApple OSS Distributionsdef pmf(k):
93*5e3eaea3SApple OSS Distributions    return comb(W, k) * P**k * (1 - P)**(W-k)
94*5e3eaea3SApple OSS Distributions
95*5e3eaea3SApple OSS Distributions
96*5e3eaea3SApple OSS Distributions# The sum of probabilties of all possible counts of occurrences is 1.
97*5e3eaea3SApple OSS Distributionsassert sum(map(pmf, range(W+1))) == 1
98*5e3eaea3SApple OSS Distributions
99*5e3eaea3SApple OSS Distributions
100*5e3eaea3SApple OSS Distributions# We want to find the minimal count of occurrences such that the
101*5e3eaea3SApple OSS Distributions# cumulative probability of seeing *at least* that count of
102*5e3eaea3SApple OSS Distributions# occurrences (but possibly more) is no more than our false
103*5e3eaea3SApple OSS Distributions# positive threshold.
104*5e3eaea3SApple OSS Distributionsdef adaptive_proportion_bound():
105*5e3eaea3SApple OSS Distributions    # The list of probabilities for each of the possible counts of
106*5e3eaea3SApple OSS Distributions    # occurrences.
107*5e3eaea3SApple OSS Distributions    probs = [pmf(x) for x in range(W+1)]
108*5e3eaea3SApple OSS Distributions
109*5e3eaea3SApple OSS Distributions    # The list of cumulative distributions for each of the possible
110*5e3eaea3SApple OSS Distributions    # counts of occurrences.
111*5e3eaea3SApple OSS Distributions    #
112*5e3eaea3SApple OSS Distributions    # Whereas probs is a list of probabilities of *exactly* k
113*5e3eaea3SApple OSS Distributions    # occurrences, this is a list of probabilities of *k or more*
114*5e3eaea3SApple OSS Distributions    # occurrences.
115*5e3eaea3SApple OSS Distributions    #
116*5e3eaea3SApple OSS Distributions    # These are just sums of probabilities across a range of counts.
117*5e3eaea3SApple OSS Distributions    dists = [sum(probs[x:]) for x in range(W+1)]
118*5e3eaea3SApple OSS Distributions
119*5e3eaea3SApple OSS Distributions    # Because we have constructed dists as an ordered list of
120*5e3eaea3SApple OSS Distributions    # cumulative probabilities, we can simply return the index of the
121*5e3eaea3SApple OSS Distributions    # first value that is below our threshold.
122*5e3eaea3SApple OSS Distributions    for i, d in enumerate(dists):
123*5e3eaea3SApple OSS Distributions        if d <= INV2**A:
124*5e3eaea3SApple OSS Distributions            return i
125*5e3eaea3SApple OSS Distributions
126*5e3eaea3SApple OSS Distributions
127*5e3eaea3SApple OSS Distributionsdef main():
128*5e3eaea3SApple OSS Distributions    print('Estimated min-entropy:', H)
129*5e3eaea3SApple OSS Distributions    print('False positive rate: 2^-{}'.format(A))
130*5e3eaea3SApple OSS Distributions    print('Repetition Count Test bound:', repetition_count_bound())
131*5e3eaea3SApple OSS Distributions    print('Adaptive Proportion Test bound:', adaptive_proportion_bound())
132*5e3eaea3SApple OSS Distributions
133*5e3eaea3SApple OSS Distributions
134*5e3eaea3SApple OSS Distributionsif __name__ == '__main__':
135*5e3eaea3SApple OSS Distributions    main()
136