Riemann Hypothesis: A Simple Example

In a recent blog, I suggested that in principle the Erdős–Kac Theorem should have a complementary application with respect to the distribution of prime numbers where the normal (Gaussian) distribution can be used to explain behaviour. To demonstrate this important aspect, we take repeated samples (of same size) within a relatively restricted region of the number system, where changes in the average gap as between primes is so small as to be discounted.

So starting with the n = 1,000,000,000 I took 100 samples of size t = 1000 up to n = 1,000,100,000.

For convenience in identifying the number of primes in each sample, I took them in strict sequence with the accompanying values listed below. As the primes are themselves distributed in a random fashion this would seem permissible in this instance.
Alternatively - at least in theory - 100 repeated random samples of 1000 (with replacement) could be taken within the same range i.e. 100,000,000 to 100,100,000. However in practice this would be more difficult.

So the first row for example represents the 10 samples in sequence from 100,000,000 to 100,010,000 with the last row, for example representing the final 10 sample values from 100,090,000 to 100,100,000.

54	56	57	55	57	61	57	56	47	51
61	55	57	49	43	54	56	43	58	54
51	56	51	49	43	54	56	43	58	54
60	56	55	57	54	52	59	56	51	56
49	43	59	64	55	63	62	53	49	51
63	54	40	54	51	56	52	54	42	57
53	73	55	50	54	53	61	49	52	56
54	59	44	57	50	56	56	53	52	54
57	56	52	54	63	43	54	52	51	48
49	55	57	52	54	52	56	48	56	66

Now in general terms, the mean number of primes in each sample can be approximated as t/log n

= 1000/18.42 = 54.29 (approx)

This equates well with actual value averaged over the 100 sample results above = 54.11.

Much more problematic however is the provision of a general formula to approximate the standard deviation (for all values of n).

Though I experienced doubts on several occasions with respect to my initial "hunch", repeated empirical testing seems to suggest it as perhaps the simplest and best estimate,

i.e. √{t/(2log n)}

This would give the standard deviation as 5.21 and compares well enough with the estimated standard deviation (based on the 100 sample values) = 5.46. This does not of course constitute a proof, and indeed a much greater degree of sampling would be required to truly establish it as the most likely estimate.

However in principle by now using the normal distribution, we could estimate the probabilities associated for example of sample values lying within any prescribed distance from the mean (on both sides).

For example we would expect for the above a little in excess of 2/3 of sample values to lie within 1 standard deviation of the mean value.

This would suggest therefore the probability that 2/3 of sample values (for frequency of primes occurring) would lie in the range of 49 - 59 (approx).

Addendum (5/3/2016). Having returned to this issue in recent days, I feel I can bring more clarity to a situation that I did not really feel had been properly dealt with, first time around.

Though I was hoping that the standard deviation would correspond to √{t/(log n)}, I was led - largely through the empirical evidence of a small sample - to adjust it somewhat to fit the data.

However on reflection this was not warranted. Even just a few stray "outliers" with respect to this data would have a key large influence on the standard deviation. Therefore it was unrealistic to expect that the empirical example would fit in with theoretical explanations.

The Erdős–Kac Theorem states that if ω(n) is the number of (distinct) prime factors of n, the probability distribution of

\frac{\omega(n) - \log\log n}{\sqrt{\log\log n}}

is the normal distribution.

In like manner I am suggesting that if ω(n) is the number of primes in a sample of size t. (i.e. where samples are taken in the region of n) the probabilility distribution of

ω(n) - (t/log t)

√{(t/logt)}

is the normal distribution.

The Erdős–Kac Theorem would suggest that where the average number of (distinct) primes is approximately 100 (with standard deviation 10), then we would expect just a little more than 2/3 of all factors to lie within 1 standard deviation of 100 either side of the mean (i.e. between 90 and 110).

In like manner if the average mean value of the number of primes in each sample is 100 (where samples are taken in the region of n) then we would likewise expect again that in a little more than 2/3 of samples, the number of primes would lie within 1 standard deviation (i.e. 10) of 100 (i.e. between 90 and 110).

Riemann Hypothesis

Tuesday, November 18, 2014

A Simple Example

No comments:

Post a Comment

54	56	57	55	57	61	57	56	47	51
61	55	57	49	43	54	56	43	58	54
51	56	51	49	43	54	56	43	58	54
60	56	55	57	54	52	59	56	51	56
49	43	59	64	55	63	62	53	49	51
63	54	40	54	51	56	52	54	42	57
53	73	55	50	54	53	61	49	52	56
54	59	44	57	50	56	56	53	52	54
57	56	52	54	63	43	54	52	51	48
49	55	57	52	54	52	56	48	56	66

54	56	57	55	57	61	57	56	47	51
61	55	57	49	43	54	56	43	58	54
51	56	51	49	43	54	56	43	58	54
60	56	55	57	54	52	59	56	51	56
49	43	59	64	55	63	62	53	49	51
63	54	40	54	51	56	52	54	42	57
53	73	55	50	54	53	61	49	52	56
54	59	44	57	50	56	56	53	52	54
57	56	52	54	63	43	54	52	51	48
49	55	57	52	54	52	56	48	56	66

54	56	57	55	57	61	57	56	47	51
61	55	57	49	43	54	56	43	58	54
51	56	51	49	43	54	56	43	58	54
60	56	55	57	54	52	59	56	51	56
49	43	59	64	55	63	62	53	49	51
63	54	40	54	51	56	52	54	42	57
53	73	55	50	54	53	61	49	52	56
54	59	44	57	50	56	56	53	52	54
57	56	52	54	63	43	54	52	51	48
49	55	57	52	54	52	56	48	56	66