Wednesday, September 21, 2011

Prime Numbers (part 1) : Primality test

For those who don't know me, I 'm a big fan of prime numbers. I always like the problems related to it. Prime numbers always have interesting properties that used a lot in algorithmic problems. There will be a series of posts about prime numbers and here is the first.

Prime numbers are the numbers that are only divisible by 1 and itself.
The first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. There are infinite number of prime numbers.

Given a number n we need to check if  this number is a prime or not.
By definition the number is prime if it isn't divisible by any number other than 1 and itself, so what we need is to check the numbers from 2 to n-1 and if n is divisible by any number of them then n is not a  prime.
This algorithm runs in O(n) time which is not feasible for large number. It's obvious that any even number other than 2 is not a prime number and we need only to check for the odd divisors of n as in is always odd. Can we still do better ? . it's also obvious that n is not divisible by any number greater than n/2 so we can check only the numbers from 2 to n/2.

Actually it is enough to check till square root of n, why??
any composite number (non prime)  n  = k * n/k where k is a number less than or equal root n and n/k greater than or equal root2 n so for every divisor that is greater than root 2 there exist a corresponding one less than root n and this is the one we will check.
example :
16 = 1*16 = 2*8 = 4*4 = 16*1
so our algorithm now runs in O(sqrt(n)) and this is the best deterministic algorithm for primality test assuming no precomputed primes.
C++ Code :
bool is_prime(int n){
    if(n == 0 || n == 1) return false;
    if(n == 2) return true;
    if(n%2 == 0) return false;
    int to = (int)sqrt(n+1);
    for(int i = 3; i <= to ; i+=2){
        if(n%i ==0) return false;
    }
    return true;
}
Sieve of Eratosthenes :
what if we want to check the primality of n consecutive numbers starting from zero? we can call our function we constructed n times, it will run in n*root(n).
Sieve of Eratosthenes is the algorithm designed for solving this problem efficiently.This algorithm works as follow :
  1. Assume that all numbers from 0 to n are primes
  2. mark 0 and 1 as not prime
  3. mark every even number as not prime
  4. for every odd number in the list which is marked as prime, loop for all multiples of this number and mark them as non prime
Example :
after step 3:
numbers : 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10
is prime  : 0,  0,  1,  1,  0,  1,  0,  1,  0,  1,  0

iteration 1 :
3 is prime
numbers : 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10
is prime  : 0,  0,  1,  1,  0,  1,  0,  1,  0,  0,  0
iteration 2 :
5 is prime

numbers : 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10
is prime  : 0,  0,  1,  1,  0,  1,  0,  1,  0,  0,  0
for every prime number p we start from  p*p and mark as not prime and increment by 2*p.
we said before that we should mark all multiples of p as non prime so why we started from p*p not from 2*p?. the answer is 2*p is divisible by 2 so it must be checked before , 3*p is divisible by 3 so checked before and (p-1)*p is divisible by p-1 so the first number needs to be checked is p*p. we increment by 2*p as we only care about odd numbers.

C++ Code:


bool is_prime[1000]
void generate(int maxPrimes){

    for(int i = 0; i <= maxPrimes; i++){
        is_prime[i] = i%2;
    }
    is_prime[0] = is_prime[1] = 0;
    is_prime[2] = 1;
    for(int i = 3; i*i <= maxPrimes ;i+=2){
        if(is_prime[i]){
            for(int j = i*i ; j <= maxPrimes ; j+=2*i){
                is_prime[j] = 0;
            }
        }
    }
}
This algorithm runs in O(n lg(lgn))

Segmented Sieve :

This is a variation of sieve. The standard sieve generates the primes between 0 to n, this algorithm generates primes between lower and upper. One can say that we can easily solve this problem using the previous algorithm from 0 to upper, thats true but what if the upper is too large but upper - lower is sufficiently small. It's obvious now we need an algorithm that solves this problem efficiently. The algorithm works as follows :
  1. generate the primes from 0 to root(upper)
  2. create an array with length upper - lower + 1 where index 0 represents lower, 1 represents lower + 1 and so on and initialize it with all true.
  3. mark even numbers as non prime.
  4. loop over all generated primes and mark their multiples as non primes in the newly created array.
C++ Code :

 bool is_prime_large[100005];
void segmented_sieve(int min, int max){

    if(min < 2) min = 2;
    int maxPrimes = (int)sqrt(max+1);
    if(maxPrimes < 2) maxPrimes = 2;
    generate(maxPrimes);
    for(int i = 0 ; i < (max - min +1);i++){
        is_prime_large[i] = 1;
    }
    for(int i = min%2 ; i < (max - min + 1);i+=2){
        is_prime_large[i] = 0;
    }
    for(int i = min;i<= maxPrimes;i++){
        is_prime_large[i-min] = is_prime[i];
    }
    for(int i = 3;i<= maxPrimes;i++){
        if(!is_prime[i]) continue;
        int p = i;
        int start = p*p;
        if(start < min) start = min%p == 0?min : min + p - min %p;
        if(start %2 == 0) start += p;
        for(int j = start ; j <= max; j+=2*p ){
            is_prime_large[j - min] = false;
        }
    }
    for(int i = 0;i <(max - min + 1);i++){
        if( is_prime_large[i] ) printf("%d\n",(min+i));
    }
}

notes about the code :
when we find a prime number p we start from p*p as explained in standard sieve.
let start = p*p. if start is less than min so we need to get the first odd number greater than or equal to min that is divisible by p. This number is either the min itself or min + p - min%p. I think the rest of the code is clear.

Masked sieve

skip this section if you aren't familiar with bitwise operations.
Now what if we need to generate the prime numbers from 0 to n  but n is large. it is large enough to fits in time but not enough to fits in memory. Standard sieve takes O(n) memory, every number from 0 to n has 1 byte so we use n bytes.
Actually, each number needs only one bit to check if it is prime or not so we can optimize the memory from n bytes to n/8 bytes. we can do more better and optimize to n/16 bytes by saving only odd numbers.

C++ Code:

#define n 300000002

    unsigned char bits[n/16 + 1];
    inline bool isset(int x){
        x >>= 1;
        int ind = x >> 3;
        int sh = x & 7;
        return bits[ind] & (1 << sh);
    }

    inline void setbit(int x){
        x >>= 1;
        int ind = x >> 3;
        int sh = x & 7;
        bits[ind] |= (1 << sh);
    }
    void seive(){

        for (int i = 3; i*i < n; i+=2) {
            if(!isset(i)){
                for ( int j = i*i; j < n; j+=i+i) {
                        setbit(j);
                }
            }
        }
    }
    bool is_prime(int x){
        if(x == 0 || x == 1) return false;
        if(x == 2) return true;
        return !isset(x);
    }


Notes : This code assumes that 0 is prime and 1 is not prime. to get the index of number x, we divide it by 2 as we only save odd numbers(x>>1) ,to know its index in the array we divide it by 8 (x>>3) as each entry in the array holds 8 numbers, and to know its position in the entry we take the number modulo 8(taking the first 3 bits x&7).

at the end i hope i have covered this part will. I will wait for your questions, comments and suggestions.

Thanks,
Mohamed Gaber

2 comments: