Not yet "complete" region, it seems that new technology emerge each month related to this issue.

One of the algorithms “derived” from this is calculating [rank] and [select] for a given bit string.

For example;

Given bit string

0011 0000 0101 0111

rank(10) =3 … counts of “1” to 10th bit

select(3)=10 … the 3rd “1” is in 10th bit

What I find this interesting is that not only “search index” of calculating rank and select is succinct, but also, this calculation can be done in O(1) regardless of size for any given bit string.

For this calculation, there’s a need to count the number of “1”s in constant bits (for example for 64 bits). This problem (counting bits) is called by many, “the popcount problem” or “the population count”.

It seems that there are many ways to count bits. The fastest maybe is using SSSE3 POPCNT of new Intel processors.

Calculating in older CPU?

After surfing the web a bit, I found it amazing that most of the web pages say that using “bit shift” seems faster than using “predefined tables”

[predefined table]

unsigned char table[256];

void initialize_table()

{

for (int i = 0; i < 256; i++) {

table[i] = count8bit(i);

}

}

unsigned long long count64bit_table(unsigned long long x)

{

int n = 0;

n += table[x & 0xff];

n += table[(x >> 8) & 0xff];

n += table[(x >> 16) & 0xff];

n += table[(x >> 24) & 0xff];

n += table[(x >> 32) & 0xff];

n += table[(x >> 40) & 0xff];

n += table[(x >> 48) & 0xff];

n += table[(x >> 56) & 0xff];

return n;

}

[bit shift]

unsigned long long count64bit_shift(unsigned long long x)

{

x = ((x & 0xaaaaaaaaaaaaaaaaULL) >> 1)

+ (x & 0x5555555555555555ULL);

x = ((x & 0xccccccccccccccccULL) >> 2)

+ (x & 0x3333333333333333ULL);

x = ((x & 0xf0f0f0f0f0f0f0f0ULL) >> 4)

+ (x & 0x0f0f0f0f0f0f0f0fULL);

x = ((x & 0xff00ff00ff00ff00ULL) >> 8)

+ (x & 0x00ff00ff00ff00ffULL);

x = ((x & 0xffff0000ffff0000ULL) >> 16)

+ (x & 0x0000ffff0000ffffULL);

x = ((x & 0xffffffff00000000ULL) >> 32)

+ (x & 0x00000000ffffffffULL);

return x;

}

To be sure, I coded and ran it in my Pentium M machine (Windows XP).

I also wanted to know the difference between C and Java.

Looping the calculation for 10,000,000,000 times, and running 3 times.

The conclusion...

[gcc (without any optimization)]

Input hexadecimal string of 64bit:11ff11ff00ff00ff

Calculating value 1296775004637036799 : 0x11ff11ff00ff00ffULL. Loop:100000*100000

Calculating loop:0...

count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:300.752

count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:487.030

Calculating loop:1...

count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:299.691

count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:486.098

Calculating loop:2...

count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:301.694

count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:489.855

[gcc -O3]

Input hexadecimal string of 64bit:11ff11ff00ff00ff

Calculating value 1296775004637036799 : 0x11ff11ffff00ffULL. Loop:100000*100000

Calculating loop:0...

count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:5.858

count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:5.648

Calculating loop:1...

count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:5.658

count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:5.678

Calculating loop:2...

count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:5.679

count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:5.628

[java (client JVM, Sun 1.5.0_11)]

Input hexadecimal string of 64bit:

11ff11ff00ff00ff

Calculating value 1296775004637036799 : 0x11ff11ff00ff00ffULL. Loop:100000*100000

Calculating loop:0...

count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:433.834

count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:365.485

Calculating loop:1...

count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:435.066

count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:366.146

Calculating loop:2...

count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:433.744

count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:365.856

[java -server]

Input hexadecimal string of 64bit:

11ff11ff00ff00ff

Calculating value 1296775004637036799 : 0x11ff11ff00ff00ffULL. Loop:100000*100000

Calculating loop:0...

count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:275.627

count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:42.952

Calculating loop:1...

count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:121.685

count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:43.142

Calculating loop:2...

count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:121.194

count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:42.651

Quite interesting results!

And yes, Sun java library Integer.countBits() and Long.countBits() are using bit shift algorithms.