Tuesday, May 29, 2007

Addicted to Succinct

Since last year, I've been addicted to Succinct Data Structure.
Not yet "complete" region, it seems that new technology emerge each month related to this issue.

One of the algorithms “derived” from this is calculating [rank] and [select] for a given bit string.
For example;

Given bit string
0011 0000 0101 0111
rank(10) =3 … counts of “1” to 10th bit
select(3)=10 … the 3rd “1” is in 10th bit

What I find this interesting is that not only “search index” of calculating rank and select is succinct, but also, this calculation can be done in O(1) regardless of size for any given bit string.

For this calculation, there’s a need to count the number of “1”s in constant bits (for example for 64 bits). This problem (counting bits) is called by many, “the popcount problem” or “the population count”.

It seems that there are many ways to count bits. The fastest maybe is using SSSE3 POPCNT of new Intel processors.
Calculating in older CPU?
After surfing the web a bit, I found it amazing that most of the web pages say that using “bit shift” seems faster than using “predefined tables”

[predefined table]

unsigned char table[256];
void initialize_table()
{
for (int i = 0; i < 256; i++) {
table[i] = count8bit(i);
}
}

unsigned long long count64bit_table(unsigned long long x)
{
int n = 0;
n += table[x & 0xff];
n += table[(x >> 8) & 0xff];
n += table[(x >> 16) & 0xff];
n += table[(x >> 24) & 0xff];
n += table[(x >> 32) & 0xff];
n += table[(x >> 40) & 0xff];
n += table[(x >> 48) & 0xff];
n += table[(x >> 56) & 0xff];
return n;
}


[bit shift]

unsigned long long count64bit_shift(unsigned long long x)
{
x = ((x & 0xaaaaaaaaaaaaaaaaULL) >> 1)
+ (x &amp; 0x5555555555555555ULL);
x = ((x & 0xccccccccccccccccULL) >> 2)
+ (x & 0x3333333333333333ULL);
x = ((x & 0xf0f0f0f0f0f0f0f0ULL) >> 4)
+ (x & 0x0f0f0f0f0f0f0f0fULL);
x = ((x & 0xff00ff00ff00ff00ULL) >> 8)
+ (x & 0x00ff00ff00ff00ffULL);
x = ((x & 0xffff0000ffff0000ULL) >> 16)
+ (x & 0x0000ffff0000ffffULL);
x = ((x & 0xffffffff00000000ULL) >> 32)
+ (x & 0x00000000ffffffffULL);
return x;
}


To be sure, I coded and ran it in my Pentium M machine (Windows XP).
I also wanted to know the difference between C and Java.
Looping the calculation for 10,000,000,000 times, and running 3 times.

The conclusion...

[gcc (without any optimization)]

Input hexadecimal string of 64bit:11ff11ff00ff00ff
Calculating value 1296775004637036799 : 0x11ff11ff00ff00ffULL. Loop:100000*100000
Calculating loop:0...
count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:300.752
count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:487.030
Calculating loop:1...
count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:299.691
count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:486.098
Calculating loop:2...
count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:301.694
count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:489.855


[gcc -O3]

Input hexadecimal string of 64bit:11ff11ff00ff00ff
Calculating value 1296775004637036799 : 0x11ff11ffff00ffULL. Loop:100000*100000
Calculating loop:0...
count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:5.858
count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:5.648
Calculating loop:1...
count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:5.658
count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:5.678
Calculating loop:2...
count64bit_table(0x11ff11ff00ff00ffULL) Count:36 Time:5.679
count64bit_shift(0x11ff11ff00ff00ffULL) Count:36 Time:5.628


[java (client JVM, Sun 1.5.0_11)]

Input hexadecimal string of 64bit:
11ff11ff00ff00ff
Calculating value 1296775004637036799 : 0x11ff11ff00ff00ffULL. Loop:100000*100000
Calculating loop:0...
count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:433.834
count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:365.485
Calculating loop:1...
count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:435.066
count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:366.146
Calculating loop:2...
count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:433.744
count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:365.856


[java -server]
Input hexadecimal string of 64bit:
11ff11ff00ff00ff
Calculating value 1296775004637036799 : 0x11ff11ff00ff00ffULL. Loop:100000*100000
Calculating loop:0...
count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:275.627
count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:42.952
Calculating loop:1...
count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:121.685
count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:43.142
Calculating loop:2...
count64bit_table(0x11ff11ff ff00ffULL) Count:36 Time:121.194
count64bit_shift(0x11ff11ff ff00ffULL) Count:36 Time:42.651


Quite interesting results!
And yes, Sun java library Integer.countBits() and Long.countBits() are using bit shift algorithms.