Sort a given a huge array of bounded, non-duplicate, positive integers. Extend the problem to include negative integers. What are the different ways to reduce memory complexity of the problem.
Notes for clarity: The reason we are calling the array bounded is for the reason of allocation of memory in the first example. Second, ignoring negative integers is only to simplify the first code sample for understanding.
- Fixed bug in boolean array size as pointed by ploggingdev - thanks!
- Added notes on similarity with BucketSort
- Updated post title to indicate the domain to be bounded
- Thanks to edi9999 for pointing
O(N)
constraint during thegetNextSetBit()
operation. Refer to issue #4 for more details.
Most of the interview candidates that I have talked to about this problem come up
with the quick answer as MergeSort
or divide-and-conquer. The cost of sorting
being O(N * Log(N))
- which in this case is not the fastest sorting time.
The fastest time to sort an integer array is O(X)
, where X
is the maximum value
of integer in the array. For cases where N approaches X
, that is, the array is
so huge that the number of elements in the array equal the difference in min
and max
values, the complexity approaches O(N)
Let me explain how.
- Construct a boolean array of length N
- For every integer
n
in array, mark the boolean at indexn
in array astrue
- Once the array iteration is complete, just iterate over the boolean array again
to print all the indices that have the value set as
true
public void sortBoundedIntegers(int[] array) {
/// the regular checks first
if(array == null) {
return;
}
if(array.length == 0) {
return;
}
// build the boolean result array
// Integer.MAX_INT makes sure that all integer values can be accommodated
// this is very memory in-efficient and thus one should ideally use
// a sparse bit-array for space savings
boolean[] result = new boolean[Integer.MAX_INT];
// iterate
for(int index = 0; index < array.length; index++) {
int num = array[index];
result[num] = true;
}
// print the result
for(int index = 0; index < result.length; index++) {
if(result[index]) {
System.out.println(index);
}
}
}
Note: The above solution using a boolean array is not the optimized solution
and is meant only for clarity of algorithm. The ideal solution would be to use a
sparse bit-array
. Please go over Issue #4
for some discussion.
-
A
boolean
occupies one-byte of memory. Thus, switching to a bit-vector (also called as bit-array) will reduce the memory consumption by a factor of8
. Check code sample 2. -
In case the integers are also negative, another bit-array can be used to check for negatives, and then both iterated one-after-another to produce result. Check code sample 2.
-
To further reduce the memory consumption, one can make use of sparsed-bit-arrays. This can lead to huge drop in memory consumption if the integers are spaced apart a lot. Check code sample 2. Refer to brettwooldridge/SparseBitSet for one such implementation.
-
In case the integer array contains duplicates, use a small
short
array than theboolean
array to hold the number of times an integer has been seen, thus still sorting inO(X)
time. -
If we use sparse bit-arrays on the above code example, then the part of the problem becomes similar to BucketSort where the buckets/bins are created to reduce memory space of the problem. However, the individual bin-sorting is based on bits in the bit-array.
public void sortBoundedIntegers(int[] array) {
// run regular checks
final int len = array.length;
// considering SparsedBitSet is an implementation available
BitSet negative = new SparsedBitSet();
BitSet positive = new SparsedBitSet();
// sort
for(int index = 0; index < len; index++) {
int num = array[index];
if(num < 0) {
num = 0 - num; // convert to positive
negative.setBit(num, true);
} else {
positive.setBit(num, true);
}
}
// output
int index = -1;
do {
index = negative.getNextSetBit(index);
if(index < 0) {
break;
}
System.out.println(0 - index);
} while(true);
index = -1;
do {
index = positive.getNextSetBit(index);
if(index < 0) {
break;
}
System.out.println(index);
} while(true);
}
- Bit-Arrays
- Hashing
- Bloom Filters
- BucketSort
- Python implementation of Countingsort, Bucketsort and Radixsort
- Integer Sorting in 0(n sqrt (log log n)) Expected Time and Linear Space, Yijie Han and Mikkel Thorup, FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science Pages 135-144