How to find the smallest 1000 or largest 1000 items from a billion of items

This is one of the most important concept in data structure and frequently asked in interviews. The one solution to this problem is sorting the elements and returning it. But when we see the complexity, it won't be advisable to sort all the billion items.

For cracking this problem, we use a data structure called "min- heap" or "max-heap". For finding the largest 'n' elements, min- heap is used and for smallest 'n' elements, max-heap is used. In max-heap, keys of parent nodes are always greater than the children and the root node will be give the maximum value in that heap. Just the opposite in min-heap.

The algorithm to find the smallest 1000 elements are as follows: -

  • Create a max-heap of size 1000.
  • Put the first 1000 elements to max-heap.
  • For all the elements starting from 1001 to 'N', delete the maximum element from the max-heap and put the latter element.
  • Return the heap.
For finding the largest 1000 elements, use min-heap.

References:

Comments

Popular posts from this blog

Difference between "diff" and "sdiff" commands in Unix

Anonymous classes in C++