import heapq
import random

class TopkHeap(object):
    def __init__(self, k):
        self.k = k
        self.data = []  # min-heap holding the k largest elements seen so far

    def Push(self, elem):
        if len(self.data) < self.k:
            heapq.heappush(self.data, elem)
        else:
            # The heap root is the smallest of the current top k.
            topk_small = self.data[0]
            if elem > topk_small:
                heapq.heapreplace(self.data, elem)

    def TopK(self):
        # Sort a copy in descending order so the heap itself is left intact
        # (popping every element would empty it after the first call).
        return sorted(self.data, reverse=True)

if __name__ == "__main__":
    print("Hello")
    list_rand = random.sample(range(1000000), 100)
    th = TopkHeap(3)
    for i in list_rand:
        th.Push(i)
    print(th.TopK())
    print(sorted(list_rand, reverse=True)[0:3])
The TopK problem above is easy to handle with heapq.
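When the whole sequence is already in memory and only a one-off answer is needed, the standard library's `heapq.nlargest` can stand in for a hand-written class. A minimal sketch; the sample data is made up for illustration:

```python
import heapq
import random

# Hypothetical sample data: 100 distinct integers.
list_rand = random.sample(range(1000000), 100)

# nlargest returns the k largest items, largest first.
top3 = heapq.nlargest(3, list_rand)
print(top3)
```

A class with a Push method is still the better fit when elements arrive one at a time and the running top K has to be available at any point.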
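Now for a nastier requirement: given a sequence of length N, find its K smallest elements (BtmK), which calls for a max-heap.
heapq's implementation does not offer anything like Java's Comparator interface or a comparison-function argument; the developers give their reasons here: http://code.activestate.com/lists/python-list/162387/
So people have come up with some clever workarounds, see: http://stackoverflow.com/questions/14189540/python-topn-max-heap-use-heapq-or-self-implement
Here is a summary of the simplest one:
Replace push(e) with push(-e), and pop() with -pop().
In other words, negate each value when it goes into the heap and again when it comes out; all the other logic is exactly the same as in TopK. See the code: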
import heapq

class BtmkHeap(object):
    def __init__(self, k):
        self.k = k
        self.data = []  # min-heap of negated values, i.e. a max-heap of the originals

    def Push(self, elem):
        # Negate elem so that the min-heap behaves like a max-heap
        elem = -elem
        # Same heap algorithm as TopkHeap from here on
        if len(self.data) < self.k:
            heapq.heappush(self.data, elem)
        else:
            # The root is the smallest negated value, i.e. the largest original kept
            topk_small = self.data[0]
            if elem > topk_small:
                heapq.heapreplace(self.data, elem)

    def BtmK(self):
        # Negate again on the way out and return in ascending order
        return sorted([-x for x in self.data])
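One way to check the negation idea is to compare it with a plain sort on random input. The sketch below is a standalone function rather than the class above; the name `bottom_k` and the guard for k <= 0 are my own additions for illustration:

```python
import heapq
import random

def bottom_k(seq, k):
    """Return the k smallest items of seq in ascending order, via negation."""
    if k <= 0:
        return []
    heap = []  # min-heap of negated values; -heap[0] is the largest item kept
    for elem in seq:
        neg = -elem
        if len(heap) < k:
            heapq.heappush(heap, neg)
        elif neg > heap[0]:
            # elem is smaller than the largest item kept, so swap it in
            heapq.heapreplace(heap, neg)
    return sorted(-x for x in heap)

list_rand = random.sample(range(1000000), 100)
result = bottom_k(list_rand, 3)
expected = sorted(list_rand)[:3]
print(result == expected)
```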
I tested it and it works without any problems. What a neat trick...
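Negation only works for values that can be negated, such as numbers. For strings or other comparable objects, one possible workaround is a small wrapper that inverts the comparison. The sketch below assumes heapq orders items using only `<` (as the CPython implementation does); the names `Reversed` and `bottom_k_any` are hypothetical and not part of heapq:

```python
import heapq

class Reversed(object):
    """Hypothetical wrapper that inverts the ordering of the wrapped value."""
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # "Less than" in the heap means "greater than" for the original values
        return self.value > other.value

def bottom_k_any(seq, k):
    """Return the k smallest items of seq in ascending order."""
    heap = []  # heap[0] wraps the largest item kept so far
    for elem in seq:
        if len(heap) < k:
            heapq.heappush(heap, Reversed(elem))
        elif heap and elem < heap[0].value:
            # elem is smaller than the largest item kept, so swap it in
            heapq.heapreplace(heap, Reversed(elem))
    return sorted(w.value for w in heap)

print(bottom_k_any(["pear", "apple", "fig", "kiwi"], 2))
```

The wrapper costs one extra object per element kept, so for plain numbers the negation trick remains the lighter choice.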