bisect --- 数组二分查找算法

源代码: Lib/bisect.py


这个模块对有序列表提供了支持,使得他们可以在插入新数据仍然保持有序。对于长列表,如果其包含元素的比较操作十分昂贵的话,这可以是对更常见方法的改进。这个模块叫做 bisect 因为其使用了基本的二分(bisection)算法。源代码也可以作为很棒的算法示例(边界判断也做好啦!)

定义了以下函数:

bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)

a 中找到 x 合适的插入点以维持有序。参数 lohi 可以被用于确定需要考虑的子集;默认情况下整个列表都会被使用。如果 x 已经在 a 里存在,那么插入点会在已存在元素之前(也就是左边)。如果 a 是列表(list)的话,返回值是可以被放在 list.insert() 的第一个参数的。

The returned insertion point i partitions the array a into two halves so that all(val < x for val in a[lo : i]) for the left side and all(val >= x for val in a[i : hi]) for the right side.

key specifies a key function of one argument that is used to extract a comparison key from each input element. The default value is None (compare the elements directly).

在 3.10 版更改: Added the key parameter.

bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None)
bisect.bisect(a, x, lo=0, hi=len(a))

类似于 bisect_left(),但是返回的插入点是 a 中已存在元素 x 的右侧。

The returned insertion point i partitions the array a into two halves so that all(val <= x for val in a[lo : i]) for the left side and all(val > x for val in a[i : hi]) for the right side.

key specifies a key function of one argument that is used to extract a comparison key from each input element. The default value is None (compare the elements directly).

在 3.10 版更改: Added the key parameter.

bisect.insort_left(a, x, lo=0, hi=len(a), *, key=None)

Insert x in a in sorted order.

key specifies a key function of one argument that is used to extract a comparison key from each input element. The default value is None (compare the elements directly).

This function first runs bisect_left() to locate an insertion point. Next, it runs the insert() method on a to insert x at the appropriate position to maintain sort order.

Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.

在 3.10 版更改: Added the key parameter.

bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None)
bisect.insort(a, x, lo=0, hi=len(a))

类似于 insort_left(),但是把 x 插入到 a 中已存在元素 x 的右侧。

key specifies a key function of one argument that is used to extract a comparison key from each input element. The default value is None (compare the elements directly).

This function first runs bisect_right() to locate an insertion point. Next, it runs the insert() method on a to insert x at the appropriate position to maintain sort order.

Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.

在 3.10 版更改: Added the key parameter.

Performance Notes

When writing time sensitive code using bisect() and insort(), keep these thoughts in mind:

  • Bisection is effective for searching ranges of values. For locating specific values, dictionaries are more performant.

  • The insort() functions are O(n) because the logarithmic search step is dominated by the linear time insertion step.

  • The search functions are stateless and discard key function results after they are used. Consequently, if the search functions are used in a loop, the key function may be called again and again on the same array elements. If the key function isn't fast, consider wrapping it with functools.cache() to avoid duplicate computations. Alternatively, consider searching an array of precomputed keys to locate the insertion point (as shown in the examples section below).

参见

  • Sorted Collections is a high performance module that uses bisect to managed sorted collections of data.

  • The SortedCollection recipe uses bisect to build a full-featured collection class with straight-forward search methods and support for a key-function. The keys are precomputed to save unnecessary calls to the key function during searches.

搜索有序列表

上面的 bisect() 函数对于找到插入点是有用的,但在一般的搜索任务中可能会有点尴尬。下面 5 个函数展示了如何将其转变成有序列表中的标准查找函数

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

Examples

函数 bisect() 还可以用于数字表查询。这个例子是使用 bisect() 从一个给定的考试成绩集合里,通过一个有序数字表,查出其对应的字母等级:90 分及以上是 'A',80 到 89 是 'B',以此类推

>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

One technique to avoid repeated calls to a key function is to search a list of precomputed keys to find the index of a record:

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])       # Or use operator.itemgetter(1).
>>> keys = [r[1] for r in data]         # Precompute a list of keys.
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)