NumPy Part 4: Advanced Indexing and Indexing Tricks

xiaoxiao 2021-12-14 18

Contents: Indexing with index arrays; Indexing with boolean arrays; The ix_() function

1. Indexing with index arrays

```python
>>> import numpy as np
>>> a = np.arange(12)**2                 # the first 12 square numbers
>>> i = np.array([1, 1, 3, 8, 5])        # an array of indices
>>> a[i]                                 # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])
>>>
>>> j = np.array([[3, 4], [9, 7]])       # a bidimensional array of indices
>>> a[j]                                 # the result has the same shape as j
array([[ 9, 16],
       [81, 49]])
```
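To make the shape rule concrete, here is a small sketch that rebuilds a[j] with a plain Python loop and checks it against fancy indexing. The names picked and manual are only illustrative.

```python
import numpy as np

a = np.arange(12) ** 2          # the first 12 square numbers
j = np.array([[3, 4], [9, 7]])  # a 2-D array of indices

picked = a[j]
# The result takes the shape of the index array, not the shape of a
print(picked.shape)  # (2, 2)

# Element by element, picked[p, q] is simply a[j[p, q]]
manual = np.array([[a[k] for k in row] for row in j])
print(np.array_equal(picked, manual))  # True
```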

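If the indexed array a is multidimensional, a single index array refers to the first dimension of a. For example, the following turns an image of palette labels into a color image by looking up each label as a row of palette: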

```python
>>> palette = np.array([[0, 0, 0],         # black
...                     [255, 0, 0],       # red
...                     [0, 255, 0],       # green
...                     [0, 0, 255],       # blue
...                     [255, 255, 255]])  # white
>>> image = np.array([[0, 1, 2, 0],        # each value corresponds to a color in the palette
...                   [0, 3, 4, 0]])
>>> palette[image]                         # the (2, 4, 3) color image
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])
```
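A quick way to predict the result's shape: it is the index array's shape followed by the remaining, unindexed axes of the indexed array. The sketch below checks that rule and also shows np.take with axis=0, which performs the same first-axis lookup; np.take is brought in here for comparison and is not used in the original article.

```python
import numpy as np

palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0],
                    [0, 0, 255], [255, 255, 255]])
image = np.array([[0, 1, 2, 0], [0, 3, 4, 0]])

rgb = palette[image]
# Index-array shape (2, 4) followed by palette's remaining axis (3,)
print(rgb.shape == image.shape + palette.shape[1:])  # True

# np.take along axis 0 looks up the same rows
print(np.array_equal(rgb, np.take(palette, image, axis=0)))  # True
```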

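We can also give indices for more than one dimension. The index arrays for the different dimensions must have the same shape (more precisely, their shapes must be broadcastable to a common shape, as the a[i, 2] example below shows).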

```python
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array([[0, 1],    # indices for the first axis of a
...               [1, 2]])
>>> j = np.array([[2, 1],    # indices for the second axis of a
...               [3, 3]])
>>>
>>> a[i, j]                  # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
```

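Here is how this works: a[i, j] pairs the numbers found at the same position in i and j into an index pair, then uses each pair to look up a value in a. For example, i holds 0 at position (0, 0) and j holds 2 at position (0, 0); together they form the index pair (0, 2), and a[0, 2] is 2. Under this mechanism i and j naturally need the same shape, otherwise the pairs could not be formed. And because i and j index the two axes of a respectively, the result has the same shape as i and j.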
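The pairing rule can be spelled out with an explicit loop. This sketch walks over every position of i with np.ndindex, looks up a[i[pos], j[pos]], and confirms the hand-built array matches a[i, j]; the loop is for illustration only, since fancy indexing does the same work in one vectorized step.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])  # indices for axis 0
j = np.array([[2, 1], [3, 3]])  # indices for axis 1

# Pair i[pos] with j[pos] at every position and look the pair up in a
manual = np.empty(i.shape, dtype=a.dtype)
for pos in np.ndindex(*i.shape):
    manual[pos] = a[i[pos], j[pos]]

print(manual.tolist())                  # [[2, 5], [7, 11]]
print(np.array_equal(manual, a[i, j]))  # True
```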

```python
>>> a[i, 2]
array([[ 2,  6],
       [ 6, 10]])
```

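Here i is the index array for the first axis of a, and the 2 in a[i, 2] is the index for the second axis. The scalar 2 is broadcast to the shape of i, so every number in i is paired with 2, giving the index pairs [[(0,2), (1,2)], [(1,2), (2,2)]]; the values of a at those pairs are then returned in that shape.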
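To see the broadcasting step, the sketch below replaces the scalar with an explicit array of 2s (np.full_like) and uses np.broadcast_arrays to list the index pairs NumPy effectively forms. The helper names cols, rows_b and cols_b are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])

# The scalar 2 behaves like an array of 2s with the same shape as i
cols = np.full_like(i, 2)
print(np.array_equal(a[i, 2], a[i, cols]))  # True

# Broadcasting i against 2 yields the (row, column) pairs used for lookup
rows_b, cols_b = np.broadcast_arrays(i, np.array(2))
pairs = list(zip(rows_b.ravel().tolist(), cols_b.ravel().tolist()))
print(pairs)  # [(0, 2), (1, 2), (1, 2), (2, 2)]
```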

```python
>>> a[:, j]
array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])
```

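Here the first axis of a is sliced in full, giving the row indices 0, 1 and 2. Each row index is then paired with every element of j, so for each of the 3 rows we get a 2×2 array of index pairs; looking these pairs up in a produces a result of shape (3, 2, 2).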
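One way to check this reading is to replace the slice with an explicit row-index array of shape (3, 1, 1), which broadcasts against j's (2, 2) shape to (3, 2, 2). This is a sketch for verification; the variable rows is illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
j = np.array([[2, 1], [3, 3]])

full = a[:, j]
print(full.shape)  # (3, 2, 2): one 2x2 block per row of a

# Explicit row indices of shape (3, 1, 1) broadcast with j to (3, 2, 2)
rows = np.arange(a.shape[0])[:, None, None]
print(np.array_equal(full, a[rows, j]))  # True

# Each block is that single row looked up with j
print(np.array_equal(full[1], a[1][j]))  # True
```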

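Naturally, we can also put i and j into a sequence and index with that sequence. Use a tuple: a tuple inside the brackets is unpacked into one index per axis. Older NumPy versions also accepted a list here (with a FutureWarning), but since NumPy 1.23 a list is treated as a single index array, just like the array s below.

```python
>>> l = (i, j)
>>> a[l]          # equivalent to a[i, j]
array([[ 2,  5],
       [ 7, 11]])
```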

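However, we cannot stack i and j into one array and index a with it, because, as shown earlier, when a single index array is used to index another array, all of its elements are interpreted as indices into the first dimension of a.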

```python
>>> s = np.array([i, j])
>>> s
array([[[0, 1],
        [1, 2]],

       [[2, 1],
        [3, 3]]])
>>> s.shape
(2, 2, 2)
>>> a[s]     # every value of s is treated as a row index of a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
```

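The error message says it plainly: valid indices along the first axis of a run from 0 to 2, but s contains a 3. The syntax a[s] itself is legal; the issue is what it means. All of s is used to index rows, so even if every value were in range the result would have shape s.shape + (4,) rather than matching a[i, j]. To get the pairing we want, convert s to a tuple, which unpacks it along its first axis into i and j: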

```python
>>> a[tuple(s)]    # equivalent to a[i, j]
array([[ 2,  5],
       [ 7, 11]])
```
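To show that the a[s] form is a different operation and not merely an out-of-range accident, the sketch below stacks two in-range index arrays (a hypothetical s_ok built from i twice) and indexes a with it. Every value selects a whole row, so the result is four-dimensional.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])

# Hypothetical stack whose values are all valid row numbers (0..2)
s_ok = np.array([i, i])
out = a[s_ok]
# Each value of s_ok picks a full row of a
print(out.shape)  # (2, 2, 2, 4), i.e. s_ok.shape + (4,)
print(np.array_equal(out[0, 0, 1], a[1]))  # True: s_ok[0, 0, 1] is 1

# Unpacking along the first axis restores the (row, column) pairing
s = np.array([i, j])
print(np.array_equal(a[tuple(s)], a[i, j]))  # True
```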

Index arrays can also be used as the target of an assignment:

```python
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1, 3, 4]] = 0
>>> a
array([0, 0, 2, 0, 0])
```

However, when the index list contains repeated values, the assignment is performed several times, and the last value written wins:

```python
>>> a = np.arange(5)
>>> a[[0, 0, 2]] = [1, 2, 3]
>>> a
array([2, 1, 3, 3, 4])
```

That seems reasonable, but be careful with Python's += operator; the result may not be what you expect:

```python
>>> a = np.arange(5)
>>> a[[0, 0, 2]] += 1
>>> a
array([1, 1, 3, 3, 4])
```

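Even though 0 appears twice in the index list, element 0 is incremented only once. This is because a[[0, 0, 2]] += 1 is effectively a[[0, 0, 2]] = a[[0, 0, 2]] + 1: the right-hand side is computed once from the original values, and then the repeated-assignment rule above applies.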
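If you actually want repeated indices to accumulate, NumPy provides the unbuffered ufunc method np.add.at. This sketch (not part of the original article) contrasts it with the buffered += expansion described above.

```python
import numpy as np

a = np.arange(5)
idx = [0, 0, 2]

# Buffered: a[idx] += 1 behaves like a[idx] = a[idx] + 1,
# so index 0 receives the value 1 twice instead of being incremented twice
b = a.copy()
b[idx] = b[idx] + 1
print(b)  # [1 1 3 3 4]

# Unbuffered: np.add.at applies the addition once per occurrence of each index
c = a.copy()
np.add.at(c, idx, 1)
print(c)  # [2 1 3 3 4]
```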

2. Indexing with boolean arrays

The most natural way to use boolean indexing is with a boolean array that has the same shape as the original array:

```python
>>> a = np.arange(12).reshape(3, 4)
>>> b = a > 4
>>> b            # b is a boolean array with a's shape
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[b]         # the selected elements form a 1-D array
array([ 5,  6,  7,  8,  9, 10, 11])
```
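A full-shape mask can be translated into the integer index arrays from section 1: np.nonzero returns one index array per axis listing where the mask is True. The sketch below checks that both forms select the same elements, in row-major order.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a > 4

# One index array per axis, listing the positions where b is True
rows, cols = np.nonzero(b)
print(np.array_equal(a[b], a[rows, cols]))  # True

# The 1-D result has exactly one element per True in the mask
print(int(b.sum()) == a[b].size)  # True
```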

Boolean indexing is also very convenient for assignment:

```python
>>> a[b] = 0     # all elements of a greater than 4 become 0
>>> a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])
```
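Masked assignment modifies the array in place. When the original should be kept, np.where builds a new array instead; np.where is a swapped-in alternative shown for comparison, not something the article uses.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# In place: the mask selects elements of the copy and overwrites them
inplace = a.copy()
inplace[inplace > 4] = 0

# Out of place: take 0 where the condition holds, else the original value
replaced = np.where(a > 4, 0, a)

print(np.array_equal(inplace, replaced))  # True
print(a.max())  # 11: a itself is unchanged
```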

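The second way of using boolean indexing is closer to integer indexing: for each dimension of the array, we supply a 1-D boolean array that selects the slices we want.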

```python
>>> a = np.arange(12).reshape(3, 4)
>>> b1 = np.array([False, True, True])          # first dim selection
>>> b2 = np.array([True, False, True, False])   # second dim selection
>>>
>>> a[b1, :]                                    # rows 2 and 3, all columns
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[b1]                                       # same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[:, b2]                                    # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>>
>>> a[b1, b2]                                   # a surprising result
array([ 4, 10])
```

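Note that the length of each 1-D boolean array must equal the length of the dimension (axis) it indexes. In the example above, b1 has length 3 (the number of rows of a) and b2 has length 4, which suits the second axis of a. The surprising a[b1, b2] result arises because each mask is converted to the integer positions of its True values (rows [1, 2] and columns [0, 2]), and these are then paired element by element exactly like a[i, j], giving a[1, 0] and a[2, 2]. This also means the two masks must contain the same number of True values, or counts that broadcast.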
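The sketch below makes the mask-to-integer conversion explicit with nonzero() and, for contrast, uses np.ix_ (covered in the next section, which also accepts boolean masks) to get the rectangular sub-matrix that a[b1, b2] does not produce.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Each 1-D mask becomes the positions of its True values
r = b1.nonzero()[0]  # rows 1 and 2
c = b2.nonzero()[0]  # columns 0 and 2
# ...which are then paired like a[i, j]: (1, 0) and (2, 2)
print(np.array_equal(a[b1, b2], a[r, c]))  # True
print(a[b1, b2])  # [ 4 10]

# For every combination of selected rows and columns, use np.ix_
sub = a[np.ix_(b1, b2)]
print(sub.tolist())  # [[4, 6], [8, 10]]
```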

3. The ix_() function

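The ix_ function can combine different vectors so as to obtain the result for every n-tuple. For example, suppose you want to compute a+b*c for every triple taken from the vectors a, b and c, that is, $r_{ijk} = a_i + b_j c_k$ for all indices $i$, $j$, $k$. In the example below a, b and c have lengths 4, 3 and 5, so there are 4*3*5 = 60 results. With little data you could work these out by hand; with a lot of data, ix_ comes in handy.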

```python
>>> a = np.array([2, 3, 4, 5])
>>> b = np.array([8, 5, 4])
>>> c = np.array([5, 4, 6, 8, 3])
>>> ax, bx, cx = np.ix_(a, b, c)
>>> ax
array([[[2]],

       [[3]],

       [[4]],

       [[5]]])
>>> bx
array([[[8],
        [5],
        [4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result = ax + bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],

       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],

       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],

       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3, 2, 4]
17
>>> a[3] + b[2]*c[4]
17
```

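Clearly, the result array contains every possible value, and its positions correspond one-to-one to positions in the input vectors; for example, a[2]+b[0]*c[4] is exactly result[2,0,4].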
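The one-to-one correspondence can be verified exhaustively with a triple loop. This sketch is only a check; for large inputs the broadcast version is what you would actually use.

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

ax, bx, cx = np.ix_(a, b, c)
result = ax + bx * cx  # shapes (4,1,1), (1,3,1), (1,1,5) broadcast to (4,3,5)

# Brute-force check: every position holds a[i] + b[j] * c[k]
for i in range(len(a)):
    for j in range(len(b)):
        for k in range(len(c)):
            assert result[i, j, k] == a[i] + b[j] * c[k]

print(result.shape)          # (4, 3, 5)
print(int(result[2, 0, 4]))  # 28
```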

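The same idea can be wrapped in a general reduction helper that applies any ufunc across all the vectors. Called with np.add, as below, it computes a[i]+b[j]+c[k] for every triple (a sum, not a+b*c):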

```python
>>> def ufunc_reduce(ufct, *vectors):
...     vs = np.ix_(*vectors)      # one broadcastable array per vector
...     r = ufct.identity          # e.g. 0 for np.add, 1 for np.multiply
...     for v in vs:
...         r = ufct(r, v)         # broadcasting grows r to the full shape
...     return r
...
```

Then use it like this:

```python
>>> ufunc_reduce(np.add, a, b, c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],

       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])
```
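The helper works with any ufunc that has an identity. The sketch below repeats ufunc_reduce so it runs on its own, checks the np.add result against a manual open mesh built with None (np.newaxis), and tries np.multiply; the names manual, total and prod are illustrative.

```python
import numpy as np

def ufunc_reduce(ufct, *vectors):
    # Same helper as in the article, repeated so this snippet is self-contained
    vs = np.ix_(*vectors)
    r = ufct.identity
    for v in vs:
        r = ufct(r, v)
    return r

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# Manual open mesh: give each vector its own axis
manual = a[:, None, None] + b[None, :, None] + c[None, None, :]
total = ufunc_reduce(np.add, a, b, c)
print(np.array_equal(total, manual))  # True

# A different ufunc: the product of every triple
prod = ufunc_reduce(np.multiply, a, b, c)
print(int(prod[0, 0, 0]))  # 80, i.e. 2 * 8 * 5
```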

When reposting, please credit the original address: https://ju.6miu.com/read-965651.html
