Data Processing with Arrays in Python
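NumPy arrays let you write many data-processing tasks as array expressions instead of loops; this practice is called vectorization.
Vectorized array operations (numerical computations of all kinds) are typically one to two orders of magnitude faster than their pure-Python equivalents.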
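## A minimal sketch of the loop-versus-vectorized comparison described above.
## The names values, sqrt_loop and sqrt_vectorized are made up for this illustration,
## and the measured speedup depends on the machine, so the timings are printed, not assumed.

```python
import math
import timeit

import numpy as np

# Hypothetical sample data: 100,000 non-negative floats.
values = np.arange(0, 100_000, dtype=np.float64)


def sqrt_loop(data):
    """Pure-Python version: one math.sqrt call per element."""
    return [math.sqrt(v) for v in data]


def sqrt_vectorized(data):
    """Vectorized version: a single array expression, no explicit loop."""
    return np.sqrt(data)


loop_result = sqrt_loop(values)
vec_result = sqrt_vectorized(values)

# Both approaches compute the same numbers.
same = np.allclose(loop_result, vec_result)

# Time five repetitions of each approach.
loop_time = timeit.timeit(lambda: sqrt_loop(values), number=5)
vec_time = timeit.timeit(lambda: sqrt_vectorized(values), number=5)

print(f"same results: {same}")
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```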
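## Evaluate the function sqrt(x^2 + y^2) over a regular grid of values
## np.meshgrid takes two 1-D arrays and produces two 2-D matrices that together cover every (x, y) pair drawn from them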
In [7]: import numpy as np
In [8]: points = np.arange(-5, 5, 0.01)  ## 1000 equally spaced points
In [9]: xs, ys = np.meshgrid(points, points)
In [23]: import matplotlib.pyplot as plt
In [24]: z = np.sqrt(xs ** 2 + ys ** 2)
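## A short check of what meshgrid returns for this grid, written as a standalone script.
## The helper names rows_match and cols_match are introduced here only for illustration.

```python
import numpy as np

points = np.arange(-5, 5, 0.01)        # 1000 values from -5.0 up to (not including) 5.0
xs, ys = np.meshgrid(points, points)   # two 2-D arrays of identical shape

print(points.shape)   # (1000,)
print(xs.shape)       # (1000, 1000)

# Every row of xs is a copy of points; every column of ys is a copy of points.
rows_match = np.array_equal(xs[0], points) and np.array_equal(xs[-1], points)
cols_match = np.array_equal(ys[:, 0], points) and np.array_equal(ys[:, -1], points)
print(rows_match, cols_match)

# z[i, j] is the distance from the origin to the grid point (points[j], points[i]).
z = np.sqrt(xs ** 2 + ys ** 2)
print(z.shape)        # (1000, 1000)
```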
In [16]: a = np.array([1, 2, 3, 4 ,5])
In [17]: a
Out[17]: array([1, 2, 3, 4, 5])
In [19]: b = np.array([1, 2, 4, 3 ,12])
In [20]: xs, ys = np.meshgrid(a, b)
In [21]: xs
Out[21]:
array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]])
In [22]: ys
Out[22]:
array([[ 1, 1, 1, 1, 1],
[ 2, 2, 2, 2, 2],
[ 4, 4, 4, 4, 4],
[ 3, 3, 3, 3, 3],
[12, 12, 12, 12, 12]])
In [23]: import matplotlib.pyplot as plt
In [24]: z = np.sqrt(xs ** 2 + ys ** 2)
In [25]: z
Out[25]:
array([[ 1.41421356, 2.23606798, 3.16227766, 4.12310563, 5.09901951],
[ 2.23606798, 2.82842712, 3.60555128, 4.47213595, 5.38516481],
[ 4.12310563, 4.47213595, 5. , 5.65685425, 6.40312424],
[ 3.16227766, 3.60555128, 4.24264069, 5. , 5.83095189],
[12.04159458, 12.16552506, 12.36931688, 12.64911064, 13. ]])
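## Plot z as a grayscale image. colorbar must be called as plt.colorbar();
## a bare plt.colorbar only echoes the function object and draws nothing.
In [31]: plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()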
In [32]: plt.title("Image plot of sqrt")
Out[32]: Text(0.5,1,'Image plot of sqrt')
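## Expressing conditional logic as array operations
The numpy.where() function is a vectorized version of the ternary expression x if condition else y
## cond is a boolean array; xarr and yarr are two arrays of values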
In [16]: xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
In [17]: xarr
Out[17]: array([1.1, 1.2, 1.3, 1.4, 1.5])
In [19]: yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
In [20]: cond = np.array([True, False, True, True, False])
## Select values from xarr and yarr according to cond: take the value from xarr where cond is True, otherwise take the value from yarr
In [21]: result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
In [22]: result
Out[22]: [1.1, 2.2, 1.3, 1.4, 2.5]
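## This works, but it is slow on large arrays because all the work is done in interpreted Python code,
## and it does not work with multidimensional arrays
## With np.where the same logic can be written very concisely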
In [23]: result = np.where(cond, xarr, yarr)
In [24]: result
Out[24]: array([1.1, 2.2, 1.3, 1.4, 2.5])
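## Unlike the zip-based list comprehension, np.where handles multidimensional input directly.
## A small sketch with made-up 2-D arrays (x2d, y2d and cond2d are illustrative names, not part of the session above):

```python
import numpy as np

# Hypothetical 2-D inputs, all with the same shape.
x2d = np.array([[1.1, 1.2], [1.3, 1.4]])
y2d = np.array([[2.1, 2.2], [2.3, 2.4]])
cond2d = np.array([[True, False], [False, True]])

# Take x2d where cond2d is True, y2d elsewhere, element by element.
result2d = np.where(cond2d, x2d, y2d)
print(result2d)
# [[1.1 2.2]
#  [2.3 1.4]]
```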
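## The second and third arguments to np.where need not be arrays; either or both can be scalars
## np.where is typically used to produce a new array based on another array
## e.g. given a matrix of random data, replace every positive value with 2 and every negative value with -2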
In [25]: from numpy.random import randn
In [26]: arr = randn(4, 4)
In [27]: arr
Out[27]:
array([[-0.09549766, 0.56886408, -0.81207683, -0.40563281],
[ 1.20518735, 1.10153376, 1.80663697, -0.79473457],
[-0.38531348, 0.78867781, 0.49068315, 1.36815743],
[-1.1150682 , 1.73365055, -0.6661473 , 1.65001034]])
In [28]: np.where(arr > 0, 2, -2)
Out[28]:
array([[-2, 2, -2, -2],
[ 2, 2, 2, -2],
[-2, 2, 2, 2],
[-2, 2, -2, 2]])
In [29]: np.where(arr > 0, 2, arr)  ## set only the positive values to 2
Out[29]:
array([[-0.09549766, 2. , -0.81207683, -0.40563281],
[ 2. , 2. , 2. , -0.79473457],
[-0.38531348, 2. , 2. , 2. ],
[-1.1150682 , 2. , -0.6661473 , 2. ]])
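## np.where returns a new array and leaves its input unchanged. For comparison, the same replacement
## can be written as a boolean-mask assignment on a copy. This is a sketch using a small fixed array
## (sample, replaced and in_place are illustrative names) so that the result is reproducible.

```python
import numpy as np

# Hypothetical fixed data instead of random values.
sample = np.array([[-0.5, 1.5], [2.5, -3.0]])

# np.where builds a new array: 2 where the value is positive, the original value elsewhere.
replaced = np.where(sample > 0, 2, sample)

# Equivalent result via boolean indexing; this modifies the copy in place.
in_place = sample.copy()
in_place[in_place > 0] = 2

print(np.array_equal(replaced, in_place))   # True
print(sample[0, 1])                         # 1.5, the original array is untouched
```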