18. Arithmetic operations on pandas DataFrames

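Like a Series, a DataFrame supports arithmetic operations. Because a DataFrame is two-dimensional, the operands need some care: both can be DataFrames, or one of them can be a scalar or a Series.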

1). If one operand is a scalar, the operation is applied between that scalar and every element of the DataFrame.

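import pandas as pd
import numpy as np

val = np.random.randn(5, 4)  # 5x4 array of standard-normal samples
idx = list("abcd")           # column labels
df = pd.DataFrame(val, columns=idx)
print(df)
print(df * 2)  # multiply every element by 2
print(df + 2)  # add 2 to every element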
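Because the snippet above uses random data, its output changes on every run. The following is a minimal sketch with fixed values (the frame contents and variable names are chosen here purely for illustration) that makes the element-wise behaviour easy to check by hand.

```python
import numpy as np
import pandas as pd

# Illustrative fixed data: rows are [0, 1], [2, 3], [4, 5].
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=list("ab"))

doubled = df * 2          # every element multiplied by 2
shifted = df + 2          # every element increased by 2
also_shifted = df.add(2)  # method form of the same addition

print(doubled)
print(shifted)
```

The operator form and the method form (`add`, `sub`, `mul`, `div`) should give the same result for a scalar operand; the method form becomes useful later when extra arguments such as `axis` are needed.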

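2). If both operands are DataFrames, pandas aligns them on their row and column labels. When all labels match, the operation is applied element by element at each position; any label that appears in only one of the two frames produces NaN in the result.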

import pandas as pd
import numpy as np

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
s2 = df.iloc[:, 1:3]  # all rows, columns "b" and "c" only
print(s2)
print(df + s2)        # columns "a" and "d" have no partner, so they become NaN

Result:

          a         b         c         d
0  0.483973  0.645901 -1.035946  0.195398
1 -0.008440 -0.433560 -1.179151  0.840267
2  0.399064  0.621388 -1.935247 -0.064402
3  1.096569  0.739594 -0.795671 -1.431564
4  0.169745 -0.713899  1.513129 -0.977025
          b         c
0  0.645901 -1.035946
1 -0.433560 -1.179151
2  0.621388 -1.935247
3  0.739594 -0.795671
4 -0.713899  1.513129
    a         b         c   d
0 NaN  1.291802 -2.071893 NaN
1 NaN -0.867120 -2.358301 NaN
2 NaN  1.242777 -3.870493 NaN
3 NaN  1.479188 -1.591342 NaN
4 NaN -1.427797  3.026257 NaN
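The result above follows from label alignment: pandas pairs values by column name, not by position. A small sketch with two hand-built frames (`left` and `right` are hypothetical names, not from the original example) shows the same effect when each frame has a column the other lacks.

```python
import pandas as pd

# Two illustrative frames that share only the column "b".
left = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
right = pd.DataFrame({"b": [100, 200], "c": [1000, 2000]})

total = left + right

# The result carries the union of the column labels.
# "b" holds 110 and 220; "a" and "c" are NaN because each
# exists in only one of the two operands.
print(total)
```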

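3). If one operand is a DataFrame and the other is a Series, pandas matches the Series index against the DataFrame's column labels, broadcasts the Series down the rows, and then applies the operation.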

import pandas as pd
import numpy as np

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
s2 = df.iloc[0]  # first row as a Series indexed by "a".."d"
print(s2)
print(df + s2)   # the first row is added to every row of df

Output:

          a         b         c         d
0 -1.238851 -2.682975  1.127531 -1.205118
1 -0.164544 -0.811380  1.418037  0.356827
2  0.322918 -0.818707  0.428460 -1.142152
3 -0.205018  1.837780 -0.353513  1.731527
4  1.395693  0.377382  0.746702  0.757560
a   -1.238851
b   -2.682975
c    1.127531
d   -1.205118
Name: 0, dtype: float64
          a         b         c         d
0 -2.477701 -5.365949  2.255061 -2.410237
1 -1.403394 -3.494354  2.545568 -0.848292
2 -0.915933 -3.501682  1.555990 -2.347271
3 -1.443869 -0.845195  0.774018  0.526409
4  0.156842 -2.305593  1.874232 -0.447559
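As a sketch of the row-wise broadcast with fixed numbers (the Series values and the label "z" are assumptions made up for this illustration), the example below also shows what happens when the Series index only partly overlaps the column labels.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])

# The Series index lines up with the column labels, so the same
# values are added to every row.
row = pd.Series({"a": 10, "b": 100})
result = df + row
print(result)  # column a: 11, 13, 15; column b: 102, 104, 106

# Only "b" matches here; "a" and "z" each exist on one side only,
# so both columns come out as NaN.
partial = df + pd.Series({"b": 100, "z": 1})
print(partial)
```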

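4). The two DataFrames need not be identical: the number of rows and columns and their labels may all differ. The operation is performed wherever both the row label and the column label match, and every other position is filled with NaN.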

import pandas as pd
import numpy as np

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
s2 = df[1:4][["b", "d"]]  # rows 1 to 3, columns "b" and "d"
print(s2)
print(df - s2)            # 0 where labels match, NaN everywhere else

Program output:

          a         b         c         d
0 -0.642915 -0.607192 -0.297931  0.732260
1  0.797971  0.366959  0.017239 -0.448221
2 -0.061617  1.880258  0.351112  0.600822
3 -0.398104 -1.161508 -2.210417 -0.127446
4  0.485083  0.279539  1.316857  0.052885
          b         d
1  0.366959 -0.448221
2  1.880258  0.600822
3 -1.161508 -0.127446
    a   b   c   d
0 NaN NaN NaN NaN
1 NaN   0 NaN   0
2 NaN   0 NaN   0
3 NaN   0 NaN   0
4 NaN NaN NaN NaN
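When NaN for unmatched positions is not wanted, the method form accepts a `fill_value` argument that stands in for a value missing from one operand. The following is a hedged sketch using fixed data (the variable names `part`, `plain`, and `filled` are illustrative); it is one possible way to handle the gaps, not something the original example does.

```python
import numpy as np
import pandas as pd

# Illustrative fixed data: values 1..20 in a 5x4 frame.
df = pd.DataFrame(np.arange(1, 21).reshape(5, 4), columns=list("abcd"))
part = df.loc[1:3, ["b", "d"]]  # rows 1 to 3, columns "b" and "d"

plain = df - part                    # NaN wherever part has no matching label
filled = df.sub(part, fill_value=0)  # entries missing from part count as 0

print(plain)
print(filled)
```

With `fill_value=0`, positions that exist only in `df` keep their original value, while positions present in both frames still subtract to 0. A position missing from both operands would remain NaN.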

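5). Arithmetic can also be carried out along the column direction. If one operand is a DataFrame and the other is a Series, call the method form of the operator (for example sub) with axis=0: the Series index is matched against the DataFrame's row labels, the Series is broadcast across the columns, and then the operation is applied.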

import pandas as pd
import numpy as np

val = np.random.randn(5, 4)
idx = list("abcd")
df = pd.DataFrame(val, columns=idx)
print(df)
s2 = df['a']                # column "a" as a Series indexed by row label
print(s2)
print(df.sub(s2, axis=0))   # subtract column "a" from every column

Output:

          a         b         c         d
0  2.110223  0.470813  0.671169 -1.005801
1 -0.566596  0.507211  0.639038  0.140981
2 -0.447541  0.467905 -0.877711 -1.020221
3  1.068080  0.866918 -0.284191 -0.888743
4  1.033273 -1.125950  0.537627 -0.803254
0    2.110223
1   -0.566596
2   -0.447541
3    1.068080
4    1.033273
Name: a, dtype: float64
   a         b         c         d
0  0 -1.639410 -1.439054 -3.116024
1  0  1.073806  1.205634  0.707577
2  0  0.915446 -0.430171 -0.572681
3  0 -0.201162 -1.352270 -1.956822
4  0 -2.159223 -0.495646 -1.836526
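To see why `axis=0` matters, the sketch below contrasts it with the plain `-` operator on a small frame with string row labels (the labels "x" and "y" and the variable names are assumptions chosen for illustration).

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["x", "y"],
    columns=["a", "b", "c"],
)
col = df["a"]  # Series indexed by the row labels "x" and "y"

# axis=0 matches the Series index against the row labels.
by_rows = df.sub(col, axis=0)
print(by_rows)  # a: 0, 0; b: 1, 1; c: 2, 2

# The plain operator matches the Series index against the column
# labels instead. "x" and "y" are not column names, so nothing
# lines up and every value is NaN.
by_cols = df - col
print(by_cols)
```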