12. Boolean Selection on a pandas DataFrame

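The earlier sections selected DataFrame rows with loc, iloc and similar accessors, all of which address rows through the index (by label or by position). Boolean selection instead picks rows according to the values held in ordinary, non-index columns: you supply a boolean sequence with one entry per row, and the rows marked True are returned. Because the condition can be built from any column, it applies in more situations than a plain label or position lookup and is used very widely.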
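
As a minimal sketch of the idea that a boolean sequence picks rows, the snippet below uses a small frame that is purely illustrative (the names df, p, q and the labels x, y, z are assumptions, not part of the examples in this section). It passes a hand-written list of booleans and then shows that a comparison on a column yields the same kind of mask.

```python
import numpy as np
import pandas as pd

# Illustrative 3 x 2 frame; names and values are made up for this sketch.
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["p", "q"], index=list("xyz"))

# One boolean per row: True keeps the row, False drops it.
mask = [True, False, True]
picked = df[mask]
print(picked)

# A comparison on a column builds a boolean Series aligned with the index.
auto_mask = df["q"] > 2
print(auto_mask.tolist())
```

The list must have exactly one entry per row; in practice the mask is almost always produced by a comparison rather than typed by hand.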

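  • Boolean selection on a single column
import pandas as pd
import numpy as np
# 10 x 5 table holding the integers 10..59
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
# Boolean Series: True for every row whose "bx" value is greater than 30
bs = df1["bx"] > 30
# Keep only the rows where the mask is True
print(df1[bs])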

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   ax  bx  cx  dx  ex
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
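  • Boolean selection across several columns; the individual conditions can be combined with logical operators
import pandas as pd
import numpy as np
# 10 x 5 table holding the integers 10..59
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
# Element-wise AND of two conditions; each condition needs its own parentheses
bs = (df1["bx"] > 30) & (df1["cx"] > 40)
print(df1[bs])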

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   ax  bx  cx  dx  ex
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
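
Besides &, conditions can be joined with | (or) and inverted with ~ (not). The sketch below rebuilds the same df1 and tries both; the variable names either and not_big are only illustrative. Python's own and / or keywords cannot be used here, since they ask for a single truth value of a whole Series and pandas raises a ValueError; the parentheses around each comparison are needed because & and | bind more tightly than > and <.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 60).reshape(10, 5)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx", "dx", "ex"], index=list("abcdefghij"))

# OR: rows where bx is below 20, or cx is above 50.
either = df1[(df1["bx"] < 20) | (df1["cx"] > 50)]
print(either.index.tolist())

# NOT: invert a mask with ~ to keep rows where bx is not greater than 30.
not_big = df1[~(df1["bx"] > 30)]
print(not_big.index.tolist())
```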

The result of a boolean selection is itself a DataFrame, so it can be accessed further by slicing, by column label, with loc, and so on.

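import pandas as pd
import numpy as np
# 10 x 5 table holding the integers 10..59
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
bs = df1["bx"] > 30
# Rows that satisfy the condition
print(df1[bs])
# The same rows, restricted to columns "ax" and "ex"
print(df1[bs][["ax", "ex"]])
# Label slice on the filtered rows; both "e" and "h" are included
print(df1[bs]["e":"h"])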

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   ax  bx  cx  dx  ex  # df1[bs]
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
   ax  ex  # df1[bs][["ax", "ex"]]
e  30  34
f  35  39
g  40  44
h  45  49
i  50  54
j  55  59
   ax  bx  cx  dx  ex  # df1[bs]["e": "h"]
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
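
The chained form used above (filter first, then index the result) can also be written as a single loc call, because loc accepts a boolean mask for the rows together with column labels. The following is a sketch of that alternative spelling rather than a change to the example; the name subset is illustrative.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 60).reshape(10, 5)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx", "dx", "ex"], index=list("abcdefghij"))
bs = df1["bx"] > 30

# Row mask and column labels in one loc call.
subset = df1.loc[bs, ["ax", "ex"]]
print(subset)

# Check that it holds the same data as the two-step form.
print(subset.equals(df1[bs][["ax", "ex"]]))

# Label slice on the filtered result, written with loc; both endpoints are included.
print(df1[bs].loc["e":"h"])
```

The single loc call is usually the safer choice when the selection will later be assigned to, since it avoids indexing an intermediate copy.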