25. Concatenating data in pandas: the concat function

pandas provides the concat function, which takes a list of pandas objects as its argument and joins them into one larger object.

  • Concatenating two Series
import pandas as pd
import numpy as np
s1 = pd.Series(np.arange(2,6))
s2 = pd.Series(np.arange(8,12))
ss = pd.concat([s1, s2])
print(ss)

Output:

0     2
1     3
2     4
3     5
0     8
1     9
2    10
3    11
dtype: int64
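
The result above keeps each input's original index, so the labels 0 through 3 appear twice. If a fresh index is preferred, a minimal sketch using concat's ignore_index parameter could look like the following (the name ss_reset is only illustrative and does not come from the original example):

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(2, 6))
s2 = pd.Series(np.arange(8, 12))

# ignore_index=True discards the original labels and renumbers from 0
ss_reset = pd.concat([s1, s2], ignore_index=True)

print(ss_reset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
print(ss_reset.tolist())        # [2, 3, 4, 5, 8, 9, 10, 11]
```

Whether to renumber depends on whether the original labels carry meaning; when they do, the default behaviour shown earlier is usually the safer choice.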
  • Concatenating two DataFrames
1). When the row labels and columns are identical:
import pandas as pd
import numpy as np
col = "hello the cruel world".split()
idx = ["a", "b", "c", "d"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index = idx, columns = col)
print(df1)
df2 = pd.DataFrame(val2, index = idx, columns = col)
print(df2)
df12 = pd.concat([df1, df2])
print(df12)

Output:

   hello  the  cruel  world # print df1
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  cruel  world # print df2
a     20   21     22     23
b     24   25     26     27
c     28   29     30     31
d     32   33     34     35
   hello  the  cruel  world # print df12
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
a     20   21     22     23
b     24   25     26     27
c     28   29     30     31
d     32   33     34     35
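
In the concatenated frame above, every row label occurs twice, and nothing records which input a row came from. One possible way to keep that information is concat's keys parameter, which adds an outer index level. The sketch below rebuilds the same two frames; the key names "first" and "second" and the variable df_keyed are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

col = "hello the cruel world".split()
idx = ["a", "b", "c", "d"]
df1 = pd.DataFrame(np.arange(16).reshape(4, 4), index=idx, columns=col)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4), index=idx, columns=col)

# keys adds an outer index level naming the source of each block of rows
df_keyed = pd.concat([df1, df2], keys=["first", "second"])

print(df_keyed.index.nlevels)        # 2
print(df_keyed.loc["second"])        # the rows that came from df2
print(df_keyed.loc[("first", "a")])  # row "a" of df1
```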

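2). Concatenating DataFrames gets more involved when the row labels and columns of the inputs do not match one-to-one. In that case, any position whose label or column has no counterpart in the other DataFrame is filled with NaN.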

import pandas as pd
import numpy as np
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
idx1 = ["a", "b", "c", "d"]
idx2 = ["a", "b", "d", "e"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index = idx1, columns = col1)
print(df1)
df2 = pd.DataFrame(val2, index = idx2, columns = col2)
print(df2)
df12 = pd.concat([df1, df2])
print(df12)

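Output (from an older pandas release; current releases keep the columns in order of first appearance unless sort=True is passed, and print the columns that contain NaN as floats):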

   hello  the  cruel  world # print df1
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  nice  world # print df2
a     20   21    22     23
b     24   25    26     27
d     28   29    30     31
e     32   33    34     35
   cruel  hello  nice  the  world # print df12
a      2      0   NaN    1      3
b      6      4   NaN    5      7
c     10      8   NaN    9     11
d     14     12   NaN   13     15
a    NaN     20    22   21     23
b    NaN     24    26   25     27
d    NaN     28    30   29     31
e    NaN     32    34   33     35
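
The NaN values above come from the default outer join on the columns. If only the columns shared by both inputs are wanted, concat also accepts join="inner". The following is a minimal sketch under that assumption; the names df_outer and df_inner are illustrative only.

```python
import numpy as np
import pandas as pd

col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
df1 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=["a", "b", "c", "d"], columns=col1)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4),
                   index=["a", "b", "d", "e"], columns=col2)

# Default (outer) join: union of the columns, gaps filled with NaN
df_outer = pd.concat([df1, df2])
print(int(df_outer["nice"].isna().sum()))   # 4 (the rows from df1)

# Inner join: only the columns present in both frames, so no NaN appears
df_inner = pd.concat([df1, df2], join="inner")
print(sorted(df_inner.columns))             # ['hello', 'the', 'world']
print(df_inner.shape)                       # (8, 3)
```

Because no NaN is introduced by the inner join, the integer dtype of the shared columns is preserved.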
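  • Specifying the concatenation axis: by default (axis=0) concat stacks the inputs vertically, appending rows and aligning on column names. Passing axis=1 places the inputs side by side instead, aligning on row labels.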
import pandas as pd
import numpy as np
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
idx1 = ["a", "b", "c", "d"]
idx2 = ["a", "b", "d", "e"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index = idx1, columns = col1)
print(df1)
df2 = pd.DataFrame(val2, index = idx2, columns = col2)
print(df2)
df12 = pd.concat([df1, df2], axis = 1)
print(df12)

Output:

   hello  the  cruel  world # print df1
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  nice  world # print df2
a     20   21    22     23
b     24   25    26     27
d     28   29    30     31
e     32   33    34     35
   hello  the  cruel  world  hello  the  nice  world # print df12
a      0    1      2      3     20   21    22     23
b      4    5      6      7     24   25    26     27
c      8    9     10     11    NaN  NaN   NaN    NaN
d     12   13     14     15     28   29    30     31
e    NaN  NaN    NaN    NaN     32   33    34     35
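
The side-by-side result above has two consequences worth noting: rows "c" and "e" are padded with NaN, and the column names hello, the and world each occur twice. As a hedged sketch, join="inner" can restrict the result to the row labels present in both inputs, and keys can add an outer column level to tell the two halves apart. The key names "left" and "right" and the variable side are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
df1 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=["a", "b", "c", "d"], columns=col1)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4),
                   index=["a", "b", "d", "e"], columns=col2)

# axis=1 aligns on row labels; join="inner" keeps only labels found in both
side = pd.concat([df1, df2], axis=1, join="inner", keys=["left", "right"])

print(sorted(side.index))                  # ['a', 'b', 'd']
print(side.shape)                          # (3, 8)
# The outer column level disambiguates the duplicated column names
print(side[("right", "hello")].tolist())   # [20, 24, 28]
```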