9. Pandas DataFrame Attributes

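The previous chapter showed how to convert other kinds of data into a pandas DataFrame. This chapter introduces some of the DataFrame's attributes. To make the demonstrations concrete, first load the iris.data file used in the previous chapter (https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data) into a dataframe. The file is in CSV format, so it can be read with the read_csv function.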

import pandas as pd
fn = "iris.data"
# The file has no header row, so the column names are supplied explicitly.
cols_name = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df = pd.read_csv(fn, names=cols_name)
print(df)

Program output:

     sepal length  sepal width  petal length  petal width           class
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
5             5.4          3.9           1.7          0.4     Iris-setosa
6             4.6          3.4           1.4          0.3     Iris-setosa
.....
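
If the file is not at hand, the same read_csv call can be tried on a few rows held in memory. The sketch below is only an illustration: it copies the first three rows from the output above into a string and wraps the string in io.StringIO. The names sample and sample_df are assumptions of this example and do not appear in the original program.

```python
import io
import pandas as pd

# First three rows of iris.data, copied from the output above (illustrative sample only).
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""
cols_name = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']

# read_csv accepts any file-like object, so StringIO stands in for the file on disk.
sample_df = pd.read_csv(io.StringIO(sample), names=cols_name)
print(sample_df)
```

Because the sample has no header row either, names plays the same role here as it does for the full file.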

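9.1 The columns attribute

The columns attribute returns the column labels of a dataframe. The labels are stored in a pandas Index object. This is the index along the column axis, not the row index, which is available separately as df.index.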

import pandas as pd
fn = "iris.data"
cols_name = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df = pd.read_csv(fn, names=cols_name)
print(df[:3])      # first three rows
print(df.columns)  # column labels

Program output:

   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
Index(['sepal length', 'sepal width', 'petal length', 'petal width',
       'class'],
      dtype='object')

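In this program, the columns of the resulting dataframe df are set through the names parameter of read_csv. If the dataframe is created with the pandas DataFrame constructor instead, use the constructor's columns parameter to specify the column labels.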

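import pandas as pd
import numpy as np
# 30 consecutive integers arranged as 10 rows by 3 columns
val = np.arange(10, 40).reshape(10, 3)
# column labels (despite the variable name, this is not the row index)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print(df1)
print(df1.columns)  # column labels
print(df1.index)    # row labels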
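
To make the difference between the two sets of labels easier to see, the following sketch converts both to plain Python lists. It rebuilds the same df1 as above. The names col_labels and row_labels are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

col_labels = df1.columns.tolist()  # labels along the column axis
row_labels = df1.index.tolist()    # labels along the row axis (default: 0 to n-1)

print(col_labels)
print(row_labels)
```

Since no index was passed to the constructor, the row labels are simply the integers 0 through 9.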

9.2 The shape attribute

The shape attribute describes the shape of the dataframe as a tuple: (number of rows, number of columns).

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print(df1)
print(df1.shape)  # (rows, columns)

Program output:

   ax  bx  cx
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21
4  22  23  24
5  25  26  27
6  28  29  30
7  31  32  33
8  34  35  36
9  37  38  39
(10, 3)
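
Because shape is an ordinary tuple, it can be unpacked directly. A minimal sketch, assuming the same df1 as above (n_rows and n_cols are names chosen for this example):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

# Unpack the (rows, columns) tuple into two variables.
n_rows, n_cols = df1.shape
print(n_rows, n_cols)

# len() of a dataframe is its number of rows, i.e. shape[0].
print(len(df1) == n_rows)
```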

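9.3 The size attribute

The size attribute returns the number of values in the dataframe, that is, the number of rows multiplied by the number of columns.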

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print(df1.shape)  # (rows, columns)
print(df1.size)   # rows * columns
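
The relationship between size and shape can be checked directly. This is a small sketch on the same df1. The comparison with a single column is an addition of this example, included to show that size counts elements in both cases.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

n_rows, n_cols = df1.shape
print(df1.size)                     # total number of cells
print(df1.size == n_rows * n_cols)  # size is rows times columns
print(df1["ax"].size)               # a single column holds one value per row
```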

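9.4 The values attribute

The values attribute returns the data held in the dataframe as a NumPy array, without the labels. The rows and columns of the array follow the order of the dataframe's index and columns.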

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print(df1.values)  # the underlying data as a NumPy array
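
The listing above has no recorded output, so the sketch below inspects the returned object instead of printing it in full. It assumes the same val and df1. The comparison with to_numpy(), a DataFrame method available in pandas 0.24 and later, is an addition of this example.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"])

arr = df1.values
print(type(arr))   # a NumPy ndarray, with no row or column labels
print(arr.shape)   # same shape as the dataframe
print(arr[0])      # first row of the data

# to_numpy() returns the same data for this all-integer frame.
print(np.array_equal(arr, df1.to_numpy()))
```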

9.5 The dtypes attribute

The dtypes attribute describes the data type of the values in each column of the dataframe.

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print(df1.dtypes)  # one dtype per column

Program output:

ax    int64
bx    int64
cx    int64
dtype: object
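
In the example above every column has the same type. To show that dtypes really is tracked per column, the following sketch converts one column to floating point. The copy df2 and the choice of the bx column are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

df2 = df1.copy()
df2["bx"] = df2["bx"].astype(float)  # convert only this column to float

# dtypes is a Series indexed by column label, so each column reports its own type.
print(df2.dtypes)
```

After the conversion, bx is reported as float64 while ax and cx keep their integer type.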

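9.6 The ndim attribute

The ndim attribute of a dataframe means the same as ndim in NumPy: the number of dimensions (axes). For a dataframe it is always 2.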

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print(df1.shape)
print(df1.ndim)  # number of axes

Program output:

(10, 3)
2
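
For comparison, a single column selected from the dataframe is a one-dimensional Series. The sketch below, which reuses the same df1 and introduces the name col only for illustration, prints ndim for a column, the dataframe, and the underlying NumPy array.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

col = df1["ax"]         # selecting one column gives a Series
print(col.ndim)         # a Series has one axis
print(df1.ndim)         # a DataFrame has two axes
print(df1.values.ndim)  # the NumPy array behind it agrees with the dataframe
```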

9.7 The T attribute

The T attribute returns the transpose of the dataframe.

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print("df1 " + "*" * 13)
print(df1)
print("df1.T " + "*" * 11)
print(df1.T)  # transposed: column labels become row labels

Program output:

df1 *************
   ax  bx  cx
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21
4  22  23  24
5  25  26  27
6  28  29  30
7  31  32  33
8  34  35  36
9  37  38  39
df1.T ***********
     0   1   2   3   4   5   6   7   8   9
ax  10  13  16  19  22  25  28  31  34  37
bx  11  14  17  20  23  26  29  32  35  38
cx  12  15  18  21  24  27  30  33  36  39

That is, columns become rows and rows become columns.
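
The effect of the transpose on the labels and the shape can also be checked programmatically. A minimal sketch on the same df1, where dft is a name introduced for this example:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

dft = df1.T
print(dft.shape)           # rows and columns are swapped
print(dft.index.tolist())  # the old column labels are now the row labels

# Transposing twice gives back a frame equal to the original.
print(dft.T.equals(df1))
```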