A handy tool for "playing" with data in Python!

Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

Initialization

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
>>> s
a    1.677571
b    0.277393
c    0.945309
d    1.276764
e    0.686729
dtype: float64

Converting to a list

>>> s.tolist()
[1.6775712053562015, 0.2773932925846383, 0.9453092483775857, 1.276764091946826, 0.6867285857318276]
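Because the Series above is filled with random numbers, its output changes on every run. The following is a minimal sketch with fixed, made-up values (not taken from the original session) that shows the index labels, a label-based lookup, and the conversion to a plain list.

```python
import pandas as pd

# Fixed, illustrative values so the result is reproducible.
s = pd.Series([1.5, 0.25, 3.0], index=["a", "b", "c"])

labels = s.index.tolist()  # the axis labels, i.e. the index
value_b = s["b"]           # look up a value by its label
values = s.tolist()        # plain Python list of the values

print(labels)
print(value_b)
print(values)
```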

DataFrame

A DataFrame can be thought of as a table: it is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

Initialization

>>> import pandas as pd
>>> d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
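The "dict of Series" view can be made concrete with a small sketch. It assumes the same values as the table printed above; the variable names `one` and `two` are chosen here for illustration only.

```python
import pandas as pd

# Each column starts out as its own Series.
one = pd.Series([1.0, 2.0, 3.0, 4.0])
two = pd.Series([4.0, 3.0, 2.0, 1.0])

# A dict of Series becomes a DataFrame; the dict keys become column names.
df = pd.DataFrame({"one": one, "two": two})

# Selecting a single column gives back a Series.
column = df["one"]

print(df.shape)
print(type(column).__name__)
```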

I/O operations

df = pd.read_csv('') # read a CSV file
df.to_csv('') # write a CSV file
df.to_csv('', index=False) # write without the index column
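To see what `index=False` changes without touching the file system, the sketch below writes to a string instead of a file. It relies on `to_csv` returning the CSV text when no path is given, and on `read_csv` accepting a file-like object; the sample data is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "two": [4.0, 3.0]})

# With no path argument, to_csv returns the CSV text as a string.
csv_with_index = df.to_csv()
csv_no_index = df.to_csv(index=False)

# read_csv accepts any file-like object, here an in-memory buffer.
restored = pd.read_csv(io.StringIO(csv_no_index))

print(csv_with_index)
print(csv_no_index)
print(restored)
```

With the index included, the header row starts with an empty field for the unnamed index column, which then shows up as an extra column when the file is read back.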

Column operations

Applying a function to each column (apply)

df.apply(lambda x: x.max() - x.min())
Out[67]: 
A    2.073961
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64
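The output above comes from a DataFrame with columns A, B, C, D and F that is not defined in this post. As a self-contained sketch, the same lambda can be applied to the small `one`/`two` table used earlier; the `axis=1` variant is added here only to contrast column-wise and row-wise application.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]})

# Default axis=0: the function receives each column as a Series.
col_range = df.apply(lambda x: x.max() - x.min())

# axis=1: the function receives each row as a Series instead.
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
print(row_range)
```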

Value distribution of a column

>>> d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df['one'].value_counts()
4.0    1
3.0    1
2.0    1
1.0    1
Name: one, dtype: int64
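In the example above every value occurs exactly once, so the counts are not very telling. A minimal sketch with repeated values (made up for illustration) shows the distribution more clearly; `normalize=True` returns relative frequencies instead of counts.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", "b"])

# Counts per distinct value, sorted from most to least frequent.
counts = s.value_counts()

# Relative frequencies instead of absolute counts.
shares = s.value_counts(normalize=True)

print(counts)
print(shares)
```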

Row operations

Iterating over rows

# 'c1' and 'c2' stand for column names in df
for index, row in df.iterrows():
    print(row['c1'], row['c2'])
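A runnable version of the loop, as a sketch: the DataFrame and its column names `c1` and `c2` are hypothetical and exist only to make the snippet self-contained. `iterrows()` yields one `(index, row)` pair per row, where `row` is a Series keyed by column name.

```python
import pandas as pd

# Hypothetical data with the column names used in the loop.
df = pd.DataFrame({"c1": [1, 2, 3], "c2": ["x", "y", "z"]})

pairs = []
for index, row in df.iterrows():
    # row is a Series; its labels are the column names.
    pairs.append((index, row["c1"], row["c2"]))
    print(index, row["c1"], row["c2"])
```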
