Pandas

A powerful tool for "playing" with data in Python!


Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

Initialization

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
>>> s
a    1.677571
b    0.277393
c    0.945309
d    1.276764
e    0.686729
dtype: float64
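
Because the values above are random, here is a minimal sketch with fixed values showing how the index labels are used for lookup. The name `s_fixed` and its values are chosen only for illustration.

```python
import pandas as pd

# Fixed values (illustrative only) so the result is reproducible.
s_fixed = pd.Series([1.5, 0.25, -2.0], index=["a", "b", "c"])

by_label = s_fixed["b"]          # look up a value by its label
by_position = s_fixed.iloc[0]    # look up a value by its position
labels = s_fixed.index.tolist()  # the axis labels, i.e. the index
```

Label lookup and `.iloc` position lookup are kept separate here so it stays clear which one is meant.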

Converting to a list

>>> s.tolist()
[1.6775712053562015, 0.2773932925846383, 0.9453092483775857, 1.276764091946826, 0.6867285857318276]

DataFrame

You can think of a DataFrame as a table. It is a 2-dimensional labeled data structure with columns of potentially different types, much like a spreadsheet, a SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

Initialization

>>> import pandas as pd
>>> d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
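
To make the "dict of Series" description concrete, the sketch below builds a small table with columns of different types and pulls one column back out. The names `table` and `score_column` and the sample data are assumptions for illustration.

```python
import pandas as pd

# Two columns of different types in the same table (illustrative data).
table = pd.DataFrame({
    "name": ["x", "y", "z"],    # strings
    "score": [1.0, 2.0, 3.0],   # floats
})

score_column = table["score"]   # selecting a single column yields a Series
n_rows, n_cols = table.shape    # (number of rows, number of columns)
```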


I/O operations

df = pd.read_csv('')  # read a CSV file
df.to_csv('')  # write a CSV file
df.to_csv('', index=False)  # write without the index column
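
The effect of `index=False` is easier to see without touching the disk. The sketch below assumes an in-memory buffer (`io.StringIO`) in place of a real file path, and the name `frame` is illustrative. When `to_csv` is called without a path, it returns the CSV text as a string.

```python
import io
import pandas as pd

frame = pd.DataFrame({"one": [1.0, 2.0], "two": [4.0, 3.0]})

with_index = frame.to_csv()                 # no path given: returns CSV text
without_index = frame.to_csv(index=False)   # same data, index column omitted

# Read the text back; StringIO stands in for a file on disk.
restored = pd.read_csv(io.StringIO(without_index))
```

With the index included, the header line starts with an empty field for the unnamed index column. This is why reading such a file back usually produces an extra `Unnamed: 0` column.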

Column operations

Applying a function to each column (apply)

df.apply(lambda x: x.max() - x.min())
Out[67]:
A    2.073961
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64
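
The output above comes from a DataFrame that is not shown in this post, so here is a small reproducible sketch of the same idea. The name `data` and its values are assumptions. By default `apply` passes each column to the function, and `axis=1` passes each row instead.

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0, 2.0], "B": [10.0, 10.0, 10.0]})

# Default axis=0: the lambda receives one column at a time.
col_range = data.apply(lambda x: x.max() - x.min())

# axis=1: the lambda receives one row at a time.
row_range = data.apply(lambda x: x.max() - x.min(), axis=1)
```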

Value distribution of a column

>>> d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(d)
>>> df['one'].value_counts()
4.0    1
3.0    1
2.0    1
1.0    1
Name: one, dtype: int64
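
Every value in the column above occurs exactly once, so the counts are not very informative. The sketch below uses an illustrative Series with repeated values (the name `colors` is an assumption) and also shows `normalize=True`, which returns proportions instead of raw counts.

```python
import pandas as pd

colors = pd.Series(["red", "blue", "red", "red", "blue", "green"])

counts = colors.value_counts()                 # sorted by count, descending
shares = colors.value_counts(normalize=True)   # fraction of rows per value
```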

Row operations

Iterating over rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])
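
The loop above assumes a DataFrame with columns `c1` and `c2`. A self-contained sketch with made-up data might look like the following. It also shows `itertuples`, which yields namedtuples and is often faster than `iterrows` on larger tables.

```python
import pandas as pd

rows_df = pd.DataFrame({"c1": [1, 2], "c2": ["a", "b"]})

# iterrows yields (index, row) pairs, where each row is a Series.
via_iterrows = [(index, row["c1"], row["c2"]) for index, row in rows_df.iterrows()]

# itertuples yields namedtuples; the index is exposed as the Index field.
via_itertuples = [(t.Index, t.c1, t.c2) for t in rows_df.itertuples()]
```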
