[Python] Pandas之merge groupby

    xiaoxiao2021-12-03  20

    1.merge用来连接两个DataFrame对象,参数on选择基于哪个列进行连接,how选择内连接,左连接还是全连接的方式。merge操作类似于SQL中的join操作。

    import pandas as pd df1 = pd.DataFrame({'key':['b','b','a','c','a','b'],'data1':range(6)}) df2 = pd.DataFrame({'key':['a','b','a','b','d'],'data2':range(5)}) print df1 print df2 # data1 key # 0 0 b # 1 1 b # 2 2 a # 3 3 c # 4 4 a # 5 5 b # data2 key # 0 0 a # 1 1 b # 2 2 a # 3 3 b # 4 4 d df3 = pd.merge(df1,df2,on = 'key',how = 'left') print df3 # data1 key data2 # 0 0 b 1.0 # 1 0 b 3.0 # 2 1 b 1.0 # 3 1 b 3.0 # 4 2 a 0.0 # 5 2 a 2.0 # 6 3 c NaN # 7 4 a 0.0 # 8 4 a 2.0 # 9 5 b 1.0 # 10 5 b 3.0 2.对于大数据,很可能要对于其中的部分列进行聚合,这里使用groupby

    import numpy as np import pandas as pd df = pd.DataFrame({'key1':['a','a','b','b','a'], 'key2':['one','two','one','two','one'], 'data1':np.random.randn(5), 'data2':np.random.randn(5)}) print df # data1 data2 key1 key2 # 0 0.336673 0.540336 a one # 1 -0.821839 -1.348654 a two # 2 1.066305 0.230884 b one # 3 0.788950 -0.540482 b two # 4 -0.872019 -0.813607 a one df1 = df[['data1','data2']].groupby(df['key1']).sum() print df1 # data1 data2 # key1 # a 0.827444 1.512656 # b -0.060730 -0.461793 print df.groupby('key1')[['data1','data2']].sum() #简便写法 # data1 data2 # key1 # a 0.827444 1.512656 # b -0.060730 -0.461793注意,这里还可以利用字典对表进行分组聚合,这里以行聚合为例(以列分组聚合只要修改字典并把axis设置成1)

    import numpy as np import pandas as pd df = pd.DataFrame({'key1':['a','a','b','b','a'], 'key2':['one','two','one','two','one'], 'data1':np.random.randn(5), 'data2':np.random.randn(5)}) print df # data1 data2 key1 key2 # 0 0.336673 0.540336 a one # 1 -0.821839 -1.348654 a two # 2 1.066305 0.230884 b one # 3 0.788950 -0.540482 b two # 4 -0.872019 -0.813607 a one mapping = {0:'one',1:'one',2:'two',3:'two',4:'two'} print df.groupby(mapping,axis = 0).sum() # data1 data2 # one 2.044990 0.916197 # two 2.310946 -2.240196

    转载请注明原文地址: https://ju.6miu.com/read-680019.html

    最新回复(0)