[Data Analysis] Library Data 05: Mining Reader Types with Clustering

    xiaoxiao  2021-03-25  84

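Grouping readers by the total number of books they have borrowed gives a rough picture of how actively they borrow. But which other factors affect students' borrowing? Different types of readers also want different things from books, so within the groups with higher borrowing counts we classify the readers a second time, looking for the factors behind their reading.

We start by importing the existing reader information: each reader's sex and department, together with the IDs of the books they borrowed.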
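The post does not show how the higher-borrowing group is picked out before re-clustering. A minimal sketch of one way to do it, assuming a hypothetical reader-ID column `read_id` (not named in the original data) and an arbitrary cutoff at the 75th percentile of loans per reader:

```python
import pandas as pd


def heavy_readers(loans: pd.DataFrame, reader_col: str = "read_id",
                  quantile: float = 0.75) -> pd.DataFrame:
    """Keep only the loan records of readers whose total number of loans is
    at or above the given quantile of per-reader totals.

    `read_id` is a hypothetical column name; adjust it to the real data."""
    counts = loans[reader_col].value_counts()   # number of loans per reader
    cutoff = counts.quantile(quantile)          # threshold on those totals
    heavy_ids = counts[counts >= cutoff].index  # readers at or above it
    return loans[loans[reader_col].isin(heavy_ids)]


if __name__ == "__main__":
    loans = pd.DataFrame({
        "read_id": [1, 1, 1, 2, 3, 3, 4],
        "book_id": ["a", "b", "c", "a", "b", "c", "a"],
    })
    print(heavy_readers(loans))
```

The cutoff is a free choice; a fixed count, or the groups from the earlier total-loans classification, would serve the same purpose.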

    # -*- coding: utf-8 -*-
    import sys

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    pf = pd.read_csv('new_data.csv', encoding='gbk')
    # print(pf.head())

    unit = pf['read_unit'].str.split(' ')  # read_unit is "department major", space-separated
    department = unit.str[0]  # department (school)
    print(department)
    major = unit.str[1]       # major
    print(major)

    # .copy() so that insert() works on a new frame, not on a slice of pf
    new_table = pf[['read_sex', 'book_id']].copy()
    new_table.insert(2, 'department', department)  # add the department column
    # new_table.insert(3, 'major', major)          # add the major column
    print(new_table)

    sex = new_table['read_sex']
    book = new_table['book_id']
    department = new_table['department']
    print(sex, book, department)  # print the three columns


    def add_label(s):
        """Algorithm: get labels.

        Give each distinct value of s an integer label 1, 2, 3, ... in order
        of first appearance; equal values share the same label."""
        labels = {}
        result = []
        for value in s:
            if value not in labels:
                labels[value] = len(labels) + 1
            result.append(labels[value])
        return result


    # Series -> list, then replace each value with its integer label
    sex_list = add_label(sex.tolist())
    book_list = add_label(book.tolist())
    department_list = add_label(department.tolist())
    # print(sex_list, book_list, department_list)

    # one row per loan record; columns: sex, book, department
    m = np.array([sex_list, book_list, department_list]).T
    print('m:', m)
    print('----------------------------------------------')

    np.set_printoptions(threshold=sys.maxsize)  # print whole arrays instead of eliding with "..."
    kmeans = KMeans(n_clusters=5)
    print(kmeans)
    predict = kmeans.fit_predict(m)  # cluster index (0-4) for each loan record
    print(predict)
    print(len(predict))

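The `add_label` step just numbers each distinct value 1, 2, 3, ... in order of first appearance. A sketch of the same encoding with pandas' built-in `pd.factorize`, which does it in one vectorized call; the `+ 1` only keeps labels starting at 1 as in the post:

```python
import pandas as pd


def add_label_factorize(values):
    """Label distinct values 1, 2, 3, ... in order of first appearance."""
    # factorize returns 0-based codes in order of appearance (sort=False by default)
    codes, _uniques = pd.factorize(pd.Series(list(values), dtype=object))
    return (codes + 1).tolist()


print(add_label_factorize(["M", "F", "F", "M", "X"]))  # [1, 2, 2, 1, 3]
```

One difference: `pd.factorize` codes missing values as -1, so they come out as label 0 here instead of getting a label of their own.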
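Because `add_label` turns categories into integers, k-means (which uses Euclidean distance) treats department 1 as closer to department 2 than to department 9, and the book column, with far more distinct values, dominates the distances. A common alternative, swapped in here and not the method used in the post, is to one-hot encode the columns first. A sketch on toy data, with `pd.crosstab` to see which departments end up in each cluster:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_one_hot(table: pd.DataFrame, columns, n_clusters: int = 5,
                    random_state: int = 0) -> np.ndarray:
    """One-hot encode the given categorical columns and run k-means on them."""
    # astype(str) makes get_dummies encode numeric IDs (e.g. book_id) too
    features = pd.get_dummies(table[list(columns)].astype(str), dtype=float)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return kmeans.fit_predict(features.to_numpy())


if __name__ == "__main__":
    toy = pd.DataFrame({
        "read_sex": ["M", "F", "M", "F", "M", "F"],
        "department": ["CS", "CS", "Math", "Math", "CS", "Math"],
    })
    labels = cluster_one_hot(toy, ["read_sex", "department"], n_clusters=2)
    # rows: cluster index; columns: department; cells: number of records
    print(pd.crosstab(labels, toy["department"]))
```

With thousands of distinct book IDs the dummy matrix becomes very wide, so it may be worth clustering on sex and department alone, or grouping books by category first.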
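The code fixes `n_clusters=5` without saying why. One common heuristic, added here and not taken from the post, is the elbow method: fit k-means for several values of k, record `inertia_` (the sum of squared distances from each sample to its nearest center) and look for the k where the decrease levels off. A minimal sketch:

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(features: np.ndarray, k_values, random_state: int = 0):
    """Return {k: within-cluster sum of squares} for each k, for an elbow plot."""
    curve = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        km.fit(features)
        curve[k] = km.inertia_
    return curve


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two well-separated blobs, so the elbow should appear around k = 2
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
    for k, sse in inertia_curve(X, range(1, 7)).items():
        print(k, round(sse, 1))
```

On the post's matrix `m`, the same call would be `inertia_curve(m, range(2, 11))`.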
When reposting, please credit the original: https://ju.6miu.com/read-15044.html
