NumPy, SciPy
Matrix and vector processing.
NumPy provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays. SciPy builds on this, and provides a large number of functions that operate on NumPy arrays and are useful for different types of scientific and engineering applications.
References:
http://old.sebug.net/paper/books/scipydoc/numpy_intro.html
http://cs231n.github.io/python-numpy-tutorial/ (covers Python basics, NumPy, SciPy, and matplotlib)
Summary of NumPy usage:
https://github.com/zhangweijiqn/testPython/blob/master/src/NumpyTest/testNumpy.py
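As a rough illustration of the split described above (NumPy for the array itself, SciPy for routines that operate on it), here is a minimal sketch. The matrix and vector values are made up for the example.

```python
import numpy as np
from scipy import linalg

# A small 2x2 matrix and a right-hand-side vector (illustrative values only).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# NumPy: elementwise arithmetic and a matrix-vector product.
scaled = A * 2        # every entry multiplied by 2
product = A @ b       # matrix-vector product, shape (2,)

# SciPy: solve the linear system A x = b.
x = linalg.solve(A, b)

print(product)
print(x)
```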
Scikit-learn
Data modeling and analysis.
scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
Updating sklearn with conda:
conda update scikit-learn
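Official site:
http://scikit-learn.org/stable/index.html
The documentation is quite detailed, and the home page lists many machine learning topics.
The user guide lists everything the package covers:
http://scikit-learn.org/stable/user_guide.html
Installation: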
http://scikit-learn.org/stable/install.html
pip install -U scikit-learn (NumPy and SciPy must be installed first)
After installing this way, from sklearn.ensemble import RandomForestClassifier may fail with ImportError: cannot import name check_arrays.
For the cause, see:
http://stackoverflow.com/questions/29596237/import-check-arrays-from-sklearn
Fix:
conda update scikit-learn
The sklearn.model_selection module includes grid search (GridSearchCV).
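A minimal sketch of grid search, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection. The iris dataset and the parameter grid below are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small built-in dataset so the example is self-contained.
X, y = load_iris(return_X_y=True)

# Candidate hyperparameters (illustrative values only).
param_grid = {"n_estimators": [5, 20], "max_depth": [2, 4]}

# Try every combination with 3-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

best_params = search.best_params_   # dict with the best combination found
best_score = search.best_score_     # mean cross-validated accuracy for it
print(best_params, best_score)
```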
API:
http://scikit-learn.org/stable/modules/classes.html
sklearn provides a TF-IDF implementation that can be used to extract keywords from Chinese text and to vectorize it. A reference blog post:
http://www.cnblogs.com/chenbjin/p/3851165.html
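A minimal sketch of TF-IDF vectorization with sklearn's TfidfVectorizer. It assumes the Chinese text has already been segmented into space-separated words (for example by a separate word segmenter, which is not shown); the three sample documents are made up. The token_pattern is overridden because the default pattern drops single-character tokens, which are common in Chinese.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Pre-segmented documents: words separated by spaces (illustrative data).
docs = [
    "我 喜欢 机器 学习",
    "机器 学习 需要 数据",
    "我 喜欢 数据 分析",
]

# Keep single-character tokens by matching one or more word characters.
vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: documents x terms

# Map column indices back to terms, then pick the highest-weighted
# term of the second document as a crude "keyword".
index_to_term = {i: t for t, i in vectorizer.vocabulary_.items()}
row = tfidf[1].toarray().ravel()
top_term = index_to_term[row.argmax()]
print(tfidf.shape, top_term)
```

The top term here is simply the word with the largest TF-IDF weight in that document; real keyword extraction would usually take the top few terms and filter stop words.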
Pandas
Reading and writing data.
powerful Python data analysis toolkit.
Official home page:
http://pandas.pydata.org/
tutorial:
http://pandas.pydata.org/pandas-docs/stable/10min.html
document:
http://pandas.pydata.org/pandas-docs/stable/index.html
Reading CSV files:
http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.1/cookbook/Chapter 1 - Reading from a CSV.ipynb
Video tutorial:
http://www.dataschool.io/easier-data-analysis-with-pandas/
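A minimal sketch of reading and writing CSV data with pandas. To keep it self-contained, the CSV content is an in-memory string with made-up columns (name, score); with a real file you would pass its path to read_csv and to_csv instead.

```python
import io
import pandas as pd

# In-memory CSV standing in for a file on disk (illustrative data).
csv_text = """name,score
alice,90
bob,75
carol,82
"""

# Read the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))
mean_score = df["score"].mean()

# Filter rows and write the result back out as CSV.
out = io.StringIO()
df[df["score"] > 80].to_csv(out, index=False)
print(out.getvalue())
```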
gensim
Gensim is a specialized Python toolkit for topic modeling.
Gensim is an open-source vector space modeling and topic modeling toolkit, implemented in the Python programming language. It uses NumPy, SciPy and optionally Cython for performance. It is specifically intended for handling large text collections, using efficient online, incremental algorithms. Gensim is commercially supported by the startup RaRe Technologies.
Gensim includes implementations of tf-idf, random projections, word2vec and doc2vec algorithms, hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.
gensim:
http://radimrehurek.com/gensim/index.html
github:
https://github.com/RaRe-Technologies/gensim
install: pip install gensim