nagios check

    xiaoxiao2021-03-25  150

    This article is reposted from: http://www.ttlsa.com/mongodb/nagios-check_mongodb-plugin-to-monitor-mongodb/

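    When you run a service in production, monitoring for it should be in place as well: you need to detect whether the service is healthy and collect its key metrics.

    This article shows how to monitor a MongoDB database with nagios-plugin-mongodb: https://github.com/mzupan/nagios-plugin-mongodb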

    1. Download the check_mongodb Nagios plugin

        # cd /usr/local/nagios/libexec/
        # wget --no-check-certificate https://github.com/mzupan/nagios-plugin-mongodb/archive/master.zip
        # unzip master
        # mv nagios-plugin-mongodb-master nagios-plugin-mongodb
        # chown -R nagios.nagios nagios-plugin-mongodb/

    2. Install the MongoDB Python driver

    The EPEL repository must be enabled first; see the article "Yum repositories available for CentOS / RHCE".

        # yum install pymongo.x86_64

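    Alternatively, download the source package and build it yourself. The download is saved under an explicit file name so that the unzip step finds it.

        # wget --no-check-certificate -O mongo-python-driver-master.zip https://github.com/mongodb/mongo-python-driver/archive/master.zip
        # unzip mongo-python-driver-master.zip
        # cd mongo-python-driver-master
        # python setup.py install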

    Or install it with Python's easy_install.

        # easy_install pymongo
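Whichever install method is used, it helps to confirm that the driver is importable before wiring the plugin into Nagios. The following is a minimal sketch, assuming a Python 3 interpreter (importlib.util.find_spec is not available on very old Python versions); the helper name pymongo_version is hypothetical and not part of the plugin.

```python
import importlib.util


def pymongo_version():
    """Return the installed pymongo version string, or None if it is missing."""
    # find_spec only looks the package up; it does not import it.
    if importlib.util.find_spec("pymongo") is None:
        return None
    import pymongo
    return pymongo.version


version = pymongo_version()
if version is None:
    print("pymongo is not installed for this interpreter")
else:
    print("pymongo", version)
```

Run it with the same interpreter that will execute check_mongodb.py, since a driver installed for a different Python will not be visible to the plugin.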

    3. check_mongodb.py usage

        # ./check_mongodb.py --help
        usage: check_mongodb.py [options]

        This Nagios plugin checks the health of mongodb.

        options:
          -h, --help            show this help message and exit
          -H HOST, --host=HOST  The hostname you want to connect to
          -P PORT, --port=PORT  The port mongodb is runnung on
          -u USER, --user=USER  The username you want to login as
          -p PASSWD, --pass=PASSWD
                                The password you want to use for that user
          -W WARNING, --warning=WARNING
                                The warning threshold we want to set
          -C CRITICAL, --critical=CRITICAL
                                The critical threshold we want to set
          -A ACTION, --action=ACTION
                                The action you want to take
          --max-lag             Get max replication lag (for replication_lag action
                                only)
          --mapped-memory       Get mapped memory instead of resident (if resident
                                memory can not be read)
          -D, --perf-data       Enable output of Nagios performance data
          -d DATABASE, --database=DATABASE
                                Specify the database to check
          --all-databases       Check all databases (action database_size)
          -s, --ssl             Connect using SSL
          -r, --replicaset      Connect to replicaset
          -q QUERY_TYPE, --querytype=QUERY_TYPE
                                The query type to check
                                [query|insert|update|delete|getmore|command] from
                                queries_per_second
          -c COLLECTION, --collection=COLLECTION
                                Specify the collection to check
          -T SAMPLE_TIME, --time=SAMPLE_TIME
                                Time used to sample number of pages faults

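    All actions supported by the Nagios MongoDB monitoring plugin:

    Pass any one of the following actions with the -A option: 'connect', 'connections', 'replication_lag', 'replication_lag_percent', 'replset_state', 'memory', 'memory_mapped', 'lock', 'flushing', 'last_flush_time', 'index_miss_ratio', 'databases', 'collections', 'database_size', 'database_indexes', 'collection_indexes', 'queues', 'oplog', 'journal_commits_in_wl', 'write_data_files', 'journaled', 'opcounters', 'current_lock', 'replica_primary', 'page_faults', 'asserts', 'queries_per_second', 'chunks_balance', 'connect_primary', 'collection_state', 'row_count'

        connect                  Default action; checks that a connection can be made
        connections              Checks the percentage of open database connections
        memory                   Checks memory usage
        memory_mapped            Checks mapped memory usage
        lock                     Checks the percentage of time spent locked
        flushing                 Checks the average flush time (in milliseconds)
        last_flush_time          Checks the last flush time (in milliseconds)
        index_miss_ratio         Checks the index miss ratio
        databases                Checks the total number of databases
        collections              Checks the total number of collections
        database_size            Checks the size of a specific database
        database_indexes         Checks the index size of a specific database
        collection_indexes       Checks the index size of a collection
        replication_lag          Checks replication lag (in seconds)
        replication_lag_percent  Checks replication lag (as a percentage)
        replset_state            Checks the state of the replica set
        replica_primary          Checks the primary server of the replica set
        queries_per_second       Checks the number of queries per second
        connect_primary          Checks the connection to the primary server of a set
        collection_state         Checks the state of a specific collection in a database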
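Every action reports its result to Nagios through the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), decided from the -W and -C thresholds. The sketch below illustrates the usual "higher is worse" comparison; the function name nagios_state is hypothetical, and individual actions in the plugin may compare values differently.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_state(value, warning, critical):
    """Map a measured value to an exit code, assuming higher values are worse."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# Example: a connect time of 3 seconds with -W 2 -C 4 is a WARNING.
print(nagios_state(3, warning=2, critical=4))
```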

    4. Define the Nagios commands

        # vim /usr/local/nagios/etc/objects/commands.cfg

        define command {
            command_name    check_mongodb
            command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$
        }

        define command {
            command_name    check_mongodb_database
            command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -d $ARG8$
        }

        define command {
            command_name    check_mongodb_collection
            command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -d $ARG8$ -c $ARG9$
        }

        define command {
            command_name    check_mongodb_replicaset
            command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -r $ARG8$
        }

        define command {
            command_name    check_mongodb_query
            command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -q $ARG8$
        }
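To see how these definitions connect to the service checks, recall that Nagios splits a check_command value on "!" and substitutes the pieces into $ARG1$, $ARG2$ and so on. The sketch below imitates that substitution for illustration only; expand_check_command is a hypothetical helper, and the $USER1$ value is assumed to be /usr/local/nagios/libexec as in step 1.

```python
import re


def expand_check_command(check_command, command_line,
                         user1="/usr/local/nagios/libexec"):
    """Return (command_name, expanded_line) for a '!'-separated check_command."""
    name, *args = check_command.split("!")

    def substitute(match):
        index = int(match.group(1)) - 1
        # Macros without a matching argument expand to an empty string.
        return args[index] if index < len(args) else ""

    expanded = command_line.replace("$USER1$", user1)
    expanded = re.sub(r"\$ARG(\d+)\$", substitute, expanded)
    return name, expanded


name, line = expand_check_command(
    "check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect!2!4",
    "$USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ "
    "-u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$",
)
print(name)
print(line)
```

Printing the expanded line is a convenient way to get a command that can be run by hand as the nagios user when a check misbehaves.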

    5. Create the service checks

    5.1 Check Connection: every mongodb instance in the cluster needs this check.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Connect Check
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect!2!4
        }

    5.2 Check Percentage of Open Connections: checks what percentage of the available connections is in use.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Free Connections
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connections!70!80
        }
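As a rough illustration of what the 70 and 80 thresholds are compared against: MongoDB's serverStatus output reports connections.current and connections.available, and a usage percentage can be derived from the two. This is a sketch of that arithmetic under the assumption that the plugin works along these lines; the function name is hypothetical.

```python
def connection_usage_percent(current, available):
    """Percentage of the connection limit in use, from serverStatus counters."""
    total = current + available
    if total <= 0:
        raise ValueError("current + available must be positive")
    return 100.0 * current / total


# Example: 700 open connections with 300 still available is 70% usage.
print(connection_usage_percent(700, 300))
```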

    5.3 Check Replication Lag: checks the replication delay.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Replication Lag
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replication_lag!15!30
        }

    5.4 Check Replication Lag Percentage: checks replication lag as a percentage. If this check reaches 100%, a full resync is required.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Replication Lag Percentage
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replication_lag_percent!50!75
        }
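One plausible reading of why 100% means a full resync (an assumption here, not taken from the plugin source) is that the percentage compares the secondary's lag with the time window covered by the primary's oplog. Once the lag reaches the whole window, the entries the secondary still needs have been overwritten. A minimal sketch of that arithmetic, with hypothetical names and sample values:

```python
def replication_lag_percent(lag_seconds, oplog_window_seconds):
    """Lag expressed as a share of the oplog time window (assumed definition)."""
    if oplog_window_seconds <= 0:
        raise ValueError("oplog window must be positive")
    return 100.0 * lag_seconds / oplog_window_seconds


# Example: 30 minutes behind with a 1 hour oplog window is 50%,
# which would trip the warning threshold used above.
print(replication_lag_percent(1800, 3600))
```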

    5.5 Check Memory Usage: checks memory usage.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Memory Usage
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!memory!20!28
        }

    5.6 Check Mapped Memory Usage: checks mongodb's mapped memory usage.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Mapped Memory Usage
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!memory_mapped!20!28
        }

    5.7 Check Lock Time Percentage: checks the percentage of time spent locked. Noticeable lock time usually means the database is overloaded.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Lock Percentage
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!lock!5!10
        }

    5.8 Check Average Flush Time: checks the average flush time. A high average flush time means the database is handling a heavy write load.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Flush Average
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!flushing!100!200
        }

    5.9 Check Last Flush Time: checks the duration of the most recent flush. A high value means the server may be under I/O pressure and may need faster disks.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Last Flush Time
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!last_flush_time!200!400
        }

    5.10 Check status of mongodb replicaset: checks the state of the MongoDB replica set.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB state
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replset_state!0!0
        }

    5.11 Check status of index miss ratio: checks the index miss ratio. If this value is high, consider adding indexes.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Index Miss Ratio
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!index_miss_ratio!.005!.01
        }

    5.12 Check number of databases and number of collections

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Number of databases
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!databases!300!500
        }

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Number of collections
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collections!300!500
        }

    5.13 Check size of a database: checks the size of a database, which lets you track its growth rate.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Database size db_ttlsa_posts
            check_command           check_mongodb_database!10.1.11.155!27017!check_mongodb!www.ttlsa.com!database_size!300!500!db_ttlsa_posts
        }

    5.14 Check index size of a database: checks the index size of a database.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Database index size db_ttlsa_posts
            check_command           check_mongodb_database!10.1.11.155!27017!check_mongodb!www.ttlsa.com!database_indexes!50!100!db_ttlsa_posts
        }

    5.15 Check index size of a collection: checks the index size of a single collection.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Collection index size db_ttlsa_posts posts
            check_command           check_mongodb_collection!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collection_indexes!50!100!db_ttlsa_posts!posts
        }

    5.16 Check the primary server of replicaset: checks which server is the primary of the replica set.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Replicaset Master Monitor: replset_ttlsa
            check_command           check_mongodb_replicaset!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replica_primary!0!1!replset_ttlsa
        }

    5.17 Check the number of queries per second: checks how many operations of a given type the server handles per second. The available types are query|insert|update|delete|getmore|command.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     MongoDB Updates per Second
            check_command           check_mongodb_query!10.1.11.155!27017!check_mongodb!www.ttlsa.com!queries_per_second!200!150!update
        }
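A per-second figure is typically derived from two samples of a cumulative counter, such as the opcounters reported by serverStatus. The sketch below shows that arithmetic; the function name and sample values are hypothetical, and the plugin's own bookkeeping between runs may differ.

```python
def ops_per_second(previous_count, current_count, previous_time, current_time):
    """Rate between two samples of a cumulative operation counter."""
    elapsed = current_time - previous_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = current_count - previous_count
    if delta < 0:
        # The counter went backwards, e.g. after a mongod restart.
        return 0.0
    return delta / elapsed


# Example: 600 updates between samples taken 3 seconds apart.
print(ops_per_second(1000, 1600, 10.0, 13.0))
```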

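    5.18 Check Primary Connection: checks the connection to the primary server of the replica set. The service description differs from the one in 5.1 because Nagios needs a unique description per host.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Connect Primary Check
            check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect_primary!2!4
        }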

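    5.19 Check Collection State: checks the state of a collection. Because this check takes a database and a collection, it uses the check_mongodb_collection command from section 4; the two thresholds are placeholders.

        define service {
            use                     generic-service
            hostgroup_name          Mongo Servers
            service_description     Mongo Collection State
            check_command           check_mongodb_collection!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collection_state!0!0!db_ttlsa_posts!posts
        }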

    If you repost this article, please credit 运维生存时间 (ttlsa): http://www.ttlsa.com/html/4188.html

