Skip to main content

vROPS | Cluster Health-Check Script

Hi Guys,

In vROPS, when we deploy\create multi-node cluster then sometime due to large collection of metrics and alerts vROPS database get full and start creating many issues like metric graphs doesn't populate correctly, metrics collection start getting skip the timeline, vROPS UI get slow etc.

In order to maintain the health of vROPS cluster, you can use below script which gives very clear and exact information on its metric and alert collections, its current size and state of all active nodes in cluster, space in each directory of master node and many more things. I got this script from VMware while working on such an issue so thought to share with all.

After all, sharing is caring. Isn't it? Below is the script.

echo -e "\e[1;31mHOSTNAME:\e[0m" > $HOSTNAME-status.txt | hostname >> $HOSTNAME-status.txt;getent hosts | nslookup >> $HOSTNAME-status.txt; uname -a >> $HOSTNAME-status.txt; echo -e "\e[1;31mDNS CONFIGURATION:\e[0m" >> $HOSTNAME-status.txt | cat /etc/resolv.conf >> $HOSTNAME-status.txt; cat /etc/hosts >> $HOSTNAME-status.txt; echo -e "\e[1;31mVERSION INFO:\e[0m" >> $HOSTNAME-status.txt | cat /usr/lib/vmware-vcops/user/conf/lastbuildversion.txt >> $HOSTNAME-status.txt; echo -e "" >> $HOSTNAME-status.txt;cat /etc/SuSE-release >> $HOSTNAME-status.txt; echo -e "\e[1;31mDATE:\e[0m" >> $HOSTNAME-status.txt | date >> $HOSTNAME-status.txt; echo -e "\e[1;31mSERVICES:\e[0m" >> $HOSTNAME-status.txt | service vmware-vcops status >> $HOSTNAME-status.txt; echo -e "\e[1;31mCASA:\e[0m">> $HOSTNAME-status.txt| service vmware-casa status >> $HOSTNAME-status.txt; echo -e "\e[1;31mDISKSPACE:\e[0m" >> $HOSTNAME-status.txt | df -h >> $HOSTNAME-status.txt; echo -e "\e[1;31mHEAPDUMP:\e[0m">> $HOSTNAME-status.txt | ls -lrSh /storage/heapdump/>> $HOSTNAME-status.txt; echo -e "\e[1;31mIFCONFIG:\e[0m">> $HOSTNAME-status.txt | ifconfig >> $HOSTNAME-status.txt; echo -e "\e[1;31mCASADB.SCRIPT:\e[0m" >> $HOSTNAME-status.txt | tail -n +51 /data/db/casa/webapp/hsqldb/casa.db.script >> $HOSTNAME-status.txt; echo -e "\e[1;31mROLE STATE:\e[0m">> $HOSTNAME-status.txt | grep adminroleconnectionstring /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/data/roleState.properties >>$HOSTNAME-status.txt | grep adminroleenabled /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/data/roleState.properties >>$HOSTNAME-status.txt; echo -e "\e[1;31mGEMFIRE PROPERTIES:\e[0m">> $HOSTNAME-status.txt | grep locators /usr/lib/vmware-vcops/user/conf/gemfire.* >> $HOSTNAME-status.txt; grep bind-address /usr/lib/vmware-vcops/user/conf/gemfire.* >> $HOSTNAME-status.txt; grep shardRedundancyLevel /usr/lib/vmware-vcops/user/conf/gemfire.properties >> $HOSTNAME-status.txt;grep "serversCount" /usr/lib/vmware-vcops/user/conf/gemfire.properties >> $HOSTNAME-status.txt; echo -e "\e[1;31mPERSISTENCE PROPERTIES:\e[0m">> $HOSTNAME-status.txt | grep ^db* /usr/lib/vmware-vcops/user/conf/persistence/persistence.properties >> $HOSTNAME-status.txt; grep replica* /usr/lib/vmware-vcops/user/conf/persistence/persistence.properties >> $HOSTNAME-status.txt; grep "repl.db.role" /usr/lib/vmware-vcops/user/conf/persistence/persistence.properties >> $HOSTNAME-status.txt; echo -e "\e[1;31mCASSANDRA YAML:\e[0m" >> $HOSTNAME-status.txt | grep broadcast_rpc_address: /usr/lib/vmware-vcops/user/conf/cassandra/cassandra.yaml >> $HOSTNAME-status.txt | grep listen_address: /usr/lib/vmware-vcops/user/conf/cassandra/cassandra.yaml >> $HOSTNAME-status.txt; echo -e "\e[1;31mNODE STATE INFO:\e[0m">> $HOSTNAME-status.txt | $VMWARE_PYTHON_BIN $ALIVE_BASE/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | sed -nre '/stateMappings/,/}$/p' >> $HOSTNAME-status.txt; echo -e "\e[1;31mWRAPPER RESTARTS:\e[0m" >> $HOSTNAME-status.txt |find /usr/lib/vmware-vcops/user/log/ -name "*wrapper.log" -print -exec bash -c "grep 'Wrapper Stopped' {} | tail -5" \; | cut -d'|' -f3 >> $HOSTNAME-status.txt; echo -e "" >> $HOSTNAME-status.txt; echo -e "\e[1;4;35mPERFORMANCE RELATED INFORMATION\e[0m" >> $HOSTNAME-status.txt; echo -e "" >> $HOSTNAME-status.txt; echo -e "\e[1;31mvCPU INFO:\e[0m" >> $HOSTNAME-status.txt |grep -wc processor /proc/cpuinfo >> $HOSTNAME-status.txt; echo -e "\e[1;31mMEMORY INFO:\e[0m" >> $HOSTNAME-status.txt | awk '$3=="kB"{$2=$2/1024**2;$3="GB";} 1' /proc/meminfo | column -t | grep MemTotal >> $HOSTNAME-status.txt; echo -e "\e[1;31mTOP OUTPUT:\e[0m" >> $HOSTNAME-status.txt; /usr/bin/top -d 0.5 -n 1 -b | head -5 >> $HOSTNAME-status.txt; echo -e "\e[1;31mADAPTER TYPE OBJECT COUNTS:\e[0m" >> $HOSTNAME-status.txt; su - postgres -c "PGDATA=/storage/db/vcops/vpostgres/repl PGPORT=5433 /opt/vmware/vpostgres/current/bin/psql -d vcopsdb -c 'select count(*),adapter_kind from resource group by adapter_kind;'" | awk '{ SUM += $1; print} END {print "Total";print SUM }' | cut -d ':' -f 5 >> $HOSTNAME-status.txt; echo -e "\e[1;31mCASSANDRA ACTIVITIES:\e[0m" >> $HOSTNAME-status.txt | /usr/lib/vmware-vcops/cassandra/apache-cassandra-2.1.8/bin/./nodetool --ssl -h 127.0.0.1 --port 9008 -u maintenanceAdmin --password-file /usr/lib/vmware-vcops/user/conf/jmxremote.password  cfstats -H globalpersistence.activity_2_tbl >> $HOSTNAME-status.txt; echo -e "\e[1;31mALERT DB COUNT:\e[0m" >> $HOSTNAME-status.txt | su - postgres -c "/opt/vmware/vpostgres/9.3/bin/psql -d vcopsdb -A -t -c 'select count(*) from alert'" >> $HOSTNAME-status.txt; echo -e "\e[1;31mALARM DB COUNT:\e[0m" >> $HOSTNAME-status.txt | su - postgres -c "/opt/vmware/vpostgres/9.3/bin/psql -d vcopsdb -A -t -c 'select count(*) from alarm'" >> $HOSTNAME-status.txt; less -r $HOSTNAME-status.txt

Now, how to use this script? 


  • login vROPS CLI with root account on vROPS master node appliance
  • Copy above script and as it is paste into your CLI window of vROPS master node
  • Don't worry, it doesn't have any side-effect :)
It will give you exhaustive detail of all the information about your current active nodes in cluster. Hope you find it useful. Must share the feedback!





Thank you,
Team vCloudNotes




Comments

Popular posts from this blog

Network Migration | VLAN to VXLAN

NSX-T | Getting Started with PowerCLI

Automate Power on Operations using PS

NSX-T | Security