NSX | All about the control plane

When I was learning about the NSX control plane, I couldn't find all the information on a single page. The information was out there, but scattered. So I decided to gather it all and organize it as below, which I found easier to learn from and to revisit later. Please don't be shy about leaving feedback if you find it useful too.

Below are the components of the NSX control plane:
  • NSX Controller Cluster
  • Control Plane Agent (netcpa)
  • NSX Logical Router ControlVM

Now let's explore each of them in detail.

NSX Controller Cluster
  • NSX controllers provide control plane functionality.
  • Controllers distribute logical network information (logical switching and routing) to ESXi hosts.
  • They are responsible for updating ESXi hosts on the state of the logical network components.
  • The NSX controller cluster uses "sharding" to distribute the workload across its nodes. Sharding divides the controller workload into shards so that each NSX controller instance carries an equal portion of the work.
  • A single controller is acceptable for dev/lab environments; a cluster of three controllers is recommended for production.
  • For each role, such as logical switching, logical routing and other services, one controller node is elected master.
  • When a controller fails, the master for each role redistributes that node's shards to the remaining available nodes.
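  • The election of the master for each role requires a majority vote of all active and inactive nodes in the cluster. This is the primary reason why a controller cluster must always be deployed with an odd number of nodes: adding a fourth node to a three-node cluster raises the majority from two to three, so the cluster still survives only one failure.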
  • It communicates with the control plane agent (netcpa).
  • It reduces ARP flooding through ARP suppression.
  • NSX controllers build and store the ARP, MAC, VTEP and routing tables.
  • NSX controllers are required if you deploy distributed logical routers or use VXLAN in unicast or hybrid replication mode.
  • If one controller is down, there is no impact on the NSX control plane, as the other two continue to serve the cluster.
  • If two controllers are down, the running routing configuration is not affected, but the third controller becomes read-only and accepts no configuration changes.
  • NSX supports CDO (Controller Disconnected Operation) mode for the case where all three controllers are down or a host loses its connectivity with the control plane.
  • NSX Controller communication channels
    • The management plane (NSX Manager) communicates with the controller cluster over TCP port 443.
    • The management plane communicates directly with vsfwd on the ESXi host over TCP port 5671, using RabbitMQ, to push down firewall configuration changes.
    • The control plane talks to the netcpa agent over TCP port 1234 to propagate L2/L3 changes.
    • netcpa propagates these changes to the respective routing and VXLAN kernel modules on the ESXi host.
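To see why an odd number of nodes matters, here is a minimal Python sketch of the majority-vote rule described above. It assumes, as the list says, that a role master needs a majority of all configured nodes (active and inactive); the function names are made up for illustration and are not part of NSX.

```python
# Minimal sketch of majority-vote quorum for a controller cluster.
# Assumption: a role master can be elected, and configuration changes
# accepted, only while a strict majority of ALL configured nodes
# (active or inactive) is still up. Function names are hypothetical.


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, failed: int) -> bool:
    """True if the surviving nodes can still elect role masters."""
    return cluster_size - failed >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """Number of nodes that can fail before quorum is lost."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    for failed in range(3):
        state = "read-write" if has_quorum(3, failed) else "read-only (no quorum)"
        print(f"3 controllers, {failed} down: {state}")
    # A 4th node raises the majority without adding failure tolerance.
    for size in (3, 4, 5):
        print(f"{size} controllers tolerate {tolerated_failures(size)} failure(s)")
```

Running it reports read-write with zero or one controller down and read-only with two down, matching the failure behaviour listed above. It also shows that three and four controllers both tolerate exactly one failure, while five tolerate two.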

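Sharding and failover can be sketched the same way. This is a toy model, not NSX's actual algorithm: it assumes one shard is one VNI, places shards round-robin, and hands a failed node's shards to the least-loaded survivors. The node names and helper functions are hypothetical.

```python
# Minimal sketch of sharding controller work and rebalancing on failure.
# Assumptions: one shard = one unit of work (here, one VNI); initial
# placement is round-robin, and a failed node's shards go to the
# least-loaded survivors. Illustrative only, not NSX's algorithm.


def assign_slices(slices: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Spread shards as evenly as possible across the given nodes."""
    if not nodes:
        raise ValueError("no controller nodes available")
    table: dict[str, list[str]] = {node: [] for node in nodes}
    for i, item in enumerate(slices):
        table[nodes[i % len(nodes)]].append(item)
    return table


def redistribute(table: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Move only the failed node's shards onto the least-loaded survivors."""
    survivors = {node: list(items) for node, items in table.items() if node != failed}
    if not survivors:
        raise ValueError("no surviving controller nodes")
    for item in table.get(failed, []):
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(item)
    return survivors


if __name__ == "__main__":
    vnis = [f"VNI-{5000 + i}" for i in range(6)]
    before = assign_slices(vnis, ["ctrl-1", "ctrl-2", "ctrl-3"])
    after = redistribute(before, "ctrl-2")
    print(before)
    print(after)
```

Only ctrl-2's shards move; ctrl-1 and ctrl-3 keep what they already owned and end up with three each. Moving just the orphaned shards, rather than recomputing the whole placement, avoids reshuffling work that healthy nodes are already handling.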
Control Plane User World Agent

  • The control plane agent (netcpa) is a TCP client that communicates with the controller using the control plane protocol.
  • It uses SSL to secure communication with the NSX Controller instances.
  • It mediates between the controller instances and the hypervisor kernel modules.
  • It sends information about network connectivity and IP and MAC addresses to the NSX controllers.
  • It retrieves configuration information from NSX Manager through vsfwd.
  • Beginning with NSX 6.3, an auto-recovery mechanism has been added to the user world agents (netcpa and vsfwd).
    • The automatic user world agent monitoring process detects a user world agent in a wrong state and tries to recover it automatically.
    • If a user world agent reports a temporary failure because of a delayed response to a health check, a warning message is written to the VMkernel log.
  • User world agents are deployed by NSX Manager on ESXi hosts through EAM (ESX Agent Manager) during host preparation.
  • Each ESXi host in an NSX-prepared cluster runs two user world agents (UWAs): netcpa and vsfwd.
  • The classic screenshot that explains this is taken from a VMware e-book.
  • netcpa auto-restart operation:
    1. netcpa updates the global lock periodically.
    2. An external script checks the heartbeat counter to see whether netcpa has failed.
    3. If netcpa has failed, the script restarts the daemon, generates a core dump and raises alerts. Watchdog events are recorded in /var/log/syslog.log.
  • To verify the netcpa agent service on an ESXi host, log in to the host and run:
    # /etc/init.d/netcpad status
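The netcpa auto-restart steps above follow a common heartbeat-watchdog pattern. The Python sketch below is a simplified model of that pattern under stated assumptions: the agent bumps a counter on each healthy cycle, and a watchdog warns on the first missed beat and restarts the agent after three. The class names, the threshold and the in-memory event list are hypothetical; the real mechanism is VMware's own script, which records watchdog events in /var/log/syslog.log.

```python
# Simplified model of the heartbeat-watchdog pattern described for netcpa.
# Assumptions: the agent bumps a shared counter each healthy cycle; the
# watchdog warns on the first unchanged poll and restarts the agent after
# `max_misses` unchanged polls. All names and thresholds are hypothetical.


class Agent:
    def __init__(self) -> None:
        self.heartbeat = 0
        self.hung = False
        self.restarts = 0

    def tick(self) -> None:
        """One work cycle: only a healthy agent updates its heartbeat."""
        if not self.hung:
            self.heartbeat += 1

    def restart(self) -> None:
        self.hung = False
        self.restarts += 1


class Watchdog:
    def __init__(self, agent: Agent, max_misses: int = 3) -> None:
        self.agent = agent
        self.max_misses = max_misses
        self.last_seen = agent.heartbeat
        self.misses = 0
        self.events: list[str] = []  # stands in for log entries

    def poll(self) -> None:
        if self.agent.heartbeat != self.last_seen:
            # Heartbeat moved: the agent is alive, reset the miss count.
            self.last_seen = self.agent.heartbeat
            self.misses = 0
            return
        self.misses += 1
        if self.misses == 1:
            # One missed beat may just be a slow response: warn only.
            self.events.append("warning: delayed heartbeat")
        if self.misses >= self.max_misses:
            self.events.append("watchdog: agent failed, core dump + restart")
            self.agent.restart()
            self.misses = 0


if __name__ == "__main__":
    agent = Agent()
    dog = Watchdog(agent)
    for cycle in range(8):
        if cycle == 2:
            agent.hung = True  # simulate the agent getting stuck
        agent.tick()
        dog.poll()
    print(agent.restarts, dog.events)
```

In this run the agent hangs at cycle 2, the watchdog logs one warning, restarts the agent on the third missed poll, and the heartbeat advances normally afterwards. Warning first and restarting only after repeated misses mirrors the distinction above between a temporary health-check delay and a real failure.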

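Putting two points from above together (netcpa reporting IP and MAC details to the controllers, and the controllers using their tables for ARP suppression), here is a simplified Python model. It assumes each report carries the VNI, VM IP, VM MAC and host VTEP IP; the classes and methods are hypothetical and do not mirror any NSX API.

```python
# Simplified model: controller tables fed by netcpa reports, used for
# ARP suppression. Assumption: each report carries (VNI, VM IP, VM MAC,
# host VTEP IP). All names are hypothetical, not an NSX API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControllerTables:
    # (VNI, VM IP) -> VM MAC, used to answer ARP requests.
    arp: dict[tuple[int, str], str] = field(default_factory=dict)
    # (VNI, VM MAC) -> VTEP IP of the host where that VM runs.
    mac: dict[tuple[int, str], str] = field(default_factory=dict)

    def report(self, vni: int, ip: str, mac: str, vtep: str) -> None:
        """Record what a host's netcpa learned about a local VM."""
        self.arp[(vni, ip)] = mac
        self.mac[(vni, mac)] = vtep

    def resolve_arp(self, vni: int, ip: str) -> Optional[str]:
        """Return the MAC for ip on this VNI, or None if unknown."""
        return self.arp.get((vni, ip))


def handle_arp_request(tables: ControllerTables, vni: int, target_ip: str) -> str:
    """Suppress the broadcast when the answer is already known."""
    mac = tables.resolve_arp(vni, target_ip)
    if mac is not None:
        return f"local reply: {target_ip} is-at {mac}"
    return "unknown: flood the ARP request on the logical switch"


if __name__ == "__main__":
    tables = ControllerTables()
    tables.report(5001, "10.0.1.10", "00:50:56:aa:bb:01", "192.168.50.11")
    print(handle_arp_request(tables, 5001, "10.0.1.10"))
    print(handle_arp_request(tables, 5001, "10.0.1.99"))
    print(tables.mac[(5001, "00:50:56:aa:bb:01")])
```

The first request is answered without any broadcast because the controller already knows the MAC; the second falls back to flooding. The MAC table lookup shows how the same report also tells other hosts which VTEP to tunnel to.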
Control VM
  • It is a control plane component, not part of the data path. Mind it :)
  • It establishes OSPF/BGP neighbour peering.
  • It pushes routing updates to the controller cluster.
  • The control VM is a must-have component for high availability and dynamic routing configurations.
  • The NSX control VM communicates with NSX Manager and the controller cluster.
  • NSX Manager sends LIF (logical interface) information to the controller cluster and the control VM.
  • If you use only static routing and do not need the DLR in HA, you don't need a control VM.
  • The DLR control VM can be configured to redistribute the IP prefixes of all connected logical networks into OSPF.
  • The NSX Edge advertises the prefixes for reaching external networks to the control VM.
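The control VM's role in the route flow can be summarised with a toy Python model: the NSX Edge advertises external prefixes to the control VM, the control VM hands them to the controller cluster, and the controller programs every host. Routes are simplified to prefix/next-hop pairs, protocol details are left out, and all class names, host names and addresses are hypothetical.

```python
# Toy model of the DLR route flow: Edge -> control VM -> controller
# -> ESXi hosts. Assumptions: a route is just prefix -> next hop;
# OSPF/BGP peering, timers and best-path selection are omitted.
# All names and addresses are hypothetical.


class ControllerCluster:
    def __init__(self, hosts: dict[str, dict[str, str]]) -> None:
        # host name -> routes programmed in that host's DLR kernel module
        self.hosts = hosts

    def push_routes(self, routes: dict[str, str]) -> None:
        """Program the same learned routes on every host."""
        for table in self.hosts.values():
            table.clear()
            table.update(routes)


class DlrControlVM:
    def __init__(self, controller: ControllerCluster, connected_lifs: list[str]) -> None:
        self.controller = controller
        self.connected = connected_lifs  # LIF prefixes (LIF info comes from NSX Manager)
        self.learned: dict[str, str] = {}

    def receive_from_peer(self, advertised: dict[str, str]) -> None:
        """Take prefixes advertised by the Edge and hand them to the controller."""
        self.learned.update(advertised)
        self.controller.push_routes(self.learned)

    def advertise_to_peer(self) -> list[str]:
        """Connected LIF prefixes redistributed into the routing protocol."""
        return sorted(self.connected)


if __name__ == "__main__":
    hosts: dict[str, dict[str, str]] = {"esxi-01": {}, "esxi-02": {}}
    controller = ControllerCluster(hosts)
    cvm = DlrControlVM(controller, ["172.16.20.0/24", "172.16.10.0/24"])
    # The Edge advertises a default route toward the external network.
    cvm.receive_from_peer({"0.0.0.0/0": "192.168.10.1"})
    print(cvm.advertise_to_peer())
    print(hosts)
    # Packets are then forwarded by the hosts; the control VM never sees them.
```

Both hosts end up with the default route while the control VM only exchanges routing information, which is the point of "control plane, not data path" above.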

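The rule on when a control VM is required boils down to a simple check. This helper is only a restatement of the two criteria in the list (dynamic routing or DLR HA); the function name is hypothetical.

```python
# Restates the rule of thumb from the Control VM list as a tiny helper.
# Assumption: only the two criteria named in the text are considered;
# the function name is hypothetical.


def needs_control_vm(dynamic_routing: bool, dlr_ha: bool) -> bool:
    """A DLR control VM is needed for OSPF/BGP or for DLR HA."""
    return dynamic_routing or dlr_ha


if __name__ == "__main__":
    for dynamic in (False, True):
        for ha in (False, True):
            print(f"dynamic routing={dynamic}, HA={ha}: "
                  f"control VM needed={needs_control_vm(dynamic, ha)}")
```

Only the static-routing, non-HA combination comes out as not needing a control VM.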
That's all for now, folks, but I will keep adding to the list.

Feel free to ask me to cover any specific topic on this.

Thank you,
Team vCloudNotes

