
Unlock NSX Local Accounts for API Operations (KB#00100)

Hello Guys,

This is a mysterious issue in which an NSX local account gets locked out, not for logging in to the NSX GUI, but for API operations. I am writing this post because I couldn't find the solution on the web. A VMware article explains the root cause, but it does not give a solution.

Symptoms

1) Any application or product that uses NSX for network services will not be able to perform create, update or delete operations through NSX. However, services that are already running keep running fine.
2) You will be able to log in to the NSX portal, but you will not be able to use the APIs, not even through an API tool such as Postman. You might get an HTTP 403 Forbidden error.


3) I use VMware Cloud Director, which relies on NSX for networking and security services. In VMware Cloud Director, I saw the error below whenever I opened any Edge Gateway or made any configuration change. Below are the VCD debug logs:

2021-07-12 16:18:52,955 | DEBUG    | task-service-activity-pool-128 | NetworkSecurityErrorHandler    | Response error: <!doctype html><html lang="en"><head><title>HTTP Status 403 – Forbidden</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 403 – Forbidden</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> This IP address has been blocked temporarily.</p><p><b>Description</b> The server understood the request but refuses to authorize it.</p><hr class="line" /></body></html> | requestId=18cbd3bc-2eaf-42af-b11e-4bce957bff9e,request=POST https://testvcd.com/api/admin/edgeGateway/509ed27c-724d-490b-b0c7-e25d19523017/action/redeploy,requestTime=1626106731682,remoteAddress=172.25.1.21:60592,userAgent=Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (...,accept=application/*+json;version 34.0 vcd=632554b9-0779-4ddc-8b62-be08d1c167f6,task=82c8a4fc-35ba-490a-a235-b4a65a25cace activity=(com.vmware.vcloud.backendbase.management.system.TaskActivity,urn:uuid:82c8a4fc-35ba-490a-a235-b4a65a25cace)

4) I checked the NSX Manager logs and found multiple events like the ones below in /usr/appmgmt-webserver/logs/localhost_access_log.2021-07-13.txt (the date in this filename changes with the date of the logs).

172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/2.0/services/ipset/ipset-29 HTTP/1.1" "https-jsse-nio-443-exec-2292" 403 649 2234
172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/2.0/services/ipset/ipset-106 HTTP/1.1" "https-jsse-nio-443-exec-2334" 403 649 2241
172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/versions HTTP/1.1" "https-jsse-nio-443-exec-2366" 403 649 2243
172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/versions HTTP/1.1" "https-jsse-nio-443-exec-2325" 403 649 2224

Here, 172.25.1.239 (the first field) is the IP address that is trying, and failing, to connect to NSX Manager.

403 (the field after the quoted thread name) is the HTTP status code, which means access is forbidden.

In other words, the host at 172.25.1.239 is trying to connect to NSX Manager, but NSX Manager is not allowing it.
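The same log can help you find the culprit. Below is a minimal Python sketch that counts 403 responses per source IP. It assumes the line layout of the samples above: client IP first, and the HTTP status right after the quoted thread name. The script name count403.py and the function name are hypothetical. Run it wherever you have copied the log file.

```python
import re
import sys
from collections import Counter

# Matches access-log lines such as:
# 172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/versions HTTP/1.1" "https-jsse-nio-443-exec-2366" 403 649 2243
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" "[^"]*" (?P<status>\d{3}) '
)


def count_forbidden_by_ip(lines):
    """Return a Counter mapping source IP -> number of HTTP 403 responses."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "403":
            counts[m.group("ip")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count403.py localhost_access_log.2021-07-13.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ip, n in count_forbidden_by_ip(f).most_common():
            print(f"{ip}\t{n}")
```

Keep in mind that once the block is active, every client using that account gets 403, including correctly configured ones. Compare the list against the hosts you expect to call the API.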

Root Cause

When any application (monitoring or otherwise) is configured with an incorrect username or password for NSX Manager, NSX Manager blacklists that username as a security feature. This happens after a certain number of invalid authentication attempts (10 by default, which can be modified). A user can trigger this too, because someone might be trying to guess the password! Be aware of that as well ;)

Solution

Before correcting it, you must identify which application or user is causing this. If invalid attempts keep coming after the fix, the account will be blocked again. Identifying the source is up to you, but feel free to reach out to me :)

Once you have identified it, follow the steps below:

1. Log in to the NSX Manager CLI as admin.

2. Enter enable mode with the command:

#enable

3. Enter engineering mode with the command:

#st en

Press Y when prompted and use the password: IAmOnThePhoneWithTechSupport

4. Then edit /home/secureall/secureall/sem/WEB-INF/spring/vsmconfig.properties and look for the following entry:

#Denotes whether blacklisting is enabled
blacklist.enabed=true

You will find it set to true.

5. To resolve this issue, first change the value to false, that is, from

blacklist.enabed=true

to

blacklist.enabed=false

and then save the file.

6. Reboot NSX Manager with the command below:

#reboot

7. Once NSX Manager has rebooted, the locked-out NSX account that you used for the integration with the other application responds to API requests again.

8. In step 5 you disabled blacklisting, which can be a security vulnerability. As soon as the issue is fixed, revert the value to true so that future invalid attempts get blocked.

9. After setting it back to true, reboot NSX Manager again. Every time you change this value, you have to reboot NSX Manager for the change to take effect.
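Steps 4, 5 and 8 boil down to flipping one line of vsmconfig.properties. If you would rather script the change, which also makes the revert in step 8 hard to forget, here is a minimal Python sketch. It assumes Python is available wherever you edit the file, and that the key is spelled exactly as shown in step 4. The function names are hypothetical. The sketch keeps a .bak copy, and you still have to reboot NSX Manager after each change.

```python
import re
import shutil
from pathlib import Path

# Key name exactly as it appears in vsmconfig.properties (step 4)
KEY = "blacklist.enabed"
# The key at the start of a line, optional spaces around '=', then the value up to end of line
KEY_RE = re.compile(rf"^({re.escape(KEY)}[ \t]*=[ \t]*)\S*[ \t]*$", re.MULTILINE)


def set_blacklist_text(text: str, enabled: bool) -> str:
    """Return the properties text with KEY set to true/false; all other lines are untouched."""
    new_text, n = KEY_RE.subn(r"\g<1>" + ("true" if enabled else "false"), text)
    if n == 0:
        raise ValueError(f"{KEY} not found")
    return new_text


def set_blacklist_file(path: str, enabled: bool) -> None:
    """Copy the file to <name>.bak, then rewrite it with the new value."""
    p = Path(path)
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(set_blacklist_text(p.read_text(), enabled))
```

For example, call set_blacklist_file("/home/secureall/secureall/sem/WEB-INF/spring/vsmconfig.properties", False) for step 5, and make the same call with True for step 8.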

Don't forget to comment if it was useful for you. Cheers!


Kill an Ongoing/Stuck Process in VMware Cloud Director (KB#00099)

Hello guys,

You will find today's post more interesting, because the only documented resolution of this issue is a vCloud service restart. Many posts on the web say that to clear a stuck job in the VMware Cloud Director GUI, you need to restart the vCD services.

Today I faced the same issue, but I decided to find another solution. For a single VM, it is never a good idea to restart the entire vCloud Director service, is it?

Issue: A task has been running in the Cloud Director GUI for a long time and has not timed out even after more than 6 hours.

First, let me explain what exactly a long-running or stuck job means. In the vCloud Director GUI, you see a particular task that has been running for a long time, and its task details show it still in progress.

Observation: The job had been running for the last 6-7 hours. Earlier, I had noticed that such jobs time out after around 4 hours, which is the default timeout value for vCD tasks, but this one was stubborn. Because of it, I couldn't perform any operation on the VM other than power off and power on. vCenter showed no such issue, so this was clearly a VMware Cloud Director issue.

Solution:

1. Log in to the primary cell with the root account and then log in to the DB (if it is embedded; with an external DB, log in directly to the DB server).

2. Run the command below to find the organization that owns the stuck task:

select * from organization where name = 'org_name';

It will give you the org_id. Note it down.

3. Now run the command below, using the org_id you got from the previous command:

select * from task where org_id = 'org_id';

It will show you the task name and job ID, which confirm that this is the right task to cancel. Generally you will see only the stuck task, but if you see many, search by the job ID rather than by org_id. If this line is not clear, the comment box is yours.

4. Now execute the command below to delete the task from the DB. Take a backup first, please!

Delete from task where org_id = 'org_id';

This deletes every task of that organization, so if the org has more than one task, restrict the WHERE clause to the stuck task's ID instead.

The stuck task has now been deleted from the DB. None of the operations above require downtime, so go ahead without fear!
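Because "Delete from task where org_id = ..." removes every task of that organization, a safer pattern is to delete by the stuck task's own ID inside a transaction. The transaction is rolled back unless exactly one row was affected, so a mistyped ID deletes nothing. The sketch below shows that pattern with Python's built-in sqlite3 module against a hypothetical, heavily simplified task table (columns id, org_id, job_name). The real vCD schema has more columns and runs on PostgreSQL or SQL Server. Treat this as an illustration of the guard, not a tool to point at the vCD DB.

```python
import sqlite3


def delete_single_task(conn: sqlite3.Connection, task_id: str) -> bool:
    """Delete one task row by ID; roll back and return False unless exactly one row matched."""
    cur = conn.cursor()
    cur.execute("DELETE FROM task WHERE id = ?", (task_id,))  # parameterised, no string pasting
    if cur.rowcount == 1:
        conn.commit()
        return True
    conn.rollback()
    return False


# Demo with a hypothetical, simplified table (the real vCD task table differs)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id TEXT PRIMARY KEY, org_id TEXT, job_name TEXT)")
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?)",
    [("t1", "org-a", "stuck job"), ("t2", "org-a", "healthy job"), ("t3", "org-b", "other org job")],
)
conn.commit()
```

Deleting "t1" removes only the stuck row. Filtering by org_id = 'org-a' would also have removed the healthy task "t2".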

Cheers! 


Fix a Null ip_scope_id of a vCD Network (KB#00098)

Hi guys,

Since the introduction of the HTML5 interface of VMware Cloud Director, there have been some major issues with the portal and the DB. Today I am going to give you the fix for one of those issues, for which many people across the globe are looking for a solution. As per my discussions in the VMware community, some people are not getting a solution even from VMware. I got this fix from VMware, yet I found many people still struggling with this issue, so I thought I should create a web record of it.

Symptoms/Observation:

1. When you open the External Networks page in the Cloud Director /provider portal and browse through all pages one by one, one of the pages keeps loading and eventually fails with a timeout.

2. When you apply a filter and search for that external network, the search ends in a timeout with no result.

3. When you try to add the problematic network to your vApp, it always fails with weird errors.

4. The affected network can be an external network, an org network or a vApp network.

5. Edges to which the problematic network is attached intermittently lose packets, or can even lose connectivity completely.

Basically, the network appears broken, and you can't see or edit it in the vCD GUI at all.

Reason:

This happens because the ip_scope_id value of that particular network in the Cloud Director DB has become null.

Resolution:

1. If you are facing similar symptoms with your network, run the command below to confirm that your network has the same issue. Run it in the Cloud Director DB, which can be PostgreSQL or SQL Server; the command is the same for both DB flavors:

select * from logical_network where name = 'problematic_network_name';

In the output, check the ip_scope_id column. If it is null for your network, continue with the next command, and note the network's id value: that is the logical_network_id used below.

2. To fix this, we need to find the ip_scope_id value and then insert it. To get the value, run the command below. The only thing to change is the logical_network_id value: copy it from the output of the first command and put it into the command below. Running it has no impact, as it is just a read (select) query.

select * from ip_scope where id in (select scope_id from allocated_ip_address where id in (select allocated_ip_address_id from gateway_assigned_ip where gateway_interface_id in (select id from gateway_interface where logical_network_id = 'db62f356-2f24-48b4-a841-87512e720f65')));

In the output, the id column is your ip_scope_id. Now that you have it, let's link it to the network in the logical_network_ip_scope table.

Before doing this, back up your DB. If you have the embedded DB, use the command:

#/opt/vmware/appliance/bin/create-db-backup

Once the backup is done, insert the value by running the next command.

3. From the first and second commands you have the logical_network_id and the ip_scope_id, respectively. Replace both IDs in the command below (scope_id first, then logical_network_id) and run it:

insert into logical_network_ip_scope(scope_id,logical_network_id) values('eb578ab9-2f4b-49c6-8fa6-4cd25472a1bf','5149b5ef-fc53-453d-a59d-a6f349624307'); 


 

Warning: If you are not well versed in this, or don't understand what is happening here, take VMware's help. You can also comment here and I will surely help you out.

Once the command above has run, the network is visible again in the vCD GUI and can be found in the portal. Edges and vApps connected to this network behave normally again.

4. Now run the first command again and check whether ip_scope_id is populated. It should be, but there is a small (about 1%) chance that it still shows null even though the issue is resolved. To fix that, run the command below:

update logical_network set ip_scope_id = 'd0953b40-3426-44af-a719-349b1245b3de' where id = '7353056c-2ba9-4ee0-9f75-631b3f0be77f';

By now, I hope you know what to change in the line above before running it: the ip_scope_id value and the network's id. If in doubt, I am not far from you guys. Feel free to comment.
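Steps 3 and 4 require pasting two UUIDs into the right positions, and swapping them is an easy mistake. A small Python helper can validate both values with the standard uuid module. It then prints the two statements, with the columns in the order used above, for you to review and run yourself. The helper is hypothetical and not part of any VMware tool. It executes nothing against the DB.

```python
import uuid
from typing import List


def render_ip_scope_fix(logical_network_id: str, ip_scope_id: str) -> List[str]:
    """Return the step 3 INSERT and step 4 UPDATE statements for review (nothing is executed)."""
    # uuid.UUID() raises ValueError for anything that is not a UUID; str() gives the
    # canonical lowercase form (hex digits and dashes only), so it is safe inside quotes
    net = str(uuid.UUID(logical_network_id))
    scope = str(uuid.UUID(ip_scope_id))
    return [
        "insert into logical_network_ip_scope(scope_id,logical_network_id) "
        f"values('{scope}','{net}');",
        f"update logical_network set ip_scope_id = '{scope}' where id = '{net}';",
    ]


# Example with the IDs from step 3 (replace with your own)
for stmt in render_ip_scope_fix("5149b5ef-fc53-453d-a59d-a6f349624307",
                                "eb578ab9-2f4b-49c6-8fa6-4cd25472a1bf"):
    print(stmt)
```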

Your issue is fixed! Enjoy 😊 


Get Cloud Director VM Inventory with Storage Info (KB#00097)

 Hello Guys,

Keep fighting Corona and keep your spirits up. It will help 😊

I got this challenge in the form of a request from my customer, who asked for the information of all VMs, including CPU, memory and disk size. Initially I thought it was no big deal, but when I actually started writing the code, I felt like "What the heck" LOL!

Those of you who have already dealt with it know my pain. I searched Google and found that many people were asking the same question, and a VMware community thread was unanswered too. After extensive searching and hard work, I managed it. Below is the code, guys:


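==========
Function Get-CIVMInventory {
    # Ask for a VM name and fetch the matching Cloud Director VM(s)
    $vmName = Read-Host "Enter the VM name"
    $vms = Get-CIVM -Name $vmName
    foreach ($vm in $vms) {
        # Hard disk items from the VM's virtual hardware section
        $disks = $vm.ExtensionData.GetVirtualHardwareSection().Item | Where-Object { $_.Description -like "Hard Disk" }
        foreach ($disk in $disks) {
            # VirtualQuantity holds the disk size in bytes; convert it to GB
            $vm | Select-Object Name, CPUCount, MemoryGB, @{N="DiskSize(GB)";E={$disk.VirtualQuantity.Value/1GB}}
        }
    }
}
# Run the function (outside its own definition, so it does not call itself forever)
Get-CIVMInventory
=========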

The above is for a single VM. If you need the data for all VMs in your Cloud Director environment, use the code below:

========
$vms = Get-CIVM
$output = foreach ($vm in $vms) {
    # Hard disk items from the VM's virtual hardware section
    $disks = $vm.ExtensionData.GetVirtualHardwareSection().Item | Where-Object { $_.Description -like "Hard Disk" }
    foreach ($disk in $disks) {
        # One row per disk; VirtualQuantity is in bytes
        $vm | Select-Object Name, CPUCount, MemoryGB, @{N="DiskSize(GB)";E={$disk.VirtualQuantity.Value/1GB}}
    }
}
# Write the report to a CSV file
$output | Export-Csv C:\Temp\fra_vcd_vm_invent.csv -NoTypeInformation
=========

It is tested and works for me. Give it a try and let me know if it doesn't work for you 😊 I answered the VMware thread too.
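If you export the per-disk rows with Export-Csv, each VM appears once per hard disk. To get the total provisioned disk per VM, a small Python sketch can sum the rows. It assumes the CSV columns are the ones produced by the Select statement above: Name, CPUCount, MemoryGB and DiskSize(GB). The function name is hypothetical.

```python
import csv
from collections import defaultdict


def total_disk_gb_per_vm(csv_text: str) -> dict:
    """Sum the DiskSize(GB) column per VM name from the exported per-disk CSV."""
    # Windows PowerShell adds a '#TYPE ...' first line unless -NoTypeInformation is used
    lines = [ln for ln in csv_text.splitlines() if not ln.startswith("#TYPE")]
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["Name"]] += float(row["DiskSize(GB)"])
    return dict(totals)
```

For example: with open(r"C:\Temp\fra_vcd_vm_invent.csv", encoding="utf-8-sig") as f: print(total_disk_gb_per_vm(f.read())). Adjust the encoding if your PowerShell version writes a different one.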

Create/Update Metadata of Cloud Director Objects (KB#00096)

Overview

You can usually create or modify metadata for any Cloud Director object in the GUI. It becomes a challenge when the GUI offers no option to create or modify metadata the way you need. This is the case with VMware Cloud Director 10.1.2: the GUI has no option to create metadata, and existing metadata entries cannot be edited there either. That is why I had to find another way.

The solution is to do it either through the API or from PowerShell. I will demonstrate both ways here; choose whichever sounds easier to you. The same method should work in future versions too, unless VMware changes the metadata format as it did for older versions of vCD. That change is why most articles on the web don't give you accurate information: they are outdated now.

Solution 1: From PowerCLI

=======
$vcd = Read-Host "Enter fqdn/IP here"
Connect-CIServer $vcd
#In the line below, specify the object type: VM, vApp, Org or OrgvDC. The cmdlet changes accordingly, e.g. Get-CIVM for a VM, Get-Org for an Org, and so on...
$vapp = Get-CIVApp -Name testconsole
$metadata = $vapp.extensiondata.GetMetadata()
$metadata.MetadataEntry = New-Object VMware.VimAutomation.Cloud.Views.MetadataEntry
$metadata.MetadataEntry[0].Key = "vCnotes"
$metadata.MetadataEntry[0].TypedValue = New-Object VMware.VimAutomation.Cloud.Views.MetadataStringValue
$metadata.MetadataEntry[0].TypedValue.Value = "test"
$metadata.MetadataEntry[0].Domain = New-Object VMware.VimAutomation.Cloud.Views.MetadataDomainTag
$metadata.MetadataEntry[0].Domain.Visibility = "readonly"
$metadata.MetadataEntry[0].Domain.Value = "SYSTEM"
# Send the metadata to Cloud Director
$vapp.ExtensionData.CreateMetadata($metadata)
=======

Copy the lines above, change them according to your environment (in particular the object name, the key and the value), and run them.

Note that:

1. To update an existing entry, run the commands above with the same key name; the existing key will be updated.
2. If you are a Tenant Administrator, you won't be able to see keys with private visibility, but you can still modify them with these commands.

Solution 2: From any API tool. This one is lengthy and complex.

It will take me some time to add it to this post. Meanwhile, enjoy using Solution 1 ;)

For any doubt/error in powercli, feel free to comment.

Reset root account password without restarting esxi host (KB#00095)

Overview

Generally, if you forget the root password, you reboot the appliance and press 'e' during boot to enter the GRUB menu. You then append the keywords rw init=/bin/bash to the end of the first line and press F10 to continue, which gives you a temporary root login.

But what if you can't reboot the ESXi host and still have to reset the root password? In other words, what if you want to reset the root password without rebooting at all?

Sounds interesting, isn't it?

Solution

I will update this post by next weekend. Stay tuned guys!!



Thank you

Check Free IP addresses in Network Pool in Cloud Director (KB#00094)

Overview

This is not a big deal when you can check it in the Cloud Director GUI, and you will find a number of articles explaining how to do it there. But if the GUI gives you no option to check IP address allocation, it gets far more difficult, especially if you only work with the GUI. My customer reported that he couldn't see the assigned or allocated IP addresses. He had to ping each and every IP address to check whether it was allocated. That is a real headache.

Issue

There were two issues:

1. Checking IP allocation for an external network is restricted to the System Administrator. My customer was a tenant administrator and wanted to check it, which is not possible by product design.
2. Even as a System Administrator, you might not get any option to export the list of IP addresses. You might want to keep it, or use it in Excel to quickly filter and highlight free IP addresses, etc.

Checked on VMware Cloud Director version 10.1.2.

Solution 

Credit goes to PowerShell here. I created the script below, which gives you the results easily. Just copy it and paste it into a PowerShell window on a machine from which you can reach your Cloud Director environment.

#

$vcd = Read-Host "Enter vCD url to connect"

Connect-CIServer $vcd

$network = Read-Host "Enter the name of network here"

$ExtNet = Get-ExternalNetwork -Name $network

$ExtNet.ExtensionData.Configuration.IpScopes.IpScope.allocatedipaddresses.IpAddress | Out-GridView

#

The script shows the allocated IP addresses of the network in a grid view window. You can verify the list against the IP allocations shown in the GUI.

Let me know if you have any thoughts on it.
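The script lists only the allocated addresses. To get the free ones, which is what my customer was pinging for, you can subtract that list from the network's IP pool. Below is a minimal Python sketch using the standard ipaddress module. It assumes you know the pool's start and end addresses and have saved the allocated list, for example by replacing Out-GridView with Out-File C:\Temp\allocated.txt -Encoding ascii. The function name and file path are hypothetical.

```python
import ipaddress


def free_ips(start: str, end: str, allocated):
    """Return addresses in [start, end] (inclusive) that are not in `allocated`, in order."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    # Ignore blank lines and stray whitespace from the exported list
    taken = {ipaddress.IPv4Address(a.strip()) for a in allocated if a.strip()}
    return [str(ipaddress.IPv4Address(i)) for i in range(first, last + 1)
            if ipaddress.IPv4Address(i) not in taken]
```

For example: free_ips("<pool start>", "<pool end>", open(r"C:\Temp\allocated.txt").read().splitlines()).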

Thank you.

 

Check and Modify Security Protocols in VMware Appliances (KB#00093)

Overview

This post simply centralizes how to configure security protocols such as TLS and SSL across VMware appliances. VMware did a good job of documenting the process; I am just putting it all in one view. I will add more products to the list if I feel they need to be here.

For vCenter Server -

To check :

1. Connect to the vCenter Server appliance over SSH using its management IP address.
2. Run the commands below:

#cd /usr/lib/vmware-vSphereTlsReconfigurator/VcTlsReconfigurator/
#./reconfigureVC scan

In the output, a TLS Version of TLSv1.2 means that TLSv1.2 is enabled and all other versions are disabled.
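The scan above runs on vCenter itself. To check from the outside which TLS versions any of these appliances accepts on port 443, you can attempt one handshake per version. Below is a minimal Python sketch using only the standard ssl module. Host names are placeholders, and the function names are hypothetical. A False for TLSv1 or TLSv1.1 can also come from your own OpenSSL build refusing those versions, so treat negatives for old versions with care. Certificate checks are disabled because the goal is only to see which protocol the server negotiates.

```python
import socket
import ssl

VERSIONS = {
    "TLSv1": ssl.TLSVersion.TLSv1,
    "TLSv1.1": ssl.TLSVersion.TLSv1_1,
    "TLSv1.2": ssl.TLSVersion.TLSv1_2,
    "TLSv1.3": ssl.TLSVersion.TLSv1_3,
}


def make_probe_context(version: str) -> ssl.SSLContext:
    """Client context pinned to exactly one protocol version, without certificate checks."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # appliances often use self-signed certificates
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = VERSIONS[version]
    ctx.maximum_version = VERSIONS[version]
    return ctx


def accepts(host: str, version: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if the server completes a handshake with the given protocol version."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with make_probe_context(version).wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

For example: for v in ("TLSv1.1", "TLSv1.2", "TLSv1.3"): print(v, accepts("vcenter.example.local", v)). Here vcenter.example.local is a placeholder.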


To update in vCenter version 6.5 and 6.7:

Managing TLS protocol configuration for vSphere 6.5/6.7 (2147469) (vmware.com)

To update in vCenter version 7.x

Enable or Disable TLS Versions on vCenter Server Systems (vmware.com)


For vCD or VMware Cloud Director

To check :

1. Log in to the vCD (VMware Cloud Director) appliance.
2. Run the commands below:

#cd /opt/vmware/vcloud-director/bin
#./cell-management-tool ssl-protocols -l

The output lists the SSL/TLS protocols currently allowed on the cell.


To update:

Note: This needs downtime and must be done on each cell individually, so shut down the vCD services first. To do that, follow this article: vCD | Upgrade from version 9.5 to 10.1.2 ~ My vCloud Notes (vcnotes.in)

#./cell-management-tool ssl-protocols -d SSLv3,SSLv2Hello

Follow this VMware article to update 


For vRealize Automation 

To check and update, just follow this article

For vRealize Log Insight

The vendor has a good article on this.

For NSX for vSphere (NSX-V)

Please see this documentation.

For ESXi Host

This page is worth checking.

For vROPS

Please click here to check this.

Kubernetes | Command Cheat Sheet (KB#00092)

Overview

Yes, you guessed right: I am learning Kubernetes, so I wanted to share some useful insights, and I will keep sharing more. Below are some commands for daily operations with Kubernetes. I will keep adding to this list.

 

Task | Command
Check the Minikube version | $ minikube version
Start a Minikube cluster | $ minikube start
Check whether kubectl is installed | $ kubectl version
Check cluster info | $ kubectl cluster-info
Check node info | $ kubectl get node

PS | How to Get the Org and OrgvDC Info of HA-Restarted VMs Along with the VM Name

Overview

Many blogs show how to fetch the names of VMs restarted by HA after an ESXi host failure using the Get-VIEvent PowerCLI cmdlet. But the extracted VM name is not in a format you can use as-is. You have to use Excel's Text to Columns and then extract the VM name, and so on. I also run vCD, so after an ESXi host failure and HA event, I need the Org and OrgvDC info as well as the VM names, to share with my customer. That takes longer, and I need it quickly. So this is an extended solution for that kind of scenario. I hope you find it useful.

Let's see how I did it using PowerShell.

Script

#Start here

Write-Host "This script lists the VMs restarted by HA due to ESXi host failures" -ForegroundColor Yellow

Function Get-HAVM {
    $Date = Get-Date
    $HAVMrestartold = 1   # number of days to look back
    # Take the VM name from each HA restart event's Vm property
    Get-VIEvent -MaxSamples 10000000 -Start ($Date).AddDays(-$HAVMrestartold) -Types Warning |
        Where-Object { $_.FullFormattedMessage -match "restarted" } |
        Sort-Object CreatedTime -Descending |
        ForEach-Object { $_.Vm.Name }
}
$allvms = Get-HAVM | Sort-Object -Unique
# Without this check, Get-VM -Name with an empty list would return every VM
if (-not $allvms) { Write-Host "No HA restart events found"; return }
$vms = Get-VM -Name $allvms
$myView = @()
foreach ($vm in $vms) {
    # Assumes vCD's vCenter folder layout: the VM's folder sits under the OrgvDC folder, which sits under the Org folder
    $Report = [PSCustomObject] @{
        VM_Name     = $vm.Name
        Org_Name    = $vm.Folder.Parent.Parent.Name
        OrgvDC_Name = $vm.Folder.Parent.Name
    }
    $myView += $Report
}
$myView | Out-GridView

#End here

Any doubt? Comment box is yours :)
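The PowerShell script takes the VM name from the event's Vm property. If you only have the FullFormattedMessage text, the name can be pulled out with a regular expression instead of Excel's Text to Columns. That is the case, for example, with an exported event list. Below is a minimal Python sketch. It assumes the message reads like "vSphere HA restarted virtual machine <vm> on host <host> in cluster <cluster>". Check this against your own events, since the wording may vary between vSphere versions. The function name is hypothetical.

```python
import re

# Assumed message shape: "vSphere HA restarted virtual machine <vm> on host <host> in cluster <cluster>"
HA_RE = re.compile(
    r"restarted virtual machine (?P<vm>.+?) on host (?P<host>\S+)(?: in cluster (?P<cluster>.+?))?\s*$"
)


def parse_ha_message(message: str):
    """Return (vm, host, cluster) from an HA restart message, or None if it doesn't match."""
    m = HA_RE.search(message)
    if not m:
        return None
    return m.group("vm"), m.group("host"), m.group("cluster")
```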

Let's give it more power

If you have SMTP configured in your environment, you can simply mail the report from the same script with the Send-MailMessage cmdlet, but you will have to tweak the script a little.

The hint: you have to save the final report. Change the last line of the script to

$myView | Export-Csv C:\Temp\vmsrestartedbyHA.csv -NoTypeInformation

then use the command below:

Send-MailMessage -From 'gautam.johar@vcnotes.in' -To 'my.reader@home.com', 'myreader2@home.com' -Subject 'HA Event is triggered and VM list is attached' -Body "Please find the attachment" -Attachments C:\Temp\vmsrestartedbyHA.csv -Priority High -DeliveryNotificationOption OnSuccess, OnFailure -SmtpServer 'smtp.vcnotes.in'

Change the addresses and the SMTP server wherever applicable.
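If you prefer to send the report from Python instead of Send-MailMessage, the standard email and smtplib modules do the same job. Below is a minimal sketch. It assumes an SMTP relay that accepts unauthenticated mail on port 25; add login/STARTTLS if yours needs it. The function names are hypothetical, and the sample addresses are the ones from the command above.

```python
import smtplib
from email.message import EmailMessage


def build_report_mail(sender, recipients, subject, body, data: bytes, filename: str) -> EmailMessage:
    """Build a plain-text mail with the CSV report attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    # The report is CSV, so attach it as text/csv
    msg.add_attachment(data, maintype="text", subtype="csv", filename=filename)
    return msg


def send_report(msg: EmailMessage, smtp_server: str, port: int = 25) -> None:
    """Send through a relay that accepts unauthenticated mail (add starttls()/login() if needed)."""
    with smtplib.SMTP(smtp_server, port) as smtp:
        smtp.send_message(msg)
```

For example, read the file with open(r"C:\Temp\vmsrestartedbyHA.csv", "rb").read(), pass the bytes to build_report_mail, then call send_report(msg, "smtp.vcnotes.in").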

If you are good at PowerShell, you will find many ways to enhance this. For me, this basic script works fine.

Side Note

I wrote this script to run in PowerShell ISE, so please run it there. If you get errors running it in a plain PowerShell console, you might need to fix them.

Good Luck!









vRA | How to manually assign the unassigned shards

 Overview

During one vRA upgrade from 7.4 to 7.6, I faced this issue after the upgrade. Everything went well except for the error below on the VAMI page of both vRA appliances (I had two nodes). If you have more nodes and hit this error, you will see it on all of them.

================

Elasticsearch validation failed:

status: red
number_of_nodes: 2
unassigned_shards: 4
number_of_pending_tasks: 0
number_of_in_flight_fetch: 0
timed_out: False
active_primary_shards: 113
cluster_name: horizon
relocating_shards: 0
active_shards: 226
initializing_shards: 0
number_of_data_nodes: 2
delayed_unassigned_shards: 0

=================

The error shows 4 unassigned shards that were not automatically assigned to any of the available vRA nodes.

Cause

It happens when the DB sync between the primary and replica vRA nodes is broken. For example, the primary node may lack up-to-date data while the replica nodes have some additional data, i.e. Master-Replica DB replication has completely broken down. In my case too, there were many DB issues before the upgrade.

Even after you recover the cluster state, these shards might not be assigned automatically, and the alert above remains. You then have to assign the unassigned shards manually. Let's see the process.

Resolution

1. Check the cluster state from the master node's CLI with the command below:

#curl http://localhost:9200/_cluster/health?pretty=true

The output will show the error:

{
  "cluster_name" : "horizon",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 113,
  "active_shards" : 226,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

2. Check the cluster node information with the command below:

#curl -s -XGET http://localhost:9200/_cat/nodes

You will see output similar to this:

master.mylab.local 172.25.3.199 8   d * Dreadknight
replica.mylab.local 172.25.3.200 8   d m Masque

3. Search for unassigned shards:

#curl -XGET 'http://localhost:9200/_cat/shards' | grep UNAS

You will see output similar to the following:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15870  100 15870    0     0   484k      0 --:--:-- --:--:-- --:--:--  484k
v3_2020-10-02  4 p UNASSIGNED
v3_2020-10-02  4 r UNASSIGNED
v3_2020-10-02  2 p UNASSIGNED
v3_2020-10-02  2 r UNASSIGNED

4. Reassign them with the following commands. Here the index is v3_2020-10-02, the shards to reassign are 2 and 4, and the target is the master node, Dreadknight. Adapt the commands to your environment: the values of index, shard and node will change, and everything else stays the same.


curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index":"v3_2020-10-02","shard":2,"node":"Dreadknight","allow_primary":"true"}}]}'

and

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index":"v3_2020-10-02","shard":4,"node":"Dreadknight","allow_primary":"true"}}]}'

That's it. The shards have now been assigned (allocated) manually.

Log out of the VAMI on all nodes and log back in. You will not see the error anymore.
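With more than a couple of unassigned shards, the reroute bodies from step 4 can be generated from the _cat/shards output of step 3. Below is a minimal Python sketch with hypothetical helper names. It emits one command per unassigned primary, in the same format as step 4. Once a primary is active, Elasticsearch normally allocates its replica on its own, which is why the replicas are skipped.

Two cautions apply. First, the allocate/allow_primary syntax belongs to older Elasticsearch releases; 5.x replaced it with allocate_stale_primary and allocate_empty_primary. Second, forcing a primary with allow_primary can leave you with an empty shard, losing its data, if no node holds a copy. Review each command before running it.

```python
import json


def reroute_commands(cat_shards_text: str, node: str):
    """Build one old-style 'allocate' reroute body per unassigned primary shard."""
    bodies = []
    for line in cat_shards_text.splitlines():
        parts = line.split()
        # _cat/shards rows start with: index shard prirep state ...
        # (curl's progress-meter lines never have UNASSIGNED in the 4th column)
        if len(parts) >= 4 and parts[2] == "p" and parts[3] == "UNASSIGNED":
            bodies.append({"commands": [{"allocate": {
                "index": parts[0], "shard": int(parts[1]),
                "node": node, "allow_primary": "true"}}]})
    return bodies


def as_curl(body) -> str:
    """Render a body as the curl command used in step 4."""
    payload = json.dumps(body, separators=(",", ":"))
    return "curl -XPOST 'localhost:9200/_cluster/reroute' -d '" + payload + "'"


sample = """v3_2020-10-02  4 p UNASSIGNED
v3_2020-10-02  4 r UNASSIGNED
v3_2020-10-02  2 p UNASSIGNED
v3_2020-10-02  2 r UNASSIGNED"""
for body in reroute_commands(sample, "Dreadknight"):
    print(as_curl(body))
```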