Good to Know about VMware Cloud Director (KB#00103)

Hello Folks,

I got feedback from a few readers that I write about incidents, issues, automation, and short, efficient ways to do a task, but that all of this is for people who already work on VCD or another platform, be it NSX, vCenter or PowerShell.

To date, I haven't published anything on the basics of a product because I feel the Internet is already full of such material. Still, since I received this feedback, I thought I would write for those who are new to virtualization, but in my own way ;)

I plan to cover "Good to know" things about every product I work on, be it VCD, NSX, vRLI, vROps, vCenter, Usage Meter, vRNI or PowerShell.

Today, I picked VCD. This is for those who have just started learning VCD or want to start, but I believe a few points will also help those who already work with VCD ;)

In the current circumstances, I think not only I but also all the readers are short of time, so the information shared should be crisp and to the point. I don't know whether the slide I made to describe the basics of VCD is good enough, but you can tell me if you think I should elaborate on any other component. I am leaving it here to see whether it raises any questions and helps anyone out there ;)

You may fill the comment box with your thoughts and questions :)



Move multiple disks from one vm to another vm (KB#00102)

Hello Folks,

Adding another solution for one unique challenge I got from my customer. 

Challenge - The customer regularly moves multiple VMDK files from one VM to another in vCenter Server. The real challenge is the limited time available for this operation: the number of VMDKs to move is always around 15, 18 or 20 per VM, and the activity usually covers multiple VMs. It becomes even more challenging at rollback time, once all disks have been moved from the source to the target VM. Moving that many VMDKs in a short period can lead to confusion and human error.

Solution - I created the PowerCLI script below, which drastically decreases the time required for this activity and makes rollback very easy too. The chance of human error is 99% lower now.

Below is the code -

Function Move-Vmdk {
    Clear-Host
    # Record the timestamp before starting this activity
    $time = Get-Date
    Write-Host "Current time is $time"

    $sourcevm = Read-Host "Enter the source VM Name "
    Write-Host "Total number of disks on source vm is" (Get-HardDisk -VM $sourcevm).Count -ForegroundColor Yellow
    $fd = Get-HardDisk -VM $sourcevm
    $TargetVM = Read-Host "Enter the target VM Name "
    Write-Host "Total number of disks on target vm is" (Get-HardDisk -VM $TargetVM).Count -ForegroundColor Green

    Write-Host "This script works with the file name of each disk to move, so please list the file name of each disk to move in Notepad and save it in C:\Temp as diskloc.txt"
    # Read the file names of all targeted disks from C:\Temp\diskloc.txt (one per line)
    $diskfile = Get-Content -Path C:\Temp\diskloc.txt

    $confirm = Read-Host -Prompt "Are you sure you want to proceed with this disk migration (Y/N) "
    If ($confirm -eq "y") {
        Foreach ($loc in $diskfile) {
            $trgVM = Get-VM -Name $TargetVM
            # Find the disk on the source VM whose file name matches this entry
            $disk = Get-VM -Name $sourcevm | Get-HardDisk | Where-Object { $_.Filename -eq $loc }
            # Detach the disk from the source VM (the VMDK file is kept), then attach it to the target VM
            Remove-HardDisk $disk -Confirm:$false
            New-HardDisk -VM $trgVM -DiskPath $loc
            Write-Host " "
        }
    }

    # Record the timestamp after completion of this activity
    $time = Get-Date
    Write-Host "Operation has been completed successfully"
    Write-Host "Current time is $time"
    Write-Host "Total number of disks on target vm is" (Get-HardDisk -VM $TargetVM).Count -ForegroundColor Green
    Write-Host "Total number of disks on source vm is" (Get-HardDisk -VM $sourcevm).Count -ForegroundColor Yellow
}
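Since the script matches disks purely by file name, a typo in diskloc.txt means no disk is found for that entry. Here is a minimal Python sketch of a pre-check you could run on the two lists before the move. It is not part of my script: the function name and the datastore paths are hypothetical and only illustrate the idea. The comparison ignores case, as PowerShell's -eq does for strings.

```python
def find_missing_disks(requested, source_filenames):
    """Return the requested entries that match no disk file name on the source VM."""
    # Compare case-insensitively, mirroring PowerShell's -eq on strings
    available = {name.strip().lower() for name in source_filenames}
    missing = []
    for entry in requested:
        entry = entry.strip()
        if entry and entry.lower() not in available:  # skip blank lines in the list
            missing.append(entry)
    return missing


# Hypothetical file names, in the "[datastore] folder/file.vmdk" form
source_disks = [
    "[datastore1] srcvm/srcvm.vmdk",
    "[datastore1] srcvm/srcvm_1.vmdk",
    "[datastore1] srcvm/srcvm_2.vmdk",
]
requested_disks = [
    "[datastore1] srcvm/srcvm_1.vmdk",
    "[datastore1] srcvm/srcvm_9.vmdk",  # typo: no such disk on the source VM
    "",
]

print(find_missing_disks(requested_disks, source_disks))
```

If the returned list is not empty, fix diskloc.txt before you answer Y to the confirmation prompt.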

Below is the sample output

Note that it took just 32 seconds to move four VMDK files. Just to add, the size of the VMDK doesn't change the time taken, because the disk file is only detached from one VM and attached to the other, not copied.

Hope you find it useful if you have such a requirement too! If you have any doubts or thoughts, please feel free to comment.

Cheers!

 


Delete an object in VMware Usage Meter (KB#00101)

Hello Guys,

If you have ever been stuck on an issue with VMware Usage Meter, you probably know that there are not enough troubleshooting KBs from VMware or troubleshooting articles about it. I was stuck on one issue too and couldn't find any article that could help me. I raised an SR as well, but it had been pending for weeks without any support from the VMware Support team.

I continued to work on it and finally resolved it on my own, hence I thought I would create a KB here on vcnotes.in.

Issue : A duplicate entry of the same vCenter Server was found through two different vROps instances, so a vROps server needs to be deleted from the product page in Usage Meter (version 4.4). In my case, I didn't want three vROps entries here and wanted to delete the one that was bringing in the duplicate vCenter entry. This KB will help you understand how to delete any product from the product list in Usage Meter.

Roadblock : There is no option to delete the vROps product in the version 4.4 GUI. Please note that this version comes with an HTML5 interface. In the Flex interface you could do it, but in HTML5 there is no such option for vROps. The reason is that Usage Meter picks up vROps from the vCenter MOB extension. My environment was in the state shown below. I had to change the vCenter name in the snippet.


You can see that on the vRealize Operations page a vCenter named vcenter2.vcnotes.in is showing against two different vROps nodes, 172.17.1.238 and 172.17.1.239.

This is a mess and we need to clean it up. I want to delete the entry against 172.17.1.239, but there is no option to delete it.

Solution : The solution is to use the API.

1. Connect to Usage Meter in an API tool as explained below

Query -

Header -

Credentials -


Once you have filled in the fields as above, hit the send button; it will give you output like below

Note that in the above snippet you have the message "202 Accepted", which means you are logged in now. You also have a sessionid in the Body. It will be used as a header for the next steps. Let's check further.

Now, the requirement is to delete the product. Use the DELETE query below -

DELETE https://172.25.2.198/api/v1/product?id=8&productType=VROPS&forget=true

Where 

api call is -  https://172.25.2.198/api/v1/product

id - this is the product ID, clearly visible on the HTML5 web page when you log in to Usage Meter

productType - it states whether the product you are trying to delete is vCenter, VCD, vROps or something else

forget - it deletes the history for a full cleanup. You should always use it.

Headers will be - 

Accept : application/json (used earlier)
session id : extracted after login in above steps

Once you have given all the required parameters, hit the send button and the targeted product will be deleted from the VMware Usage Meter UI page.
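If you prefer a script over an API tool, the same DELETE call can be put together as in the rough Python sketch below. It only builds the request and does not send it. The function name is my own, and the header key "sessionid" is an assumption based on the field name in the login response, so check the exact header name in your API tool before using it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_delete_request(host, product_id, product_type, session_id, forget=True):
    """Build (but do not send) the DELETE request for one Usage Meter product."""
    query = urlencode({
        "id": product_id,
        "productType": product_type,
        "forget": str(forget).lower(),  # "true" also removes the history
    })
    url = f"https://{host}/api/v1/product?{query}"
    headers = {
        "Accept": "application/json",
        "sessionid": session_id,  # assumed header name, verify before use
    }
    return Request(url, headers=headers, method="DELETE")


req = build_delete_request("172.25.2.198", 8, "VROPS", "<session id from login>")
print(req.get_method(), req.full_url)
```

Sending it would be a call to urllib.request.urlopen(req), with certificate handling suited to your appliance.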

Now that you have read the above, you should be able to follow the VMware documentation for more information on API operations with VMware Usage Meter.

Thank you.


Unlock NSX local accounts for API Operations (KB#00100)

Hello Guys,

This is a mysterious issue where an NSX local account gets locked out, not for login to the NSX GUI but for API operations. I am writing this blog because I couldn't find the solution on the web. A VMware article explained the root cause, but the solution was not there either.

Symptoms

1) Any application or product that uses NSX for network services will not be able to perform create, remove, update or delete operations through NSX. However, services that are already running will keep running fine.
2) You will be able to log in to the NSX portal but will not be able to use the APIs, not even through an API tool like Postman. You might get an error like below


3) I use VMware Cloud Director, which uses NSX for networking and security services. In VMware Cloud Director, I could see an error like below when opening any Edge Gateway or making any configuration change. Below are the VCD debug logs -

2021-07-12 16:18:52,955 | DEBUG    | task-service-activity-pool-128 | NetworkSecurityErrorHandler    | Response error: <!doctype html><html lang="en"><head><title>HTTP Status 403 – Forbidden</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 403 – Forbidden</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> This IP address has been blocked temporarily.</p><p><b>Description</b> The server understood the request but refuses to authorize it.</p><hr class="line" /></body></html> | requestId=18cbd3bc-2eaf-42af-b11e-4bce957bff9e,request=POST https://testvcd.com/api/admin/edgeGateway/509ed27c-724d-490b-b0c7-e25d19523017/action/redeploy,requestTime=1626106731682,remoteAddress=172.25.1.21:60592,userAgent=Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (...,accept=application/*+json;version 34.0 vcd=632554b9-0779-4ddc-8b62-be08d1c167f6,task=82c8a4fc-35ba-490a-a235-b4a65a25cace activity=(com.vmware.vcloud.backendbase.management.system.TaskActivity,urn:uuid:82c8a4fc-35ba-490a-a235-b4a65a25cace)

4) I checked the NSX Manager logs and could see multiple events like below in /usr/appmgmt-webserver/logs/localhost_access_log.2021-07-13.txt.

The date in the filename localhost_access_log.2021-07-13.txt changes with the current date.

172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/2.0/services/ipset/ipset-29 HTTP/1.1" "https-jsse-nio-443-exec-2292" 403 649 2234
172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/2.0/services/ipset/ipset-106 HTTP/1.1" "https-jsse-nio-443-exec-2334" 403 649 2241
172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/versions HTTP/1.1" "https-jsse-nio-443-exec-2366" 403 649 2243
172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/versions HTTP/1.1" "https-jsse-nio-443-exec-2325" 403 649 2224

172.25.1.239 (highlighted in yellow in the original screenshot) - This is the IP address that is trying to connect to NSX Manager and failing

403 (highlighted in red) - This is the error code, which means access forbidden

The above logs mean that the IP address 172.25.1.239 is trying to connect to NSX Manager, but NSX Manager is not allowing the connection.
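To see quickly which client is being rejected, the access log can be summarised with a few lines of Python. This is only a sketch: the function name is mine, and the regular expression assumes every line has the same layout as the entries above (client IP first, status code after the two quoted fields).

```python
import re
from collections import Counter

# Layout assumed from the log lines above:
# ip - - [timestamp] "request" "thread" status bytes time
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" "[^"]*" (\d{3}) ')


def count_forbidden(lines):
    """Count HTTP 403 responses per client IP."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == "403":
            hits[match.group(1)] += 1
    return hits


sample = [
    '172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/2.0/services/ipset/ipset-29 HTTP/1.1" "https-jsse-nio-443-exec-2292" 403 649 2234',
    '172.25.1.239 - - [13/Jul/2021:23:21:38 +0200] "GET /api/versions HTTP/1.1" "https-jsse-nio-443-exec-2366" 403 649 2243',
]

for ip, count in count_forbidden(sample).most_common():
    print(ip, count)
```

Run against the full log file, the IP with the highest count is the first place to look for the wrongly configured application.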

Root Cause

When any application (monitoring or non-monitoring) is configured with an incorrect username and password for NSX Manager, NSX Manager blacklists that username as a security feature after a certain number of invalid authentication attempts (10 by default, but this can be modified). A person can trigger this too, because someone might be trying to guess the password! Be aware of that as well ;)

Solution

Before correcting it, you must identify which application or user is doing this, because if invalid attempts keep coming after the correction, the user account will be blocked again. How you identify it is your headache, but you can reach out to me :)

Once identified, follow the steps below

1. Log in to the NSX Manager CLI as admin

2. Enter enable mode with the command

#enable

3. Enter engineering mode with the command

#st en

Press Y when asked and use password :  IAmOnThePhoneWithTechSupport

4. Then edit /home/secureall/secureall/sem/WEB-INF/spring/vsmconfig.properties and look for the value below in this file

#Denotes whether blacklisting is enabled
blacklist.enabed=true

You will find it set to true.

5. To resolve this issue, first change it to false, that is, from

blacklist.enabed=true

to

blacklist.enabed=false

and then save the file

6. Reboot NSX Manager now with the command below

#reboot

7. Once it has rebooted, you will find that the NSX account you used for integration with the other application, which was locked out, is now responding to API requests and gives output as below.

8. In step 5, you changed the blacklisting configuration, which can be a security vulnerability. As soon as the issue is fixed, revert it to true so that future invalid attempts get blocked.

9. After setting it back to true, reboot NSX Manager. Please note that every time you change it, you have to reboot NSX Manager for the change to take effect.

Don't forget to comment if this was useful for you. Cheers!


Kill the ongoing/stuck process in VMware Cloud Director (KB#00099)

Hello guys,

You will find today's post more interesting because the only documented resolution for this issue is a vCloud service restart. Many posts on the web say that in order to clear a stuck job in the VMware Cloud Director GUI, you need to restart the vCD services.

Today, I faced the same issue but decided to find another solution. For a single VM, it is never a good idea to restart the entire vCloud Director services, is it?

Issue : A task has been running for a long time in the Cloud Director GUI and is not timing out even after more than 6 hours

First, let me show you what exactly a long-running or stuck job means. Below is the task detail. In the vCloud Director GUI, you will also see that the particular task has been running for a long time. Hope it is clear


Observation : The above job had been running for the last 6-7 hours. Earlier, I had noticed that such jobs used to time out in around 4 hours, which is the default timeout value for any VCD task, but this one was a stubborn job. Because of it, I couldn't perform any operation on the VM other than power off and power on. There were no such issues from vCenter, so this was clearly a VMware Cloud Director issue.

Solution :

1. Log in to the primary cell with the root account and then log in to the DB (if it is embedded; in case of an external DB, log in to the DB server directly).

2. Run the command below to find the ID of the organization that owns the stuck task

select * from organization where name = 'org_name';

It will give you the org_id. Note it down.

3. Now run the command below using the org_id you got in step 2.

select * from task where org_id = 'org_id';

It will show you the task name and job id (as in the snippet above), which will confirm that this is the right task to cancel. Generally you will see only this stuck task, but if you see many, search with the job id rather than the org_id. If you didn't understand this line, the comment box is yours.

4. Now execute the command below to delete this task from the DB. Take a backup first, please! Note that as written it deletes every task row with that org_id, so if step 3 returned more than one task, filter on the stuck task's own ID instead.

Delete from task where org_id = 'org_id';

The stuck task has now been deleted from the DB. None of the above operations requires any downtime, so go ahead without any fear!
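If the organization has several tasks, it is safer to delete by the single task's ID and to confirm that exactly one row was hit before committing. The Python sketch below shows that pattern only. It uses an in-memory SQLite table as a stand-in for the VCD database, and the table layout, column names, IDs and function name are all hypothetical, not the real VCD schema.

```python
import sqlite3


def delete_single_task(conn, task_id):
    """Delete one task row by ID; roll back unless exactly one row was affected."""
    cur = conn.cursor()
    cur.execute("DELETE FROM task WHERE id = ?", (task_id,))
    if cur.rowcount != 1:
        conn.rollback()  # nothing matched (or too much did): leave the table untouched
        return False
    conn.commit()
    return True


# Stand-in table with hypothetical columns and rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id TEXT PRIMARY KEY, org_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?)",
    [
        ("task-1", "org-a", "finished job"),
        ("task-2", "org-a", "stuck job"),
        ("task-3", "org-a", "another job"),
    ],
)
conn.commit()

print(delete_single_task(conn, "task-2"))
print(conn.execute("SELECT COUNT(*) FROM task").fetchone()[0])
```

The same check can be done by hand in the DB console: run the select with the task's ID first, confirm it returns one row, and only then run the delete with the same filter.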

Cheers!