Accessing QEMU CLI on CentOS 7

By default, the QEMU CLI binary (qemu-kvm) is not in the PATH on CentOS 7.

This is because Red Hat (and by proxy, CentOS) does not want you calling the QEMU CLI directly, in favor of an abstraction layer (like libvirt).

That said, there are situations where you might want to call it directly. If so, it is located in /usr/libexec.

One quick example would be checking all supported processors and CPU flags for your specific version of qemu-kvm.

[jon@centos7 libexec]$ pwd
/usr/libexec
[jon@centos7 libexec]$ ./qemu-kvm -cpu help
x86 qemu64 QEMU Virtual CPU version 1.5.3
x86 phenom AMD Phenom(tm) 9550 Quad-Core Processor
x86 core2duo Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
x86 kvm64 Common KVM processor
x86 qemu32 QEMU Virtual CPU version 1.5.3
x86 kvm32 Common 32-bit KVM processor
x86 coreduo Genuine Intel(R) CPU T2600 @ 2.16GHz
x86 486
x86 pentium
x86 pentium2
x86 pentium3
x86 athlon QEMU Virtual CPU version 1.5.3
x86 n270 Intel(R) Atom(TM) CPU N270 @ 1.60GHz
x86 cpu64-rhel6 QEMU Virtual CPU version (cpu64-rhel6)
x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2)
x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2)
x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7)
x86 Westmere Westmere E56xx/L56xx/X56xx (Nehalem-C)
x86 SandyBridge Intel Xeon E312xx (Sandy Bridge)
x86 Haswell Intel Core Processor (Haswell)
x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron)
x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron)
x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron)
x86 Opteron_G4 AMD Opteron 62xx class CPU
x86 Opteron_G5 AMD Opteron 63xx class CPU
x86 host KVM processor with all supported host features (only available in KVM mode)

Recognized CPUID flags:
pbe ia64 tm ht ss sse2 sse fxsr mmx acpi ds clflush pn pse36 pat cmov mca pge mtrr sep apic cx8 mce pae msr tsc pse de vme fpu
hypervisor rdrand f16c avx osxsave xsave aes tsc-deadline popcnt movbe x2apic sse4.2|sse4_2 sse4.1|sse4_1 dca pcid pdcm xtpr cx16 fma cid ssse3 tm2 est smx vmx ds_cpl monitor dtes64 pclmulqdq|pclmuldq pni|sse3
smap adx rdseed rtm invpcid erms bmi2 smep avx2 hle bmi1 fsgsbase
3dnow 3dnowext lm|i64 rdtscp pdpe1gb fxsr_opt|ffxsr mmxext nx|xd syscall
perfctr_nb perfctr_core topoext tbm nodeid_msr tce fma4 lwp wdt skinit xop ibs osvw 3dnowprefetch misalignsse sse4a abm cr8legacy extapic svm cmp_legacy lahf_lm
pmm-en pmm phe-en phe ace2-en ace2 xcrypt-en xcrypt xstore-en xstore
kvm_pv_unhalt kvm_pv_eoi kvm_steal_time kvm_asyncpf kvmclock kvm_mmu kvm_nopiodelay kvmclock
pfthreshold pause_filter decodeassists flushbyasid vmcb_clean tsc_scale nrip_save svm_lock lbrv npt
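If you want to work with that output programmatically, here is a minimal Python 3 sketch. It assumes the output format shown above (model lines starting with "x86", followed by a "Recognized CPUID flags:" section); the function names find_qemu_kvm and parse_cpu_help are my own and are not part of QEMU.

```python
import shutil
from pathlib import Path

# Location given in this post for CentOS 7.
LIBEXEC_QEMU = Path("/usr/libexec/qemu-kvm")


def find_qemu_kvm():
    """Return the path to qemu-kvm: PATH first, then /usr/libexec, else None."""
    found = shutil.which("qemu-kvm")
    if found:
        return Path(found)
    if LIBEXEC_QEMU.exists():
        return LIBEXEC_QEMU
    return None


def parse_cpu_help(text):
    """Split `qemu-kvm -cpu help` output into (models, flags).

    models maps a CPU model name to its description (may be empty).
    flags is a set of flag names, with "a|b" aliases split apart.
    """
    models = {}
    flags = set()
    in_flags = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Recognized CPUID flags"):
            in_flags = True
        elif in_flags:
            # Aliases are joined with "|", e.g. "sse4.2|sse4_2".
            for token in line.split():
                flags.update(token.split("|"))
        elif line.startswith("x86 "):
            parts = line.split(None, 2)
            models[parts[1]] = parts[2] if len(parts) > 2 else ""
    return models, flags
```

To use it, capture the text with something like subprocess.run([str(find_qemu_kvm()), "-cpu", "help"], capture_output=True, text=True).stdout and pass it to parse_cpu_help. After that, checking for a flag is just a set lookup, for example "vmx" in flags.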

In-Kernel or Not: Hyper-Converged Storage Services Hokey Pokey

To kernel or not to kernel: are we putting services in the kernel, or leaving them out? This has been a hot topic in the hyper-converged storage market.

I’d like to take a few minutes to relay some great thoughts that Nutanix CEO Dheeraj Pandey posted on another blog’s comment section a while back. Reading through it now, I firmly believe it needs its own proper post. This content is from Dheeraj, with small edits for blog readability.

The whole management argument of integration is being broken apart. Had that been true, Oracle apps would have continued to rule, and people would never have given Salesforce, Workday, ServiceNow, and others a chance. And this has been true for decades. Oracle won the DB war against IBM, even though IBM was a tightly integrated stack, top-to-bottom. After a certain point, even consumers started telling Facebook that their kitchen-sink app is not working, which is why FB started breaking apart that experience into something cleaner, usable, and user-experience-driven.

These are the biggest advantages of running above the kernel:
Fault isolation: If storage has a bug, it won’t take compute down with it. If you want to quickly upgrade storage, you don’t have to move VMs around. Converging compute and storage should not create a toxic blob of infrastructure; isolation is critical, even when sharing hardware. That is what made virtualization and ESX such a beautiful paradigm.

Pace of Innovation: User-level code for storage has ruled for the last 2 decades for exactly this reason. It’s more maintainable, it’s more debuggable, and it’s faster-paced. Bugs don’t bring entire machines down. Exact reason why GFS, HDFS, OneFS, Oracle RDBMS, MySQL, and so on are built in user space. Moore’s Law has made user-kernel transitions cheap. Zero-copy buffers, epoll, O_DIRECT IO, etc. make user-kernel transitions seamless. Similarly, virtual switching and VT-x technologies in hypervisors make hypervisor-VM transitions seamless.

Extensibility and Ecosystem Integration: User-space code makes it more extensible and lends itself to a pluggable architecture. Imagine connecting to AWS S3, Azure, compression library, security key management code, etc. from the kernel. The ecosystem in user-space thrives, and storage should not lag behind.

Rolling Upgrades: Compute doesn’t blink when storage is undergoing a planned downtime.

Migration complexity (backward compatibility): It is extremely difficult to build next-generation distributed systems without using protobufs and HTTP for self-describing data format and RPC services. Imagine migrating 1PB of data if your extents are not self-describing. Imagine upgrading a 64-node cluster if your RPC services are not self-describing. Porting protobufs and HTTP in kernel is a nightmare, given the glibc and other user library dependencies.

Performance Isolation: Converging compute and storage doesn’t mean storage should run amok with resources. Administrators must be able to bound the CPU, memory, and network resources given to storage. Without a sandbox abstraction, in-kernel code is a toxic blob. Users should be able to grow and shrink storage resources, keeping the rest of application and datacenter needs in mind. Performance profiles of storage could be very different even in a hyperconverged architecture because of application nuances, flash-heavy nodes, storage-heavy nodes, GPU-heavy, and so on.

Security Isolation: The trusted computing base of the hypervisor must be kept lean and mean. Heartbleed and ShellShock are the veritable tips of the iceberg. Kernels have to be trusted, not bloated. See T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, “Terra: A virtual machine-based platform for trusted computing,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles, pp. 193–206, 2003. Also see P. England, B. Lampson, J. Manferdelli, M. Peinado, B. Willman, “A Trusted Open Platform,” IEEE Computer, pp. 55–62, July 2003.

Storage is just a freakin’ app on the server. If we can run databases and ERP systems in a VM, there’s no reason why storage shouldn’t. And if we’re arguing for running storage inside the kernel, let’s port Oracle and SAP to run inside the hypervisor!

In the end, we’ve to make storage an intelligent service in the datacenter. For too long, it has been a byte-shuttler between the network and the disk. If it needs to be an active system, it needs {fault|performance|security} isolation, speed of innovation, and ecosystem integration.

One more thing: If it can run in a Linux VSA, it will run as a container in Docker as well. It’s future-proof.

Automatically Email Nutanix NCC Results

NCC stands for Nutanix Cluster Check, and it is the core framework behind the Health construct in Nutanix OS.

This framework is separate from NOS, in that it can be updated async from NOS. Nutanix Engineering releases NCC updates approximately every 4-6 weeks, but can be quicker depending on the criticality of new health checks as they are developed.

There are plenty of blogs out there talking about NCC’s core functionality, and how to most effectively utilize its power, but I wanted to take some time and highlight one new feature, Automatic Email Digests for NCC Results.

Once enabled, NCC will run a full health check of the system at the interval you set, which can be anywhere between 4 and 24 hours. This enables even more proactive and aggressive health reporting, on top of all of the standard enterprise-grade alerting we have today (Pulse Auto Support Call Home, SNMPv3, Syslog, NOS Alerts, etc.).

NCC Email Alerts are configured once with the following command:

nutanix@cvm$ ncc --set_email_frequency=num_hrs
nutanix@cvm$ ncc --show_email_config
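If you script this across several clusters, it can be handy to validate the interval before handing it to ncc. The sketch below only builds the argument list for the first command shown above. The 4 to 24 hour range comes from this post, and the assumption that the flag takes a whole number of hours is mine, so check it against your NCC version. The function name is hypothetical.

```python
MIN_HOURS = 4   # lower bound mentioned in this post
MAX_HOURS = 24  # upper bound mentioned in this post


def ncc_email_frequency_args(num_hrs):
    """Build the argv for `ncc --set_email_frequency=<num_hrs>`.

    Assumes the flag takes a whole number of hours (an assumption,
    not something confirmed by the NCC documentation here).
    """
    if isinstance(num_hrs, bool) or not isinstance(num_hrs, int):
        raise TypeError("num_hrs must be a whole number of hours")
    if not MIN_HOURS <= num_hrs <= MAX_HOURS:
        raise ValueError(
            "num_hrs must be between {} and {}".format(MIN_HOURS, MAX_HOURS)
        )
    return ["ncc", "--set_email_frequency={}".format(num_hrs)]
```

The returned list can be passed to subprocess.run on a CVM, or joined with spaces and pasted into an SSH session.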

Once configured successfully, the email will contain a summary of the NCC output and include details for any checks that return a FAIL status. To see details for all checks, examine the NCC log files located in /home/nutanix/data/logs.

These alerts are emailed to the users configured in “Alert Email Configuration”, so make sure your sysadmin DL (or users) are configured (good idea either way).

NOTE: NCC 1.3.1 results emailed to Nutanix support do not automatically create support cases. That is the job of traditional break/fix Alerts and Pulse.

Investigating Nutanix Pulse ASUPs Caught in Email Filter

Nutanix clusters have an AutoSupport mechanism, called Pulse, that can be configured to send back configuration/health/coredump information on a daily basis. This is used by Nutanix Support SREs to do proactive support, analyze common configurations, and so on.

This data can either be sent directly to Nutanix over the web, or be relayed through a customer-controlled SMTP server. Many customers have message/attachment rules/restrictions, and sometimes Pulse ASUPs can get snagged, especially with large clusters, and especially when the “ALL” setting on Pulse is configured, which grabs a very large amount of coredump and diagnostic data.

This data is zipped into a .gz and attached to an email, so even if it is chock-full of data, it is usually pretty small, maybe 6-10 MB daily. I recently ran into an issue where the customer’s email system was configured to drop messages with attachments greater than 100 MB, and also attachments that, when uncompressed, were greater than 100 MB.

This filter rule, combined with the ALL level, was causing ASUPs to not be delivered.

To diagnose this, you can see the email data that the cluster is generating by checking out the /home/nutanix/data/email/ folder.

Since the process that generates the emails is a shared service, it is possible for more than one CVM to generate emails over the life of a cluster.

To track down the most recent data, run allssh 'ls -lah /home/nutanix/data/email'. This will give you output like this example from my lab.

Below, you can see that 10.1.222.62 is doing the most recent work, and there are “.sent” files, which are JSON-formatted files that contain all of the email info except for the attachments.

In the attachments directory, you will find the .gz files that are being generated and, in this case, flagged.

This discovery method can be used to pull down files that were never delivered, and validate their contents to see why an email filter might be tripping them up.

nutanix@NTNX-14SM15040014-A-CVM:10.1.222.60:~$ allssh 'ls -lah /home/nutanix/data/email'
Executing ls -lah /home/nutanix/data/email on the cluster
================== 10.1.222.60 =================
total 48K
drwx------.  3 nutanix nutanix  20K Jan  4 01:15 .
drwxr-xr-x. 21 nutanix nutanix 4.0K Oct 29 22:54 ..
drwx------.  2 nutanix nutanix  20K Jan  4 01:15 attachments
================== 10.1.222.61 =================
total 32K
drwx------.  3 nutanix nutanix  12K Jan  3 23:15 .
drwxr-xr-x. 21 nutanix nutanix 4.0K Oct 29 22:50 ..
drwx------.  2 nutanix nutanix  12K Jan  3 23:15 attachments
-rwx------.  1 nutanix nutanix  125 Nov 24 02:02 autosupport.1416816003.174225
================== 10.1.222.62 =================
total 36K
drwx------.  3 nutanix nutanix  12K Jan  5 00:15 .
drwxr-xr-x. 21 nutanix nutanix 4.0K Oct 29 22:56 ..
-rwx------.  1 nutanix nutanix  376 Jan  4 00:03 1420351431.354287.sent
-rwx------.  1 nutanix nutanix  376 Jan  5 00:03 1420437834.801072.sent
drwx------.  2 nutanix nutanix  12K Jan  5 00:15 attachments
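Once you have copied a suspect .gz out of the attachments directory, a quick way to see whether it would trip a size rule like the one above is to measure both its compressed and uncompressed size. Here is a minimal Python sketch. The 100 MB limit mirrors the customer filter from this post, the function names are my own, and reading the leading number of a ".sent" filename as a Unix timestamp is an assumption on my part (it does line up with the dates in the listing above).

```python
import gzip
import os
from datetime import datetime, timezone

# 100 MB, matching the filter rule described in this post.
LIMIT_BYTES = 100 * 1024 * 1024


def gzip_sizes(path, chunk_size=1024 * 1024):
    """Return (compressed_bytes, uncompressed_bytes) for a .gz file.

    The file is decompressed in chunks so large bundles are not
    loaded into memory all at once.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            uncompressed += len(chunk)
    return compressed, uncompressed


def trips_filter(path, limit=LIMIT_BYTES):
    """True if either the compressed or uncompressed size exceeds limit."""
    compressed, uncompressed = gzip_sizes(path)
    return compressed > limit or uncompressed > limit


def sent_file_time(filename):
    """Interpret the leading number of a .sent filename as a Unix timestamp.

    Assumption: names like 1420351431.354287.sent start with epoch seconds.
    """
    seconds = int(filename.split(".", 1)[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Running trips_filter over every file in a local copy of the attachments directory gives you a short list of bundles to compare against the mail admin's filter logs.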

Nutanix Metro Availability configuration and failover

Nutanix Foundation 2.0 Released

Nutanix Foundation, for those who don’t know, is the tool used to perform Multi-Hypervisor imaging in the field. All Nutanix nodes/blocks are shipped from the factory with KVM pre-loaded, and Foundation is used to optionally change over to VMware ESXi or Hyper-V (or perform a re-image on KVM).

There are some good articles about previous versions of Foundation that cover the basics well.

http://myvirtualcloud.net/?p=6389

http://vmwaremine.com/2014/08/04/nutanix-basics-nutanix-foundation-part-1

http://vmwaremine.com/2014/08/13/nutanix-basics-deploy-block-using-nutanix-foundation

Anyhow, Foundation 2.0 is new and improved, and includes the following enhancements:
  • Redesigned user interface with multiple screens for easier configuration.
  • Support for models supplied by Dell.
  • Support for creating multiple clusters after imaging the nodes.
  • Support for running performance diagnostics and NCC health tests after creating a cluster.
  • Support for reusing an existing Phoenix ISO when re-imaging with the same ISO image.
  • Numerous bug fixes to make the process more robust and refined.
  • Enhanced Hyper-V support:
    • Image up to 20 nodes simultaneously (now on par with ESXi and KVM).
    • Imaging time reduced by five minutes per node.
    • Support for multiple SKU types (free, standard, and datacenter).
    • Note: Foundation of NX-7110 with Hyper-V is not supported in this release (this is right around the corner).

Upgrading Prism Central from 4.0.1 to Newer Code

This article details the steps to upgrade a Nutanix Prism Central 4.0.1 instance to 4.0.1.1 (or future releases). There are a couple of different methods that can be used, and both are covered below.

For those not familiar with Prism Central, it is the Nutanix centralized management platform, and is targeted at environments with 2 or more Nutanix 4.0+ clusters. Prism Central is literally the same Prism HTML5 GUI our customers have come to love, but with the ability to manage multiple clusters at once, in a scale-out fashion.

Prism Central gives administrators and operators the ability to see aggregated stats/alerts/info for all connected clusters, and it gives the ability to “hop into” any cluster, using seamless pass-through authentication. Prism Central was introduced in 4.0.1, and can manage any cluster running NOS 4.0+. Note that Pro or Ultimate Edition licensing is required on a cluster level, but there is no license file that gets applied to Prism Central itself.

NOTE: This article assumes Prism Central 4.0.1 is already deployed, and does not cover the initial install of the OVA. Installation is covered in detail by Baz Raayman’s Blog, as well as in the portal.nutanix.com documentation library.

NOTE: This blog is for informational and educational purposes only. While I have validated this upgrade process several times in my team’s lab, this may not cover your particular implementation situation. For production deployments, it is always a good idea to consult Nutanix Support. Nutanix SREs should be the sole source of truth for target code guidance and related upgrade steps. As with literally every piece of enterprise software, DO NOT randomly apply new code to your production environment without first consulting the proper resources.

There are two primary ways that a system administrator can apply an update to Prism Central.

  1. Upgrade via Auto Update in the GUI
  2. Upgrade via CLI

Method 1: Upgrade via Auto Update (GUI)

In 4.0+, both Prism and Prism Central give administrators the ability to apply upgrades via the GUI. If your CVMs have Internet access, they can pull the upgrade binaries directly from the Nutanix Releases API, which means no manual download / transfer process is required. Optionally, you can have the clusters periodically poll the releases API and automatically download new bundles as they are published.

NOTE: Configuring Auto Download does not automatically apply updates. Applying updates always requires “1 click” by a cluster admin.

To access the Auto Update feature, click the “Actions” gear in the top right hand corner, then select “Upgrade Prism Central”. This will bring up the “Upgrade Software” context window.

CLICK ZE BUTTON!

Click “Upgrade Prism Central”

As mentioned previously, if you have Internet access on the CVM, the CVM will poll the releases API and check for a compatible upgrade. If there is an upgrade available, the option(s) will be presented to you, with a big “Download” button to the right of the proposed software release. Optionally, you have the ability to “Enable Automatic Download” via the check box in the lower left hand corner.

GET TO ZE DOWNLOADS, NOWWW

Upgrade Software Window

If your CVMs don’t have Internet access, you will have to download the target code from the Nutanix Support Portal.

This is where you get the BITZ, yo!

Nutanix Releases Download

NOTE: If you have to manually download a NOS release, you will need both the Binary File and the Metadata File. The Binary file, delivered as a tarball, contains all of the actual NOS code. The Metadata file, delivered as JSON, is the descriptor for the associated Binary file. It contains, among other things, the expected file size and MD5 hash. The two files are used together to ensure valid and accurate Binary file delivery.

Making sure your bitz ain't junk, yo!

Contents of Metadata JSON File
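If you want to sanity check a manual download yourself before uploading it, the same idea (compare file size and MD5 against the metadata) can be sketched in a few lines of Python. This is only an illustration of the check, not how Prism implements it. I am not reproducing the real key names from the metadata JSON here, so size_key and md5_key are parameters you fill in after looking at your own metadata file, and the function names are hypothetical.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(binary_path, expected_size, expected_md5):
    """Return a list of mismatch messages; an empty list means it checks out."""
    problems = []
    actual_size = os.path.getsize(binary_path)
    if actual_size != int(expected_size):
        problems.append(
            "size mismatch: expected {}, got {}".format(expected_size, actual_size)
        )
    actual_md5 = file_md5(binary_path)
    if actual_md5.lower() != str(expected_md5).strip().lower():
        problems.append(
            "md5 mismatch: expected {}, got {}".format(expected_md5, actual_md5)
        )
    return problems


def verify_against_metadata(binary_path, metadata_path, size_key, md5_key):
    """Check a binary against a metadata JSON file.

    size_key and md5_key are supplied by the caller; they are NOT known
    Nutanix field names, so read them off your own metadata file.
    """
    with open(metadata_path) as handle:
        metadata = json.load(handle)
    return verify_binary(binary_path, metadata[size_key], metadata[md5_key])
```

An empty list back from verify_against_metadata means the tarball on disk matches what the metadata file describes. Anything else is worth a fresh download before you go any further.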

To use the Binary and Metadata files, choose the “upload a NOS binary” option in the “Upgrade Software” window, select the appropriate files, and then click “Upload Now”. Just like the “Automatic Download” function, this will not apply the upgrade until a cluster admin selects “Install”, after the upload (or download) is complete.

all ur bitz are belong to cluster

Manual Binary/Metadata Upload

Once the files are uploaded (or downloaded), an “Install” button will appear (not pictured), which will kick off the rolling upgrade. In the case of Prism Central 4.0.1, this will apply the code, and reboot the VM upon success.

After clicking “Install”, you will see a blue circle up in the top left hand corner, which will indicate the progress of the upgrade.

Prism Central: Pre-Upgrade Progress

Prism Central: Upgrade In Progress

After the upgrade is complete, and the VM has been rebooted, you will be kicked back to the Prism Central login screen. After re-authenticating, you can check the “About Nutanix” screen, located under the “User” menu in the top right hand corner.

About Screen: Upgrade Success!

Method 2: Upgrade via CLI

If you are a glutton for hard work, or just enjoy the peace and quiet of a black and white command prompt, you can also upgrade Prism Central via CLI, utilizing both SSH and SCP.

First, upload the Binary and Metadata files (described in the previous method) to /home/nutanix using your favorite SCP client (I like WinSCP).

Second, log into the Prism Central VM via SSH, and execute the following commands:

tar -zxvf nutanix_installer_filename_here
/home/nutanix/install/bin/cluster -i /home/nutanix/install upgrade
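If you drive this from a script rather than typing it by hand, the two commands above can be assembled like this. It is a minimal sketch that only builds the command lines (it does not run anything), it assumes you execute them from the directory you uploaded the tarball to, and the function names are my own.

```python
import shlex
from pathlib import PurePosixPath

# Paths taken from the commands shown in this post.
INSTALL_DIR = PurePosixPath("/home/nutanix/install")


def upgrade_commands(installer_name):
    """Return the two upgrade commands as argv lists, in run order.

    Run them from the upload directory (/home/nutanix in this post),
    since tar extracts into the current working directory.
    """
    return [
        ["tar", "-zxvf", installer_name],
        [str(INSTALL_DIR / "bin" / "cluster"), "-i", str(INSTALL_DIR), "upgrade"],
    ]


def as_shell(commands):
    """Render argv lists as shell-safe command strings."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]
```

Printing as_shell(upgrade_commands("your_installer_file")) gives you the exact lines to review before pasting them into the SSH session.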

This will kick off the upgrade process, just like in the GUI. You should immediately see the effect in the GUI, which is the blue progress circle. In the CLI, you will see console output of what’s actually going on.

It's like the matrix, wowweeee

CLI Upgrade Console Output

After that console output ceases, the upgrade will continue to process in the background. To check status, use the “upgrade_status” command.

upgrade_status command output

As with the GUI method, this will take a few minutes, then reboot the VM upon success.

If you want to check the version/upgrade history, run “cat /home/nutanix/config/upgrade.history”.

Tee Hee, you get to use cat.

Upgrade History, note date and version!