Created attachment 1801252 [details]
engine.log, vdsm.log from a host, output from getStats commands

Description of problem:
The IP address and FQDN are not getting to the ovirt-engine. The interface shows an "!" and says that the latest guest agent needs to be installed and running on the guest OS, even though it is installed. The vagrant plugin we use queries the API to work out which IP it needs to communicate with VMs that are being brought up by our development team, and it fails because it can't determine what IP the VM has. Some VMs show the correct information while others based on the same template do not. Migrating a VM from one host to another sometimes caused the IP to get into the system.

Version-Release number of selected component (if applicable):

How reproducible:
In my case it is very easy to reproduce, but the issue does not always happen. If the server has been rebooted or vdsm has been restarted recently, things seem to work OK for a while.

Steps to Reproduce:
Bring up a VM from a previously known good template.

Actual results:
IP intermittently does not appear in ovirt-engine.

Expected results:
IP and FQDN should be properly sent to ovirt-engine, as qemu-guest-agent is installed.

Additional info:
This affects VMs based on templates running Ubuntu 18.04/20.04 and Red Hat 7/8.
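For context on what the plugin is waiting for: the guest-agent-reported addresses are exposed by the engine's REST API under the VM's reported devices, and the lookup is conceptually something like the sketch below (the engine FQDN, credentials, CA path, and VM name are placeholders; I'm assuming the plugin does something equivalent):

```
# Find the VM by name (placeholders: engine FQDN, password, VM name).
curl -s --cacert /etc/pki/ovirt-engine/ca.pem \
     -u 'admin@internal:PASSWORD' -H 'Accept: application/json' \
     'https://engine.example.com/ovirt-engine/api/vms?search=name%3Dmyvm'

# Then read the guest-agent-reported NICs/IPs for that VM id:
curl -s --cacert /etc/pki/ovirt-engine/ca.pem \
     -u 'admin@internal:PASSWORD' -H 'Accept: application/json' \
     'https://engine.example.com/ovirt-engine/api/vms/VM_ID/reporteddevices'

# When the bug hits, the reported devices come back without any IP entries,
# so anything polling them (like the vagrant plugin) waits until it times out.
```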
There is no obvious issue on the VDSM side. The VMs are queried regularly but there is no guest info in the stats. Once you notice this behavior again on the VM, could you please share the output of the following commands:

virsh guestinfo noel-ovirt-biaa-11
virsh domifaddr noel-ovirt-biaa-11 --source agent
virsh qemu-agent-command noel-ovirt-biaa-11 '{"execute":"guest-network-get-interfaces"}'

Hopefully this will give us some clues.
The VM (noel-ovirt-biaa-11) that was not displaying an IP is displaying one now. I ran the requested commands on several VMs that were not displaying an IP in the web interface. The last VM was one I spun up with vagrant; it is currently hanging and will eventually time out. Vagrant run stage:

```
==> i8manage_nopuppet_test_ov: Waiting for VM to get an IP address...
```

Hope this is not too much, but I ran this on multiple nodes with VMs running different OSes in case there is a clue in there somewhere.

Thanks,
Gervais

```
[root ~]# virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # guestinfo noel-ovirt-dhs-22
user.count           : 0
os.id                : ubuntu
os.name              : Ubuntu
os.pretty-name       : Ubuntu 20.04.2 LTS
os.version           : 20.04.2 LTS (Focal Fossa)
os.version-id        : 20.04
os.machine           : x86_64
os.kernel-release    : 5.4.0-66-generic
os.kernel-version    : #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021
timezone.name        : UTC
timezone.offset      : 0
hostname             : noel-ovirt-dhs-22
fs.count             : 8
fs.0.name            : sda2
fs.0.mountpoint      : /
fs.0.fstype          : ext4
fs.0.total-bytes     : 500507766784
fs.0.used-bytes      : 28232536064
fs.0.disk.count      : 1
fs.0.disk.0.alias    : sda
fs.0.disk.0.serial   : 0QEMU_QEMU_HARDDISK_8a9e76e7-6827-4248-9dd4-29c3d1c9771c
fs.0.disk.0.device   : /dev/sda2
fs.1.name            : sda2
fs.1.mountpoint      : /sftp/sftp_user/ingest_data
fs.1.fstype          : ext4
fs.1.total-bytes     : 500507766784
fs.1.used-bytes      : 28232536064
fs.1.disk.count      : 1
fs.1.disk.0.alias    : sda
fs.1.disk.0.serial   : 0QEMU_QEMU_HARDDISK_8a9e76e7-6827-4248-9dd4-29c3d1c9771c
fs.1.disk.0.device   : /dev/sda2
fs.2.name            : loop0
fs.2.mountpoint      : /snap/core18/2074
fs.2.fstype          : squashfs
fs.2.total-bytes     : 58195968
fs.2.used-bytes      : 58195968
fs.2.disk.count      : 0
fs.3.name            : loop1
fs.3.mountpoint      : /snap/core18/1880
fs.3.fstype          : squashfs
fs.3.total-bytes     : 57671680
fs.3.used-bytes      : 57671680
fs.3.disk.count      : 0
fs.4.name            : loop2
fs.4.mountpoint      : /snap/lxd/16099
fs.4.fstype          : squashfs
fs.4.total-bytes     : 74842112
fs.4.used-bytes      : 74842112
fs.4.disk.count      : 0
fs.5.name            : loop3
fs.5.mountpoint      : /snap/lxd/20326
fs.5.fstype          : squashfs
fs.5.total-bytes     : 70909952
fs.5.used-bytes      : 70909952
fs.5.disk.count      : 0
fs.6.name            : loop5
fs.6.mountpoint      : /snap/snapd/12398
fs.6.fstype          : squashfs
fs.6.total-bytes     : 33947648
fs.6.used-bytes      : 33947648
fs.6.disk.count      : 0
fs.7.name            : loop4
fs.7.mountpoint      : /snap/snapd/8542
fs.7.fstype          : squashfs
fs.7.total-bytes     : 31457280
fs.7.used-bytes      : 31457280
fs.7.disk.count      : 0

virsh # domifaddr noel-ovirt-dhs-22 --source agent
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 lo         00:00:00:00:00:00    ipv4         127.0.0.1/8
 -          -                    ipv6         ::1/128
 enp1s0     56:6f:65:8b:00:09    ipv4         192.168.173.25/16
 -          -                    ipv6         fe80::546f:65ff:fe8b:9/64

virsh # qemu-agent-command noel-ovirt-dhs-22 '{"execute":"guest-network-get-interfaces"}'
{"return":[{"name":"lo","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"127.0.0.1","prefix":8},{"ip-address-type":"ipv6","ip-address":"::1","prefix":128}],"statistics":{"tx-packets":853428,"tx-errs":0,"rx-bytes":427632008,"rx-dropped":0,"rx-packets":853428,"rx-errs":0,"tx-bytes":427632008,"tx-dropped":0},"hardware-address":"00:00:00:00:00:00"},{"name":"enp1s0","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"192.168.173.25","prefix":16},{"ip-address-type":"ipv6","ip-address":"fe80::546f:65ff:fe8b:9","prefix":64}],"statistics":{"tx-packets":62180,"tx-errs":0,"rx-bytes":128789706,"rx-dropped":171968,"rx-packets":1166301,"rx-errs":0,"tx-bytes":20379261,"tx-dropped":0},"hardware-address":"56:6f:65:8b:00:09"}]}

virsh # ^C
[root ~]# exit
Connection to ovirt1 closed.

ssh ovirt2
Last login: Thu Jul 15 19:49:35 2021 from 192.168.168.1

  node status: OK
  See `nodectl check` for more information

  Admin Console: https://10.10.1.2:9090/ or https://192.168.168.11:9090/ or https://10.10.2.2:9090/

Welcome to ovirt2.dgi

[root ~]# virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # guestinfo alexander-2004
user.count           : 0
os.id                : ubuntu
os.name              : Ubuntu
os.pretty-name       : Ubuntu 20.04.2 LTS
os.version           : 20.04.2 LTS (Focal Fossa)
os.version-id        : 20.04
os.machine           : x86_64
os.kernel-release    : 5.4.0-66-generic
os.kernel-version    : #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021
timezone.name        : UTC
timezone.offset      : 0
hostname             : vagrantbase-ubuntu2004
fs.count             : 7
fs.0.name            : sda2
fs.0.mountpoint      : /
fs.0.fstype          : ext4
fs.0.total-bytes     : 500507766784
fs.0.used-bytes      : 15814242304
fs.0.disk.count      : 1
fs.0.disk.0.alias    : sda
fs.0.disk.0.serial   : 0QEMU_QEMU_HARDDISK_c1bcd399-bbcb-4758-a8a1-4632d79e5d1e
fs.0.disk.0.device   : /dev/sda2
fs.1.name            : loop1
fs.1.mountpoint      : /snap/lxd/16099
fs.1.fstype          : squashfs
fs.1.total-bytes     : 74842112
fs.1.used-bytes      : 74842112
fs.1.disk.count      : 0
fs.2.name            : loop0
fs.2.mountpoint      : /snap/core18/1880
fs.2.fstype          : squashfs
fs.2.total-bytes     : 57671680
fs.2.used-bytes      : 57671680
fs.2.disk.count      : 0
fs.3.name            : loop2
fs.3.mountpoint      : /snap/snapd/8542
fs.3.fstype          : squashfs
fs.3.total-bytes     : 31457280
fs.3.used-bytes      : 31457280
fs.3.disk.count      : 0
fs.4.name            : loop3
fs.4.mountpoint      : /snap/snapd/12398
fs.4.fstype          : squashfs
fs.4.total-bytes     : 33947648
fs.4.used-bytes      : 33947648
fs.4.disk.count      : 0
fs.5.name            : loop4
fs.5.mountpoint      : /snap/core18/2074
fs.5.fstype          : squashfs
fs.5.total-bytes     : 58195968
fs.5.used-bytes      : 58195968
fs.5.disk.count      : 0
fs.6.name            : loop5
fs.6.mountpoint      : /snap/lxd/20326
fs.6.fstype          : squashfs
fs.6.total-bytes     : 70909952
fs.6.used-bytes      : 70909952
fs.6.disk.count      : 0

virsh # domifaddr alexander-2004 --source agent
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 lo         00:00:00:00:00:00    ipv4         127.0.0.1/8
 -          -                    ipv6         ::1/128
 enp1s0     56:6f:65:8b:00:0c    ipv4         192.168.173.28/16
 -          -                    ipv6         fe80::546f:65ff:fe8b:c/64

virsh # qemu-agent-command alexander-2004 '{"execute":"guest-network-get-interfaces"}'
{"return":[{"name":"lo","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"127.0.0.1","prefix":8},{"ip-address-type":"ipv6","ip-address":"::1","prefix":128}],"statistics":{"tx-packets":637,"tx-errs":0,"rx-bytes":60896,"rx-dropped":0,"rx-packets":637,"rx-errs":0,"tx-bytes":60896,"tx-dropped":0},"hardware-address":"00:00:00:00:00:00"},{"name":"enp1s0","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"192.168.173.28","prefix":16},{"ip-address-type":"ipv6","ip-address":"fe80::546f:65ff:fe8b:c","prefix":64}],"statistics":{"tx-packets":52758,"tx-errs":0,"rx-bytes":433033164,"rx-dropped":447839,"rx-packets":2944392,"rx-errs":0,"tx-bytes":6164774,"tx-dropped":0},"hardware-address":"56:6f:65:8b:00:0c"}]}

virsh # exit

[root ~]# virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # guestinfo alexandria2
user.count           : 0
os.id                : centos
os.name              : CentOS Stream
os.pretty-name       : CentOS Stream 8
os.version           : 8
os.version-id        : 8
os.machine           : x86_64
os.kernel-release    : 4.18.0-315.el8.x86_64
os.kernel-version    : #1 SMP Mon Jun 28 19:09:44 UTC 2021
timezone.name        : ADT
timezone.offset      : -10800
hostname             : alexandria.dgi
fs.count             : 2
fs.0.name            : dm-0
fs.0.mountpoint      : /
fs.0.fstype          : xfs
fs.0.total-bytes     : 29827903488
fs.0.used-bytes      : 3050864640
fs.0.disk.count      : 1
fs.0.disk.0.alias    : vda
fs.0.disk.0.serial   : d4b74174-3778-4fee-9
fs.0.disk.0.device   : /dev/vda2
fs.1.name            : vda1
fs.1.mountpoint      : /boot
fs.1.fstype          : ext4
fs.1.total-bytes     : 952840192
fs.1.used-bytes      : 299028480
fs.1.disk.count      : 1
fs.1.disk.0.alias    : vda
fs.1.disk.0.serial   : d4b74174-3778-4fee-9
fs.1.disk.0.device   : /dev/vda1
disk.count           : 6
disk.0.name          : /dev/vda1
disk.0.partition     : yes
disk.0.dependency.count : 1
disk.0.dependency.0.name : /dev/vda
disk.1.name          : /dev/vda2
disk.1.partition     : yes
disk.1.dependency.count : 1
disk.1.dependency.0.name : /dev/vda
disk.2.name          : /dev/vda
disk.2.partition     : no
disk.2.serial        : d4b74174-3778-4fee-9
disk.2.alias         : vda
disk.3.name          : /dev/dm-0
disk.3.partition     : no
disk.3.dependency.count : 1
disk.3.dependency.0.name : /dev/vda2
disk.3.guest_alias   : cl-root
disk.4.name          : /dev/sr0
disk.4.partition     : no
disk.4.serial        : QEMU_DVD-ROM_QM00005
disk.4.alias         : sdc
disk.5.name          : /dev/dm-1
disk.5.partition     : no
disk.5.dependency.count : 1
disk.5.dependency.0.name : /dev/vda2
disk.5.guest_alias   : cl-swap

virsh # domifaddr alexandria2 --source agent
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 lo         00:00:00:00:00:00    ipv4         127.0.0.1/8
 -          -                    ipv6         ::1/128
 enp1s0     00:50:56:bf:47:53    ipv4         192.168.180.135/16
 -          -                    ipv6         fe80::9b49:b62d:1800:49d3/64

virsh # qemu-agent-command alexandria2 '{"execute":"guest-network-get-interfaces"}'
{"return":[{"name":"lo","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"127.0.0.1","prefix":8},{"ip-address-type":"ipv6","ip-address":"::1","prefix":128}],"statistics":{"tx-packets":652,"tx-errs":0,"rx-bytes":56859,"rx-dropped":0,"rx-packets":652,"rx-errs":0,"tx-bytes":56859,"tx-dropped":0},"hardware-address":"00:00:00:00:00:00"},{"name":"enp1s0","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"192.168.180.135","prefix":16},{"ip-address-type":"ipv6","ip-address":"fe80::9b49:b62d:1800:49d3","prefix":64}],"statistics":{"tx-packets":755882,"tx-errs":0,"rx-bytes":2107436985,"rx-dropped":95063,"rx-packets":2266605,"rx-errs":0,"tx-bytes":97774500,"tx-dropped":0},"hardware-address":"00:50:56:bf:47:53"}]}

ssh ovirt2
Last login: Thu Jul 15 20:19:19 2021 from 192.168.168.1

  node status: OK
  See `nodectl check` for more information

  Admin Console: https://10.10.1.2:9090/ or https://192.168.168.11:9090/ or https://10.10.2.2:9090/

Welcome to ovirt2.dgi

[root ~]# virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # guestinfo gervais-ovirt-i8manage-test
user.count           : 0
os.id                : ubuntu
os.name              : Ubuntu
os.pretty-name       : Ubuntu 20.04.2 LTS
os.version           : 20.04.2 LTS (Focal Fossa)
os.version-id        : 20.04
os.machine           : x86_64
os.kernel-release    : 5.4.0-66-generic
os.kernel-version    : #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021
timezone.name        : UTC
timezone.offset      : 0
hostname             : vagrantbase-ubuntu2004
fs.count             : 4
fs.0.name            : sda2
fs.0.mountpoint      : /
fs.0.fstype          : ext4
fs.0.total-bytes     : 500507766784
fs.0.used-bytes      : 4842360832
fs.0.disk.count      : 1
fs.0.disk.0.alias    : sda
fs.0.disk.0.serial   : 0QEMU_QEMU_HARDDISK_eafaac44-c1ca-4ee4-81f4-2b0db5a88f46
fs.0.disk.0.device   : /dev/sda2
fs.1.name            : loop2
fs.1.mountpoint      : /snap/lxd/16099
fs.1.fstype          : squashfs
fs.1.total-bytes     : 74842112
fs.1.used-bytes      : 74842112
fs.1.disk.count      : 0
fs.2.name            : loop0
fs.2.mountpoint      : /snap/core18/1880
fs.2.fstype          : squashfs
fs.2.total-bytes     : 57671680
fs.2.used-bytes      : 57671680
fs.2.disk.count      : 0
fs.3.name            : loop1
fs.3.mountpoint      : /snap/snapd/8542
fs.3.fstype          : squashfs
fs.3.total-bytes     : 31457280
fs.3.used-bytes      : 31457280
fs.3.disk.count      : 0

virsh # domifaddr gervais-ovirt-i8manage-test --source agent
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 lo         00:00:00:00:00:00    ipv4         127.0.0.1/8
 -          -                    ipv6         ::1/128
 enp1s0     56:6f:65:8b:00:67    ipv4         192.168.173.119/16
 -          -                    ipv6         fe80::546f:65ff:fe8b:67/64

virsh # qemu-agent-command gervais-ovirt-i8manage-test '{"execute":"guest-network-get-interfaces"}'
{"return":[{"name":"lo","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"127.0.0.1","prefix":8},{"ip-address-type":"ipv6","ip-address":"::1","prefix":128}],"statistics":{"tx-packets":132,"tx-errs":0,"rx-bytes":10980,"rx-dropped":0,"rx-packets":132,"rx-errs":0,"tx-bytes":10980,"tx-dropped":0},"hardware-address":"00:00:00:00:00:00"},{"name":"enp1s0","ip-addresses":[{"ip-address-type":"ipv4","ip-address":"192.168.173.119","prefix":16},{"ip-address-type":"ipv6","ip-address":"fe80::546f:65ff:fe8b:67","prefix":64}],"statistics":{"tx-packets":11345,"tx-errs":0,"rx-bytes":122338065,"rx-dropped":66,"rx-packets":80174,"rx-errs":0,"tx-bytes":926907,"tx-dropped":0},"hardware-address":"56:6f:65:8b:00:67"}]}
```
This issue is still huge for our organization. Please let me know if there is anything else I can do to help move this issue along.
I went through the provided data several times and I still don't see where the stats could get lost. The only thing I noticed is that the VMs were migrated to that host (and not started fresh). Do you experience the behavior only after a migration?
Hi Tomáš,

No, it is not only after a migration; the IP also fails to appear for boxes that are started fresh on a host.

If I put a host into maintenance mode and then activate it, VMs started on that host show an IP properly. If I migrate a VM that is not showing its IP to the newly reactivated host, it displays its IP.

On hosts that have had active VMs running for any length of time, the VMs eventually stop consistently displaying their IPs in the interface. The odd one or two still show an IP. It is not always the same VM that continues to display an IP, and I cannot find a pattern.

If I migrate a VM off the recently reactivated host to a host that has been running untouched for some time, the VM immediately stops displaying its IP. This includes the HostedEngine.
My HostedEngine is still running CentOS 8. Does this have any bearing on this issue? Should I move it to CentOS Stream?
We have failed to reproduce this issue, and while we have the relevant logs, it is still quite difficult to understand what is going on. Assuming there are still VMs with no guest agent information (IP addresses and FQDN) in your environment, could you please pick one of them and:

1. Provide the output of getStats
2. Specify the exact time getStats was triggered
3. Provide the VDSM log from that host that covers this point in time and a few hours back
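A simple way to capture items 1 and 2 together on the host is something like the following (the UUID is a placeholder; vdsm-client ships with VDSM):

```
# Record the exact trigger time alongside the stats; VM_UUID is a placeholder.
date -u > getStats.out
vdsm-client VM getStats vmID=VM_UUID >> getStats.out
```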
The issue definitely continues. We have 74 VMs running in our cluster, and about 1/3 of them have no IP or FQDN displayed in the oVirt engine.

We have 3 hosts in our cluster (ovirt1, ovirt2, ovirt3; I know, crazy naming scheme). I pushed most of the VMs off to ovirt2 and ovirt3. ovirt1 has only 8 VMs running on it, and they all show the correct information in the engine. If a new box is brought up by one of our developers, it almost always comes up on ovirt1 (because it has the fewest VMs), and it does come up OK and display the correct information in the ovirt-engine. If for some reason the box starts on ovirt2 or ovirt3, I can force it to migrate to ovirt1 and it will display the IP and FQDN properly. As soon as I migrate that new VM to ovirt2 or ovirt3 (the hosts with the majority of the VMs running on them), the IP and FQDN are no longer displayed in the engine interface.

I'm attaching the vdsm logs and the output of getStats as requested. The file is called ovirt-logs-2021-08-09.zip.

getStats was run at Mon Aug 9 13:21:37 UTC 2021.
Created attachment 1812462 [details]
getStats.out, vdsm.log, vdsm.log.1.xz
I may have already asked this, but I cannot find it in the bug: are all the hosts up to date? What are the versions of vdsm, libvirt and qemu?

In the previous attachment you posted a getStats for a VM on a broken host (as requested). For comparison, could you please also share the result of getStats for the same VM while it is running on ovirt1 and has all the guest information visible in engine?

It pretty much looks like a host issue, but let's double check it is not related to bug 1901572. Please try clearing the hash for NICs info in engine, as in: https://bugzilla.redhat.com/show_bug.cgi?id=1901572#c16

Will this fix the issue or change anything?
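For reference, that workaround amounts to invalidating the stored hash in the engine database so the engine re-reads the device info on the next stats cycle. A rough sketch of the idea only; the table and column names below are assumptions, not verified against the engine schema, so follow the linked comment for the authoritative statement:

```
# Run on the engine machine. Schema details are assumed, not verified; see
# https://bugzilla.redhat.com/show_bug.cgi?id=1901572#c16 for the real steps.
su - postgres -c "psql -d engine -c \"UPDATE vm_dynamic SET hash = NULL;\""
```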
Clearing the hash did not seem to have any effect.

Output from getStats on the same VM after migrating it to ovirt1:

```
[root ~]# vdsm-client VM getStats vmID=72ae23f6-7487-42e1-9ceb-dcf189a6f264
[
    {
        "acpiEnable": "true",
        "appsList": [
            "kernel-3.10.0-1160.21.1.el7.x86_64",
            "qemu-guest-agent-2.12.0"
        ],
        "balloonInfo": {
            "balloon_cur": "8388608",
            "balloon_max": "8388608",
            "balloon_min": "8388608",
            "balloon_target": "8388608",
            "ballooning_enabled": true
        },
        "clientIp": "",
        "cpuSys": "0.27",
        "cpuUsage": "6530000000",
        "cpuUser": "0.44",
        "disks": {
            "sdc": {
                "apparentsize": "0",
                "flushLatency": "0",
                "readBytes": "0",
                "readLatency": "0",
                "readOps": "0",
                "readRate": "0.0",
                "truesize": "0",
                "writeLatency": "0",
                "writeOps": "0",
                "writeRate": "0.0",
                "writtenBytes": "0"
            },
            "vda": {
                "apparentsize": "200540160",
                "flushLatency": "1413051.5",
                "imageID": "563abb58-e91a-44da-9382-9a850fff0c4e",
                "readBytes": "0",
                "readLatency": "0",
                "readOps": "0",
                "readRate": "0.0",
                "truesize": "214777344",
                "writeLatency": "21563795.0",
                "writeOps": "1",
                "writeRate": "102.40785782986511",
                "writtenBytes": "1536"
            }
        },
        "disksUsage": [],
        "displayInfo": [
            {
                "ipAddress": "192.168.168.13",
                "port": "5909",
                "tlsPort": "",
                "type": "vnc"
            }
        ],
        "elapsedTime": "962703",
        "guestCPUCount": -1,
        "guestFQDN": "gervais-centos7-test-php56.dgi",
        "guestIPs": "",
        "guestName": "gervais-centos7-test-php56.dgi",
        "guestOs": "3.10.0-1160.21.1.el7.x86_64",
        "guestOsInfo": {
            "arch": "x86_64",
            "codename": "",
            "distribution": "CentOS Linux",
            "kernel": "3.10.0-1160.21.1.el7.x86_64",
            "type": "linux",
            "version": "7"
        },
        "guestTimezone": {
            "offset": -180,
            "zone": "ADT"
        },
        "hash": "-5327289600703475597",
        "kvmEnable": "true",
        "memUsage": "10",
        "memoryStats": {
            "majflt": 0,
            "mem_free": "7491276",
            "mem_total": "8006768",
            "mem_unused": "7491276",
            "minflt": 2,
            "pageflt": 2,
            "swap_in": 0,
            "swap_out": 0
        },
        "monitorResponse": "0",
        "netIfaces": [
            {
                "hw": "00:00:00:00:00:00",
                "inet": [
                    "127.0.0.1"
                ],
                "inet6": [
                    "::1"
                ],
                "name": "lo"
            },
            {
                "hw": "56:6f:65:8b:00:36",
                "inet": [
                    "192.168.173.70"
                ],
                "inet6": [
                    "fe80::5547:5c32:3c7c:842e"
                ],
                "name": "eth0"
            }
        ],
        "network": {
            "vnet26": {
                "macAddr": "56:6f:65:8b:00:36",
                "name": "vnet26",
                "rx": "28134",
                "rxDropped": "0",
                "rxErrors": "0",
                "sampleTime": 523369.025767464,
                "speed": "1000",
                "state": "unknown",
                "tx": "940",
                "txDropped": "0",
                "txErrors": "0"
            }
        },
        "session": "Unknown",
        "status": "Up",
        "statusTime": "2670852671",
        "timeOffset": "0",
        "username": "",
        "vcpuCount": "1",
        "vcpuPeriod": 100000,
        "vcpuQuota": "-1",
        "vmId": "72ae23f6-7487-42e1-9ceb-dcf189a6f264",
        "vmJobs": {},
        "vmName": "gervais-centos7-test-php56",
        "vmType": "kvm"
    }
]
```

Everything is up to date:

```
[root ~]# dnf clean all ; dnf upgrade
99 files removed
Ceph packages for x86_64                                          176 kB/s | 283 kB   00:01
dell-system-update_independent                                    144 kB/s | 150 kB   00:01
dell-system-update_dependent                                      155 kB/s | 195 kB   00:01
oVirt Node Optional packages from CentOS Stream 8 - BaseOS        6.7 MB/s | 8.8 MB   00:01
oVirt Node Optional packages from CentOS Stream 8 - AppStream     522 kB/s |  12 MB   00:24
Latest oVirt 4.4 Release                                          1.3 MB/s | 2.2 MB   00:01
Extra Packages for Enterprise Linux 8 - x86_64                    7.6 MB/s |  10 MB   00:01
CentOS-8 - Gluster 8                                              157 kB/s |  64 kB   00:00
virtio-win builds roughly matching what will be shipped in
upcoming RHEL                                                     102 kB/s |  66 kB   00:00
Copr repo for EL8_collection owned by sbonazzo                    615 kB/s | 402 kB   00:00
Copr repo for gluster-ansible owned by sac                         28 kB/s | 7.3 kB   00:00
Copr repo for ovsdbapp owned by mdbarroso                         6.9 kB/s | 2.0 kB   00:00
Advanced Virtualization CentOS Stream packages for x86_64         230 kB/s | 153 kB   00:00
CentOS-8 Stream - oVirt 4.4                                       1.0 MB/s | 928 kB   00:00
CentOS-8 - OpsTools - collectd                                    287 kB/s | 148 kB   00:00
CentOS-8 - NFV OpenvSwitch                                         92 kB/s |  45 kB   00:00
OpenStack Train Repository                                        661  B/s | 257  B   00:00
Dependencies resolved.
Nothing to do.
Complete!
```

Versions I am running:

```
[root ~]# rpm -qa | grep vdsm
vdsm-client-4.40.70.6-1.el8.noarch
vdsm-common-4.40.70.6-1.el8.noarch
vdsm-http-4.40.70.6-1.el8.noarch
vdsm-4.40.70.6-1.el8.x86_64
vdsm-gluster-4.40.70.6-1.el8.x86_64
vdsm-network-4.40.70.6-1.el8.x86_64
vdsm-jsonrpc-4.40.70.6-1.el8.noarch
vdsm-python-4.40.70.6-1.el8.noarch
vdsm-api-4.40.70.6-1.el8.noarch
vdsm-yajsonrpc-4.40.70.6-1.el8.noarch
[root ~]# rpm -qa libvirt
libvirt-7.4.0-1.el8s.x86_64
[root ~]# rpm -qa | grep qemu
qemu-kvm-block-ssh-5.2.0-16.el8s.x86_64
qemu-kvm-block-gluster-5.2.0-16.el8s.x86_64
qemu-guest-agent-5.2.0-16.el8s.x86_64
qemu-img-5.2.0-16.el8s.x86_64
qemu-kvm-common-5.2.0-16.el8s.x86_64
qemu-kvm-block-iscsi-5.2.0-16.el8s.x86_64
libvirt-daemon-driver-qemu-7.4.0-1.el8s.x86_64
qemu-kvm-ui-opengl-5.2.0-16.el8s.x86_64
qemu-kvm-block-rbd-5.2.0-16.el8s.x86_64
qemu-kvm-ui-spice-5.2.0-16.el8s.x86_64
qemu-kvm-block-curl-5.2.0-16.el8s.x86_64
qemu-kvm-5.2.0-16.el8s.x86_64
ipxe-roms-qemu-20181214-8.git133f4c47.el8.noarch
qemu-kvm-core-5.2.0-16.el8s.x86_64
```
I've noticed that there have been some small updates to packages and have applied them. None of them have made any difference to the issue described in this ticket.

Is there anything I can do on my end? Should I do a reinstall of my hosts from the UI in the ovirt-engine? Could that make the situation worse?
(In reply to GervaisdeM from comment #14)
> I've noticed that there have been some small updates to packages and have
> applied them. None of them have made any difference to the issue described
> in this ticket.
>
> Is there anything I can do on my end? Should I do a reinstall of my hosts
> from the UI in the ovirt-engine? Could that make the situation worse?

Yes, I wanted to suggest reinstalling the host. So if you can, please try that.

Also, compare the configuration in /etc/vdsm and check whether there are any differences between the host that works OK and the faulty hosts.
I haven't gotten to reinstalling yet. Will try one evening this week. Before I do, should I have any concerns about reinstalling? My cluster is hyperconverged, with all three servers being part of the underlying gluster filesystems that I need for the VMs and the hosted engine. Anything I need to be aware of before running the reinstall? All three hosts in my cluster are affected by this issue.

The only difference in the /etc/vdsm directories on my servers is the vdsm.id. Example:

```
[root ~]# scp -r ovirt1:/etc/vdsm .
svdsm.logger.conf                             100%  734   124.2KB/s   00:00
mom.conf                                      100% 3149     3.9MB/s   00:00
vdsm                                          100%  159   177.3KB/s   00:00
00-defines.policy                             100%  323    18.6KB/s   00:00
04-cputune.policy                             100% 2276     3.1MB/s   00:00
02-balloon.policy                             100% 5369     6.2MB/s   00:00
05-iotune.policy                              100% 1105     1.6MB/s   00:00
03-ksm.policy                                 100% 4651   621.0KB/s   00:00
01-parameters.policy                          100%    1     1.8KB/s   00:00
vdsm.conf                                     100%   56    94.5KB/s   00:00
logger.conf                                   100% 2167     3.2MB/s   00:00
vdsm.id                                       100%   36    49.5KB/s   00:00
[root ~]# diff -r /etc/vdsm vdsm
diff -r /etc/vdsm/vdsm.id vdsm/vdsm.id
1c1
< 4c4c4544-0056-3510-8056-c2c04f383432
\ No newline at end of file
---
> 4c4c4544-0056-3410-8057-c2c04f383432
\ No newline at end of file
```
There was another minor upgrade showing for my hosts... I applied the update to all of my hosts and then ran the reinstall, one at a time, for all three oVirt hosts. This is exactly what I did:

1. Put the first server into maintenance mode.
2. Selected Installation -> Reinstall from the menu.
3. Left everything at the default, except set Hosted Engine -> Deploy.

I did that for all three oVirt hosts. My issue remains exactly the same. Is there something in my process above that I should change?
Any next steps that I can take that might help sort this out?
(In reply to GervaisdeM from comment #20)
> Any next steps that I can take that might help sort this out?

That depends on whether you would be willing to patch one of your servers with: https://gerrit.ovirt.org/c/vdsm/+/116839/

If yes, then please enable DEBUG logs for the "vds" logger in /etc/vdsm/logger.conf and restart the vdsmd service. Let the VM run for a couple of minutes (5 is more than enough), run the VM.getStats API call with vdsm-client, and provide the vdsm.log again plus the result of the API call. Hopefully that will give us more clues.
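For anyone repeating this, the relevant stanza in /etc/vdsm/logger.conf looks roughly like the following (the handler names here are assumptions; keep whatever handlers your file already lists, and only change the level):

```
# /etc/vdsm/logger.conf -- raise the "vds" logger to DEBUG
[logger_vds]
level=DEBUG
handlers=syslog,logfile
qualname=vds
propagate=0
```

Then restart the service with `systemctl restart vdsmd`.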
Created attachment 1826026 [details]
vdsm-debug-log-2021-09-24
Here is what I did:

1. Put my host in Maintenance Mode.
2. Updated the files with https://gerrit.ovirt.org/c/vdsm/+/116839/ by downloading them to one of my hosts and mv'ing some files around so the new ones are being used.
3. Added the DEBUG flag to /etc/vdsm/logger.conf.
4. Rebooted.
5. Shuffled enough VMs manually to that host so that some of them stop showing IPs in the ovirt-engine web interface (if there are only a couple of VMs on the host, they DO show their IP).

The vdsm.log (vdsm-debug-log-2021-09-24) is attached, and here is the output of the vdsm-client command:

```
[root ~]# vdsm-client VM getStats vmID=a64feb94-475a-484f-87bc-6d6213cb7fe1
[
    {
        "acpiEnable": "true",
        "appsList": [],
        "balloonInfo": {
            "balloon_cur": "5242880",
            "balloon_max": "5242880",
            "balloon_min": "5242880",
            "balloon_target": "5242880",
            "ballooning_enabled": true
        },
        "clientIp": "",
        "cpuSys": "1.13",
        "cpuUsage": "67110000000",
        "cpuUser": "2.91",
        "disks": {
            "sda": {
                "apparentsize": "4141088768",
                "flushLatency": "1901196.5",
                "imageID": "07b9dfb0-1678-45eb-ac61-50c5064f3326",
                "readBytes": "86016",
                "readLatency": "0",
                "readOps": "9",
                "readRate": "0.0",
                "truesize": "4320882688",
                "writeLatency": "3066841.4",
                "writeOps": "3621",
                "writeRate": "10635.814409007802",
                "writtenBytes": "23912448"
            },
            "sdc": {
                "apparentsize": "0",
                "flushLatency": "0",
                "readBytes": "0",
                "readLatency": "0",
                "readOps": "0",
                "readRate": "0.0",
                "truesize": "0",
                "writeLatency": "0",
                "writeOps": "0",
                "writeRate": "0.0",
                "writtenBytes": "0"
            }
        },
        "displayInfo": [
            {
                "ipAddress": "192.168.168.13",
                "port": "5911",
                "tlsPort": "",
                "type": "vnc"
            }
        ],
        "elapsedTime": "1905267",
        "guestCPUCount": -1,
        "guestFQDN": "",
        "guestIPs": "",
        "hash": "-1536474874257087895",
        "kvmEnable": "true",
        "memUsage": "57",
        "memoryStats": {
            "majflt": 0,
            "mem_free": "2207892",
            "mem_total": "5053548",
            "mem_unused": "261804",
            "minflt": 66,
            "pageflt": 66,
            "swap_in": 0,
            "swap_out": 0
        },
        "monitorResponse": "0",
        "network": {
            "vnet36": {
                "macAddr": "56:6f:65:8b:00:2f",
                "name": "vnet36",
                "rx": "1901928",
                "rxDropped": "0",
                "rxErrors": "0",
                "sampleTime": 6876.242791271,
                "speed": "1000",
                "state": "unknown",
                "tx": "695171",
                "txDropped": "0",
                "txErrors": "0"
            }
        },
        "session": "Unknown",
        "status": "Up",
        "statusTime": "2154359889",
        "timeOffset": "3",
        "username": "Unknown",
        "vcpuCount": "2",
        "vcpuPeriod": 100000,
        "vcpuQuota": "-1",
        "vmId": "a64feb94-475a-484f-87bc-6d6213cb7fe1",
        "vmJobs": {},
        "vmName": "gervais-ovirt-i8manage-nagios",
        "vmType": "kvm"
    }
]
```
Thanks for the logs. They gave me a couple more clues, but we're still far from knowing what is really happening here. Would you also have the logs from before 18:55:43 on that day? What I would like to see is the lifetime of one of the VMs without stats, from its start/migration to the host. It looks like these VMs were without stats:

86f85996-3306-4489-9449-569803f5fa85
a64feb94-475a-484f-87bc-6d6213cb7fe1
b42368a2-d6b7-40ac-9842-8a38edf4fdb1
38e5240c-6ef0-43d4-a44d-1bd9bcc40d05

Hopefully the logs have not been rotated away yet. Feel free to reach me by email instead of attaching files to this BZ.
Hi Tomáš,

I've sent you an email with what looks like the correct log file attached (based on the timestamp). The problem is that this issue is intermittent. To get the debugging turned on, I had to put the server into maintenance mode. By doing so, the problem (on that one server) is fixed when it is reactivated, until more VMs are migrated to it, and then it starts "breaking." I manually moved VMs around to simulate this and then sent you the last set of logs. It is very possible that the VMs you listed above were not on the host (ovirt1 in this case) earlier in the day. :(

Let me know if there is anything else I can do.
Hi Tomáš,

Any next steps?
I think I may have found the source of this issue. Could you please try applying the trivial change from: https://gerrit.ovirt.org/c/vdsm/+/117211

Let me know if this really fixes the issue. If not, I have a few more changes that increase the amount of debug logging in the VDSM handling of qemu-ga.
(In reply to Tomáš Golembiovský from comment #28)
> I think I may have found the source of this issue. Could you please try
> applying the trivial change from: https://gerrit.ovirt.org/c/vdsm/+/117211
>
> Let me know if this really fixes the issue. If not, I have a few more
> changes that increase the amount of debug logging in the VDSM handling of
> qemu-ga.

This patch fixes the issue for me (oVirt 4.4.8 on AlmaLinux 8.4).
I can also confirm the fix. Patch applied on RHVH 4.4.8, vdsm-4.40.80.6-1.el8ev.x86_64.
I'm looking at a list of VMs that are ALL displaying IP addresses!

Looks like the patch worked for me.
oVirt 4.4.8.6-1
vdsm-4.40.80.6-1.el8.x86_64
(In reply to GervaisdeM from comment #31)
> I'm looking at a list of VMs that are ALL displaying IP addresses!
>
> Looks like the patch worked for me.
> oVirt 4.4.8.6-1
> vdsm-4.40.80.6-1.el8.x86_64

Thanks for the confirmation.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.