Time Nick Message
11:48 agoddard pdurbin: (HA-Worker-3:work-9) VM state transitted from :Stopped to Starting) < CS High Availability in action
12:43 pdurbin agoddard: what is that? pulling the power cord on a hypervisor and watching cloudstack move VMs around automatically?
12:45 agoddard just killing the VM on KVM and watching cloudstack see it's gone and bring it up again. I only just added a HA service offering, so can't test it properly by killing a whole host yet, but it should behave the same
12:46 pdurbin ok. but that's the holy grail, right? power cut. another hypervisor takes over. vmware does this. we're losing this by going to bare bone kvm
12:47 pdurbin for a little context for others, see agoddard's tweet which traces back to me talking about how 43 VMs went down when we lost a hypervisor yesterday: https://twitter.com/anthonygoddard/status/231356077516660736
12:48 pdurbin shuff: the irony is not lost on me that as that hypervisor was rebooting itself, we were discussing high availability of cisco's linux thing over lunch
12:49 pdurbin Cisco becomes a major Linux server vendor overnight | The Open Road - CNET News - http://news.cnet.com/8301-13505_3-10370165-16.html
12:49 pdurbin UCS: http://en.wikipedia.org/wiki/Cisco_Unified_Computing_System
12:50 pdurbin Cisco Unified Computing System
12:50 agoddard ya, libvirt/KVM do what they do well, then we need another layer on top to manage clusters, resources, provisioning, HA, storage, etc..
12:51 pdurbin agoddard: right, and you're happy enough with cloudstack? how much effort to get it up and running well? be honest :)
12:51 agoddard so I think the best approaches have always been tools which work with the libvirt api, so they don't overreach and let libvirt do its job - plus it means you have a familiar env with libvirt/kvm.
12:52 pdurbin absolutely. i'm for tools that use libvirt as a foundation.
i'm very happy with libvirt
12:52 agoddard CS is really quick to install, quicker than OS for sure - it's all a big monolithic java app.. and it's very flexible with Networking
12:52 pdurbin it's not like i'm surprised this happened. i never planned for high availability or automatic failover in this iteration of our kvm/libvirt platform
12:53 pdurbin OS being openstack, of course
12:55 pdurbin as we've discussed a couple times, i took some notes during a cloudstack presentation: http://irclog.perlgeek.de/crimsonfu/2012-06-26#i_5759658
12:57 pdurbin oh good. the tar.gz expands to a bunch of RPMs: http://www.cloudstack.org/download.html would make an install on centos easy
12:58 agoddard BUT... the UI is slow to navigate, error reporting sucks (errors in the API/UI are terrible and the logging can be so verbose it's hard to track them down).. and I hit a really annoying bug the other day where I had to modify the DB to get machines provisioning again
12:58 agoddard also, the community just isn't there yet..
13:00 agoddard I think KVM snapshots only work with centos, but that's not doc'd
13:01 agoddard so I make a CS agent, and it doesn't work.. the one click install does 70% of the work, but now I have a long bash script to get the other 30% done, all stuff that's documented in random forum posts, sometimes linking to bug reports which have no info, or link to other trackers that require logins..
13:06 pdurbin can you tell us how you really feel? ;)
13:06 pdurbin so you don't do kvm snapshots at all?
13:08 agoddard http://www.youtube.com/watch?v=XFf8HV_OPdw&t=2m5s < ;)
13:08 agoddard I've done a few, have only been using CS for a few weeks.. I switched to CentOS hosts for my last two clusters
13:08 pdurbin we have been doing snapshots with `qemu-img snapshot` on qcow2 disk images but recently, to my horror, i discovered that the disk images are getting corrupted according to `qemu-img check`. working on a nagios check to detect this.
and will be going through each disk image that's corrupt and fixing it
13:09 pdurbin this is how i remove the corruption: qemu-img convert -f qcow2 -O qcow2 /vm_storage02/images/git2-disk0 /tmp/git2-oneoff
13:10 pdurbin qemu-img snapshots are lost, but that's ok. i'm just happy the disk is clean then, according to `qemu-img check`
13:11 pdurbin i don't know why the corruption is happening. or why some of our disk images are ok. maybe the problem is that we're hosting these qcow2 images on NFS
13:11 agoddard we're looking at our whole backup setup this iteration, I really want to focus as much as possible on backing up data only and being able to rebuild fast - then using snapshots where we must.. knowing that data only + rebuild isn't going to be ideal in some places
13:12 agoddard but being able to recover fast also means being able to migrate to new infra fast, between clouds fast, etc..
13:12 pdurbin but anyway, we disabled these qemu-img snapshots for now. may re-introduce them slowly on a few test VMs because it was such a nice feature
13:13 pdurbin for a while we were even splitting off the snapshots into their own disk images, which was nice because you could just drop the disk image in place over the current one to do a restore. very simple. just shut down the vm and do a cp
13:14 agoddard ya, and maintaining small VM images is definitely key..
13:15 pdurbin the thing is, a lot of our VMs are special snowflakes. not easy to rebuild because we didn't even build them
13:15 agoddard we have some hosts which are built 100% by chef, and nobody ever logs in to, state is never stored on, so rebuilding them with chef is the fastest recovery possible. takes 10 minutes to re-provision a dozen machines with the cloudstack plugin
13:16 pdurbin maybe we need to evangelize more to our users that they should plan how they would rebuild their VMs...
13:16 agoddard pdurbin: yeah we have that issue too..
but thankfully we're moving away from it reasonably well
13:17 pdurbin agoddard: i'll just point our users to you. "look at this guy. use chef or whatever. rebuild your VMs like he does" :)
13:17 agoddard in some cases, we manage their apps with chef and if they want a change, they can send a pull request.. but that obviously doesn't work for all users.. and then it comes down to levels of service they can expect.. If they want 100% control of an instance, and manually install everything, maybe we can't support it, but we could do snapshots for them for example
13:17 agoddard pdurbin: :D
13:18 agoddard pdurbin: I've got some stuff migrated off a host which I want to install OS on again. When I did the CS install it was 'cause time was short, and some Euro collaborators of ours were testing it too, but I think it'd be awesome if RC & us rocked the same orchestration
13:19 pdurbin None of us is as smart as all of us
13:20 agoddard pdurbin: +1000
13:21 pdurbin it's one of our guiding principles, of course: http://crimsonfu.github.com
13:21 pdurbin i'm starting to think another one should be "please tell me when i'm doing something stupid"
13:21 pdurbin but people are kind of naturally good at that
13:22 pdurbin i mean, is qcow2 over NFS insane? i don't know
13:23 pdurbin if we're dropping the qcow2 snapshots anyway i guess we could switch to raw
13:23 agoddard what caused the crash?
13:24 pdurbin i haven't even looked yet. 15 year anniversary. babysitter last night
13:25 agoddard oh man, that sucks!!
13:25 pdurbin yeah. not the best timing
13:26 agoddard though.. congrats by the way!
13:26 pdurbin thanks :)
13:27 pdurbin anyway, trying to keep a few tickets and other things in motion. then i'll look at logs, i guess
13:27 agoddard storage is the other problem I wanna solve for our VM stuff, then networking..
13:27 pdurbin we had another one of these Dell PowerEdge C6145's mysteriously reboot itself the other month.
never could figure out why
13:27 agoddard yergh
13:28 agoddard we have some R710's that take down switch stacks when they kernel panic. awesome :|
13:28 pdurbin yeah, we have the NFS traffic spread out across 4 NFS servers at least. but still i'm worried it's not a good solution. the red hat docs say as much. "don't use this in production" to paraphrase
13:29 agoddard :D
13:31 pdurbin "This example uses NFS to share guest images with other KVM hosts. This example is not practical for large installations, this example is only for demonstrating migration techniques and small deployments. Do not use this example for migrating or running more than a few virtualized guests." -- http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/Virtualization/index.html#sect-Virtualization-KVM_live_migration-Share_storage_exam
13:32 pdurbin JoeJulian will lead me to the land of milk, honey, and glusterfs
13:32 agoddard CS seems happy with it.. I still wanna find a good iSCSI solution though
13:32 pdurbin i think oVirt would manage iSCSI for us well...
13:35 agoddard with CLMV?
13:35 agoddard CLVM
13:36 pdurbin hmm, that sounds familiar, but i don't know
13:42 * pdurbin wonders if there's a gitweb for the ovirt-docs: http://wiki.ovirt.org/wiki/Documentation#Source_Control
13:44 agoddard my gut at the moment says to go with OpenStack 'cause the community seems to be collecting around it, and maturing quick... it's always a trade off though.. oVirt and ProxMox would give us some nice storage out of the box with KVM, but we'd need to write the tooling around them for chef provisioning integration etc.. and I'd imagine with the turnkey storage we'd make tradeoffs for flexibility
13:45 agoddard one thing I still haven't tested is provisioning iSCSI LUNs directly to guest OS's for places where fast storage is needed, though that could be tested on any KVM setup..
13:45 pdurbin westmaas: and we have you on our side :) #openstack
13:46 agoddard ..and there's that :D
13:46 pdurbin in the short term, open stack seems like a great way to have your own ec2. but ec2 is pretty different than traditional vmware. vsphere i guess you'd call it. where VMs are not so ephemeral
13:47 pdurbin at the red hat summit it was interesting to see how red hat is talking up both ovirt/rhev and openstack. sort of a portfolio of virtualization offerings
13:48 agoddard agreed, I like the idea of architecting for ephemeral VMs and persistent storage
13:48 agoddard though it's not as easy, I think it's the right move in the long run
13:49 westmaas woo!
13:49 westmaas :)
13:49 westmaas sorry I've been absent from here so much, just had a big release :)
13:49 pdurbin agoddard: you can read about iscsi and ovirt at http://www.ovirt.org/w/images/a/a9/OVirt-3.0-Installation_Guide-en-US.pdf
13:50 westmaas I know I was a staple before! :p
13:50 pdurbin westmaas: congrats
13:50 westmaas thanks
13:51 westmaas I can now confirm that openstack can be in production on thousands of compute nodes (not sure that I can say exactly how many thousands!)
13:51 pdurbin agoddard: the problem is that many of our users are *not* architecting for ephemeral VMs. you are the exception. they want their VMs to be persistent
13:52 pdurbin and openstack is working on persistent VMs but they're optimized for ephemeral VMs at the moment. hence, red hat is advocating ovirt/rhev for persistent VMs right now. ideally RHEV, from their per$pective :)
13:52 agoddard westmaas: saw the rackspace announcement, that's awesome :)
13:53 pdurbin link?
13:53 agoddard a few links in this post: http://www.openstack.org/blog/2012/08/celebrating-an-openstack-milestone/
13:54 westmaas including a spam link!
13:55 westmaas I guess this is our official post: http://www.rackspace.com/blog/the-open-rackspace-cloud-better-faster-more-affordable/
13:56 agoddard lulz..
didn't see that spam link ;)
13:56 pdurbin thanks
13:57 pdurbin "Today OpenStack marks another milestone in its young existence". that's the thing. it *is* young
14:01 pdurbin nice post
14:01 * pdurbin wonders how much rackerhacker is involved in ops for rackspace open cloud
14:02 rackerhacker deeply
14:02 rackerhacker :)
14:02 pdurbin heh
14:02 westmaas only if I keep yelling at him
14:02 westmaas rackerhacker: do more things
14:03 * rackerhacker rattles his chains
14:04 pdurbin oh, semiosis will help with the gluster fu too
14:12 ventz Anyone know of a good "gui" client for kvm management
14:13 ventz i do everything via the xml files/by hand and it's getting difficult
14:13 pdurbin ventz: virt-manager
14:13 ventz specifically -- when having to add drives/change boot order, etc...since that requires a decent modification of xml
14:13 ventz pdurbin: the thing i am afraid w/ virt-manager is that it doesn't create the proper xml files on the backend
14:14 pdurbin sure it does
14:14 ventz i still want to keep all the extra settings (like extended drivers, and block to prevent IP + arp spoofing)
14:14 ventz as in, if i create the xml template, will virt manager just modify what i do w/ it from that point on (ex: add CD drive, image, boot from it -- vs creating a new template)
14:15 pdurbin as a child, were you burned by microsoft word fscking up your loving hand crafted html?
14:15 ventz heh
14:15 ventz also, another question -- anything web based?
14:15 ventz my server is running on a system w/ no X
14:15 pdurbin in my experience, virt-manager is pretty non-invasive.
please let me know if it destroys your xml, takes out stuff and what not
14:15 ventz yes, i can forward, but prefer something web based anyway
14:16 pdurbin i've got nothin'
14:17 ventz there's this: http://www.linux-kvm.org/page/Management_Tools
14:17 ventz but some are half broken
14:17 ventz looked at:
14:17 ventz http://www.convirture.com/community.php
14:17 ventz and
14:17 ventz http://www.ovirt.org
14:17 ventz ovirt being "rhel" centric is annoying -- yes you can build for debian/ubuntu, but it's a pain
14:18 agoddard http://archipelproject.org always looked kinda neat
14:18 agoddard ovirt is built by the RHELV guys I think
14:18 pdurbin ovirt is upstream for rhev
14:18 agoddard rhev..
14:18 agoddard ya
14:18 pdurbin fedora is upstream for rhel, etc.
14:19 pdurbin crimsonfubot: lucky RHEV
14:19 crimsonfubot pdurbin: http://www.redhat.com/products/virtualization/
14:20 ventz pdurbin: looks like i found it: https://www.webvirtmgr.net
14:22 ventz there's also: http://karesansui-project.info
14:23 pdurbin ventz: cool. please let us know how they work out
14:23 ventz will try and let you know
14:25 pdurbin thanks
14:54 semiosis did somebody say gluster?
15:02 pdurbin :)
15:05 agoddard we were gonna upgrade our gluster today, but now we can't access it for some reason 0_o
15:05 * agoddard calls network admin
16:19 semiosis java devs: http://timberglund.com/blog/2012/08/03/the-maven/
17:47 pdurbin sigh. [Mrusers-list] CBSCentral - https://lists.hcs.harvard.edu/pipermail/mrusers-list/2012-August/000794.html
18:19 pdurbin yay, it's back alive
18:43 agoddard https://groups.google.com/forum/#!topic/comp.protocols.time.ntp/vhVlH4ENsJQ
19:14 pdurbin agoddard: "WARNING: someone's faking a leap second tonight" huh?
19:29 pdurbin ha!
a co-worker just said to me, "hey, perl, i mean, phil" :)
19:30 pdurbin i answer to both :)
20:06 agoddard semiosis: that maven poem is awesome
20:14 pdurbin no time to read it but i like the idea :)
21:02 pdurbin other people are reporting the same qemu-img corruption i mentioned above: qemu-img snapshot qcow2 problem - http://forum.proxmox.com/threads/7109-qemu-img-snapshot-qcow2-problem
21:03 pdurbin i just cleaned up a disk image and it didn't help the vm boot
21:03 pdurbin i may have to re-migrate it from vmware
21:03 pdurbin or forget about all this kvm business and pay for vmware :)
21:04 pdurbin ok, my kids aren't going to pick themselves up. have a good weekend, all
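
At 13:08 pdurbin mentions working on a nagios check that detects qcow2 corruption via `qemu-img check`. The log never shows that check, so what follows is only a rough sketch of what one might look like, assuming Python 3 on the hypervisor and `qemu-img` in PATH. The names `classify`, `check_image` and `main`, the script name `check_qcow2.py`, and the choice to treat leaked clusters as WARNING are all illustrative assumptions. The exit codes are the ones listed in the qemu-img man page for `check`, and the 0/1/2/3 return values follow the usual Nagios plugin convention.

```python
"""Nagios-style wrapper around `qemu-img check` (illustrative sketch only)."""
import subprocess
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Exit codes of `qemu-img check` as listed in the qemu-img man page.
# Mapping leaked clusters to WARNING is an assumption, not pdurbin's policy.
QEMU_IMG_STATUS = {
    0: (OK, "image is consistent"),
    1: (UNKNOWN, "check did not complete (internal error)"),
    2: (CRITICAL, "image is corrupted"),
    3: (WARNING, "image has leaked clusters but is not corrupted"),
    63: (UNKNOWN, "image format does not support checks"),
}


def classify(returncode):
    """Map a `qemu-img check` exit code to (nagios_code, message)."""
    return QEMU_IMG_STATUS.get(
        returncode, (UNKNOWN, "unexpected exit code %d" % returncode)
    )


def check_image(path):
    """Run `qemu-img check` on one qcow2 image and classify the result."""
    try:
        proc = subprocess.run(
            ["qemu-img", "check", "-f", "qcow2", path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except FileNotFoundError:
        return UNKNOWN, "qemu-img not found in PATH"
    return classify(proc.returncode)


def main(argv):
    """Print one Nagios status line and return the plugin exit code."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_qcow2.py IMAGE")
        return UNKNOWN
    code, message = check_image(argv[1])
    print("%s: %s: %s" % (LABELS[code], argv[1], message))
    return code
```

To use it as a plugin, a wrapper script would end with `sys.exit(main(sys.argv))` and be pointed at a path such as the `/vm_storage02/images/git2-disk0` image from the log. One caveat worth keeping in mind: results for an image that a running guest is actively writing to may not be reliable, so a check like this is probably best run against shut-down VMs or copies.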