Time Nick Message
01:05 pdurbin i'm only a couple clicks into the weechat website and i'm already offended by a screenshot of a woman in bondage: http://www.weechat.org/screenshots/weechat_2010-02-22_caleb.png/ terrible first impression
01:07 pdurbin i'm reminded of how i spent a little time on http://geekfeminism.wikia.com/wiki/FLOSS the other week. open source has a long way to go, sadly
01:12 pdurbin i just linked back to here and http://irclog.perlgeek.de/crimsonfu/2012-09-25#i_6017406 in #weechat and let them know i got a bad first impression
02:49 pdurbin you guys know who Dave Winer is, I hope. I finally tried out his "symphony of software" and wrote about it here: Philip Durbin - Google+ - As a fan of owning my own data (and data liberation) I've… - https://plus.google.com/107770072576338242009/posts/bFaQpQNfb6J
02:49 pdurbin ah. exited wine. might be time for a shower ;)
12:09 boegel hi y'all!
12:10 pdurbin boegel: good morning :)
12:11 boegel this is a room full of hard-core sysadmins? scary!
12:11 pdurbin heh
12:11 boegel Itkovian is my colleague @ HPC-UGent
12:11 * Itkovian ducks
12:11 pdurbin Itkovian: hi
12:11 Itkovian yeah. hi too :-p
12:11 boegel if EasyBuild is broken, blame him :P
12:12 pdurbin so you guys want to talk about my tweet? https://twitter.com/philipdurbin/status/249898230073135104
12:12 boegel pdurbin: sure!
12:12 pdurbin " use "module load" on your HPC cluster? try building with easybuild. @fasrc plans to http://hpcugent.github.com/easybuild/ "
12:12 boegel pdurbin: tell us how you picked up on EasyBuild, and how you like it so far
12:12 Itkovian I'll just lurk for a while, got work to do :-)
12:12 pdurbin me too :/
12:12 boegel don't we all...
12:13 pdurbin didn't one of you leave a comment on james cuff's blog?
12:13 boegel pdurbin: ah, yes, jgtimmer did
12:13 boegel pdurbin: he's not in today though
12:13 boegel pdurbin: we're trying to promote EasyBuild where it seems relevant
12:13 pdurbin anyone have a link?
12:13 boegel pdurbin: we've only started to make it available in public since a couple of months, and are trying to get feedback mostly
12:14 boegel pdurbin: the blog was down yesterday, I think
12:14 pdurbin here it is. from Jens: http://blog dot jamesdotcuff dot net: scientific software as a service sprawl... - http://blog.jcuff.net/2012/07/scientific-software-as-service-sprawl.html?showComment=1348407332993#c6395090447973051400
12:15 pdurbin i work for james. when he saw that comment, he forwarded it to us
12:15 pdurbin i haven't really had a chance to look at easybuild
12:15 pdurbin oh, did you see the recent blog post about modules from dell?
12:15 boegel pdurbin: no
12:16 pdurbin it's by Jeff Layton: Auditing Environment Modules - Home Base for HPC Professionals - http://hpc.admin-magazine.com/Articles/Auditing-Environment-Modules/
12:16 boegel pdurbin: your tweet seemed to suggest you looked into it and really liked it, which we were happy about
12:16 boegel pdurbin: we're working hard on EasyBuild v0.9, which is going to include quite a bit of changes compared to v0.8
12:16 pdurbin boegel: sorry to give you that impression. really i only looked at the home page
12:17 boegel pdurbin: so, I wanted to contact you to make sure you don't start coding like hell against v0.8, cause v0.9 will break that work
12:17 pdurbin heh. ok
12:17 boegel pdurbin: our v0.9 milestone is set for end of Sept, but it's going to have to shift a bit probably
12:17 boegel pdurbin: does james cuff's blog work on your end? it doesn't here...
12:18 pdurbin my involvement with our modules system so far has centered around exposing our list of modules as JSON and writing a little script to query it: https://github.com/fasrc/api/blob/master/modules
12:18 boegel pdurbin: hpc admin magazine is a Dell thing?
12:18 boegel pdurbin: because I did see that article, it's not the first time they've run that series on module
12:18 pdurbin uh. i think Jeff Layton works for dell?
12:18 boegel *modules
12:18 * boegel doesn't know
12:19 boegel pdurbin: LinkedIn says he does, yes
12:19 pdurbin ok #notcrazy
12:19 boegel pdurbin: so, anyway, are you guys planning to look into EasyBuild?
12:20 pdurbin personally... i have a lot on my plate at the moment. but i think we should
12:20 pdurbin other people on my team do the software building
12:20 boegel how are you guys handling software builds now? any framework or somesuch you're using?
12:20 pdurbin i'll certainly point them to this conversation
12:21 * pdurbin checks docs
12:21 boegel :)
12:21 boegel pdurbin: are you, or the software build guys in your team, planning to attend SC'12?
12:21 pdurbin i see a docs/cluster/modules/template.mdwn...
12:22 boegel that's a template module? or?
12:22 boegel ah, now, it's MarkDown probably
12:22 pdurbin yeah, we use ikiwiki. (i love ikiwiki)
12:23 pdurbin ha! "modulefile_template.very_simple is what I always use; modulefile_template.with_prereqs has some logic for auto-unloading conflicting modules, but I don't know how it works"
12:23 pdurbin "There's a little helper named generate_setup.sh in the hpc/rc module to make this easier and more consistent. Just run it with the -m switch and the directory you're trying to setup, and it'll search for the variables that need to be set."
12:24 pdurbin i'm not sure if this is helpful at all :)
12:24 boegel pdurbin: ok, that's for creating modules
12:24 pdurbin isn't that what easybuild is for? creating modules?
12:24 boegel pdurbin: but how about building the software packages themselves?
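
The helper quoted at 12:23 is described as searching an install directory for "the variables that need to be set". A minimal Python sketch of that idea follows. It is not the actual generate_setup.sh, whose contents are not shown in this log: the function name generate_modulefile and the subdirectory-to-variable table are assumptions chosen for illustration, and only the "#%Module1.0" header and the prepend-path command come from the Environment Modules modulefile format.

```python
import os
import tempfile

# Hypothetical mapping from a subdirectory of the install prefix to the
# environment variable it usually feeds. These pairs are common conventions,
# not taken from the real helper script mentioned in the log.
CANDIDATE_PATHS = [
    ("bin", "PATH"),
    ("lib", "LD_LIBRARY_PATH"),
    ("lib64", "LD_LIBRARY_PATH"),
    ("share/man", "MANPATH"),
]


def generate_modulefile(prefix):
    """Return Tcl modulefile text covering the standard subdirs found under prefix."""
    lines = ["#%Module1.0"]
    for subdir, variable in CANDIDATE_PATHS:
        full = os.path.join(prefix, subdir)
        # Only emit a line for directories that actually exist in this install.
        if os.path.isdir(full):
            lines.append("prepend-path %s %s" % (variable, full))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Build a throwaway install tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as prefix:
        os.makedirs(os.path.join(prefix, "bin"))
        os.makedirs(os.path.join(prefix, "lib"))
        print(generate_modulefile(prefix), end="")
```

This only covers the module-file half of the problem; building the software itself, which boegel asks about next, is a separate step.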
12:24 Itkovian pdurbin: creating the module files is a side effect, mostly :-)
12:24 boegel pdurbin: it builds and installs software in a custom path, with a specified compiler toolchain, and then also creates module files for that software, yes
12:25 pdurbin ok, that's what i thought
12:25 Itkovian no building => no module files
12:25 boegel pdurbin: we have a workshop paper on EasyBuild that we've submitted to the PyHPC workshop at SC'12, we can show you a preprint if that's helpful
12:26 pdurbin if you can make it public you could post a link on james's blog
12:26 pdurbin (if it's up) :P
12:26 boegel pdurbin: well, we can't make it public yet, we're awaiting the acceptance decision for the workshop
12:27 boegel pdurbin: but we can mail it for you (and your colleagues) to read
12:27 boegel pdurbin: either way, we should get the acceptance decision Oct 1st
12:27 pdurbin sure. if you mail it to rchelp@fas.harvard.edu it will land in our ticketing system (RT)
12:29 boegel is that a good idea? I don't want to give the impression we're spamming the Harvard helpdesk :)
12:29 pdurbin heh. no it's cool. i'll take the ticket
12:29 pdurbin hey, have you guys looked at this? SoftwareCollections - FedoraProject - https://fedoraproject.org/wiki/SoftwareCollections
12:29 pdurbin i keep meaning to add that as a comment on james's blog. the same post you commented on
12:30 pdurbin "The concept of Software Collections allows multiple versions of software to be installed at the same time without interfering in any negative way with the standard versions provided by the system."
12:30 boegel pdurbin: we briefly looked into it, well, the Red Hat counterpart at least
12:30 pdurbin ok. i chatted with some red hat guys about it at their summit this summer
12:30 boegel pdurbin: that's what EasyBuild does, but our focus is on HPC software
12:30 Itkovian Mind that EasyBuild development started three years ago, when there were no SoftwareCollections :-)
12:30 pdurbin sure. and that's our focus as well
12:31 boegel pdurbin: any of you guys planning to attend SC'12?
12:31 Itkovian but yeah, they seem to have the same idea
12:31 pdurbin well, how old is modules itself? 20 years? :)
12:31 pdurbin i think james always goes to SC...
12:31 Itkovian however, it would be hard for us to rely on RPMs, since we have custom/commercial software that needs to be installed too
12:31 pdurbin Itkovian: sure. us too. commercial software
12:32 Itkovian so yeah .. you know how that works out ...
12:32 boegel pdurbin: I'd love to chat with you guys at SC'12... Jens and I will be there the whole week
12:33 pdurbin http://blog dot jamesdotcuff dot net: disruptive things spotted so far at #SC11 - http://blog.jcuff.net/2011/11/disruptive-things-spotted-so-far-at.html
12:34 pdurbin boegel: i'll ask james to look for you :)
12:34 boegel pdurbin: I can schedule an (informal) meeting with him, if he's OK with that
12:34 boegel pdurbin: is he into beers? :)
12:35 pdurbin he's english. of course he is
12:36 boegel pdurbin: :D
12:36 boegel pdurbin: mailed our paper to the rchelp@ address
12:36 boegel pdurbin: maybe show it to James, and ask him if he's OK with meeting up at SC'12?
12:37 pdurbin sounds like a plan
12:42 boegel pdurbin: so, what's crimsonfu about? grouping together sysadmins who release their tools as open-source?
12:42 pdurbin it's an experiment of mine. not a harvard thing
12:43 pdurbin http://crimsonfu.github.com is my attempt to explain it :)
12:43 pdurbin we talk about puppet, chef, kvm, etc. you name it. you're welcome to hang out here
12:45 boegel pdurbin: we will :)
12:45 boegel pdurbin: ever heard of Quattor?
12:45 pdurbin crimsonfubot: google quattor
12:45 crimsonfubot pdurbin: quattor - fabric management for grids and clouds: <http://quattor.sourceforge.net/>; Quattor: <http://www.quattor.com.br/>; Quattor - Wikipedia, the free encyclopedia: <http://en.wikipedia.org/wiki/Quattor>; Quattor (company) - Wikipedia, the free encyclopedia: <http://en.wikipedia.org/wiki/Quattor_(company)>; Quattor, Quattor Petroquímica S.A., Company Profiles,: (1 more message)
12:45 pdurbin nope. thanks
12:45 boegel pdurbin: it's what we use here instead of Puppet to deploy all our systems
12:46 pdurbin interesting
12:46 boegel pdurbin: right now, about 500 spread out over 5 clusters
12:46 pdurbin sjoeboo just arrived with his suitcase. he's off to puppet conf :)
12:46 boegel pdurbin: :)
12:54 pdurbin Science Collaboration Framework | MIND Informatics - http://www.mindinformatics.org/node/3
12:54 pdurbin About | Science Collaboration Framework - http://sciencecollaboration.org
12:55 pdurbin "The Science Collaboration Framework (SCF) is a software toolkit to establish web-based virtual team organizations for researchers in biomedicine"
12:55 pdurbin "eXframe - a subproject of SCF - is a reusable framework for building genomics experiments repositories" http://sciencecollaboration.org/exframe
12:56 boegel nice
13:19 boegel pdurbin: how big is the HPC support team at Harvard, including sysadmins?
13:19 Itkovian also, how big are the cluster(s)?
13:20 Itkovian large variety of software?
13:21 pdurbin we don't run the only cluster at harvard but http://rc.fas.harvard.edu/about-rc/research-computing-staff/
13:22 boegel about 15 people, nice
13:22 boegel pdurbin: the systems are not university-wide?
13:22 pdurbin it's complicated :)
13:22 boegel pdurbin: it's only for the arts and sciences faculty?
13:22 boegel Itkovian: http://software.rc.fas.harvard.edu/ganglia/ganglia2_master/
13:23 boegel Itkovian: 20000 cores in total, about 1750 systems
13:23 Itkovian nice
13:25 pdurbin boegel: more on the way: http://en.wikipedia.org/wiki/Massachusetts_Green_High_Performance_Computing_Center
13:26 boegel pdurbin: that sounds promising :)
13:33 pdurbin :)
14:23 whorka Harvard has Google apps now: http://g.harvard.edu/
14:23 Itkovian Is everybody here from harvard?
14:24 boegel Itkovian: well, we're not :)
14:31 Pax Not everyone but a bunch :)
14:37 pdurbin Itkovian: invite your friends :)
14:37 Itkovian lol
14:37 Itkovian except for boegel, I think only jgtimmer lurks on IRC
14:53 boegel I'm starting to screw things up, so almost time to go home...
14:53 pdurbin heh. see ya
14:53 Itkovian I will not reply to that
14:55 * pdurbin reads http://techtalk.daudfam.net/mysql/mysqlinnodb-unable-to-lock-issue
15:02 pdurbin oh yeah, i forgot there's a http://dba.stackexchange.com
15:19 boegel ttyl guys
15:19 boegel and gals
15:19 pdurbin do we even have any gals? we should recruit some
15:55 pdurbin "got it started with innodb_force_recovery=5"
15:57 Pax So I'm probably the last person to have caught this.. but triggers in cobbler are cool!
15:57 Pax https://github.com/cobbler/cobbler/wiki/Triggers
16:06 pdurbin jimi_c: ^^
16:08 pdurbin Pax: we have /var/lib/cobbler/triggers/install/pre/clean_puppet.sh
16:30 pdurbin whorka: nothing under "additional services" for me :( #googleapps
16:40 whorka I had to set security questions and do a password change per http://g.harvard.edu/g-start.html
16:41 pdurbin yeah, i did that
16:41 pdurbin maybe i'm being punished for jumping the gun... for trying yesterday
16:41 whorka nah, we had people signing up yesterday too
19:05 pdurbin "heavy inserts"
19:08 pdurbin boegel: for the record, i got your paper via ticket #29067
19:19 ventz pdurbin: links reminder :)
19:23 pdurbin ventz: oh, the qcow2 corruption?
19:23 pdurbin ventz: here you go http://irclog.perlgeek.de/crimsonfu/2012-08-03#i_5871320
19:24 pdurbin related tweet: https://twitter.com/philipdurbin/status/233280438884515840
19:25 pdurbin which i sent to the author of this paper, whom i met at the red hat summit in july or whenever that was: The QCOW2 Image Format - http://people.gnome.org/~markmc/qcow-image-format.html
19:25 pdurbin nice guy. i'm sure he's busy
19:31 ventz pdurbin: did he ever respond to your tweet?
19:31 pdurbin nope
19:32 pdurbin in practice, we're not treating that corruption super seriously... the VMs seem fine. but i do want to clean it up. i have a half written nagios check
19:38 pdurbin ventz: i'm glad you've never seen this. it gives me hope
19:48 ventz what scares me, i haven't seen it on the old version either
19:48 ventz so now the question becomes did it screw up a file somewhere and i just never noticed
19:48 ventz but you are saying it screws up the whole VM image
19:55 pdurbin `qemu-img check` says so. but again the VM seems to work fine...
19:55 pdurbin though snapshots are a mess. i'm sure they aren't reliable. if even accessible
19:59 ventz hmm
20:02 pdurbin ventz: but you're a fan of NFS for VM disk images?
20:02 pdurbin we sometimes blame NFS for the qcow2 corruption. but we really don't know
20:02 pdurbin and haven't looked into it deeply
20:03 pdurbin i wonder what oVirt does. if it does the same qemu snapshotting under the hood
20:03 pdurbin JoeJulian likes gluster for VM disk image storage. i think :)
20:11 ventz pdurbin: i am, i like NFS in general and hate iscsi
20:11 ventz i looked a lot into iscsi b/c of the performance mentions
20:11 ventz and once i read up on it I gave up (well after i did my own tests and verified the performance mentioned in some white papers)
20:14 pdurbin agoddard: you love iscsi. fight! fight!
20:15 Pax why do you hate iscsi?
20:15 Pax we've got lots of it, and haven't seen any performance issue.. what were you guys seeing?
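
The "half written nagios check" mentioned at 19:32 is not shown in this log. As a rough sketch of how such a check could wrap `qemu-img check`, the Python below maps the exit status of `qemu-img check` onto the standard Nagios plugin statuses. The exit codes used (0 consistent, 2 corrupted, 3 leaked clusters, anything else treated as not completed) follow the qemu-img man page of recent releases and are an assumption for whatever version was in use here; the names classify and check_image are hypothetical.

```python
import subprocess

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(returncode):
    """Map a `qemu-img check` exit code to (nagios_status, message).

    Assumed meanings, per recent qemu-img man pages: 0 = consistent,
    2 = corrupted, 3 = leaked clusters but not corrupted. Any other
    value (1, 63, ...) is reported as UNKNOWN.
    """
    if returncode == 0:
        return OK, "image is consistent"
    if returncode == 3:
        return WARNING, "image has leaked clusters but is not corrupted"
    if returncode == 2:
        return CRITICAL, "image is corrupted"
    return UNKNOWN, "qemu-img check did not complete (exit %d)" % returncode


def check_image(path):
    """Run `qemu-img check` on path and return (nagios_status, message)."""
    try:
        proc = subprocess.run(
            ["qemu-img", "check", path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:
        # qemu-img is missing or not executable on this host.
        return UNKNOWN, "could not run qemu-img: %s" % exc
    return classify(proc.returncode)


if __name__ == "__main__":
    # Demonstrate the mapping without needing qemu-img or a disk image.
    for code in (0, 3, 2, 1):
        status, message = classify(code)
        print("%s: %s" % (LABELS[status], message))
```

A real plugin would print the single status line for one image and exit with the returned status. Checking an image that a running VM has open may not give meaningful results, so where and when to run this is a separate decision.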
20:19 pdurbin Pax: i think this was in ventz's basement :)
20:19 pdurbin ventz: no offense :)
20:22 pdurbin i like NFS too. easy
20:22 pdurbin but i worry about too many VMs on NFS
20:25 Pax :)
20:26 pdurbin don't make me link to red hat's doc again. the one that says don't use NFS for VM disk images in production
20:26 Pax I feel like often we "over buy" things like storage… so we spend more on FC or high end disk when lower end, cheaper solutions perform as good, or similarly
20:27 semiosis i hope if you're doing vms over nfs you're at least using tcp,noac,sync
20:27 semiosis but it still seems dangerous
20:27 agoddard we've had great success with iSCSI backed LVMs backing our old xen cluster, but there's a lot of crap involved in making sure you don't corrupt Volume Groups in homegrown setups..
20:28 agoddard I like the idea of orchestration tools using SAN APIs to carve out LUNs for VMs, but then it's $$ gear.
20:28 Pax agoddard: good, cheap or fast.. pick two :)
20:28 agoddard more and more I'm a fan of (insert as raw as possible, fast, shared storage) for persistent volumes, and then local LVM devices for ephemeral storage
20:29 semiosis pdurbin: have you heard of the work going on to make qemu talk directly to gluster (without going through a FUSE mount?)
20:29 agoddard Pax: +1
20:29 agoddard there's also ATAoE which seems pretty nice, but we haven't used it yet
20:30 agoddard Ceph Rados block devices sound cool too :D
20:30 pdurbin semiosis: somewhat. yes. the gluster thing
20:31 pdurbin comptona: you can keep telling us about ceph :)
20:31 pdurbin ata over ethernet, i guess? sounds like iscsi...
20:32 pdurbin crimsonfubot: lucky ataoe
20:32 crimsonfubot pdurbin: http://en.wikipedia.org/wiki/ATA_over_Ethernet
20:37 comptona pdurbin: is there anything in particular you'd like to know about Ceph?
20:40 pdurbin_m is it awesome?
20:40 JoeJulian Last I checked, and that was a long time ago in the area of clustered storage (and my ability to remember things) ceph had a central metadata server. Is that true today?
20:40 pdurbin_m are you using it in production yet?
20:41 comptona JoeJulian: yes, it has a central metadata server
20:41 comptona and yes, I'm using it in production, but at a very small scale (four nodes)
20:41 JoeJulian So how does it handle redundancy wrt metadata then?
20:42 comptona oh, I misunderstood your question
20:43 comptona so, the metadata service is provided by one daemon at a time
20:43 comptona but you can run as many of them as you want, and they talk amongst each other
20:43 comptona so if the current active one fails, the remainder hold an election and one of them becomes the new active service
20:44 comptona I'm not 100% sure of the mechanism by which the metadata servers coordinate
20:44 JoeJulian So it's active/passive[/passive...] Doesn't that create a bottleneck?
20:45 comptona possibly? I imagine the metadata is very small relative to actual data
20:45 comptona we don't actually use the metadata service ourselves
20:46 JoeJulian Oh? I thought that was a necessity in order to address your data.
20:46 comptona nope
20:47 comptona so, ceph has several different access methodologies
20:47 comptona you can use the librados library, which talks to the object store directly
20:47 comptona you'd manage metadata yourself in that case
20:47 comptona or you can use the kernel driver to create a block device
20:48 comptona which you'd have to format with a standard filesystem, and then handle metadata that way
20:49 comptona I think you only need the metadata service if you're using the S3-style object store or mounting a volume directly ("mount -t ceph x.x.x.x:/ /mnt/ceph")
20:49 comptona since we're just providing volumes to openstack instances, we don't have any file-level access directly to ceph
20:49 JoeJulian In the case of a block device, it's "striped and replicated across the entire storage cluster". How does it know which server to find any particular offset?
20:50 comptona basically a super-fancy hashing algorithm
20:50 comptona http://ceph.com/wiki/Custom_data_placement_with_CRUSH
20:52 comptona I won't pretend I understand all of the details behind the placement group stuff
20:54 JoeJulian Judging by the wiki, it's definitely grown up a lot over the last 2 1/2 years. :D
20:55 comptona I just learned about it for this project, but it definitely seems impressive
20:56 comptona the performance is generally really good, too
20:58 comptona the one thing I'm running into is that there's a pretty bad dropoff of small-block write speed from the raw performance level
20:58 comptona everything else is about as fast as I could expect it to be
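
To illustrate the "super-fancy hashing algorithm" answer at 20:50: the sketch below is not CRUSH, which is considerably more elaborate (it accounts for the cluster hierarchy and device weights). It uses rendezvous hashing, a much simpler technique, only to show how any client can compute which servers hold a chunk without asking a central lookup service. The server names, the chunk naming scheme and the function place are hypothetical.

```python
import hashlib


def _score(object_name, server):
    """Deterministic pseudo-random score for an (object, server) pair."""
    key = ("%s|%s" % (object_name, server)).encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16)


def place(object_name, servers, replicas=2):
    """Pick `replicas` distinct servers for an object by rendezvous hashing.

    Every client that knows the same server list computes the same answer,
    so no central table of object locations is needed.
    """
    ranked = sorted(servers, key=lambda s: _score(object_name, s), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    servers = ["osd-a", "osd-b", "osd-c", "osd-d"]  # hypothetical storage nodes
    for chunk in range(4):
        # A block device offset would first be mapped to a chunk name like this.
        name = "volume-1.chunk-%d" % chunk
        print(name, "->", place(name, servers))
```

A useful property of this scheme is that removing a server only moves the chunks that were placed on it; placements that did not involve that server stay where they were.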