Time  Nick        Message
12:37 pdurbin     another day, another kernel panic
13:04 SEJeff_work only 1?
13:05 SEJeff_work pdurbin, Do you run an internal kerneloops server?
13:14 pdurbin     SEJeff_work: is a kerneloops server like a netdump server?
13:14 * pdurbin   googles
13:15 SEJeff_work pdurbin, Ever seen kerneloops.org?
13:15 pdurbin     heh. cowsay: http://en.wikipedia.org/wiki/File:Linux-2.6-oops-parisc.jpg
13:16 SEJeff_work Ha!
13:16 SEJeff_work <3 HP-UX on PA-RISC
13:16 SEJeff_work There is an HP Visualize B2000 PA-RISC under my desk at home
13:16 pdurbin     visiting http://kerneloops.com for the first time right now
13:17 SEJeff_work The .com isn't it, I guess the org is down. Weird
13:17 pdurbin     oh wait, .ORG
13:17 pdurbin     The server at kerneloops.org is taking too long to respond.
13:17 SEJeff_work It is a site for tracking kernel oopses that upstream looks at for trends and whatnot
13:17 SEJeff_work Fedora / Red Hat's bug reporting tool, abrtd, has a kerneloops plugin
13:17 pdurbin     huh. cool. when it works :)
13:18 SEJeff_work So it will detect oopses and send them to a kerneloops server
13:18 SEJeff_work if you run your own kernel oops server, it can be configured to send them all there
13:18 SEJeff_work We haxor our kernels a bit so we need that sometimes :)
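
If you do run your own collector, pointing abrt's kerneloops reporter at it is a one-line config change. A sketch for abrt 1.x-era Fedora/RHEL; the file and option names may differ in other abrt versions, and the hostname below is a placeholder for an internal server:

    # /etc/abrt/plugins/Kerneloops.conf
    # (option name per abrt 1.x; verify against /etc/abrt/plugins/ on your hosts)
    SubmitURL = http://oops.internal.example.com/submitoops.php

Then restart abrtd (e.g. `service abrtd restart`) so it picks up the new target.
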
13:19 pdurbin     wow, not much here: http://wayback.archive.org/web/*/http://kerneloops.org
13:20 pdurbin     ok, here's one: http://web.archive.org/web/20080511210114/http://www.kerneloops.org/
13:20 pdurbin     "Oopses are collected from the linux-kernel mailing list (and a few related lists), the bugzilla.kernel.org bugzilla and from the client application that you can download from this page."
13:23 pdurbin     process: rm, call trace: xfs
13:23 pdurbin     our xfs history is a bit checkered: http://blog.jcuff.net/search/label/XFS
13:24 pdurbin     http://blog dot jamesdotcuff dot net: of search and discovery: when storage goes bump in the night - http://blog.jcuff.net/2012/01/of-search-and-discovery-when-storage.html
13:25 pdurbin     http://blog dot jamesdotcuff dot net: xfs minus fun and profit. - http://blog.jcuff.net/2012/04/xfs-minus-fun-and-profit.html
13:26 pdurbin     though i did enjoy dave chinner's talk on XFS: James Cuff - Google+ - http://blog.jcuff.net/2012/01/of-search-and-discovery-when-… - https://plus.google.com/111523359226039496180/posts/RDvMDvzbUag
13:26 pdurbin     i've been meaning to check if the slides are available yet
13:27 pdurbin     XFS: Recent and Future Adventures in Filesystem Scalability - Dave Chinner - YouTube - http://www.youtube.com/watch?v=FegjLbCnoBw
13:28 pdurbin     oh good, the slides are at http://xfs.org/images/d/d1/Xfs-scalability-lca2012.pdf via http://xfs.org/index.php/XFS_Papers_and_Documentation
13:30 pdurbin     "There's a White Elephant in the Room.... BTRFS will soon replace ext4 as the default Linux filesystem thanks to its unique feature set. Ext4 is now being outperformed by XFS on its traditionally strong workloads, but is unable to compete with XFS where it is traditionally strong. Ext4 has serious scalability challenges to be useful on current, sub-$10,000 server hardware. Ext4 has become an aggregation of semi-finished projects that don't play w
13:32 SEJeff_work xfs... color me not impressed
13:32 SEJeff_work It has the crash resiliency of reiserfs3
13:32 SEJeff_work meaning it eats itself just about every time the system is powered off hard while something is writing to disk
13:33 SEJeff_work However it does well with big 12 TB+ filesystems. We use ext4 instead of xfs as ext4 tends to break less
13:34 SEJeff_work especially now that ext4 has online resize that doesn't suck
13:34 pdurbin     my experience with xfs, which is limited to a few months, has been mostly negative (per the blog posts above) but dave chinner's talk is worth watching.  it's sort of a "state of the filesystems" talk at the end
13:35 pdurbin     i'm fuzzy on the details, would need to watch again, but i was left with the impression that all is not well in ext4 land
13:36 SEJeff_work Yeah I watched a talk from Ted Ts'o (ext* head cheese) at LPC last year about it. He changed some of the features, like the writeback behavior, to work more like XFS does
13:37 pdurbin     meanwhile, /proc/mdstat says this resync will be done in ~68 hours: "[>....................]  resync =  1.0% (79990656/7811891008) finish=4065.9min speed=31693K/sec"
13:37 SEJeff_work However, it still handles crashes better than xfs. If ext4 would just add sane snapshotting, it would be *good enough* for most use cases.
13:37 SEJeff_work ouch! How big is the volume?
13:47 pdurbin     `blockdev --getsize64 /dev/md1` returns 31997505568768, which is in bytes, so ~29 TB. 512-byte sectors, per `blockdev --getss /dev/md1` also: http://stackoverflow.com/questions/1027037/determine-the-size-of-a-block-device/2802956#2802956
13:54 SEJeff_work And it runs xfs?
13:54 pdurbin     yep
13:55 pdurbin     echo $[ `cat /sys/block/md1/size` * 512] gives me the same number, 31997505568768
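
The ~68-hour ETA in that /proc/mdstat line is bounded by the kernel's md resync throttles, which are plain sysctls (values in KB/s). A sketch of checking and cautiously raising them; the 50000 figure is illustrative, not a recommendation:

    # current floor and ceiling for resync throughput
    cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
    # raise the floor so the resync isn't starved by light foreground I/O
    echo 50000 > /proc/sys/dev/raid/speed_limit_min
    # then keep an eye on the new ETA
    watch -n 60 cat /proc/mdstat
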
13:58 pdurbin     what's the quickest, dirtiest way to set up graphing of website performance?
13:58 pdurbin     shuff: in opsview we called this performance data, i think
13:58 pdurbin     plots of server room temperature, for example
13:58 shuff       perfdata is a nagiosism, not an opsviewism
13:58 shuff       but you are correct
13:58 pdurbin     oh good!
13:59 pdurbin     opsview and nagios are terribly conflated in my mind
13:59 shuff       the standard Nagios check_http binary returns perfdata
14:01 pdurbin     shuff: so with stock nagios (3.3.1), can i just flip a switch and start collecting perfdata for a check_http check?
14:01 shuff       if you want to leverage your existing Nagios infrastructure, just put up http://nagiosgraph.sourceforge.net/
14:02 shuff       i suspect you are already collecting the perfdata - look at /var/log/nagios/perf* iirc
14:02 pdurbin     this looks to be it... http://nagios.sourceforge.net/docs/3_0/configmain.html#process_performance_data
14:04 pdurbin     nagios.cfg:process_performance_data=0
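
For reference, the relevant nagios.cfg directives (names per the Nagios 3.x documentation linked above); the template line is only an example, the stock sample file ships its own:

    process_performance_data=1
    service_perfdata_file=/tmp/service-perfdata
    service_perfdata_file_mode=a
    service_perfdata_file_template=$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
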
14:05 shuff       there should be nagiosgraph RPMs available from the usual purveyors
14:06 SEJeff_work Longer term, consider setting up graphite
14:06 SEJeff_work Ever see the talks "Metrics, Metrics, Everywhere" or "Metrics Driven Engineering"?
14:06 pdurbin     i was wondering if sensu collects and graphs perfdata out of the box
14:07 pdurbin     i've heard good things about graphite as well
14:08 SEJeff_work pdurbin, Graphite is a good backend and it is great for on-the-fly reporting stuff, but for a "monitoring dashboard" sort of thing, nothing beats slapping gdash on top of graphite
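
If graphite does end up in the mix, feeding it is just carbon's plaintext protocol on TCP 2003, one "metric value timestamp" line per sample. A quick smoke test; the metric name and hostname are made up:

    echo "www.prod.homepage.response_time 0.42 $(date +%s)" | nc graphite.example.com 2003
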
14:08 SEJeff_work pdurbin, Also we have a salt user who uses sensu and has some states online for it: https://github.com/blast-hardcheese/blast-salt-states/tree/master/sensu if you are interested
14:11 pdurbin     nice. agoddard is into sensu: http://irclog.perlgeek.de/search.pl?channel=crimsonfu&nick=&q=sensu
14:13 SEJeff_work Not really sold on logstash
14:13 SEJeff_work It is for converting logs into different formats, which is cool, but all of my logs are syslog
14:13 SEJeff_work About 30G / day
14:13 SEJeff_work So we skipped over logstash and are playing with graylog2 and elasticsearch. So far so good
14:45 * agoddard  reads backlog
14:57 agoddard    Graylog2 is awesome, but only listens on syslog & GELF. We originally switched to logstash so we could use elasticsearch (Graylog only rocked mongoDB). Now we just use both. Logstash instances collect & ship logs around everywhere, then indexers grab the logs from rabbitMQ and throw them into ES and Graylog2
14:57 agoddard    graylog2 gives us user management etc, so they work really well together
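
A bare-bones logstash sketch of that ES + GELF fan-out, skipping the RabbitMQ hop agoddard describes (the amqp plugin's options vary a lot between versions); plugin options here are from 1.1-era docs and the hostnames are placeholders:

    input {
      syslog { port => 5514 type => "syslog" }
    }
    output {
      elasticsearch { host => "es01.example.com" }
      gelf { host => "graylog2.example.com" }
    }
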
14:57 SEJeff_work agoddard, graylog2 moved away from mongo
14:57 SEJeff_work all syslog data is in elasticsearch
14:58 agoddard    sensu metric checks can go to AMQP or straight to OpenTSDB, Graphite etc.. they do the timestamp,name,value thing and plugins can be written in any language
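
A minimal sketch of such a metric plugin, assuming the common graphite-plaintext output convention (one "name value timestamp" line per sample, exit 0); the metric naming scheme is invented:

    #!/bin/bash
    # emit one sample of the 1-minute load average in graphite plaintext form
    echo "$(hostname -s).load_avg.one $(cut -d' ' -f1 /proc/loadavg) $(date +%s)"
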
14:58 SEJeff_work mongodb is fail. Graylog2 still uses mongo, but just for user preferences and graphing, which is annoying, but fine
14:58 SEJeff_work Thats pretty sexy
14:58 agoddard    SEJeff_work: ya, it was awesome when they switched to ES.
14:58 SEJeff_work Has anyone built something on top of OpenTSDB like graphite?
14:59 agoddard    we actually tunnel our logs from RC to MBL, logstash is killer for this w/amqp
14:59 SEJeff_work RC and MBL? Forgive me, I'm new here.
14:59 agoddard    afaik, no.. we're not using OpenTSDB yet in prod, but might end up having a similar setup to our LS/Graylog thing, where we have an archive and then a more realtime source for metrics
14:59 agoddard    RC->MBL, basically Data center 1 to data center 2
15:00 SEJeff_work agoddard, Have you looked at graphite + ceres?
15:00 SEJeff_work Oh right
15:00 SEJeff_work We have 40+ locations with servers
15:00 SEJeff_work centralized syslog replicated to 2 main locations
15:00 SEJeff_work with a cluster local aggregated copy in each cluster
15:02 agoddard    nice, what happens if sites can't see the centralized syslog?
15:02 SEJeff_work Well to minimize network issues, we have every site self-sustainable
15:02 agoddard    re: ceres, nope (/me googles..)
15:03 SEJeff_work ie: 2 LDAP + VIP, 2 DNS + VIP, local syslog aggregator + VIP
15:03 SEJeff_work then the local aggregator forwards to the central aggregator
15:03 SEJeff_work rsyslog can cache on disk or in memory and forward when it can reconnect
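
The rsyslog side of that looks roughly like this (legacy directive syntax; queue sizing and the aggregator hostname are placeholders):

    # buffer to disk while the central aggregator is unreachable, drain on reconnect
    $WorkDirectory /var/lib/rsyslog
    $ActionQueueType LinkedList
    $ActionQueueFileName fwd_central
    $ActionQueueMaxDiskSpace 1g
    $ActionQueueSaveOnShutdown on
    $ActionResumeRetryCount -1
    # "@@" forwards over TCP rather than UDP
    *.* @@central-syslog.example.com:514
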
15:03 agoddard    nice. We queue the shiz out of everything, so things catch up when the netsplit is over, but with Sensu we're going to add servers at each site too, so they can deal with a global queue
15:05 SEJeff_work Do you use rabbit or activemq with sensu?
15:05 SEJeff_work and why
15:06 agoddard    rabbit just 'cause we were already using it for a few apps & logstash
15:06 * agoddard  has a note to read this: http://blog.aggregateknowledge.com/tag/zeromq/
15:10 SEJeff_work agoddard, Have you seen the video of the guy who wrote amqp talk about why he wrote zeromq?
15:10 SEJeff_work also rabbitmq doesn't really do mesh at all if I recall
15:11 SEJeff_work activemq does it. It is a beast to configure in their gaudy XML config, but we have a 4-way mesh setup and it is stable
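
For the curious, the mesh comes from ActiveMQ's network-of-brokers configuration: roughly one connector per peer in each broker's activemq.xml. Attribute names are per the ActiveMQ docs; the broker hostname is invented:

    <!-- duplex="true" makes the link bidirectional, so the peer doesn't
         need a matching connector back to this broker -->
    <networkConnectors>
      <networkConnector name="to-brokerB" duplex="true"
                        uri="static:(tcp://brokerB.example.com:61616)"/>
    </networkConnectors>
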
15:11 agoddard    it does fanout, which.. might be similar? our topology is pretty basic & needs some love & attention
15:20 agoddard    SEJeff_work: haven't seen the video, would love to, I need to get more up to speed on 0mq
15:20 agoddard    SEJeff_work: you at harvard?
15:26 SEJeff_work agoddard, I am not at harvard
15:27 SEJeff_work I work for a "high frequency trading" company
15:27 SEJeff_work http://twit.tv/show/floss-weekly/195
15:27 agoddard    oh nice. I'm not at Harv either (but a lot of our environment is there, so we work a lot with the awesome research computing folks)
15:28 agoddard    thanks for the vid, will check it out
15:28 SEJeff_work Yeah I never went to college. Instead, I decided to fly remote control spy planes for the US Army. The Hunter and Shadow 200 specifically.
15:32 agoddard    nice. I used to fly human-controlled planes that did way less interesting stuff, just got my old logbook shipped to me, going to do a few hours flying this summer hopefully..
15:32 agoddard    I crashed the only RC plane I ever owned pretty quickly :(
15:35 * agoddard  has to netsplit, back on after lunch
15:38 SEJeff_work later
15:39 SEJeff_work Well this was a "rc" that had a radius range of 160km
15:39 SEJeff_work and a ceiling of 14k msl
16:36 pdurbin     moved our website to new hardware, bumped the ram. perfdata indicates almost no change but the user is happier. *shrug*  i'm going to lunch
16:39 pdurbin     grep -P "$IP\thttp\t" /tmp/service-perfdata | perl -lane 'print "@F[1,15]"' | while read i j; do echo "`date --date=@$i +%Y-%m-%d_%R` $j seconds"; done
16:43 pdurbin     i had simply uncommented '#service_perfdata_file=/tmp/service-perfdata' shuff. doesn't go to /var/log/nagios by default. anyway. lunch!
17:32 pdurbin     is `puppet agent --noop --no-daemonize --onetime --verbose --no-splay --debug` the way to see what puppet will do to a host without actually doing anything?
17:32 SEJeff_work pdurbin, puppetd --test --noop also works
17:33 SEJeff_work --debug is really intense
17:33 SEJeff_work Normally only necessary when you're troubleshooting an exec and the exec is failing for instance
17:35 pdurbin     --debug *is* intense, but i guess i need it to see what would happen
17:38 Pax         to see a noop run I normally just do 'puppetd -tv --noop'
17:40 pdurbin     yeah, that makes sense. thanks
17:40 pdurbin     wait -t implies -v, i think
17:41 pdurbin     yeah, it does
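
To recap the thread, two equivalent dry-run invocations; --test bundles --onetime, --verbose, --no-daemonize, --no-splay (plus a few other flags), which is why -t implies -v:

    # long form; add --debug only when digging into something like a failing exec
    puppet agent --noop --onetime --no-daemonize --verbose --no-splay
    # short form
    puppet agent --test --noop
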
17:54 pdurbin     i highly recommend that zeromq video.  you can also just listen to the audio, like i did (twice). i don't think i missed anything only having the audio.  just talking heads. http://www.podtrac.com/pts/redirect.mp3/twit.cachefly.net/floss0195.mp3