Time  Nick      Message
01:08 jimi_c    pdurbin: nice
01:08 jimi_c    we're talking about moving solr in-house, vs. using a 3rd party hosted service
03:09 pdurbin   jimi_c: i thought i'd be able to just throw any plain text into it (like the crimsonfu logs from https://github.com/crimsonfu/crimsonfu.github.com/tree/master/irclogs ) but it wants certain data
03:09 pdurbin   "Solr is built around the concept of schemas; it needs to know the shape of the data it is going to accept" -- Building a Search Engine with Nutch and Solr in 10 minutes | Building Blocks Knowledge Share - http://blog.building-blocks.com/building-a-search-engine-with-nutch-and-solr-in-10-minutes
03:09 pdurbin   so maybe nutch can help with this
03:16 jimi_c    you might like elasticsearch better than
03:17 jimi_c    then
03:17 jimi_c    http://www.elasticsearch.org/
03:18 jimi_c    their big selling point is that it's schema free
03:27 pdurbin   jimi_c: ah. i didn't realize. thanks
03:29 pdurbin   semiosis: you were trying to sell me on elasticsearch. i didn't know about the schema then: http://irclog.greptilian.com/sourcefu/2012-12-11#i_796
03:30 semiosis  schema?
03:30 semiosis  idk about the schema now
03:32 * pdurbin looks at http://solr-vs-elasticsearch.com again
03:33 pdurbin   jimi_c: but you can only feed elasticsearch json?
03:35 jimi_c    yes you feed it json, but you can feed it whatever json you want and it will index it based on how you want
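[editor's note] A minimal sketch of what "feeding it json" could look like for plain-text IRC logs like the ones discussed here. It only turns lines in this log's "HH:MM nick message" layout into JSON documents and does not talk to any search server. The field names (channel, timestamp, nick, is_action, message), the function name log_line_to_doc, and the date "2013-01-21" are assumptions made for illustration, not something elasticsearch or the crimsonfu logs require.

```python
import json
import re

# Matches lines such as "03:18 jimi_c    some text" and
# action lines such as "03:32 * pdurbin looks at ...".
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2})\s+(?P<action>\*\s+)?(?P<nick>\S+)\s+(?P<message>.*)$"
)


def log_line_to_doc(line, date, channel):
    """Turn one log line into a dict ready for json.dumps, or None if it does not parse."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # header rows and blank lines end up here
    return {
        "channel": channel,                             # hypothetical field name
        "timestamp": f"{date}T{match.group('time')}",   # date is supplied by the caller
        "nick": match.group("nick"),
        "is_action": match.group("action") is not None,
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = "03:18 jimi_c    their big selling point is that it's schema free"
    # "2013-01-21" is a placeholder date chosen for this example.
    print(json.dumps(log_line_to_doc(sample, "2013-01-21", "crimsonfu")))
```

Each resulting JSON object could then be sent to whichever indexer is chosen; a Solr setup would additionally need matching fields declared in its schema.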
03:38 pdurbin   ok. for the app at work, i think we'll be feeding it xml... DDI to be specific: Data Documentation Initiative - http://en.wikipedia.org/wiki/Data_Documentation_Initiative
03:38 pdurbin   and this sounds like us, honestly... "For our project we want to decide and to know which search-fields we have, so we need a schema" -- sentric » Why we chose Solr 4.0 instead of ElasticSearch - http://www.sentric.ch/blog/why-we-chose-solr-4-0-instead-of-elasticsearch
03:39 jimi_c    yea i'd agree with that
03:39 jimi_c    if that data's already in xml solr would be the way to go
03:39 pdurbin   yeah
03:39 pdurbin   here's the solr ticket i'm working on: https://redmine.hmdc.harvard.edu/issues/2656
03:39 jimi_c    you've got a constrained document format, creating a schema for that should be painless (and someone else has probably already done it)
03:40 pdurbin   jimi_c: right, I *think* someone has already written one for DDI: https://github.com/btp/ssedl-solr/blob/master/ddi-to-solr.xsl
03:41 jimi_c    oh xsl... i do not miss having to deal with that
03:41 pdurbin   http://irclog.iq.harvard.edu/dvn/2013-01-20 ... i was taking some notes earlier
03:42 pdurbin   i *would* like some more search options for the crimsonfu logs. i usually use ack locally. the search at http://irclog.perlgeek.de/search.pl?channel=crimsonfu is a little hit or miss
03:43 pdurbin   i mentioned nutch+solr to the developer of ilbot, the logging bot: http://irclog.perlgeek.de/ilbot/2013-01-20#i_6356316
03:43 pdurbin   at some point he had mentioned wanting to improve the search
03:45 pdurbin   jimi_c: here's the crimsonfu log data if you have any ideas: https://github.com/crimsonfu/crimsonfu.github.com/tree/master/irclogs ... not sure what the best solution would be
03:46 pdurbin   westmaas: heh. my favorite commit ever :) ... "I don't really do work" --westmaas · 4e7d079 · crimsonfu/crimsonfu.github.com - https://github.com/crimsonfu/crimsonfu.github.com/commit/4e7d0796e03d8af871080dbedda9b431ae4b6d6d
03:48 westmaas  pdurbin: :D
03:49 westmaas  did you get your sourcefu registration taken care of?
03:51 pdurbin   nope
03:53 pdurbin   jimi_c: oh. this was interesting... curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@tutorial.html"
03:54 pdurbin   i was able to throw an html file into solr with that. via How do I index HTML files into Apache SOLR? - Stack Overflow - http://stackoverflow.com/questions/13179754/how-do-i-index-html-files-into-apache-solr/13192390#13192390
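[editor's note] For anyone scripting the curl command above, here is a rough Python equivalent using only the standard library. It builds the same URL and a multipart upload with the form field "myfile", but only constructs the request; actually sending it assumes a Solr instance with the extract handler is listening at the address from the curl example. The function name build_extract_request and the generic application/octet-stream part type are choices made for this sketch (curl may label the file differently).

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(base_url, doc_id, filename, content, field="myfile"):
    """Build a POST to <base_url>/update/extract carrying one file as multipart form data."""
    # Same query parameters as the curl example: literal.id and commit=true.
    query = urlencode({"literal.id": doc_id, "commit": "true"})
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(
        f"{base_url}/update/extract?{query}",
        data=head + content + tail,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Inline bytes stand in for the contents of tutorial.html.
    req = build_extract_request(
        "http://localhost:8983/solr", "doc1", "tutorial.html", b"<html></html>"
    )
    print(req.get_method(), req.full_url)
    # To actually send it (needs a running Solr):
    #   from urllib.request import urlopen
    #   urlopen(req)
```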
04:13 pdurbin   "Google BBS Terminal – What Google would have looked like in the 80s" -- http://www.masswerk.at/googleBBS/
04:26 jimi_c    pdurbin: that's interesting
04:26 jimi_c    we're using drupal + the solr module, so i figure it does something similar to index its own pages
12:50 pdurbin   jimi_c: gotcha. yeah, there's a whole group: http://groups.drupal.org/lucene-nutch-and-solr
12:52 pdurbin   speaking of search, i guess i *have* mentioned http://lucy.apache.org here before: http://irclog.perlgeek.de/crimsonfu/2012-08-02#i_5868501
12:53 pdurbin   lucy/kinosearch is what moritz, the ilbot guy mentioned: http://irclog.perlgeek.de/ilbot/2013-01-21#i_6356809
12:57 pdurbin   here's what powers http://irclog.perlgeek.de/search.pl?channel=crimsonfu today: https://github.com/moritz/ilbot/commits/master/cgi/search.pl
21:16 semiosis  pdurbin: will you announce when it's safe to return to #sourcefu?
23:04 pdurbin   semiosis: absolutely i will :)