
--- Log opened Sat Feb 18 00:00:00 2012
--- Day changed Sat Feb 18 2012
rdbthose are pretty much the three reasons people come to amsterdam00:00
yashgarothI'm meeting family there anyway, but surely there's museums and stuff00:00
yashgaroththough I was there a couple years ago and saw all the museums, so hookers & drugs it is00:00
rdbyou used to be able to buy psilocybin mushrooms as a tourist, but they banned that in 2008, you can only buy a growkit now.  psilocybin truffles are still legal though00:03
Stee|rdb: How far is anywhere in the netherlands from anywhere else in the netherlands?00:03
* rdb hates governments.00:03
yashgaroth10 minutes00:03
rdbStee|, how do you mean that?00:03
Stee|like, how far would it take you to get to amsterdam00:03
rdbI live relatively close, but it probably would take me still an hour or two00:04
Stee|how long, rather00:04
Stee|clearly you should go get drunk with yashgaroth00:04
yashgarothwhere you at rdb, the hague?00:04
rdbI don't drink alcohol.00:04
rdbI don't think that ethanol brings me any effects that I find useful, so... plus its destructive, unlike many other drugs00:05
rdbyashgaroth, near gouda00:05
yashgarothah, I hope to visit the cheese market00:06
rdb<3 gouda cheese00:06
yashgarothawww yeeee00:06
Stee|I'm going to go lay down I think00:23
yashgarothsame here, g'night00:25
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has quit [Quit: Leaving]00:25
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap00:54
-!- Mokbortolan_1 [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Quit: Leaving.]00:55
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has quit [Ping timeout: 245 seconds]01:03
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has joined ##hplusroadmap02:23
-!- klafka [~textual@ip-64-139-28-14.sjc.megapath.net] has joined ##hplusroadmap02:38
-!- marainein [~marainein@114-198-65-190.dyn.iinet.net.au] has joined ##hplusroadmap02:55
-!- chris_99 [~chris_99@unaffiliated/chris-99/x-3062929] has joined ##hplusroadmap03:06
rdbthe more I learn about it, the more I want to get a magnetic implant03:48
-!- yottabit [~heath@unaffiliated/ybit] has joined ##hplusroadmap04:19
-!- yottabit [~heath@unaffiliated/ybit] has quit [Quit: Konversation terminated!]04:32
-!- ThomasEgi [~thomas@pppdyn-6e.stud-ko.rz-online.net] has joined ##hplusroadmap04:51
-!- ThomasEgi [~thomas@pppdyn-6e.stud-ko.rz-online.net] has quit [Changing host]04:51
-!- ThomasEgi [~thomas@panda3d/ThomasEgi] has joined ##hplusroadmap04:51
archelskanzure: crazy prices for this old mechanical stuff, http://www.sciquip.com/browses/detailed_item_view.asp?productID=26051&Mfg=MICROMANIPULATOR&Mdl=55005:06
chris_99what's that archels?05:16
archelstwo XYZ micromanipulators05:21
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Read error: Connection reset by peer]05:41
-!- augur [~augur@] has quit [Remote host closed the connection]06:12
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has joined ##hplusroadmap06:45
-!- ParahSailin__ [~parahsail@adsl-69-151-205-240.dsl.hstntx.swbell.net] has joined ##hplusroadmap07:15
-!- ParahSailin [~parahsail@unaffiliated/parahsailin] has quit [Ping timeout: 248 seconds]07:17
-!- JayDugger [~duggerj@pool-173-74-78-36.dllstx.fios.verizon.net] has quit [Quit: Leaving.]07:25
-!- anelma [~elmom@hoas-fe3ddd00-25.dhcp.inet.fi] has quit [Remote host closed the connection]08:33
-!- augur [~augur@] has joined ##hplusroadmap08:33
rdbevening actually08:42
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Quit: jmil]08:43
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has joined ##hplusroadmap08:59
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has quit [Remote host closed the connection]09:01
-!- jmil [~jmil@SEASNet-148-05.seas.upenn.edu] has joined ##hplusroadmap09:17
-!- elmom [~elmom@hoas-fe3ddd00-25.dhcp.inet.fi] has joined ##hplusroadmap09:19
-!- pasky_ [pasky@nikam.ms.mff.cuni.cz] has joined ##hplusroadmap09:37
-!- jrayhawk_ [~jrayhawk@nursie.omgwallhack.org] has joined ##hplusroadmap09:37
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Ping timeout: 240 seconds]09:37
-!- pasky [pasky@nikam.ms.mff.cuni.cz] has quit [Ping timeout: 240 seconds]09:37
-!- jrayhawk [~jrayhawk@nursie.omgwallhack.org] has quit [Ping timeout: 240 seconds]09:37
-!- gedankenstuecke [~bastian@phylomemetic-tree.de] has quit [Ping timeout: 240 seconds]09:37
-!- epitron [~epitron@unaffiliated/epitron] has quit [Ping timeout: 240 seconds]09:37
-!- epitron [~epitron@bito.ponzo.net] has joined ##hplusroadmap09:37
-!- gedankenstuecke [~bastian@phylomemetic-tree.de] has joined ##hplusroadmap09:37
-!- epitron [~epitron@bito.ponzo.net] has quit [Changing host]09:37
-!- epitron [~epitron@unaffiliated/epitron] has joined ##hplusroadmap09:37
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap09:38
chris_99anyone heard of transcranial direct current stimulation09:43
archelsSome dude in ##neuroscience just asked about it. ;)09:49
mag1strateisn't it mostly used for psychological disorders?09:50
chris_99yeah i know, thats why i was asking archels09:51
chris_99and yes it does seem so, although it might be similar to TMS09:52
chris_99in some ways09:52
mag1stratehmmm I had no clue09:54
mag1strateit would be interesting to look into something like this09:54
chris_99yeah and it looks easy to experiment with09:56
chris_99as its just low current DC09:56
chris_99low voltage too09:56
chris_99although i don't really fancy attaching electrodes with electricity to my head09:57
mag1stratethats usually a smart move lol09:58
mag1stratethe problem is I dont see any practical applications for it09:58
mag1stratemost is used to treat disease09:59
mag1strateunless we can control where the current will be flowing to09:59
chris_99could could make a DIY TMS device09:59
mag1strateyou can make one10:00
chris_99oops bad spelling there10:00
chris_99has anyone done that?10:00
mag1stratebut it would be nice to experiment with where the electrodes would go to make it beneficial10:00
kanzureyes lots of others in here have heard about tdcs10:12
kanzureand i think one or two built a tdcs setup10:12
kanzurealthough i think collectively this channel has more experience with magnetic stimulation10:13
-!- strages_home [~strages@adsl-98-67-175-14.shv.bellsouth.net] has joined ##hplusroadmap10:13
chris_99has anyone built a magnetic stimulation device?10:15
kanzuresuperkuh worked on something10:18
kanzureat the moment i'm more interested in ultrasound stimulation10:18
mag1strateI enjoy that I can't give myself and want to share with someone special.10:19
mag1strateI enjoy that I can't give myself and want to share with someone special.10:19
mag1stratemy middle click button is a paste and enter button10:19
kanzure"it is thought that the nonthermal actions of US are understood in terms of cavitation - for example, radiation force, acoustic streaming, shock waves, and strain neuromodulation, where US produces fluid-mechanical effects on the cellular environments of neurons to modulate their resting membrane potentials."10:20
kanzure"The direct activation of ion channels by US may also represent a mechanism of action, since many of the voltage-gated sodium, potassium, and calcium channels influencing neuronal excitability possess mechanically sensitive gating kinetics (Morris and Juranka, 2007)."10:20
chris_99oh i've not heard of ultrasound stimulation10:20
mag1stratedo you know of any positive effects of this kanzure?10:21
chris_99i'm hopefully in the process of ordering some ultrasound transducers from china10:21
kanzuremag1strate: 2mm targeting of regions in the brain10:21
mag1strateI can see maybe positive effects on the cellular environment level10:21
kanzureit's neural stimulation10:21
kanzuresoo if you have a 2mm chunk you want to stimulate somewhere.. it's pretty useful10:21
mag1strateI've never really seen US used on the brain10:21
kanzurerTMS has more like 1cm resolution10:21
kanzuremag1strate: check those papers..10:21
kanzureone of the studies was to remove an inoperable brain tumor10:22
kanzureby melting the tumor.10:22
mag1stratewas it successful?10:25
mag1strateif it was for melting the tumor, would it affect actual brain tissue?10:26
kanzurereally the ideal setup would be one where you can apply a certain amount of energy to any location within the brain10:26
kanzuremag1strate: the brain tumor study was just a high-power version10:26
kanzurehere's a low power version:10:27
mag1strateit seemed the low power version seemed almost like an impulsive shock to the brain area10:28
kanzureyes.. it's a mechanical compression wave that goes into the skull10:29
kanzurewhen you have 10 or 50 transducers the compression waves add up10:29
kanzureso when they geometrically intersect the power delivery increases10:30
kanzureerm.. the total mW/mm^2 increases. you get the idea.10:30
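A toy sketch of the "compression waves add up" point, not a model of any real device. It assumes unit-amplitude continuous waves in 2D, no attenuation, no skull, and made-up values for the sound speed and frequency. Each transducer gets a phase offset chosen so that all waves arrive in phase at the focus; the function names and the ring geometry are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 1500.0  # m/s, assumed value for soft tissue
FREQUENCY = 500e3        # Hz, assumed value for illustration


def ring_of_transducers(count, radius):
    """Place `count` point transducers evenly on a circle (2D toy geometry)."""
    return [
        (radius * math.cos(2 * math.pi * i / count),
         radius * math.sin(2 * math.pi * i / count))
        for i in range(count)
    ]


def field_amplitude(transducers, focus, point):
    """Summed wave amplitude at `point` when every source is phased for `focus`.

    Each source contributes a unit phasor. The per-source phase offset
    cancels the travel phase to the focus, so the phasors line up there.
    """
    k = 2 * math.pi * FREQUENCY / SPEED_OF_SOUND  # wavenumber
    total = 0j
    for source in transducers:
        travel_phase = k * math.dist(source, point)
        focus_offset = k * math.dist(source, focus)
        total += cmath.exp(1j * (travel_phase - focus_offset))
    return abs(total)


ring = ring_of_transducers(10, 0.1)
focus = (0.02, 0.01)
at_focus = field_amplitude(ring, focus, focus)
off_focus = field_amplitude(ring, focus, (0.025, 0.01))
print(at_focus, off_focus)
```

In this idealised picture the amplitude at the focus equals the number of transducers, and intensity goes as amplitude squared, which is why a point a few millimetres away receives much less power per unit area.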
kanzurei just had the most fascinating time traveling dream10:31
kanzureapparently it's a dream of mine to one day own giraffes and t-rex's and force them to fight against each other10:32
mag1strateThat would be really cool actually10:32
mag1strateI have always wanted sharks with lazer beams on their heads :/10:33
kanzurea shark tank doesn't cost that much10:33
chris_99is their presentation online?10:40
mag1stratekanzure: lol10:40
kanzurehowever you might have to go wrestle your own shark off the coast10:41
mag1strateThats the easy part10:48
mag1stratethe hardest part is the lazer beams10:48
ParahSailin__as insty would say "faster, please"10:49
kanzurei prefer this edit of doc brown: http://www.youtube.com/watch?v=KJRh-37H4fA11:02
chris_99isn't it rather dangerous someone could get the DNA for the pneumonic plague off the net?11:12
rkosi think synthesizing companies ban sequences of too dangerous things11:14
chris_99thats scary as hell to me though11:14
kanzurechris_99: doesn't matter, they already have it11:14
chris_99who already have it?11:15
kanzureobscurity is not security11:15
chris_99true i agree with that normally, but in this case11:15
kanzurethe best defense against plagues is a biological solution11:15
kanzurewe have an immune system for a reason11:15
chris_99is there a vaccine for the plague?11:16
kanzureif there isn't, sounds like an important thing to make, no?11:16
chris_99it does yeah, but i'm really suprised the DNA is available11:17
Stee|no hangover, this is good11:18
rkosis it available?11:18
rkosbut dont dna foundries refuse to synthesize sequences of viruses etc?11:19
kanzurejrayhawk_: do you have any interest in doing a mirror of ftp://ftp.ncbi.nlm.nih.gov/genomes/11:20
jrayhawk_Could do.11:20
-!- jrayhawk_ is now known as jrayhawk11:21
kanzureto my knowledge, there are no mirrors11:21
kanzurewhich is bad.11:21
kanzurewell, there should be a non-institutional mirror somewhere11:23
jrayhawkWhat's wrong with institutional mirrors11:23
kanzureand some of these mirrors look a bit stale (that last one was from 2010?)11:23
kanzurei don't trust universities to always keep them up11:23
kanzureif feds come knocking, etc.11:23
jrayhawkAh, I see.11:23
kanzurealso, apparently i don't trust these universities to keep their mirrors current o__o11:24
kanzurenice set of backups: http://ftp.cbi.pku.edu.cn/pub/database/11:24
kanzureweird they're using some perl module i think, but it doesn't appear on cpan11:26
-!- delinquentme [~asdfasdf@c-67-171-66-113.hsd1.pa.comcast.net] has joined ##hplusroadmap11:26
Urchinmirrors of what?11:49
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has quit [Ping timeout: 260 seconds]12:03
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has joined ##hplusroadmap12:08
kanzureUrchin: gnomes12:27
jrayhawkthe world has enough gnome mirrors12:29
-!- Technicus [~Technicus@] has joined ##hplusroadmap12:33
-!- Technicus [~Technicus@] has quit []12:40
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has joined ##hplusroadmap12:47
kanzurehi jrayhawk12:50
kanzureerm.. Jaakko9612:50
ParahSailin__kanzure, i saw that mutual banking essay on your pdf directory -- this is similar http://praxeology.net/FDT-VS.htm12:59
-!- Mokbortolan_1 [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap13:20
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Ping timeout: 265 seconds]13:22
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has joined ##hplusroadmap13:35
-!- archels [~foo@sascha.esrac.ele.tue.nl] has quit [Ping timeout: 245 seconds]13:35
kanzurehi yashgaroth13:37
kanzurewhy would onLoadFinished() be called 3 times for saks, but not 3 times for heybryan? http://pastebin.com/4csM0qSC13:41
kanzure^for anyone who wants to help out with some javascript13:41
-!- archels [~foo@sascha.esrac.ele.tue.nl] has joined ##hplusroadmap13:47
-!- ParahSailin__ [~parahsail@adsl-69-151-205-240.dsl.hstntx.swbell.net] has quit [Ping timeout: 260 seconds]13:51
-!- ParahSailin__ [~parahsail@adsl-69-151-205-240.dsl.hstntx.swbell.net] has joined ##hplusroadmap14:07
kanzureaha.. http://code.google.com/p/phantomjs/issues/detail?id=12214:13
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has quit [Ping timeout: 240 seconds]14:14
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has quit [Ping timeout: 260 seconds]14:21
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has quit [Quit: Nettalk6 - www.ntalk.de]14:28
-!- strages_home [~strages@adsl-98-67-175-14.shv.bellsouth.net] has quit [Ping timeout: 245 seconds]14:36
-!- strages_home [~strages@adsl-98-67-175-14.shv.bellsouth.net] has joined ##hplusroadmap14:38
-!- jmil [~jmil@SEASNet-148-05.seas.upenn.edu] has quit [Read error: Operation timed out]14:38
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has joined ##hplusroadmap14:40
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has quit [Client Quit]14:42
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has joined ##hplusroadmap14:59
kanzurehi ianmathwiz714:59
ThomasEgi hoho ianmathwiz7 , long time no chat^15:06
ianmathwiz7I've been on the ##biohack channel from time to time15:06
ianmathwiz7but I haven't been spending too much time on IRC, lately15:07
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has joined ##hplusroadmap15:13
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has quit [Quit: ChatZilla 0.9.88 [SeaMonkey 2.7.1/20120208224119]]15:18
kanzurewin 1315:22
-!- augur [~augur@] has quit [Remote host closed the connection]15:23
bkerolose 1215:23
-!- chris_99 [~chris_99@unaffiliated/chris-99/x-3062929] has quit [Quit: Leaving]15:29
ParahSailin__like bitcoin but different16:05
kanzurewhat is this?16:05
kanzureseems to be 24 MB16:05
kanzurehmm this is a very poorly organized collection16:06
kanzure<tr><td><a href="http://bib.tiera.ru//ShiZ/Homelab/spec116/Kulagina M.A., Kiseleva N.A. Osnovy tehnologicheskogo proektirovaniya sborochno-svarochnyh cehov.16:07
kanzure<tr><td><a href="http://bib.tiera.ru//ShiZ/Homelab/spec107/Evstifeev A.V. Mikrokontrollery AVR semejstva Mega. (2007).djvu">16:08
kanzure<tr><td><a href="http://bib.tiera.ru//dvd54/Cavaleiro A. - Nanostructured Coatings(2006)(955).pdf">Cavaleiro A. - Nanostructured Coatings</a><td>pdf<td>en<td>16:08
kanzurehow the fuck am i supposed to fix this?16:08
superkuhI don't know. But at least it's easy to search through and extract URLs. Neat.16:10
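A minimal sketch of pulling URLs and titles out of table rows like the ones pasted above, using only the Python standard library. The class and function names are hypothetical, and the sample row is the one from the log; nothing is assumed about the meaning of the trailing `<td>` cells.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


row = ('<tr><td><a href="http://bib.tiera.ru//dvd54/Cavaleiro A. - '
       'Nanostructured Coatings(2006)(955).pdf">Cavaleiro A. - '
       'Nanostructured Coatings</a><td>pdf<td>en<td>')
for href, title in extract_links(row):
    print(title, "->", href)
```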
superkuhTangentially, http://erewhon.superkuh.com/library/ - always on (fast) daily mirror of my library for when I turn off the main server.16:10
kanzuresuperkuh: we need some way of helping people who keep these mirrors to have better indexes16:11
kanzurefor instance, at the moment i can't confirm that their copy of journal xyz is complete or not16:11
kanzureand they probably don't know which papers belong to which journals either16:11
kanzurebibtex could probably work for this? with some tools to search through directories and get file hashes16:12
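A rough sketch of the "search through directories and get file hashes" tool, under the assumption that a plain mapping from relative path to SHA-256 digest is a useful first step. The function names are hypothetical; pairing each digest with a BibTeX record is left out.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to `root`) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of_file(full_path)
    return manifest
```

A manifest like this only identifies byte-identical copies; two downloads of the same article that differ by a watermark would hash differently.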
-!- marainein [~marainein@114-198-65-190.dyn.iinet.net.au] has quit [Ping timeout: 260 seconds]17:29
-!- ThomasEgi [~thomas@panda3d/ThomasEgi] has quit [Remote host closed the connection]17:43
uniqanomalyhttp://news.slashdot.org/story/12/02/18/2130245/universities-agree-to-email-monitoring-for-copyright-agency :>18:14
kanzureeww slashdot18:31
kanzurehow is that not dead yet18:31
kanzure<tr><td><a href="http://bib.tiera.ru//DVD-034/Agrawal_S._Protocols_for_Oligonucleotide_Conjugates[c]_Synthesis_and_Analytical_Techniques_(1993)(en)(390s).pdf">Agrawal S. Protocols for Oligonucleotide Conjugates[c] Synthesis and Analytical Techniques</a><td>pdf<td>en<td>255<td>21154034</tr>18:56
kanzurehttp://bib.tiera.ru//dvd57/Foster G. D. (Ed), Taylor S. (Ed) - Plant Virology Protocols, Vol. 81(1998)(571).rar19:01
kanzureprobably useless http://bib.tiera.ru//DVD-022/Graham_C.A._(ed.),_Hill_A._(ed.)_DNA_Sequencing_Protocols_(2001)(2-nd)(en)(244s).rar19:01
fennbooks books books books books books books books books19:05
kanzurefenn: help me figure out a way to fix this19:05
fennwhat's the problem?19:05
kanzurelots of people scrape and download lots of this content (which is great)19:05
kanzuretheir organization skills suck19:05
kanzureand they can't show that they have a complete collection for a given journal, edition, volume, or whatever19:06
kanzureideally there's some folder organization scheme, or metadata file format that we should all be using19:06
kanzureand writing tools to use.19:06
kanzureand then some web service that collects this information and maps it all together.19:06
fenni assume the metadata is publically available19:06
kanzureit's definitely available at the original source19:06
fennthen it's a "simple matter of coding" to match the files to the metadata19:06
kanzurebut most people just take a pdf and give it a title and they feel they are done19:07
kanzurei know that's what i did (because i'm an idiot)19:07
fennunfortunately you will probably have to rewrite parsing code for each publisher19:07
kanzurewell, how about bibtex19:07
fennyeah it's the curse of filesystems19:07
fennwhat, have a bibtex file for every pdf? i think the problem is people only download a pdf so the (computer parseable) metadata gets lost19:08
kanzurewell let's only consider scraping scenarios19:08
fennyou could build up a hash table of pdf to metadata19:08
kanzurewhere the programmer can afford to take extra metadata19:08
fennsha256(foo.pdf) => metadata19:08
kanzureexcept the pdfs are modified each time you download them19:08
fenno rly19:08
fennwhat doesn't change then19:09
kanzureyeah like AdobePDFStamperPro (watermarking)19:09
kanzuresometimes there's metadata in the file itself19:09
fennis there a title field or something at least that stays the same?19:09
kanzuresometimes, but for the vast majority of content it's just a scanned image19:09
fennhow does mendeley do it?19:09
kanzureproprietary ocr19:09
kanzureas a fallback.19:09
fennhow hard is OCR then?19:10
fenni mean, you just need to match the text to something in a list of titles19:10
kanzuretesseract was pretty awful. i don't know, i think the pain of ocr probably goes down for sufficiently large collections19:10
kanzureanyway, let's assume this isn't a problem19:10
kanzurelet's assume that programmers will scrape metadata too19:10
kanzurei'm thinking of a particular scenario where people are scraping content19:10
kanzureand contributing it to the master collection19:11
kanzureand then being able to say "Ok, the internet has pages 1-400 of journal xyz" based on collected records on the server19:11
kanzure"THE INTERNET" well.. "this public service"19:11
kanzurethen if you are feeling like you want to contribute, maybe you'd write a compatible scraper to gather, dump and upload data for "x, y and z" that the server says is missing19:12
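One way the server-side "what is missing" report could work, as a sketch only. It assumes each contributed record carries an inclusive (first page, last page) range for a journal volume; the record shape and function names are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive page ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(held, first_page, last_page):
    """Return the page ranges in [first_page, last_page] not covered by `held`."""
    gaps = []
    cursor = first_page
    for start, end in merge_ranges(held):
        if end < cursor:
            continue  # range lies entirely before the part still unchecked
        if start > last_page:
            break     # range lies entirely after the volume
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= last_page:
        gaps.append((cursor, last_page))
    return gaps


held = [(1, 120), (100, 250), (300, 400)]
print(missing_ranges(held, 1, 400))  # [(251, 299)]
```

A scraper author could then be pointed at exactly the gaps the service reports instead of re-downloading a whole volume.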
fennhow to keep it from falling apart again when the site gets shut down?19:13
fennpresumably people would have backups or partial backups19:13
kanzureand the software is separate anyway19:13
fennbut if the pdf hash changes..19:13
kanzure*shrug* that just hurts verification19:14
kanzureweb of trust bullshit can maybe offset that19:14
kanzure"hey look, all the contributions from anonymous user with this public key are all awful"19:14
fennoh i didnt even think of that19:14
fennsomething like git annex would be preferred19:15
fennso you as the library maintainer would only accept patches that look legit19:15
fenni've never seen any working web of trust software19:15
kanzuresure. and again different computers and friendlies will contribute 'chunks' of scraped content / metadata that gets distributed from some primary server19:15
fenn(doesnt mean it doesnt exist)19:15
kanzurewell, i do think contributions would need to be reviewed somehow19:16
kanzurea basic method might be "show us your scraper"19:16
fennwouldnt looking at the content make more sense?19:16
kanzureyes but how am i going to manually look at 7 million articles a month19:17
kanzurethere might be some computational way to determine if a paper looks like what it says it is19:17
fennyou do a random sampling19:17
kanzurewell ok.19:17
fennyes, document clustering19:17
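The random-sampling idea in its simplest form, as a sketch: draw a fixed-size sample from a contribution for manual review. The function name is hypothetical, and how the reviewed sample turns into an accept or reject decision is not covered.

```python
import random


def pick_review_sample(items, sample_size, seed=None):
    """Pick up to `sample_size` distinct items for manual inspection.

    Passing a seed makes the draw reproducible, so a second reviewer
    can check the same files.
    """
    rng = random.Random(seed)
    pool = list(items)
    return rng.sample(pool, min(sample_size, len(pool)))


contribution = ["paper_%05d.pdf" % n for n in range(20000)]
print(pick_review_sample(contribution, 5, seed=1))
```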
kanzureso far i've met nobody that has indicated they would want to spam a service like this19:17
fennthere really arent very many people working for journals19:17
fennmany many more pissed off students who don't have access and want to fix this broken situation19:18
kanzureright.. and "Here's 20,000 files with some bad names" does not help as much as it could19:18
fenni think you're just going to have to ignore filenames19:18
kanzurefine by me19:18
fennstart with the nature archive19:19
kanzurei don't want to bother with parsing everyone's horrible dump19:19
fennget metadata, figure out how to computationally extract matching metadata from the articles themselves19:19
kanzurewhat's wrong with me forcing them to go get metadata19:19
kanzureyeah, i can't easily reverse from the nature archive "what the original url was so i can go grab metadata"19:19
fennwell, how do you attach the metadata to the files in the first place?19:19
kanzurebibtex, some simple yaml format, let's make something up19:20
kanzurethen that will be the definition, and the service will grow around that definition.19:20
* fenn reads about bibtex19:20
kanzureit's latex except a citation subset19:20
kanzurei don't know. everything supports bibtex.19:20
fenn"It is possible to use BibTeX outside of a LaTeX-Environment, namely MS Word using the tool Bibshare. " okay but how is this supposed to help for the pdf scenario19:21
kanzureexample: http://diyhpl.us/~bryan/sciencedirect/microelectronics.journal.txt19:21
fenncan't you just cat foo.bibtex >> foo.pdf19:21
kanzurewell i've always considered a .pdf.tar format that would include that, but whatever19:21
kanzureyes there's metadata portions in the pdf format, and probably some way to attach files.. but i don't care that much19:22
kanzureso my example link19:22
kanzurei had a bibtex file for each volume of each issue of this journal19:22
kanzureand cat'd it together. so think of this as a single issue of a journal with a looong list of articles19:22
fenn.pdf.tar sucks because your pdf indexing software won't read it19:23
kanzureproprietary pdf indexing software is not the solution anyway19:23
kanzureit's a part of the problem. proprietary ocr? thanks mendeley, that doesn't actually help19:24
fenntracker-search does pdf indexing19:24
fenntime tracker-search transcranial19:24
fennreal    0m0.080s19:24
fennunfortunately it returns urls instead of paths19:24
kanzurehow is this better than just keeping the original metadata from the webpage clean19:25
fennit works with just the pdf files19:25
kanzurei don't trust pdf files to have this information, and i don't trust ocr that much19:25
fennessentially you have to write a scraper either way19:25
fennscrape metadata from the webpage, or scrape it from the pdf19:25
kanzurewell you have to get the webpage anyway to get to the pdf19:26
fenni'd rather scrape from the pdf because we already have those19:26
kanzurebut you can't do that reliably :/19:26
fennyou're guaranteed to have the pdf, but the webpage could disappear, change, or never exist in the first place19:26
kanzurelots of pdfs are just images19:26
kanzurewell you only get the webpage once, you see19:26
kanzureafter you extract all the data it might as well disappear (who cares)19:26
fennokay maybe you should just do that first19:26
fennget all the journal metadata19:27
fennthere's "only" 80,000 journals19:27
kanzureright, i'm presently doing that for most of elsevier (although it seems they hide some of their metadata to me unless i'm on a university network)19:27
fennwow really? what do they hide?19:27
kanzurelike issues19:27
kanzureone of the journals i was looking at went back to 200319:27
kanzurebut on another computer, i saw that it went back to 1990something19:28
kanzuresame site.19:28
fennis this data considered "copyright"? i mean it's basically a library card catalog, so there shouldn't be any problem with hosting journal metadata out in the open19:28
kanzurei'm sure they will complain anyway19:28
fennbut the mere existence would help others doing the same sort of thing19:29
fenni mean there's no reason everyone should have to scrape metadata from the web19:29
kanzureand then we can coordinate multiple scrapers simultaneously or at least help people to not duplicate work19:29
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has joined ##hplusroadmap19:29
kanzureso, i guess it's just a matter of bibtex+pdf?19:34
fennbibtex is just a text format19:34
fennhow does that fix anything?19:34
kanzurewell ultimately what we need from a scraper is the pdf plus metadata19:34
kanzurei guess not. that doesn't fix it.19:35
fennyou want to know if a collection is "complete", no?19:35
kanzurei want each collection to have an index and know what it has19:35
kanzureand then to describe each item.19:35
fennthe index could just be bibtex19:35
fennone big file with bibtex entries for all pdf files in the collection19:36
kanzurei guess so. does that cover everything?19:36
fennalternatively, you could have one bibtex file for each pdf file19:36
fennthey amount to the same thing i guess19:36
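A sketch of the "one big file" variant: write one BibTeX entry per PDF into a single index. The `file` field and the sample record are assumptions for illustration, not an agreed format, and values containing unbalanced braces would need escaping that this sketch does not do.

```python
def format_bibtex_entry(key, fields, entry_type="article"):
    """Render one BibTeX entry from a citation key and a dict of fields."""
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in fields.items():
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


def build_index(records):
    """Join entries for every record into a single index, sorted by key."""
    entries = [format_bibtex_entry(key, fields)
               for key, fields in sorted(records.items())]
    return "\n\n".join(entries) + "\n"


records = {
    "doe2012": {  # hypothetical record
        "title": "An Example Article",
        "journal": "Journal XYZ",
        "year": "2012",
        "file": "doe2012.pdf",
    },
}
print(build_index(records))
```

Splitting the same output into one `.bib` file per PDF is a matter of writing each entry to its own file, which is why the two layouts amount to the same thing.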
fennneither one "sticks" to the pdf file though19:37
fennit would be nice to somehow append bibtex to the pdf19:37
fenni dont know enough about modifying pdf's to know if this is easy19:37
fennpdf ends with %%EOF so you could just literally append bibtex and it shouldnt hurt anything19:38
kanzurepdftk html_tidy.pdf attach_files command_ref.html to_page 24 output html_tidy_book.pdf19:38
fennmetadata doesn't need to be visible in the pdf viewer19:38
kanzureyou mean EOF or a literal that says '%%EOF'19:39
fenntail foo.pdf19:39
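A sketch of the "literally append bibtex" idea, working on bytes so it can be tried without touching real files. The marker string is made up for this example. Whether a given PDF reader tolerates data after `%%EOF` is an untested assumption here; readers that look for `%%EOF` only near the end of the file may reject a long appendix.

```python
MARKER = b"\n%%BIBTEX-SIDECAR\n"  # hypothetical marker, not part of any standard


def append_bibtex(pdf_bytes, bibtex_text):
    """Return the PDF bytes with a BibTeX record appended after %%EOF."""
    if b"%%EOF" not in pdf_bytes[-1024:]:
        raise ValueError("no %%EOF marker near the end of the file")
    return pdf_bytes + MARKER + bibtex_text.encode("utf-8")


def read_bibtex(pdf_bytes):
    """Return the appended BibTeX text, or None if no marker is present."""
    _head, separator, tail = pdf_bytes.rpartition(MARKER)
    if not separator:
        return None
    return tail.decode("utf-8")


fake_pdf = b"%PDF-1.4\n(placeholder body)\n%%EOF\n"
stamped = append_bibtex(fake_pdf, "@article{doe2012,\n}")
print(read_bibtex(stamped))
```

Appending changes the file's bytes, so any hash taken before the append no longer matches afterwards.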
fennbbl maybe19:40
kanzurewell that's weird.19:41
kanzureso doesn't citeseer do citation tracking19:42
kanzureor citeulike19:42
kanzureand i guess mendeley has an ok collection by now.19:42
kanzure"Scientific Literature Digital Library incorporating autonomous citation indexing, awareness and tracking, citation context, related document retrieval, similar"19:42
fenni think this is not solving the same problem19:45
kanzureno not quite19:45
fennthere's the forwards metadata problem (start with a catalog, link to the articles)19:45
fennand the backwards metadata problem (start with the articles, link to the catalog)19:45
fennwow citeseer search sucks balls19:50
fenn"here are some articles that contain keywords that you typed in, in no particular order"19:51
fenndocument clustering is probably more useful than journal indexes anyway20:04
fenni'm wondering how many of my books are essentially duplicates20:05
fennwhat's bib.tiera.ru?20:06
kanzureno clue20:06
kanzurejust found it today20:06
kanzurelooks a bit more hand-curated than libgenesis20:06
kanzureIf you have any stuff to upload (or to donate), <a href='mailto:s@tiera.ru'>write us</a><br>20:08
kanzurehttp://f2.tiera.ru//TEXTBOOKS3/ELSEVIER-Referex/1-Chemical Petrochemical and Process Collection/CD3/RICE, R. G. (1994). Applied Mathematics and Modeling for Chemical Engineers/03771_08.pdf20:14
kanzurehmm ELSEVIER-Referex?20:14
fennwonder what's up with their css and general lack of content http://fennetic.net/irc/facebook_huh.png20:15
fennhmm maybe i broke it with adblock20:17
kanzurei'm looking at tiera.ru's index20:18
kanzureand i'm not sure, but i think some of these were mine20:18
kanzure/other/other3/Chemistry/Chemical engineering/Technology and processing of polymers/20:19
kanzurenobody makes awful paths like i do!20:19
fennhey why not use freenet20:20
fennfor distributing papers20:21
kanzurei don't think distribution is a problem20:21
kanzureand realistically for maximum impact you need http20:21
fennhow big do you think "all" of the journal archives would be if properly OCR'ed?20:25
kanzure"ScienceDirect publishes 250,000 articles a year in 2,000 journals."20:26
kanzuresciencedirect indexes >5000 journals though, so i don't know what's up there20:26
fennfor the sake of analysis, limit "all" to everything currently offered in digital format online20:26
fennsciencedirect is a brand of elsevier, right?20:27
kanzurei think i've seen estimates of 7 to 8 million articles per year at the moment20:27
fenn"ScienceDirect is Elsevier's platform for online electronic access to its journals"20:27
fennso maybe they only have metadata for the other 300020:28
fenn8 million per year sounds way too high20:28
kanzurethey include some things they have "Intellectual Property Rights to index" or something lame20:28
kanzure"1.486 million peer-reviewed papers published within 2010"20:29
kanzure"They estimate that  1.346 million articles were published in 23,750 journals within 200"20:29
kanzureok that was originally:20:29
kanzure"They estimate that  1.346 million articles were published in 23.750 journals within 200"20:29
kanzurewhy would 23.750 make sense20:29
kanzureeither say 1,346 million and 23,750 or 1.346 million and 23750.00 dfkadkfadja20:29
kanzureny times published this graph once that showed the rate of increase in publications for all countries for at least 40 years20:30
Stee|23.750 is used in some european countries20:31
fennit's confusing because they're inconsistently using the period for either decimal place or thousands separator20:32
fennok so something like 2million per year20:33
fennthis means, at minimum your scraper has to be able to handle 5000+ papers a day20:33
fennconversely, 15 seconds maximum processing time per paper (assuming no parallelization)20:34
kanzure"One paper per minute is based on 679,858 papers per year in 2009 / 365 days / 24 hours / 60 minutes = 1.29 papers per minute."20:35
kanzurehrm actually this might be an ok method to help approximate the collection completeness20:35
kanzure"768,341 papers have been written so far in 2012. We have 20,000."20:36
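The back-of-the-envelope numbers above, written out so they can be rechecked. The inputs are the figures quoted in the conversation; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60


def papers_per_day(papers_per_year):
    return papers_per_year / 365


def seconds_per_paper(papers_per_year):
    """Processing budget per paper for a single, non-parallel scraper."""
    return SECONDS_PER_DAY / papers_per_day(papers_per_year)


def papers_per_minute(papers_per_year):
    return papers_per_year / MINUTES_PER_YEAR


def completeness(held, published):
    """Fraction of the published papers that the collection holds."""
    return held / published


print(round(papers_per_day(2_000_000)))               # 5479
print(round(seconds_per_paper(2_000_000), 1))         # 15.8
print(round(papers_per_minute(679_858), 2))           # 1.29
print(round(100 * completeness(20_000, 768_341), 1))  # 2.6 (percent)
```

So "5000+ papers a day" and "15 seconds per paper" are both slightly rounded, and a 20,000-paper collection would be under 3% of the 768,341 figure.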
fennmaybe we should just focus on getting papers that aren't crap20:37
kanzurewell that too.20:37
kanzureit would be nice to start with some commonly-read journals20:37
fennmaybe take your collection and use that as a set of seed points for a web of science crawler20:38
fennpresumably anything not cited by or citing the papers cited by the papers in your collection (ugh) is not worth reading20:39
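A sketch of the seed-set crawl: breadth-first search outward from the papers already held, stopping after a fixed number of citation hops. The `neighbours` callable stands in for whatever lookup returns "cites or is cited by"; the toy graph and all names are hypothetical, and no real citation database API is implied.

```python
from collections import deque


def crawl(seeds, neighbours, max_hops=2):
    """Return {paper: hop distance} for everything within `max_hops` of a seed."""
    seen = {paper: 0 for paper in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        hops = seen[paper]
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for other in neighbours(paper):
            if other not in seen:
                seen[other] = hops + 1
                queue.append(other)
    return seen


# Toy citation graph, treated as undirected (cites or is cited by).
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
print(crawl(["a"], lambda paper: toy_graph.get(paper, []), max_hops=2))
```

Anything the crawl never reaches within the hop limit is what the heuristic would treat as not worth fetching, which is also how the strange outliers get missed.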
kanzurei find strange outliers all the time20:39
kanzurei guess i haven't looked at the citation network at all.20:39
fennfor all it's been abused by academia, citation network is a pretty cool thing20:40
-!- nuba [~nuba@pauleira.com] has quit [Ping timeout: 260 seconds]20:40
kanzurei've never been sure how researchers remember which paper they read something in20:41
-!- nuba [~nuba@pauleira.com] has joined ##hplusroadmap20:41
kanzurei can remember big papers where important things happen20:41
kanzurebut for the small results, that seems much harder20:41
fennthey don't; it's all made up20:41
fennyou just have to make up a bibliography when writing your paper so you go look around for the originals20:41
kanzurei guess if you go looking for something to cite, it's easier20:41
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Quit: jmil]20:42
kanzureor they probably just read a review paper and pick out some crap20:42
fennalso a trick i've seen is people just look at what papers they've downloaded20:42
fennmost academics don't have a very extensive library20:42
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has joined ##hplusroadmap20:43
fennthat's the library's job after all20:43
kanzurethe libraries are busy paying too much for all these journals20:43
fennwhy is there no "complete" journal metadata index?20:45
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Client Quit]20:45
fenni mean why does this fall on the shoulders of a couple disgruntled hackers20:46
fennthere should be armies of librarians tackling this20:46
-!- Stee| [~Steel@cpe-67-246-36-165.nycap.res.rr.com] has quit [Ping timeout: 276 seconds]20:46
kanzureit's because "others" got to science first, before the internet got it.20:47
kanzure*got to it.20:47
fennbut it's really not a hard problem to solve20:47
fenn"scrape all the indices, merge, repeat"20:47
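A minimal sketch of the "merge" step, assuming each scraped index is a list of dicts with optional `doi`, `title` and `year` fields (these field names are hypothetical). Records are matched on DOI when one is present and on a normalized title otherwise; the "repeat" part is just calling the function again whenever a new index has been scraped.

```python
import re


def record_key(rec):
    """Identity key for a record: the DOI if present, else a normalized title."""
    doi = (rec.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", (rec.get("title") or "").lower()).strip()
    return "title:" + title


def merge_indices(*indices):
    """Merge several lists of metadata records into one deduplicated list.

    For each field, the first non-empty value seen wins.
    """
    merged = {}
    for index in indices:
        for rec in index:
            target = merged.setdefault(record_key(rec), {})
            for field, value in rec.items():
                if value and not target.get(field):
                    target[field] = value
    return list(merged.values())


if __name__ == "__main__":
    index_a = [{"doi": "10.1/ABC", "title": "Foo"}]
    index_b = [{"doi": "10.1/abc", "title": "Foo", "year": 2009},
               {"title": "Bar Baz!"}]
    for record in merge_indices(index_a, index_b):
        print(record)
```

A known gap in this sketch: a record that has a DOI in one index and only a title in another will not be merged, so a real pipeline would need a second pass that links the two kinds of key.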
kanzureexcept they do20:48
kanzureand then they pay for it for some reason20:48
kanzureand call it proprietary20:48
fennisi is supposed to be the complete metadata index?20:48
kanzurehrmm i'm not sure.20:48
-!- Stee| [~Steel@cpe-67-246-36-165.nycap.res.rr.com] has joined ##hplusroadmap20:49
kanzure". This intelligent research platform provides access to the world's leading citation databases, including powerful cited reference searching, the Analyze Tool, and over 100 years of comprehensive backfile and citation data."20:49
fennso there are these papers studying h-index etc., what is their data set?20:49
kanzuresometimes isi access i think20:49
kanzureit's a hodgepodge of commercial databases i think20:49
kanzure"Calculate an accurate h-index by ensuring the full extent of an author’s past research is taken into account."20:50
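For reference, the h-index mentioned here is the largest h such that an author has h papers with at least h citations each. A short sketch, where the input is just a list of per-paper citation counts (getting those counts is the hard part discussed above):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))
```

The result depends entirely on which papers and citations the underlying database knows about, which is why the data set behind such studies matters.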
fennok this is the citation network, which while cool, is a superset of what i'm asking for20:50
kanzureoh right20:50
fenni just want a list of all articles published20:50
kanzureworldcat? i think that stops at individual "titles" (of volumes/books)20:53
Stee|kanz, any idea how to get access to internal zaibatsu journals?20:53
fenni should be able to google "list of all journal articles published" and find a webpage that lets me download some file with all the metadata20:53
fennStee|: walk into the office wearing a jumpsuit, say you're here to fix the router20:53
fennmaybe a bit harder if you're not japanese20:54
fennworldcat is exactly the right model20:54
fennbut for some mysterious reason no similar thing exists for journals20:54
kanzureworldcat is the supplier of things like ezproxy and interlibrary loans20:54
fenn"With authorization from OCLC, you can download a subset of the WorldCat database for harvesting by your search engine or other enterprise Web application."20:55
kanzureyep.. commercial20:55
fenndoes that mean they sell it or what?20:55
kanzurethey might be doing license agreements if you end up selling/distributing it, but free for 'research'20:56
kanzurebut i don't know for sure20:56
kanzurewhat about library of congress? i don't think they track individual articles20:56
fenn"Each party shall be entirely responsible for meeting its own costs incurred with respect to the matters described in this Agreement and neither shall be obligated to make any payment to the other under the terms of this Agreement."20:56
fennso apparently it's free, if you appear "legit"20:57
fenn"Under no circumstances shall Institution/Company sell, license, publish, display, distribute or otherwise transfer to any third party WorldCat Metadata or holdings information or any copy thereof, in whole or in part, except as expressly permitted"20:57
kanzurehrm it looks like they do track some articles?20:57
kanzurewow what20:57
kanzurewell what is expressly permitted?20:58
fennit just means you don't have permission to redistribute20:58
kanzurehaha how useless20:58
kanzuregod our libraries suck20:58
fennyeah somehow the librarians missed the "information wants to be free" bandwagon20:59
kanzurewasn't google supposed to fix this?21:02
kanzureor was it only supposed to index the shitty information21:02
fenngoogle "grew up" and got all responsible n shit21:03
fennso now they can't do anything21:03
fenn"OCLC does claim copyright rights in WorldCat as a compilation. In accordance with US copyright law, those21:04
fennrights are based on OCLC's substantial intellectual contribution to WorldCat as a whole, including OCLC’s selection,21:04
fennarrangement, and coordination of the material in WorldCat"21:04
fennfwiw, google is awful about exporting data, despite their "data liberation front"21:04
kanzurewell then anyone can claim copyright on collection21:04
kanzurescience liberation front sounds like an awesome name21:04
fennyou need to wear a ski mask and have stacks of liberated hard drives behind you21:05
kanzurenot a problem21:05
kanzureit can also be like a drug bust: http://addictionrecoveryhope.com/wp-content/uploads/2009/09/Drug-Bust.jpg21:06
fennif anyone cares, here's some discussion and self-justification by oclc on why they don't allow redistribution http://www.oclc.org/worldcat/recorduse/policy/forum/forum.pdf21:06
kanzurei think oclc was set up by a bunch of university librarians and that's why it has traction21:07
kanzurei'm not really sure why they all agreed to this awful mess21:07
kanzurebut it's particularly nice of them to all be using ezproxy (it makes it easier once someone finds an exploit)21:07
kanzure"the practical need to sustain the economic viability and value of WorldCat over the long term"21:08
fennwhat about ezproxy?21:08
kanzureall universities use it21:08
kanzurethey all run a local instance of it21:09
kanzureso if i was to find a backdoor, i'd have keys to the entire kingdom21:09
fennis that a mirroring software?21:09
kanzurethe best way to explain it is to show you21:09
fennhow do you get the database if it's all just web pages?21:09
fennyou dont need ezproxy to search worldcat21:09
fenndoes oclc do something else?21:09
kanzureoclc does a lot of things21:10
kanzureit's your usual clusterfuck of databases and library integrations21:10
fennhere's the thing, there's a unique ISBN for every book21:10
kanzurehere's an example of ezproxy21:10
kanzureusername: 395221:11
-!- Stee| is now known as Steel_21:11
kanzurepassword: 395221:11
fennwhere does the data that goes along with that ISBN get submitted when the author publishes the book?21:11
kanzurelibrary of congress somewhere21:11
fennso what good does worldcat do then? why can't we just download the ISBNs from LOC?21:12
kanzurebecause worldcat is also indexing papers21:12
kanzurealso, worldcat.org is not the primary purpose of worldcat21:12
kanzure"Some WorldCat libraries make their specialized reference databases available on their Web sites, but only to library members."21:13
kanzure"worldcat libraries"21:13
fennoh great21:13
fennISBN is run by a for-profit company21:14
fennnow owned by ... (drumroll)21:14
kanzurehere's what oclc is: http://www.oclc.org/us/en/services/a-to-z.htm21:14
fennthose fuckers21:15
kanzurexisbn is oclc apparently21:15
kanzurelook at how they list worldcat21:15
kanzure"Global network of library content and services that lets your institution be more connected, open and productive"21:15
kanzureit's about convincing libraries to sign up with them21:15
kanzureeven my high school had some weird worldcat integration (it was pretty broken)21:16
kanzuredid you try that ezproxy login?21:17
fennthis is interesting http://isbndb.com/21:17
fennyes, looks poorly configured21:18
fennreminds me of internet circa 199521:18
kanzurei'm pretty sure libraries were networked together pre-web21:18
kanzuremaybe that's why proprietary solutions are so dominant21:18
fenni dont really know what i'm looking at here21:19
fennit's a list of services they've purchased access to?21:20
kanzurebut if you click, you have access21:20
kanzuresince you're logged in21:20
kanzurenormally these services authenticate you by ip address21:20
kanzureezproxy is inside the college's network21:20
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has quit [Ping timeout: 260 seconds]21:21
fenni assume at a more science-oriented university the list would be more useful?21:21
kanzureheh yes this is a bad example21:21
fenni just realized what "WorldBook" is21:22
fennit's the old second-rate paper encyclopedia, for people who couldn't afford britannica21:22
kanzurei thought that was ms encarta21:23
fennoo oxford english dictionary21:23
fennso ezproxy just forwards stuff from the uni's IP to the external internet21:24
kanzuresometimes these services have usernames/passwords instead of ip authentication21:24
kanzureand ezproxy handles that configuration/setup too (apparently)21:25
kanzurein the uk i think they have some federal system for paper access called athens? i don't know much about it21:26
kanzure"Athens is an access management system which controls access to many of the Library's electronic information sources. When you login to an Athens protected resource it checks to see if you are a member of an institution that has paid to use that resource, and if your username and password are correct it lets you through."21:26
fennhmm want to scrape OED? they only have 275k entries http://www.oed.com.webserver.macu.edu:2048/viewdictionaryentry/Entry/27487021:27
kanzurei'm sure someone has a pdf of oed21:27
kanzurei guess a pdf is less useful21:27
kanzure"The Athens service is a trust federation where Identity Providers, Service Providers and Athens operate under common rules and licenses. Trust is enforced by the use of public-key cryptography and other security mechanisms."21:28
kanzure"Athens is used extensively within UK Higher and Further Education institutions, the UK National Health Service, and in more than 90 countries worldwide. It has been adopted by over 2,000 organisations, and over 300 online resources since it was first launched in 1996. Over 4.5 million accounts are now registered with Athens."21:28
kanzure"Conceived in 1996 at the University of Bath," we should get adrian bowyer to fix that21:28
fennthis is the last entry added http://www.oed.com.webserver.macu.edu:2048/viewdictionaryentry/Entry/27711421:28
kanzureit's by id21:28
fennyep just fiddle with the number21:29
fenn"once you're past the perimeter, there's no security! have a nice day"21:29
fenni do just fine with the 1907 dictionary, but someone might find it useful21:30
fennwell i guess i should go socialize21:31
* fenn sighs21:31
-!- SDr [~SDr@unaffiliated/sdr] has joined ##hplusroadmap21:41
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has joined ##hplusroadmap21:47
kanzurehi yashgaroth21:49
yashgarothhey hey21:49
_sol_Do you think DIY orgo chem on a legit side for open source experimentation is possible or that it'd break the bank to make some DIY fume hood and accessory safety in the lab to make sure ya don't have vapors floating with a spark setting things soff?21:50
_sol_er off21:50
_sol_I was thinking about that the other day reading some DIY sites, and wondering what legit open source means for DIY chemistry since ppl are trying to spring up microbio labs and such now21:51
yashgarothdepends where you're trying to install it, and how legal you want to be about EPA regulations21:53
_sol_Of course, if ya own a glass beaker these days some countries enforcement agencies put books and a single glass beaker together with all the regs and assume the worst...21:53
kanzurewhat does any of that have to do with breaking the bank21:56
_sol_I guess I'm wondering if ya could try to start a DIY fume hood project among other things or is it all regulated as to what is needed for chem lab?21:57
_sol_I mean the cost for breaking the bank...21:58
_sol_the cost may be too much to make something safe for small DIY stuff21:58
_sol_and try to be pretty safe although not sure and in whose eyes21:58
yashgarothhow interesting are the chemicals you plan to use21:58
_sol_don't know yet21:58
_sol_I'm just wondering if there are projects out there already...21:59
_sol_but I think some solvents in basic orgo chem experiments are still pretty volatile with sparks if I recall21:59
_sol_if ya are doing a distillation process to separate a heavier weight molecule from lighter weight via heating and using a water cooling over the glass to cool the vapors...22:00
_sol_I'm just looking at how-tos22:00
kanzureyes you can make a fume hood if you want to?22:01
_sol_but if ya don't insulate the fan right? couldn't an electrical spark set stuff off?22:01
_sol_I guess I'm just overthinking...22:01
_sol_I have a chemist friend so I could ask him, but I'm wondering how big chemistry is in the DIY...22:03
_sol_DIY community... etc which I think this room sorta is, but it's more open electronics and software maybe right now22:04
_sol_biohacking somewhat I guess22:04
kanzureyou're welcome to bring your chemistry friends in here22:04
_sol_I'll see if he is around later..22:04
kanzuredrazak_: did you ever finish your distillation setup?22:04
-!- augur [~augur@] has joined ##hplusroadmap22:05
fenna fume hood is dead simple: a box where you do your work, a trapezoidal reducing flange, a fan, and a chimney22:10
fennunless you're venting extremely toxic fumes (in which case i question your methodology) dilution with lots of air will render it harmless22:10
fennas for explosion prevention, make sure you don't exceed the minimum concentration needed to explode stuffs22:12
fennfor propane this is as little as ~2% by volume (methane is ~5%)22:12
fennon the other hand, 1 mol of gas is only 22 liters so if you're going to evaporate a mol of whatever you need to add at least 1 m^3 of air to it to render it explosion-proof22:13
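The dilution estimate can be written out as a small calculation. This is a back-of-the-envelope sketch, not safety guidance: it assumes ideal-gas behaviour at 0 °C and 1 atm (22.4 L per mole), perfect mixing, and a lower explosive limit (LEL) that you supply yourself from the solvent's safety data sheet. The `safety_factor` parameter and its default of 4 are assumptions added here to keep the mixture well below the limit rather than right at it.

```python
MOLAR_VOLUME_L = 22.4  # litres per mole of ideal gas at 0 degrees C and 1 atm


def vapour_volume_l(moles):
    """Volume occupied by `moles` of vapour, in litres."""
    return moles * MOLAR_VOLUME_L


def min_air_volume_l(moles, lel_fraction, safety_factor=4.0):
    """Litres of air needed so the vapour stays below LEL / safety_factor.

    lel_fraction is the lower explosive limit as a volume fraction
    (for example 0.05 for 5%).
    """
    if not 0 < lel_fraction < 1:
        raise ValueError("lel_fraction must be between 0 and 1")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    vapour = vapour_volume_l(moles)
    target_fraction = lel_fraction / safety_factor
    total_volume = vapour / target_fraction
    return total_volume - vapour


if __name__ == "__main__":
    # Illustrative values only: one mole of vapour, a 5% LEL, no margin.
    print(round(min_air_volume_l(1, 0.05, safety_factor=1.0)), "L of air")
    # Same vapour with the default fourfold margin.
    print(round(min_air_volume_l(1, 0.05)), "L of air")
```

At room temperature a mole of gas takes up somewhat more than 22.4 L, and real airflow is never perfectly mixed, so treat the output as a lower bound on ventilation rather than a design number.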
fenna bigger concern is crap building up on the chimney, which really should be checked with regular close-up visual inspection22:14
-!- SDr [~SDr@unaffiliated/sdr] has quit []22:14
kanzureoh nice cysteine has Oligonucleotide Synthesis - Methods and Applications [Methods in Molec Bio 288] - P. Herdewijn (Humana, 2005) WW.pdf22:14
fenncysteine has papers on it already?22:15
fennor is that a book22:15
kanzureit's a book22:15
kanzurecheck /torrents/text/protocols/22:15
kanzureand /torrents/text/books/textbooks/Biology_And_Medicine/22:15
kanzuretext/books/textbooks/.. damn the world sucks22:16
fennhey, four hour work week, was looking for that22:16
fenngah 96MB22:20
kanzurefenn: so i've been using phantomjs a lot lately22:23
kanzureand i keep looking at http://www.gnu.org/software/pythonwebkit/22:23
kanzurewhich is very ranty.. but accessing the dom from python seems much better than from javascript22:23
kanzureif you'll notice, it's a giant rant by luke kenneth casson leighton22:24
kanzurewho you might remember from openscad22:24
kanzureluke keeps saying that pythonwebkit is pyjamas22:27
fenntldr what?22:29
kanzureweb scraping with webkit bindings22:30
fenndon't you need to run js to access links that get created at runtime?22:30
kanzureyes that's what webkit does22:30
fenni thought that was the whole point of phantomjs22:30
-!- Mokbortolan_1 [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Read error: Connection reset by peer]22:30
kanzurecorrect. pythonwebkit seems to be phantomjs except piloted by python instead of js22:30
kanzureexcept not marketed like phantomjs22:30
fennwell, that's nice22:31
kanzuretheoretically this should be more pleasant22:31
fenni had intended to learn js anyway22:31
fennlots of stuff can be scraped without js though22:32
fennjust looking at page structure with yer eyeballs22:32
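As an illustration of scraping without running any JavaScript, here is a minimal sketch that pulls link targets out of static (and sloppy) HTML. It uses Python's standard-library `html.parser` as a stand-in so the example has no dependencies; it is not BeautifulSoup or lxml, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    messy = '<html><body><p>unclosed <a href="/paper/1">one<a href=/paper/2>two</b></body>'
    print(extract_links(messy))
```

This only sees links present in the markup as served; links that the page creates at runtime are exactly the case where a browser engine is needed.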
kanzurei'm tired of beautifulsoup, lxml, mechanize and nokogiri22:32
kanzuresometimes the html is poorly formatted and sometimes there's data written in the js headers22:32
fenni suppose it depends on how structured the data is you're trying to scrape in the first place22:32
kanzureand then these parsers break and crap.. if any parser isn't gonna break, it's going to be a web browser22:33
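When the data sits in an inline script rather than in the markup, it can sometimes be pulled out without a browser. A sketch under a strong assumption: the page assigns a literal that happens to be valid JSON to a known variable, for example `var pageData = {...};`. The variable name and the helper below are hypothetical; anything computed at runtime still needs a real JavaScript engine.

```python
import json
import re


def extract_js_json(html, var_name):
    """Return the JSON value assigned to `var_name` in an inline script.

    Returns None if the assignment is missing or the literal is not valid JSON.
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (such as the trailing semicolon).
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return value


if __name__ == "__main__":
    page = '<script>var pageData = {"doi": "10.1/x", "pages": [1, 2]};</script>'
    print(extract_js_json(page, "pageData"))
```

Object literals with single quotes or unquoted keys are legal JavaScript but not JSON, so this returns None for them.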
fennreally the html parser doesn't work? i thought the whole point of BS was that it didnt break on bad html22:33
kanzureno i thought that was lxml22:33
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has joined ##hplusroadmap22:33
fenn The BeautifulSoup class turns arbitrarily bad HTML into a tree-like nested22:34
kanzurehmmm okay22:34
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap22:35
kanzurefwiw beautifulsoup is what i use anyway22:35
fennug flattr keeps 10% for itself, why do people put up with this22:36
kanzurebecause it has a slim chance of being better than paypal22:37
fennwepay is actually better than paypal22:38
fennfor now at least22:38
fennproblem is paypal tries to be all "don't worry, we'll refund you someone else's money if there's any problems"22:39
kanzuregit clone http://git.savannah.gnu.org/cgit/pythonwebkit.git22:40
* kanzure gulps22:40
drazak_kanzure: at home? nah, never ended up buying anything22:45
drazak_kanzure: too expensive22:45
kanzuretoo expensive! bah22:47
fenndistillation is easy22:53
fenni'm surprised you can't just buy a still from walmart22:54
fennsuch things exist but only for water22:54
kanzurewow what in the name of holy fuck22:58
kanzurehosted ojs instances (openjournalsystem)22:59
kanzurethat should be more like $10 or $20/year22:59
kanzurei guess that's $50/year.. but still22:59
kanzureoops $50/month23:00
fennit's overpriced hosting but it's somewhat specialized knowledge23:01
kanzurethere's more money to be made in vertical integration with the research23:01
kanzurelike "yo dawg, we noticed you're doing a bill of materials for reagents in your project.. we can hook you up aww yeah"23:01
fenn"yo dawg i herd u liek bill o materials so i put a bill o materials in yo bill o materials yo"23:02
kanzurewell each research paper is the result of some $200k grant23:02
kanzurereagents maybe costing some % of that.. which i guess is what um, that lab management webapp thing was trying to tap23:03
Mokbortolan_Yo dawg, I heard you liked BoMs, so I put a bomb in your BoM so you can bomb while you BoM.23:03
fenni guess the idea there is they've already forked over cash so they don't feel so bad forking over more cash to the same entity?23:04
kanzurealso! there's all sorts of weird stateful information during research that could be served by a platform23:04
kanzureinstead of keeping random spreadsheets on random computers about which petri dish is currently in which state23:04
fenndude this is just a web host with some custom software installed23:05
kanzureyeah i know, there's no reason for this to cost so much23:05
fennthey're not doing anything in the research phase at all23:05
fennit's merely for preserving the results for posterity23:05
kanzurewell they seem convinced they are a part of research23:05
kanzurenot this site in particular though23:05
kanzureok whatever. all these fees are stupid.23:09
Steel_kanzure: worth starting a business for it?23:09
Steel_undercut 'em?23:09
kanzureno. there's only 20000 journals or something23:09
fennah but think how many MORE there could be!23:10
kanzure20000 * $5/year = haha.. yeah23:10
kanzurejournals are a dumb structure anyway23:10
fenna journal for every lab!23:10
kanzureisn't it just supposed to be an aggregator23:10
fenna journal for every paper!23:10
kanzurewait. do you mean tags?23:10
Steel_kanzure: If you have thoughts on a better one, I'd certainly love to see them written up somewhere so I can incorporate those ideas23:10
Steel_tags are on my list23:10
fennhow about scientists publish their own fucking papers23:10
kanzurei don't trust people to maintain active web servers23:11
fennneither do i23:11
fennbut at least someone would be able to aggregate them23:11
fennunlike now where we're all "oh noes the paywall is falling"23:11
kanzuredid those jerks update the arxiv torrents or is there still an anti-get-all-our-data thing going on there?23:11
kanzureerr right now if the paywalls fall there's nothing saved from inside.23:11
fennokay why is pythonwebkit.git > 1GB?23:12
kanzurei have no fucking clue23:12
kanzureit's still cloning23:12
kanzurei have a clone on gnusha in /home/bryan/local/pythonwebkit/ if you want it.23:13
kanzurei'm currently checking out python_codegen (the branch)23:13
fenncan you make a copy of this without the bloat?23:13
kanzureum i don't know what the issue is yet23:13
kanzurei think it's a complete copy of webkit23:13
kanzurei'll check what the working directory size is.23:13
fennstill, shouldn't be that big23:13
kanzure2.9 GB?23:14
-!- lkcl [~lkcl@host86-131-171-208.range86-131.btcentralplus.com] has joined ##hplusroadmap23:14
fennseems to be mostly ./LayoutTests and ./WebCore23:14
lkclmorning folks23:15
kanzurecan we delete those23:15
fennactually wait23:15
fenndunno about webcore23:15
lkcli'm told that there are people trying to compile pythonwebkit around here23:15
-!- uniqanomaly_ [~ua@dynamic-78-8-80-128.ssp.dialog.net.pl] has joined ##hplusroadmap23:15
fennanyway it's in .git as well23:15
lkcli'm the lead developer23:15
fennhi lkcl23:15
fennwhy is your repo 2.9GB?23:15
lkclso... what do you need?23:15
lkclbecause that's what the size of webkit git is23:16
fenni'm interested in pythonwebkit primarily for scraping23:16
kanzureyeah i might have misunderstood you23:16
kanzureheadless would be great23:16
kanzuremaybe pythonwebkit isn't actually required?23:17
lkclok, there's a couple of ways to do that23:17
lkclactually three.23:17
lkclwhat level of HTML compatibility do you need?23:17
kanzurewhat are my options23:17
lkcljust some notes from another conversation i'm going to cut/paste ok?23:17
lkclcomment out line 82.23:17
lkclthat will give you "headless" mode in pythonwebkit.  ok,  _should_ do :)23:18
lkcl1) KDE's KHTMLPart (HTML DOM TR2 compatible)23:18
lkcl2) pythonwebkit gtk mode, hacked to remove that line 8223:18
lkcl3) pythonwebkit "DirectFB" mode, hacked to remove the equivalent line - you want the python_codegen-directfb-2011-10-18 branch23:19
-!- uniqanomaly [~ua@dynamic-78-8-80-186.ssp.dialog.net.pl] has quit [Ping timeout: 272 seconds]23:19
lkcl4) python-hulahop with xulrunner 9.0 - source code is here: http://lkcl.net/hulahop23:19
lkclthen get the tutorial i wrote, here:23:20
kanzurewhat i'd like is the same level of html compatibility as phantomjs, which seems to just be headless vanilla webkit23:20
lkcland hack that to simply remove the pygtk2 equivalent of the window stuff23:20
lkclwell it depends on whether you want Firefox headless HTML5 compatibility or Safari/Webkit/Android headless HTML5 compatibility23:21
lkclpython-hulahop will get you Firefox23:21
lkclpythonwebkit will get you Webkit/Android/Safari23:21
lkcland KHTMLPart will get you.... mmmm.... compatibility with the internet circa 1998 :)23:21
fenni think we are misunderstanding something23:22
lkclyou have to compile KDE with c++ runtime type checking enabled23:22
fennthe idea is to run javascript and get text data out of the page, not to render anything23:22
lkclwell, then you'll need to create a "port" of webkit which does no rendering.23:22
lkcllet me look up what phantomjs is....23:23
lkcloooh hoo hoo!23:23
lkclverrry coool.23:23
kanzureheh. except i want this in fucking python23:23
lkclso they created a port that... oh shit, you want _what_??? :)23:23
lkclooo hoo hoo, you're gonna have a lot of fun then.23:24
lkclok, you have a couple of options23:24
lkcl1) work out the patches that i did to add python bindings and re-apply them to phantomjs23:25
lkclthe pythonwebkit stuff *is* entirely with the exception of about .... 100 lines of code *entirely* screen-independent23:25
lkcl2) work out the phantomjs patches and reapply *those* to pythonwebkit23:26
kanzurei don't think phantomjs patches webkit23:26
lkcl3) freak out at option 1 and 2, and give up and just run pythonwebkit *without* ....  it doesn't??23:26
kanzureit just includes some stuff?23:26
lkclis it based on Webkit2?23:27
kanzuredon't know23:27
kanzureit looks like it's not qtwebkit 2.223:27
lkclwhat der f**???23:28
kanzureyeah it looks like it's webkit123:28
kanzure"However, the fact that WebKit1 API is considered "obsolete" means that at some point, we will not be able to use the latest and greatest WebKit features anymore."23:28
lkclwhere's the makefile showing the #includes23:28
kanzurethey use qmake23:28
kanzureto be fair i don't know how qmake works :)23:29
lkclif they're using QtWebKit then all they are doing is exactly as i described above... except not calling the Qt version of "show window"23:29
lkclso they *are* still "rendering".... just not rendering *on-screen*.23:29
lkclor, more specifically, the code _to_ render is there, but it's just not called.23:30
kanzureso they didn't patch webkit?23:30
lkclthe above commenting-out that i described is *exactly* the same trick.  line 82 removes the gtk "show all windows"23:30
lkclthat's correct - they didn't.23:30
lkclall they're doing is firing up a qtwebkit instance and then not showing it on-screen.23:30
lkclthe same trick is pulled in one of the webkitgtk test applications, i forget its name.23:31
kanzurein your python bindings is the dom-touching-python (in a WebPage i think it's called) sandboxed from the other code?23:32
lkclanyway - in that tutorial:23:32
lkcljust remove the "gtk.show"23:32
lkcland you'll achieve exactly the same thing23:32
lkcli have no idea what you mean by "sandboxed".23:32
kanzurein phantomjs you create a page object and can call page.evaluate(anonymous js function)23:33
kanzurebut the contents of the function can't access anything outside of the page's 'context'23:33
kanzurepage.open('http://www.google.com/', function(status) { console.log(document.location); });23:34
lkclyeah - ok, i didn't add javascript evaluation functions because webkitgtk doesn't have a means to convert the return results into meaningful information23:34
kanzurebut you did seem to have python examples of accessing the dom23:34
lkclthe webkitqt team did translation of results into qt object types23:34
lkclit's done *entirely* through python.23:34
lkclthere is absolutely *no* javascript involved, *whatsoever*.23:34
fennbut js in the page can change the DOM in important ways23:34
lkclyes it can.23:34
kanzureerm, my point is, the javascript is "sandboxed" in phantomjs.. like you're not ever touching the DOM from your main js23:35
lkcland python can change the DOM in exactly the same "important" ways... in a declarative fashion [from quotes outside quotes]23:35
kanzureis it the same way in pythonwebkit?23:35
lkclno, because you cannot activate the running of any javascript *at all* from webkitgtk, period.23:35
lkclok that's not quite true, but....23:36
kanzurei'm not talking about javascript :P hrmm23:36
kanzurelet's look at http://pyxpcomext.mozdev.org/no_wrap/tutorials/hulahop/xpcom-hulahop.html23:36
kanzureunder _loaded23:36
lkclyep sure23:36
kanzureis that code normal python?23:36
kanzurecan it access globals or whatever23:36
lkclyes it is entirely normal python.23:36
lkclyes it can23:36
kanzureok. in phantomjs the answer is no ;)23:36
lkclbecause it's pure python.23:37
lkclright - ok, i see what you mean23:37
kanzurealright cool23:37
kanzurethat's great23:37
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has quit [Quit: Nettalk6 - www.ntalk.de]23:37
lkclit's a one-way street.23:37
kanzurewhat? in pythonwebkit it is?23:37
lkclyou can do tricks such as add a script node to the DOM however :)23:38
kanzurephantomjs is a little weird because you can't actually access the DOM except from inside the page's javascript context23:39
kanzureso you can only hope-and-pray by passing giant hashes/json back and forth between page.evaluate() calls23:40
lkclyes.  it's a bitch.23:40
lkclsomeone actually did a port of pyjamas-desktop using a similar trick, to webkitqt423:41
lkclit got a looong way before being declared a complete failure23:41
lkclexecution of javascript code-snippets for *everything*.23:41
lkcltruly truly dreadful :)23:41
kanzureso! can you convince me to use pythonwebkit-gtk over hulahop?23:41
lkclnope - that's up to you.23:41
kanzuresome code evangelist you are.23:42
lkclit depends on what you want / need23:42
lkclit makes no odds to me :)23:42
kanzurewell, depending on gecko seems a little weird23:42
lkclpyjamas-desktop works on both... *and* on MSHTML under w32!23:42
lkclso if you were a windows fiend you'd even be able to do the same trick there!23:42
lkclthe code you'd be looking for is pyjd/mshtml.py23:42
lkcland, once again, you just don't show the w32 GUI window23:43
* lkcl shrugs23:43
kanzurehrmm i'm going to try out hulahop then.23:43
kanzuresince it probably doesn't have a 2.9 GB git repo23:43
lkclbottom line is: you can actually do *all three* major browser engines if you really wanted to23:43
lkclha ha23:43
lkclyou got debian?23:43
lkclok, don't use debian/unstable, use debian/testing23:44
lkcland grab xulrunner-9.0-dev23:44
kanzurei think i'm on wheezy :/23:44
kanzurei'll take a look23:44
lkcldo "apt-get build-dep python-hulahop"23:44
lkcletc. etc.23:44
lkclbut then grab the source code from here:23:44
lkcldon't for god's sake use xulrunner 1023:44
kanzurewait you also wrote hulahop?23:45
lkclhell no.23:45
kanzureok just a mirror :323:45
lkclno it was a quick-hacked fix to get it to work23:45
lkclthe olpc-sugar team gave up on hulahop 6 months ago... oh dearie me are they in for a shock23:46
kanzurethese are not the words that indicate to me that any of this is stable23:46
kanzureokay let me see if i can get a backport of xulrunner-9.0-dev23:46
lkcli recommend you just add debian/testing and don't worry about it.  use apt-pin priorities23:47
kanzurejrayhawk: how do i use that23:47
lkclor just use "apt-get -t testing install xulrunner-9.0-dev" etc. etc.23:47
lkclwhich will be rather long-winded23:47
kanzureE: Unable to locate package xulrunner-9.0-dev23:48
lkclbackporting of xulrunner will take several hours23:48
kanzurewell i guess i should update my sources23:48
lkclii  xulrunner-9.0                            9.0.1-1                          XUL + XPCOM application runner23:48
kanzureyeah there's still no xulrunner-9.0-dev package being found?23:49
lkcldeb http://ftp.uk.debian.org/debian/ testing main contrib non-free23:49
lkcldeb-src http://ftp.uk.debian.org/debian/ testing main contrib non-free23:49
kanzureok maybe it's in testing contrib23:49
kanzurei only see xulrunner-9.0-dbg and xulrunner-9.023:50
lkcllkcl@teenymac:~/src/python-tv/WebKit$ apt-cache search xulrunner-9.023:50
lkclxulrunner-9.0 - XUL + XPCOM application runner23:50
lkclhmmm.... i also have this: deb http://ftp.uk.debian.org/debian/ experimental non-free23:51
lkclbut that's experimental *non-free*.  hmm...23:52
lkclyep - don't know.  up to you to sort out :)23:52
kanzurepyxpcom is supposed to be in testing?23:52
lkclpyxpcom has been around for a while, but it was formerly part of xulrunner's source package23:52
fennmaybe i'm a bit slow, but http://packages.debian.org/wheezy/xulrunner-dev23:52
lkclit's now independent23:52
rdbDoes anyone know how much the magnetic stirrers in ordinary lab heaters affect magnetic implants?23:53
lkclno, you _definitely_ don't want the older version23:53
lkclok there's another way: if you can get debian/lenny such that you end up with xulrunner 1.9.123:53
lkclor debian/blahblah with xulrunner 1.9.123:53
fennrdb: quite a lot i'd imagine23:53
lkclyou will *not* need to do any kind of source code compiling.23:53
rdbIt sounds quite painful.23:53
fennafter a while tissue grows around the magnet and immobilizes it a bit23:54
rdbDoes it hurt when the magnet gets a tug and flips around?23:54
lkclthe last "stable" version of python-hulahop which installs out-of-the-box was 18+ months ago, and it used xulrunner-1.923:54
kanzurelkcl: xulrunner 1.9.1 sounds a bit old?23:54
Steel_rdb: the magnet shouldn't23:54
Steel_one of the things I'm planning on running later this year hopefully are some FE simulations of magnet implants in flesh23:55
lkclkanzure: it does the job.  i'm still using firefox 3.5 and that uses xulrunner 1.9.223:55
lkcli get absolutely no problems with it, other than f****g stupid google advertising f****g chrome at me nyah nyah youuu're usiiing an ooold version of firefox that weeeee can't be bothered to suppport23:56
kanzurethe python-hulahop package is also grabbing xulrunner-9.023:56
lkclwhine, whine23:56
lkclyep there you go.23:56
kanzureis that bad23:56
lkclno you need that - that's the runtime.23:56
kanzureh it's python-xpcom23:56
kanzure*ah it's23:56
lkclbut if you want to recompile for yourself you _will_ need xulrunner-9.0-dev23:56
lkclthat you can get with "apt-get build-dep python-hulahop".23:57
kanzurewhat do i need besides xulrunner, python-hulahop, python-xpcom?23:57
lkclonce you've done that, then grab the source from the url i posted23:57
lkclnothing else.23:57
lkcloh... the source code from that tutorial, obviously.23:57
lkclthen you do dpkg-buildpackage -rfakeroot -nc23:58
lkclcd into the hulahop directory obviously23:58
kanzureyou mean this? http://lkcl.net/hulahop/sugar-hulahop-0.8.1.success.tgz23:58
lkclyep that's it23:58
lkclso cd sugar-hulahop-0.8.123:58
lkclthen do23:58
lkcldpkg-buildpackage -rfakeroot -nc23:58
lkclthen install the resultant .deb which will be in the directory *below*23:59
lkcland you're done.23:59
kanzuredpkg-checkbuilddeps: Unmet build dependencies: cdbs (>= 0.4.90~) python-all-dev dh-buildinfo xulrunner-dev (>= 1.9~rc2) python-gtk2-dev23:59
lkclyou're on your way23:59
kanzureoh xulrunner-dev exists23:59
kanzurei see23:59
lkcli _did_ say do "apt-get build-dep python-hulahop" :)23:59
--- Log closed Sun Feb 19 00:00:12 2012
