--- Log opened Sat Feb 18 00:00:00 2012
--- Day changed Sat Feb 18 2012
rdb | those are pretty much the three reasons people come to amsterdam | 00:00 |
yashgaroth | I'm meeting family there anyway, but surely there's museums and stuff | 00:00 |
yashgaroth | though I was there a couple years ago and saw all the museums, so hookers & drugs it is | 00:00 |
rdb | you used to be able to buy psilocybin mushrooms as a tourist, but they banned that in 2008, you can only buy a growkit now. psilocybin truffles are still legal though | 00:03 |
Stee| | rdb: How far is anywhere in the netherlands from anywhere else in the netherlands? | 00:03 |
* rdb hates governments. | 00:03 | |
yashgaroth | 10 minutes | 00:03 |
rdb | Stee|, how do you mean that? | 00:03 |
Stee| | like, how far would it take you to get to amsterdam | 00:03 |
rdb | I live relatively close, but it would probably still take me an hour or two | 00:04 |
Stee| | how long, rather | 00:04 |
Stee| | ah | 00:04 |
Stee| | clearly you should go get drunk with yashgaroth | 00:04 |
yashgaroth | where you at rdb, the hague? | 00:04 |
rdb | I don't drink alcohol. | 00:04 |
rdb | I don't think that ethanol brings me any effects that I find useful, so... plus it's destructive, unlike many other drugs | 00:05 |
rdb | yashgaroth, near gouda | 00:05 |
yashgaroth | ah, I hope to visit the cheese market | 00:06 |
rdb | <3 gouda cheese | 00:06 |
yashgaroth | awww yeeee | 00:06 |
Stee| | welp | 00:23 |
Stee| | I'mm going to go lay down I think | 00:23 |
Stee| | *I'm | 00:23 |
yashgaroth | same here, g'night | 00:25 |
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has quit [Quit: Leaving] | 00:25 | |
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap | 00:54 | |
-!- Mokbortolan_1 [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Quit: Leaving.] | 00:55 | |
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has quit [Ping timeout: 245 seconds] | 01:03 | |
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has joined ##hplusroadmap | 02:23 | |
-!- klafka [~textual@ip-64-139-28-14.sjc.megapath.net] has joined ##hplusroadmap | 02:38 | |
-!- marainein [~marainein@114-198-65-190.dyn.iinet.net.au] has joined ##hplusroadmap | 02:55 | |
-!- chris_99 [~chris_99@unaffiliated/chris-99/x-3062929] has joined ##hplusroadmap | 03:06 | |
rdb | the more I learn about it, the more I want to get a magnetic implant | 03:48 |
-!- yottabit [~heath@unaffiliated/ybit] has joined ##hplusroadmap | 04:19 | |
-!- yottabit [~heath@unaffiliated/ybit] has quit [Quit: Konversation terminated!] | 04:32 | |
chris_99 | http://www.genome.gov/images/content/cost_per_genome.jpg | 04:50 |
-!- ThomasEgi [~thomas@pppdyn-6e.stud-ko.rz-online.net] has joined ##hplusroadmap | 04:51 | |
-!- ThomasEgi [~thomas@pppdyn-6e.stud-ko.rz-online.net] has quit [Changing host] | 04:51 | |
-!- ThomasEgi [~thomas@panda3d/ThomasEgi] has joined ##hplusroadmap | 04:51 | |
archels | kanzure: crazy prices for this old mechanical stuff, http://www.sciquip.com/browses/detailed_item_view.asp?productID=26051&Mfg=MICROMANIPULATOR&Mdl=550 | 05:06 |
archels | http://www.ebay.de/itm/Motorisierter-dreiachsiger-Mikromanipulator-Marzhauser-Steuergerat-/290669989423?pt=Laborger%C3%A4te_instrumente&hash=item43ad480e2f | 05:14 |
chris_99 | what's that archels? | 05:16 |
archels | two XYZ micromanipulators | 05:21 |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Read error: Connection reset by peer] | 05:41 | |
-!- augur [~augur@208.58.5.87] has quit [Remote host closed the connection] | 06:12 | |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has joined ##hplusroadmap | 06:45 | |
-!- ParahSailin__ [~parahsail@adsl-69-151-205-240.dsl.hstntx.swbell.net] has joined ##hplusroadmap | 07:15 | |
-!- ParahSailin [~parahsail@unaffiliated/parahsailin] has quit [Ping timeout: 248 seconds] | 07:17 | |
-!- JayDugger [~duggerj@pool-173-74-78-36.dllstx.fios.verizon.net] has quit [Quit: Leaving.] | 07:25 | |
-!- anelma [~elmom@hoas-fe3ddd00-25.dhcp.inet.fi] has quit [Remote host closed the connection] | 08:33 | |
-!- augur [~augur@129.2.129.35] has joined ##hplusroadmap | 08:33 | |
ParahSailin__ | morning | 08:37 |
rdb | morning | 08:42 |
rdb | evening actually | 08:42 |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Quit: jmil] | 08:43 | |
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has joined ##hplusroadmap | 08:59 | |
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has quit [Remote host closed the connection] | 09:01 | |
-!- jmil [~jmil@SEASNet-148-05.seas.upenn.edu] has joined ##hplusroadmap | 09:17 | |
-!- elmom [~elmom@hoas-fe3ddd00-25.dhcp.inet.fi] has joined ##hplusroadmap | 09:19 | |
-!- pasky_ [pasky@nikam.ms.mff.cuni.cz] has joined ##hplusroadmap | 09:37 | |
-!- jrayhawk_ [~jrayhawk@nursie.omgwallhack.org] has joined ##hplusroadmap | 09:37 | |
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Ping timeout: 240 seconds] | 09:37 | |
-!- pasky [pasky@nikam.ms.mff.cuni.cz] has quit [Ping timeout: 240 seconds] | 09:37 | |
-!- jrayhawk [~jrayhawk@nursie.omgwallhack.org] has quit [Ping timeout: 240 seconds] | 09:37 | |
-!- gedankenstuecke [~bastian@phylomemetic-tree.de] has quit [Ping timeout: 240 seconds] | 09:37 | |
-!- epitron [~epitron@unaffiliated/epitron] has quit [Ping timeout: 240 seconds] | 09:37 | |
-!- epitron [~epitron@bito.ponzo.net] has joined ##hplusroadmap | 09:37 | |
-!- gedankenstuecke [~bastian@phylomemetic-tree.de] has joined ##hplusroadmap | 09:37 | |
-!- epitron [~epitron@bito.ponzo.net] has quit [Changing host] | 09:37 | |
-!- epitron [~epitron@unaffiliated/epitron] has joined ##hplusroadmap | 09:37 | |
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap | 09:38 | |
chris_99 | anyone heard of transcranial direct current stimulation | 09:43 |
archels | Some dude in ##neuroscience just asked about it. ;) | 09:49 |
mag1strate | isnt it mostly used for psychological disorders? | 09:50 |
chris_99 | yeah i know, thats why i was asking archels | 09:51 |
chris_99 | and yes it does seem so, although it might be similar to TMS | 09:52 |
chris_99 | in some ways | 09:52 |
mag1strate | hmmm I had no clue | 09:54 |
mag1strate | it would be interesting to look into something like this | 09:54 |
chris_99 | yeah and it looks easy to experiment with | 09:56 |
chris_99 | as its just low current DC | 09:56 |
chris_99 | low voltage too | 09:56 |
chris_99 | although i don't really fancy attaching electrodes with electricity to my head | 09:57 |
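A rough sense of the electrical scale chris_99 mentions (illustrative numbers only, not a stimulation protocol): published tDCS studies typically drive 1-2 mA, and electrode-plus-scalp resistance is assumed here to be on the order of a few kilohms, so by Ohm's law the drive voltage stays in the low tens of volts.

```python
# Back-of-the-envelope tDCS electrical budget (illustrative values only).
# The ~2 mA current matches typical published studies; the 5 kOhm
# electrode/scalp resistance is an assumption for illustration.
current_a = 2e-3          # 2 mA stimulation current
resistance_ohm = 5e3      # assumed electrode + scalp resistance
voltage_v = current_a * resistance_ohm  # Ohm's law: V = I * R
power_w = voltage_v * current_a         # P = V * I

print(f"drive voltage: {voltage_v:.1f} V")   # 10.0 V
print(f"power: {power_w * 1e3:.1f} mW")      # 20.0 mW
```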
mag1strate | thats usually a smart move lol | 09:58 |
mag1strate | the problem is I dont see any practical applications for it | 09:58 |
mag1strate | most is used to treat disease | 09:59 |
mag1strate | unless we can control where the current will be flowing to | 09:59 |
chris_99 | could could make a DIY TMS device | 09:59 |
mag1strate | you can make one | 10:00 |
chris_99 | oops bad spelling there | 10:00 |
chris_99 | has anyone done that? | 10:00 |
mag1strate | but it would be nice to experiment with where the electrodes would go to make it beneficial | 10:00 |
kanzure | yes lots of others in here have heard about tdcs | 10:12 |
kanzure | and i think one or two built a tdcs setup | 10:12 |
kanzure | although i think collectively this channel has more experience with magnetic stimulation | 10:13 |
-!- strages_home [~strages@adsl-98-67-175-14.shv.bellsouth.net] has joined ##hplusroadmap | 10:13 | |
chris_99 | has anyone built a magnetic stimulation device? | 10:15 |
kanzure | superkuh worked on something | 10:18 |
kanzure | at the moment i'm more interested in ultrasound stimulation | 10:18 |
kanzure | http://diyhpl.us/~bryan/papers2/neuro/ultrasound/ | 10:18 |
mag1strate | I enjoy that I can't give myself and want to share with someone special. | 10:19 |
mag1strate | I enjoy that I can't give myself and want to share with someone special. | 10:19 |
mag1strate | lol | 10:19 |
mag1strate | my middle click button is a paste and enter button | 10:19 |
kanzure | "it is thought that the nonthermal actions of US are understood in terms of cavitation - for example, radiation force, acoustic streaming, shock waves, and strain neuromodulation, where US produces fluid-mechanical effects on the cellular environments of neurons to modulate their resting membrane potentials." | 10:20 |
kanzure | "The direct activation of ion channels by US may also represent a mechanism of action, since many of the voltage-gated sodium, potassium, and calcium channels influencing neuronal excitability possess mechanically sensitive gating kinetics (Morris and Juranka, 2007)." | 10:20 |
chris_99 | oh i've not heard of ultrasound stimulation | 10:20 |
mag1strate | do you know of any positive effects of this kanzure? | 10:21 |
chris_99 | i'm hopefully in the process of ordering some ultrasound transducers from china | 10:21 |
kanzure | mag1strate: 2mm targeting of regions in the brain | 10:21 |
mag1strate | I can see maybe positive effects on the cellular environment level | 10:21 |
kanzure | it's neural stimulation | 10:21 |
mag1strate | hmmm | 10:21 |
kanzure | soo if you have a 2mm chunk you want to stimulate somewhere.. it's pretty useful | 10:21 |
mag1strate | I've never really seen US used on the brain | 10:21 |
kanzure | rTMS has more like 1cm resolution | 10:21 |
kanzure | mag1strate: check those papers.. | 10:21 |
kanzure | one of the studies was to remove an inoperable brain tumor | 10:22 |
kanzure | by melting the tumor. | 10:22 |
mag1strate | was it successful? | 10:25 |
kanzure | yes | 10:25 |
mag1strate | noice | 10:26 |
mag1strate | if it was for melting the tumor, would it affect actual brain tissue? | 10:26 |
kanzure | really the ideal setup would be one where you can apply a certain amount of energy to any location within the brain | 10:26 |
kanzure | mag1strate: the brain tumor study was just a high-power version | 10:26 |
kanzure | here's a low power version: | 10:27 |
kanzure | http://www.youtube.com/watch?v=RGEP6iWLsvQ | 10:27 |
mag1strate | interesting | 10:28 |
mag1strate | the low power version seemed almost like an impulsive shock to the brain area | 10:28 |
kanzure | yes.. it's a mechanical compression wave that goes into the skull | 10:29 |
kanzure | when you have 10 or 50 transducers the compression waves add up | 10:29 |
kanzure | so when they geometrically intersect the power delivery increases | 10:30 |
kanzure | erm.. the total mW/mm^2 increases. you get the idea. | 10:30 |
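kanzure's geometric-intersection point can be made concrete with a toy model (a sketch assuming ideal phase alignment, not taken from the linked papers): at the focus the pressure waves arrive in phase, so amplitudes add linearly, and since intensity goes as pressure squared, N transducers deliver roughly N^2 times the single-element intensity at the focus but only ~N times elsewhere.

```python
# Toy model of coherent ultrasound focusing (assumes ideal phase alignment).
def focal_intensity_gain(n_transducers: int) -> int:
    """At the focus, in-phase pressure amplitudes add linearly; intensity
    scales as pressure^2, so the gain over one transducer is N^2."""
    return n_transducers ** 2

def off_focus_intensity_gain(n_transducers: int) -> int:
    """Away from the focus, phases are effectively random, so intensities
    (not amplitudes) add: gain is only ~N."""
    return n_transducers

for n in (10, 50):
    print(n, focal_intensity_gain(n), off_focus_intensity_gain(n))
```

This is why adding transducers concentrates power delivery at the target far faster than it raises exposure along each beam path.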
kanzure | i just had the most fascinating time traveling dream | 10:31 |
kanzure | apparently it's a dream of mine to one day own giraffes and t-rex's and force them to fight against each other | 10:32 |
mag1strate | lolol | 10:32 |
mag1strate | That would be really cool actually | 10:32 |
mag1strate | I have always wanted sharks with lazer beams on their heads :/ | 10:33 |
kanzure | a shark tank doesn't cost that much | 10:33 |
kanzure | http://omicsomics.blogspot.com/2012/02/oxford-nanopore-doesnt-disappoint.html | 10:38 |
chris_99 | is their presentation online? | 10:40 |
mag1strate | kanzure: lol | 10:40 |
kanzure | however you might have to go wrestle your own shark off the coast | 10:41 |
mag1strate | Thats the easy part | 10:48 |
mag1strate | the hardest part is the lazer beams | 10:48 |
ParahSailin__ | as insty would say "faster, please" | 10:49 |
kanzure | i prefer this edit of doc brown: http://www.youtube.com/watch?v=KJRh-37H4fA | 11:02 |
chris_99 | haha | 11:06 |
chris_99 | isn't it rather dangerous someone could get the DNA for the pneumonic plague off the net? | 11:12 |
rkos | i think synthesis companies ban sequences of dangerous things | 11:14 |
chris_99 | thats scary as hell to me though | 11:14 |
kanzure | chris_99: doesn't matter, they already have it | 11:14 |
chris_99 | who already have it? | 11:15 |
kanzure | obscurity is not security | 11:15 |
chris_99 | true i agree with that normally, but in this case | 11:15 |
kanzure | the best defense against plagues is a biological solution | 11:15 |
kanzure | we have an immune system for a reason | 11:15 |
chris_99 | is there a vaccine for the plague? | 11:16 |
kanzure | if there isn't, sounds like an important thing to make, no? | 11:16 |
chris_99 | it does yeah, but i'm really surprised the DNA is available | 11:17 |
Stee| | no hangover, this is good | 11:18 |
rkos | is it available? | 11:18 |
chris_99 | yes | 11:18 |
rkos | but dont dna foundries refuse to synthesize sequences of viruses etc? | 11:19 |
kanzure | jrayhawk_: do you have any interest in doing a mirror of ftp://ftp.ncbi.nlm.nih.gov/genomes/ | 11:20 |
jrayhawk_ | Could do. | 11:20 |
-!- jrayhawk_ is now known as jrayhawk | 11:21 | |
kanzure | to my knowledge, there are no mirrors | 11:21 |
jrayhawk | Huh. | 11:21 |
kanzure | which is bad. | 11:21 |
kanzure | oh | 11:22 |
kanzure | http://biomirror.aarnet.edu.au/biomirror/ncbigenomes/ | 11:22 |
kanzure | http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/genomes/ | 11:22 |
kanzure | well, there should be a non-institutional mirror somewhere | 11:23 |
kanzure | http://ftp.cbi.pku.edu.cn/pub/database/genomes/ | 11:23 |
jrayhawk | What's wrong with institutional mirrors | 11:23 |
kanzure | and some of these mirrors look a bit stale (that last one was from 2010?) | 11:23 |
kanzure | i don't trust universities to always keep them up | 11:23 |
kanzure | if feds come knocking, etc. | 11:23 |
jrayhawk | Ah, I see. | 11:23 |
kanzure | also, apparently i don't trust these universities to keep their mirrors current o__o | 11:24 |
kanzure | nice set of backups: http://ftp.cbi.pku.edu.cn/pub/database/ | 11:24 |
kanzure | weird they're using some perl module i think, but it doesn't appear on cpan | 11:26 |
kanzure | http://ftp.cbi.pku.edu.cn/pub/biomirror/software/biomirror/BioMirror.pm | 11:26 |
kanzure | http://ftp.cbi.pku.edu.cn/pub/biomirror/software/biomirror/BioMirror/ | 11:26 |
-!- delinquentme [~asdfasdf@c-67-171-66-113.hsd1.pa.comcast.net] has joined ##hplusroadmap | 11:26 | |
Urchin | mirrors of what? | 11:49 |
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has quit [Ping timeout: 260 seconds] | 12:03 | |
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has joined ##hplusroadmap | 12:08 | |
kanzure | Urchin: gnomes | 12:27 |
kanzure | *genomes | 12:27 |
jrayhawk | the world has enough gnome mirrors | 12:29 |
-!- Technicus [~Technicus@108.198.137.39] has joined ##hplusroadmap | 12:33 | |
-!- Technicus [~Technicus@108.198.137.39] has quit [] | 12:40 | |
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has joined ##hplusroadmap | 12:47 | |
kanzure | hi jrayhawk | 12:50 |
kanzure | erm.. Jaakko96 | 12:50 |
ParahSailin__ | kanzure, i saw that mutual banking essay on your pdf directory -- this is similar http://praxeology.net/FDT-VS.htm | 12:59 |
kanzure | alright | 13:01 |
-!- Mokbortolan_1 [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap | 13:20 | |
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Ping timeout: 265 seconds] | 13:22 | |
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has joined ##hplusroadmap | 13:35 | |
-!- archels [~foo@sascha.esrac.ele.tue.nl] has quit [Ping timeout: 245 seconds] | 13:35 | |
kanzure | hi yashgaroth | 13:37 |
yashgaroth | hello | 13:37 |
kanzure | why would onLoadFinished() be called 3 times for saks, but not 3 times for heybryan? http://pastebin.com/4csM0qSC | 13:41 |
kanzure | ^for anyone who wants to help out with some javascript | 13:41 |
-!- archels [~foo@sascha.esrac.ele.tue.nl] has joined ##hplusroadmap | 13:47 | |
-!- ParahSailin__ [~parahsail@adsl-69-151-205-240.dsl.hstntx.swbell.net] has quit [Ping timeout: 260 seconds] | 13:51 | |
-!- ParahSailin__ [~parahsail@adsl-69-151-205-240.dsl.hstntx.swbell.net] has joined ##hplusroadmap | 14:07 | |
kanzure | aha.. http://code.google.com/p/phantomjs/issues/detail?id=122 | 14:13 |
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has quit [Ping timeout: 240 seconds] | 14:14 | |
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has quit [Ping timeout: 260 seconds] | 14:21 | |
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has quit [Quit: Nettalk6 - www.ntalk.de] | 14:28 | |
-!- strages_home [~strages@adsl-98-67-175-14.shv.bellsouth.net] has quit [Ping timeout: 245 seconds] | 14:36 | |
-!- strages_home [~strages@adsl-98-67-175-14.shv.bellsouth.net] has joined ##hplusroadmap | 14:38 | |
-!- jmil [~jmil@SEASNet-148-05.seas.upenn.edu] has quit [Read error: Operation timed out] | 14:38 | |
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has joined ##hplusroadmap | 14:40 | |
-!- d3nd3 [~dende@cpc10-croy17-2-0-cust245.croy.cable.virginmedia.com] has quit [Client Quit] | 14:42 | |
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has joined ##hplusroadmap | 14:59 | |
kanzure | hi ianmathwiz7 | 14:59 |
ianmathwiz7 | hey | 15:01 |
ThomasEgi | hoho ianmathwiz7 , long time no chat^ | 15:06 |
ianmathwiz7 | yeah | 15:06 |
ianmathwiz7 | I've been on the ##biohack channel from time to time | 15:06 |
ianmathwiz7 | but I haven't been spending too much time on IRC, lately | 15:07 |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has joined ##hplusroadmap | 15:13 | |
-!- ianmathwiz7 [~chatzilla@x-134-84-100-61.reshalls.umn.edu] has quit [Quit: ChatZilla 0.9.88 [SeaMonkey 2.7.1/20120208224119]] | 15:18 | |
kanzure | win 13 | 15:22 |
-!- augur [~augur@129.2.129.35] has quit [Remote host closed the connection] | 15:23 | |
bkero | lose 12 | 15:23 |
-!- chris_99 [~chris_99@unaffiliated/chris-99/x-3062929] has quit [Quit: Leaving] | 15:29 | |
kanzure | http://www.reddit.com/r/science/comments/pv7tn/the_high_price_of_knowledge_should_journals/ | 16:03 |
kanzure | http://thepiratebay.se/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro | 16:03 |
ParahSailin__ | https://docs.google.com/open?id=0B3qaT-ZL6aeKMWFhNmIwOGYtNWM2Yi00ZTU0LTkxZjMtZGYzNjUwNWJhZTBm | 16:05 |
ParahSailin__ | like bitcoin but different | 16:05 |
kanzure | what is this? | 16:05 |
kanzure | http://bib.tiera.ru/static | 16:05 |
kanzure | seems to be 24 MB | 16:05 |
kanzure | hmm this is a very poorly organized collection | 16:06 |
kanzure | <tr><td><a href="http://bib.tiera.ru//ShiZ/Homelab/spec116/Kulagina M.A., Kiseleva N.A. Osnovy tehnologicheskogo proektirovaniya sborochno-svarochnyh cehov. | 16:07 |
kanzure | <tr><td><a href="http://bib.tiera.ru//ShiZ/Homelab/spec107/Evstifeev A.V. Mikrokontrollery AVR semejstva Mega. (2007).djvu"> | 16:08 |
kanzure | <tr><td><a href="http://bib.tiera.ru//dvd54/Cavaleiro A. - Nanostructured Coatings(2006)(955).pdf">Cavaleiro A. - Nanostructured Coatings</a><td>pdf<td>en<td> | 16:08 |
kanzure | how the fuck am i supposed to fix this? | 16:08 |
superkuh | I don't know. But at least it's easy to search through and extract URLs. Neat. | 16:10 |
superkuh | Tangentially, http://erewhon.superkuh.com/library/ - always on (fast) daily mirror of my library for when I turn off the main server. | 16:10 |
kanzure | superkuh: we need some way of helping people who keep these mirrors to have better indexes | 16:11 |
kanzure | for instance, at the moment i can't confirm that their copy of journal xyz is complete or not | 16:11 |
kanzure | and they probably don't know which papers belong to which journals either | 16:11 |
kanzure | bibtex could probably work for this? with some tools to search through directories and get file hashes | 16:12 |
delinquentme | http://imgur.com/r/atheism/VEKo5 | 16:13 |
-!- marainein [~marainein@114-198-65-190.dyn.iinet.net.au] has quit [Ping timeout: 260 seconds] | 17:29 | |
-!- ThomasEgi [~thomas@panda3d/ThomasEgi] has quit [Remote host closed the connection] | 17:43 | |
uniqanomaly | http://news.slashdot.org/story/12/02/18/2130245/universities-agree-to-email-monitoring-for-copyright-agency :> | 18:14 |
kanzure | eww slashdot | 18:31 |
kanzure | how is that not dead yet | 18:31 |
kanzure | <tr><td><a href="http://bib.tiera.ru//DVD-034/Agrawal_S._Protocols_for_Oligonucleotide_Conjugates[c]_Synthesis_and_Analytical_Techniques_(1993)(en)(390s).pdf">Agrawal S. Protocols for Oligonucleotide Conjugates[c] Synthesis and Analytical Techniques</a><td>pdf<td>en<td>255<td>21154034</tr> | 18:56 |
kanzure | http://bib.tiera.ru//DVD-022/Arnold_F.H.,_Georgiou_G._Directed_Evolution_Library_Creation._Methods_and_Protocols_(2003)(en)(232s).pdf | 18:57 |
kanzure | http://bib.tiera.ru//DVD-022/Cutler_P._Protein_Purification_Protocols_(2003)(2nd)(en)(496s).pdf | 18:58 |
kanzure | http://bib.tiera.ru//DVD-035/Dooman_S._Protein_Purification_Protocols_(1996)(en)(416s).pdf | 18:58 |
kanzure | http://bib.tiera.ru//DVD-022/English_L.B._(ed.)_Combinatorial_Library_Methods_and_Protocols_(2002)(en)(383s).pdf | 18:59 |
kanzure | http://bib.tiera.ru//DVD-022/Federico_M._Lentivirus_Gene_Engineering_Protocols_(2003)(en)(328s).pdf | 18:59 |
kanzure | http://bib.tiera.ru//DVD-022/Fedoroff_S.,_Richardson_A._Protocols_for_Neural_Cell_Culture_(2001)(3rd)(en)(384s).rar | 18:59 |
kanzure | http://bib.tiera.ru//DVD-022/Findeis_M.A._Nonviral_Vectors_for_Gene_Therapy._Methods_and_Protocols_(2001)(en)(416s).pdf | 19:00 |
kanzure | http://bib.tiera.ru//dvd57/Foster G. D. (Ed), Taylor S. (Ed) - Plant Virology Protocols, Vol. 81(1998)(571).rar | 19:01 |
kanzure | http://bib.tiera.ru//DVD-028/Gartland_K.M.A.,_Davey_M.R._(eds.)_Agrobacterium_Protocols_(1995)(en)(432s).pdf | 19:01 |
kanzure | probably useless http://bib.tiera.ru//DVD-022/Graham_C.A._(ed.),_Hill_A._(ed.)_DNA_Sequencing_Protocols_(2001)(2-nd)(en)(244s).rar | 19:01 |
kanzure | http://bib.tiera.ru//DVD-022/Gray_J.,_Desselberger_U._Rotaviruses._Methods_and_Protocols_(2000)(en)(272s).pdf | 19:01 |
kanzure | http://bib.tiera.ru//DVD-028/Harwood_A.J._Basic_DNA_and_RNA_Protocols_(1996)(en)(528s).pdf | 19:02 |
kanzure | http://bib.tiera.ru//DVD-022/Hofker_M.H.,_van_Deursen_J._Transgenic_Mouse_Methods_and_Protocols_(2002)(en)(392s).pdf | 19:02 |
kanzure | http://bib.tiera.ru//DVD-022/Howe_P.H._Transforming_Growth_Factor-Beta_Protocols_(2000)(en)(176s).pdf | 19:03 |
kanzure | http://bib.tiera.ru//DVD-030/Irvine_G.B.,_Williams_C.H._Neuropeptide_Protocols_(1996)(en)(381s).pdf | 19:03 |
kanzure | http://bib.tiera.ru//DVD-035/Jones_G.E._Human_Cell_Culture_Protocols_(1996)(en)(560s).pdf | 19:03 |
kanzure | http://bib.tiera.ru//DVD-031/Jones_H._Plant_Gene_Transfer_and_Expression_Protocols_(1995)(en)(462s).pdf | 19:03 |
kanzure | http://bib.tiera.ru//DVD-034/Kendall_D.A.,_Hill_S.J._Signal_Transduction_Protocols_(1995)(en)(316s).pdf | 19:04 |
kanzure | http://bib.tiera.ru//DVD-022/Kmiec_E.B._(ed.)_Gene_Targeting_Protocols_(1999)(en)(450s).pdf | 19:04 |
kanzure | http://bib.tiera.ru//DVD-022/Kola_I.,_Tymms_M.J._Gene_Knockout_Protocols_(2001)(en)(448s).pdf | 19:04 |
kanzure | http://bib.tiera.ru//DVD-029/LaRossa_R.A._Bioluminescence_Methods_and_Protocols_(1998)(en)(320s).pdf | 19:04 |
kanzure | http://bib.tiera.ru//DVD-022/Lieberman_B.A._Steroid_Receptor_Methods._Protocols_and_Assays_(2001)(en)(400s).pdf | 19:05 |
fenn | books books books books books books books books books | 19:05 |
kanzure | http://bib.tiera.ru//DVD-022/Lo_B._(ed.)_Antibody_Engineering,_Methods_and_Protocols_(2003)(en)(550s).pdf | 19:05 |
kanzure | booooooks | 19:05 |
kanzure | fenn: help me figure out a way to fix this | 19:05 |
fenn | what's the problem? | 19:05 |
kanzure | lots of people scrape and download lots of this content (which is great) | 19:05 |
kanzure | their organization skills suck | 19:05 |
kanzure | and they can't show that they have a complete collection for a given journal, edition, volume, or whatever | 19:06 |
fenn | hmm | 19:06 |
kanzure | ideally there's some folder organization scheme, or metadata file format that we should all be using | 19:06 |
kanzure | and writing tools to use. | 19:06 |
kanzure | and then some web service that collects this information and maps it all together. | 19:06 |
fenn | i assume the metadata is publically available | 19:06 |
kanzure | it's definitely available at the original soruce | 19:06 |
kanzure | *source | 19:06 |
fenn | then it's a "simple matter of coding" to match the files to the metadata | 19:06 |
kanzure | but most people just take a pdf and give it a title and they feel they are done | 19:07 |
kanzure | i know that's what i did (because i'm an idiot) | 19:07 |
fenn | unfortunately you will probably have to rewrite parsing code for each publisher | 19:07 |
kanzure | well, how about bibtex | 19:07 |
fenn | yeah it's the curse of filesystems | 19:07 |
fenn | what, have a bibtex file for every pdf? i think the problem is people only download a pdf so the (computer parseable) metadata gets lost | 19:08 |
kanzure | well let's only consider scraping scenarios | 19:08 |
fenn | you could build up a hash table of pdf to metadata | 19:08 |
kanzure | where the programmer can afford to take extra metadata | 19:08 |
fenn | sha256(foo.pdf) => metadata | 19:08 |
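fenn's hash-table idea can be sketched with the stdlib (the directory layout and metadata fields here are made up for illustration; real metadata would come from whatever scraper fetched each file):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MB chunks so large scans don't blow up memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_index(root: str) -> dict:
    """Map sha256(pdf) -> metadata stub for every pdf under root."""
    index = {}
    for pdf in sorted(Path(root).rglob("*.pdf")):
        index[sha256_of(pdf)] = {"filename": pdf.name, "bytes": pdf.stat().st_size}
    return index

# Usage (hypothetical collection directory):
# build_index("papers/")
```

As the next lines note, this breaks down when the publisher watermarks each download, since the hash changes per copy.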
kanzure | except the pdfs are modified each time you download them | 19:08 |
fenn | o rly | 19:08 |
fenn | what doesn't change then | 19:09 |
kanzure | yeah like AdobePDFStamperPro (watermarking) | 19:09 |
kanzure | sometimes there's metadata in the file itself | 19:09 |
fenn | is there a title field or something at least that stays the same? | 19:09 |
kanzure | sometimes, but for the vast majority of content it's just a scanned image | 19:09 |
fenn | how does mendeley do it? | 19:09 |
kanzure | proprietary ocr | 19:09 |
kanzure | as a fallback. | 19:09 |
fenn | how hard is OCR then? | 19:10 |
fenn | i mean, you just need to match the text to something in a list of titles | 19:10 |
kanzure | tesseract was pretty awful. i don't know, i think the pain of ocr probably goes down for sufficiently large collections | 19:10 |
kanzure | anyway, let's assume this isn't a problem | 19:10 |
kanzure | let's assume that programmers will scrape metadata too | 19:10 |
fenn | "programmers"? | 19:10 |
kanzure | i'm thinking of a particular scenario where people are scraping content | 19:10 |
kanzure | and contributing it to the master collection | 19:11 |
kanzure | and then being able to say "Ok, the internet has pages 1-400 of journal xyz" based on collected records on the server | 19:11 |
kanzure | "THE INTERNET" well.. "this public service" | 19:11 |
kanzure | then if you are feeling like you want to contribute, maybe you'd write a compatible scraper to gather, dump and upload data for "x, y and z" that the server says is missing | 19:12 |
fenn | how to keep it from falling apart again when the site gets shut down? | 19:13 |
fenn | presumably people would have backups or partial backups | 19:13 |
kanzure | and the software is separate anyway | 19:13 |
fenn | but if the pdf hash changes.. | 19:13 |
kanzure | *shrug* that just hurts verification | 19:14 |
kanzure | web of trust bullshit can maybe offset that | 19:14 |
kanzure | "hey look, all the contributions from anonymous user with this public key are all awful" | 19:14 |
fenn | oh i didnt even think of that | 19:14 |
fenn | something like git annex would be preferred | 19:15 |
fenn | so you as the library maintainer would only accept patches that look legit | 19:15 |
fenn | i've never seen any working web of trust software | 19:15 |
kanzure | sure. and again different computers and friendlies will contribute 'chunks' of scraped content / metadata that gets distributed from some primary server | 19:15 |
fenn | (doesnt mean it doesnt exist) | 19:15 |
kanzure | well, i do think contributions would need to be reviewed somehow | 19:16 |
kanzure | a basic method might be "show us your scraper" | 19:16 |
fenn | wouldnt looking at the content make more sense? | 19:16 |
kanzure | yes but how am i going to manually look at 7 million articles a month | 19:17 |
kanzure | there might be some computational way to determine if a paper looks like what it says it is | 19:17 |
fenn | you do a random sampling | 19:17 |
kanzure | well ok. | 19:17 |
fenn | yes, document clustering | 19:17 |
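A cheap first pass at "does this paper look like what it says it is", assuming you already have extracted text (OCR or otherwise): fuzzy-match the claimed title against a sliding window over the first page. This is a sketch with `difflib`, not a real clustering pipeline; threshold and normalization are arbitrary choices.

```python
import difflib
import re

def looks_like(claimed_title: str, first_page_text: str,
               threshold: float = 0.8) -> bool:
    """True if some title-sized window of the extracted first page
    fuzzily matches the claimed title (simple ratio match)."""
    norm = lambda s: re.sub(r"\s+", " ", s.lower()).strip()
    title = norm(claimed_title)
    words = norm(first_page_text).split()
    n = len(title.split())
    # slide a title-sized window over the page text
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        if difflib.SequenceMatcher(None, title, window).ratio() >= threshold:
            return True
    return False
```

Random sampling plus a check like this would flag most mislabeled submissions without anyone reading 7 million articles.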
kanzure | so far i've met nobody that has indicated they would want to spam a service like this | 19:17 |
fenn | there really arent very many people working for journals | 19:17 |
kanzure | hm? | 19:18 |
fenn | many many more pissed off students who don't have access and want to fix this broken situation | 19:18 |
kanzure | right.. and "Here's 20,000 files with some bad names" does not help as much as it could | 19:18 |
fenn | i think you're just going to have to ignore filenames | 19:18 |
kanzure | fine by me | 19:18 |
fenn | start with the nature archive | 19:19 |
kanzure | i don't want to bother with parsing everyone's horrible dump | 19:19 |
fenn | get metadata, figure out how to computationally extract matching metadata from the articles themselves | 19:19 |
kanzure | what's wrong with me forcing them to go get metadata | 19:19 |
kanzure | yeah, i can't easily reverse from the nature archive "what the original url was so i can go grab metadata" | 19:19 |
fenn | well, how do you attach the metadata to the files in the first place? | 19:19 |
kanzure | bibtex, some simple yaml format, let's make something up | 19:20 |
kanzure | then that will be the definition, and the service will grow around that definition. | 19:20 |
* fenn reads about bibtex | 19:20 | |
kanzure | it's latex except a citation subset | 19:20 |
kanzure | i don't know. everything supports bibtex. | 19:20 |
fenn | "It is possible to use BibTeX outside of a LaTeX-Environment, namely MS Word using the tool Bibshare. " okay but how is this supposed to help for the pdf scenario | 19:21 |
kanzure | example: http://diyhpl.us/~bryan/sciencedirect/microelectronics.journal.txt | 19:21 |
fenn | can't you just cat foo.bibtex >> foo.pdf | 19:21 |
kanzure | well i've always considered a .pdf.tar format that would include that, but whatever | 19:21 |
kanzure | yes there's metadata portions in the pdf format, and probably some way to attach files.. but i don't care that much | 19:22 |
kanzure | so my example link | 19:22 |
kanzure | i had a bibtex file for each volume of each issue of this journal | 19:22 |
kanzure | and cat'd it together. so think of this as a single issue of a journal with a looong list of articles | 19:22 |
fenn | .pdf.tar sucks because your pdf indexing software won't read it | 19:23 |
kanzure | proprietary pdf indexing software is not the solution anyway | 19:23 |
kanzure | it's a part of the problem. proprietary ocr? thanks mendeley, that doesn't actually help | 19:24 |
fenn | tracker-search does pdf indexing | 19:24 |
fenn | time tracker-search transcranial | 19:24 |
fenn | real 0m0.080s | 19:24 |
fenn | unfortunately it returns urls instead of paths | 19:24 |
kanzure | how is this better than just keeping the original metadata from the webpage clean | 19:25 |
fenn | eh? | 19:25 |
fenn | it works with just the pdf files | 19:25 |
kanzure | i don't trust pdf files to have this information, and i don't trust ocr that much | 19:25 |
fenn | essentially you have to write a scraper either way | 19:25 |
kanzure | sure | 19:25 |
fenn | scrape metadata from the webpage, or scrape it from the pdf | 19:25 |
kanzure | well you have to get the webpage anyway to get to the pdf | 19:26 |
fenn | i'd rather scrape from the pdf because we already have those | 19:26 |
kanzure | but you can't do that reliably :/ | 19:26 |
fenn | you're guaranteed to have the pdf, but the webpage could disappear, change, or never exist in the first place | 19:26 |
kanzure | lots of pdfs are just images | 19:26 |
kanzure | well you only get the webpage once, you see | 19:26 |
kanzure | after you extract all the data it might as well disappear (who cares) | 19:26 |
fenn | okay maybe you should just do that first | 19:26 |
kanzure | ? | 19:27 |
fenn | get all the journal metadata | 19:27 |
fenn | there's "only" 80,000 journals | 19:27 |
kanzure | right, i'm presently doing that for most of elsevier (although it seems they hide some of their metadata from me unless i'm on a university network) | 19:27 |
fenn | wow really? what do they hide? | 19:27 |
kanzure | like issues | 19:27 |
kanzure | one of the journals i was looking at went back to 2003 | 19:27 |
kanzure | but on another computer, i saw that it went back to 1990something | 19:28 |
kanzure | same site. | 19:28 |
fenn | is this data considered "copyright"? i mean it's basically a library card catalog, so there shouldn't be any problem with hosting journal metadata out in the open | 19:28 |
kanzure | exactly | 19:28 |
kanzure | i'm sure they will complain anyway | 19:28 |
fenn | but the mere existence would help others doing the same sort of thing | 19:29 |
fenn | i mean there's no reason everyone should have to scrape metadata from the web | 19:29 |
kanzure | and then we can coordinate multiple scrapers simultaneously or at least help people to not duplicate work | 19:29 |
-!- _sol_ [Sol@c-174-57-58-11.hsd1.pa.comcast.net] has joined ##hplusroadmap | 19:29 | |
kanzure | so, i guess it's just a matter of bibtex+pdf? | 19:34 |
fenn | bibtex is just a text format | 19:34 |
kanzure | sure | 19:34 |
fenn | how does that fix anything? | 19:34 |
kanzure | well ultimately what we need from a scraper is the pdf plus metadata | 19:34 |
kanzure | i guess not. that doesn't fix it. | 19:35 |
fenn | you want to know if a collection is "complete", no? | 19:35 |
kanzure | i want each collection to have an index and know what it has | 19:35 |
kanzure | and then to describe each item. | 19:35 |
fenn | okay | 19:35 |
fenn | the index could just be bibtex | 19:35 |
fenn | one big file with bibtex entries for all pdf files in the collection | 19:36 |
kanzure | i guess so. does that cover everything? | 19:36 |
fenn | alternatively, you could have one bibtex file for each pdf file | 19:36 |
fenn | they amount to the same thing i guess | 19:36 |
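[A minimal sketch of the one-big-file option fenn describes, assuming each pdf already has a (citekey, fields) record scraped for it — `make_index`, the entry shape, and the field names are invented for illustration, not any real scraper's output:]

```python
# Sketch: concatenate one bibtex entry per pdf into a single collection
# index file. Entry shape and names here are illustrative only.
def make_index(entries):
    chunks = []
    for citekey, fields in entries:
        body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
        chunks.append(f"@article{{{citekey},\n{body}\n}}")
    return "\n\n".join(chunks)
```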
fenn | neither one "sticks" to the pdf file though | 19:37 |
fenn | it would be nice to somehow append bibtex to the pdf | 19:37 |
fenn | i dont know enough about modifying pdf's to know if this is easy | 19:37 |
fenn | pdf ends with %%EOF so you could just literally append bibtex and it shouldn't hurt anything | 19:38 |
kanzure | pdftk html_tidy.pdf attach_files command_ref.html to_page 24 output html_tidy_book.pdf | 19:38 |
fenn | metadata doesn't need to be visible in the pdf viewer | 19:38 |
kanzure | you mean EOF or a literal that says '%%EOF' | 19:39 |
fenn | literal | 19:39 |
fenn | tail foo.pdf | 19:39 |
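[The append-after-%%EOF idea can be sketched as below. This is a hedged illustration — `append_bibtex` and the paths are made-up names, and while most PDF viewers tolerate trailing bytes after the final %%EOF marker, the format does not guarantee every tool will:]

```python
# Sketch: append a bibtex record after a PDF's trailing %%EOF marker.
# Most viewers ignore bytes past the final %%EOF, though that behavior
# isn't guaranteed; function and path names are illustrative.
def append_bibtex(pdf_path, bibtex_entry):
    with open(pdf_path, "ab") as f:  # append binary; original bytes untouched
        f.write(b"\n% --- bibtex metadata ---\n")
        f.write(bibtex_entry.encode("utf-8"))
```

[`tail foo.pdf` afterwards would show the bibtex sitting past the %%EOF line, which is the "sticks to the file" property being discussed.]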
fenn | bbl maybe | 19:40 |
kanzure | well that's weird. | 19:41 |
kanzure | so doesn't citeseer do citation tracking | 19:42 |
kanzure | or citeulike | 19:42 |
kanzure | and i guess mendeley has an ok collection by now. | 19:42 |
kanzure | "Scientific Literature Digital Library incorporating autonomous citation indexing, awareness and tracking, citation context, related document retrieval, similar" | 19:42 |
kanzure | http://citeseer.ist.psu.edu/index | 19:42 |
fenn | i think this is not solving the same problem | 19:45 |
kanzure | no not quite | 19:45 |
fenn | there's the forwards metadata problem (start with a catalog, link to the articles) | 19:45 |
fenn | and the backwards metadata problem (start with the articles, link to the catalog) | 19:45 |
fenn | wow citeseer search sucks balls | 19:50 |
fenn | "here are some articles that contain keywords that you typed in, in no particular order" | 19:51 |
kanzure | http://diyhpl.us/~bryan/papers2/bib.tiera.ru/protocols.txt | 20:02 |
fenn | document clustering is probably more useful than journal indexes anyway | 20:04 |
fenn | i'm wondering how many of my books are essentially duplicates | 20:05 |
fenn | what's bib.tiera.ru? | 20:06 |
kanzure | no clue | 20:06 |
kanzure | just found it today | 20:06 |
kanzure | looks a bit more hand-curated than libgenesis | 20:06 |
kanzure | If you have any stuff to upload (or to donate), <a href='mailto:s@tiera.ru'>write us</a><br> | 20:08 |
kanzure | http://f2.tiera.ru//TEXTBOOKS3/ELSEVIER-Referex/1-Chemical Petrochemical and Process Collection/CD3/RICE, R. G. (1994). Applied Mathematics and Modeling for Chemical Engineers/03771_08.pdf | 20:14 |
kanzure | hmm ELSEVIER-Referex? | 20:14 |
fenn | wonder what's up with their css and general lack of content http://fennetic.net/irc/facebook_huh.png | 20:15 |
fenn | hmm maybe i broke it with adblock | 20:17 |
kanzure | i'm looking at tiera.ru's index | 20:18 |
kanzure | and i'm not sure, but i think some of these were mine | 20:18 |
fenn | hehehe | 20:19 |
kanzure | /other/other3/Chemistry/Chemical engineering/Technology and processing of polymers/ | 20:19 |
kanzure | nobody makes awful paths like i do! | 20:19 |
fenn | spaces! | 20:19 |
kanzure | :/ | 20:19 |
fenn | hey why not use freenet | 20:20 |
fenn | for distributing papers | 20:21 |
kanzure | i don't think distribution is a problem | 20:21 |
kanzure | and realistically for maximum impact you need http | 20:21 |
fenn | how big do you think "all" of the journal archives would be if properly OCR'ed? | 20:25 |
kanzure | "ScienceDirect publishes 250,000 articles a year in 2,000 journals." | 20:26 |
kanzure | sciencedirect indexes >5000 journals though, so i don't know what's up there | 20:26 |
fenn | for the sake of analysis, limit "all" to everything currently offered in digital format online | 20:26 |
fenn | sciencedirect is a brand of elsevier, right? | 20:27 |
kanzure | yes | 20:27 |
kanzure | i think i've seen estimates of 7 to 8 million articles per year at the moment | 20:27 |
fenn | "ScienceDirect is Elsevier's platform for online electronic access to its journals" | 20:27 |
kanzure | yeah | 20:27 |
fenn | so maybe they only have metadata for the other 3000 | 20:28 |
fenn | 8 million per year sounds way too high | 20:28 |
kanzure | they include some things they have "Intellectual Property Rights to index" or something lame | 20:28 |
kanzure | http://www.quora.com/How-many-academic-papers-are-published-each-year | 20:28 |
kanzure | "1.486 million peer-reviewed papers published within 2010" | 20:29 |
kanzure | "They estimate that 1.346 million articles were published in 23,750 journals within 200" | 20:29 |
kanzure | ok that was originally: | 20:29 |
kanzure | "They estimate that 1.346 million articles were published in 23.750 journals within 200" | 20:29 |
kanzure | why would 23.750 make sense | 20:29 |
kanzure | either say 1,346 million and 23,750 or 1.346 million and 23750.00 dfkadkfadja | 20:29 |
kanzure | ny times published this graph once that showed the rate of increase in publications for all countries for at least 40 years | 20:30 |
Stee| | 23.750 is used in some european countries | 20:31 |
fenn | it's confusing because they're inconsistently using the period for either decimal place or thousands separator | 20:32 |
fenn | ok so something like 2million per year | 20:33 |
fenn | this means, at minimum your scraper has to be able to handle 5000+ papers a day | 20:33 |
kanzure | http://duncan.hull.name/2010/07/15/fifty-million/ | 20:33 |
fenn | conversely, 15 seconds maximum processing time per paper (assuming no parallelization) | 20:34 |
kanzure | "One paper per minute is based on 679,858 papers per year in 2009 / 365 days / 24 hours / 60 minutes = 1.29 papers per minute." | 20:35 |
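[fenn's 15-second budget checks out as back-of-envelope arithmetic; the 2-million-per-year input is the chat's own rough estimate, not a sourced figure:]

```python
# Back-of-envelope: serial processing budget for ~2 million papers/year.
papers_per_year = 2_000_000
papers_per_day = papers_per_year / 365       # roughly 5,500 papers a day
seconds_per_paper = 86_400 / papers_per_day  # ~15.8 s each with no parallelism
```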
kanzure | hrm actually this might be an ok method to help approximate the collection completeness | 20:35 |
kanzure | "768,341 papers have been written so far in 2012. We have 20,000." | 20:36 |
fenn | maybe we should just focus on getting papers that aren't crap | 20:37 |
kanzure | well that too. | 20:37 |
kanzure | it would be nice to start with some commonly-read journals | 20:37 |
fenn | maybe take your collection and use that as a set of seed points for a web of science crawler | 20:38 |
kanzure | ? | 20:38 |
fenn | presumably anything not cited by or citing the papers cited by the papers in your collection (ugh) is not worth reading | 20:39 |
kanzure | i find strange outliers all the time | 20:39 |
kanzure | i guess i haven't looked at the citation network at all. | 20:39 |
fenn | for all it's been abused by academia, citation network is a pretty cool thing | 20:40 |
-!- nuba [~nuba@pauleira.com] has quit [Ping timeout: 260 seconds] | 20:40 | |
fenn | s/for/despite/ | 20:41 |
kanzure | i've never been sure how researchers remember which paper they read something in | 20:41 |
-!- nuba [~nuba@pauleira.com] has joined ##hplusroadmap | 20:41 | |
kanzure | i can remember big papers where important things happen | 20:41 |
kanzure | but for the small results, that seems much harder | 20:41 |
fenn | they don't; it's all made up | 20:41 |
fenn | you just have to make up a bibliography when writing your paper so you go look around for the originals | 20:41 |
kanzure | i guess if you go looking for something to cite, it's easier | 20:41 |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Quit: jmil] | 20:42 | |
kanzure | orr they probably just read a review paper and pick out some crap | 20:42 |
fenn | also a trick i've seen is people just look at what papers they've downloaded | 20:42 |
fenn | most academics don't have a very extensive library | 20:42 |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has joined ##hplusroadmap | 20:43 | |
fenn | that's the library's job after all | 20:43 |
kanzure | the libraries are busy paying too much for all these journals | 20:43 |
fenn | why is there no "complete" journal metadata index? | 20:45 |
-!- jmil [~jmil@c-68-81-252-40.hsd1.pa.comcast.net] has quit [Client Quit] | 20:45 | |
fenn | i mean why does this fall on the shoulders of a couple disgruntled hackers | 20:46 |
kanzure | http://diyhpl.us/~bryan/irc/sciencedirect_journals.json | 20:46 |
fenn | there should be armies of librarians tackling this | 20:46 |
-!- Stee| [~Steel@cpe-67-246-36-165.nycap.res.rr.com] has quit [Ping timeout: 276 seconds] | 20:46 | |
kanzure | it's because "others" got to science first, before the internet got to it. | 20:47 |
fenn | but it's really not a hard problem to solve | 20:47 |
kanzure | nope | 20:47 |
fenn | "scrape all the indices, merge, repeat" | 20:47 |
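[The "merge" step of "scrape all the indices, merge, repeat" might look like the sketch below — deduping on a normalized DOI is an assumption on my part (the chat never settles on a key), and `merge_indexes` and the record fields are hypothetical names:]

```python
# Hypothetical merge step: union several scraped metadata indexes, keeping
# the first record seen for each normalized DOI. Field names are invented.
def merge_indexes(*indexes):
    merged = {}
    for index in indexes:
        for record in index:
            key = record.get("doi", "").lower().strip()
            if key and key not in merged:
                merged[key] = record
    return list(merged.values())
```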
kanzure | except they do | 20:48 |
kanzure | and then they pay for it for some reason | 20:48 |
kanzure | and call it proprietary | 20:48 |
kanzure | isi... | 20:48 |
kanzure | :/ | 20:48 |
fenn | isi is supposed to be the complete metadata index? | 20:48 |
kanzure | hrmm i'm not sure. | 20:48 |
-!- Stee| [~Steel@cpe-67-246-36-165.nycap.res.rr.com] has joined ##hplusroadmap | 20:49 | |
kanzure | "This intelligent research platform provides access to the world's leading citation databases, including powerful cited reference searching, the Analyze Tool, and over 100 years of comprehensive backfile and citation data." | 20:49 |
fenn | so there are these papers studying h-index etc., what is their data set? | 20:49 |
kanzure | sometimes isi access i think | 20:49 |
kanzure | http://wokinfo.com/about/whatitis/ | 20:49 |
kanzure | it's a hodgepodge of commercial databases i think | 20:49 |
kanzure | "Calculate an accurate h-index by ensuring the full extent of an author’s past research is taken into account." | 20:50 |
fenn | ok this is the citation network, which while cool, is a superset of what i'm asking for | 20:50 |
kanzure | oh right | 20:50 |
fenn | i just want a list of all articles published | 20:50 |
kanzure | worldcat? i think that stops at individual "titles" (of volumes/books) | 20:53 |
Stee| | kanz, any idea how to get access to internal zaibatsu journals? | 20:53 |
fenn | i should be able to google "list of all journal articles published" and find a webpage that lets me download some file with all the metadata | 20:53 |
fenn | Stee|: walk into the office wearing a jumpsuit, say you're here to fix the router | 20:53 |
Stee| | hhaa | 20:53 |
fenn | maybe a bit harder if you're not japanese | 20:54 |
fenn | worldcat is exactly the right model | 20:54 |
fenn | but for some mysterious reason no similar thing exists for journals | 20:54 |
kanzure | worldcat is the supplier of things like ezproxy and interlibrary loans | 20:54 |
fenn | "With authorization from OCLC, you can download a subset of the WorldCat database for harvesting by your search engine or other enterprise Web application." | 20:55 |
kanzure | yep.. commercial | 20:55 |
fenn | http://www.worldcat.org/partnership/harvestset/worldcat_sample_data.xml | 20:55 |
kanzure | 'enterprise' | 20:55 |
fenn | does that mean they sell it or what? | 20:55 |
kanzure | they might be doing license agreements if you end up selling/distributing it, but free for 'research' | 20:56 |
kanzure | but i don't know for sure | 20:56 |
kanzure | what about library of congress? i don't think they track individual articles | 20:56 |
fenn | "Each party shall be entirely responsible for meeting its own costs incurred with respect to the matters described in this Agreement and neither shall be obligated to make any payment to the other under the terms of this Agreement." | 20:56 |
kanzure | uhuh | 20:57 |
fenn | so apparently it's free, if you appear "legit" | 20:57 |
fenn | "Under no circumstances shall Institution/Company sell, license, publish, display, distribute or otherwise transfer to any third party WorldCat Metadata or holdings information or any copy thereof, in whole or in part, except as expressly permitted" | 20:57 |
kanzure | hrm it looks like they do track some articles? | 20:57 |
kanzure | http://www.worldcat.org/search?q=george+whitesides&qt=owc_search#x0%253Aartchap-%2Cx0%253Aartchap%2Bx4%253Adigitalformat | 20:57 |
kanzure | wow what | 20:57 |
kanzure | well what is expressly permitted? | 20:58 |
fenn | it just means you don't have permission to redistribute | 20:58 |
kanzure | haha how useless | 20:58 |
kanzure | god our libraries suck | 20:58 |
fenn | yeah somehow the librarians missed the "information wants to be free" bandwagon | 20:59 |
kanzure | wasn't google supposed to fix this? | 21:02 |
kanzure | or was it only supposed to index the shitty information | 21:02 |
fenn | google "grew up" and got all responsible n shit | 21:03 |
fenn | so now they can't do anything | 21:03 |
fenn | "OCLC does claim copyright rights in WorldCat as a compilation. In accordance with US copyright law, those | 21:04 |
fenn | rights are based on OCLC's substantial intellectual contribution to WorldCat as a whole, including OCLC’s selection, | 21:04 |
fenn | arrangement, and coordination of the material in WorldCat" | 21:04 |
kanzure | riight. | 21:04 |
fenn | fwiw, google is awful about exporting data, despite their "data liberation front" | 21:04 |
kanzure | well then anyone can claim copyright on collection | 21:04 |
kanzure | science liberation front sounds like an awesome name | 21:04 |
fenn | heh | 21:04 |
fenn | you need to wear a ski mask and have stacks of liberated hard drives behind you | 21:05 |
kanzure | not a problem | 21:05 |
kanzure | it can also be like a drug bust: http://addictionrecoveryhope.com/wp-content/uploads/2009/09/Drug-Bust.jpg | 21:06 |
fenn | if anyone cares, here's some discussion and self-justification by oclc on why they don't allow redistribution http://www.oclc.org/worldcat/recorduse/policy/forum/forum.pdf | 21:06 |
kanzure | i think oclc was setup by a bunch of university librarians and that's why it has traction | 21:07 |
kanzure | i'm not really sure why they all agreed to this awful mess | 21:07 |
kanzure | but it's particularly nice of them to all be using ezproxy (it makes it easier once someone finds an exploit) | 21:07 |
kanzure | "the practical need to sustain the economic viability and value of WorldCat over the long term" | 21:08 |
fenn | what about ezproxy? | 21:08 |
kanzure | all universities use it | 21:08 |
kanzure | they all run a local instance of it | 21:09 |
kanzure | so if i was to find a backdoor, i'd have keys to the entire kingdom | 21:09 |
fenn | is that a mirroring software? | 21:09 |
kanzure | no | 21:09 |
kanzure | the best way to explain it is to show you | 21:09 |
fenn | how do you get the database if it's all just web pages? | 21:09 |
fenn | you dont need ezproxy to search worldcat | 21:09 |
fenn | does oclc do something else? | 21:09 |
kanzure | oclc does a lot of things | 21:10 |
kanzure | it's your usual clusterfuck of databases and library integrations | 21:10 |
fenn | here's the thing, there's a unique ISBN for every book | 21:10 |
kanzure | here's an example of ezproxy | 21:10 |
kanzure | http://webserver.macu.edu:2048/ | 21:11 |
kanzure | username: 3952 | 21:11 |
-!- Stee| is now known as Steel_ | 21:11 | |
kanzure | password: 3952 | 21:11 |
fenn | where's the data that goes along with that ISBN submitted to when the author publishes the book? | 21:11 |
kanzure | library of congress somewhere | 21:11 |
kanzure | http://www.loc.gov/rr/ | 21:11 |
fenn | so what good does worldcat do then? why can't we just download the ISBN's from LOC? | 21:12 |
kanzure | because worldcat is also indexing papers | 21:12 |
kanzure | also, worldcat.org is not the primary purpose of worldcat | 21:12 |
kanzure | "Some WorldCat libraries make their specialized reference databases available on their Web sites, but only to library members." | 21:13 |
kanzure | "worldcat libraries" | 21:13 |
fenn | oh great | 21:13 |
fenn | ISBN is run by a for-profit company | 21:14 |
fenn | http://en.wikipedia.org/wiki/R._R._Bowker | 21:14 |
fenn | now owned by ... (drumroll) | 21:14 |
fenn | Elsevier! | 21:14 |
kanzure | here's what oclc is: http://www.oclc.org/us/en/services/a-to-z.htm | 21:14 |
fenn | those fuckers | 21:15 |
kanzure | xisbn is oclc apparently | 21:15 |
kanzure | hah | 21:15 |
kanzure | look at how they list worldcat | 21:15 |
kanzure | "Global network of library content and services that lets your institution be more connected, open and productive" | 21:15 |
kanzure | it's about convincing libraries to sign up with them | 21:15 |
kanzure | even my high school had some weird worldcat integration (it was pretty broken) | 21:16 |
kanzure | did you try that ezproxy login? | 21:17 |
fenn | this is interesting http://isbndb.com/ | 21:17 |
fenn | yes, looks poorly configured | 21:18 |
fenn | reminds me of internet circa 1995 | 21:18 |
kanzure | i'm pretty sure libraries were networked together pre-web | 21:18 |
kanzure | maybe that's why proprietary solutions are so dominant | 21:18 |
fenn | i dont really know what i'm looking at here | 21:19 |
fenn | it's a list of services they've purchased access to? | 21:20 |
kanzure | yes | 21:20 |
kanzure | but if you click, you have access | 21:20 |
kanzure | since you're logged in | 21:20 |
kanzure | normally these services authenticate you by ip address | 21:20 |
kanzure | ezproxy is inside the college's network | 21:20 |
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has quit [Ping timeout: 260 seconds] | 21:21 | |
fenn | i assume at a more science-oriented university the list would be more useful? | 21:21 |
kanzure | heh yes this is a bad example | 21:21 |
fenn | i just realized what "WorldBook" is | 21:22 |
kanzure | microsoft? | 21:22 |
fenn | it's the old second-rate paper encyclopedia, for people who couldn't afford britannica | 21:22 |
kanzure | i thought that was ms encarta | 21:23 |
fenn | oo oxford english dictionary | 21:23 |
fenn | so ezproxy just forwards stuff from the uni's IP to the external internet | 21:24 |
kanzure | yes | 21:24 |
kanzure | sometimes these services have usernames/passwords instead of ip authentication | 21:24 |
kanzure | and ezproxy handles that configuration/setup too (apparently) | 21:25 |
kanzure | in the uk i think they have some federal system for paper access called athens? i don't know much about it | 21:26 |
kanzure | "Athens is an access management system which controls access to many of the Library's electronic information sources. When you login to an Athens protected resource it checks to see if you are a member of an institution that has paid to use that resource, and if your username and password are correct it lets you through." | 21:26 |
fenn | hmm want to scrap OED? they only have 275k entries http://www.oed.com.webserver.macu.edu:2048/viewdictionaryentry/Entry/274870 | 21:27 |
kanzure | http://en.wikipedia.org/wiki/Athens_access_and_identity_management | 21:27 |
kanzure | i'm sure someone has a pdf of oed | 21:27 |
kanzure | i guess a pdf is less useful | 21:27 |
kanzure | "The Athens service is a trust federation where Identity Providers, Service Providers and Athens operate under common rules and licenses. Trust is enforced by the use of public-key cryptography and other security mechanisms." | 21:28 |
kanzure | "Athens is used extensively within UK Higher and Further Education institutions, the UK National Health Service, and in more than 90 countries worldwide. It has been adopted by over 2,000 organisations, and over 300 online resources since it was first launched in 1996. Over 4.5 million accounts are now registered with Athens." | 21:28 |
kanzure | "Conceived in 1996 at the University of Bath," we should get adrian bowyer to fix that | 21:28 |
fenn | this is the last entry added http://www.oed.com.webserver.macu.edu:2048/viewdictionaryentry/Entry/277114 | 21:28 |
kanzure | it's by id | 21:28 |
kanzure | ? | 21:28 |
fenn | yep just fiddle with the number | 21:29 |
fenn | "once you're past the perimeter, there's no security! have a nice day" | 21:29 |
fenn | i do just fine with the 1907 dictionary, but someone might find it useful | 21:30 |
fenn | well i guess i should go socialize | 21:31 |
* fenn sighs | 21:31 | |
-!- SDr [~SDr@unaffiliated/sdr] has joined ##hplusroadmap | 21:41 | |
-!- yashgaroth [~f@cpe-24-94-5-223.san.res.rr.com] has joined ##hplusroadmap | 21:47 | |
kanzure | hi yashgaroth | 21:49 |
yashgaroth | hey hey | 21:49 |
_sol_ | Do you think DIY orgo chem on a legit side for open source experimentation is possible, or would it break the bank to make a DIY fume hood and accessory safety gear for the lab to make sure ya don't have vapors floating with a spark setting things off? | 21:50 |
_sol_ | I was thinking about that the other day reading some DIY sites, and wondering what legit open source means for DIY chemistry since ppl are trying to spring up microbio labs and such now | 21:51 |
yashgaroth | depends where you're trying to install it, and how legal you want to be about EPA regulations | 21:53 |
_sol_ | Of course, if ya own a glass beaker these days some countries' enforcement agencies put books and a single glass beaker together with all the regs and assume the worst... | 21:53 |
kanzure | what does any of that have to do with breaking the bank | 21:56 |
_sol_ | I guess I'm wondering if ya could try to start a DIY fume hood project among other things or is it all regulated as to what is needed for chem lab? | 21:57 |
_sol_ | I mean the cost for breaking the bank... | 21:58 |
_sol_ | the cost may be too much to make something safe for small DIY stuff | 21:58 |
_sol_ | and try to be pretty safe, although I'm not sure in whose eyes | 21:58 |
yashgaroth | how interesting are the chemicals you plan to use | 21:58 |
_sol_ | don't know yet | 21:58 |
_sol_ | I'm just wondering if there are projects out there already... | 21:59 |
_sol_ | but I think some solvents in basic orgo chem experiments are still pretty volatile with sparks if I recall | 21:59 |
_sol_ | if ya are doing a distillation process to separate a heavier weight molecule from lighter weight via heating and using water cooling over the glass to cool the vapors... | 22:00 |
_sol_ | I'm just looking at how-tos | 22:00 |
kanzure | yes you can make a fume hood if you want to? | 22:01 |
_sol_ | but if ya don't insulate the fan right, couldn't an electrical spark set stuff off? | 22:01 |
_sol_ | I guess I'm just overthinking... | 22:01 |
_sol_ | I have a chemist friend so I could ask him, but I'm wondering how big chemistry is in the DIY... | 22:03 |
_sol_ | DIY community... etc which I think this room sorta is, but its more open electronics and software maybe right now | 22:04 |
_sol_ | biohacking somewhat I guess | 22:04 |
kanzure | you're welcome to bring your chemistry friends in here | 22:04 |
_sol_ | I'll see if he is around later.. | 22:04 |
kanzure | drazak_: did you ever finish your distillation setup? | 22:04 |
-!- augur [~augur@208.58.5.87] has joined ##hplusroadmap | 22:05 | |
fenn | a fume hood is dead simple: a box where you do your work, a trapezoidal reducing flange, a fan, and a chimney | 22:10 |
fenn | unless you're venting extremely toxic fumes (in which case i question your methodology) dilution with lots of air will render it harmless | 22:10 |
fenn | as for explosion prevention, make sure you don't exceed the minimum concentration needed to explode stuffs | 22:12 |
fenn | for propane this is as little as 5% | 22:12 |
fenn | on the other hand, 1 mol of gas is only 22 liters so if you're going to evaporate a mol of whatever you need to add at least 1 m^3 of air to it to render it explosion-proof | 22:13 |
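[fenn's dilution numbers as arithmetic — the ~22 L/mol molar volume and the 5% limit are taken straight from the chat; real lower explosive limits vary by solvent (propane's is actually lower than 5%), so treat this strictly as an illustration of the calculation, not a safety reference:]

```python
# Illustrative dilution check: keep evaporated vapor below a lower explosive
# limit (LEL). 22.4 L/mol molar volume near STP; lel is a volume fraction.
def min_dilution_air_liters(mols_evaporated, lel):
    vapor_l = mols_evaporated * 22.4
    total_at_lel = vapor_l / lel      # mixture volume at which LEL is reached
    return total_at_lel - vapor_l     # air needed to dilute down to the LEL

# 1 mol at a 5% limit needs ~426 L of air, so 1 m^3 leaves a healthy margin
```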
fenn | a bigger concern is crap building up on the chimney, which really should be checked with regular close-up visual inspection | 22:14 |
-!- SDr [~SDr@unaffiliated/sdr] has quit [] | 22:14 | |
kanzure | oh nice cysteine has Oligonucleotide Synthesis - Methods and Applications [Methods in Molec Bio 288] - P. Herdewijn (Humana, 2005) WW.pdf | 22:14 |
fenn | cysteine has papers on it already? | 22:15 |
fenn | or is that a book | 22:15 |
kanzure | it's a book | 22:15 |
kanzure | check /torrents/text/protocols/ | 22:15 |
kanzure | and /torrents/text/books/textbooks/Biology_And_Medicine/ | 22:15 |
kanzure | text/books/textbooks/.. damn the world sucks | 22:16 |
fenn | hey, four hour work week, was looking for that | 22:16 |
fenn | gah 96MB | 22:20 |
kanzure | fenn: so i've been using phantomjs a lot lately | 22:23 |
kanzure | and i keep looking at http://www.gnu.org/software/pythonwebkit/ | 22:23 |
kanzure | which is very ranty.. but accessing the dom from python seems much better than from javascript | 22:23 |
kanzure | if you'll notice, it's a giant rant by luke kenneth casson leighton | 22:24 |
kanzure | who you might remember from openscad | 22:24 |
kanzure | luke keeps saying that pythonwebkit is pyjamas | 22:27 |
fenn | tldr what? | 22:29 |
kanzure | web scraping with webkit bindings | 22:30 |
fenn | don't you need to run js to access links that get created at runtime? | 22:30 |
kanzure | yes that's what webkit does | 22:30 |
fenn | i thought that was the whole point of phantomjs | 22:30 |
-!- Mokbortolan_1 [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has quit [Read error: Connection reset by peer] | 22:30 | |
kanzure | correct. pythonwebkit seems to be phantomjs except piloted by python instead of js | 22:30 |
kanzure | except not marketed like phantomjs | 22:30 |
fenn | ok | 22:30 |
fenn | well, that's nice | 22:31 |
kanzure | theoretically this should be more pleasant | 22:31 |
fenn | i had intended to learn js anyway | 22:31 |
fenn | lots of stuff can be scraped without js though | 22:32 |
fenn | just looking at page structure with yer eyeballs | 22:32 |
kanzure | i'm tired of beautifulsoup, lxml, mechanize and nokogiri | 22:32 |
kanzure | sometimes the html is poorly formatted and sometimes there's data written in the js headers | 22:32 |
fenn | i suppose it depends on how structured the data is you're trying to scrape in the first place | 22:32 |
kanzure | and then these parsers break and crap.. if any parser isn't gonna break, it's going to be a web browser | 22:33 |
fenn | really the html parser doesn't work? i thought the whole point of BS was that it didnt break on bad html | 22:33 |
kanzure | no i thought that was lxml | 22:33 |
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has joined ##hplusroadmap | 22:33 | |
fenn | The BeautifulSoup class turns arbitrarily bad HTML into a tree-like nested | 22:34 |
fenn | etc | 22:34 |
kanzure | hmmm okay | 22:34 |
-!- Mokbortolan_ [~Nate@c-71-59-241-82.hsd1.or.comcast.net] has joined ##hplusroadmap | 22:35 | |
kanzure | fwiw beautifulsoup is what i use anyway | 22:35 |
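[For what it's worth, Python's stdlib parser shows the same kind of tag-soup tolerance being discussed here — this uses `html.parser` rather than BeautifulSoup itself, purely to keep the example dependency-free, so it's a stand-in, not the BS API:]

```python
# Stand-in illustration of lenient HTML parsing: html.parser (stdlib) also
# recovers text from malformed markup instead of raising, much as
# BeautifulSoup's tree-building does.
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)  # collect text nodes, ignore broken tags

collector = TextCollector()
collector.feed("<p>broken <b>markup<p>no closing tags")
text = "".join(collector.pieces)
```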
fenn | ugh flattr keeps 10% for itself, why do people put up with this | 22:36 |
kanzure | because it has a slim chance of being better than paypal | 22:37 |
fenn | bah | 22:38 |
fenn | wepay is actually better than paypal | 22:38 |
fenn | for now at least | 22:38 |
fenn | problem is paypal tries to be all "don't worry, we'll refund you someone else's money if there's any problems" | 22:39 |
kanzure | git clone http://git.savannah.gnu.org/cgit/pythonwebkit.git | 22:40 |
* kanzure gulps | 22:40 | |
drazak_ | kanzure: at home? nah, never ended up buying anything | 22:45 |
drazak_ | kanzure: too expensive | 22:45 |
kanzure | too expensive! bah | 22:47 |
fenn | distillation is easy | 22:53 |
fenn | i'm surprised you can't just buy a still from walmart | 22:54 |
fenn | such things exist but only for water | 22:54 |
kanzure | wow what in the name of holy fuck | 22:58 |
kanzure | http://www.scholartime.com/index.php/journal-hosting | 22:58 |
kanzure | hosted ojs instances (openjournalsystem) | 22:59 |
kanzure | $600/year? | 22:59 |
kanzure | that should be more like $10 or $20/year | 22:59 |
kanzure | i guess that's $50/year.. but still | 22:59 |
kanzure | oops $50/month | 23:00 |
fenn | it's overpriced hosting but it's somewhat specialized knowledge | 23:01 |
kanzure | there's more money to be made in vertical integration with the research | 23:01 |
fenn | eh? | 23:01 |
kanzure | like "yo dawg, we noticed you're doing a bill of materials for reagents in your project.. we can hook you up aww yeah" | 23:01 |
fenn | "yo dawg i herd u liek bill o materials so i put a bill o materials in yo bill o materials yo" | 23:02 |
kanzure | well each research paper is the result of some $200k grant | 23:02 |
kanzure | reagents maybe costing some % of that.. which i guess is what um, that lab management webapp thing was trying to tap | 23:03 |
Mokbortolan_ | Yo dawg, I heard you liked BoMs, so I put a bomb in your BoM so you can bomb while you BoM. | 23:03 |
fenn | i guess the idea there is they've already forked over cash so they don't feel so bad forking over more cash to the same entity? | 23:04 |
kanzure | also! there's all sorts of weird stateful information during research that could be served by a platform | 23:04 |
kanzure | instead of keeping random spreadsheets on random computers about which petri dish is currently in which state | 23:04 |
fenn | dude this is just a web host with some custom software installed | 23:05 |
kanzure | yeah i know, there's no reason for this to cost so much | 23:05 |
fenn | they're not doing anything in the research phase at all | 23:05 |
fenn | it's merely for preserving the results for posterity | 23:05 |
kanzure | well they seem convinced they are a part of research | 23:05 |
kanzure | not this site in particular though | 23:05 |
kanzure | ok whatever. all these fees are stupid. | 23:09 |
Steel_ | kanzure: worth starting a business for it? | 23:09 |
Steel_ | undercut 'em? | 23:09 |
kanzure | no. there's only 20000 journals or something | 23:09 |
fenn | ah but think how many MORE there could be! | 23:10 |
kanzure | 20000 * $5/year = haha.. yeah | 23:10 |
kanzure | journals are a dumb structure anyway | 23:10 |
fenn | a journal for every lab! | 23:10 |
kanzure | isn't it just supposed to be an aggregator | 23:10 |
fenn | a journal for every paper! | 23:10 |
kanzure | yes! | 23:10 |
kanzure | wait. do you mean tags? | 23:10 |
Steel_ | kanzure: If you have thoughts on a better one, I'd certainly love to see them written up somewhere so I can incorporate those ideas | 23:10 |
Steel_ | tags are on my list | 23:10 |
fenn | how about scientists publish their own fucking papers | 23:10 |
kanzure | i don't trust people to maintain active web servers | 23:11 |
fenn | neither do i | 23:11 |
fenn | but at least someone would be able to aggregate them | 23:11 |
fenn | unlike now where we're all "oh noes the paywall is falling" | 23:11 |
kanzure | did those jerks update the arxiv torrents or is there still an anti-get-all-our-data thing going on there? | 23:11 |
kanzure | err right now if the paywalls fall there's nothing saved from inside. | 23:11 |
fenn | okay why is pythonwebkit.git > 1GB? | 23:12 |
kanzure | i have no fucking clue | 23:12 |
kanzure | it's still cloning | 23:12 |
kanzure | i have a clone on gnusha in /home/bryan/local/pythonwebkit/ if you want it. | 23:13 |
kanzure | i'm currently checking out python_codegen (the branch) | 23:13 |
fenn | can you make a copy of this without the bloat? | 23:13 |
kanzure | um i don't know what the issue is yet | 23:13 |
kanzure | i think it's a complete copy of webkit | 23:13 |
kanzure | i'll check what the working directory size is. | 23:13 |
fenn | still, shouldn't be that big | 23:13 |
kanzure | 2.9 GB? | 23:14 |
-!- lkcl [~lkcl@host86-131-171-208.range86-131.btcentralplus.com] has joined ##hplusroadmap | 23:14 | |
fenn | seems to be mostly ./LayoutTests and ./WebCore | 23:14 |
lkcl | morning folks | 23:15 |
kanzure | can we delete those | 23:15 |
fenn | yes | 23:15 |
fenn | actually wait | 23:15 |
fenn | dunno about webcore | 23:15 |
lkcl | i'm told that there are people trying to compile pythonwebkit around here | 23:15 |
-!- uniqanomaly_ [~ua@dynamic-78-8-80-128.ssp.dialog.net.pl] has joined ##hplusroadmap | 23:15 | |
fenn | anyway it's in .git as well | 23:15 |
lkcl | i'm the lead developer | 23:15 |
fenn | hi lkcl | 23:15 |
fenn | why is your repo 2.9GB? | 23:15 |
lkcl | so... what do you need? | 23:15 |
lkcl | because that's what the size of webkit git is | 23:16 |
kanzure | git-annex? | 23:16 |
lkcl | git://git.webkit.org/WebKit.git | 23:16 |
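For anyone else facing the multi-gigabyte clone discussed above: a shallow, single-branch clone skips WebKit's full history, which is most of the bulk. A sketch, using the repository URL and branch name given in this conversation:

```shell
# Fetch only the tip of the python_codegen branch instead of all history;
# this avoids downloading years of upstream WebKit commits.
git clone --depth 1 --branch python_codegen \
    git://git.savannah.gnu.org/pythonwebkit.git
```

Note that a shallow clone cannot push or browse old history, but for building the bindings that does not matter.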
fenn | i'm interested in pythonwebkit primarily for scraping | 23:16 |
lkcl | ahhh | 23:16 |
lkcl | headless? | 23:16 |
kanzure | yeah i might have misunderstood you | 23:16 |
kanzure | headless would be great | 23:16 |
kanzure | maybe pythonwebkit isn't actually required? | 23:17 |
lkcl | ok, there's a couple of ways to do that | 23:17 |
lkcl | actually three. | 23:17 |
lkcl | what level of HTML compatibility do you need? | 23:17 |
kanzure | what are my options | 23:17 |
lkcl | just some notes from another conversation i'm going to cut/paste ok? | 23:17 |
lkcl | http://git.savannah.gnu.org/cgit/pythonwebkit.git/tree/pywebkitgtk/gtk/webkitgtkmodule.c?h=python_codegen | 23:17 |
lkcl | comment out line 82. | 23:17 |
lkcl | that will give you "headless" mode in pythonwebkit. ok, _should_ do :) | 23:18 |
fenn | heh | 23:18 |
lkcl | 1) KDE's KHTMLPart (HTML DOM TR2 compatible) | 23:18 |
lkcl | 2) pythonwebkit gtk mode, hacked to remove that line 82 | 23:18 |
lkcl | 3) pythonwebkit "DirectFB" mode, hacked to remove the equivalent line - you want the python_codegen-directfb-2011-10-18 branch | 23:19 |
-!- uniqanomaly [~ua@dynamic-78-8-80-186.ssp.dialog.net.pl] has quit [Ping timeout: 272 seconds] | 23:19 | |
lkcl | 4) python-hulahop with xulrunner 9.0 - source code is here: http://lkcl.net/hulahop | 23:19 |
lkcl | then get the tutorial i wrote, here: | 23:20 |
lkcl | http://pyxpcomext.mozdev.org/no_wrap/tutorials/hulahop/xpcom-hulahop.html | 23:20 |
kanzure | what i'd like is the same level of html compatibility as phantomjs, which seems to just be headless vanilla webkit | 23:20 |
lkcl | and hack that to simply remove the pygtk2 equivalent of the window stuff | 23:20 |
lkcl | well it depends on whether you want Firefox headless HTML5 compatibility or Safari/Webkit/Android headless HTML5 compatibility | 23:21 |
lkcl | python-hulahop will get you Firefox | 23:21 |
lkcl | pythonwebkit will get you Webkit/Android/Safari | 23:21 |
lkcl | and KHTMLPart will get you.... mmmm.... compatibility with the internet circa 1998 :) | 23:21 |
fenn | i think we are misunderstanding something | 23:22 |
lkcl | you have to compile KDE with c++ runtime type checking enabled | 23:22 |
fenn | the idea is to run javascript and get text data out of the page, not to render anything | 23:22 |
lkcl | well, then you'll need to create a "port" of webkit which does no rendering. | 23:22 |
lkcl | 1sec... | 23:22 |
lkcl | let me look up what phantomjs is.... | 23:23 |
lkcl | oooh hoo hoo! | 23:23 |
lkcl | verrry coool. | 23:23 |
kanzure | heh. except i want this in fucking python | 23:23 |
lkcl | so they created a port that... oh shit, you want _what_??? :) | 23:23 |
lkcl | ooo hoo hoo, you're gonna have a lot of fun then. | 23:24 |
lkcl | ok, you have a couple of options | 23:24 |
lkcl | 1) work out the patches that i did to add python bindings and re-apply them to phantomjs | 23:25 |
lkcl | the pythonwebkit stuff *is*, with the exception of about 100 lines of code, *entirely* screen-independent | 23:25 |
lkcl | 2) work out the phantomjs patches and reapply *those* to pythonwebkit | 23:26 |
kanzure | i don't think phantomjs patches webkit | 23:26 |
lkcl | 3) freak out at option 1 and 2, and give up and just run pythonwebkit *without* .... it doesn't?? | 23:26 |
kanzure | it just includes some stuff? | 23:26 |
lkcl | 1sec.... | 23:26 |
kanzure | https://github.com/ariya/phantomjs/blob/master/src/webpage.cpp | 23:27 |
lkcl | is it based on Webkit2? | 23:27 |
kanzure | don't know | 23:27 |
kanzure | it looks like it's not qtwebkit 2.2 | 23:27 |
lkcl | what der f**??? | 23:28 |
kanzure | yeah it looks like it's webkit1 | 23:28 |
kanzure | http://groups.google.com/group/phantomjs/browse_thread/thread/e8ffec54c440b0a1 | 23:28 |
kanzure | "However, the fact that WebKit1 API is considered "obsolete" means that at some point, we will not be able to use the latest and greatest WebKit features anymore." | 23:28 |
lkcl | where's the makefile showing the #includes | 23:28 |
kanzure | they use qmake | 23:28 |
lkcl | ok. | 23:29 |
lkcl | right. | 23:29 |
kanzure | to be fair i don't know how qmake works :) | 23:29 |
lkcl | if they're using QtWebKit then all they are doing is exactly as i described above... except not calling the Qt version of "show window" | 23:29 |
lkcl | so they *are* still "rendering".... just not rendering *on-screen*. | 23:29 |
kanzure | correct | 23:30 |
lkcl | or, more specifically, the code _to_ render is there, but it's just not called. | 23:30 |
kanzure | so they didn't patch webkit? | 23:30 |
lkcl | the above commenting-out that i described is *exactly* the same trick. line 82 removes the gtk "show all windows" | 23:30 |
lkcl | that's correct - they didn't. | 23:30 |
lkcl | all they're doing is firing up a qtwebkit instance and then not showing it on-screen. | 23:30 |
lkcl | the same trick is pulled in one of the webkitgtk test applications, i forget its name. | 23:31 |
kanzure | in your python bindings is the dom-touching-python (in a WebPage i think it's called) sandboxed from the other code? | 23:32 |
lkcl | anyway - in that tutorial: | 23:32 |
lkcl | http://pyxpcomext.mozdev.org/no_wrap/tutorials/hulahop/xpcom-hulahop.html | 23:32 |
lkcl | just remove the "gtk.show" | 23:32 |
lkcl | and you'll achieve exactly the same thing | 23:32 |
lkcl | i have no idea what you mean by "sandboxed". | 23:32 |
kanzure | in phantomjs you create a page object and can call page.evaluate(anonymous js function) | 23:33 |
kanzure | but the contents of the function can't access anything outside of the page's 'context' | 23:33 |
kanzure | page.open('http://www.google.com/', function(status) { console.log(document.location); }); | 23:34 |
lkcl | yeah - ok, i didn't add javascript evaluation functions because webkitgtk doesn't have a means to convert the return results into meaningful information | 23:34 |
kanzure | sure | 23:34 |
kanzure | but you did seem to have python examples of accessing the dom | 23:34 |
lkcl | the webkitqt team did translation of results into qt object types | 23:34 |
lkcl | yes | 23:34 |
lkcl | it's done *entirely* through python. | 23:34 |
kanzure | right | 23:34 |
lkcl | there is absolutely *no* javascript involved, *whatsoever*. | 23:34 |
fenn | but js in the page can change the DOM in important ways | 23:34 |
lkcl | yes it can. | 23:34 |
kanzure | erm, my point is, the javascript is "sandboxed" in phantomjs.. like you're not ever touching the DOM from your main js | 23:35 |
lkcl | and python can change the DOM in exactly the same "important" ways... in a declarative fashion [from quotes outside quotes] | 23:35 |
kanzure | is it the same way in pythonwebkit? | 23:35 |
lkcl | no, because you cannot activate the running of any javascript *at all* from webkitgtk, period. | 23:35 |
lkcl | ok that's not quite true, but.... | 23:36 |
kanzure | i'm not talking about javascript :P hrmm | 23:36 |
kanzure | let's look at http://pyxpcomext.mozdev.org/no_wrap/tutorials/hulahop/xpcom-hulahop.html | 23:36 |
kanzure | under _loaded | 23:36 |
lkcl | yep sure | 23:36 |
kanzure | is that code normal python? | 23:36 |
kanzure | can it access globals or whatever | 23:36 |
lkcl | yes it is entirely normal python. | 23:36 |
lkcl | yes it can | 23:36 |
kanzure | ok. in phantomjs the answer is no ;) | 23:36 |
lkcl | because it's pure python. | 23:37 |
lkcl | right - ok, i see what you mean | 23:37 |
kanzure | alright cool | 23:37 |
kanzure | that's great | 23:37 |
-!- Jaakko96 [~Jaakko@host86-131-178-213.range86-131.btcentralplus.com] has quit [Quit: Nettalk6 - www.ntalk.de] | 23:37 | |
lkcl | it's a one-way street. | 23:37 |
kanzure | what? in pythonwebkit it is? | 23:37 |
lkcl | you can do tricks such as add a script node to the DOM however :) | 23:38 |
kanzure | phantomjs is a little weird because you can't actually access the DOM except from inside the page's javascript context | 23:39 |
kanzure | so you can only hope-and-pray by passing giant hashes/json back and forth between page.evaluate() calls | 23:40 |
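kanzure's point can be illustrated in plain Python: phantomjs's page.evaluate() marshals its return value through JSON-style serialization, so only plain data ever leaves the page context, never live DOM objects. A minimal simulation of that boundary (the `through_evaluate` helper and `DomNode` class are hypothetical, purely for illustration):

```python
import json

class DomNode:
    """Stand-in for a live DOM node inside the page's context."""

def through_evaluate(value):
    # Simulates the phantomjs page.evaluate() boundary: the return value is
    # serialized on the page side and re-parsed on the caller side, so only
    # JSON-compatible data crosses -- never live objects.
    return json.loads(json.dumps(value))

# Plain data survives the round trip:
print(through_evaluate({"href": "http://www.google.com/", "links": 10}))

# A live object does not -- which is why all DOM access has to stay inside
# the page's javascript context, as described above:
try:
    through_evaluate(DomNode())
except TypeError as exc:
    print("cannot cross the boundary:", exc)
```

This is the "hope-and-pray by passing giant hashes/json" constraint in miniature; the pythonwebkit/hulahop approach avoids it by exposing the DOM to ordinary Python directly.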
kanzure | whatever.. | 23:40 |
lkcl | yes. it's a bitch. | 23:40 |
lkcl | someone actually did a port of pyjamas-desktop using a similar trick, to webkitqt4 | 23:41 |
lkcl | it got a looong way before being declared a complete failure | 23:41 |
lkcl | execution of javascript code-snippets for *everything*. | 23:41 |
lkcl | truly truly dreadful :) | 23:41 |
kanzure | so! can you convince me to use pythonwebkit-gtk over hulahop? | 23:41 |
lkcl | nope - that's up to you. | 23:41 |
kanzure | bah | 23:41 |
kanzure | some code evangelist you are. | 23:42 |
lkcl | it depends on what you want / need | 23:42 |
kanzure | hrm | 23:42 |
lkcl | it makes no odds to me :) | 23:42 |
kanzure | well, depending on gecko seems a little weird | 23:42 |
lkcl | pyjamas-desktop works on both... *and* on MSHTML under w32! | 23:42 |
lkcl | so if you were a windows fiend you'd even be able to do the same trick there! | 23:42 |
lkcl | the code you'd be looking for is pyjd/mshtml.py | 23:42 |
lkcl | and, once again, you just don't show the w32 GUI window | 23:43 |
* lkcl shrugs | 23:43 | |
kanzure | hrmm i'm going to try out hulahop then. | 23:43 |
kanzure | since it probably doesn't have a 2.9 GB git repo | 23:43 |
lkcl | bottom line is: you can actually do *all three* major browser engines if you really wanted to | 23:43 |
lkcl | ha ha | 23:43 |
lkcl | you got debian? | 23:43 |
kanzure | yes | 23:43 |
lkcl | ok, don't use debian/unstable, use debian/testing | 23:44 |
lkcl | and grab xulrunner-9.0-dev | 23:44 |
kanzure | i think i'm on wheezy :/ | 23:44 |
kanzure | i'll take a look | 23:44 |
lkcl | do "apt-get build-dep python-hulahop" | 23:44 |
lkcl | etc. etc. | 23:44 |
lkcl | but then grab the source code from here: | 23:44 |
lkcl | http://lkcl.net/hulahop | 23:44 |
lkcl | don't for god's sake use xulrunner 10 | 23:44 |
lkcl | https://bugzilla.mozilla.org/show_bug.cgi?id=728500 | 23:45 |
kanzure | wait you also wrote hulahop? | 23:45 |
lkcl | https://bugzilla.mozilla.org/show_bug.cgi?id=728645 | 23:45 |
lkcl | http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=660178 | 23:45 |
lkcl | hell no. | 23:45 |
kanzure | ok just a mirror :3 | 23:45 |
lkcl | no it was a quick-hacked fix to get it to work | 23:45 |
lkcl | the olpc-sugar team gave up on hulahop 6 months ago... oh dearie me are they in for a shock | 23:46 |
kanzure | these are not the words that indicate to me that any of this is stable | 23:46 |
kanzure | heh | 23:46 |
kanzure | okay let me see if i can get a backport of xulrunner-9.0-dev | 23:46 |
lkcl | i recommend you just add debian/testing and don't worry about it. use apt-pin priorities | 23:47 |
kanzure | jrayhawk: how do i use that | 23:47 |
lkcl | or just use "apt-get -t testing install xulrunner-9.0-dev" etc. etc. | 23:47 |
lkcl | which will be rather long-winded | 23:47 |
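lkcl's apt-pin suggestion, sketched out: a preferences file keeps stable as the default while letting `apt-get -t testing install ...` (or explicit per-package requests) reach into testing without dragging the whole system forward. The priority values here are illustrative, not prescriptive:

```
# /etc/apt/preferences.d/pin-testing  (illustrative priorities)
# stable stays the default source for everything...
Package: *
Pin: release a=stable
Pin-Priority: 700

# ...while testing packages are installable on request but never
# upgrade-preferred automatically (priority < 500).
Package: *
Pin: release a=testing
Pin-Priority: 200
```

With the matching `deb ... testing main` line in sources.list, this is what makes the long-winded `-t testing` invocations safe.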
kanzure | E: Unable to locate package xulrunner-9.0-dev | 23:48 |
lkcl | backporting of xulrunner will take several hours | 23:48 |
lkcl | 1sec... | 23:48 |
kanzure | well i guess i should update my sources | 23:48 |
lkcl | ii xulrunner-9.0 9.0.1-1 XUL + XPCOM application runner | 23:48 |
lkcl | ftp://ftp.uk.debian.org/debian/pool/main/p/pyxpcom/ | 23:48 |
lkcl | ftp://ftp.uk.debian.org/debian/pool/main/i/iceweasel/ | 23:49 |
kanzure | yeah there's still no xulrunner-9.0-dev package being found? | 23:49 |
lkcl | deb http://ftp.uk.debian.org/debian/ testing main contrib non-free | 23:49 |
lkcl | deb-src http://ftp.uk.debian.org/debian/ testing main contrib non-free | 23:49 |
lkcl | ftp://ftp.uk.debian.org/debian/pool/main/i/iceweasel/xulrunner-dev_9.0.1-1_amd64.deb | 23:49 |
kanzure | ok maybe it's in testing contrib | 23:49 |
lkcl | ftp://ftp.uk.debian.org/debian/pool/main/i/iceweasel/xulrunner-9.0_9.0.1-1_amd64.deb | 23:49 |
lkcl | ftp://ftp.uk.debian.org/debian/pool/main/i/iceweasel/libmozjs-dev_9.0.1-1_amd64.deb | 23:50 |
kanzure | i only see xulrunner-9.0-dbg and xulrunner-9.0 | 23:50 |
lkcl | lkcl@teenymac:~/src/python-tv/WebKit$ apt-cache search xulrunner-9.0 | 23:50 |
lkcl | xulrunner-9.0 - XUL + XPCOM application runner | 23:50 |
lkcl | hmmm.... i also have this: deb http://ftp.uk.debian.org/debian/ experimental non-free | 23:51 |
lkcl | but that's experimental *non-free*. hmm... | 23:52 |
lkcl | yep - don't know. up to you to sort out :) | 23:52 |
kanzure | pyxpcom is supposed to be in testing? | 23:52 |
lkcl | pyxpcom has been around for a while, but it was formerly part of xulrunner's source package | 23:52 |
fenn | maybe i'm a bit slow, but http://packages.debian.org/wheezy/xulrunner-dev | 23:52 |
lkcl | it's now independent | 23:52 |
rdb | Does anyone know how much the magnetic stirrers in ordinary lab heaters affect magnetic implants? | 23:53 |
lkcl | no, you _definitely_ don't want the older version | 23:53 |
lkcl | ok there's another way: if you can get debian/lenny such that you end up with xulrunner 1.9.1 | 23:53 |
lkcl | or debian/blahblah with xulrunner 1.9.1 | 23:53 |
fenn | rdb: quite a lot i'd imagine | 23:53 |
lkcl | you will *not* need to do any kind of source code compiling. | 23:53 |
rdb | It sounds quite painful. | 23:53 |
fenn | yep | 23:54 |
fenn | after a while tissue grows around the magnet and immobilizes it a bit | 23:54 |
rdb | Does it hurt when the magnet gets a tug and flips around? | 23:54 |
lkcl | the last "stable" version of python-hulahop which installs out-of-the-box was 18+ months ago, and it used xulrunner-1.9 | 23:54 |
kanzure | lkcl: xulrunner 1.9.1 sounds a bit old? | 23:54 |
Steel_ | rdb: the magnet shouldn't | 23:54 |
kanzure | hrm | 23:55 |
Steel_ | one of the things I'm planning on running later this year hopefully are some FE simulations of magnet implants in flesh | 23:55 |
lkcl | kanzure: it does the job. i'm still using firefox 3.5 and that uses xulrunner 1.9.2 | 23:55 |
lkcl | i get absolutely no problems with it, other than f****g stupid google advertising f****g chrome at me nyah nyah youuu're usiiing an ooold version of firefox that weeeee can't be bothered to suppport | 23:56 |
kanzure | the python-hulahop package is also grabbing xulrunner-9.0 | 23:56 |
lkcl | whine, whine | 23:56 |
lkcl | yep there you go. | 23:56 |
kanzure | is that bad | 23:56 |
lkcl | no you need that - that's the runtime. | 23:56 |
kanzure | h it's python-xpcom | 23:56 |
kanzure | *ah it's | 23:56 |
lkcl | but if you want to recompile for yourself you _will_ need xulrunner-9.0-dev | 23:56 |
lkcl | obviously. | 23:57 |
lkcl | that you can get with "apt-get build-dep python-hulahop". | 23:57 |
kanzure | what do i need besides xulrunner, python-hulahop, python-xpcom? | 23:57 |
lkcl | once you've done that, then grab the source from the url i posted | 23:57 |
lkcl | nothing else. | 23:57 |
lkcl | oh... the source code from that tutorial, obviously. | 23:57 |
lkcl | then you do dpkg-buildpackage -rfakeroot -nc | 23:58 |
lkcl | cd into the hulahop directory obviously | 23:58 |
kanzure | you mean this? http://lkcl.net/hulahop/sugar-hulahop-0.8.1.success.tgz | 23:58 |
lkcl | first | 23:58 |
lkcl | yep that's it | 23:58 |
lkcl | so cd sugar-hulahop-0.8.1 | 23:58 |
lkcl | then do | 23:58 |
lkcl | dpkg-buildpackage -rfakeroot -nc | 23:58 |
lkcl | then install the resultant .deb which will be in the directory *below* | 23:59 |
lkcl | and you're done. | 23:59 |
kanzure | dpkg-checkbuilddeps: Unmet build dependencies: cdbs (>= 0.4.90~) python-all-dev dh-buildinfo xulrunner-dev (>= 1.9~rc2) python-gtk2-dev | 23:59 |
lkcl | you're on your way | 23:59 |
kanzure | oh xulrunner-dev exists | 23:59 |
kanzure | i see | 23:59 |
lkcl | i _did_ say do "apt-get build-dep python-hulahop" :) | 23:59 |
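The build steps scattered across the last few minutes of the conversation, collected into one sketch (package, URL, and tarball names exactly as given above; assumes a Debian box with the testing sources and build-deps available):

```shell
# 1) pull the build dependencies lkcl mentioned
#    (cdbs, python-all-dev, dh-buildinfo, xulrunner-dev, python-gtk2-dev)
sudo apt-get build-dep python-hulahop

# 2) fetch and unpack the source tarball from lkcl.net
wget http://lkcl.net/hulahop/sugar-hulahop-0.8.1.success.tgz
tar xzf sugar-hulahop-0.8.1.success.tgz
cd sugar-hulahop-0.8.1

# 3) build the package; the resulting .deb lands in the parent directory
dpkg-buildpackage -rfakeroot -nc
sudo dpkg -i ../python-hulahop_*.deb
```

As kanzure's error at 23:59 shows, step 1 is not optional: dpkg-buildpackage checks the build dependencies before doing anything else.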
--- Log closed Sun Feb 19 00:00:12 2012 |
Generated by irclog2html.py 2.15.0.dev0 by Marius Gedminas - find it at mg.pov.lt!