--- Log opened Sat Jun 25 00:00:07 2011
kanzure | hi fernan | 00:29 |
---|---|---|
fenn | 3d photolithography how-to http://mrsec.wisc.edu/Edetc/nanolab/3D_print/index.html#Procedure | 01:10 |
fenn | same thing https://nano-cemms.illinois.edu/materials/3d_printing_full | 01:12 |
-!- JayDugger [~duggerj@pool-173-74-79-43.dllstx.fios.verizon.net] has joined ##hplusroadmap | 01:12 | |
-!- JayDugger [~duggerj@pool-173-74-79-43.dllstx.fios.verizon.net] has left ##hplusroadmap ["Leaving."] | 01:23 | |
-!- alystair [alystair@24-246-14-18.cable.teksavvy.com] has quit [Ping timeout: 260 seconds] | 02:13 | |
kanzure | fenn: what did ##opengl say? | 02:32 |
-!- PixelScum [~PixelScum@ip98-177-175-88.ph.ph.cox.net] has quit [Read error: Connection reset by peer] | 02:41 | |
-!- PixelScum [~PixelScum@ip98-177-175-88.ph.ph.cox.net] has joined ##hplusroadmap | 02:42 | |
fenn | i didnt ask | 02:42 |
fenn | oh you mean the bug report | 02:43 |
fenn | no response | 02:45 |
-!- augur [~augur@208.58.6.161] has quit [Remote host closed the connection] | 03:40 | |
-!- streety [streety@li139-74.members.linode.com] has quit [Remote host closed the connection] | 03:40 | |
-!- streety [streety@li139-74.members.linode.com] has joined ##hplusroadmap | 03:41 | |
-!- augur [~augur@129.2.129.34] has joined ##hplusroadmap | 04:04 | |
-!- foucist [~foucist@ps14150.dreamhost.com] has joined ##hplusroadmap | 04:21 | |
-!- klafka [~textual@cpe-69-205-70-55.rochester.res.rr.com] has joined ##hplusroadmap | 05:48 | |
-!- fernan [~pseudo@118.101.154.183] has quit [Ping timeout: 260 seconds] | 05:49 | |
-!- Guest89588 [~Jaakko@host86-131-177-233.range86-131.btcentralplus.com] has joined ##hplusroadmap | 05:51 | |
-!- klafka [~textual@cpe-69-205-70-55.rochester.res.rr.com] has quit [Quit: Computer has gone to sleep.] | 06:27 | |
-!- BaldimerBrandybo [~PixelScum@ip98-177-175-88.ph.ph.cox.net] has joined ##hplusroadmap | 07:08 | |
-!- PixelScum [~PixelScum@ip98-177-175-88.ph.ph.cox.net] has quit [Ping timeout: 240 seconds] | 07:10 | |
-!- Guest89588 [~Jaakko@host86-131-177-233.range86-131.btcentralplus.com] has quit [Quit: Nettalk6 - www.ntalk.de] | 07:51 | |
-!- Guest89588 [~Jaakko@host86-131-177-233.range86-131.btcentralplus.com] has joined ##hplusroadmap | 07:54 | |
-!- Guest89588 [~Jaakko@host86-131-177-233.range86-131.btcentralplus.com] has quit [Client Quit] | 07:55 | |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has quit [Read error: Connection reset by peer] | 08:20 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has joined ##hplusroadmap | 08:21 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has quit [Changing host] | 08:21 | |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has joined ##hplusroadmap | 08:21 | |
-!- lumos [~lumos@afdy30.neoplus.adsl.tpnet.pl] has joined ##hplusroadmap | 08:31 | |
-!- lumos [~lumos@afdy30.neoplus.adsl.tpnet.pl] has left ##hplusroadmap [] | 08:31 | |
kanzure | no i meant ##opengl | 08:36 |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has quit [Read error: Connection reset by peer] | 08:39 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has joined ##hplusroadmap | 08:39 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has quit [Changing host] | 08:39 | |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has joined ##hplusroadmap | 08:39 | |
-!- AJolly [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has joined ##hplusroadmap | 08:52 | |
-!- AJolly [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has quit [Changing host] | 08:52 | |
-!- AJolly [~quassel@unaffiliated/ajollylife] has joined ##hplusroadmap | 08:52 | |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has quit [Ping timeout: 258 seconds] | 08:53 | |
-!- lumos [~lumos@afdy30.neoplus.adsl.tpnet.pl] has joined ##hplusroadmap | 09:31 | |
lumos | hey what u think of this colour scheme, is it good or is it whack http://s2.postimage.org/suunsdvjt/streem.jpg | 09:31 |
kanzure | who are you | 09:38 |
lumos | kanzure, its me lumos | 09:39 |
lumos | kanzure, chanOP | 09:39 |
lumos | kanzure, make me chanop 2day plz | 09:39 |
-!- AJolly [~quassel@unaffiliated/ajollylife] has quit [Read error: Connection reset by peer] | 09:40 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has joined ##hplusroadmap | 09:40 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has quit [Changing host] | 09:40 | |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has joined ##hplusroadmap | 09:40 | |
kanzure | AJollyLife: you should go to the diybio-boston meetup | 10:04 |
kanzure | can someone please bug me to write up how to remove watermarks from pdfs like from sciencedirect/iop before i forget | 10:50 |
kanzure | i use pdftk to remove pages from a pdf without converting the rest to pure-image documents | 10:51 |
kanzure | and then manually remove repeating watermark footers if those are present | 10:51 |
kanzure | i should write some code to find those repeating watermarks and remove sensitive metadata | 10:52 |
streety | kanzure: can you explain what you mean by "i use pdftk to remove pages from a pdf without converting the rest to pure-image documents" | 11:16 |
streety | I ask because I've spent some time today playing around with pdfminer extracting text from pdfs | 11:17 |
kanzure | pdftk input.pdf cat $pagestart-$pagestop output.pdf | 11:18 |
kanzure | that's all i've been using pdftk for so far | 11:18 |
kanzure | i just learned about it a few weeks ago but i dunno why i haven't seen it before | 11:18 |
streety | okay, I think I assumed it was more complex due to your mention of images | 11:19 |
kanzure | well i used imagemagick in the past (via 'convert') to dump pdf to images and then move signatures by coordinates or otherwise blank shit out | 11:21 |
streety | fair enough, makes sense with context | 11:21 |
streety | actually you may be interested in what I've been up to with pdfs. I set wget loose on diyhpl.us/~bryan/papers2 (excluding archives) a couple of months ago expecting to get a couple hundred MB but ended up with 4.5G before I realised how much there was. I wasn't sure what to do with it all but decided to try extracting text from the pdfs and then automatically tag and group the files. | 11:25 |
kanzure | a useful thing for you to do would be DOI number extraction from text-based pdfs as well as image-based pdfs | 11:33 |
kanzure | doi numbers can lead to additional metadata from the web by throwing the number through a resolver and then parsing metadata in META tags on journal sites | 11:33 |
kanzure | realistically i'm not sure how many papers in my collection are pure images and how many are text | 11:33 |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has quit [Read error: Connection reset by peer] | 11:34 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has joined ##hplusroadmap | 11:34 | |
-!- AJollyLife [~quassel@c-68-57-192-88.hsd1.il.comcast.net] has quit [Changing host] | 11:34 | |
-!- AJollyLife [~quassel@unaffiliated/ajollylife] has joined ##hplusroadmap | 11:34 | |
streety | Yeah matching the DOI will definitely be useful. I was considering extracting the pdf title by comparing the size of the text to the average for the doc but it's a fudge that probably won't work all that well | 11:36 |
streety | everything to do with pdfs is a bit of a fudge | 11:37 |
kanzure | the whole concept of papers is a fudge | 11:37 |
-!- mayko [~mayko@71-22-217-151.gar.clearwire-wmx.net] has joined ##hplusroadmap | 11:40 | |
kanzure | http://thisiscolossal.com/2011/06/markus-kayser-builds-a-solar-powered-3d-printer-that-prints-glass-from-sand-and-a-sun-powered-laser-cutter/ | 11:42 |
archels | solar powered? What a showoff. | 11:43 |
kanzure | solar powered photocopier | 11:44 |
streety | I've just taken a look at the distribution of page lengths for the pdfs I've extracted text from so far. Looks like perhaps 10% of the documents contain unusually little text | 11:45 |
kanzure | hey that's not bad | 11:46 |
streety | it's not a random sampling of the docs (I'm running through the directory with Python's os.walk) but I'm happy with that | 11:47 |
kanzure | i'm trying to find a paper on the server that has a "Downloaded by" or an IP address watermark | 11:47 |
kanzure | IEEE always embeds a $xyz amount in a footer somewhere | 11:48 |
kanzure | example: http://diyhpl.us/~bryan/papers2/neuro/implants/Data%20communication%20between%20brain%20implants%20and%20computer%20-%20short%20-%20IEEENeuralSystemsJune2003.pdf | 11:48 |
kanzure | i could remove that i guess but it's not particularly harmful | 11:48 |
streety | I assume removing the name or university that downloaded a document is more useful | 11:52 |
kanzure | ah here's one: | 11:53 |
kanzure | http://diyhpl.us/~bryan/papers2/Patterning%20design%20in%20color%20at%20the%20submicron%20scale.pdf | 11:53 |
kanzure | see left-hand side | 11:53 |
kanzure | some of the pdf obj streams seem to be zipped | 11:58 |
-!- uniqanomaly__ [~ua@dynamic-78-8-84-162.ssp.dialog.net.pl] has quit [Quit: uniqanomaly__] | 11:59 | |
streety | strangely text extraction has largely worked on that pdf but it doesn't include the Downloaded by reference | 12:01 |
kanzure | i found it by googling | 12:02 |
streety | it looks like google is doing better than I currently am then | 12:03 |
-!- augur [~augur@129.2.129.34] has quit [Remote host closed the connection] | 12:04 | |
streety | mendeley seems to cope just fine as well | 12:06 |
kanzure | -_- i just spent 10min trying to figure out why the pdf wouldn't change | 12:13 |
kanzure | editing the wrong file | 12:13 |
kanzure | soo anyway my first guess was right | 12:14 |
kanzure | using that same file, try this: | 12:16 |
kanzure | cat temp.pdf | grep -a "Length " | sort | uniq -c | sort -k2nr | 12:16 |
kanzure | as you can see, they repeat the watermark four times (once for each page) | 12:16 |
kanzure | in this case lines 63-67 inclusive are the watermark on the first page | 12:17 |
kanzure | iirc pypdf can handle FlateDecode? | 12:20 |
streety | I'm not using pypdf, I think that was the package that returned text but no spaces between words | 12:21 |
streety | I'm using pdfminer instead. It was a pain to get my head around how it worked but generally produces good output | 12:21 |
kanzure | is there a way to use zlib's inflate from stdin on bash? | 12:24 |
streety | time for me to take off, I'll let you know what I manage to create from all those papers | 12:29 |
kanzure | unfortunately i'm not sure what the contents of that objstream really means | 12:31 |
kanzure | cat objstream.dat | python -c'import sys;import zlib;data=sys.stdin.read();print zlib.decompress(data)' | 12:31 |
kanzure | also that's probably just the display of the text and doesn't actually remove the compressed text from the file | 12:37 |
kanzure | the objects with "Length 40" in this file are the pdf/display commands | 12:41 |
kanzure | the objects like on line 6, 13, 19 and 25 are the "Downloaded by" lines | 12:42 |
-!- augur [~augur@208.58.6.161] has joined ##hplusroadmap | 12:46 | |
-!- mayko [~mayko@71-22-217-151.gar.clearwire-wmx.net] has quit [Remote host closed the connection] | 12:47 | |
kanzure | "Producer: Acrobat Distiller Command 3.01 for Solaris 2.3 and later (SPARC)" | 12:57 |
kanzure | acs is running on solaris? | 12:57 |
-!- lumos [~lumos@afdy30.neoplus.adsl.tpnet.pl] has left ##hplusroadmap ["Leaving"] | 13:01 | |
-!- eudoxia [~eudoxia@r190-135-41-139.dialup.adsl.anteldata.net.uy] has joined ##hplusroadmap | 13:19 | |
-!- PixelScum [~PixelScum@ip98-177-175-88.ph.ph.cox.net] has joined ##hplusroadmap | 13:19 | |
-!- BaldimerBrandybo [~PixelScum@ip98-177-175-88.ph.ph.cox.net] has quit [Ping timeout: 258 seconds] | 13:22 | |
-!- eudoxia [~eudoxia@r190-135-41-139.dialup.adsl.anteldata.net.uy] has quit [Read error: Connection reset by peer] | 13:57 | |
-!- uniqanomaly [~ua@dynamic-78-8-84-162.ssp.dialog.net.pl] has joined ##hplusroadmap | 14:06 | |
-!- Guest89588 [~Jaakko@host86-131-177-233.range86-131.btcentralplus.com] has joined ##hplusroadmap | 14:38 | |
-!- eudoxia [~eudoxia@r190-135-106-229.dialup.adsl.anteldata.net.uy] has joined ##hplusroadmap | 14:58 | |
-!- uniqanomaly [~ua@dynamic-78-8-84-162.ssp.dialog.net.pl] has quit [Quit: uniqanomaly] | 15:03 | |
-!- Guest89588 [~Jaakko@host86-131-177-233.range86-131.btcentralplus.com] has quit [Quit: Nettalk6 - www.ntalk.de] | 15:08 | |
-!- augur [~augur@208.58.6.161] has quit [Read error: Connection reset by peer] | 16:30 | |
-!- eudoxia [~eudoxia@r190-135-106-229.dialup.adsl.anteldata.net.uy] has quit [Read error: Connection reset by peer] | 16:30 | |
-!- augur [~augur@208.58.6.161] has joined ##hplusroadmap | 16:31 | |
-!- eridu [~eridu@gateway/tor-sasl/eridu] has joined ##hplusroadmap | 17:03 | |
-!- nchaimov [~nchaimov@c-24-20-202-138.hsd1.or.comcast.net] has quit [Read error: Connection reset by peer] | 19:10 | |
-!- nchaimov [~nchaimov@c-24-20-202-138.hsd1.or.comcast.net] has joined ##hplusroadmap | 19:11 | |
kanzure | more graph visualization: http://ubietylab.net/ubigraph/ | 19:42 |
-!- eudoxia [~eudoxia@r190-135-86-128.dialup.adsl.anteldata.net.uy] has joined ##hplusroadmap | 19:56 | |
-!- eudoxia [~eudoxia@r190-135-86-128.dialup.adsl.anteldata.net.uy] has quit [Client Quit] | 20:00 | |
-!- eridu [~eridu@gateway/tor-sasl/eridu] has quit [Remote host closed the connection] | 20:06 | |
-!- eudoxia [~eudoxia@r190-135-86-128.dialup.adsl.anteldata.net.uy] has joined ##hplusroadmap | 20:36 | |
-!- eudoxia [~eudoxia@r190-135-86-128.dialup.adsl.anteldata.net.uy] has quit [Client Quit] | 20:38 | |
-!- eridu [~eridu@gateway/tor-sasl/eridu] has joined ##hplusroadmap | 20:44 | |
-!- eridu [~eridu@gateway/tor-sasl/eridu] has quit [Ping timeout: 250 seconds] | 22:20 | |
QuantumG | http://www.youtube.com/watch?v=S7lAlzMBzLQ | 23:35 |
QuantumG | pretty impressive | 23:35 |
--- Log closed Sun Jun 26 00:00:07 2011 |
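
Note on the watermark thread (12:16 to 12:42): the `python -c` one-liner at 12:31 uses Python 2 syntax. Below is a minimal Python 3 sketch of the same idea, assuming the watermark sits in Flate-compressed streams that repeat once per page, as kanzure describes. The names `inflate_streams` and `repeated_payloads` are hypothetical, and the regex is a rough heuristic rather than a real PDF parser, so it will likely miss streams that use other filters.

```python
import re
import zlib
from collections import Counter

# Heuristic: grab the raw bytes between the "stream" and "endstream" keywords.
# Assumption: the keywords do not occur inside the compressed data itself.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed payload of every stream zlib can inflate."""
    payloads = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes that usually
            # sit between the compressed data and "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate-compressed (or damaged); skip it.
            continue
        if data:
            payloads.append(data)
    return payloads


def repeated_payloads(payloads, min_count=2):
    """Return (payload, count) pairs for payloads seen at least min_count times."""
    counts = Counter(payloads)
    return [(data, n) for data, n in counts.most_common() if n >= min_count]
```

Something like `repeated_payloads(inflate_streams(open("temp.pdf", "rb").read()))` would list candidate watermark streams for manual inspection. It only finds them; as noted at 12:37, removing the text from the file is a separate step.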
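
Note on the DOI suggestion (11:33): a minimal sketch of pulling DOI-like strings out of already-extracted text and building a resolver URL for each. The pattern is an assumption that covers common modern DOIs, not the full DOI grammar, and the helper names `find_dois` and `resolver_url` are hypothetical. The `https://doi.org/` prefix is assumed as the resolver. Fetching the page and parsing its META tags is left out.

```python
import re

# Assumption: DOIs look like "10.<4-9 digits>/<suffix without whitespace>".
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return unique DOI-like strings in order of first appearance."""
    seen = []
    for match in DOI_RE.finditer(text):
        # Trim punctuation that usually belongs to the sentence, not the DOI.
        # This can wrongly shorten a DOI that really ends in ")" or ".".
        doi = match.group(0).rstrip(".,;)")
        if doi not in seen:
            seen.append(doi)
    return seen


def resolver_url(doi):
    """Build a resolver URL for a DOI (resolver host is an assumption)."""
    return "https://doi.org/" + doi
```

This only works on text-based pdfs after extraction (for example pdfminer output); image-only pdfs would need OCR first.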