#1
Maybe this is not the right place to ask, but there seem to be several web experts here.

Can web spiders read and harvest e-mail addresses from a PDF file? Many users, and sites like QRZ.com, list e-mail addresses as JPEG images rather than ASCII text, and this seems to work. So, for a PDF file that does not resort to a JPEG: are addresses stored as ASCII text harvestable?

Thanks,
CL -- I doubt, therefore I might be!
#2
Caveat Lector wrote:

Maybe not the right place but seems there are several web experts here. Can web spiders read and harvest e-mail addresses from a pdf file ?

It's certainly possible. I don't know to what extent it's actually done.

In principle, any file that stores information as text can be searched (Google apparently searches PDFs and indexes their contents, just as if they were HTML), and a spammer or harvester could do the same thing. I have my doubts as to whether spammers really care to go to that much work, though.

--
Dave Platt AE6EO
Hosting the Jade Warrior home page: http://www.radagast.org/jade-warrior
I do _not_ wish to receive unsolicited commercial email, and I will boycott any company which has the gall to send me such ads!
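To make the "in principle" point concrete, here is a minimal sketch of how a harvester might pull addresses out of a PDF using nothing but the Python standard library. It assumes the page text sits in uncompressed or Flate-compressed content streams and that the file is not encrypted; a real tool would more likely use a proper PDF parser. The function name `harvest_from_pdf_bytes`, the hand-built `fake_pdf` fragment, and the address `op@example.org` are all made up for illustration.

```python
import re
import zlib

# Loose pattern for address-like byte strings; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Body of each "stream ... endstream" object in the raw file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def harvest_from_pdf_bytes(data: bytes) -> set[bytes]:
    """Return address-like strings found in raw and Flate-compressed streams."""
    found = set(EMAIL_RE.findall(data))  # uncompressed text is visible as-is
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompress(match.group(1))
        except zlib.error:
            continue  # not a Flate stream, or encrypted: skip it
        found.update(EMAIL_RE.findall(inflated))
    return found


# Hypothetical page content: one line of text drawn with the Tj operator.
content = b"BT /F1 12 Tf 72 720 Td (Contact: op@example.org) Tj ET"
compressed = zlib.compress(content)
fake_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(compressed)).encode()
    + b" /Filter /FlateDecode >>\nstream\n" + compressed
    + b"\nendstream\nendobj\n"
)

print(harvest_from_pdf_bytes(fake_pdf))
```

The fragment is not a complete, viewable PDF; it only mimics the part a scanner would care about. The sketch finds nothing in scanned (image-only) pages or in encrypted files, which would need OCR or decryption first.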
#3
At my old place of work I used to parse PDF and PS files for text quite regularly. It is not so easy with encrypted files or with PDFs made from scans!

Would a spammer think it's worthwhile? I mean, they could also OCR the image files on a website. They would only need to look for an "@" sign and all would be revealed. I am sure they'll start doing this when the current harvest starts going downhill!

Cheers
Bob VK2YQA

Caveat Lector wrote:

Maybe not the right place but seems there are several web experts here. Can web spiders read and harvest e-mail addresses from a pdf file ?
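Once text has been recovered (from a parsed PDF or from OCR output), the "look for an @ sign" step is a one-line regular expression. The sketch below is only an illustration; `find_addresses` and the sample addresses are invented, and the pattern is deliberately loose rather than a complete address grammar.

```python
import re

# Anything shaped like local@domain.tld; deliberately permissive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_addresses(text: str) -> list[str]:
    """Return unique address-like tokens in order of first appearance."""
    seen: list[str] = []
    for candidate in EMAIL_RE.findall(text):
        if candidate not in seen:
            seen.append(candidate)
    return seen


sample = "Write to someone@example.net. Mirror: other@example.org, someone@example.net"
print(find_addresses(sample))
```

OCR errors (an "@" misread as another glyph, for instance) would defeat this simple pattern, so image-based listings still raise the cost of harvesting even if they do not make it impossible.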
#4
Security through obscurity is a poor answer, and even poorer security.

I wouldn't wait for the spammers to "get stupid", "run out of ideas", "end up in prison", etc. You are best off actively taking part in providing your own protection. There are many freeware, shareware, and commercial programs that will virtually guarantee you are spam-free (one message in every few thousand may leak through); you just have to learn how to use them effectively. They let the spammers have your e-mail address if they want it; you will never see the spam if you take the time to become computer savvy.

One of the best is K9. It is free (well, donation-ware) and guarantees no malware. If you know Perl regular expressions it is DEADLY on spam (if not, get a Perl person to write them for you; ask in a Perl newsgroup).

Warmest regards,
John

"Caveat Lector" wrote in message news:HxQje.1450$Xh.611@fed1read07...

Maybe not the right place but seems there are several web experts here. Can web spiders read and harvest e-mail addresses from a pdf file ? Many users and folks like QRZ.com are using jpegs not ascii for listing e-mails -- this seems to work. So for pdf files without going to a jpeg --- is ascii text addresses harvestable ? Thanks -- CL -- I doubt, therefore I might be !
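For readers who have not written filter rules before, the general idea of regex-based spam filtering can be sketched in a few lines. This is not K9's rule format or API; the rules, weights, and function names below are hypothetical and only show how pattern matches can be turned into a keep/discard decision.

```python
import re

# Hypothetical rules for illustration only: (pattern, weight).
RULES = [
    (re.compile(r"\bv[i1]agra\b", re.IGNORECASE), 5),
    (re.compile(r"\b(?:free|cheap)\s+(?:money|meds|loans?)\b", re.IGNORECASE), 3),
    (re.compile(r"!{3,}"), 1),
]


def spam_score(subject: str) -> int:
    """Add up the weights of every rule whose pattern appears in the subject."""
    return sum(weight for pattern, weight in RULES if pattern.search(subject))


def is_spam(subject: str, threshold: int = 3) -> bool:
    """Treat a message as spam once its score reaches the threshold."""
    return spam_score(subject) >= threshold


print(is_spam("Cheap meds!!!"))
print(is_spam("Field Day schedule for June"))
```

Hand-written rules like these need regular upkeep as spammers change their wording, which is one reason filters often combine them with statistical scoring.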
#5
On Sat, 21 May 2005 17:40:07 -0700, "Caveat Lector" wrote:

Can web spiders read and harvest e-mail addresses from a pdf file ?

Hi OM,

Yup, my robots sure could, at a couple of MB/min. PDF is simply a proprietary markup language (as is Word).

73's
Richard Clark, KB7QHC
#6
I have been using a Java applet that encrypts my e-mail address but still allows visitors to my website to e-mail me. The applet I am using came from the Hivelogic website but seems to have been removed. A possible substitute (which I haven't tried myself) is at:

http://leon.mvps.org/Encoder/

If you do a Google search using words like "email encryption" you will probably find links to a number of different applications.

73,
Roger K6XQ
#7
You can protect text in PDF format. I do it all the time so people can't copy my work. Now they have to retype it.

"Caveat Lector" wrote in message news:HxQje.1450$Xh.611@fed1read07...

Maybe not the right place but seems there are several web experts here. Can web spiders read and harvest e-mail addresses from a pdf file ? Many users and folks like QRZ.com are using jpegs not ascii for listing e-mails -- this seems to work. So for pdf files without going to a jpeg --- is ascii text addresses harvestable ? Thanks -- CL -- I doubt, therefore I might be !
#8
You are gravely mistaken if you think .pdf protects text at all. Name a small (10 pages or less) .pdf doc and I will make a Word doc of it in minutes. .pdf is NOT a security format.

Warmest regards,
John

"Mr. Man with the Master Plan" wrote in message ...

you can protect text in PDF format. I do it all the time so people cant copy my work. Now they have to retype it

"Caveat Lector" wrote in message news:HxQje.1450$Xh.611@fed1read07...

Maybe not the right place but seems there are several web experts here. Can web spiders read and harvest e-mail addresses from a pdf file ? Many users and folks like QRZ.com are using jpegs not ascii for listing e-mails -- this seems to work. So for pdf files without going to a jpeg --- is ascii text addresses harvestable ? Thanks -- CL -- I doubt, therefore I might be !
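Neither poster says which protection mechanism is meant. Assuming it is the copy/print restriction offered by PDF's standard security handler (an assumption, not something stated in the thread), the restriction is recorded as a bit field in the `/P` entry of the file's encryption dictionary, and it is the viewing software that chooses to honor it. The sketch below decodes such a value; the bit meanings follow the PDF specification's user access permissions table, while the function name and the sample values are illustrative.

```python
# 1-based bit positions in the /P entry of the standard security handler.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "print at high quality",
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map a signed 32-bit /P value to named permission flags."""
    unsigned = p & 0xFFFFFFFF  # /P is written as a signed 32-bit integer
    return {
        name: bool((unsigned >> (bit - 1)) & 1)
        for bit, name in PERMISSION_BITS.items()
    }


# Illustrative values: -4 sets every permission bit, -3904 clears them all.
for value in (-4, -3904):
    print(value, decode_permissions(value))
```

A file that also requires a password just to open it is a different case, since its contents cannot be decrypted without that password.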
#9
I found the applet that used to be on the Hivelogic website:

http://automaticlabs.com/products/enkoder

You can either download it and run it on your PC or run it online. Either way, the JavaScript code it produces looks like this:

<script type="text/javascript">
//<![CDATA[
function hiveware_enkoder(){var i,j,x,y,x=
"x=\"783d2232517d783635363d5c2236343230343666363637363337323666353664366432" +
"36353630373365373437303632653731366437373232653536393763323234363533653432" +
"38323336633233633639363631323336623036383230363732363836353536363732363364" +
"35353230633232373436366436663230313639373336366337353665343666363432336137" +
"30366432363136353236633630363135366636653236653630363535343036643637333631" +
"36393236333663323637363133636336663266363632363133653136633232323265363933" +
"62653635333033373435625c223b633232793d27323037273b663436396f7228373436693d" +

This is actually only about half of the code. I doubt a spider would find my address in there. You then paste this code into your HTML.

73,
Roger K6XQ
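The quoted output is hex-encoded text: its first three bytes, `78 3d 22`, decode to `x="`. The sketch below shows a single layer of that idea in Python. It is not Enkoder's actual algorithm, which appears to nest several encodings and decodes itself in the visitor's browser; `encode_address`, `decode_address`, and the sample address are invented for illustration.

```python
def encode_address(address: str) -> str:
    """Hex-encode an address so no literal '@' appears in the page source."""
    return address.encode("ascii").hex()


def decode_address(blob: str) -> str:
    """Reverse the encoding, as a script in the visitor's browser would."""
    return bytes.fromhex(blob).decode("ascii")


encoded = encode_address("someone@example.net")
print(encoded)                   # contains only hex digits
print(decode_address(encoded))   # the original address again

# The first bytes of the output quoted in the post decode the same way.
print(bytes.fromhex("783d22").decode("ascii"))
```

A spider that only pattern-matches the page source sees no address-shaped string. One that actually runs the page's scripts could still recover it, so this raises the cost of harvesting rather than preventing it.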
#10
If you know how to use Adobe, yes, you can block it from being exported to Word as saved text.

My company produces competitive intelligence, and we construct PDFs in a way that you can NOT export the text in any way, except if you want to type it out by hand.

If you want to take up the challenge, let me know. We can meet on AIM or YIM; I can send you a PDF sample via file transfer and see how long it takes for you to crack the PDF. If it is a software trick, the size of the PDF won't matter. I can type 10 pages of text in about 12 minutes too, if that's what you were thinking.

Otherwise, get a webcam or fly to New York and let me see how you do it so snappy.

"John Smith" wrote in message ...

You are gravely mistaken if you think .pdf protects text at all... name a small (10 pages or less) .pdf doc and I will a word doc of it in minutes... .pdf is NOT a security format... Warmest regards, John

"Mr. Man with the Master Plan" wrote in message ...

you can protect text in PDF format. I do it all the time so people cant copy my work. Now they have to retype it

"Caveat Lector" wrote in message news:HxQje.1450$Xh.611@fed1read07...

Maybe not the right place but seems there are several web experts here. Can web spiders read and harvest e-mail addresses from a pdf file ? Many users and folks like QRZ.com are using jpegs not ascii for listing e-mails -- this seems to work. So for pdf files without going to a jpeg --- is ascii text addresses harvestable ? Thanks -- CL -- I doubt, therefore I might be !