Talk About Network


Re: How to OCR new books

by tussock <scrub@[EMAIL PROTECTED] > Jan 31, 2008 at 03:33 PM

Jefgorbach wrote:


> While it's illegal to UPLOAD, nothing in copyright law (afaik) says
> -you- have to make the legal copy for personal use -yourself-

    Yes it does, explicitly so. In many cases, like with books, it's also 
illegal to make yourself any copies. Copyright law is inane; if you're 
not sure about some point, assume it works to punish the paying customer 
and you'll be pretty close to the truth.

> so the easiest way on both yourself and your originals would be check
> limewire/bit torrent for existing copies.

    That's true regardless. Legally, you must buy the book *and* buy the 
PDF, unless the publisher specifically provides otherwise.

> Otherwise be sure to keep the originals aligned as straight as possible,
> using around 600dpi (higher is better, but slower scanning and results
> in much larger files).

    Also make damned sure the scanner program doesn't use any lossy 
compression when you up the scan size. Even at a raw 33 MB per page a 
book will only be 6 gig or so, which doesn't really matter these days, 
and it should tuck down to 20 MB once you've OCR'd and crunched.
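
    As a rough check on those numbers, here is a minimal sketch of the raw-size arithmetic. The page size (US Letter, 8.5 x 11 in), the 8-bit greyscale depth and the 180-page book length are assumptions for illustration, not figures from the post; the function names are made up for the example.

```python
# Rough raw-size estimate for uncompressed page scans.
# Assumptions (not from the post): US Letter pages (8.5 x 11 in),
# 8-bit greyscale (1 byte per pixel), and a hypothetical 180-page book.

def raw_page_bytes(width_in, height_in, dpi, bytes_per_pixel=1):
    """Uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def raw_book_bytes(pages, width_in=8.5, height_in=11.0, dpi=600, bytes_per_pixel=1):
    """Uncompressed size of a whole book scanned page by page, in bytes."""
    return pages * raw_page_bytes(width_in, height_in, dpi, bytes_per_pixel)


if __name__ == "__main__":
    page = raw_page_bytes(8.5, 11.0, 600)
    book = raw_book_bytes(180)
    print(f"per page:  {page / 1e6:.1f} MB")
    print(f"180 pages: {book / 1e9:.2f} GB")
```

    Under those assumptions a 600 dpi page comes to about 33.7 MB and 180 pages to about 6 GB, which lines up with the figures above. Scanning in 24-bit colour (`bytes_per_pixel=3`) would triple both.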

    For better results still, put a sheet of black card behind the page 
being scanned to cut back on reflection artifacts, use high contrast 
settings, and remember to use the highest quality scan options (so it uses 
the most light). Sit a brick or two on the scanner to hold the book flat 
(with the scanner on a surface that supports such things, and the external 
part of the book supported too). Or, if you're real keen, remove the book 
binding and do it page by page; good scanning practice gives a book a tough 
workout that a lot of them don't survive anyway.

> I'm unfamiliar with ABBYY, but generally any scanning program will save
> the pages to images for reprinting. The OCR program converts the images
> into editable text -- the big snag is finding one capable of dealing
> with the multiple columns most books are formatted in without
> interlacing/merging the paragraphs into a worse mess than simply
> retyping.

    Meh, never seen one that couldn't part the columns smoothly; it's 
usually the older PDF construction programs that have trouble there, 
especially the free ones.
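
    To show what the column-merging snag in the quote looks like, here is a toy sketch. It is not how any particular OCR or PDF program works; the `(x, y, text)` block format, the evenly split columns and both function names are assumptions made up for the example.

```python
# Illustrative only: ordering recognised text blocks on a two-column page.
# Each block is a hypothetical (x, y, text) tuple giving its top-left corner;
# columns are assumed to be equal-width slices of the page.

def naive_order(blocks):
    """Read strictly top-to-bottom, then left-to-right.

    On a multi-column page this interleaves lines from different columns.
    """
    return [text for _, _, text in sorted(blocks, key=lambda b: (b[1], b[0]))]


def column_order(blocks, page_width, columns=2):
    """Assign each block to a column by x position, then read each column downwards."""
    col_width = page_width / columns

    def key(block):
        x, y, _ = block
        col = min(int(x // col_width), columns - 1)
        return (col, y, x)

    return [text for _, _, text in sorted(blocks, key=key)]


blocks = [
    (50, 100, "A1"), (350, 100, "B1"),
    (50, 200, "A2"), (350, 200, "B2"),
]

if __name__ == "__main__":
    print(naive_order(blocks))        # columns interleaved
    print(column_order(blocks, 600))  # left column first, then right
```

    The naive ordering mixes the two columns line by line, which is the "worse mess" described in the quote; grouping by column first keeps each column's text together.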

    Most of the pirated stuff is OEF rather than OCR now, meaning they 
hand-craft the PDFs from scratch with the full Adobe suite (or now and 
then you see the original PDF from the printer with colour balance and 
alignment marks and such, which is pretty rare).

-- 
    tussock

No blogs and no usenet make tussock something something.
 



