Well,
I have to work with filled-in forms, so I know what the fields are (their
titles) and I need to extract the information entered in them. So it
shouldn't be too big a problem to work out which field is which by checking
the title next to each filled-in area.
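Since the field titles are known in advance, one way to pull out the filled-in values is to search the document text for each title followed by a colon. This is a minimal Python sketch, assuming the document has already been converted to plain text and that each value sits on the same line as its title ("Name: John Smith"); `extract_fields` and the sample data are hypothetical.

```python
import re

def extract_fields(text, labels):
    """Return {label: value} for each known label found in the text.

    Assumes each filled field appears as "Label: value" on a single line;
    labels that do not appear in the text are simply omitted.
    """
    values = {}
    for label in labels:
        # Anchor at the start of a line, allow spaces/tabs around the colon,
        # and capture the rest of that line only ([ \t] never crosses a newline).
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.*?)[ \t]*$"
        match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
        if match:
            values[label] = match.group(1)
    return values

sample = """Name: John Smith
Date of birth: 1970-01-01
Signature:
"""
print(extract_fields(sample, ["Name", "Date of birth", "Signature", "Phone"]))
# {'Name': 'John Smith', 'Date of birth': '1970-01-01', 'Signature': ''}
```

A title that is present but left blank maps to an empty string, while a title missing from the document ("Phone" here) is left out, so the two cases can be told apart.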
I'm not sure how to handle the whole document and build a user interface
for it. I would like to build the user interface with some kind of
script that extracts the information from the document (the template) and
presents the user with the fields that need to be filled in.
Any idea how to handle documents this way? Any tutorial or code sample?
Thanks for your help
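For the user interface, one simple option, assuming the field titles have already been extracted from the template, is to generate an HTML form and open it in a browser. A rough Python sketch; `build_form` and the `/submit` action URL are hypothetical placeholders, and something on the server side would still be needed to receive the posted values.

```python
from html import escape

def build_form(fields, action="/submit"):
    """Build a minimal HTML form with one labelled text input per field title."""
    rows = []
    for i, field in enumerate(fields):
        field_id = f"field{i}"  # simple generated id linking <label> to <input>
        rows.append(
            f'  <label for="{field_id}">{escape(field)}</label>\n'
            f'  <input type="text" id="{field_id}" name="{escape(field)}">'
        )
    body = "\n".join(rows)
    # escape() also quotes '"' by default, so titles are safe inside attributes.
    return (
        f'<form method="post" action="{escape(action)}">\n'
        f"{body}\n"
        '  <input type="submit">\n'
        "</form>"
    )

print(build_form(["Name", "Date of birth"]))
```

Using the field title itself as the input `name` keeps the submitted values keyed by the same titles found in the template.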
Jim Langston wrote:
> "DoctorC" <enco@[EMAIL PROTECTED]
> wrote in message
> news:456ae3ac$0$17950$f69f905@[EMAIL PROTECTED]
>
>>Hi,
>>I need some suggestions about document processing techniques.
>>I need to import documents in HTML, DOC and PDF formats and would like to
>>parse them and automatically create fields to fill in the documents.
>>Any idea how to do it?
>
>
> "import documents..." "automatically create fields to fill the documents..."
>
> html, DOC and PDF are 3 different animals.
>
> The easiest would probably be HTML, since it'll probably have tags specifying
> what are actually fields (if my HTML memory serves me, it's something
> like <input ...>, but don't quote me on that).
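To illustrate the HTML case: form fields are marked up with `<input>`, `<textarea>` and `<select>` elements, so Python's standard `html.parser` module can list them. This is a minimal sketch; `FormFieldLister` and the sample page are made up for the example.

```python
from html.parser import HTMLParser

class FormFieldLister(HTMLParser):
    """Collect (tag, name, type) for every form control in an HTML page."""

    CONTROLS = {"input", "textarea", "select"}

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase and attrs as (name, value) pairs.
        if tag in self.CONTROLS:
            attributes = dict(attrs)
            # <input> defaults to type="text"; <textarea>/<select> have no type,
            # so report the tag name instead.
            field_type = attributes.get("type", "text") if tag == "input" else tag
            self.fields.append((tag, attributes.get("name"), field_type))

page = """
<form>
  Name: <input type="text" name="fullname">
  Comments: <textarea name="comments"></textarea>
  <select name="country"><option>IT</option></select>
  <input type="submit">
</form>
"""
lister = FormFieldLister()
lister.feed(page)
for field in lister.fields:
    print(field)
```

Note that the submit button is reported too (with no name); filtering on `field_type` would drop it.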
>
> The problem with DOC and PDF is that nothing really states what a field
> is. Take a PDF: it is often just a graphic image (a scanned page). If it's
> graphic, you'll need some type of OCR (Optical Character Recognition) to read
> the text. At least with DOC you already have the text. But then what? How do
> you know what a field is?
>
> We, as humans, see:
> Name
> and we know we're supposed to put our name there. How is your software
> supposed to distinguish that as a field, though? How does it know that:
> Enter your name:
>
> is a field and
> Do not write below this line:
> isn't?
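There is no reliable general answer to that question, but a heuristic can get part of the way. The sketch below assumes a line is a field label if it ends in a blank to fill (a run of underscores or dots, or an empty `[ ]` box), or if it is a short line ending in a colon, unless it starts with a known instruction phrase. The phrase list, the five-word limit and `looks_like_field` are all hypothetical and would need tuning on real forms.

```python
import re

# Hypothetical list of phrases that introduce instructions rather than fields.
INSTRUCTION_PREFIXES = ("do not", "don't", "for office use", "please read")
# A trailing run of 3+ underscores or dots, or an empty checkbox, marks a blank.
BLANK = re.compile(r"(_{3,}|\.{3,}|\[\s*\])\s*$")

def looks_like_field(line):
    """Guess whether a line of form text is a label for a fillable field."""
    text = line.strip()
    if not text:
        return False
    # str.startswith accepts a tuple, so this checks every instruction prefix.
    if text.lower().startswith(INSTRUCTION_PREFIXES):
        return False
    if BLANK.search(text):
        return True
    # Otherwise accept short "Label:" lines as probable fields.
    return text.endswith(":") and len(text.split()) <= 5

for line in ["Enter your name:", "Do not write below this line:", "Signature ________", "Name"]:
    print(repr(line), looks_like_field(line))
```

A bare "Name" with no colon or blank is rejected, which is exactly the kind of case a human recognizes and a heuristic misses.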