• PDF indexing

    #2349556

    I’ve a couple of hundred scientific papers I’ve collected in PDF format – some of which are named in a fashion that one can figure out what they are about, but many (downloaded from Journal websites) have names that are in some Hogwarts code.

    I’d like a piece of software that could scan multiple PDFs and make an index from those articles: one document indexing the articles (ideally even with links, but that’s a bit of a dream). An intelligent indexing algorithm that would ignore ‘a’ and ‘the’ would be nice, but that’s not imperative…
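
    For illustration, the kind of index described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the stop-word list is a tiny made-up sample, the file names are invented, and `extract_pdf_text` is a hypothetical helper that assumes the third-party pypdf package is installed. The index itself is built from plain text, so any extraction tool could stand in.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list (an assumption); a real one would be longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(texts):
    """Map each word to the sorted list of file names that contain it.

    `texts` is a dict of {file_name: extracted_text}.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in tokenize(text):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def extract_pdf_text(path):
    """Hypothetical helper: pull the text out of one PDF with pypdf."""
    from pypdf import PdfReader  # third-party, imported only when needed

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


if __name__ == "__main__":
    # Stand-in text; in practice each value would come from extract_pdf_text().
    sample = {
        "s41586-020.pdf": "The spike protein of the virus",
        "paper_17.pdf": "A protein folding study",
    }
    for word, files in sorted(build_index(sample).items()):
        print(word, "->", ", ".join(files))
```

    Writing the result out as one text or HTML file would give the single index document; links to the files could be added at that step.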

    Anybody know of such a utility?  I can find things that index single articles but none that will index a group.

    Cheers

    Richard

     

    • #2349569

      First, begin the habit of putting your own shortened code at the front of all filenames before hitting save. These may be date based, subject based, author, or some mix thereof; whatever makes referencing most useful to your needs.

      Second, other readers of these journals have likely run into this problem over the years. There may be a help section, FAQ, or forum at the source to help decode the Wizarding jargon used by any one journal. This would help with the third step…

      Third, hire an undergrad summer intern to process the backlog of articles according to the new code established in “First,” above. Offer an incentive bonus for timely completion, with random samples to check compliance. Good luck.

      • #2349610

        Thanks for looking.

        Mostly, of course, I do rename files, but nobody is perfect. And even then, the names are not as brilliant as one thought after a year or two.

        Perhaps I could have been a bit clearer: I want to index the contents of the files. Then, when I search for COVID19 some time from now, I will be able to find all files that mention COVID19.
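
        As a rough sketch of that kind of content search (not a finished tool), the Python below looks for a term in text that has already been extracted from each file. The file names and sample text are made up, and the extraction step itself is assumed to have happened elsewhere. Spelling variants such as COVID19, Covid19 and COVID-19 are treated as the same term by stripping case and punctuation before comparing.

```python
import re


def normalize(term):
    """Lowercase and drop anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def files_mentioning(term, texts):
    """Return the sorted names of files whose text mentions the term.

    `texts` is a dict of {file_name: extracted_text}. Each
    whitespace-separated token is normalized, and a file matches when
    the normalized term appears inside any of its tokens.
    """
    needle = normalize(term)
    if not needle:
        return []
    hits = []
    for name, text in texts.items():
        if any(needle in normalize(token) for token in text.split()):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Invented sample data standing in for text pulled from real PDFs.
    texts = {
        "a.pdf": "Early COVID-19 cases.",
        "b.pdf": "Notes on (Covid19) spread",
        "c.pdf": "Protein folding",
    }
    print(files_mentioning("COVID19", texts))
```

        Because the match is a substring test on each token, a shorter query such as "covid" would also find these files; an exact-token comparison would be stricter.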

        Cheers R

         
