Can PDFE identify and locate duplicate PDFs?
Comparing PDFs is a complicated task: two PDFs are truly identical only if their rendered page images are equal bit for bit, and two files that differ at the binary level can still produce the same rendered output.
So, to compare rendered pages we need PDF rendering capabilities, and PDFE does not have them, for now.
Obviously, this comparison is only the final step, after earlier steps have isolated the files that might be equal, and those earlier steps can be done very easily in PDFE.
Assume that files, to be equal, must have:
- the same number of pages.
- equal, or very close, file sizes.
- probably, depending on the kind of duplicates you are trying to find, the same creation date in the PDF metadata.
We can easily sort the scangrid by number of pages, select all the files with the same page count and copy them to the workgrid. Now sort the workgrid by file size and, with the help of the reader plugin, visually compare the rendered output of the files that have the same number of pages and the same, or approximately the same, file size.
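For readers who prefer to see the filtering logic spelled out, here is a minimal Python sketch of the same idea: group by page count, sort each group by size, and keep runs of files whose sizes are close. It is not part of PDFE. The `PdfInfo` record, the `candidate_groups` function and the 2% `size_tolerance` are all assumptions made for illustration, and the page counts and sizes are taken as already known (for example, exported from the grid).

```python
from collections import defaultdict
from typing import NamedTuple


class PdfInfo(NamedTuple):
    """Hypothetical record holding the values used for pre-filtering."""
    path: str
    pages: int
    size: int  # file size in bytes


def candidate_groups(files, size_tolerance=0.02):
    """Return groups of files with equal page count and near-equal size.

    size_tolerance is an assumed relative threshold (2% by default).
    Each file is compared with its neighbour in size order, so a long
    run of slowly growing sizes ends up in a single group.
    """
    by_pages = defaultdict(list)
    for f in files:
        by_pages[f.pages].append(f)

    groups = []
    for same_pages in by_pages.values():
        same_pages.sort(key=lambda f: f.size)
        current = [same_pages[0]]
        for f in same_pages[1:]:
            previous = current[-1]
            if f.size - previous.size <= size_tolerance * previous.size:
                current.append(f)
            else:
                if len(current) > 1:
                    groups.append(current)
                current = [f]
        if len(current) > 1:
            groups.append(current)
    return groups


if __name__ == "__main__":
    sample = [
        PdfInfo("a.pdf", 10, 1000),
        PdfInfo("b.pdf", 10, 1010),
        PdfInfo("c.pdf", 10, 5000),
        PdfInfo("d.pdf", 3, 1000),
    ]
    for group in candidate_groups(sample):
        print([f.path for f in group])
```

The groups returned are only candidates; each one still needs the visual comparison described above before any file is treated as a duplicate.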
Other processes can be used, but they depend on the type of PDF content in these files.
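One such process, useful only for the narrow case of exact byte-for-byte copies, is a checksum comparison done outside PDFE. The Python sketch below is an illustration and not a PDFE feature; `file_digest` and `exact_duplicates` are hypothetical helper names. Files that differ at the binary level but render identically will not be caught this way, so it complements the visual comparison rather than replacing it.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(paths):
    """Group paths whose file contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(Path(p))
    # Keep only digests shared by more than one file.
    return [group for group in by_digest.values() if len(group) > 1]
```

A call such as `exact_duplicates(Path("some_folder").glob("*.pdf"))` (the folder name is a placeholder) would return a list of groups, each holding the paths of files with identical contents.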
Other ideas for accomplishing this task are welcome.
