Author Topic: Problems with text indexation  (Read 8841 times)

Padanges

  • Newbie
  • *
  • Posts: 179
Problems with text indexation
« on: November 25, 2016, 08:03:49 AM »
Hi,
It seems that the Index text words Batch tool does not reindex already indexed documents, even if they were modified: after modifying a PDF and using the dbSearch tool I get a "bypasing already indexed text files" I/O message. Is that correct? How can we then re-index a document?
Also, I have noticed that indexed PDF documents don't get their Bookmarks indexed. That's unfortunate, because I have many PDF documents which are not OCRed, i.e. do not contain any text, but have a "quick-fix" for keywords/tags as a bookmark structure. Is it possible to expand dbSearch so that it could search bookmarks as part of the text as well?


Thanks in advance

RTT

  • Administrator
  • *****
  • Posts: 918
Re: Problems with text indexation
« Reply #1 on: November 27, 2016, 01:47:27 AM »
Quote
How can we then re-index a document?
Right now you can do this by removing the containing folder from the DB with the database edit tool (menu database>edit). In the DB tree, select the folder(s) that hold the file(s) and delete them. Afterwards, you just need to index the files again.
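
For illustration only, here is a minimal sketch of the kind of check that would let an indexer pick up modified files without removing the folder first. It is not how pdfe works internally; it assumes a hypothetical index that records each file's modification time when it is indexed, and `needs_reindex` and `indexed_mtimes` are made-up names.

```python
import os

def needs_reindex(path, indexed_mtimes):
    """Return True if path is new to the index or was modified after indexing.

    indexed_mtimes is a hypothetical dict mapping a file path to the
    modification time recorded when that file was last indexed.
    """
    recorded = indexed_mtimes.get(path)
    if recorded is None:
        # Never indexed before.
        return True
    # Modified on disk since the recorded indexing time.
    return os.path.getmtime(path) > recorded
```

With a check like this, a file would only be bypassed when its stored timestamp is still current.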

Quote
Also, I have noticed that indexed PDF documents don't get their Bookmarks indexed. That's unfortunate, because I have many PDF documents which are not OCRed, i.e. do not contain any text, but have a "quick-fix" for keywords/tags as a bookmark structure. Is it possible to expand dbSearch so that it could search bookmarks as part of the text as well?
The pdfe can already parse the PDF bookmarks, so indexation of the bookmarks is something that can be added. When the text content indexer was developed, bookmark support wasn't available yet.
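
As a rough sketch of the idea, and not pdfe's actual implementation, bookmark titles could be folded into the same word set as the page text. The nested `(title, children)` outline format and the names `flatten_bookmarks` and `index_words` are assumptions made for this example.

```python
import re

def flatten_bookmarks(bookmarks):
    """Yield every bookmark title in a nested outline, depth first."""
    for title, children in bookmarks:
        yield title
        yield from flatten_bookmarks(children)

def index_words(page_text, bookmarks):
    """Return the set of lowercase words from the page text and bookmark titles."""
    combined = " ".join([page_text, *flatten_bookmarks(bookmarks)])
    return set(re.findall(r"\w+", combined.lower()))

# A scanned PDF with no text layer, tagged only through its bookmarks.
outline = [("Invoices", [("2016", []), ("Supplier A", [])])]
words = index_words("", outline)
```

Here a document without any extractable text would still be found by a search for "invoices" or "2016", because those words come from its bookmark titles.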

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Problems with text indexation
« Reply #2 on: November 30, 2016, 10:21:57 AM »
One more feature for the next release!  ;D
I'm leading the high-score board  8)

Padanges

  • Newbie
  • *
  • Posts: 179
Re: Problems with text indexation
« Reply #3 on: December 16, 2016, 09:35:45 AM »
Quote
The pdfe can already parse the PDF bookmarks, so indexation of the bookmarks is something that can be added.
Will bookmark indexation be added as a separate option/checkbox, or will it be done automatically?

RTT

  • Administrator
  • *****
  • Posts: 918
Re: Problems with text indexation
« Reply #4 on: December 18, 2016, 01:16:18 AM »
Automatically.