Author Topic: Archive scan depth limit (Read 20252 times)

Padanges · « **on:** November 22, 2016, 10:36:23 AM »

Hi,
is it possible to limit the depth of archives for document scanning? For example, I have an archive within an archive, and I would like to find only documents which are only in the primary archive - is there a way to do that?

Thanks in advance

RTT · « **Reply #1 on:** November 23, 2016, 12:04:56 AM »

Not possible right now, but definitely something that may be implemented. I will check it. Thanks for the suggestion.

Padanges · « **Reply #2 on:** November 25, 2016, 07:54:15 AM »

This feature would be most welcome

Padanges · « **Reply #3 on:** November 26, 2016, 08:46:26 AM »

I used to extract file name from full path by checking whether it's inside an archive with such code:

Code:

if (fileName.indexOf('>') > 0) {
	// remove archive name tag
	fileName = fileName.substring(fileName.indexOf('>') + 1);
}

After messing around I found out that it would not work properly depending on archive depth.
Currently our file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf .
Wouldn't it be simpler if we had pattern like this: <archive.zip><archive-inside.zip>document-inside.pdf ?

Padanges · « **Reply #4 on:** November 26, 2016, 08:59:47 AM »

I think limiting scan depth should even speed-up file scanning in cases where we have archived archives of various recognizable file types.

RTT · « **Reply #5 on:** November 27, 2016, 01:59:50 AM »

Quote from: Padanges on November 26, 2016, 08:46:26 AM

I used to extract file name from full path by checking whether it's inside an archive with such code:
Code:
if (fileName.indexOf('>') > 0) {
	// remove archive name tag
	fileName = fileName.substring(fileName.indexOf('>') + 1);
}
After messing around I found out that it would not work properly depending on archive depth.

Try this way:
FileName = FileName.substring(FileName.indexOf('>') + 1).split('|').slice(-1)[0];

Quote

Currently our file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf .
Wouldn't it be simpler if we had pattern like this: <archive.zip><archive-inside.zip>document-inside.pdf ?

No. The current format is easy to parse with a simple split operation. Everything after the main archive name is passed to the un-archive code as the filename to extract. That code splits it on '|' and follows the resulting array level by level until it reaches the last entry, which is the file the caller requested.
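
A minimal sketch of the split-based parsing described above. The helper name parseArchivePath and the shape of the returned object are hypothetical, chosen only for illustration. The sketch also assumes that a path outside any archive carries no leading <...> tag.

```javascript
// Hypothetical helper (not part of the application's API): splits a path such as
// "<archive.zip>archive-inside.zip|document-inside.pdf" into its parts.
function parseArchivePath(fullPath) {
  const close = fullPath.indexOf('>');
  // Assumption: a file outside any archive has no leading "<...>" tag.
  if (!fullPath.startsWith('<') || close < 0) {
    return { archive: null, nested: [], fileName: fullPath };
  }
  const archive = fullPath.substring(1, close);            // main archive name
  const parts = fullPath.substring(close + 1).split('|');  // nested archives + file
  const fileName = parts.pop();                            // last entry is the file
  return { archive, nested: parts, fileName };
}

const parsed = parseArchivePath('<archive.zip>archive-inside.zip|document-inside.pdf');
console.log(parsed.archive);  // archive.zip
console.log(parsed.nested);   // [ 'archive-inside.zip' ]
console.log(parsed.fileName); // document-inside.pdf
```

With this layout, one split on '|' yields every level in order, which is the property the current format is meant to keep.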

RTT · « **Reply #6 on:** November 27, 2016, 02:03:57 AM »

Quote from: Padanges on November 26, 2016, 08:59:47 AM

I think limiting scan depth should even speed-up file scanning in cases where we have archived archives of various recognizable file types.

How's that?

Padanges · « **Reply #7 on:** November 30, 2016, 10:26:17 AM »

Quote

How's that?

What about a case where we have a text-book archived with an archive of a CD content, where many file formats are recognizable by the scanner, for example, *.txt, but ultimately have no purpose for being indexed into a DB?

RTT · « **Reply #8 on:** December 01, 2016, 12:47:47 AM »

Quote from: Padanges on November 30, 2016, 10:26:17 AM

Quote
How's that?

What about a case where we have a text-book archived with an archive of a CD content, where many file formats are recognizable by the scanner, for example, *.txt, but ultimately have no purpose for being indexed into a DB?

The archive within archive scan depth, that I just finished implementing, is about instructing the scanner how many levels of archives inside archives should be scanned. If, in the scenario you are referring to, these .txt files are archived in an archive inside a main archive, then setting the scan depth can indeed exclude these files from indexing and speed up the scanning. But if you just want to scan everything, the scan depth check makes the process slightly slower. Not by much, though, and the feature is indeed useful.
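
To illustrate what a depth limit means in terms of the file name pattern discussed in this thread, here is a rough sketch that computes the nesting depth of a path and filters a list by it. It is not the scanner's implementation: the real feature would skip opening nested archives altogether, while this only filters paths after the fact. The names archiveDepth and withinScanDepth are hypothetical, and depth 0 is taken to mean "directly in the main archive".

```javascript
// Hypothetical helpers, for illustration only.
// Depth 0 = file sits directly in the main archive; each '|' adds one level.
function archiveDepth(fullPath) {
  const close = fullPath.indexOf('>');
  if (close < 0) return -1; // assumption: no '>' means the file is not in an archive
  return fullPath.substring(close + 1).split('|').length - 1;
}

function withinScanDepth(fullPath, maxDepth) {
  return archiveDepth(fullPath) <= maxDepth;
}

const paths = [
  '<archive.zip>document.pdf',
  '<archive.zip>archive-inside.zip|document-inside.pdf',
  '<archive.zip>a.zip|b.zip|notes.txt',
];

// Keep only files found directly in the main archive.
console.log(paths.filter((p) => withinScanDepth(p, 0)));
// [ '<archive.zip>document.pdf' ]
```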

Padanges · « **Reply #9 on:** December 16, 2016, 09:26:56 AM »

Quote

The archive within archive scan depth, that I just finished implementing, is about instructing the scanner how many levels of archives inside archives should be scanned.

That's sweet

Quote

FileName = FileName.substring(FileName.indexOf('>') + 1).split('|').slice(-1)[0];

An alternative code could be: FileName = FileName.substring(FileName.lastIndexOf('|') + 1);

RTT · « **Reply #10 on:** December 18, 2016, 01:13:14 AM »

Quote from: Padanges on December 16, 2016, 09:26:56 AM

An alternative code could be: FileName = FileName.substring(FileName.lastIndexOf('|') + 1);

No. It will fail if the archived file is in the main archive (depth 0), i.e. no '|' character present.
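
The difference is easy to check by running both expressions against a depth 0 path and a nested one. The wrapper names below (viaSplit, viaLastPipe, viaLastDelimiter) are hypothetical; the third is only a possible variant that cuts after whichever delimiter comes last, not something proposed in the thread.

```javascript
// The split-based expression suggested earlier in the thread.
const viaSplit = (p) => p.substring(p.indexOf('>') + 1).split('|').slice(-1)[0];

// The lastIndexOf('|') alternative.
const viaLastPipe = (p) => p.substring(p.lastIndexOf('|') + 1);

// Hypothetical variant: cut after the last '|' or, if there is none, after '>'.
const viaLastDelimiter = (p) =>
  p.substring(Math.max(p.lastIndexOf('|'), p.indexOf('>')) + 1);

const depth0 = '<archive.zip>document.pdf';
const depth1 = '<archive.zip>archive-inside.zip|document-inside.pdf';

console.log(viaSplit(depth0));         // document.pdf
console.log(viaLastPipe(depth0));      // <archive.zip>document.pdf  (archive tag not removed)
console.log(viaLastDelimiter(depth0)); // document.pdf

console.log(viaSplit(depth1));         // document-inside.pdf
console.log(viaLastPipe(depth1));      // document-inside.pdf
console.log(viaLastDelimiter(depth1)); // document-inside.pdf
```

With no '|' in the string, lastIndexOf returns -1, so substring(0) returns the whole path, archive tag included.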

RTTSoftware Support Forum
