Batch Create Multipage PDFs based on Part of File name.

RW

Batch Create Multipage PDFs based on Part of File name.
« on: July 16, 2015, 09:37:29 PM »
I have a bunch of pdfs labeled like below in the same folder

00001235.001.pdf
00001235.002.pdf
00001235.003.pdf

00001236.001.pdf
00001236.002.pdf
00001236.003.pdf
00001236.004.pdf

with the first 8 characters representing the document name, ".001" representing page 1 of the document, ".002" page 2, and so on. I would like to merge all the PDFs that share the same first 8 characters into one PDF named after that 8-character document name, then move on and merge the pages of the next document name, and so on. After this batch process completes, there would be two multipage PDFs: a 3-page PDF saved as "00001235.pdf" and a 4-page PDF saved as "00001236.pdf". Of course, most of my folders have thousands of PDFs in them, but only a few hundred different document names once the page numbers that follow are ignored.

Ultimately, it would be nice to be able to merge all PDFs that share the same first X characters, rather than being limited to 8, as some documents may share only 5 characters and others 10.
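The grouping described above amounts to deriving a key from each file name and collecting the files that share it. A minimal sketch in plain JavaScript, runnable in Node.js rather than inside PDF-ShellTools, assuming Windows-style paths; `nameByFirstNChars` and `groupFiles` are hypothetical names, and the "name must be longer than N characters" rule mirrors the `NameByFirstNCharacters` function of the script posted in this thread:

```javascript
// Hypothetical helper: returns the first nChars characters of the file name
// (without extension), or "" when the name is not longer than nChars, so an
// already merged "00001235.pdf" is left out of the groups.
function nameByFirstNChars(nChars, path) {
  var base = path.substring(path.lastIndexOf("\\") + 1);
  var dot = base.lastIndexOf(".");
  var name = dot >= 0 ? base.substring(0, dot) : base;
  return nChars > 0 && name.length > nChars ? name.substring(0, nChars) : "";
}

// Hypothetical helper: sorts the paths (so ".001" precedes ".002") and
// collects them into groups keyed by whatever keyFn returns.
function groupFiles(paths, keyFn) {
  var groups = {};
  var sorted = paths.slice().sort();
  for (var i = 0; i < sorted.length; i++) {
    var key = keyFn(sorted[i]);
    if (key === "") continue; // file does not fit the naming pattern
    if (!Object.prototype.hasOwnProperty.call(groups, key)) groups[key] = [];
    groups[key].push(sorted[i]);
  }
  return groups;
}

var scans = [
  "C:\\scans\\00001236.003.pdf", "C:\\scans\\00001235.002.pdf",
  "C:\\scans\\00001236.001.pdf", "C:\\scans\\00001235.001.pdf",
  "C:\\scans\\00001236.004.pdf", "C:\\scans\\00001235.003.pdf",
  "C:\\scans\\00001236.002.pdf"
];
var byFirst8 = groupFiles(scans, function (p) { return nameByFirstNChars(8, p); });
for (var key in byFirst8) {
  console.log(key + ".pdf <- " + byFirst8[key].join(", "));
}
```

Passing 5 or 10 instead of 8 covers the "X characters" variant; with 5, all seven sample files share "00001" and would collapse into a single group, so N has to match how the names are actually built.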

Another option would be the ability to merge all PDFs that share the same document name before the first occurrence of a delimiter of your choice, since some people may have files named like this: Image1_0001.pdf, image1_0002.pdf, and others like this: Summer Pics_001.pdf, Summer Pics_002.pdf, Summer Pics_003.pdf, all in the same folder, and would want to merge all files whose names match in front of the underscore. Since these document names have different numbers of characters before the underscore, a character count wouldn’t work. The result would be two files: a 2-page document saved as "image1.pdf" and a 3-page document saved as "Summer Pics.pdf".
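A sketch of the delimiter variant under the same assumptions (plain JavaScript runnable in Node.js, Windows paths, hypothetical helper names). One design choice worth flagging: Windows file names are case-insensitive, but JavaScript string keys are not, so "Image1" and "image1" would otherwise land in separate groups. This version therefore groups on a lower-cased key and takes the group's name from the first file in sorted order:

```javascript
// Hypothetical helper (same rule as the script's NameBySeparator): the part
// of the file name, without extension, before the first separator, or "" if
// the separator is missing or starts the name.
function nameBySeparator(separator, path) {
  var base = path.substring(path.lastIndexOf("\\") + 1);
  var dot = base.lastIndexOf(".");
  var name = dot >= 0 ? base.substring(0, dot) : base;
  var pos = separator ? name.indexOf(separator) : -1;
  return pos > 0 ? name.substring(0, pos) : "";
}

// Hypothetical helper: case-insensitive grouping; returns an array of
// { name, files } objects in sorted order.
function groupBySeparator(paths, separator) {
  var sorted = paths.slice().sort(function (a, b) {
    var x = a.toLowerCase(), y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  });
  var groups = {};
  var result = [];
  for (var i = 0; i < sorted.length; i++) {
    var name = nameBySeparator(separator, sorted[i]);
    if (name === "") continue; // no separator: not part of any group
    var key = name.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      groups[key] = { name: name, files: [] };
      result.push(groups[key]);
    }
    groups[key].files.push(sorted[i]);
  }
  return result;
}

var photos = [
  "D:\\photos\\Summer Pics_002.pdf", "D:\\photos\\image1_0002.pdf",
  "D:\\photos\\Summer Pics_001.pdf", "D:\\photos\\Image1_0001.pdf",
  "D:\\photos\\Summer Pics_003.pdf"
];
var docs = groupBySeparator(photos, "_");
for (var k = 0; k < docs.length; k++) {
  console.log(docs[k].name + ".pdf <- " + docs[k].files.length + " pages");
}
```

With this choice the 2-page group is named "Image1.pdf" (the first file's spelling) rather than "image1.pdf"; a different casing rule would only change which string is stored in `name`.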

It would also be really nice if this function worked for TIFs and JPGs, converting them to PDF and then merging them into multipage PDFs, since many people have all sorts of different file types (PDF, TIF, JPG) in one folder.

Ideally this would be available from both the context menu and the command line. Adding this function to PDF Explorer would also be beneficial, especially if you could add a column to the Explorer window showing the number of characters in the file name; you could then sort on that column, select the files, and merge them accordingly.



RTT

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #1 on: July 17, 2015, 05:50:56 AM »
A My Scripts script will achieve this easily. I'm going to code a sample one and post it here soon.
Should the resulting PDFs be saved in the same folder as the source files?

RTT

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #2 on: July 20, 2015, 03:49:48 PM »
Here's a script that, I think, covers all the above scenarios. To add it to your system, just import the attached MergeByFilenamePartMatch.myscript file into your PDF-ShellTools list of My Scripts.
var ListOfFiles = pdfe.Arguments;
var SortedFilesIndexArray = GetSortedFilesIndexArray(ListOfFiles);
var ExcludedNodeName = 'Excluded';
var MergeGroups = {};
var nValidMerges = 0;
var OutputFolder = null;
var LastGroupingMethodFunction = applyBtn_ByNChars_onClick;

//Create the GUI
var html = function() {
/*<!DOCTYPE html>
<html>
<head>
<title>Merge by filename part match</title>
<style>
.fixed-opt-pannel {
  position: fixed;
  top: 0;
  left: 0;
  z-index: 9999;
  width: 100%;
  height: 50px;
  background-color: #E6E6FA;
}
label {
  padding-left:10px;
  display: inline-block;
  width: 120px;
}
#row {
    white-space: nowrap;
}
#row > div {
    display: inline-block;
    margin-top:3px;
}
#row > div + div {
    margin-left: 10px
}
</style>
</head>
<body bgcolor="#E6E6FA">
<nav class="fixed-opt-pannel">
<div style="float:right;margin-top:2px;margin-right:10px;width:100px;">
<button type="button" style="Width:100px;" onClick="window.open('http://www.rttsoftware.com/forum.php?dsturl=http%3A%2F%2Fwww.rttsoftware.com%2Fforum%2Findex.php%3Ftopic%3D503.msg1377')">Help</button>
<button type="button" id="MergeBtn" style="Width:100px;margin-top:2px">Merge</button>
</div>
<div id="row">
<div>
<label for="nChars">By first n chars:</label><input name="nChars" id="nChars" value="8" type="number" onkeypress='return event.charCode >= 48 && event.charCode <= 57'></input>
<input type="button" id="applyBtn_ByNChars" value="Apply"><br>
<label for="sepChar">By separator:</label><input name="sepChar" id="sepChar"  value="_" type="text"></input>
<input type="button" id="applyBtn_BySeparator" value="Apply">
</div>
<div>
<fieldset>
<legend>Output folder:</legend>
<input type="radio" id="ofsame" name="outfolder" value="Same" checked>Same
<input type="radio" id="ofspecify" name="outfolder" value="Specify">Specify
</fieldset>
</div>
</div>
</nav>
<div style="margin-top:55px;">
<select style="width:100%;" name="MergeGroups" id="MergeGroups">
</select>
</div>
</body>
</html>
*/}.toString().replace(/^[^\/]+\/\*!?/, '').replace(/\*\/[^\/]+$/, '');

var objIE = pdfe.CreateObject("InternetExplorer.Application", "objIE_");
objIE.toolbar = false;
objIE.Visible = true;
pdfe.BringWindowToFront(objIE.HWND /*, true*/ );
objIE.Navigate("about:blank");
IE_waitLoad(objIE);
objIE.Document.writeln(html);
objIE.Refresh();
IE_waitLoad(objIE);

//buttons click event functions
function applyBtn_ByNChars_onClick(e) {
    //explicit radix 10: older JScript engines may parse a leading zero ("08") as octal
    ShowGroups(partial(NameByFirstNCharacters, parseInt(objIE.document.getElementById('nChars').value, 10)));
    LastGroupingMethodFunction = applyBtn_ByNChars_onClick;
}
objIE.document.getElementById('applyBtn_ByNChars').onclick = applyBtn_ByNChars_onClick;

function applyBtn_BySeparator_onClick(e) {
    ShowGroups(partial(NameBySeparator, objIE.document.getElementById('sepChar').value));
    LastGroupingMethodFunction = applyBtn_BySeparator_onClick;
}
objIE.document.getElementById('applyBtn_BySeparator').onclick = applyBtn_BySeparator_onClick;

function SpecifyOutFolder_onclick(e) {
    if (objIE.document.getElementById('ofspecify').checked) {
        var folder = BrowseForFolder(objIE.HWND, 'Select output folder');
        if (folder != null) {
            if (OutputFolder != folder) {
                OutputFolder = folder;
                LastGroupingMethodFunction();
            }
        } else if (OutputFolder == null) {
            objIE.document.getElementById('ofsame').checked = true;
        }
    } else if (OutputFolder != null) {
        OutputFolder = null;
        LastGroupingMethodFunction();
    }

};
objIE.document.getElementById('ofspecify').onclick = SpecifyOutFolder_onclick;
objIE.document.getElementById('ofsame').onclick = SpecifyOutFolder_onclick;

//handle the deletion of files from group with the DEL keypress event
function onMergeGroupKeyDown(e) {
    var sel = e.target ? e.target : e.srcElement;
    if (e.keyCode && e.keyCode == 46 || e.which == 46) { //if DEL key
        if (sel.selectedIndex >= 0 && ExcludedNodeName !== sel.options[sel.selectedIndex].parentNode.label) { //if not from the excluded node
            //remove from MergeGroups
            var optgroup = sel.options[sel.selectedIndex].parentNode;
            var index = findIndex(sel.options[sel.selectedIndex]);
            var ExcludedFileIndex = MergeGroups[optgroup.label].splice(index, 1)[0];
            objIE.document.getElementById('MergeBtn').disabled = --nValidMerges == 0;
            //remove from the html selector
            sel.remove(sel.selectedIndex);
            if (MergeGroups[optgroup.label].length == 0) {
                sel.removeChild(optgroup);
                sel.size--;
                delete MergeGroups[optgroup.label];
            }

            //add the removed file to the excluded node list   
            if (!(ExcludedNodeName in MergeGroups)) {
                MergeGroups[ExcludedNodeName] = [];
                optgroup = objIE.document.createElement("OPTGROUP");
                optgroup.label = ExcludedNodeName;
                sel.insertBefore(optgroup, sel.children[0]);
                sel.size++;
            } else optgroup = sel.options[0].parentNode;

            MergeGroups[ExcludedNodeName].push(ExcludedFileIndex);
            //now in the html
            var option = objIE.document.createElement("OPTION");
            var filename = ListOfFiles(ExcludedFileIndex);
            //title and text are plain-text DOM properties, so the file name must not be HTML-escaped here
            option.title = filename;
            option.text = filename.substring(filename.lastIndexOf('\\') + 1);
            optgroup.appendChild(option);

        }
    }
};
objIE.document.getElementById("MergeGroups").onkeydown = onMergeGroupKeyDown;

//Where the files merge happens
function Merge_onclick() {
    BrowserRunning = false;
    objIE.Visible = false;
    objIE.Quit();

    var Merger = pdfe.CreateDocumentMerger();
    var ProgressBar = pdfe.ProgressBar;
    ProgressBar.max = nValidMerges;

    for (var groupFilename in MergeGroups) {
        if (groupFilename == ExcludedNodeName) continue; //files in the Excluded group are not merged
        pdfe.echo('>' + groupFilename);
        //add each file of the group, in sorted order, to the merger
        for (var ii = 0; ii < MergeGroups[groupFilename].length; ii++) {
            ProgressBar.position++;
            var srcFilename = ListOfFiles(MergeGroups[groupFilename][ii]);
            pdfe.echo('   Merging: ' + srcFilename);
            if (Merger.MergeDocument(srcFilename)) pdfe.echo(' [OK]', 0, true);
            else pdfe.echo(' [Failed]', 0xFF0000, true);
        }
        //save the new file; the group key is the output file path
        if (!Merger.EndAndSaveTo(groupFilename)) pdfe.echo(' [Failed]', 0xFF0000);
        pdfe.echo('');
    }
    pdfe.echo('Done');
}
objIE.document.getElementById('MergeBtn').onclick = Merge_onclick;

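//Returns the indexes 0..n-1 of FilesList sorted by file path, so files are grouped and merged in path order
function GetSortedFilesIndexArray(FilesList) {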
    var filesIndexArray = new Array(FilesList.length);
    for (var i = 0; i < FilesList.length; i++) {
        filesIndexArray[i] = i;
    }
    filesIndexArray.sort(function(a, b) {
        return FilesList(a) > FilesList(b) ? 1 : FilesList(a) < FilesList(b) ? -1 : 0;
    });
    return filesIndexArray;
};

//Compute filename by start number of characters
function NameByFirstNCharacters(nChars, filename) {
    var name = filename.substring(filename.lastIndexOf('\\') + 1, filename.lastIndexOf('.'));
    if (nChars && name.length > nChars) return (OutputFolder ? OutputFolder : filename.substring(0, filename.lastIndexOf('\\') + 1)) + name.substr(0, nChars) + '.pdf'
    else return '';
}

//Compute filename by separator
function NameBySeparator(separator, filename) {
    var name = filename.substring(filename.lastIndexOf('\\') + 1, filename.lastIndexOf('.'));
    var s = name.substring(0, name.indexOf(separator));
    if (s.length > 0) return (OutputFolder ? OutputFolder : filename.substring(0, filename.lastIndexOf('\\') + 1)) + s + '.pdf'
    else return '';
}

//Returns the list of files grouped by equal part off the filename, part computed by the passed GetNameFunction 
function GetMergeGroups(GetNameFunction) {
    var groups = {}
    groups[ExcludedNodeName] = [];
    nValidMerges = 0;
    for (var i = 0; i < SortedFilesIndexArray.length; i++) {
        var name = GetNameFunction(ListOfFiles(SortedFilesIndexArray[i]));
        if (name.length == 0) name = ExcludedNodeName
        else nValidMerges++;
        if (!(name in groups)) {
            groups[name] = [];
        }
        groups[name].push(SortedFilesIndexArray[i]);
    }
    if (nValidMerges == SortedFilesIndexArray.length) delete groups[ExcludedNodeName];
    return groups;
}

//Compute the merge groups, and show it in the GUI
function ShowGroups(Namefunct) {
    MergeGroups = GetMergeGroups(Namefunct);
    var groupsHtml = '';
    for (var item in MergeGroups) {
        groupsHtml += '<optgroup  label="' + htmlspecialchars(item) + '">\n';
        for (var ii = 0; ii < MergeGroups[item].length; ii++) {
            var filename = ListOfFiles(MergeGroups[item][ii]);
            filename = filename.substring(filename.lastIndexOf('\\') + 1);
            groupsHtml += '<option title="' + htmlspecialchars(ListOfFiles(MergeGroups[item][ii])) + '">' + htmlspecialchars(filename) + '<img src="dummy.gif" width="16" height="16">';
        }
        groupsHtml += '\n</optgroup>\n';
    }
    var selector = objIE.document.getElementById('MergeGroups');
    selector.innerHTML = groupsHtml;
    selector.size = selector.length + selector.children.length;
    objIE.document.getElementById('MergeBtn').disabled = nValidMerges == 0;
}

//start with the default grouping method
ShowGroups(partial(NameByFirstNCharacters, 8));

//pass control to the GUI
var BrowserRunning = true;
while (BrowserRunning && objIE.Visible) {
    pdfe.Sleep(500);
}

function objIE_OnQuit() {
    BrowserRunning = false;
}

/*************************************************************************/
/*************************************************************************/

//http://stackoverflow.com/questions/373157/how-can-i-pass-a-reference-to-a-function-with-parameters
function partial(func) {
    var args = new Array();
    for (var i = 1; i < arguments.length; i++) {
        args.push(arguments[i]);
    }
    return function() {
        var allArguments = args.concat(Array.prototype.slice.call(arguments));
        return func.apply(this, allArguments);
    }
}

function htmlspecialchars(str) {
    if (typeof(str) == "string") {
        str = str.replace(/&/g, "&amp;"); /* must do &amp; first */
        str = str.replace(/"/g, "&quot;");
        str = str.replace(/'/g, "&#039;");
        str = str.replace(/</g, "&lt;");
        str = str.replace(/>/g, "&gt;");
    }
    return str;
}

//Wait until Internet Explorer document loading is complete.
function IE_waitLoad(pIE) {
    var stat, dstart;
    stat = 0;
    while (true) {
        if (stat == 0) {
            if (!pIE.Busy) {
                if (pIE.Document.readyState == "complete") {
                    dstart = new Date().getTime();
                    stat = 1;
                }
            }
        } else {
            if (!pIE.Busy && pIE.Document.readyState == "complete") {
                if (new Date().getTime() >= dstart + 50) {
                    break;
                }
            } else {
                stat = 0;
            }
        }
        pdfe.sleep(50)
    }
}

//Finds the index of an html element in the parent children's list
function findIndex(node) {
    var i = 0,
        prev = node.previousElementSibling;

    if (prev) {
        do ++i;
        while (prev = prev.previousElementSibling);
    } else {
        while (node = node.previousSibling) {
            if (node.nodeType === 1) {
                ++i;
            }
        }
    }
    return i;
}

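//Shows the Windows folder picker (0x00000001 = BIF_RETURNONLYFSDIRS) and returns the chosen path with a trailing backslash, or null if cancelled
function BrowseForFolder(HWND, sTitle, rootFolder) {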
    var objShell = new ActiveXObject("shell.application");
    var ofolder = objShell.BrowseForFolder(HWND, sTitle, 0x00000001, rootFolder);
    if (ofolder != null) return ofolder.Self.Path + (ofolder.Self.Path.charAt(ofolder.Self.Path.length - 1) == '\\' ? "" : "\\");
    else return null;
}
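
The `html` variable near the top of the script relies on a pre-ES2015 trick for multi-line strings: the markup sits inside a comment in an otherwise empty function, and the function's source text, obtained with `toString()`, is cut down to the comment body by the two regular expressions. Below is a stand-alone sketch of the same idea that can be tried in Node.js (`heredoc` is a hypothetical name, not part of the script). It assumes `Function.prototype.toString` returns the original source including comments, which the script's own use indicates for its JScript host and which ES2019 requires of modern engines; a minifier that strips comments would break it, and the markup must not itself contain `*/`:

```javascript
// Hypothetical helper: extracts the text inside the /* ... */ comment of fn,
// using the same two regular expressions as the script.
function heredoc(fn) {
  return fn.toString()
    .replace(/^[^\/]+\/\*!?/, "")  // drop "function () {" and the opening "/*" (or "/*!")
    .replace(/\*\/[^\/]+$/, "");   // drop the closing "*/" and the final "}"
}

var markup = heredoc(function () {/*
<p>Hello</p>
*/});
console.log(markup);
```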

RW

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #3 on: July 21, 2015, 06:18:33 PM »
Thank you very much.

nightslayer23

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #4 on: May 23, 2017, 06:54:08 AM »
I can't get this to work.
Isn't it supposed to merge files? Mine only shows a preview; nothing happens when I hit "Merge".

RTT

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #5 on: May 23, 2017, 11:46:48 PM »
Just tested it (this script is included in the list of sample scripts, but not checked to show in the menu), and it worked. Which Windows and Internet Explorer versions are you testing with?
Run it from the script editor to see if any error shows up (you can easily open the script in the editor, with the selected files passed to it, by holding the CTRL key while invoking the script from the context menu). If you have the debug dependencies installed, you can place a breakpoint at the first line of the Merge_onclick function to see whether the Internet Explorer Merge button invokes this function on the click event.

nightslayer23

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #6 on: May 24, 2017, 12:13:06 AM »
It is working on one of my machines running PDFInfoTools but not on the other one. Odd. It may just require a restart.
Running Windows 10, Internet Explorer 11.

nightslayer23

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #7 on: April 23, 2018, 01:56:37 AM »
Quote from: nightslayer23
I can't get this to work.
Isn't it supposed to merge files? Mine only shows a preview; nothing happens when I hit "Merge".

I'm still getting this error.
The buttons won't do anything: neither "Apply" nor "Merge" responds.

Is there any specific setting in IE that needs to be enabled to make this work? Or any add-on programs or permissions that might be blocking it?

RTT

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #8 on: April 24, 2018, 01:24:54 AM »
Quote from: nightslayer23
Is there any specific setting in IE that needs to be enabled to make this work? Or any add-on programs or permissions that might be blocking it?
None that I know of.
Check whether the attached variation (it uses a different method to assign the click event functions) works for you.

nightslayer23

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #9 on: April 24, 2018, 06:20:14 AM »
Yeah, still nothing. You click the Merge or Apply buttons and they do nothing.

RTT

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #10 on: April 25, 2018, 01:01:51 AM »
I have no idea what may be causing this. Check under the IE options Security tab whether Protected Mode is enabled for the Local intranet zone, and disable it if so. Also check under the Advanced tab whether any related setting may be causing this. I'm using the default settings, ensured by a settings reset.

Try putting a pdfe.echo("Merge btn clicked") at the beginning of the Merge_onclick function, i.e.:

function Merge_onclick() {
    pdfe.echo("Merge btn clicked");
    BrowserRunning = false;
    objIE.Visible = false;
    objIE.Quit();
...

to see whether that message shows up in the script output console when you click the "Merge" button.

Open the IE developer tools and check whether any error message appears in the console when these buttons are clicked.

nightslayer23

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #11 on: June 19, 2018, 01:31:19 AM »
Yep, still no love.
It's such a handy tool when it works,
but no joy. I tried everything above.

nightslayer23

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #12 on: June 19, 2018, 02:53:28 AM »
What about Chrome or Firefox?
Can the code be modified to open the program window in one of those instead of IE?

RTT

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #13 on: June 19, 2018, 11:58:45 PM »
Quote from: nightslayer23
Yep, still no love.
It's such a handy tool when it works,
but no joy. I tried everything above.
Sorry, but I don't have any other ideas.
If I get some time, I'm going to try to move all the code execution into the browser, using the scripts API from an external host. This way there is no need for these buttons to call back into the script, which should fix the issue you are experiencing, if this is indeed the problem.

Quote from: nightslayer23
What about Chrome or Firefox?
Can the code be modified to open the program window in one of those instead of IE?
Not with this COM automation technique. With a more complex approach, e.g. using Node.js, it may be doable.

Bryant Coon

Re: Batch Create Multipage PDFs based on Part of File name.
« Reply #14 on: October 27, 2019, 04:01:06 PM »
I'm using a separator, but instead of naming the merged file after the portion before the separator, I need it to have the same name as the alphabetically first matched file. I also need the files to be merged in numeric order, where 1 would come before 100. I've tinkered with the script a bit, but apparently this is beyond my abilities.
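Both requests can be sketched outside the script in plain JavaScript (runnable in Node.js, Windows paths; `naturalCompare`, `prefixBefore` and `groupForMerge` are hypothetical names, not PDF-ShellTools API). Plain string order already puts "_1" before "_100", but it also puts "_100" before "_2", so the sort below compares digit runs by numeric value, and each group's output name is taken from its first file in that order. That is the first file in numeric order, which can differ from the alphabetically first one (alphabetically "_10" precedes "_2"); swapping the comparator changes which file wins:

```javascript
// Hypothetical helper: compares two digit runs by numeric value without
// converting to Number (long runs stay exact); equal values with different
// zero padding are ordered by length.
function compareDigitRuns(p, q) {
  var a = p.replace(/^0+(?=\d)/, "");
  var b = q.replace(/^0+(?=\d)/, "");
  if (a.length !== b.length) return a.length - b.length;
  if (a !== b) return a < b ? -1 : 1;
  return p.length - q.length;
}

// Hypothetical case-insensitive natural-order comparison: "page2" < "page10".
function naturalCompare(a, b) {
  var x = a.toLowerCase().match(/\d+|\D+/g) || [];
  var y = b.toLowerCase().match(/\d+|\D+/g) || [];
  for (var i = 0; i < x.length && i < y.length; i++) {
    if (x[i] === y[i]) continue;
    if (/^\d/.test(x[i]) && /^\d/.test(y[i])) return compareDigitRuns(x[i], y[i]);
    return x[i] < y[i] ? -1 : 1;
  }
  return x.length - y.length;
}

// Hypothetical helper: file name part before the first separator, or "".
function prefixBefore(separator, path) {
  var base = path.substring(path.lastIndexOf("\\") + 1);
  var pos = separator ? base.indexOf(separator) : -1;
  return pos > 0 ? base.substring(0, pos) : "";
}

// Hypothetical helper: groups files by prefix (case-insensitively), keeps each
// group in natural order, and names the output after the group's first file.
function groupForMerge(paths, separator, outputFolder) {
  var sorted = paths.slice().sort(naturalCompare);
  var groups = {};
  var result = [];
  for (var i = 0; i < sorted.length; i++) {
    var prefix = prefixBefore(separator, sorted[i]);
    if (prefix === "") continue;
    var key = prefix.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      var base = sorted[i].substring(sorted[i].lastIndexOf("\\") + 1);
      var dot = base.lastIndexOf(".");
      var stem = dot >= 0 ? base.substring(0, dot) : base;
      groups[key] = { output: outputFolder + stem + ".pdf", files: [] };
      result.push(groups[key]);
    }
    groups[key].files.push(sorted[i]);
  }
  return result;
}

var sample = [
  "C:\\in\\Invoice_100.pdf", "C:\\in\\Invoice_2.pdf", "C:\\in\\Invoice_1.pdf",
  "C:\\in\\Memo_10.pdf", "C:\\in\\Memo_9.pdf"
];
var plan = groupForMerge(sample, "_", "C:\\out\\");
for (var j = 0; j < plan.length; j++) {
  console.log(plan[j].output + " <- " + plan[j].files.join(", "));
}
```

Because the output name equals an existing input name, saving into the source folder would overwrite that first page file, so the sketch takes an explicit output folder (the script's "Specify" output folder option serves the same purpose). In the posted script, the corresponding places appear to be the comparator in `GetSortedFilesIndexArray` and the group naming in `GetMergeGroups`; adapting them this way is an untested suggestion.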