Topic: Rasterizing PDFs?


nightslayer23

  • Newbie
  • *
  • Posts: 98
Rasterizing PDFs?
« on: October 20, 2017, 06:38:48 AM »
Got any thoughts on a way to rasterize PDF documents?
Ideally I'd highlight a group of files, right-click and select Rasterize.
It would be for very heavily layered files that take too long to load/print.

RTT

  • Administrator
  • *****
  • Posts: 918
Re: Rasterizing PDFs?
« Reply #1 on: October 21, 2017, 07:18:21 PM »
The following script rasterizes all the document pages into image files, using the command-line interface of the extract images tool, and then converts these image files into a PDF with the same name as the original plus a "_rasterized" suffix (e.g. file.pdf becomes file_rasterized.pdf).

Code:
var RenderDPIs = 120;

//RunAsync = true: each rendered page image file is converted to PDF as soon as it is created.
//RunAsync = false: wait until all the PDF pages have been rendered before converting them all back to PDF.
var RunAsync = true;

//================================================================================
var fso = new ActiveXObject("Scripting.FileSystemObject");
var objShell = new ActiveXObject("Wscript.Shell");

var st_exe = fso.GetParentFolderName(pdfe.FullName) + '\\pdfshelltools.exe';
var cmd = '"' + st_exe + '" ExtractImages -s "OutputPath=' + fso.GetSpecialFolder(2 /*TemporaryFolder*/ ) + '\\\\" ExtractType=0 ImageType=3 RenderDPIs=' + RenderDPIs + ' NamePrefix=';

function ExtractImages_GetImageFilePrefix(filename, WaitOnReturn) {
    var NamePrefix = fso.GetTempName();
    objShell.Run(cmd + NamePrefix + ' "' + filename + '"', 0, WaitOnReturn);
    return fso.GetSpecialFolder(2 /*TemporaryFolder*/ ) + '\\' + NamePrefix;
}

var Merger = pdfe.CreateDocumentMerger();
var ProgressBar = pdfe.ProgressBar;
ProgressBar.max = pdfe.SelectedFiles.Count;

for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
    ProgressBar.position = i + 1;
    try {
        var File = pdfe.SelectedFiles(i),
            Filename = File.Filename,
            Pages = File.Pages;
        var Path = Filename.substr(0, Filename.lastIndexOf('\\') + 1),
            Name = Filename.substring(Path.length, Filename.lastIndexOf('.'));

        pdfe.echo(' > rasterizing ' + Filename);
        pdfe.echo(' ');
        var ImageFilesPrefix = ExtractImages_GetImageFilePrefix(Filename, !RunAsync);

        for (var PageIndex = 0; PageIndex < Pages.Count; PageIndex++) {
            var imgfilename = ImageFilesPrefix + pad(PageIndex, 4) + '.png';
            var OKtoMerge = false;
            if (RunAsync) {
                pdfe.echo(' Page ' + (PageIndex + 1) + '/' + Pages.Count, 0, 2);

                //wait until the rendered page image file has been created
                while (!fso.FileExists(imgfilename)) {
                    pdfe.sleep(1000);
                }
                //wait until the image file is no longer in use (opening it for appending fails while it is still being written)
                while (true) {
                    try {
                        var ots = fso.OpenTextFile(imgfilename, 8 /*ForAppending*/, false);
                        ots.close();
                        break;
                    } catch (e) {
                        pdfe.sleep(1000);
                    }
                }

                OKtoMerge = true;
            } else {
                OKtoMerge = fso.FileExists(imgfilename);
            }

            if (OKtoMerge) {
                Merger.MergeDocument(imgfilename);
                fso.DeleteFile(imgfilename);
            } else {
                pdfe.echo('     Page ' + (PageIndex + 1) + ' failed to render', 0xFF0000, 2);
                pdfe.echo(' ');
            }
        }

        var NewFilename = Path + Name + '_rasterized.pdf';
        if (Merger.EndAndSaveTo(NewFilename)) {
            pdfe.echo('     Saved to: ' + NewFilename + ' [OK]', 0, 2);
        } else {
            pdfe.echo('     Saving to: ' + NewFilename + ' [Failed]', 0xFF0000, 2);
        }
    } catch (e) {
        pdfe.echo(e.message, 0xFF0000);
    }
}
pdfe.echo('Done');

function pad(num, size) {
    var s = "000000000" + num;
    return s.substr(s.length - size);
}

Not entirely sure if this is what you are asking for. Let me know if not.
If you need a higher resolution, change the RenderDPIs variable in the first line of the script.
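The file naming in the script can be tried out on its own, outside PDF-Explorer. The sketch below reuses the script's pad function and its path-splitting expressions; rasterizedName and pageImageName are hypothetical helper names, and the "<prefix>0000.png" page numbering is simply what the script expects from the extract images tool, not something checked against the tool's documentation. It is plain ES3-style JavaScript, so it should also run under the Windows Script Host.

```javascript
// Same zero-padding helper as in the script: keeps only the last `size` digits.
function pad(num, size) {
    var s = "000000000" + num;
    return s.substr(s.length - size);
}

// Hypothetical helper: the output file name the script builds for an input PDF
// (same folder, base name + "_rasterized.pdf").
function rasterizedName(filename) {
    var path = filename.substr(0, filename.lastIndexOf('\\') + 1);
    var name = filename.substring(path.length, filename.lastIndexOf('.'));
    return path + name + '_rasterized.pdf';
}

// Hypothetical helper: the page image file the script waits for (0-based page index).
// Assumption: the tool appends a 4-digit page number and ".png" to the prefix.
function pageImageName(prefix, pageIndex) {
    return prefix + pad(pageIndex, 4) + '.png';
}
```

Here pageImageName(ImageFilesPrefix, PageIndex) yields the same string as the script's imgfilename. Because pad keeps only the last size digits, pad(12345, 4) returns "2345", so with a width of 4 a page index of 10000 would map to the same file name as page 0.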

nightslayer23

Re: Rasterizing PDFs?
« Reply #2 on: October 23, 2017, 12:33:39 AM »
This doesn't work for me..

So, the extract images tool doesn't work - sort of. It looks like it does something, but there's no output file and no error..
The extract images part works, but not the extract pages part.

Any thoughts?

RTT

Re: Rasterizing PDFs?
« Reply #3 on: October 23, 2017, 01:52:34 AM »
Check now. I've made a small modification to the extract images command-line parameters in the above script, so that paths with spaces are handled properly. I was testing with my PDF-ShellTools installed in a non-standard folder, so I missed that issue.

If the problem continues, delete the -s parameter from the var cmd = '"' + st_exe + '" ExtractImages -s "OutputPath=' + fso.GetSpecialFolder(2 /*TemporaryFolder*/ ) + '\\\\" ExtractType=0 ImageType=3 RenderDPIs=' + RenderDPIs + ' NamePrefix='; line, and run the script with a single PDF. This way the extract images tool runs in GUI mode, so you can check whether it is working. It should show a thumbnail of each of the PDF pages. If it does, just hit the "extract" button, and the script will do its job and create a PDF from these rasterized page images. If not, let me know the details.
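Paths with spaces, and a path ending in a backslash right before a closing quote, are the usual traps when building a command line like cmd. Under the Microsoft C runtime / CommandLineToArgvW parsing rules, backslashes are literal except when they precede a double quote, so "C:\Temp\" would escape its own closing quote; that is presumably why the script doubles the trailing backslash in OutputPath. The sketch below quotes a single argument under those rules. Whether pdfshelltools.exe parses its command line this way is an assumption, and quoteArg is a hypothetical helper name.

```javascript
// Hypothetical helper (assumes the target program uses the Microsoft C runtime /
// CommandLineToArgvW rules): quote one argument so it is read back unchanged.
function repeatChar(ch, n) {
    var s = '';
    for (var k = 0; k < n; k++) s += ch;
    return s;
}

function quoteArg(arg) {
    // Nothing that splits or confuses the parser: pass the argument through as-is.
    if (arg.length > 0 && !/[ \t\n"]/.test(arg)) return arg;
    var out = '"';
    var i = 0;
    while (i < arg.length) {
        var backslashes = 0;
        while (i < arg.length && arg.charAt(i) === '\\') {
            backslashes++;
            i++;
        }
        if (i === arg.length) {
            // Trailing backslashes sit before the closing quote: double them.
            out += repeatChar('\\', backslashes * 2);
        } else if (arg.charAt(i) === '"') {
            // Double the backslashes and escape the embedded quote itself.
            out += repeatChar('\\', backslashes * 2 + 1) + '"';
            i++;
        } else {
            // Backslashes not followed by a quote are literal.
            out += repeatChar('\\', backslashes) + arg.charAt(i);
            i++;
        }
    }
    return out + '"';
}
```

For example, the value OutputPath=C:\My Temp\ comes out as "OutputPath=C:\My Temp\\", the same doubled trailing backslash the script's cmd string uses, while an argument without spaces or quotes is left untouched.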

nightslayer23

Re: Rasterizing PDFs?
« Reply #4 on: October 23, 2017, 02:34:19 AM »
I managed to get a different file to work with the image extractor, but it is only letting me do a PNG?

nightslayer23

Re: Rasterizing PDFs?
« Reply #5 on: October 23, 2017, 02:47:18 AM »
Can you run a test with this file? :o

RTT

Re: Rasterizing PDFs?
« Reply #6 on: October 23, 2017, 11:50:12 PM »
Quote from: nightslayer23
I managed to get a different file to work with the image extractor, but it is only letting me do a PNG?

What other format were you expecting? The tool extracts either the PDF image objects or a rasterization of the pages.
The above script just takes advantage of the tool's page rasterization functionality to create a rasterized PDF, by merging all the generated page images into a new PDF.

RTT

Re: Rasterizing PDFs?
« Reply #7 on: October 24, 2017, 12:06:41 AM »
Quote from: nightslayer23
Can you run a test with this file? :o

Worked fine. Check the attachment.

If you still have ImageMagick installed (as when you requested a "Script to count how many colour pages in PDF?"), check whether this ImageMagick-based rasterization script works for you.
Code:
var RenderDPIs = 120;
//================================================================================

var imo = new ActiveXObject("ImageMagickObject.MagickImage.1");

var ProgressBar = pdfe.ProgressBar;
ProgressBar.max = pdfe.SelectedFiles.Count;

for (var i = 0; i < pdfe.SelectedFiles.Count; i++) {
    ProgressBar.position = i + 1;
    try {
        var File = pdfe.SelectedFiles(i),
            Filename = File.Filename,
            Pages = File.Pages;
        var Path = Filename.substr(0, Filename.lastIndexOf('\\') + 1),
            Name = Filename.substring(Path.length, Filename.lastIndexOf('.'));

        pdfe.echo(' > rasterizing ' + Filename);
        pdfe.echo(' ');
       
        var NewFilename = Path + Name + '_rasterized.pdf';
        imo.convert('-density', RenderDPIs, Filename, NewFilename);
        pdfe.echo('     Saved to: ' + NewFilename + ' [OK]', 0, 2);
    } catch (e) {
        pdfe.echo(e.message, 0xFF0000);
    }
}
pdfe.echo('Done');
Not as fast as the above solution, and the generated PDF is bigger, but it should be less cumbersome.

You need to give me more details about what fails with the first script, so I can figure out what may be the cause.

nightslayer23

Re: Rasterizing PDFs?
« Reply #8 on: October 24, 2017, 11:42:41 PM »
Quote from: RTT
What other format were you expecting? The tool extracts either the PDF image objects or a rasterization of the pages.
The above script just takes advantage of the tool's page rasterization functionality to create a rasterized PDF, by merging all the generated page images into a new PDF.

In the image extractor tool (not the new script) I don't have the option of PNG or JPG or TIF etc?

In the help file / webpage it says that should be an option..

nightslayer23

Re: Rasterizing PDFs?
« Reply #9 on: October 24, 2017, 11:44:10 PM »
Quote from: RTT
Not as fast as the above solution, and the generated PDF is bigger, but it should be less cumbersome.

You need to give me more details about what fails with the first script, so I can figure out what may be the cause.

This solution was actually much faster.. ? The other is literally just stuck displaying this:
 > rasterizing O:\12010 COOKE & D\og\combined revised hyd-0001.pdf
 Page 1/1

I'll leave it be and see how long it takes, but at this point it doesn't seem to want to spit out a file.

RTT

Re: Rasterizing PDFs?
« Reply #10 on: October 25, 2017, 01:32:57 AM »
Quote from: nightslayer23
In the image extractor tool (not the new script) I don't have the option of PNG or JPG or TIF etc?

In the help file / webpage it says that should be an option..
These file formats (TIF is not included) are only available when extracting PDF image objects; whole-page rasterization is limited to PNG.
A multi-page TIF option might be a good addition, to join all the extracted images into one image file. :-\

RTT

Re: Rasterizing PDFs?
« Reply #11 on: October 25, 2017, 01:44:23 AM »
Quote from: nightslayer23
This solution was actually much faster.. ? The other is literally just stuck displaying this:
 > rasterizing O:\12010 COOKE & D\og\combined revised hyd-0001.pdf
 Page 1/1

I'll leave it be and see how long it takes, but at this point it doesn't seem to want to spit out a file.

When you run the extract images tool in GUI mode on that file, does it take that long to show/extract the page image(s)? The extract images tool relies on the system-registered thumbnail handler (the shell extension Windows uses to show file thumbnails) to do the rasterization (the next release has a built-in one ;)). Do you know which thumbnail handler you have installed? (Usually it is the one provided by the application set as the default PDF reader, e.g. Acrobat.)
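To find out which thumbnail handler is installed, the registration can be read from the registry. Windows shell thumbnail handlers are registered under a ShellEx\{e357fccd-a995-4576-b01f-234630154e96} subkey (the IThumbnailProvider key) of the file extension or of its ProgID, with the handler's CLSID as the default value. The sketch below takes the registry reader as a parameter, so it can run under WSH with WScript.Shell's RegRead or be tried with a stub; findPdfThumbnailHandler is a hypothetical name, and it only checks the .pdf key and the ProgID that key points to, which may not cover every way a handler can be registered.

```javascript
// IThumbnailProvider shell extension key used to register thumbnail handlers.
var THUMBNAIL_KEY = '{e357fccd-a995-4576-b01f-234630154e96}';

// Hypothetical helper. regRead(key) must return the value at `key` or throw if it
// is missing, like WScript.Shell's RegRead (a trailing backslash reads the default value).
function findPdfThumbnailHandler(regRead) {
    function tryRead(key) {
        try { return regRead(key); } catch (e) { return null; }
    }
    var candidates = ['HKCR\\.pdf\\'];
    var progId = tryRead('HKCR\\.pdf\\'); // ProgID of the default PDF application, if set
    if (progId) candidates.push('HKCR\\' + progId + '\\');
    for (var i = 0; i < candidates.length; i++) {
        var clsid = tryRead(candidates[i] + 'ShellEx\\' + THUMBNAIL_KEY + '\\');
        if (clsid) {
            // The CLSID key's default value is usually a readable handler name.
            return { clsid: clsid, name: tryRead('HKCR\\CLSID\\' + clsid + '\\') };
        }
    }
    return null;
}

// Under WSH (not run here), using a WScript.Shell object like the script's objShell:
// var h = findPdfThumbnailHandler(function (k) { return objShell.RegRead(k); });
```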

nightslayer23

Re: Rasterizing PDFs?
« Reply #12 on: October 25, 2017, 07:21:24 AM »
It never finished.. I need a DPI of about 300 for production printing.. is that just too high for this tool?
As I said, the new one you provided was much faster!

RTT

Re: Rasterizing PDFs?
« Reply #13 on: October 26, 2017, 01:33:26 AM »
Quote from: nightslayer23
It never finished.. I need a DPI of about 300 for production printing.. is that just too high for this tool?

Just tested, and both scripts worked fine at 300 DPI. You just need to change the first line of the script to var RenderDPIs = 300;
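To put 300 DPI in perspective: each side of a rendered page is (size in inches) × DPI pixels, and PDF page sizes are given in points (1/72 inch). The sketch below does that arithmetic for an A4 page (595.28 × 841.89 pt); the 4 bytes per pixel figure is an assumption for an uncompressed 32-bit in-memory bitmap, not a measured number for either tool.

```javascript
// Pixels along one side when a length in PDF points (1/72 inch) is rendered at `dpi`.
function pointsToPixels(points, dpi) {
    return Math.round(points / 72 * dpi);
}

// Rough size of one rendered page; bytesPerPixel = 4 assumes an uncompressed 32-bit bitmap.
function renderedPageSize(widthPt, heightPt, dpi, bytesPerPixel) {
    var w = pointsToPixels(widthPt, dpi);
    var h = pointsToPixels(heightPt, dpi);
    return { width: w, height: h, bytes: w * h * bytesPerPixel };
}

var a4At120 = renderedPageSize(595.28, 841.89, 120, 4); // 992 x 1403 px, 5,567,104 bytes
var a4At300 = renderedPageSize(595.28, 841.89, 300, 4); // 2480 x 3508 px, 34,799,360 bytes
```

Going from 120 to 300 DPI multiplies the pixel count by (300/120)² = 6.25, and large-format drawings scale up further in proportion to their page area.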

Quote from: nightslayer23
As I said, the new one you provided was much faster!

If the first one didn't finish, how can you know that? Something is failing in the first script's call to the extract images tool, and the script is just waiting for the page rasterization image file to be created, which is not happening. That's why I asked whether you are able to use the extract images tool manually to rasterize the page to an external .png image file.

Anyway, if the second script is working, you have your problem covered. Or isn't it producing what you need?
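As explained above, the first script polls for each page image with no time limit, so when the extract images tool never writes the file the script stays on "Page 1/1" indefinitely, which matches the output posted earlier. A bounded wait would report a failed page instead. The sketch below is a hypothetical waitUntil helper, not part of the posted script; the sleep function is passed in so that inside PDF-Explorer it could wrap pdfe.sleep, and the 5-minute limit in the commented usage is an arbitrary example value.

```javascript
// Hypothetical helper: poll `condition` every `intervalMs` until it returns true
// or about `timeoutMs` of sleeping has passed. `sleep(ms)` must block for ms
// milliseconds. Returns true if the condition was met, false on timeout.
function waitUntil(condition, sleep, intervalMs, timeoutMs) {
    var waited = 0;
    while (!condition()) {
        if (waited >= timeoutMs) return false;
        sleep(intervalMs);
        waited += intervalMs;
    }
    return true;
}

// Possible use in the RunAsync branch (not run here); on timeout the script's
// existing "failed to render" branch would report the page:
// OKtoMerge = waitUntil(function () { return fso.FileExists(imgfilename); },
//                       function (ms) { pdfe.sleep(ms); }, 1000, 5 * 60 * 1000);
// The "file still in use" loop after it would need the same kind of limit.
```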

nightslayer23

Re: Rasterizing PDFs?
« Reply #14 on: October 26, 2017, 02:06:22 AM »
Quote from: RTT
If the first one didn't finish, how can you know that? Something is failing in the first script's call to the extract images tool, and the script is just waiting for the page rasterization image file to be created, which is not happening. That's why I asked whether you are able to use the extract images tool manually to rasterize the page to an external .png image file.

Anyway, if the second script is working, you have your problem covered. Or isn't it producing what you need?

Sorry mate, yeah, when I manually use the extract image tool it takes forever and doesn't output anything. It did work for one file, which was just a simple A4 page with an image on it, but with anything more complicated it doesn't seem to be doing much.

How long did it take for you using the first script? It's just that you said the second script would be slower, yet it gave me a result much faster, which made me think something was wrong with my version. I was hoping that if there is indeed something wrong with my version, it could be fixed, and then it would be a much faster process.