In Batch tools -> Search & Extract I can only choose 1 page (too few, since the ISBN is usually somewhere on pages 3 to 5) or all pages (far too many).
Why not the first and last 10 pages?
That's a good point! I will try to address this in the next version.
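For illustration only, the "first and last 10 pages" idea could be sketched in Python as below. The function name `pages_to_scan`, the `edge` parameter, and the 0-based page indices are assumptions made for this sketch; they are not part of the tool.

```python
def pages_to_scan(total_pages, edge=10):
    """Return sorted 0-based indices of the first and last `edge` pages."""
    if total_pages <= 0 or edge <= 0:
        return []
    # First `edge` pages (or fewer if the document is short).
    head = range(0, min(edge, total_pages))
    # Last `edge` pages; may overlap with `head` in short documents.
    tail = range(max(total_pages - edge, 0), total_pages)
    # The set union removes pages that fall in both ranges.
    return sorted(set(head) | set(tail))


print(pages_to_scan(300))
```

For documents shorter than twice `edge`, the two ranges overlap and every page is simply returned once.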
The given regexp misses a lot of ISBNs, and I can't manage to use the regexps that I found in other scripts:
import re
RE_ISBN = re.compile(r"(?:ISBN[ -]*(?:|10|13)|International Standard Book Number)[:\s]*(?:|, PDF ed.|, print ed.|\(pbk\)|\(electronic\))[:\s]*([-0-9Xx]{10,25})",
re.MULTILINE)
// This is a combination of strict and relaxed versions of ISBN number format
var reISBN=/(ISBN[\:\=\s][\s]*(?=[-0-9xX ]{13})(?:[0-9]+[- ]){3}[0-9]*[xX0-9])|(ISBN[\:\=\s][ ]*\d{9,10}[\d|x])/g;
The regular expression component I'm using supports only a subset of Perl regular expression syntax, so many of the expressions found elsewhere are incompatible unless they are modified to match the supported syntax.
You can check the supported syntax in the attached file.
Try this one, which also covers the relaxed version:
(\d{3}[-]\d{1,5}[-]\d{1,7}[-]\d{1,6}[-][\d,x,X]|\d{1,5}[-]\d{1,7}[-]\d{1,6}[-][\d,x,X])|(ISBN[\:\=\s][ ]*\d{9,10}[\d|x])
In this case only the full match is important, so don't forget to set the capturing group 0 to "Extract".
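To see what "only the full match" means outside the tool, here is a minimal Python sketch that runs the suggested pattern and keeps group 0 of each hit. It assumes Python's `re` module as a stand-in for the tool's regex component, which may behave differently; `find_isbns` and the sample text are made up for the example.

```python
import re

# Pattern copied verbatim from the suggestion above.
ISBN_PATTERN = re.compile(
    r"(\d{3}[-]\d{1,5}[-]\d{1,7}[-]\d{1,6}[-][\d,x,X]"
    r"|\d{1,5}[-]\d{1,7}[-]\d{1,6}[-][\d,x,X])"
    r"|(ISBN[\:\=\s][ ]*\d{9,10}[\d|x])"
)


def find_isbns(text):
    """Return the full match (group 0) of every hit, in order of appearance."""
    return [m.group(0) for m in ISBN_PATTERN.finditer(text)]


# Hypothetical sample text with one hyphenated ISBN-13, one relaxed
# "ISBN:" form, and one hyphenated ISBN-10.
sample = "ISBN 978-3-16-148410-0 (print), ISBN:0306406152 (ebook), also 0-306-40615-2"
print(find_isbns(sample))
```

Here `m.group(0)` plays the role of setting capturing group 0 to "Extract": the hyphenated branches return the bare number, while the relaxed branch returns the text including its `ISBN` prefix.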
If the misses continue, please email me one of these PDFs so I can take a look.
Some of the misses may also be caused by the quality of the extracted text. You can use PDFView (text-only mode) or the text extractor tool to see exactly what text the tool is processing.