Tue. Jul 14th, 2020

Apache PDFBox 2.0.19 released: bug fixes

Apache PDFBox

Apache PDFBox is an open-source Java library for working with PDF documents. It can create new PDF documents, manipulate existing ones, and extract their content. PDFBox also includes several command-line utilities and is published under the Apache License, Version 2.0.

Features

  • Extract Text: Extract Unicode text from PDF files.
  • Split & Merge: Split a single PDF into many files or merge multiple PDF files.
  • Fill Forms: Extract data from PDF forms or fill a PDF form.
  • Preflight: Validate PDF files against the PDF/A-1b standard.
  • Print: Print a PDF file using the standard Java printing API.
  • Save as Image: Save PDFs as image files, such as PNG or JPEG.
  • Create PDFs: Create a PDF from scratch, with embedded fonts and images.
  • Signing: Digitally sign PDF files.

Apache PDFBox 2.0.19 has been released. The changes are listed below by issue type.

Bug

[PDFBOX-4720] – cmap entries “<0000> <FFFF> <0000>” are cut
[PDFBOX-4722] – TestTextStripper doesn’t detect when less output
[PDFBOX-4724] – Wrong calculation of position in InputStreamSource#readFully
[PDFBOX-4727] – ExtractEmbeddedFiles.java example uses name tree keys as file names
[PDFBOX-4730] – /OC in form and image XObjects not handled
[PDFBOX-4738] – getDocument().getObjects() returns nothing for split result documents
[PDFBOX-4741] – NullPointerException in PlainText constructor
[PDFBOX-4742] – Incorrect handling of float Infinity and NaN
[PDFBOX-4745] – COSObjectKey.hashCode doesn’t work for generation numbers > 0
[PDFBOX-4750] – java.io.IOException: Error:Unknown type in content stream:COSNull{}
[PDFBOX-4753] – NumberFormatException while parsing a certain PDF document
[PDFBOX-4755] – Fonts improperly rendered
[PDFBOX-4756] – ScratchFileBuffer seek beyond the last page
[PDFBOX-4760] – wordSeparator not being inserted when word ends with “ ”
[PDFBOX-4763] – Can’t get inline image raw data
[PDFBOX-4765] – NPE in ExtractImages.ImageGraphicsEngine().run()
[PDFBOX-4771] – JPEG image with transparency can’t be extracted
[PDFBOX-4772] – Improve memory consumption of PDAbstractAppearanceHandler (2)
[PDFBOX-4777] – Avoid OOM for malformed PDFs using a huge First value within object streams
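
To illustrate the kind of issue behind an entry such as PDFBOX-4745: a PDF indirect object is identified by an object number together with a generation number, so a key class used in hash-based collections needs a hashCode that stays consistent with equals for every generation. The following is only a minimal sketch under that assumption. ObjectKey is a hypothetical stand-in written for this example; it is not PDFBox's COSObjectKey and does not reproduce the actual fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key for an indirect object; NOT PDFBox's COSObjectKey. */
final class ObjectKey {
    private final long number;     // object number
    private final int generation;  // generation number

    ObjectKey(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ObjectKey)) {
            return false;
        }
        ObjectKey key = (ObjectKey) other;
        return number == key.number && generation == key.generation;
    }

    @Override
    public int hashCode() {
        // Both fields take part, so keys that are equal always hash alike,
        // whatever the generation number is.
        return Objects.hash(number, generation);
    }

    public static void main(String[] args) {
        Map<ObjectKey, String> objects = new HashMap<>();
        objects.put(new ObjectKey(12, 0), "first");
        objects.put(new ObjectKey(12, 1), "second");

        // A fresh but equal key finds the entry stored under generation 1.
        System.out.println(objects.get(new ObjectKey(12, 1))); // prints "second"
        System.out.println(objects.size());                    // prints 2
    }
}
```

Objects.hash is used here for brevity; any hashCode that returns the same value for equal keys would satisfy the same contract.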

New Feature

[PDFBOX-4721] – Move Apache PDFBox from a low-API model

Improvement

[PDFBOX-4734] – ExtractImages should create CCITT G4 compressed TIFF files when possible
[PDFBOX-4735] – WriteDecodedDoc should create an xref table instead of an xref stream
[PDFBOX-4762] – Inconsistent handling of incorrect data
[PDFBOX-4766] – PDInlineImage.getSuffix() returns null
[PDFBOX-4779] – PDFBOX: Update Bouncy Castle Crypto to version 1.64

Task

[PDFBOX-4757] – Enable as much PDAcroFormFlattenTest tests as possible
[PDFBOX-4759] – Add tests for PDFBOX-4153 and PDFBOX-4490

Sub-task

[PDFBOX-4731] – Support RenderDestination
[PDFBOX-4732] – Support ImageType
