The PDF/A-3 standard (ISO 19005-3:2012) defines a file format based on the Portable Document Format (PDF) to provide a mechanism for representing electronic documents in a manner that preserves their static visual appearance over time, independent of the tools and systems used for creating, storing, or rendering the files. However, preservation of the files’ static visual appearance is only possible if conforming PDF/A files are complete in themselves and require no external resources (e.g. unembedded fonts) to render their pages properly.
In a somewhat radical departure from its predecessor, PDF/A-2 (ISO 19005-2:2011), PDF/A-3 permits the embedding of files of any format (including XML, CSV, CAD, images, binary executables, etc.) within a PDF/A file and does not require embedded files to be considered archival content. Further, a PDF/A-3 conformant reader is responsible for presenting only the primary document, and it permits the extraction of embedded files for use with other tools.
The U.S. National Digital Stewardship Alliance (NDSA) charged a Working Group to investigate the PDF/A-3 standard. Specifically, the Working Group researched the pros and cons of using PDF/A-3 in different preservation scenarios, including use as an extension to PDF/A-1 (ISO 19005-1:2005) and PDF/A-2 in circumstances for which those formats have been adopted or recommended, and use as a wrapping or bundling format for various digital asset/media types, such as textual, audio, video, photo, and GIS data.
In The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions, the Working Group provides a comprehensive assessment of the possibilities, risks, and general pros and cons of PDF/A-3 and the scenarios in which it might (or might not) be appropriate. The report also presents a general scenario for embedding supporting data in PDFs for scholarly documents, along with several scenarios specific to particular contexts or institutions, such as a U.S. National Archives and Records Administration (NARA) scenario for using PDF/A-3 as an acceptable container to circumvent DoD 5015.2 records management application restrictions.
Conclusions
The conclusions address the following topics.
- PDF/A-3’s appropriateness – Its appropriateness for the long-term preservation of content depends heavily on three factors: the type of content, the nature of the workflow that created it, and whether the archival submission process allows for detailed negotiation on allowable formats for embedded files.
- PDF/A-3 and workflows – PDF/A-3 may be most appropriate for use in controlled workflows but may not be an appropriate choice as a general-purpose bundling format. However, the PDF Association’s proposed creation of “a free and open source PDF validation tool might mitigate the long-term preservation risks constituted by the complexity of the PDF/A format as a bundling format. Absent such robust validation tools, conversion of PDF files to PDF/A in preservation workflows remains a somewhat problematic preservation tactic” (page 19).
- Additional tools – If the preservation community agrees PDF/A-3 is inappropriate as a general-purpose archival bundling format, the community will need to identify and/or create tools to allow complex digital objects to be bundled with metadata to establish the relationship among the components in a bundle.
- Archival institutions’ policies – The manner in which archival institutions will treat embedded files depends on “the context for creation, the expressed relationships that embedded files have to the primary document, the expectation of future users, and an archival institution’s policies” (page 19). The Working Group recommends that archival institutions treat PDF/A-3 separately from other PDF/A versions in preference lists and for action plans.
- Future standards development role – The report illustrates that the arbitrary embedding of files is a problematic feature of PDF/A-3. Consequently, the Working Group suggests the ‘community of memory institutions’ may need to take “a more strategic, active, and vocal role in the standards development process” (page 19) in the future to avoid the introduction of similarly problematic new features.
The report is recommended reading for organizations (particularly archival institutions) planning to use the PDF/A-3 standard.