Digital Preservation — Benefits and Risks of PDF/A-3

The PDF/A-3 standard (ISO 19005-3:2012) defines a file format based on the Portable Document Format (PDF) that represents electronic documents in a manner that preserves their static visual appearance over time, independent of the tools and systems used for creating, storing, or rendering the files. However, a file's static visual appearance can be preserved only if the conforming PDF/A file is complete in itself and requires no external resources (e.g. unembedded fonts) to render its pages properly.

In a somewhat radical departure from its predecessor, PDF/A-2 (ISO 19005-2:2011), PDF/A-3 permits the embedding of files of any format (including XML, CSV, CAD, images, binary executables, etc.) within a PDF/A file and does not require embedded files to be considered archival content. Further, a PDF/A-3 conformant reader is responsible for presenting only the primary document and permits the extraction of embedded files for use with other tools.

The U.S. National Digital Stewardship Alliance (NDSA) charged a Working Group to investigate the PDF/A-3 standard. Specifically, the Working Group researched the pros and cons of using PDF/A-3 in different preservation scenarios, including use as an extension to PDF/A-1 (ISO 19005-1:2005) and PDF/A-2 in circumstances for which those formats have been adopted or recommended, and use as a wrapping or bundling format for various digital asset/media types, such as textual, audio, video, photo, and GIS data.

In The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions, the Working Group provides a comprehensive assessment of the possibilities, risks, and general pros and cons of PDF/A-3 and the scenarios in which it might (or might not) be appropriate. The report also presents a general scenario for embedding supporting data in PDFs for scholarly documents and several scenarios specific to particular contexts or institutions, such as a U.S. National Archives and Records Administration (NARA) scenario for using PDF/A-3 as an acceptable container to circumvent DoD 5015.2 records management application restrictions.

Conclusions

The conclusions address the following topics.

  1. PDF/A-3’s appropriateness – Its appropriateness for the long-term preservation of content depends heavily on three factors: the type of content, the nature of the workflow that created it, and whether the archival submission process allows for detailed negotiation on allowable formats for embedded files.
  2. PDF/A-3 and workflows – PDF/A-3 may be most appropriate for use in controlled workflows but may not be an appropriate choice as a general-purpose bundling format. However, the PDF Association’s proposed creation of “a free and open source PDF validation tool might mitigate the long-term preservation risks constituted by the complexity of the PDF/A format as a bundling format. Absent such robust validation tools, conversion of PDF files to PDF/A in preservation workflows remains a somewhat problematic preservation tactic.” (page 19)
  3. Additional tools – If the preservation community agrees that PDF/A-3 is inappropriate as a general-purpose archival bundling format, the community will need to identify and/or create tools that allow complex digital objects to be bundled with metadata establishing the relationships among the components in a bundle.
  4. Archival institutions’ policies – The manner in which archival institutions will treat embedded files depends on “the context for creation, the expressed relationships that embedded files have to the primary document, the expectation of future users, and an archival institution’s policies.” (page 19) The Working Group recommends that archival institutions treat PDF/A-3 separately from other PDF/A versions in preference lists and for action plans.
  5. Future standards development role – The report illustrates that the arbitrary embedding of files is a problematic feature of PDF/A-3. Consequently, the Working Group suggests the ‘community of memory institutions’ may need to take “a more strategic, active, and vocal role in the standards development process” (page 19) in the future to avoid the introduction of similarly problematic new features.
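
To make the first and fourth conclusions more concrete, here is a minimal sketch of how an archival institution might screen the embedded files in a submitted PDF/A-3 against its own policy. It is an illustration under stated assumptions, not a method described in the report: the allowed-format list, the EmbeddedFile record, and the review_embedded_file and review_submission functions are all hypothetical, and the sketch assumes the embedded-file metadata has already been extracted from the PDF with some other tool. The relationship values are those understood to be available for associated files in PDF/A-3 (the AFRelationship key).

```python
from dataclasses import dataclass

# Relationship values assumed available for associated files in PDF/A-3
# (the AFRelationship key of a file specification).
AF_RELATIONSHIPS = {"Source", "Data", "Alternative", "Supplement", "Unspecified"}

# Hypothetical institutional policy: MIME types accepted for embedded files.
ALLOWED_MIME_TYPES = {"text/csv", "application/xml", "image/tiff"}


@dataclass(frozen=True)
class EmbeddedFile:
    """Hypothetical record describing one file embedded in a PDF/A-3."""
    name: str
    mime_type: str
    relationship: str


def review_embedded_file(item: EmbeddedFile) -> str:
    """Return 'accept', 'review', or 'reject' under the hypothetical policy."""
    if item.relationship not in AF_RELATIONSHIPS:
        return "reject"  # relationship to the primary document is not a known value
    if item.mime_type not in ALLOWED_MIME_TYPES:
        return "reject"  # format was not agreed on during submission negotiation
    if item.relationship == "Unspecified":
        return "review"  # allowed format, but an archivist must decide how to treat it
    return "accept"


def review_submission(items):
    """Map each embedded file name to its policy decision."""
    return {item.name: review_embedded_file(item) for item in items}
```

Under this sketch, a CSV file declared as "Data" would be accepted, an executable would be rejected because its format is not on the list, and an XML file with an "Unspecified" relationship would be routed to a person for review. The point is only that the decision depends on both the format and the expressed relationship, which is why the allowable formats need to be negotiated at submission time.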

The report is recommended reading for organizations (particularly archival institutions) planning to use the PDF/A-3 standard.

About the Author

Sheila Taylor is a well-known consultant, educator, speaker, and writer with more than 25 years of experience in the information management (IM) field.
