What is an EPUB file?
EPUB is an open file format for electronic publications. By publication, we usually mean a book, but the scope is wider and includes long-form articles, digitized comics, etc. For the sake of simplicity, we’ll use the term ebook in this documentation. By open file format, we mean that the specification is freely available on the W3C website, and that developers can freely create applications for generating or reading EPUB files (.epub), now and in the future.
EPUB is based on Web Standards: it defines a means of representing, packaging and encoding structured and semantically enhanced Web content — including HTML, CSS, SVG, images, and other resources — for distribution in a single-file format.
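To make the "single-file format" concrete: an .epub file is a ZIP archive whose first entry is a file named mimetype containing application/epub+zip, and whose META-INF/container.xml points to the package document. The following is a minimal sketch using only the Python standard library. The names build_sample_epub, inspect_epub and the path EPUB/package.opf are illustrative assumptions, and the sample archive is deliberately incomplete (it is not a valid publication).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical container.xml; the package path is an arbitrary choice.
SAMPLE_CONTAINER = (
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="EPUB/package.opf" '
    'media-type="application/oebps-package+xml"/></rootfiles>'
    "</container>"
)


def build_sample_epub():
    """Build a tiny, incomplete EPUB-like archive in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", SAMPLE_CONTAINER)
        archive.writestr("EPUB/package.opf", "<package/>")
    buffer.seek(0)
    return buffer


def inspect_epub(source):
    """Return (mimetype, package document path, entry names) for an EPUB.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        mimetype = archive.read("mimetype").decode("ascii").strip()
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return mimetype, rootfile.get("full-path"), names


if __name__ == "__main__":
    mimetype, package_path, names = inspect_epub(build_sample_epub())
    print(mimetype)
    print(package_path)
    print(names)
```

Calling inspect_epub with the path of a real .epub file should work the same way, although a production tool would also need to handle missing entries and malformed XML.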
Using open Web Standards in EPUB brings many advantages to the publishing industry:
- By nature, Web Standards are interoperable, meaning they aim to be usable on any kind of device; so does the EPUB standard.
- Developers of the EPUB specification benefit from the work of the entire Web community. As an example, ebook accessibility builds on the work done by the W3C Web Accessibility Initiative (WAI) on the subject.
- Developers of EPUB authoring solutions can create these tools as variants of Web authoring solutions.
- Developers of reading applications can (and should) use an off-the-shelf Web browser engine as the core of their rendering engine.
In short, by using Web Standards, the publishing industry avoids reinventing the wheel. Note, however, that it must still adapt this “wheel” to the chapters and pages of ebooks, i.e. the electronic version of a codex.
Today, EPUB is the reference format for distribution and interchange in the digital publishing industry. It allows publishers to produce a single digital publication file and send it through multiple distribution workflows, and it offers consumers a high degree of interoperability across software and hardware, for both reflowable and fixed-layout ebooks.
A brief history of the EPUB format
EPUB had its roots in the interchange format known as the Open EBook Publication Structure (OEBPS). OEBPS 1.0 was approved in 1999 by the Open eBook Forum, an organization that later became the International Digital Publishing Forum (IDPF). Subsequent revisions 1.1 and 1.2 were approved by the IDPF in 2001 and 2002 respectively.
It became clear that a standard format was needed for delivery as well as interchange. Work began in late 2005 on a single-file container format for OEBPS, which was approved by the IDPF as the OEBPS Container Format (OCF) in 2006. Work on a 2.0 revision of OEBPS began in parallel; it was approved under the new name EPUB 2.0 in October 2007 and consisted of three specifications: Open Packaging Format (OPF), Open Publication Structure (OPS) and the previously defined OCF.
EPUB 2.0.1, a maintenance update to the 2.0 specification intended to clarify and correct errata in the specifications, was approved in September 2010.
EPUB 3.0 superseded EPUB 2.0.1 in October 2011, and a maintenance release named EPUB 3.0.1 (often written 3.01) appeared in June 2014. EPUB 2.0.1 was then considered obsolete and was no longer maintained.
EPUB 3.1 was released in January 2017, just before the IDPF merged into the W3C. This version was not 100% backward compatible with EPUB 3.0.1, and for this reason the publishing industry did not adopt it.
EPUB 3.2 was released in May 2019 and is the result of the work of the W3C EPUB 3 Community Group. This version is 100% backward compatible with EPUB 3.0.1; at the time of writing, it is the current version of the specification.
Features specific to EPUB 3
Here’s a list of what has been added and/or improved in EPUB 3:
- HTML5: EPUB 2 supports XHTML 1.1 and DTBook. With the support of the XML flavor of HTML 5 in EPUB 3, it is now possible to use more detailed semantic markup (e.g. use <section>, <aside>, <figure>).
- Semantic Inflection: a new epub:type attribute, when added to HTML 5 markup, defines the precise nature of structural markup, in line with book semantics.
- Audio and video: EPUB 2 has support for raster images only. Thanks to HTML 5, EPUB 3 publications can reference audio or video assets via the <audio> or <video> tags, and therefore audio and video assets can be natively processed by modern browser engines.
- Navigation: EPUB 3 defines a new human- and machine-readable grammar for the navigation document, based on the HTML 5 <nav> element. It replaces the EPUB 2 .ncx file, which is now deprecated.
- SVG documents: they can now appear directly in the spine (they no longer need to be nested within an XHTML file).
- MathML: the XML markup language dedicated to the presentation of mathematical notation is now a first-class citizen in EPUB publications.
- Content switching: it has been simplified by having its processing model defined so that it does not require document preprocessing.
- Linking: linking schemes have been added. At the moment only one is available; see the EPUB Canonical Fragment Identifiers (CFI) specification.
- Triggers: the trigger element, added to HTML5 for EPUB, allows the declarative binding of activation events (such as “play” or “pause” for audio playback).
- Bindings: you can now script your own handlers for uncommon media files.
- Fixed Layout: please refer to our chapter Reflowable vs Fixed Layout.
- Added modules from CSS3: EPUB 3 also includes alternate style tags, allowing the creation of custom viewing modes such as day, night, etc.
- Media overlays: With the possibility of adding audio, EPUB includes a way to synchronize it with the text.
- Publication Metadata and Identity: a new, mandatory metadata property has been added: dcterms:modified.
- Resource Metadata: a new properties attribute is available in the Package Document, allowing the declaration of additional metadata about resources.
- Text-to-speech: text-to-speech rendering of ebooks is now supported (using features such as SSML attributes in XHTML content documents, the CSS3 Speech Module, etc.).
- Remote Resources: EPUB 3 added new restrictions on resources that are not located in the container.
- Whitespace in the mimetype file: the restriction against trailing whitespace has been removed.
- Disallowed characters: the OCF list of disallowed characters has been extended.
Things that have been removed:
- Out-of-Line XML Islands
- Filesystem Container
- 2.0.1 meta element
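Several of the features listed above are visible directly in the package document: the version attribute of the root package element, the mandatory dcterms:modified metadata, and the properties attribute that flags the navigation document in the manifest. The sketch below reads these three values with the Python standard library. SAMPLE_OPF is a hypothetical package document written for this illustration (identifier, title, date and file names are invented), and summarize_package is an illustrative helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPUB package document.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document; all values are invented for illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2019-05-08T12:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
  </spine>
</package>"""


def summarize_package(opf_xml):
    """Extract the version, last-modified date and nav document href."""
    root = ET.fromstring(opf_xml)
    summary = {"version": root.get("version"), "modified": None, "nav": None}

    # dcterms:modified is expressed as a <meta property="..."> element.
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("property") == "dcterms:modified":
            summary["modified"] = (meta.text or "").strip()

    # The navigation document is the manifest item whose space-separated
    # properties attribute contains the value "nav".
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if "nav" in (item.get("properties") or "").split():
            summary["nav"] = item.get("href")

    return summary


if __name__ == "__main__":
    print(summarize_package(SAMPLE_OPF))
```

An EPUB 2 package document would report version 2.0 and would normally have neither a dcterms:modified property nor a nav item, so both fields would stay None in this sketch.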
Reflowable vs fixed-layout EPUB 3
The EPUB 2 format (2007) had no concept of fixed-layout; this notion appeared in EPUB 3.0.
You may open reflowable or fixed-layout EPUB files: what does this mean?
In a reflowable EPUB, the content is fluid and fits the size of the screen. If you read on a smartphone, you’ll get perfectly readable characters on small pages. Novels are usually published as reflowable ebooks because their layout is simple. But the reflowable format is not always ideal, especially when it comes to designs where the layout is critical, i.e. for:
1/ creating sophisticated layouts (such as art books)
2/ combining text and images (such as cookbooks)
3/ mixing text with audio and video (such as textbooks)
4/ creating interactive books, with embedded software to manage interactions.
A fixed-layout ebook defines a viewport, i.e. the size of the “page” in pixels, and usually contains many images exactly positioned relative to text. Some good examples of fixed-layout publications are photo books, magazines and comics. An electronic “page” is the strict rendition of a printed page: on a small screen, you’ll usually have to zoom and scroll in the page (often vertically AND horizontally) to read the text.
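In EPUB 3, a publication declares itself as fixed-layout with a rendition:layout metadata property set to pre-paginated in the package document, and each fixed-layout XHTML page declares its dimensions in a viewport meta tag. The following sketch shows how a tool might read both. The helper names (make_opf, is_fixed_layout, parse_viewport) and the sample dimensions are assumptions made for this illustration, and the viewport parser only handles plain numeric width and height values.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the EPUB package document.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def make_opf(layout=None):
    """Build a minimal, hypothetical package document for illustration."""
    meta = f'<meta property="rendition:layout">{layout}</meta>' if layout else ""
    return (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
        f"<metadata>{meta}</metadata></package>"
    )


def is_fixed_layout(opf_xml):
    """Return True if the package declares rendition:layout as pre-paginated."""
    root = ET.fromstring(opf_xml)
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip() == "pre-paginated"
    # No declaration: the publication is treated as reflowable.
    return False


def parse_viewport(content):
    """Parse a viewport content string such as 'width=1200, height=1600'.

    Returns (width, height) in pixels; raises KeyError if either is missing.
    """
    pairs = dict(re.findall(r"(\w+)\s*=\s*(\d+)", content))
    return int(pairs["width"]), int(pairs["height"])


if __name__ == "__main__":
    print(is_fixed_layout(make_opf("pre-paginated")))
    print(is_fixed_layout(make_opf()))
    print(parse_viewport("width=1200, height=1600"))
```

A reading application could use the parsed dimensions to compute the zoom factor needed to fit the page on a given screen, which is where the zooming and scrolling described above comes from on small devices.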
EPUB 3 support in reading apps
When it comes to EPUB 3, and especially fixed-layout EPUB 3, reading systems differ in their capabilities. It is now quite common for reading applications on mobile phones, tablets or PCs to support EPUB 3 with fixed-layout; this is especially the case for those based on the Readium toolkit. This is not the case, however, for most applications based on the Adobe RMSDK, which includes an obsolete Readium codebase and is only good at processing EPUB 2.
Legacy e-readers (specialized devices with a black-and-white screen without touch input, low memory and low computing power) are also only capable of processing EPUB 2. Several modern e-readers, however, process EPUB 3 and correctly support sophisticated text-oriented layouts combined with images. The user experience is still constrained by the black-and-white screen of the device, limited to 256 shades of grey: good enough for textbooks, but not for comics or children’s books. Multimedia and interactivity are impossible due to the lack of power of such devices. As a result, e-readers are good at processing a basic profile of EPUB 3.
We’re currently missing a clear indication of support for the different “profiles” of EPUB 3 (including fixed-layout) on both bookseller platforms and reading applications. This, plus the fact that many applications still rely on the Adobe RMSDK and the associated Adobe DRM, hinders the rise of EPUB 3. Let’s hope that the multiplication of Readium-based applications and the adoption of the LCP DRM by e-distributors will soon help solve this issue.
EDRLab is developing the Thorium Reader desktop application, free & open-source, as a way to promote the use of EPUB 3 and LCP worldwide.
- Getting started with EPUB (US National Center for Accessible Educational Material)
The Readium projects provide rock-solid, performant building blocks and applications for processing EPUB 3 publications. EDRLab participates in the maintenance and evolution of the Readium codebase.
Support for people with print disabilities is a key part of our mission. We collaborate with European publishers and major inclusion organizations on the creation of a born-accessible ebook market. We also make sure that Readium projects take into account the assistive technologies used by visually impaired users.