New EU accessibility laws mandate that digital books meet specific standards for readers with disabilities, along three principal axes: informing the reader, allowing reading with alternative rendering (such as audio and braille), and enabling visual layout adjustments to meet individual needs. Accessibility of new releases can be achieved by updating the production workflow, but updating an extensive collection of backlist titles is a daunting task. Before starting, and to avoid discouragement, it is necessary to gain a better understanding of how accessible the collection already is. This step is also an excellent opportunity to start complying with the legislation by giving readers the means to learn how accessible each ebook is. The SOLIMAN project enabled us to explore this path and to make significant advances in open-source rules and tools for handling extensive collections.

Rising to the challenge in a heritage collection

In 2024, the ABELab project carried out a situation and gap analysis of European collections. Its final report pointed out two crucial challenges. First, the European backlist is not homogeneous, meaning that statistics must be extracted from the files in order to slice extensive collections efficiently into sets of ebooks with similar needs. Second, there are too few open-source tools and too little documentation available for automation, resulting in redundant efforts when selecting remediation operations and evaluating their quality.

The heritage mission of the FeniXX organisation, entrusted to it by both publishers and the Ministry of Culture since 2014, has resulted in the digitisation and marketing of the collection of 20th-century unavailable books, listed in the ReLIRE register published by the Bibliothèque nationale de France. The project, supported in particular by the Centre National du Livre, has enabled nearly 98,000 works to be republished in digital form and made available to readers and library users in France and abroad through booksellers. Digital production and sales of these works were completed in the first quarter of 2023.

With this mission in mind, from the project’s inception in 2014, FeniXX selected EPUB 3 as the publication format, in line with the recommendations made by the W3C. 71% of the Livres Indisponibles catalogue is now available in EPUB 3 format.

FeniXX also addressed the issue of digital accessibility in 2019. At that time, the company took steps to integrate part of the WCAG standards into its production workflow to achieve a first level of native accessibility on its EPUBs.

Due to its size and turnover, FeniXX is exempt from the decree relating to the accessibility of products and services for people with disabilities (Art. D. 412-60). However, given the heritage nature of its catalogue and its initial mission, FeniXX would like to make all the necessary accessibility adaptations without incurring a disproportionate burden.

The FeniXX collections have the particularity of being generated by an automated process. Although two production periods yielded different files, a quality baseline can easily be established: the same stylesheets drive the display of all ebooks, and the funders’ logos are the same in each book. Decorative images were properly tagged with a presentation role in one of the two batches, and a standard “Illustration” alt text was added to meaningful images as a placeholder until the technology and budget to describe them adequately became available.

Initial analysis

Initially, we applied the ABELab analysis method described in the Gap Analysis document (available online in English), which enabled us to position the studied corpus in relation to European collections.

The remediation complexity indicators established by ABELab give the FeniXX corpus scores between 4 and 56, with the majority of files falling within the range of 4 to 21. In contrast, the ABELab samples scored from 4 to 77, with the majority of files falling between 10 and 26. The FeniXX files are therefore in the lower range compared with the samples representing European collections. These indicators allow comparison with the ABELab corpus. However, the detailed knowledge we gained from studying the corpus showed that some of them were not significant for characterising the remediation difficulty of the FeniXX corpus. The most telling results were:

  • 49% of the collection lacked accessibility metadata, even after applying state-of-the-art inference rules.
  • 29% of the collection had more than 20 probably erroneous language declarations on sentences of more than two words.
  • 10% of the collection lacked proper identification of table column and row headings.
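As an illustration of the last indicator, the following minimal sketch flags tables that contain no header cell at all. It is not the ABELab or SOLIMAN implementation: the class and function names are hypothetical, and the rule "a table without any th element lacks header identification" is a simplifying assumption.

```python
from html.parser import HTMLParser


class TableHeaderAudit(HTMLParser):
    """Counts tables and how many of them contain no <th> cell."""

    def __init__(self):
        super().__init__()
        self.tables_total = 0
        self.tables_without_th = 0
        self._stack = []  # one boolean per open <table>: has a <th> been seen?

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append(False)
        elif tag == "th" and self._stack:
            self._stack[-1] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            has_th = self._stack.pop()
            self.tables_total += 1
            if not has_th:
                self.tables_without_th += 1


def audit_tables(xhtml: str) -> tuple[int, int]:
    """Returns (number of tables, number of tables without any <th>)."""
    parser = TableHeaderAudit()
    parser.feed(xhtml)
    parser.close()
    return parser.tables_total, parser.tables_without_th


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Year</th></tr><tr><td>1950</td></tr></table>"
        "<table><tr><td>a</td></tr></table>"
    )
    print(audit_tables(sample))
```

Run over every content document of an EPUB, such a count gives a rough per-book signal; a real audit would also look at scope attributes and row headers.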

More detailed knowledge of the corpus enabled us to compute additional functional indicators that refine this picture. The most interesting ones concern the number and identification of images:

  • the number of pictures per book;
  • the number of files containing only one or two images;
  • the number of files containing identified recurring images (such as logos);
  • the number of images whose alternative text can be considered significant, i.e. longer than one word;
  • the number of images explicitly declared as decorative.
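To make these image indicators concrete, here is a minimal sketch that classifies the img elements of one XHTML content document. It is an assumption-laden illustration, not the project's code: the category names are hypothetical, decorative images are detected only through role="presentation", and covers or recurring logos (which would require the package manifest or file comparison) are not identified.

```python
from collections import Counter
from html.parser import HTMLParser


class ImageIndicators(HTMLParser):
    """Classifies <img> elements by role and alt text length."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        self.counts["total"] += 1
        alt = (attributes.get("alt") or "").strip()
        if attributes.get("role") == "presentation":
            # Explicitly declared as decorative.
            self.counts["decorative"] += 1
        elif len(alt.split()) > 1:
            # Alt text longer than one word: considered significant.
            self.counts["significant_alt"] += 1
        else:
            # Missing, empty or one-word alt text (e.g. "Illustration").
            self.counts["needs_description"] += 1


def count_images(xhtml: str) -> Counter:
    parser = ImageIndicators()
    parser.feed(xhtml)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    page = (
        '<p><img src="a.jpg" alt="Illustration"/>'
        '<img src="b.jpg" role="presentation" alt=""/>'
        '<img src="c.jpg" alt="Map of Paris in 1900"/></p>'
    )
    print(dict(count_images(page)))
```

Summing these counters over all content documents of a book, then over all books, yields corpus-level shares comparable to the ones reported here.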

The corpus comprises 1,029,739 images, of which 7% are covers, 6% are the funders’ logos, 1% are marked as decorative, and 6% have alt text with more than one word, making them long enough to be considered significant. That means that a total of 20% of images do not require remediation.

On the other hand, 49% of the corpus has fewer than three images, and 28% has more than 10 images. Crossing both indicators, we understand that almost half of the corpus is readable as text, allowing it to be rendered in synthetic voice and braille. A further quarter, containing fewer than 11 images per book, should be quick to enhance with meaningful image descriptions, while the last quarter will require more significant effort.
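The slicing described above can be sketched as a simple bucketing of books by image count. The thresholds (fewer than 3, fewer than 11) come from the text; the function names, bucket labels and sample data are hypothetical.

```python
from collections import Counter


def bucket(image_count: int) -> str:
    """Assigns a book to a remediation bucket from its number of images."""
    if image_count < 3:
        return "fewer than 3 images"
    if image_count < 11:
        return "3 to 10 images"
    return "11 images or more"


def slice_corpus(images_per_book: dict[str, int]) -> dict[str, float]:
    """Returns the percentage of books falling into each bucket."""
    total = len(images_per_book)
    if total == 0:
        return {}
    counts = Counter(bucket(n) for n in images_per_book.values())
    return {name: round(100 * counts[name] / total, 1) for name in counts}


if __name__ == "__main__":
    # Hypothetical identifiers and counts, for illustration only.
    sample = {"book-a": 0, "book-b": 2, "book-c": 5, "book-d": 40}
    print(slice_corpus(sample))
```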

Finally, an expert analysis of the CSS revealed no particular blocker to display transformability, at least for applications using Readium CSS. A detailed list of potential improvements was compiled to enhance the user experience.

Achievements: Open-Source Tools and Smarter Metadata

With this data in mind, we set a development roadmap. We chose to enhance an existing tool, already used in parts of the industry, rather than build a separate one that would then have had to find adopters. The Readium Go Toolkit was the starting point, but the resulting features were grouped into a separate toolkit called Readium CLI, a set of command-line utilities that address a wide range of use cases.

In practical terms, the Readium CLI ingests EPUB files and generates a Readium Web Publication Manifest (RWPM), an open format meant to represent and distribute publications. The process also extracts and infers information like accessibility metadata and details on images (like size and identification) that can be used for optimisation and compatibility, and helps identify decorative ones.
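As a rough illustration of how a generated manifest might be consumed downstream, the sketch below reads a manifest and reports which accessibility fields are absent. The embedded manifest is a hypothetical sample, not actual Readium CLI output, and the list of expected fields is an assumption chosen for the example; the field names follow the accessibility object of the Readium Web Publication Manifest.

```python
import json

# Hypothetical manifest, shortened for illustration.
SAMPLE_MANIFEST = """{
  "metadata": {
    "title": "Hypothetical title",
    "accessibility": {
      "accessMode": ["textual", "visual"],
      "accessModeSufficient": [["textual"]],
      "feature": ["tableOfContents", "readingOrder"],
      "hazard": ["none"]
    }
  },
  "readingOrder": []
}"""

# Assumption: the fields this example treats as expected.
EXPECTED_FIELDS = [
    "conformsTo",
    "accessMode",
    "accessModeSufficient",
    "feature",
    "hazard",
]


def missing_accessibility_fields(manifest: dict) -> list[str]:
    """Lists expected accessibility fields that are absent or empty."""
    accessibility = manifest.get("metadata", {}).get("accessibility", {})
    return [name for name in EXPECTED_FIELDS if not accessibility.get(name)]


if __name__ == "__main__":
    manifest = json.loads(SAMPLE_MANIFEST)
    print(missing_accessibility_fields(manifest))
```

A report of this kind, aggregated over a collection, is one way to decide which books can be announced as accessible and which need further metadata.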

A complementary writing tool will be available to carry the analysis and inferences through to the next level of remediation. It will enable users to update EPUB files and to generate a human-readable list of attention points to be addressed in subsequent remediation operations.

By the end of the project, we expect 50% of the collection to be explicitly accessible files. The other half will receive updated metadata indicating what’s missing, and FeniXX will have accurate data to consider the next remediation step.

Designed for automation and integration, the Readium CLI is ideal for developers and content creators who want to automate publishing pipelines or enhance accessibility workflows. De Marque, which distributes FeniXX collections through Eden Livres in France, has already implemented this enhancement, meaning those improvements will benefit FeniXX collections and all EPUBs distributed by De Marque.

The global impact will be difficult to evaluate, as the tool is open source and does not centralise statistics.

Legacy: A Tool and a Blueprint for the Publishing Industry

By developing open-source tools for analysing and updating accessibility metadata, FeniXX is moving closer to meeting EU accessibility standards for its vast collection of 20th-century digitised works. Project SOLIMAN marks a substantial advancement in assessing and informing digital book accessibility. It provides an immediate benefit and opens paths for future work.

On the reader-facing side, it broadens access to literature by making the accessibility information of this extensive collection available. The collection comprises a digital library of approximately 70,000 books rendered in formats tailored to diverse user needs, and readers can now easily identify which ebooks are best suited to them. This is a direct benefit for individuals with disabilities or specific reading requirements.

On the other side, FeniXX knows what efforts are needed to make more ebooks from the collection fully accessible and, therefore, how to prioritise the next steps.

Because the tool is available as open source and is already used by major industry players like De Marque, both aspects benefit not only the FeniXX collections but also a significantly larger number of ebooks, and the effect will continue to spread over the coming years. We expect significant adoption of the tool by prominent heritage preservation actors, such as national libraries, and hope this will come with further enhancements to the Readium CLI or derived solutions.

FeniXX’s dedication to safeguarding cultural heritage is demonstrated through this effort, as it actively works to eliminate barriers to digital content and guarantee that a larger and more diverse readership can fully experience and appreciate its rich literary resources. This undertaking highlights the vital role of inclusive design in fostering equitable access to knowledge and culture in the digital era.

Copyright © 2023 EDRLab. Legal informations

Log in with your credentials

Forgot your details?