- Introduction
- Background
- Re-scanning project
- Diminishing missing issues
- What does this mean for customers?

Backfiles on ScienceDirect – Improving image quality and restoring missing pages and issues – Progress Report 1
Introduction
The Elsevier Backfile collections on ScienceDirect have been very well received by customers and users and have brought many valuable research articles back into use in current-day research. In addition, many of our customers see major advantages in being able to remove printed issues from the shelves once they are available electronically, thus freeing up valuable shelf space for print-only publications.
However, some customers who have bought Backfile collections over the last seven years have notified us of missing pages and, in particular, poorly scanned images. In the last year we have received an increasing number of notifications and, understandably, some expressions of dissatisfaction. This has prompted us to address the problem in a more structural way, rather than case by case as was possible previously, in order to rectify the situation as quickly and efficiently as possible.
Background
This Backfile project is the biggest digitization project of its kind in the STM world and has spanned more than seven years, beginning in summer 2000. Most publishers were skeptical of the need for such Backfile content, so they watched and waited for several years before initiating their own programs, learning from our experience. For example, when we first started the project, pages were scanned at 300 dpi in black and white, which is perfectly good for text but less so for images. Including grey scale and color would have been impossible at that time: storage requirements would have been astronomical, and rendering would have been far too slow for users, as internet speeds even six years ago were considerably slower than today. In 2004, we were able to implement new technology that produces much higher quality images and makes full use of color and grey scale without compromising performance or storage requirements. As a result, the quality of many of the earlier packages is inferior to that of post-2003 Backfiles, especially in fields where images are particularly important. In addition, many of the older journal articles were scanned from microfiche, which was of fairly low quality even in the 1970s, let alone before the 1950s. Since then we have sourced print copies and rescanned and replaced many poorly scanned pages, based on feedback from users. In some cases we have simply rescanned entire journals, e.g. Brain Research, Neuroscience and Icarus.
Re-scanning project
We cannot go back and redo the entire project with the improved technology, as this would be prohibitively costly, but we are determined to improve the user experience. We therefore want to identify the poor scans among the more than 40 million pages without massive manual intervention. We are currently running experiments on sample data to see whether we can automate the recognition process and so speed up the project. The initial goal is to produce a focused list of metadata identifying the problem pages. Following this, we will work with our suppliers to plan the rescanning of those pages.
Diminishing missing issues
Missing issues have been a major problem both for customers and for us. When we started to source the nearly 200,000 issues for this project, we did not have easy access to the majority of them. We have searched every office, meeting room, basement, warehouse and corridor in every Elsevier office all over the world. In many cases we had only microfiche copies, which, as described above, were sometimes of extremely poor quality. Many editors and librarians worldwide have lent us their copies to be non-destructively scanned and processed. Since 2002 we have run several projects to source missing issues in national libraries in The Netherlands and in the UK. We are happy to report that we have made major progress on this front: in 2001 more than 2% of issues were missing; since then we have sourced and processed more than 2,500 such issues, so that fewer than 0.5% are now missing, and this figure continues to improve every month.
What does this mean for customers?
Over time, customers will start to see a marked improvement in all aspects of image quality and a continued reduction in missing content. Customers will not incur any charges for these improvements. It goes without saying that whilst we are very proud of our achievements so far, we recognize that we have quite a long way to go before good becomes much better! In the meantime, thank you for your feedback and for your patience.

