We believe in Persistent Identifiers. We believe in defence in depth. Today we’re excited to announce an upgrade to our data resilience strategy.
Defence in depth means layers of security and resilience, and that means layers of backups. For some years now, our last line of defence has been a reliable, tried-and-tested technology. One that’s been around for a while. Yes, I’m talking about the humble 5¼ inch floppy disk.
This may come as a surprise to some. When things go well, you’re probably never aware of them. In day-to-day use, the only time a typical Crossref user sees a floppy disk is when they click ‘save’ (yes, some journals still require submissions in Microsoft Word).
History
But why?
Let me take you back to the early days of Crossref. The technology scene was different. This data was too important to trust to new and unproven technologies like Zip disks, CD-Rs or USB thumb drives. So we started with punched cards.
Punched cards are reliable and durable as long as you don’t fold, spindle or mutilate them. But even in 2001 we knew that punched cards’ days were numbered. The capacity of 80 characters kept DOIs short. Translating DOIs into EBCDIC made ASCII a challenge, let alone SICIs. We kept a close eye on the nascent Unicode.
Breathing Room
In 2017 the change of DOI display guidelines from http://dx.doi.org to https://doi.org shortened each DOI by 2 characters, buying us some time. But eventually we knew we had to upgrade to something more modern.
So we migrated to 5¼ inch floppy disks.
At 640 KB per disk, these were a huge improvement. We could fit around 20,000 DOIs on one floppy. Today we need only around 10,000 floppy disks to store all of our DOIs (not the metadata, just the DOIs). Surprisingly, this takes only about 20 metres of shelf space.
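For anyone who wants to check our sums, here is a rough back-of-the-envelope sketch in Python. It uses only the round figures quoted above, and it assumes that 1 KB means 1,024 bytes and that the disks sit edge to edge on the shelf. The function names are ours, made up for illustration, and do not describe any real Crossref tooling.

```python
# Back-of-the-envelope check of the floppy figures quoted in this post.
# Assumption: 1 KB = 1,024 bytes. All constants are the post's round numbers.

DISK_CAPACITY_BYTES = 640 * 1024   # 640 KB per 5¼ inch disk
DOIS_PER_DISK = 20_000             # "around 20,000 DOIs on one floppy"
DISK_COUNT = 10_000                # "around 10,000 floppy disks"
SHELF_METRES = 20                  # "about 20 metres of shelf space"


def bytes_per_doi(capacity_bytes=DISK_CAPACITY_BYTES, dois=DOIS_PER_DISK):
    """Average number of bytes available for each DOI on one disk."""
    return capacity_bytes / dois


def total_doi_capacity(disks=DISK_COUNT, dois_per_disk=DOIS_PER_DISK):
    """How many DOIs the whole collection of disks could hold."""
    return disks * dois_per_disk


def shelf_mm_per_disk(shelf_metres=SHELF_METRES, disks=DISK_COUNT):
    """Shelf space each disk occupies, in millimetres (edge to edge)."""
    return shelf_metres * 1000 / disks


if __name__ == "__main__":
    print(f"Bytes available per DOI: {bytes_per_doi():.1f}")
    print(f"Total DOI capacity:      {total_doi_capacity():,}")
    print(f"Shelf space per disk:    {shelf_mm_per_disk():.1f} mm")
```

Under those assumptions each DOI gets a little under 33 bytes, the full set of disks has room for 200 million DOIs, and each disk takes up about 2 mm of shelf.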
The move to working from home brought an unexpected benefit. Staff mail floppy disks to each other and keep them in constant rotation, which produces a distributed, fault-tolerant system.
Persistence Means Change
But it can’t last forever. DOI registration shows no sign of slowing down. It’s clear we need a new, compact storage medium. So, after months of research, we’ve invested in new equipment.
Today we announce our migration to 3½ inch floppies.
If it goes to plan, you won’t even notice the change.