In this video, Joshua Tallent explains what metadata is and how it relates to book publishing. While a common definition of metadata is "data about data", Joshua goes deeper than just that, with examples of how we interact with metadata on a daily basis, even when we don't realize it. He also introduces students to the ONIX standard, and explains why it was initially created.
- There's a great scene in the movie The Matrix by the Wachowskis, where Morpheus explains to Neo what the Matrix is. He says, "The Matrix is everywhere. It is all around us. Even now in this very room. You can see it when you look out your window or when you turn on your television. You can feel it when you go to work. When you go to church. When you pay your taxes." That description is actually a pretty good description of data. We absorb data every day from sources all around us. We look outside and see the sun and clouds and compare that to what the weatherman says and try to discern how the weather will be today.
We look at the ingredients label on our cereal and try to decide if the sugar content is too high. Like the Matrix, though, the data we see and interact with goes much deeper than what we actually can see. In our modern world, we interact with the data that is beneath the surface, the data about data, on a daily basis. If you go to Google and search for something, you're interacting with data about data. Websites usually have keywords hidden beneath the surface of the content, in the code, where you're not going to see them. Those keywords are collected by Google and used to help you find the page that you're looking for when you search for the keyword.
This is what we call metadata. It is data that describes other data. Because the website is the data itself, the keywords are the metadata. Again, you deal with metadata on a daily basis, and you may not even think about it. When you play some music on your phone and look at the song title or the artist name, that's metadata. When you go to the library to look at an old-fashioned card catalog to find a book, that's metadata as well. Metadata is important for publishing, and it always has been. From the earliest days of books, we have needed ways to know what data was stored inside a folio and how to find it.
Subject indexes are metadata because they describe the data in the book. Chapter and verse numbers in the Bible are metadata. They were not in the original text, but they were added to make the text easier to navigate. Therefore, they are data that describes the data. The most prominent form of metadata in publishing, though, is the data that we store outside the book itself that describes what a book is about and how to purchase it. This metadata has become the lifeblood of the publishing industry, the primary marketing and discovery mechanism for consumers. Book metadata comes in a variety of types and formats.
We will dig into the details of different metadata elements in later movies, but let's take a look at one example just to get a glimpse of the different metadata elements. Here's an Amazon page from a sample book. On this page we can see a variety of different metadata elements. We can see the title of the book, the authors, the cover image, the description, and the price. I mentioned card catalogs before as an example of book metadata. Before the advent of computers and the internet, card catalogs were the most common metadata source available.
Librarians painstakingly cataloged information about the books in their collections and ensured that the data was correct. On the retail side, book retailers would receive some information about books from publishers and distributors, but often that information was very basic and had to be supplemented by information created by the retailers themselves. With the arrival of the internet, online book sales became not only possible but commonplace. Retailers started looking for better ways to get information about books from publishers, and publishers started looking for ways to ensure that the metadata about their books was correct and up to date.
Initially, this was done with spreadsheets and other proprietary formats. However, it was clear that there was a need for a single, consistent metadata format that could be created by publishers and ingested by retailers. In the year 2000, this led to the creation of the Online Information Exchange format, commonly known as ONIX for books. ONIX for books is an XML format that can be used to deliver metadata about a book. It was developed by an organization in Europe called EDItEUR with input from a committee of book publishers, retailers, and distributors.
The ONIX for books standard has been updated many times since the year 2000, adapting and adjusting to the changing needs of publishing metadata. The most current version of the standard is ONIX 3, but many publishers and retailers are still using ONIX version 2.1.
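To make the idea of an XML metadata record a little more concrete, here is a minimal Python sketch that reads a title, an identifier, and a contributor out of a small ONIX-style fragment. The fragment is a simplified illustration only: it loosely follows ONIX 3 element names, omits the XML namespace and most elements a real record would carry, and has not been validated against the ONIX schema. The book, the ISBN, the record reference, and the `read_products` helper are all made up for this example.

```python
import xml.etree.ElementTree as ET

# A simplified, illustrative ONIX-style record. The book, ISBN, and record
# reference are placeholders, and a real ONIX file would contain many more
# elements (and normally an XML namespace on the root element).
ONIX_SAMPLE = """\
<ONIXMessage release="3.0">
  <Product>
    <RecordReference>example.com.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Book</TitleText>
        </TitleElement>
      </TitleDetail>
      <Contributor>
        <SequenceNumber>1</SequenceNumber>
        <ContributorRole>A01</ContributorRole>
        <PersonName>Jane Example</PersonName>
      </Contributor>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>
"""


def read_products(xml_text):
    """Return one dict per <Product> with a few basic metadata fields."""
    root = ET.fromstring(xml_text)
    products = []
    for product in root.iter("Product"):
        products.append({
            # Identifier value (here, the placeholder ISBN-13).
            "isbn": product.findtext("ProductIdentifier/IDValue"),
            # Title text nested inside the descriptive block.
            "title": product.findtext(
                "DescriptiveDetail/TitleDetail/TitleElement/TitleText"
            ),
            # Names of every contributor listed on the product.
            "contributors": [
                c.findtext("PersonName") for c in product.iter("Contributor")
            ],
        })
    return products


if __name__ == "__main__":
    for record in read_products(ONIX_SAMPLE):
        print(record)
```

The same title and author that a shopper sees on a retailer's product page travel between publisher and retailer in a structured form like this, which is what lets a retailer ingest them automatically instead of retyping them.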
- Understanding XML markup and the ONIX file structure
- Assigning subject categories in the metadata
- Adding keywords
- Adding descriptive text such as book descriptions, bios, and reviews
- Providing height, width, weight, page count, and other physical specifications
- Specifying price
- Adding images and ebook data