Web Services for Programming with ISBNs

Published at 22:02:42-0700 Updated
Tags: chinese, translation, computing

Introduction

Paper Republic (PR) is a charitable organization which promotes Chinese literature in translation. PR's founder, a well-known translator named Eric Abrahamsen, doubles as the organization's sole technical staff member. He recently recruited me to assist with creating new platform functionality permitting PR website users to add to a database of Chinese literature. I'll provisionally refer to this new feature as "Book Lookup."

As currently designed, Book Lookup will ask the user to type in an ISBN. After being validated, the ISBN will be submitted to internal or external lookup services, through which Book Lookup can automatically populate the PR database with verified information.
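The validation step can be sketched with the standard ISBN check-digit rules. This is only a minimal illustration, not Book Lookup's actual code (which doesn't exist yet); the function names `normalize_isbn`, `is_valid_isbn10`, `is_valid_isbn13`, and `is_valid_isbn` are hypothetical.

```python
def normalize_isbn(raw: str) -> str:
    """Strip hyphens and spaces, and uppercase a trailing 'x' check digit."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, total must be divisible by 11 ('X' counts as 10)."""
    if len(isbn) != 10 or not isbn.isascii():
        return False
    body, check = isbn[:9], isbn[9]
    if not body.isdigit() or not (check.isdigit() or check == "X"):
        return False
    values = [int(c) for c in body] + [10 if check == "X" else int(check)]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, total must be divisible by 10."""
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn(raw: str) -> bool:
    """Accept either ISBN form, with or without hyphens and spaces."""
    isbn = normalize_isbn(raw)
    return is_valid_isbn10(isbn) or is_valid_isbn13(isbn)
```

Checking the input locally like this means malformed ISBNs never cost a request to a lookup service.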

The material here concentrates on resources which will be used to build Book Lookup. Thank you, Eric, for agreeing to let me publicly draft and release this writeup! I can't yet discuss Book Lookup's own design or implementation -- because they haven't happened yet -- but I hope to return to those topics in the future.

Goals

  1. Research and list web services which provide bibliographic data.
  2. Characterize each service's capabilities and drawbacks.
    • Who runs the service?
    • What information does the service return?
      • Author / Editor / Contributors
      • Year of publication
      • Place of publication
      • Bookbinding
      • Versions / Printings
      • Cover image(s)
    • What data formats are available?
    • Any support for ingesting multiple ISBNs in one request?
    • Free? Free with a rate or usage limit? Paid?
    • Does the service provide any speed guarantees?
  3. Assess each service's suitability for integration into Book Lookup's backend.

How this benefits you: this writeup documents services that can save you time when writing (web) programs that work with ISBNs and other book data.

Results and Recommendations

Of the major services surveyed, Open Library's API, or a dump of the data backed by that API, edges out Google Books as the first choice for providing Book Lookup's core backend functionality.

  1. All the desired bibliographic data is on offer.
  2. Multiple ISBNs can be served in a single request.
  3. OL's crawling restrictions and rate limits shouldn't be a problem for Book Lookup's low-volume usage.
  4. If user interest grows, switching the backend routing to serve Book Lookup requests from an OL data dump instead of the API will require minimal new code, eat less than 10 GB of local storage, and likely reduce request-response latency.
  5. The monetary cost is nil.
  6. The service provider is sufficiently famous, stable, and well-funded that there's little danger of them disappearing overnight. [That said, it may be good form for PR to calculate expected annual API usage, then make an appropriate donation to Open Library.]

While the Internet Archive has Python API bindings for its main service, archive.org, those don't appear to be able to interact with Open Library itself. Thus, one early step toward implementing Book Lookup will be to write a small library wrapping the necessary OL API calls. If this library is sufficiently reusable, PR should consider contributing it back to Open Library's development community as a gesture of goodwill.

Finally, if OL is missing a particular work, or edition thereof, Book Lookup should fall back on Google Books, which indexes roughly twice as many works as OL, and has equally good metadata. Adding this fallback option shouldn't be too much extra work, given the existing Python bindings for Google's APIs. (For what it's worth, OL's API is compatible with Google Books' Dynamic Links API.)
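The "try OL first, then fall back on Google Books" routing might look something like the sketch below. It assumes each service is wrapped in a function that takes an ISBN and returns a dict of metadata, or None when the work isn't found; `lookup_with_fallback` and the `Fetcher` alias are hypothetical names, and the real wrappers would live in the small library described above.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher takes an ISBN and returns a metadata dict, or None if not found.
Fetcher = Callable[[str], Optional[dict]]


def lookup_with_fallback(
    isbn: str, fetchers: List[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, dict]]:
    """Try each (name, fetcher) pair in order; return the first non-empty hit.

    Network-level failures (OSError, which includes urllib's URLError) are
    treated like a miss, so one unreachable service doesn't sink the lookup.
    """
    for name, fetch in fetchers:
        try:
            record = fetch(isbn)
        except OSError:
            record = None
        if record:
            return name, record
    return None
```

Returning the service name alongside the record lets the caller note where each database entry came from.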

Research

Open Library

URL: https://openlibrary.org

Service Provider: the Internet Archive.

Summary: It seems that Open Library's data will cover most or all of Book Lookup's needs, though we may need to host our own copy of a data dump to avoid pestering them with too many API requests.

Available Data: Information for developers is available here. Open Library's RESTful API is documented here. The endpoints are split into several sub-APIs for particular types of data, with "books" being the main one. Querying the books API with parameter jscmd=data returns the following (and a few more things):

[It is also possible to get book dimensions as one of the response keys if the request specifies jscmd=details, but that request parameter is considered unstable and may change, and furthermore the provided dimensions seem questionable. If you search "physical" on the Books API overview, you will see that the example response describes a book as "1 x 1 x 1 inches." Does that mean a cube 1 inch on a side, or does it mean 1 foot by 1 foot by 1 inch? The former is of course absurd, but either way that result is less clear than the dimension result given by Google Books (see below).]
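A request to the books API with jscmd=data can be sketched as follows, using only the Python standard library. The helper names `build_books_url` and `fetch_books` are hypothetical; the `bibkeys`, `format`, and `jscmd` query parameters are the ones described in OL's Books API documentation, and several ISBNs can share one request as a comma-separated `bibkeys` value.

```python
import json
from typing import Iterable
from urllib.parse import urlencode
from urllib.request import urlopen

OL_BOOKS_ENDPOINT = "https://openlibrary.org/api/books"


def build_books_url(isbns: Iterable[str]) -> str:
    """Build one books-API URL covering every ISBN in `isbns`."""
    bibkeys = ",".join(f"ISBN:{isbn}" for isbn in isbns)
    query = urlencode({"bibkeys": bibkeys, "format": "json", "jscmd": "data"})
    return f"{OL_BOOKS_ENDPOINT}?{query}"


def fetch_books(isbns: Iterable[str], timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON response.

    The result is keyed by bibkey (e.g. "ISBN:0451526538"); ISBNs that OL
    doesn't know are simply absent from the dict.
    """
    with urlopen(build_books_url(isbns), timeout=timeout) as response:
        return json.load(response)
```

Keeping URL construction separate from the network call makes the wrapper easy to test offline.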

The covers API also provides author photos, accessible by Open Library ID (OLID). The Don Knuth photo used by OL to document this feature can be found here. Key quotes from the documentation follow.

Costs and Restrictions: The service is free. Open Library's API should not be used for bulk downloads, and there are some additional restrictions on use of the covers API.
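For reference, cover and author-photo URLs follow a simple pattern, so no API call is needed to construct them. The sketch below assumes the URL layout given in OL's covers API documentation (a `b` path for books, an `a` path for authors, and sizes S, M, or L); the function names are hypothetical, and any restrictions on use of the covers API still apply to fetching the images themselves.

```python
COVERS_HOST = "https://covers.openlibrary.org"
VALID_SIZES = ("S", "M", "L")  # small, medium, large


def _check_size(size: str) -> None:
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")


def book_cover_url(isbn: str, size: str = "M") -> str:
    """URL of a book cover image, looked up by ISBN."""
    _check_size(size)
    return f"{COVERS_HOST}/b/isbn/{isbn}-{size}.jpg"


def author_photo_url(olid: str, size: str = "M") -> str:
    """URL of an author photo, looked up by Open Library ID (OLID)."""
    _check_size(size)
    return f"{COVERS_HOST}/a/olid/{olid}-{size}.jpg"
```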

Although Open Library disallows API crawling and bulk downloading, application developers can obtain all of OL's data by downloading a dump. Dumps come in the form of tab-separated files, and are updated monthly. The "all types" dump (latest versions of editions, works, and authors data) measures 8 GB.
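If Book Lookup ever switches to a dump, reading it might look roughly like this. The sketch assumes each dump line has five tab-separated columns (record type, key, revision, last-modified timestamp, and a JSON document) and that edition records list their ISBNs under `isbn_10` and `isbn_13`; that layout is my understanding of the dump documentation and should be checked against a real file. `DumpRecord`, `parse_dump_lines`, and `index_editions_by_isbn` are hypothetical names.

```python
import json
from typing import Dict, Iterable, Iterator, NamedTuple


class DumpRecord(NamedTuple):
    record_type: str    # e.g. "/type/edition"
    key: str            # e.g. "/books/OL1M"
    revision: int
    last_modified: str
    data: dict          # the decoded JSON document


def parse_dump_lines(lines: Iterable[str]) -> Iterator[DumpRecord]:
    """Yield one DumpRecord per well-formed line; skip anything else."""
    for line in lines:
        fields = line.rstrip("\n").split("\t", 4)
        if len(fields) != 5:
            continue  # assumed layout not met; ignore the line
        record_type, key, revision, last_modified, raw_json = fields
        yield DumpRecord(record_type, key, int(revision), last_modified,
                         json.loads(raw_json))


def index_editions_by_isbn(records: Iterable[DumpRecord]) -> Dict[str, str]:
    """Map every ISBN found on an edition record to that edition's key."""
    index: Dict[str, str] = {}
    for record in records:
        if record.record_type != "/type/edition":
            continue
        for field in ("isbn_10", "isbn_13"):
            for isbn in record.data.get(field, []):
                index[isbn] = record.key
    return index
```

Because `parse_dump_lines` is a generator, it can be fed an open file object directly and never holds the whole 8 GB dump in memory; the in-memory index would likely be replaced by a database table in practice.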

On a "future expansions" note, Open Library mentions here that the Library of Congress modified the Internet Archive's Book Reader to sit perfectly within their Rare Books Collection site. If PR were so inclined, they could use the same tactic to provide a "read on site" feature for any work in PR's database (so long as it had an Open Library entry).

Book Database Online

URL: https://bookdatabase.online

Service Provider: Publisher Services, a division of Bar Code Graphics, Inc.

Summary: Very poor results. Searching "harry potter" at the main page gave me three versions of The Ultimate Fan Guide to The Wizarding World of Harry Potter, and absolutely nothing else. Those results included an ISBN for the book, but a second search for that exact ISBN returned no results. Not worth further investigation.

Google Books

URL: https://books.google.com/

Service Provider: Google

Summary: This seems like the second-most useful service to build against after Open Library. OL has the advantage of providing data dumps, but Google may have the advantage in number of works indexed.

Available Data: This 2019 press release claims that more than 40 million books have been scanned into the Google Books service. Judging from the API overview, the bibliographic metadata available through the API seems to rival that provided by Open Library, with the addition of detailed dimensions information (see the example API response for individual volumes). Data can be retrieved by ISBN with a GET request to a URL of the form: https://www.googleapis.com/books/v1/volumes?q=isbn:{ISBN}.

The API's performance, and thus Book Lookup's performance, can be improved by doing a partial API request for only the fields needed to populate PR's database.
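As a rough sketch, a partial request adds a `fields` parameter to the ISBN query shown above. The particular field selection below is only a guess at what PR's database would need, and `build_volumes_url` and `first_volume_info` are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import urlencode

GOOGLE_VOLUMES_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"

# Assumed selection of fields; adjust to whatever PR's database stores.
DEFAULT_FIELDS = (
    "items(volumeInfo(title,authors,publisher,publishedDate,"
    "industryIdentifiers,imageLinks))"
)


def build_volumes_url(isbn: str, fields: Optional[str] = DEFAULT_FIELDS) -> str:
    """Build an ISBN query URL, optionally restricted to `fields`."""
    params = {"q": f"isbn:{isbn}"}
    if fields:
        params["fields"] = fields
    return f"{GOOGLE_VOLUMES_ENDPOINT}?{urlencode(params)}"


def first_volume_info(response: dict) -> Optional[dict]:
    """Return the volumeInfo of the first hit in a decoded response, if any."""
    items = response.get("items") or []
    return items[0].get("volumeInfo") if items else None
```

Passing `fields=None` produces the plain, unfiltered query, which is handy when exploring what the API returns for a given book.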

Google has a library of Python bindings which works with several of its APIs, including Google Books.

Costs and Restrictions: Use of the API is governed by Google's Terms of Service. In addition, note that the API overview says the API is in its first version and is experimental. Were it to change behind PR's back, the Book Lookup code would need adjustment, and Google is notorious for discontinuing services -- even popular ones -- without much warning.

ISBN DB

URL: https://isbndb.com

Service Provider: Price Network, Inc.

Summary: Though it provides a good selection of metadata and variety of works, as a paid product, ISBN DB is a less attractive option than OL.

Available Data: The API details are presented here. The available info is much the same as that offered by Open Library, although I don't believe the "dimensions" value (one of the results returned by the /book/{isbn} API endpoint) is available through OL.

Costs and Restrictions: ISBN DB is a paid product, as detailed here. The basic plan, which would be sufficient for PR's needs, costs $9.95 USD/month. Discounts are available for nonprofit institutions, but it would likely still cost some amount of money.

WorldCat

URL: https://www.worldcat.org/

Service Provider: Online Computer Library Center (OCLC)

Available Data: WorldCat's primary developer offering is its Search API, the endpoints of which are documented here. For Book Lookup purposes, the most useful endpoint is "GetByISBN"; unfortunately, the interactive API demo for that endpoint didn't work correctly on my computer, which makes me faintly suspicious of the service's quality.

Search API results are available in two formats: MARC XML, or Dublin Core. Dublin Core doesn't seem to allow for many types of metadata, and is probably not useful for our purposes. As an XML subtype, MARC XML is likely to have a ton of granularity, but may be harder to work with than the JSON from OL's API.
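To give a feel for the extra work MARC XML involves, here is a small sketch that pulls values out of a record using Python's standard library. The sample record is made up for illustration, not actual WorldCat output, and I'm assuming the response contains standard MARC 21 "slim" XML, in which field 020 subfield a holds the ISBN and field 245 subfield a holds the title. `marc_subfields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the MARC 21 "slim" XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">9780306406157</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">A. N. Author.</subfield>
  </datafield>
</record>
"""


def marc_subfields(record: ET.Element, tag: str, code: str) -> List[str]:
    """Return the text of every subfield `code` inside datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text.strip() for el in record.findall(path, MARC_NS) if el.text]


record = ET.fromstring(SAMPLE_RECORD.strip())
isbns = marc_subfields(record, "020", "a")
titles = marc_subfields(record, "245", "a")
```

Compared with reading a key out of OL's JSON, every value here requires knowing the numeric field tag and subfield code in advance, and MARC's trailing punctuation (the " /" after the title) still has to be cleaned up by hand.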

The API can also return information in standard bibliographic citation formats (APA, Chicago, Harvard, MLA, and Turabian), which might be a minor convenience for building webpages to view PR's database, but is not applicable to Book Lookup itself. Finally, WorldCat's API can tell you which libraries are holding a given item. That could be quite useful for PR members, but, again, that's not needed for Book Lookup itself.

Note that WorldCat provides some other, smaller sets of data through its "metadata APIs," listed here, which don't seem to be useful for Book Lookup's purposes.

Costs and Restrictions: It seems that use of the API is limited to libraries that maintain both WorldCat Discovery and OCLC Cataloging subscriptions, and partners interested in integrating WorldCat data into their software. Paper Republic would have to become an OCLC member -- specifically, a consumer-services partner -- to use the API. Membership would cost money, and the extent of the pricing info available on the website is "contact us," which is code for "expensive." For this reason, I won't dig any further into the API Terms of Service.

Library of Congress

URL: https://loc.gov

Service Provider: Library of Congress (LOC)

Available Data: "Library of Congress for Robots" gives an overview of LOC's machine-readable resources. The ones most relevant to Book Lookup are the JSON API (still in beta) and the MARC Open-Access resources.

The Library of Congress's online reading service, https://read.gov/books, is not relevant to Book Lookup.

Openbook4wordpress

URL: https://code.google.com/archive/p/openbook4wordpress/

Service Provider: none (OSS code)

Summary: Not relevant to Book Lookup, as PR isn't built with WordPress -- but your site might be!

When you insert an OpenBook shortcode with an ISBN or other book number in a WordPress post or widget, the OpenBook plugin replaces it with a book cover image and other book data from Open Library. It also adds links to popular book sites, such as WorldCat, LibraryThing, GoogleBooks and BookFinder. Users have complete control over the content and styling of the display through templates. Librarians can configure OpenBook with an OpenURL resolver to point to their library's records.

World Digital Library

URL: https://www.wdl.org/en/

Service Provider: World Digital Library

Available Data: The API is documented here. The standard OpenSearch protocol may be used to search the World Digital Library and retrieve results in an XML format. Many of the items appear to be provided by the Library of Congress, and much of what is available is visual art, but hey: check out these China-related materials!

Summary: Given the overlap with LOC, and the focus on visual art and photography, www.wdl.org doesn't seem relevant to implementing Book Lookup.