Three Emerging Digital Platforms for 2015

‘Twas a world of limited options for digital libraries just a few short years back. Nowadays, however, the options are many more and the features and functionalities are truly groundbreaking.

Before I dive into some of the latest whizzbang technologies that have caught my eye, let me lay out the platforms we currently use and why we use them.

  • Digital Commons for our institutional repository. This is a simple yet powerful hosted repository service. It has customizable workflows built into it for managing and publishing online journals, conferences, e-books, media galleries and much more. And, I’d emphasize the “service” aspect. Included in the subscription comes notable SEO power, robust publishing tools, reporting, stellar customer service and, of course, you don’t have to worry about the technical upkeep of the platform.
  • CONTENTdm for our digital collections. There was a time when OCLC’s digital collections platform appeared to be on a development trajectory that would take it out of the clunky mire it was in around 2010. They’ve made strides, but the platform has not kept up.
  • LUNA for restricted image reserve services. You and your faculty can build collections in this system popular with museums and libraries alike. Your collection also sits within the LUNA Commons, which means users of LUNA can take advantage of collections outside their institutions.
  • Omeka.net for online exhibits and digital humanities projects. The limited cousin to the self-hosted Omeka, this version is an easy way to launch multiple sites for your campus without having to administer multiple installs. But it has a limited number of plugins and options, so your users will quickly grow out of it.

The Movers and Shakers of 2015

There are some very interesting developments out there, so here is a brief overview of the three I find most groundbreaking.

PressForward

If you took Blog DNA and spliced it with Journal Publishing, you’d get a critter called PressForward: a WordPress plug-in that allows users to launch publications that approach publishing from a contemporary web publishing perspective.

There are a number of ways you can use PressForward, but the most basic publishing model it’s intended for starts with treating other online publications (RSS feeds from individuals, organizations, other journals) as sources of submissions. Editors can add external content feeds to their submission feed, which brings that content into their PressForward queue for consideration. Editors can then go through all the content that is brought in automatically from outside and decide whether to include it in their publication. And of course, locally produced content can also be included if you’re so inclined.

Examples of PressForward include:

Islandora

Built on Fedora Commons with a Drupal front-end layer, Islandora is a truly remarkable platform that is growing in popularity at a good clip. A few years back, I worked with a local consortium examining various platforms and we looked at Islandora. At the time, there were no examples of the platform being put into use and it felt more like an interesting concept than a tool we should recommend for our needs. Had we been looking at this today, I think it would have been our number one choice.

Part of the magic with Islandora is that it uses RDF triples to flatten your collections and items into a simple array of objects that can have unlimited relationships to each other. In other words, a single image can be associated with other objects that all relate as a single object (say a book of images) and that book object can be part of a collection of books object, or, in fact, be connected to multiple other collections. This is a technical way of saying that it’s hyper flexible and yet very simple.
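To make that idea a little more concrete, here is a minimal Python sketch of how a flat list of subject-predicate-object triples can express those relationships. The object identifiers are invented for illustration, and the predicate names are borrowed from the Fedora relationship vocabulary as an assumption about how a given site might model things. This is not Islandora’s actual storage code, just the shape of the idea.

```python
# Each triple is (subject, predicate, object). All identifiers are made up.
TRIPLES = [
    ("image:1", "isMemberOf", "book:1"),
    ("image:2", "isMemberOf", "book:1"),
    ("book:1", "isMemberOfCollection", "collection:rare-books"),
    ("book:1", "isMemberOfCollection", "collection:local-history"),
]


def related(subject, predicate, triples=TRIPLES):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def members(obj, predicate, triples=TRIPLES):
    """Return every subject that points at `obj` through `predicate`."""
    return [s for s, p, o in triples if o == obj and p == predicate]
```

Because every relationship is just another row, adding the book to a third collection means appending one more triple rather than restructuring anything.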

And because Islandora is built on two widely used open source platforms, finding tech staff to help manage it is easy.

But if you don’t have the staff to run a Fedora-Drupal server, Lyrasis now offers hosted options that are just as powerful. In fact, one subscription model they offer allows you to have complete access to the Drupal back end if customization and development are important to you, but you don’t want to waste staff time on updates and monitoring/testing server performance.

Either way, this looks like a major player in this space and I expect it to continue to grow rapidly. That’s a good thing too, because some aspects of the platform are feeling a little “not ready for prime time.” The Newspaper solution pack, for example, while okay, is nowhere near as cool as what Veridian currently can do.

ArtStor’s SharedShelf

Rapid development has taken this digital image collection platform to a new level with promises of more to come. SharedShelf integrates the open web, including DPLA and Google Images, with their proprietary image database in novel ways that I think put LUNA on notice.

Like LUNA, SharedShelf allows institutions to build local collections that can contain copyrighted works to be used in classroom and research environments. But what sets it apart is that it allows users to also build beyond their institutions and push that content to the open web (or not depending on the rights to the images they are publishing).

SharedShelf also integrates with other ArtStor services such as their Curriculum Guides that allow faculty to create instructional narratives using all the resources available from ArtStor.

The management layer is pretty nice and works well with a host of metadata schemas.

And, oh, apparently audio and video support is on the way.

Using Timeliner with ContentDM

This tutorial is based on some experimenting I did recently linking a ContentDM collection of maps to a Timeliner in order to plot the collection items on a map and a timeline. There are multiple methods to make this happen, including using the ContentDM API and Google Spreadsheets to bring the collection metadata into Timeliner.

Background

Timeliner is a hosted application that generates timelines and geo-spatial mappings of a given digital collection. The service is free and can be embedded into any webpage using an iFrame.

Timeliner provides a ready-to-use data template for Google Spreadsheets. An institution need only enter the appropriate metadata from a given digital collection, from ContentDM for example, into predefined columns in the template and then publish that spreadsheet. After entering the URL of the spreadsheet, Timeliner constructs an interactive timeline and map feature.

Timeliner is also open-source and can be installed and developed locally.

Harvesting the Metadata from ContentDM

There are two methods for bringing the data into Timeliner from ContentDM:

  1. via XML export
  2. via TSV export

XML Method

The XML method is preferred, but would require an institution to add specific fields to its collections that Timeliner can use. For example, a place field that provides a human-readable placename for a given location, or a date field. In other words, if the data in ContentDM is structured in a Timeliner-ready manner, creating Timeliner interfaces for collections can be automated and rather simple once basic spreadsheets with ImportXML queries are entered into the appropriate Timeliner columns.

Special Note About Errors

For undetermined reasons, it is possible that ImportXML queries using the ContentDM API noted below will not retrieve data. There are a few possible explanations:

    1. Google limits the number of cells for a given spreadsheet and, importantly, there are limits on the complexity of spreadsheets, such as references to other cells. More information can be found on the Google Spreadsheets Size and Complexity Limits help page.
    2. ContentDM does time out from time to time.

An alternative solution, not covered in this document, would be to export the full XML of a ContentDM collection and store it remotely and then have an XSLT construct a spreadsheet that could then be uploaded to Google Spreadsheets (or generated with ImportData calls within the spreadsheet). The one drawback to this solution is that this method will not dynamically update as new items are added to a collection. Thus, an institution would need to run this process each time an update was made to a collection.
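For readers curious what that offline route might look like, here is a rough sketch that uses Python in place of XSLT to flatten an exported XML file into CSV text ready for upload. The sample XML, its wrapper element and the field names are all assumptions made for illustration; a real ContentDM export will have its own structure and field nicknames.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample in the rough shape of an export; real exports may differ.
SAMPLE_XML = """<results>
  <record><title>Map of Chicago</title><date>1871-10-08</date></record>
  <record><title>Lake Michigan chart</title><date>1902-05-01</date></record>
</results>"""


def records_to_csv(xml_text, fields):
    """Flatten each <record> into one CSV row containing the given fields."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row
    for record in root.iter("record"):
        # Missing fields become empty cells instead of raising an error.
        writer.writerow([(record.findtext(f) or "").strip() for f in fields])
    return out.getvalue()
```

Calling `records_to_csv(SAMPLE_XML, ["title", "date"])` returns a header line followed by one line per record, which could be saved to a file and uploaded.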

As an example of using the dynamic XML method, a query to retrieve date field data might resemble something as simple as:

=ImportXML(XMLPATH,XPATHSELECTION)

for example…

=ImportXML("https://server16106.contentdm.oclc.org/dmwebservices/index.php?q=dmQuery/lpnc1/CISOSEARCHALL/title!creato!subjec!date!descri/title/1000/1/0/0/0/0/0/0/xml", "//date")

Adding similar queries to each Timeliner column will dynamically retrieve the data without any post-ContentDM publication intervention. Again, using the above example, the “date” field would need to be entered by catalogers specifically for Timeliner (i.e. using a yyyy-mm-dd format).
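If you want to check ahead of time whether a collection’s dates are in the expected yyyy-mm-dd shape, something like the following Python sketch could help. It mimics what the //date query returns and flags values that would not work. The sample XML is invented for illustration and only approximates what the API sends back.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Made-up sample; the wrapper element and values are assumptions.
SAMPLE_XML = """<results>
  <record><date><![CDATA[ 1871-10-08 ]]></date></record>
  <record><date><![CDATA[ circa 1902 ]]></date></record>
</results>"""


def extract_dates(xml_text):
    """Collect the text of every <date> element, like the XPath //date."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter("date")]


def is_timeliner_date(value):
    """True when value is a real calendar date written as yyyy-mm-dd."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as 1871-1-8; reject those.
    return len(value) == 10
```

Any value that fails the check is a candidate for cleanup by a cataloger before the spreadsheet is published.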

Location: Geocoding through open web services

One Timeliner field that might be best handled directly in Google Spreadsheets post cataloging, however, would be Location as it can be automated and save catalogers significant time.

The Location field requires machine-readable latitudinal and longitudinal coordinates for a given place. Fortunately, open-source web services can be queried in a Google Spreadsheet to retrieve these coordinates.

To spare the author of such a spreadsheet from having to write incredibly complicated formulas, it is recommended to carry out this automation in stages:

  1. Create a new spreadsheet with multiple sheets:

    1. the first sheet will be your Timeliner Template

    2. the second sheet will be your Geocoding sheet.

  2. Populate the Timeliner Template with metadata using the above ImportXML method. This will include the Place column, which contains human-readable place names.

  3. In the Geocoding sheet, create four columns:

    1. Column A will contain a formula that retrieves the data from the Place column (Column H, starting at H2) in the Timeliner Template (Sheet 1). For example:

='Sheet 1'!H2

    2. Column B will query a geocoding web service to obtain the latitude. We will use the MapQuest Nominatim-based Open Geocoding API: http://developer.mapquest.com/web/products/open/geocoding-service

In Column B, you can query this service using the following formula, where A2 is the first row of data in Column A (assuming your column labels are in the first row):

=ImportXML("http://open.mapquestapi.com/nominatim/v1/?format=xml&q=" & A2 ; "//place[1]/@lat")

    3. Column C uses exactly the same XPath statement, but replaces the latitude attribute @lat with the longitude attribute @lon.

=ImportXML("http://open.mapquestapi.com/nominatim/v1/?format=xml&q=" & A2 ; "//place[1]/@lon")

    4. Column D simply needs a comma character entered. This will be used to separate the latitude and longitude values in the format required by Timeliner.

  4. Remember to copy all of these formulas down the columns. Google Spreadsheets should calculate the correct values as you do so.

  5. Finally, back in the Timeliner Template, under the Location column, add a concatenation formula that combines the latitude (Column B), the comma (Column D) and the longitude (Column C) of your Geocoding sheet, in that order. The structure is:

=CONCATENATE(LATITUDE,COMMA,LONGITUDE)

Your actual Google formula might look like this…

=CONCATENATE(Geocoding!B2,Geocoding!D2,Geocoding!C2)
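For anyone who would rather test the geocoding logic outside a spreadsheet, here is a small Python sketch of the same steps: build the query URL for a place name, pull the lat and lon attributes from the first place element, and join them with a comma. The sample response is made up to resemble Nominatim-style XML and the helper names are my own; whether the service is reachable, or requires a key, is not something this sketch checks.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

BASE_URL = "http://open.mapquestapi.com/nominatim/v1/?format=xml&q="

# Made-up response in the general shape of Nominatim XML search results.
SAMPLE_RESPONSE = """<searchresults>
  <place lat="41.8755616" lon="-87.6244212" display_name="Chicago"/>
  <place lat="41.8333925" lon="-88.0121478" display_name="Chicago area"/>
</searchresults>"""


def geocode_url(place_name):
    """Build the query URL, escaping spaces and punctuation in the name."""
    return BASE_URL + quote_plus(place_name)


def location_from_response(xml_text):
    """Return 'lat,lon' for the first <place>, or '' if nothing matched."""
    first = ET.fromstring(xml_text).find(".//place")
    if first is None:
        return ""
    return first.get("lat") + "," + first.get("lon")
```

Escaping the place name matters more here than in the spreadsheet formula, since names with commas or spaces are common in map metadata.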

Generating Thumbnails and other complicated fields

ContentDM generates a thumbnail image for each item. To create this, simply construct the following URL:

SITE/utils/getthumbnail/collection/COLLECTIONNAME/id/POINTER

For example:

http://cdm16106.contentdm.oclc.org/utils/getthumbnail/collection/p16106coll1/id/2

The above example can be broken down like this:

    • SITE = cdm16106.contentdm.oclc.org
    • COLLECTIONNAME = p16106coll1
    • POINTER = 2

The pointer is available as an element in the XML output of a given collection. For example, note the pointer element in this excerpt:

<record>
<collection>
<![CDATA[ /p16106coll1 ]]>
</collection>
<pointer>
<![CDATA[ 2 ]]>
</pointer>

And so, to construct an IMG tag reference for Timeliner to generate a thumbnail, you would create a field in your Spreadsheet with the following formula:

=CONCAT("http://cdm16106.contentdm.oclc.org/utils/getthumbnail/collection/p16106coll1/id/","2")

Often, when constructing these kinds of concatenations you may want to create a third sheet in your spreadsheet called, for example, “Build” or something along those lines. This is an intermediary sheet to begin massaging complicated data that may need to pass through a few ImportXML and concatenation steps before it is ready for Timeliner.

For example, in order to generate the above concatenation, you would first want two columns to pull from. For example:

  1. one column would have the URL stem:

http://cdm16106.contentdm.oclc.org/utils/getthumbnail/collection/p16106coll1/id/

  2. the second column would have the pointer, drawn from an ImportXML statement:

=ImportXML("https://server16106.contentdm.oclc.org/dmwebservices/index.php?q=dmQuery/lpnc1/CISOSEARCHALL/title!creato!date!descri/title/1000/1/0/0/0/0/0/0/xml","//pointer")
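The same thumbnail construction can be sketched in Python, which may be handy for checking a batch of URLs before wiring up the spreadsheet. The sample below is modelled on the record fragment shown earlier, with a wrapper element and closing tags added as an assumption so that it parses; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITE = "cdm16106.contentdm.oclc.org"

# Excerpt modelled on the record fragment above, closed so that it parses.
SAMPLE_XML = """<results>
  <record>
    <collection><![CDATA[ /p16106coll1 ]]></collection>
    <pointer><![CDATA[ 2 ]]></pointer>
  </record>
</results>"""


def thumbnail_urls(xml_text, site=SITE):
    """Build one getthumbnail URL per <record> from collection and pointer."""
    urls = []
    for record in ET.fromstring(xml_text).iter("record"):
        # CDATA values arrive padded with spaces; the alias has a leading slash.
        collection = record.findtext("collection", "").strip().strip("/")
        pointer = record.findtext("pointer", "").strip()
        urls.append(
            f"http://{site}/utils/getthumbnail/collection/{collection}/id/{pointer}"
        )
    return urls
```

Note the whitespace trimming: the values sit inside padded CDATA sections, so a naive concatenation would produce broken URLs.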

TSV Method

For those who are not comfortable with XML, it is possible to export Tab-Separated Value (TSV) files of ContentDM metadata. This method is not unlike the XML method, except that the TSV file will be imported directly into Google Spreadsheets and the appropriate fields will then be massaged until the data is suitable for use in Timeliner. This can increase the number of interrelated sheets one might need to lead up to the completed Timeliner template.

For example, your spreadsheet might be constructed in the following way:

Sheet 1: Timeliner Template

Sheet 2: Geocoding Template

Sheet 3: Concatenating TSV values (for example, multiple Place fields)

When your data is not pre-structured for Timeliner

Often, ContentDM collections do not have the required fields for Timeliner. In these cases, significant manual intervention will be required. For example, you may have dates combined within the publication field, requiring that a person go through each row and clean up the data so that Timeliner has a simple date it can understand.
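A script can at least take a first pass at this kind of cleanup. The Python sketch below reads a TSV export and guesses a year from a combined publication field, leaving the value blank when nothing plausible is found so that a person knows which rows to review. The sample data and column names are invented for illustration, and a bare year is only a starting point since Timeliner still needs a full date.

```python
import csv
import io
import re

# Made-up TSV export; the column names are assumptions for illustration.
SAMPLE_TSV = (
    "Title\tPublication\n"
    "Map of Chicago\tChicago: Rand McNally, 1893\n"
    "Harbor chart\tPublished by the Survey Office\n"
)

# Matches standalone four-digit years from 1500 through 2099.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def first_year(text):
    """Return the first plausible four-digit year in text, or '' if none."""
    match = YEAR.search(text)
    return match.group(1) if match else ""


def rows_with_years(tsv_text, source_column="Publication"):
    """Pair each title with a year guessed from the source column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["Title"], first_year(row[source_column])) for row in reader]
```

Rows that come back with an empty year are the ones that still need the manual attention described above.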

Examples

Published Google Spreadsheet ready for Timeliner:

https://docs.google.com/spreadsheet/ccc?key=0AjJ_C_koXVI6dE9fcVlPSzBJWkc3TnRtZnJjTGEtWkE&usp=sharing

Timeliner view of ContentDM data:

http://digicol.lib.depaul.edu/cdm/timeline/collection/p16106coll1