Digital Author Services

The producers of information at our academic institutions are brilliant at what they do, but they need help from experts in sharing their work online. Libraries are uniquely suited for the task.

There are three important areas where we can help our authors:

  1. Copyright and Author Rights Issues
  2. Developing Readership and Recognition
  3. Overcoming Technical Hurdles to Publishing Online

Copywhat?

Several libraries are now promoting copyright and author rights information services. These services provide resources (often LibGuides) to scholars who may be sold on the benefits of publishing online but are unclear about what their publishers allow. In my experience, this is one of the most common problems. Academics are busy people, focused on their areas of specialization, which rarely include reading the legalese of their publisher agreements, let alone keeping a paper trail handy. This is particularly true for authors who began their careers before the digital revolution.

At any rate, providing online information followed up with face-to-face Q&A is an invaluable service for scholars.

Lucretia McCulley of the University of Richmond and Jonathan Bull of Valparaiso University have put together a very concise presentation on the matter, detailing how they’ve addressed these issues at their institutions.

Another service, which I’m currently developing at my institution, is providing copyright clearance for scholars. In our case, I hope to begin archiving all faculty works in our institutional repository. The problem has been that faculty are busy, and relying on individual authors to find the time to do the due diligence of checking their agreements just ain’t gonna happen. In fact, this uncertainty about their rights as authors often stops them cold.

In the service model I’m developing, we would request faculty activity reports (or query some other resource on faculty output) and then run the checks ourselves, using student labor, on services like SHERPA/RoMEO. When items check out, we publish. When they don’t, we post the metadata and link to the appropriate online resource (likely an online journal).

Developing Readership & Recognition

Another area where libraries can provide critical support is helping authors grow their reputations and readership. Skills commonly found in libraries, from search engine optimization (SEO) to cataloging, play a role in this service offering.

At my institution, we use Digital Commons for our repository, which we selected partly because it has powerful SEO built in. I’ve seen this at work: a faculty member posts something to the repository and, within weeks (or even days), that content rises to the top of Google search results, beating out even Facebook and LinkedIn for searches on the author’s name.

And of course, while we don’t normally mark up the content with metadata for the authors, we do provide training on using the repository and on the implications of adding good keywords and disciplines (subject headings), which also help with SEO.

The final piece is reporting. With Digital Commons, reports go out to authors every month via email, letting them know what their top downloads were and how many they received. This is great, and I find the reports help spur word-of-mouth marketing of the repository and enthusiasm for it among authors. This is built into Digital Commons, but no matter what platform you use, I think such reporting is a basic requirement that helps win authors’ hearts, drives growth and serves as a vital assessment tool.

Walking The Last Mile

MacKenzie Smith of MIT has described the Last Mile Problem (Bringing Research Data into the Library, 2009), which is essentially where technical difficulties, uncertainty about how to get started and basic time constraints keep authors from ever publishing online.

As I touched on above, I’m currently developing a program to help faculty walk the last mile, starting with gathering their CVs and then doing the copyright checks for them. The next step would be uploading the content, adding useful metadata and publishing it for them. A key step before all of this, of course, is setting up policies for how the collection will be structured. This is particularly true for non-textual objects like images, spreadsheets, data files, etc.

So, when we talk about walking the last mile with authors, there’s some significant preparatory work involved. Creating a place for authors to understand your digital publishing services is a good place to start, and several libraries already offer good examples of such pages.

Once your policies are in place, you can provide a platform for accepting content. In our case (with Digital Commons), we get stellar customer service from Bepress, which includes training users on their tools. At institutions where such services are not available, two things will be critical:

  1. Provide a drop-dead easy way to deposit content, which includes simple but logical web forms that guide authors in giving you the metadata and properly-formatted files you require.
  2. Provide personal assistance. If you’re not providing services for adding content, you must have staffing for handling questions. Sorry, an FAQ page is not enough.

Bottom Line

Digital publishing is a huge area of potential growth. As more and more academic content is born digital, preserving it for the future in sustainable and systematic ways is more important than ever.

The Library can be the go-to place on your campus for making this happen. Our buildings are brimming with archivists, metadata experts, subject specialists and web technologists, making us uniquely qualified to help authors of research overcome the challenges they face in getting their work out there.


New Thoughts on Digital Publishing Services

Back in early 2011, I gave an overview of the library as a disruptive publishing platform. Three years is a long time in “disruptive agent” years. So where do we stand today?

First of all, the publishing industry has not fallen yet…but the great disruption goes on.

A friend of mine was recently describing his neighbor, a charmingly opaque Eastern European rodent-control man whose central point about controlling rats can be summed up in a single pronouncement: “Fighting rats is F@#%ing 24×7 War!”

I’m seeing value in this statement for the effort to liberate information. As I’m learning in my contact with faculty and other librarians, the rat warrens run deep into our institutions. So invasive are their labyrinths that they threaten the very financial underpinnings of our information services.

Luckily, we are not passive observers in this state of affairs. We are active participants in creating something new. We have tools at our disposal to fill in the rat holes with a digital foundation that will ensure a long, fruitful future of open access publishing that will empower our users in ways traditional publishing could never do.

New Openings

I’m seeing a number of openings libraries are beginning to exploit that build on the “library as publishing platform” model I wrote about earlier. Namely, libraries are becoming central hubs for a variety of digital services, including:

  • digital humanities and academic computing support
  • digital project consultant services for everything from how to migrate online content to advice on metadata to search engine optimization (SEO) and usability
  • helping faculty navigate scholarly communications issues from copyright to developing readership and recognition
  • and, of course, providing the place on campus for online publishing

Taken together, all of these emerging services suggest a fairly promising future for librarians interested in transforming the profession into something more in line with current and future trajectories for information.

Ready to enlist as a disruptive agent yet?

Over the next few posts, I’ll explore each of the above and how my library is building new services or augmenting older services to meet these emerging digital publishing needs.

First up, that thing that goes by the very vague and unhelpful term of digital humanities…

Ground Zero for Digital Humanities

At my Library, we have not rolled out a formal digital humanities support program…yet.

Nonetheless, we receive regular, unsolicited inquiries about platforms like Omeka and Digital Commons from faculty interested in creating exhibits and online course projects. To meet the demand so far, we’ve rolled out Omeka.net services, but what people really want is full-blown Omeka with plugins like Neatline and others the hosted version does not support.

Clearly, this organic demand suggests a far more robust DH service is required. As I write, we’ve deployed a faculty survey based loosely on Rose Fortier’s work at Marquette University. With it, we hope not only to build awareness of our digital collections and services (spoiler: early results show 60% of faculty are unaware of our institutional repository, for example…24×7 war indeed!), but also to learn what services, like digital humanities support, would interest faculty.

Based on our Omeka.net experience, my guess is that digital humanities support services will generate healthy interest. If so, we will probably roll out self-hosted Omeka plus Neatline and GeoServer, along with trainings and baseline technical support, sometime in 2015. The one hitch to overcome will be multi-site capability, which would let us install Omeka once and then launch as many separate sites as required with a single click. That particular feature does not yet exist outside Omeka.net, but according to Omeka.org, the forthcoming Omeka 3/Omeka S will provide it, greatly enhancing the practicality of launching an Omeka service for any library.

Meanwhile, as I recently presented at the 2014 Digital Commons Great Lakes User Group, we are also continuing to provide a measure of digital humanities support through our Digital Commons institutional repository. While not as sexy as Neatline, we are, for example, posting the Geography Department’s student-generated Map of the Month in PDF format.

The recently enhanced, zoomable image viewer available in Digital Commons may also help in this regard.

We’ve also seen a few faculty interested in using Digital Commons for student projects, particularly around courses focused on digital publishing issues.

But, of course, as non-librarian content creators enter the collection-building business, they come ill-prepared for overcoming the kinds of problems library professionals excel at solving. And so, this is where I’d like to turn to next: the library as a digital project consultant service.

Separate Beds for ContentDM

I tried to make things work, but in the end, short of a divorce, I told ContentDM if things were going to work out between us, we had to sleep in separate beds.

There’s been a lot of talk about “breaking up with ContentDM,” but for a library without the tech staff to develop its own digital library platform, calling it quits isn’t in the cards…no matter how abusive ContentDM is to us.

Abusive? Well, let’s list it here to be on record:

  • As of this writing, core functionality like the image viewer and search does not work in IE10 due to Compatibility Mode (but then again, IE10 users are just asking for it…time to move on, folks!)
  • phrase search doesn’t work well
  • stop words are not stopped, which is especially bad since phrase searching doesn’t fix this
  • commonly used jQuery UI features cannot be used in custom pages without conflicting with the Advanced Search drop-down
  • Worst of all, once you upload a JS or CSS file, it’s in there for good…no deletions are possible!
  • Objects that are composed of an image layer and an OCR text layer do not display correctly in Firefox (but that’s probably more on Mozilla than OCLC)

So, I knew it was time to draw a line in the bedroom when our attempts at customizing the user experience within the ContentDM web configuration toolset went terribly wrong.

Our jQuery almost always caused conflicts, our attempts at responsive design went horribly wrong within the very unresponsive framework of ContentDM, and the way ContentDM is structured (with separate CSS/JS uploads for each customized collection) spelled long-term disaster for managing change.

Then came the latest release update when even more went wrong (largely in IE10).

In the end, I couldn’t take it anymore, so I called up OCLC and begged them to reset the whole site to its default configuration so we could at least start fresh, without concerns that legacy JS and CSS would cause problems (as I believe they were doing). They were very helpful and, within two hours, had our collections all reset.

We’re doing it differently now as we roll out phased customizations.

Here are our hard-learned best practices:

  • Never upload any custom CSS or JS to ContentDM…at least until OCLC creates a way to delete these. Instead, where you need such things, simply upload a reference file that points to externally hosted files, which you can edit or delete as needed (see the sketch after this list)
  • For the system header, upload your banner image and resist the urge to include any navigation HTML. Instead, use the system menu creation tool. You can use your externally hosted CSS file (referenced globally) to style these links (but if you want drop-downs, you will need to build them using this method)
  • Use a meta tag redirect to force ContentDM to send traffic to your externally hosted home page, since ContentDM doesn’t let you replace the home page completely with an external page without resorting to this trick. Probably not great for SEO, but it avoids the aggravations we endured for so long
  • Use the custom page tools for your collection pages, which allow you to replace the whole page (including the header and footer) with an externally hosted page. In our case, we are doing this for the really important collections; others we manage directly within ContentDM.
  • Put any custom interface features into your externally hosted pages and develop to your heart’s content
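
To make the first and third tips concrete, here is a minimal sketch of what the uploaded reference files and the home-page redirect might look like. The library.example.edu URLs and file names are placeholders for wherever you actually host and name your files:

/* reference.css: the only stylesheet uploaded to ContentDM; it simply pulls in the real, editable stylesheet hosted on your own server */
@import url("https://library.example.edu/cdm/custom.css");

// reference.js: the only script uploaded to ContentDM; it loads the real script you host and maintain externally
document.write('<script src="https://library.example.edu/cdm/custom.js"><\/script>');

<!-- meta refresh placed in the ContentDM home page, sending visitors to the externally hosted landing page -->
<meta http-equiv="refresh" content="0; url=https://library.example.edu/digital-collections/">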

The result: users can now enjoy jQuery-powered interface features and more responsive designs from the home page down to the individual collection pages. If you want to add proven page-turning or timeline technologies to your collection pages, you can now do so without worry. Users only deal with ContentDM once they enter the search results or image viewer pages.

To help with the browser issues, we will be deploying browser checks that will deliver messages to users coming to our site with IE or Firefox, hoping to head off bad user experiences with one-time, cookie-based messages. In other words, the first time you come to our site with one of these known problem browsers (for ContentDM), you’ll be urged to use Safari or Chrome.
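
Here’s a rough sketch of the kind of check we have in mind, running in our externally hosted pages. The user-agent test is deliberately crude, and the cookie name and wording are placeholders; a real version would show a dismissible banner rather than an alert:

// One-time, cookie-based warning for known problem browsers (for ContentDM)
(function () {
  var isProblemBrowser = /MSIE|Trident|Firefox/.test(navigator.userAgent);
  var alreadyWarned = document.cookie.indexOf("cdm_browser_warned=1") !== -1;
  if (isProblemBrowser && !alreadyWarned) {
    alert("Parts of our digital collections work best in Chrome or Safari.");
    document.cookie = "cdm_browser_warned=1; path=/; max-age=31536000"; // remember for one year
  }
})();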

Conceivably, you could use a CMS like WordPress or Drupal to manage your custom pages and start adding timeline, mapping and other plugins as you like. We’ll probably work toward this in 2014 or 2015.

Speaking of user disruption, the other cool thing about separating most of your digital library GUI from ContentDM is that you can work in test environments, out of sight, and only update the public pages once you’ve thoroughly tested them. This was impossible when we tried to work in ContentDM itself, and when things went wrong, users of our digital library saw everything in plain sight.

View the current production version of our Digital Collections.

Yeah, separate beds are just what the doctor ordered. Post any questions in the comments as I’m sure I raced through many of the details…

Using Timeliner with ContentDM

This tutorial is based on some recent experimenting I did linking a ContentDM collection of maps to Timeliner in order to plot the collection items on a map and a timeline. There are multiple ways to make this happen, including using the ContentDM API and Google Spreadsheets to bring the collection metadata into Timeliner.

Background

Timeliner is a hosted application that generates timelines and geo-spatial mappings of a given digital collection. The service is free and can be embedded into any webpage using an iFrame.

Timeliner provides a ready-to-use data template for Google Spreadsheets. An institution need only enter the appropriate metadata from a given digital collection, from ContentDM for example, into predefined columns in the template and then publish that spreadsheet. After entering the URL of the spreadsheet, Timeliner constructs an interactive timeline and map feature.

Timeliner is also open-source and can be installed and developed locally.

Harvesting the Metadata from ContentDM

There are two methods for bringing the data into Timeliner from ContentDM:

  1. via XML export
  2. via TSV export

XML Method

The XML method is preferred, but it requires an institution to add specific fields to its collections that Timeliner can use: for example, a place field that provides a human-readable place name for a given location, or a date field. In other words, if the data in ContentDM is structured in a Timeliner-ready manner, creating Timeliner interfaces for collections can be automated and kept rather simple once basic ImportXML queries are entered into the appropriate Timeliner columns.

Special Note About Errors

For undetermined reasons, ImportXML queries using the ContentDM API noted below will sometimes fail to retrieve data. There are a few possible explanations:

    1. Google limits the number of cells for a given spreadsheet and, importantly, there are limits on the complexity of spreadsheets, such as references to other cells. More information can be found on the Google Spreadsheets Size and Complexity Limits help page.
    2. ContentDM does time out from time to time

An alternative solution, not covered in this document, would be to export the full XML of a ContentDM collection, store it remotely, and have an XSLT transform construct a spreadsheet that could then be uploaded to Google Spreadsheets (or generated with ImportData calls within the spreadsheet). The one drawback is that this method will not update dynamically as new items are added to a collection, so an institution would need to rerun the process each time a collection is updated.

As an example of using the dynamic XML method, a query to retrieve date field data might resemble something as simple as:

=ImportXML(XMLPATH,XPATHSELECTION)

for example…

=ImportXML("https://server16106.contentdm.oclc.org/dmwebservices/index.php?q=dmQuery/lpnc1/CISOSEARCHALL/title!creato!subjec!date!descri/title/1000/1/0/0/0/0/0/0/xml", "//date")

Adding similar queries to each Timeliner column will dynamically retrieve the data without any intervention after the metadata leaves ContentDM. Again, using the above example, the “date” field would need to be entered by catalogers specifically for Timeliner (i.e. using a yyyy-mm-dd format).
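
For instance, a Title column could be filled with the same collection query, changing only the XPath selection (assuming titles come back under a title element, just as dates do above):

=ImportXML("https://server16106.contentdm.oclc.org/dmwebservices/index.php?q=dmQuery/lpnc1/CISOSEARCHALL/title!creato!subjec!date!descri/title/1000/1/0/0/0/0/0/0/xml", "//title")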

Location: Geocoding through open web services

One Timeliner field that is best handled directly in Google Spreadsheets after cataloging, however, is Location, since it can be automated, saving catalogers significant time.

The Location field requires machine-readable latitude and longitude coordinates for a given place. Fortunately, open geocoding web services can be queried from a Google Spreadsheet to retrieve these coordinates.

To spare the author of such a spreadsheet from having to write incredibly complicated formulas, it is recommended to carry out this automation in stages:

  1. create a new spreadsheet with multiple sheets:

    1. the first sheet will be your Timeliner Template

    2. the second sheet will be your Geocoding spreadsheet.

  2. Populate the Timeliner Template with metadata using the above ImportXML method. This will include the Place column which contains human-readable place names.

  3. In the Geocoding sheet, create four columns:

  4. Column A will contain a formula that retrieves the data from the Place column (Column H in this example) of the Timeliner Template (Sheet 1). For the first data row, for example:

='Sheet 1'!H2

  5. Column B of the Geocoding sheet will query a geocoding web service to obtain the latitude. We will use the MapQuest Nominatim-based Open Geocoding API: http://developer.mapquest.com/web/products/open/geocoding-service

In Column B, you can query this service with the following ImportXML formula, where A2 is the first row of data in Column A (assuming your column labels are in the first row):

=ImportXML("http://open.mapquestapi.com/nominatim/v1/?format=xml&q=" & A2, "//place[1]/@lat")

  6. Column C follows exactly the same XPath statement, but replaces the latitude attribute @lat with the longitude attribute @lon.

=ImportXML("http://open.mapquestapi.com/nominatim/v1/?format=xml&q=" & A2, "//place[1]/@lon")

  7. Column D simply needs a comma character entered. This will be used as the separator between the latitude and longitude values in the format required by Timeliner.
  8. Remember to copy all of these formulas down the columns. Google Spreadsheets should calculate the correct values as you do so.
  9. Finally, back in the Timeliner Template, under the Location column, add a concatenation formula to combine the latitude, comma and longitude columns of your Geocoding sheet. The structure is:
=CONCATENATE(LATITUDE,COMMA,LONGITUDE)

your actual Google formula might look like this…

=CONCATENATE(Geocoding!B2,Geocoding!D2,Geocoding!C2)

Generating Thumbnails and other complicated fields

ContentDM generates a thumbnail image for each item. To retrieve it, simply construct the following URL:

SITE/utils/getthumbnail/collection/COLLECTIONNAME/id/POINTER

For example:

http://cdm16106.contentdm.oclc.org/utils/getthumbnail/collection/p16106coll1/id/2

The above example can be broken down like this:

    • SITE = cdm16106.contentdm.oclc.org
    • COLLECTIONNAME = p16106coll1
    • POINTER = 2

The pointer is available as an element in the XML output of a given collection. For example:

<record>
<collection>
<![CDATA[ /p16106coll1 ]]>
</collection>
<pointer>
<![CDATA[ 2 ]]>
</pointer>

And so, to construct an IMG tag reference for Timeliner to generate a thumbnail, you would create a field in your Spreadsheet with the following formula:

=CONCAT("http://cdm16106.contentdm.oclc.org/utils/getthumbnail/collection/p16106coll1/id/","2")

Often, when constructing these kinds of concatenations, you may want to create a third sheet in your spreadsheet called, for example, “Build” or something along those lines. This is an intermediary sheet for massaging complicated data that may need to pass through a few ImportXML and concatenation steps before it is ready for Timeliner.

For example, in order to generate the above concatenation, you would first want two columns to pull from:

  1. one column would have the URL stem:

http://cdm16106.contentdm.oclc.org/utils/getthumbnail/collection/p16106coll1/id/

  2. the second column would have the pointer, drawn from an ImportXML statement:

=ImportXML("https://server16106.contentdm.oclc.org/dmwebservices/index.php?q=dmQuery/lpnc1/CISOSEARCHALL/title!creato!date!descri/title/1000/1/0/0/0/0/0/0/xml","//id")
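
Assuming the URL stem sits in Column A of the Build sheet and the pointers land in Column B (the column letters are just for illustration), a third column can then join the two into finished thumbnail URLs, copied down the rows:

=CONCATENATE(Build!A2,Build!B2)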

TSV Method

For those who are not comfortable with XML, it is possible to export Tab-Separated Value (TSV) files of ContentDM metadata. This method is not unlike the XML method, except that the TSV file is imported directly into Google Spreadsheets and the appropriate fields are then massaged until the data is suitable for use in Timeliner. This can increase the number of interrelated sheets needed to build up the completed Timeliner template.

For example, your spreadsheet might be constructed in the following way:

Sheet 1: Timeliner Template

Sheet 2: Geocoding Template

Sheet 3: Concatenating TSV values (for example, multiple Place fields)
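
For instance, if a TSV export splits a place name across two columns, say city in Column B and country in Column C of Sheet 3 (the column letters here are only an assumption for illustration), a single Place value for geocoding could be assembled with:

=CONCATENATE('Sheet 3'!B2,", ",'Sheet 3'!C2)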

When your data is not pre-structured for Timeliner

Often, ContentDM collections do not have the required fields for Timeliner. In these cases, significant manual intervention will be required. For example, you may have dates combined within the publication field, requiring that a person go through each row and clean up the data so that Timeliner has a simple date it can understand.
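
Some of that cleanup can be semi-automated in the spreadsheet. If, say, the combined publication field sits in Column B (an assumption for the sake of example), a helper column using REGEXEXTRACT can pull out the first four-digit year for a person to review before it goes into the Timeliner date column:

=REGEXEXTRACT(B2,"\d{4}")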

Examples

Published Google Spreadsheet ready for Timeliner:

https://docs.google.com/spreadsheet/ccc?key=0AjJ_C_koXVI6dE9fcVlPSzBJWkc3TnRtZnJjTGEtWkE&usp=sharing

Timeliner view of ContentDM data:

http://digicol.lib.depaul.edu/cdm/timeline/collection/p16106coll1

Omeka.net is Looking Good

We used Omeka.net to help partners at another university publish a database of Catholic letters (not live yet, so it’s not ready to share), and I think everyone was quite taken with the ease of using the hosted version of Omeka. Unlike the locally installed version of Omeka, Omeka.net is a freemium web service that requires little to no coding skill and no server to get started.

Essentially, Omeka.net works like a hosted blog platform such as Blogger or WordPress.com. You have limited options in terms of look and feel, but the underlying collection building tools are there.

What I like about Omeka.net:

  1. Free or very cheap. You can put together a free collection if your needs are small. If, as in our situation, you have to import your metadata via spreadsheet or need other plugins (some require a paid account to use), then you’ll have to pay a subscription fee. But the fees are likely quite affordable for your organization.
  2. Easy peasy. Very little, if any, understanding of web technologies is required. In our situation, we had some problems that required technical thinking, mostly in terms of cleaning up our spreadsheet, converting it to UTF-8 and making it work with Omeka, but this kind of expertise should be easy to find at your institution. Once the platform is set up, non-technical staff can pretty much run it and add to it without issue.
  3. It’s pretty good and just getting started. In the past few months, new themes and plugins have started to come online and I expect that this platform will be very robust in just another year or two.
  4. Your collection will become part of the Omeka.net network of collections, which should help with Search Engine Optimization as well as serendipitous discovery.
  5. It feels pretty rock-solid and reliable.

What I don’t like about it:

  1. It’s still early days and some features from the full Omeka version are not yet available.
  2. Very limited theming, meaning you have only 8 different themes and none of these let you control color. However, you can add banner images and footers to brand it.
  3. The help documentation is fairly helpful for most situations, but if you run into some advanced issues, you’ll have to hope someone on the Omeka forums can help.

Saved by the Cloud

I’ve been playing around with Omeka.net, the hosted version of the digital collections platform Omeka, and have fallen hard for it.

A number of months ago, my university partnered with the National University of Ireland, Galway to find a new home for an annotated catalog of letters and primary documents from the Vatican Archives. It turned out that online access to the collection was in danger due to the financial troubles in Ireland; specifically, the funding for the server and the IT staff required to support it was going away.

So, my library offered its assistance and I began exploring the options.

The ideal platform would have to have staying power, be relatively cheap and satisfy the old website’s feature list as closely as possible. Also desirable, of course, would be a platform built on web standards, simple to maintain and requiring no IT support.

Fortunately, this was happening just after the folks at George Mason University had turned their open source Omeka platform into a hosted service. Omeka is a platform designed around familiar web publishing conventions similar to WordPress, so for administrators, it would be quite easy. However, the traditional in-house server-based version would require at least one full-time IT staff member who could configure a LAMP server, install and configure Omeka and then keep it updated and running.

That would be impossible at my university, where no server (or at least no production server) was available to the library and where PHP (which Omeka is built on) is frowned upon.

But, with the hosted version coming online, Omeka.net, we could meet all of our criteria with additional benefits:

  • No server required…just sign up for an account and you’re ready to get started
  • No IT staff required
  • Simple item and collection management through web forms, making it possible for the researchers to continue adding to their collection without further assistance
  • A growing list of plugins, including CSV imports, Dublin Core mapping, etc. to meet most of the feature requirements
  • OAI-PMH interoperability, making it possible for the collection to be harvested by other systems and uses
  • Plus, the collection automatically rolls up into the growing universe of other Omeka.net collections, enhancing SEO and findability

The only real shortcomings of the system were its very limited theming and some missing plugins, such as faceted browsing and timeline features. However, it’s clearly early days for this blossoming platform and I expect good things to be added in the near future.

The live version will go online soon after we finalize a few graphics and textual decisions. But the collection is now safe and sound and poised to grow and develop in a stable and promising platform.

Rolling out Campus Guides

We began implementing Campus Guides (LibGuides CMS) this week. Two projects will kick off this new platform for us:

  1. Building out new informational pages that target specific campus groups (faculty will come first)
  2. Redesigning the “LibGuides” workflow for guide publication by using groups for admin purposes

The informational pages have been a usability disaster for some time, but we had delayed doing anything about it because our current CMS is its own kind of disaster. Now that we have Campus Guides, though, we can implement some information architecture changes and interface enhancements (using jQuery) to vastly improve the utility of these kinds of pages.

An admin group, which requires admin access to edit content, will serve as the content management system for things like navigation boxes, jQuery features and other code-intensive objects that non-technical staff should not be touching. Another group will house the “guides” themselves, where librarians can access templates embedded with the admin group’s content.

We’ll be doing something similar with our traditional LibGuides content: creating an admin group that contains the “steal this guide” content with things like canned search boxes, book feeds, etc. We’ll also be creating a Test group for these LibGuides, where interesting features can be explored without allowing others to copy such content. Once these tests are approved as public content, we can simply place that new content into the “steal this guide” group for use across the system.

I’m curious if others out there have tried this approach. On paper it looks great. We shall see…