Exporting bibliography from library catalog to XML

As part of a LibGuide project that I was working on recently, I wanted to export the bibliographic data for rare Bibles in our Mack Library collection so that I could generate a list for public viewing. The rare Bibles in our collection are housed in two rooms, the Jerusalem Chamber and Special Collections, and because few people ever see the books in Special Collections I thought it would be nice to create a browsable list. Our current library catalog does not make it possible to browse the items in a particular room (or “location code” in library science terms).

The process I came up with involves three major components:

  1. Finding and exporting the bibliographic records in Millennium using Create Lists
  2. Filtering, finalizing, and structuring the data in Excel 2007
  3. Converting the structured data into clean HTML with XSLT

I’ve posted my files in a github repo for others to adapt.

Step 1: Export records from Millennium

To create my list, I did an item record search based on location code, but stored the bib records in my list.

I then exported all the data to text. Because some of the variable-length fields included commas, I used tabs (“Control character” 9) as my field delimiter.

Step 2: Manipulation in Excel 2007

For my situation, data processing in Excel involved a number of small steps. Steps 2.2 and 2.3 may not be necessary for every use case.

2.1 Import data

A number of the titles in our collection are not in English, and so I had to specify that the imported data was Unicode (UTF-8). I found out that Excel 2003 does not support importing UTF-8, but Excel 2007 does.

On step 3 of the import wizard, I set the data format for all columns as “Text” rather than “General” to avoid any unwanted conversion of my bibliographic data.

2.2 Filter records (if necessary)

Because our Special Collections room includes many items besides the Bibles that I was interested in, I needed to filter my data. I did a quick filter for all items with Uniform Titles beginning with “Bible.” and used Select Visible Cells to copy and paste the data I wanted to a new worksheet to prepare for export.

2.3 Produce truncated bibID

The current version of the Innovative Web catalog at my institution does not respond correctly to a full 9 character bibID. Instead, I have to truncate the checkdigit at the end to get a valid link. Now, since that may not make much sense, let me just illustrate:

The first example does not include the checkdigit “6″ at the end. Since the checkdigit is included in the data exported by Millennium, I created a new column at the end of my data and populated it with this Excel formula:

=LEFT(A2,8)

where “A2″ is the reference to the cell containing the untruncated bibID.

2.4 Match columns to XML Schema

In order to prepare my data for export, I need to match each column to an attribute in a custom XML Schema so I can export structured data that I can meaningfully interpret into HTML.

To do this, I’ll need to create my schema file, load into Excel, and match each element to the appropriate data columns.

My XSD is as follows, and can be easily adapted to another situation.

<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Bibliography">
    <complexType>
      <sequence>
        <element name="Item" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <element name="ItemTitle" type="string" />
              <element name="ItemUniformTitle" type="string" />
              <element name="ItemAuthor" type="string" />
              <element name="ItemCallNum" type="string" />
              <element name="ItemEdition" type="string" />
              <element name="ItemPublication" type="string" />
              <element name="ItemDescription" type="string" />
              <element name="ItemSeries" type="string" />
              <element name="ItemNote" type="string" />
              <element name="ItemSubject" type="string" />
              <element name="ItemAddAuthor" type="string" />
              <element name="ItemAddTitle" type="string" />
              <element name="ItemISBN" type="string" />
              <element name="ItemTOC" type="string" />
              <element name="ItemBibID" type="string" />
              <element name="ItemBibIDTruncated" type="string" />
            </sequence>
          </complexType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>

Once I have a valid schema, I load it into Excel through XML Source window on the Developer ribbon. To add a new Schema, click the “XML Maps” button and then “Add.” Then I drag each attribute in the XML tree visible in the XML Source pane and drop it on the appropriate column in my data until the result looks like this:

2.5 Export to XML

To generate XML from my formatted file, I click the “Export” button in the XML section of the Developer ribbon and save to file. Excel will follow my custom schema to produce structured data that I can use to generate HTML output.

Step 3: Generate HTML with XSLT

Now that my data is in XML, generating HTML is a simple matter of applying an XSL stylesheet. My stylesheet is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no" encoding="UTF-8"/>
  <xsl:template match="/Bibliography">
    <xsl:apply-templates select="Item">
      <xsl:sort data-type="text" select="ItemUniformTitle" order="ascending" />
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="Item">
    <p>
      <xsl:variable name="bibID" select="ItemBibIDTruncated"/>
        <xsl:choose>
          <xsl:when test="ItemUniformTitle != ''">
            <b><xsl:value-of select="ItemUniformTitle" /></b>
            <xsl:text> </xsl:text>
            <xsl:value-of select="ItemTitle"/>
          </xsl:when>
          <xsl:otherwise>
            <b><xsl:value-of select="ItemTitle"/></b>
          </xsl:otherwise>
        </xsl:choose>
        <xsl:text> </xsl:text>
        <xsl:value-of select="ItemPublication" />
        <xsl:text> [</xsl:text>
        <a href="http://library.bju.edu/record={$bibID}" target="_blank">
          <xsl:value-of select="ItemCallNum" />
        </a>]<xsl:text/>
        <xsl:if test="ItemNote != ''">
          <br/>
          <i>
            <xsl:value-of select="ItemNote"/>
          </i>
        </xsl:if>
    </p>
  </xsl:template>
</xsl:stylesheet>

How to easily add customized links to an intranet page

For a long time, I always wished I had an easy way to add a custom set of links to my university’s intranet home page so I could easily access the pages for my current classes.

The idea was to have it look something like the image below. Notice the “Class Links” section above “Home.” The original page only had the “Home” section.

Custom class links embedded into my campus home page

Originally, this seemed to be a bit implausible. Now, thanks to Greasemonkey and Instapaper, it’s actually pretty easy. In fact, the solution I’ve built is general enough that it could be easily adapted to a number of similar situations where you want an easy way to personalize a set of links on a non-interactive page.

I thought that some of my classmates might be interested in this, so this post is to explain how to adapt my code for yourself. This will require some very basic JavaScript skills at one point, but hopefully I’ll be able to make that part easy enough. Continue reading

Birthday surprise

My wife and I wrote this little program for my little brother, who is a freshman computer science major. It was a lot of fun, and brought back memories of my C++ days in school.

Obviously the programming wasn’t challenging, but getting the parameters right was a lot of fun.


#include <iostream>
#include <windows.h>

#define doo 262 //frequency do 262 Hz
#define re 294 //frequency re 294 Hz
#define mi 330 //frequency mi 330 Hz
#define fa 349 //frequency fa 349 Hz
#define sol 392 //frequency sol 392 Hz
#define la 440 //frequency la 440 Hz
#define si 495 //frequency si 495 Hz
#define od 523 //frequency do 523 Hz

using namespace std;

int main()
{  
  
  Beep(doo,100);
  Beep(doo,100);
  Beep(re,180);
  Beep(doo,190);
  Beep(fa,200);
  Beep(mi,300);
  Beep(doo,100);
  Beep(doo,100);
  Beep(re,180);
  Beep(doo,190);
  Beep(sol,220);
  Beep(fa,300);
  Beep(doo,100);
  Beep(doo,100);
  Beep(od,260);
  Beep(la,260);
  Beep(fa,220);
  Beep(mi,200);
  Beep(re,300);
  Beep(466,100); //b-flat is 2^(1/12)*440 http://en.wikipedia.org/wiki/Note#Note_frequency_.28hertz.29
  Beep(466,100); //b-flat is 2^(1/12)*440 http://en.wikipedia.org/wiki/Note#Note_frequency_.28hertz.29
  Beep(la,260);
  Beep(fa,300);
  Beep(sol,300);
  Beep(fa,300);

  return 1;
}

Microsoft LifeCam VX-3000: Overview

This Christmas, my mother got the Microsoft LifeCam VX-3000 2 pack so that we can video chat with her when we’re at our home. Although the camera is certainly not a high-end webcam, it is enough for basic video chat needs. I won’t write a comprehensive review here, my purpose is just to clarify a few things that I’ve learned about the camera and then document how I set it up on my mom’s computer so others can properly configure things if anything later goes awry. For a helpful review of the VX-6000 (which appears to be the same camera except for a wide-angle lens and probably a few other perks), see this review from a guy in New Zealand.

The camera itself was fairly easy to install, and the proprietary Microsoft software works pretty well (although it’s a little too resource-intensive for my taste). You don’t have to install Windows Live Messenger to use the camera, but you will need it or Skype in order to make calls with the camera. Continue reading

Burning CDs and DVDs in Windows XP… without unusual punishment

Someday, burning optical media like CDs and DVDs may be a fully supported operating system feature. Until then, Windows users have to use burning software packages that unfortunately tend to bundle complexity and cost with a lot of frustration. Windows XP does support minimal burn operations when blank CDs and DVDs are opened as “writable” windows in My Computer, but this is not enough for creating anything beyond basic data disks.

I have used commercial burn software packages (like Nero and Roxio) before, but I chose not to use these packages again for a few recent projects. Here’s why:

  1. For a home user who rarely (read, almost never) needs to burn media anyway, it is difficult to justify the cost of either package.
  2. Both Nero and Roxio bundle a lot of useless junkware into their packages (perhaps partially to justify their pricetag) and all I need is basic burn functionality.
  3. Removing remnants of an expired “trial” version of either is not a task I relish. I’ve found (and Wikipedia concurs ((Wikipedia’s article on Roxio’s “Easy Media Creator” states “Once installed, Easy Media Creator is difficult to uninstall. See the Roxio Community Forum, containing over 5000 posts concerning this topic. Roxio’s official six-step uninstall process, which includes registry edits, is available at ….” ))) that even licensed copies of these programs are hard to remove.

With this in mind, I set out to find open source or freeware burn software without needless frills. I was looking for actively developed projects which might continue to be useful in the future. Two programs which do just that are InfraRecorder and ImgBurn.

InfraRecorder is an open-source project, and it supports creating various data and audio CDs and DVDs (at this writing, it is in version 0.45). It does not yet support the creation of video DVDs, but it has a fairly simple interface, and enough basic documentation to help most users. I used it to burn a a large collection of MP3 files to a DVD without any problems.

ImgBurn is freeware, and supports data, audio, and video for CDs, DVDs, HD-DVDs, and  Blu-ray (at this writing, it is in version 2.4.2.0). The interface is very clean and operation is straightforward. I probably like this program better than InfraRecorder because its basic approach appeals to me.

Rather than modeling all its operations like other burning software which combine high-level tasks like “Copy disk” or “Burn data disk” with lower-level processes such as creating a disk image, ImgBurn does everything from a low-level perspective. If you want to copy a disk, you first run the process to create an image file of the original disk, and then you insert a blank disk and burn the image file into the new disk. Although this approach may seem more technical to a new user than a higher-level approach, this approach is so simple (and the tutorials so straightforward) that when I first used it I feared that I was missing some detail simply because it seemed too easy to be true.

Amazingly, with all this simplicity, it really does support a high degree of customization for advanced users. The complexity is accessible, but you don’t have to deal with it unless you seek it out.

I’m glad I finally put some effort into finding burn software for future projects. Both of these are good programs that don’t annoy me in any of the ways the commercial packages did, and hopefully they’ll still be useful as technology and media standards progress. At least for now, these two will probably remain my burn packages of choice.

Clean pastes with Excel VBA

I’ve talked about some of the Excel VBA I’ve done before, and here’s a new thing I learned today.

Problem

One part of my application selects a range from one worksheet in the spreadsheet and pastes it onto another. Originally, the code I used looked like this:

destinationWorksheet.Range("A2").Select
ActiveSheet.Paste

There is one problem I was unaware of. Every time the .Paste operation executed, it added a new “Named Range” to the ActiveSheet. I only realized this when snooping around in the Properties for the Excel file, where I found (to much dismay) 840 different “Named Ranges” referring to data long since thrown away! An older version of my application, which used this unspecified Paste operation much more frequently, had even more of these attached to numbers of other worksheets. Continue reading

Lazy list management

I use a couple awk scripts to manage a list of students who come to church with me every week. When a student tells me he will be coming on Sunday, I put an asterisk (*) in front of his name in a static text file with everyone’s names. If they tell me they won’t be coming, I put a minus sign (-) in front of their name so I remember that they said so. If for some reason I’m not sure, I put a question mark (?).

My main script spits out a list of all the students who are coming (i.e., have an asterisk). There’s no genius to it, I’m just logging it here for reference.

I also have another script that wipes all the attendance marks out of the main list file. Continue reading

Producing valid RSS from my XSL Transform

In a previous post, I showed the XSL transform I designed to transform a simple XML file I use into “(almost) valid RSS.” I called it (almost) valid because of one problem, which I address here.

The problem reviewed

My original XSL assigned an arbitrary ID to each entry in the news file based on its order in the list. What this meant was that the first item received the integer 1 for an ID, the second 2, the third 3, etc. Here it is:

<!--old method, produces invalid link & guid-->
<link><xsl:text>http://www.duncanandmeg.org/news.php#</xsl:text>
  <xsl:value-of select="position()" /></link>
<guid><xsl:text>http://www.duncanandmeg.org/news.php#</xsl:text>
  <xsl:value-of select="position()" /></guid>

This was fine for my original purposes, but since this hardly represented a unique & permanent identifier for each post, it caused some problems when echoed into the RSS <guid> and <link> elements. The significance of the <guid> element in particular is eliminated with such an approach. Every time I added an item to the list (see the previous post for an explanation of my schema), it would be assigned the ID of 1, and the older posts would be assigned the ID numbers 2, 3, 4, 5 etc.

This meant that most RSS aggregators would not detect that a new post had appeared and would not update accordingly.
Continue reading

Fixing memory leaks in Excel VBA… (we hope)

One project I constantly return to is an application I built in Excel VBA that does a lot of different things. One specific problem remains unsolved with this application, so I’ve decided to throw it out here and see if any gurus have an answer.

Basically, as a result of some Excel memory leaks beyond my control (I can’t install recommended hotfixes on the computers where this application is used, and need to be able to handle the issue with/without hotfixes anyway), when I shut down Excel it typically remains stuck in memory (and visible under “Processes” in “Task Manager”) even though it looks like it is gone to the user.

Paranoia

I realize that many would argue that since this bug is Microsoft’s problem and not mine, that I shouldn’t trouble with addressing it. However, since this application I’ve built is something that gets used frequently (but not constantly) throughout the day, I can imagine this becoming a problem (or at least wasting resources) at some point.

Another rejoinder to this charge is quite simple: I’ve actually ignored this issue for about a year now already, and am only revisiting it because I’ve learned some things about VBA that enable me to possibly defeat it. If I didn’t have any hope of fixing this, I wouldn’t try. Continue reading