SQL for Librarians

Here it is, SQL for Librarians.  I closed out the Cambridge Evergreen Conference (for good or ill) and actually kept a few folks there until 12.  I got a lot of great comments, so I think it was fairly successful despite my being a tad loopy from allergy medication.  And I blame the medication for a few things that made me cringe upon listening to the recording.  In a perfect world I'd love to do this again in a full workshop format.

Slides: http://www.slideshare.net/roganhamby/sql-for-librarians

YouTube: https://www.youtube.com/watch?v=3Iz-HFiDq6E

Conversations At ALA About ILSes

Normally I leave my rabid pro-FLOSS, pro-Evergreen attitude for the web.  In person I make a conscious effort not to be so forward, as it's usually a hindrance to meaningful conversations.  Today at ALA in Vegas I threw that rule out the window.

I didn't do it right away, but eventually I was worn down.  Worn down by what, you ask?  Since this morning, I've had six conversations with people bitching about their ILSes.  And their complaints were legitimate.

"I went to Blue ILS years ago and it was great but the founders left and they're now evil corporate sociopaths who abuse us regularly."

"I was about to go to Red ILS which is great with great support but they just got bought out by evil sociopaths and I don't feel good about this anymore."

Valid concerns.  What annoyed me was the fatalism.  "Whatcha gonna do?"  Go open source.  There, I solved it for you.  I told the last one that in those terms.  I usually say it anyway, but with more respect for the difficulties their situations present.  But I'm tired of having those issues used as excuses for why libraries should allow themselves to be abused.  The difficulties make things non-trivial, maybe even hard, but not impossible.  And it is the answer.  If you're not being abused, if you just wish things were better and are willing to live with it because you have higher priorities, then that's fine.  But if your voice sounds like you're beaten regularly when you talk about your ILS vendor ... yeah, you need an intervention.

So, how do you do this?

Well, you could host yourself in which case you only have to trust yourself.  But, that may not be efficient.  I use hosting from Equinox Software.  My hosting and support provider has the advantage of expertise from hosting many installs and economies of scale.  Why do I use Equinox? Because I trust them.  Why do I use Evergreen and open source?  Because I don't have to trust them tomorrow.

Implicit in the complaints is lock-in: whoever they go with for support owns the software.  Changing support means changing software, which is a huge deal.  But when no one owns it, everything is different.  My contract allows free access to my data.  If the leadership changed at Equinox, I would just change service providers.  My users won't know.

And yes, that's why open source is the answer.  There's no reason for anyone to ask me if I'm happy with my ILS support because if I wasn't, I'd just change it, year to year if I had to.  And that's a very good thing for my library.  

Sharing is Good Business

I was thinking today about intellectual property in the library world.  Specifically, what prompted my musings was Elon Musk's blog post about patents and Tesla Motors last week.

http://www.teslamotors.com/blog/all-our-patent-are-belong-you

Already the global news cycle has come and gone on it.  But, I think it's interesting to think about the obligations of those interested in shaping social change and how intellectual property plays a part in that.  Next week I will be at ALA and one of my favorite parts of a large event like ALA is looking at the vendor floor filled with businesses eager for my library to accept an invoice.  But the question is, are they people I want to do business with?

Let me back up a bit.

Elon Musk has stated that his goal isn't merely to build successful businesses but to push the world forward.  And he's now realized that this has to inform how he, as a capitalist, interacts and shares with others.  He wants to help create a common baseline any manufacturer could build a vehicle on.  Should all businesses have an obligation to help move the world forward rather than just their profit margins?  Is there a possibility of one day developing a core set of freely shared technologies that anyone could build an ILS from?  Oh, wait a minute, that's already being done ...

Stepping back a bit more: the industrial age was dominated by the development of technologies that allowed goods to be made with greater precision and in greater volume than ever before.  Often the knowledge of how to do this was freely stolen.  Note, I don't say it was shared, but once a competitor acquired the knowledge of how to do something there was little going back.  I'm not saying this was good, but it was an aspect of an age of expansion.  We did socially reap benefits from information being distributed, even illegally.

The information age finds us making goods out of information itself.  Never have we been so well prepared to defend intellectual property.  As a society we litigate, comprehensively and aggressively.  Maybe we instinctively hoard information because we know it is valuable.  And libraries, entities that should be at the forefront of sharing, that make their very existence from making information available to patrons, are as guilty of not sharing as anyone.

Fortunately, that is changing.  OCLC is embracing the Open Data license and encouraging its members to do the same, which is a wonderful thing.

http://opendatacommons.org/licenses/by/

I would love to see this go further, into institutional data far beyond what is collected at the state and federal level.  I periodically find archives being loaded onto the web by libraries using some variant of the Creative Commons licenses, also a good thing.

http://creativecommons.org/

And finally, a few libraries are embracing open source.  The licenses vary by project, but the heart of all of them is allowing new tools to be built on existing ones.  There are many small projects, like code libraries to handle data types or protocols, but those are building blocks.  Musk realized that he had to start sharing how you stack the building blocks: how you make the big stuff.  Are we doing that?  Frankly, Koha and Evergreen are pretty big things, so companies involved in improving those are already building, and changing, the future.  Opening ILSes, making them freely available, giving powerful tools to everyone regardless of income, valuing knowledge over money: these things change the world if they gain enough adoption.  Don't believe me?  Look at Linux, Apache, PHP, MySQL, Postgres, Perl and so on.  If you think widely adopted open source products haven't changed the world, you live in a state of denial.

Musk realizes that companies need to change the world, as there is a role there no one else will fill.  His businesses are means of supporting positive change as well as generating profit.  Being open is not anti-capitalist; it's an adaptive strategy for a changing world.  He invokes the ideology of the FLOSS movement in his blog entry even if he is not participating in it.  The simple act of adding to the base of freely usable, functional technology gives the future a deeper toolbox with each contribution.  That is a morally virtuous act he wants to align with, even if he can't fully adopt it due to the nature of the patent system.  But it's not the moral element that fascinates me about Musk's entry; it's the implication that by opening access to technologies protected by his patents, essentially vowing not to pursue his intellectual property rights, he is declaring that it is the ethical mandate of his company to share.  To rephrase and repeat, like any good reference librarian: Elon Musk is saying that profit is not the sole ethical mandate of his company.

I would call on library corporations to look at what they can open source, or at least share by some means.  I would say that library vendors who don't share where they reasonably can are acting immorally.  Note, I don't say unethically; that is an entirely different matter, determined by their own corporate structure.  In fact, I worry that they have an ethical obligation, in their roles, to do immoral things.

What can they share?  I don't know.  Clearly it's not realistic for an ILS vendor to GPL their entire codebase and dump it on GitHub.  But I find it hard to believe there isn't anything they can share.  Maybe a library of code for RDA checking.  Maybe a Z39.50 server.  Maybe a network diagnostic tool.  Maybe data about usage needs.

Elon Musk's blog post raised eyebrows because he rejected the idea that hoarding information is a business's ethical obligation.  We already have vendors who support open source and believe that sharing is an ethical requirement of being in the library community.  I think libraries should hold vendors to that standard.  I want to support companies who act in a manner I think is both ethical and moral in supporting not only my library this fiscal year but the libraries of tomorrow, and open source is a big part of that.

VirtualBox Image for Evergreen Redux

Long, long ago at a Hack-A-Way far, far away ... well, actually it was just Michigan last month, Yamil Suarez and I talked about the challenges of getting folks to work on things like documentation and bugs when setting up Evergreen could be very challenging and time consuming for them.  That stuck in my mind, and then I was reviewing some QA work done by Jason at ESI using a script originally developed by Bill Erickson, also at ESI.  I immediately liked the idea of building a Wheezy image with the host changes, git, xulrunner 14, etc. preinstalled, so that with git scripts you could build a new server, with any changes to the configuration coming from the scripts.  I've gone this far because I'm pretty sure this will be useful to me at least.  Now I want to share it with the community and see if others find it useful for the same purposes I imagine.  If so, I will look at doing tutorials for VirtualBox, this process, and eventually git to help others.

With this image, twelve lines of terminal commands, and responses to eleven prompts (things like 'yes', 'yes', 'evergreen', 'evergreen', and hitting enter seven times) you can get a fully functioning, testable install of Evergreen.  [ Oh, and you might have to click on a couple of GUI elements, like opening the terminal.  :)  ]

Here are the terminal commands:

./grab_script.sh                      # respond to a few prompts in the script
su opensrf                            # switch to the opensrf user
osrf_control --start-all --localhost  # start all the OpenSRF services
exit
/etc/init.d/apache2 start             # start Apache
su postgres                           # switch to the postgres user
./grab_data.sh                        # load the database
exit
su opensrf
cd /openils/bin
./autogen.sh                          # rebuild Evergreen's generated config files
exit

 

Hack-A-Way 2013 Day 2

I should have said this at the outset of yesterday's post: Hack-A-Way 2013 is hosted by Calvin College and sponsored by Equinox Software.  I have no obligation to mention them in this forum, but they both deserve the recognition (and far more).

Priority one for day two was finding out how to hack Hangouts so that my typing didn't mute the microphone (which they couldn't hear anyway, since I was using an external microphone).  Some quick googling uncovered that this is a common complaint among people who use Hangouts for collaboration and that there is an undocumented tweak requiring only minimal terminal comfort.  I'm still tempted to get a second laptop to make it easier to position the camera, though, and I'm definitely bringing the full tripod next time.  But, AV geekery behind me ...

We started with reports on the work the day before.  

Ben Shum reported on work on the mobile catalog.  That group was the largest of the working groups and had laid the groundwork by deciding the goal was full TPAC functionality.  The team worked on separate pieces and moved files into a collaborative branch on the working repository.  A lot of the work is CSS copied from work done by Indiana, as well as de-tabling interfaces and using DIVs.

Our table worked on a proof of concept for a web based staff client.  Bill Erickson had previously done a Dojo based patron search interface and uncataloged item checkout as a proof of concept.  We worked on fleshing that out, discussing platforms for responsive design and what would be needed for baseline functionality (patron search, checkout, items out, renewals) and, later, bills.  This is less a demo at this point than a proof of concept, but one goal is to have something that might in a very limited way, with some caveats, also help those suffering from staff client memory leaks by providing something that can handle checkouts without the staff client.  It is also bringing up a lot of conceptual questions about the architecture of such a project.  A working directory and dev server are up.  Most of the work on this is being done by Bill and Jeff Godin with input from the rest of us.

Lebbeous Fogle-Weekly reported for the serials group.  They targeted some specific issues, including how to handle special one-off issues of an ongoing series, and discussed the future direction of serials work.  In fact they have already pushed some of their work to master.  However, because of their narrower focus they are going to break up.

Jason Stephenson worked on the new MARC export and has a working directory up.  The new script is more configurable.  At this point I unfortunately missed some of the conversation due to some issues back home I had to deal with, but apparently, in a nod to Dan Scott, MARC will now be MARQUE.

In evaluating the 2.5 release process we spent a lot of time discussing the mostly good process and the big challenges the release manager faced.  The community goal has been making more stable releases.  During this release Dan Wells added more structure, which was good; the milestones and the pointing out of bugs helped, but he also wanted feedback, which was really hard for developers who were simply very happy with his work.  There are challenges, though, and solutions are right now elusive.  Kathy Lussier addressed DIG (Documentation Interest Group) concerns about documentation: ESI does a lot of the documentation work for new features, but work not done by them is often left undone.  We had 380 commits since 2.4, with the biggest committers being Dan Wells, Ben Shum and Mike Rylander.  Is that sustainable?  A rough guess is that those are half bugs and half features, which is an improvement over the past.  Do we need to loosen review requirements?  Do we do leaderboards as a psychological incentive?  There was concern that some would lower standards to increase their numbers.  After discussion that lasted longer than we had planned, the decision about selecting a 2.6 release manager was put off to let folks think about these issues more.

Discussion also wandered into QA and automated testing.  A lot of progress has been made here since the conference.  In regards to unit testing there was a consensus that while it's a great idea, it won't have a significant impact for a while.  Right now the tests are so minimal that they don't reflect what real data does in complex real world environments, and it will take time finding those issues and writing more tests before the work has its payoff.

Art.  Kinda looks like a grey alien to me.

I won't try to re-capture all of the conversation, but maintaining quality and moving releases forward were discussed in great depth.  There was less interest in discussing 2.6 than in really trying to clean up and make sure 2.5 is solid.  The decision about who would be the 2.6 release manager was put off, and the idea of a leaderboard to encourage bug squashing was proposed.  A "whackin'" day targeting bugs, like Koha does, was also floated.

I spent a lot of the day looking at some great instruction Yamil Suarez put together for installing OpenSRF and Evergreen on Debian for potential new users, and chatting with Jeff and Lebbeous about the need to beef up the Concerto data set with new serials and UPC records.  Other projects included looking at the web site and starting conversations about users, merchandising, IRC quotes, and so on.

By the evening we had a nice dinner, and a group of us headed out to Founders for a drink and a walk around downtown Grand Rapids to look at the ArtPrize installations, which were quite nice.

 

Evergreen Hack-A-Way 2013 Day 1

Note: this is not a comprehensive report, just my notes from my memory.

I'm writing this as I eat a waffle at breakfast on day 2.  Day 0 was Monday, when folks gathered at the conference center for dinner, but it was Tuesday that things really started.  Starting at breakfast everyone was immediately in work mode.  Talk was heavily on the future of the staff client and the other big issues that we've all been waiting to hash out in person.  We wrapped up grub and headed as a group to the Calvin College library, which is kindly hosting us in its conference facilities.  Power was at every table and coffee soon appeared.  The wifi wasn't perfect, but we may have been pushing its limits.  And those really are the three critical needs of this crowd: power, wifi and caffeine.  I found myself in a bit of an AV geek role, hosting the Google Hangout and coordinating things with IRC a little (and I certainly wasn't the only one multitasking back and forth, so remote folks were involved).

As we gathered (after laptops were set up), discussion immediately centered on the future of the staff client.  We discussed the issues with xulrunner.  Dan Scott noted that he had talked to Mozilla folks at a Google event and they were surprised at our use of xulrunner, noting that wasn't its purpose.  Certainly newer versions cut off critical functionality, and memory leaks are an ongoing concern.  With all of this in mind, everyone was firmly in favor of moving forward somewhere.

Ben, Kathy, Bill in the lobby Tuesday night.

As we discussed where to go from xulrunner and what to go to, the discussion became web based client, yea or nay.  Although there were participants with a preference for a local staff client (specifically Java based), the web based arguments took the day, and those who preferred a local client were willing to support a web based one.  Discussion centered around using modern Dojo, and Chris Sharp spent the afternoon seeing how it would work with Evergreen 2.4.  Everyone was concerned about the practical issues: how, with the community's limited resources, we could implement a web based staff client in stages and get testing and engagement.  The consensus was that we need to live with the staff client for a while but move away from xul within it, draft standards for the new staff client, and, if we can, move to modern Dojo so that staff client interfaces might be largely portable to a web based one.  We also discussed how to handle things that can't be done in a browser, whether those should be handled by a small local app versus a plugin, and best practices around offline mode, staff client registration and printing, along with various Windows OS concerns, with authentication being maybe the trickiest.  Offline with modern HTML5 was one of the lesser concerns.  Many words were also committed to how many and which browsers should be supported; although no absolutely final answer was given, most folks seemed to agree that the community should support Chrome and Firefox and that individual Evergreen members may support others.

Collaborative notes were done by several parties and remote participation was good, both of which I was happy about.

After discussion we broke into groups looking at MARC export, the web based staff client proof of concept, serials, and the mobile OPAC.  In between talking about the staff client, I worked on some merchandising and web site issues for Evergreen (as well as handling some SCLENDS and York issues as they popped up).

We worked until we were getting fairly punchy and broke to freshen up and head to dinner.  I ended up at a great Thai place with good spicy food.  After that we did what I called the Hack-A-Way Lobby version, with eight of us, until I ran out of steam at 11:30.  Today is a new day.

Sound and Fury

Well, I got home from a road trip to find my comp copies of the July/August Computers in Libraries and some emails waiting for me!  I sat down to re-read the article because frankly I wrote it long enough ago that I don't remember much of what I wrote.

http://www.infotoday.com/cilmag/jul13/index.shtml

The article is about open source, including Evergreen, and selecting an ILS.  A few quick things:

1) They gave it a nice attractive spread.  That's vanity on my part but I like it. 

Front spread of the article.

2) I'm still happy with my opening paragraph.  "Few decisions cause a library director to fret more than choosing a new integrated library system (ILS).  No matter what you acquire, a new ILS is expensive in terms of money, staff, time and stress.  Additionally, the wrong choice can damage morale and have lasting consequences.  Sometimes it is easy to identify which ILS is wrong for you - the contract costs are too high or maybe the features that you need aren't present.  But, too often, selecting the right one is like going to a car dealership where everyone speaks in tongues and the price lists are encrypted."

3) They re-used an old bio bit for me from my days working at the State Library which is wrong.  I'm at the York County Library System now.

Now, for the email I got and my response:  

From Greg, full name withheld to protect the guilty :)  :

"I just received my copy of the publication "Computers In Libraries", July/August 2013. I thought your article "Sound and Fury" was an excellent guide for libraries considering a migration of their library systems, but I was a bit surprised that you cited "LibLime Koha and Evergreen" as examples of open source ILSs. I rather suspect that many open source people would regard LibLime Koha as open source only by the letter of the law, and not by spirit or community. Evergreen is indeed an excellent example of open source software, but I wonder if it suffers by its apparent close association in this context with LibLime Koha.
Koha (!= LibLime Koha) is a much more openly developed and community supported example of an open source application than the LibLime fork. Your article deals very well with the subject of selecting vendors; the paid-support page for Koha (http://koha-community.org/support/paid-support) lists 37 vendors world-wide (if my quick count is correct and deducting two entries for PTFS). I'm under the impression that only PTFS supports LibLime Koha, but perhaps there are others. Many of the listed Koha service providers provide hosted application (ASP) solutions as you mentioned in your article.
A quick count of my Koha mailing list messages for July 24-31 shows 86 entries (sorry, I got tired of counting after going backwards for one week), that probably extrapolates to about 350 messages per month. I don't follow the free support for LibLime, but I've been told that it's more questions than meaningful answers. Link of possible interest: https://listserv.nd.edu/cgi-bin/wa?A2=ind1308&L=web4lib&D=0&P=15401
Code contributions to the Koha development process are encouraged, with contributions and downloads available on a git code management system, and packages are available for Debian-based operating systems. Koha also has an IRC channel where developers discuss issues, and where users can <mostly> ask questions and get answers to problems they are experiencing. I'm not aware that LibLime Koha is as openly developed or freely supported. 
Again, I thought your article was excellent, but have misgivings about your citation of LibLime Koha instead of Koha as an example of open source software."

My ill-thought-out but honest response:

"Hi Greg,
I appreciate the feedback.  Looking back at the article I'm a bit chagrined about that.  I admit I'm an outsider in the Koha community, though I have a fondness for any open source library project.
Just last week I got a chance to chat at length with a gentleman [name and association redacted to protect those who didn't give permission to be used].  He actually reached out to me because of an upcoming talk I'm doing.  I was aware of some community conflict with LibLime, but he gave me a lot of context on the Koha vs. KOHA issues.  Suffice it to say that if I had known, I would have mentioned Koha differently.  Technically what I said is correct, but it obviously doesn't address the serious community concerns there, and looking at community is central to the issue I wanted to discuss.
Maybe on some level it's best not to have written about that there.  It really is an issue that deserves discussion in more depth.  I've thrown out the idea to the editors of CiL of doing an all open source issue (it's been about four years since they've done one).  If that happens I would love to work with someone to write about the Koha community issues in more depth.  Still, whether it was the place for it or not, I think I would have written that bit a wee bit differently.  I'm always glad to get opportunities to trigger discussion (even if the price I pay is putting my foot in my mouth occasionally)."

 

Looking back and getting to read my article again, the reference doesn't really detract from it.  It's just a quick mention at the beginning, but I do regret it and feel I should write something about communities in open source projects as a follow up, which has me thinking about which projects beyond Koha and Evergreen, failed and successful, to look at.

Inventories With Evergreen

Recently I've gotten a lot of questions about doing inventories in Evergreen because I've proposed, and am looking for funding partners for, a full-fledged inventory component for Evergreen.  I've heard folks complain about this missing functionality for the past five years, both inside and outside my consortium.

Inventories themselves can be complicated or not but follow a fairly simple recipe:

1) Find out what you have.

2) Compare it to what you should have. 

3) Correct the exceptions. 

Within that you can have a lot of diversity.  What are your controls on what you have?  Libraries have items moving around all the time, so there are a number of variables to control for.  And what level of correction do you perform?  This can range from the simple to the very, very complicated.  Additionally, are you doing this in item buckets or via SQL?  That determines a lot about the scale at which you can perform operations.  What is your project management like, and are there functions you want to keep more segregated?  All of these are things I think baked-in functionality will help with.

Still, folks have managed to do their own inventories.   

Indiana has a great detailed writeup of their process here:   

http://www.in.gov/library/files/How_to_do_an_Inventory_Using_Evergreen.pdf

There was also a presentation done at the 2012 conference, but the site's gone now and I can't remember who gave it:  http://evergreen-ils.org/dokuwiki/doku.php?id=conference:2012

However, I get a lot of questions to the effect of "that's all nice and good and the development would be nice, but I need to do an inventory now and we need something simple."  So, here is the barebones process I set up for an SCLENDS member library and how I helped a Windows based admin run it via SQL.  The admin can do it without SQL using smaller batches in buckets, but several steps are more difficult that way.

Mind you, where this can get most complicated is in correcting for human error, the kind of thing the computer can do more of for us once it's built into Evergreen.  At this point we have to manually tell our tools to do these checks each time, rather than having a programmer tell the computer in advance, in a stored manner, to do so.

This process isn't perfect but can be done on large or small inventories and asynchronously by multiple groups. 

Step 1) Set everything in the set of items to be inventoried to trace if its current status is checked in.
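For the SQL route, here's a minimal sketch of that step.  It assumes you've added a custom "Trace" status to config.copy_status (stock Evergreen doesn't ship one), and the branch and shelving location IDs below are hypothetical placeholders for your own:

-- Flag currently checked in items in the inventory scope as Trace.
-- Status 0 is 'Available' (i.e., checked in) in a stock install.
UPDATE asset.copy AS ac
SET status = (SELECT id FROM config.copy_status WHERE name = 'Trace')
WHERE ac.status = 0
  AND ac.circ_lib = 4        -- hypothetical branch org unit id
  AND ac.location = 115      -- hypothetical shelving location id
  AND NOT ac.deleted;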

Step 2) Set up carts with scanners and laptops.  Ideally assign two people to each cart: one moves materials while the other scans.  Scan into the item status screen using the barcode.  Make sure all item screens are set up the same to show barcodes.  Turn up the volume on the error noise for a mis-scanned or unbarcoded item.  Pull items that aren't in the catalog.  When done scanning, save the file, with information indicating branch and shelving location, out to a delimited text file.  All you really need to display is the barcode.  If you're willing to pass on finding items that shouldn't even be there during the initial pass, you don't even need an Evergreen client.  With some libraries having spotty wifi, being able to just scan into Notepad was really nice.  It also made the process more reliable; I found that the Evergreen staff client would occasionally crash, causing people to lose work.

If you have a very high volume of errors in your collection, you will want to do smaller chunks and not have groups work asynchronously.  In that case you may want to display status and correct as you go, especially checking in checked out and lost items.  I don't recommend correcting checked out items in batch, as there are a lot of variables that impact customer service, unless you're doing something like blanket wiping out associated charges in the patrons' favor.  You may also want to show shelving location and branch to resolve.  An item's call number will usually show you where something is out of place, but not always, especially if it's at the wrong branch or is material from another library.

Step 3) Now, combine the text files.  The same functionality is available in buckets, but I found relying on buckets too slow for large updates.  If you can do it via SQL, having the data in text files is convenient.  However, to avoid losing work to staff client crashes, this meant a lot of small text files.  I would sort them by shelving location into different folders and then combine them.  These commands work on Mac and Unix, and on Windows with Cygwin, I believe.

ls > ../filestocat.txt                           # list the files (written a level up so the list doesn't include itself)

xargs < ../filestocat.txt cat > ../barcodes.txt  # concatenate every listed file into one combined file

Essentially you're making a list of every file in the directory and then using that list to combine the files into one big data file.

In a perfect world this step would be done now.  However, in the real world folks invariably won't follow some part of step 2 correctly (another reason for baked-in functionality).  So the list will probably need to be brought into Excel (or another tool for working with delimited data) to correct for mismatched columns.

Step 4) Do reports and corrections.  This is the point where you can get fancy or keep it simple.  Reports should be used to find things out of position.  You can do this manually or even have staff go through a list and find them.  If there are a huge number out of order, you may be better off just doing a shelf reading.  At a minimum, run an update statement to move anything in the list currently marked as trace back to checked in; you check for the current status of trace in case the item was checked out in the interim.  You may want to run a list of the trace items for staff to look for.  You may want to do updates to correct branch and shelving location where those are wrong.  You may want to batch delete everything still listed as trace.  Whether you do a second inventory pass will depend on how many exceptions you found.
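Here's a minimal sketch of that baseline correction, again assuming the custom "Trace" status and the barcodes.txt produced in step 3.  Load the scans into a scratch table (psql's \copy works well) and flip anything scanned that is still in trace back to checked in:

-- Load the combined barcode file into a scratch table.
CREATE TEMP TABLE scanned (barcode TEXT);
-- In psql:  \copy scanned FROM 'barcodes.txt'

-- Anything we scanned that is still Trace goes back to Available;
-- the status check leaves items checked out in the interim alone.
UPDATE asset.copy AS ac
SET status = 0
FROM scanned AS s
WHERE ac.barcode = s.barcode
  AND ac.status = (SELECT id FROM config.copy_status WHERE name = 'Trace')
  AND NOT ac.deleted;

-- What's left in Trace is your exception list for staff to hunt down.
SELECT ac.barcode, acn.label AS call_number
FROM asset.copy AS ac
JOIN asset.call_number AS acn ON acn.id = ac.call_number
WHERE ac.status = (SELECT id FROM config.copy_status WHERE name = 'Trace')
  AND NOT ac.deleted;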

Step 5) Rest and bake in the relaxed feeling of a perfectly shelved collection.  This usually lasts minutes rather than hours.  :)

----------------------------------- 

None of these steps are sacrosanct.  Each organization will probably adjust them; the needs of your organization will determine much of that.  But all of this involves a lot of repeatable tasks that computers can do better but that right now we have to manage manually.  Instead of just adjusting a few preferences or org unit settings, we currently have to adjust the workflow and documentation significantly for each change and trust humans to be accurate and precise every time, instead of letting the computer do the work for us.