Today we are liveblogging from the OR2012 conference at Lecture Theatre 4 (LT4), Appleton Tower, part of the University of Edinburgh. Find out more by looking at the full program.
If you are following the event online please add your comment to this post or use the #or2012 hashtag.
This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.
Topic: Multivio, a flexible solution for in-browser access to digital content
Speaker(s): Miguel Moreira
Multivio is a generic browser and visualizer for digital objects, a presentation layer for document servers, and an add-on for other infrastructure. Its main principle: users who search a document server get immediate access to the content. Its origins lie in RERO and its digital library. In 2006, an internal survey showed a desire for a service that eventually became Multivio: an adequate presentation layer for full-text, structure-rich files that could also showcase patrimonial (heritage) collections. It does all of this quickly and directly, unlike traditional solutions.
Multivio was developed because other solutions were not flexible enough. It is co-funded by RERO and the Electronic Library of Switzerland. Development took place from 2008 to 2011, with an official release in 2011.
Using Multivio is straightforward. Provide a URL to a file (PDF, image, sound, video, etc.) or to a combination of files. Multivio then analyzes the structure and content and presents them to the user in a convenient, searchable interface in the browser.
Multivio is a full-featured HTML5 document viewer. It supports zooming, searching, and copy and paste. It also handles large and multi-file documents elegantly: they can be shown together without being downloaded. It uses little bandwidth and is based on widely accepted web standards. Client-side, all it requires is a modern browser. Server-side, Multivio handles rendering, search and extraction, using Python and Poppler (for PDFs). The only other requirement is that remote content be fetched and stored on the server.
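The talk does not describe how Multivio stores remote content on the server. As a hypothetical illustration of that requirement, the Python sketch below keeps one local copy per remote URL, named after a hash of the URL, so that later rendering or search requests could reuse it instead of downloading the document again. `cached_fetch` and `http_fetch` are illustrative names, not part of Multivio.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Download a remote file over HTTP(S) using only the standard library."""
    with urlopen(url) as response:
        return response.read()


def cached_fetch(url: str, cache_dir: Path,
                 fetch: Callable[[str], bytes] = http_fetch) -> Path:
    """Return a local copy of `url`, downloading it only on the first request.

    Files are named after the SHA-256 hash of their URL, so the same
    remote document always maps to the same server-side path.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not local_path.exists():
        local_path.write_bytes(fetch(url))
    return local_path
```

Passing the fetch function as a parameter keeps the caching logic testable without network access. A real server would also need to write files atomically, bound the cache size and refresh stale copies.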
Visit multivio.org to check it out. For a public demonstrator, go to demo.multivio.org; it works with any web-accessible document.
The advantages of Multivio are performance, customization and access control. It only requires a Unix server running Python.
The CORE Portal is using Multivio now.
In the future, support for audio and video will be added and improved, along with authentication and access control. Calendar-based navigation of publications is coming as well.
Q: Do you do PDF file processing beforehand?
A: No, it is all done on the fly. Poppler is very effective at this: it fetches only what it needs, wasting neither time nor bandwidth.
Q: Do you do OCR processing? Can individual pages of a document be shared/navigated to directly?
A: No OCR processing. As for page-specific URLs, the client API allows file URLs with page numbers. This isn’t being used to analyze document usage yet, but that is a very interesting idea.
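The answer mentions that the client API accepts file URLs with page numbers, but the exact syntax was not given. The following Python sketch therefore assumes hypothetical `url` and `page` query parameters on a viewer base URL, and shows how such page-level links could be built and parsed back, for instance to count which pages are viewed. The function names and example URLs are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def page_link(viewer_base: str, document_url: str, page: int) -> str:
    """Build a deep link that opens `document_url` at a given page.

    Assumption: the viewer accepts hypothetical `url` and `page` query
    parameters; these names are illustrative, not Multivio's actual API.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    query = urlencode({"url": document_url, "page": page})
    return f"{viewer_base}?{query}"


def parse_page_link(link: str) -> tuple[str, int]:
    """Recover the document URL and page number, e.g. for usage statistics."""
    params = parse_qs(urlparse(link).query)
    return params["url"][0], int(params["page"][0])
```

Percent-encoding the document URL keeps its own `:` and `/` characters from clashing with the viewer's query string.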
Q: For multimedia, what experience do you have working with it?
A: We are starting to acquire and use that content. Prototypes show only one video format so far, so we must work on that. It’s a challenge, but we know it’s possible. We will rely on HTML5 and modern browsers, possibly falling back on Flash if needed. Further investigation has to be done.
Q: More details on access control?
A: It’s on the to-do list. Right now the solution is to install the Multivio server alongside the protected documents. Once Multivio has the necessary access rights, it can restrict what it displays.
Q: How will this interact with usage metrics?
A: There’s an intention to work on this in the future. It’s important. We will still provide direct download, and do basic view analysis, but we hope to go much farther.
Topic: Biblio-transformation-engine: An open source framework and use cases in the digital libraries domain
Speaker(s): Kostas Stamatis, Nikolaos Konstantinou, Anastasia Manta, Christina Paschou, Nikos Houssos
This will be a backend talk. Sorry in advance. This is an open source framework that has been in development for 4-5 years. It facilitates digital transformations in library systems. It’s a solution to a common problem.
This tool has been used extensively so far. Digital transformations are a necessary reality in libraries, repositories, everything. You need to transform data to get it into any publishing system or database, to migrate it or to share it. Such processes constantly evolve, so the framework provides systematic management of the code that performs them, which accelerates common transformation tasks.
The first step in building such a framework is analysis: finding the abstractions that represent common procedures. These turned out to be retrieving data records, processing them (changing records or individual field values), and finally generating the desired output. A less obvious finding was that there is demand for incremental or selective data loading, for example breaking a large load into smaller tasks.
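The talk gives no detail on how the engine implements incremental or selective loading. Purely to illustrate the idea, the following Python sketch uses a generator that resumes from an offset, applies an optional selection predicate and hands records over in fixed-size batches. `load_incrementally`, its parameters and the record shape are hypothetical, not the engine's API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional

# A record maps each field name to its list of values (an assumption
# made for this sketch, not the engine's own record type).
Record = dict[str, list[str]]


def load_incrementally(source: Iterable[Record], batch_size: int,
                       start: int = 0,
                       select: Optional[Callable[[Record], bool]] = None,
                       ) -> Iterator[list[Record]]:
    """Yield records from `source` in batches of at most `batch_size`.

    `start` skips records already loaded by an earlier run (counted in
    the source, before selection); `select`, if given, keeps only the
    records it accepts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining: Iterator[Record] = islice(source, start, None)
    if select is not None:
        remaining = filter(select, remaining)
    while True:
        batch = list(islice(remaining, batch_size))
        if not batch:
            return
        yield batch
```

Because everything is lazy, only one batch is held in memory at a time, which is what makes splitting a large load into smaller runs practical.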
The design goals demanded customisability, non-intrusiveness, ease of use, and easy integration and extension for anyone who needs the Biblio-transformation-engine.
The engine has three components. The Data Loader retrieves data from sources according to its own specification. The Processing step transforms information with a filter, then a modifier, then an initializer. The Output Generator creates the desired product.
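To make the division of labour concrete, here is a minimal Python sketch of such a pipeline. It is not the engine's Java API: the record shape, the function names and the example filter, modifier and output generator are all hypothetical, and the initializer step is left out.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, list[str]]
Filter = Callable[[Record], bool]          # True keeps the record
Modifier = Callable[[Record], Record]      # returns a transformed record
OutputGenerator = Callable[[Iterable[Record]], str]


def run_pipeline(records: Iterable[Record], filters: list[Filter],
                 modifiers: list[Modifier], output: OutputGenerator) -> str:
    """Loaded records -> filters -> modifiers -> output, one record at a time."""
    def processed() -> Iterator[Record]:
        for record in records:
            if all(keep(record) for keep in filters):
                for modify in modifiers:
                    record = modify(record)
                yield record
    return output(processed())


# Hypothetical processing steps for the example below.
def has_title(record: Record) -> bool:
    return bool(record.get("title"))


def normalize_spaces(record: Record) -> Record:
    return {k: [" ".join(v.split()) for v in values] for k, values in record.items()}


def to_tsv(records: Iterable[Record]) -> str:
    return "\n".join(f"{r['title'][0]}\t{r.get('year', [''])[0]}" for r in records)
```

For example, `run_pipeline([{"title": ["  Open   Access "], "year": ["2012"]}, {"year": ["2011"]}], [has_title], [normalize_spaces], to_tsv)` drops the untitled record and returns `"Open Access\t2012"`. Keeping filters and modifiers as small independent functions is what lets them be reused across different loaders and output formats.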
The library is FLOSS, developed in Java (Maven-based), and is available online under the European Union Public Licence (EUPL): free to download, use and comment upon.
One use case is generating linked open data from repository records, legacy cultural material records and CERIF information. The corresponding data loaders are reused, and filters and modifiers can be totally agnostic of RDF and of the input formats. Jena generates the RDF triples, and the engine also adds or generates appropriate identifiers (URIs) for entities.
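The engine uses Jena to produce the triples. Purely to show what that step involves, here is a dependency-free Python sketch that writes N-Triples by hand: it mints a URI for each record under a hypothetical base namespace and maps a few hypothetical record fields to Dublin Core element predicates. A real deployment would use an RDF library rather than string formatting.

```python
from urllib.parse import quote

DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://data.example.org/record/"  # hypothetical namespace for minted URIs

# Hypothetical mapping from record fields to Dublin Core element predicates.
FIELD_TO_PREDICATE = {"title": "title", "creator": "creator", "year": "date"}


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping characters N-Triples forbids."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def entity_uri(record_id: str) -> str:
    """Mint a URI for a record by percent-encoding its local identifier."""
    local_name = quote(record_id, safe="")
    return f"<{BASE}{local_name}>"


def to_ntriples(record: dict[str, list[str]]) -> list[str]:
    """Emit one triple per mapped field value, all sharing the record's URI."""
    subject = entity_uri(record["id"][0])
    triples = []
    for key, predicate in FIELD_TO_PREDICATE.items():
        for value in record.get(key, []):
            triples.append(f"{subject} <{DC}{predicate}> {literal(value)} .")
    return triples
```

Note that the mapping knows nothing about the input format: any loader that produces records with `id`, `title`, `creator` or `year` fields can feed it.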
Another use case is populating repositories from EndNote, RIS, BibTeX and UNIMARC records. Third and fourth use cases are feeding VOA3R and European aggregators.
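As an illustration of what a data loader for one of these formats involves, here is a minimal Python reader for RIS, a tagged format in which each line carries a two-character tag followed by two spaces, a hyphen and the value (`TY` starts a record, `ER` ends it). It is a hypothetical sketch, not one of the engine's loaders: it skips continuation lines and other edge cases a production loader would have to handle.

```python
import re

# An RIS tag line: two-character tag, two spaces, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Split RIS text into records, each a mapping from tag to values."""
    records: list[dict[str, list[str]]] = []
    current = None
    for raw_line in text.splitlines():
        match = TAG_LINE.match(raw_line.rstrip())
        if match is None:
            continue  # this sketch ignores blank and non-tag lines
        tag, value = match.group(1), match.group(2) or ""
        if tag == "TY":  # TY opens a new record
            current = {"TY": [value]}
        elif tag == "ER":  # ER closes the current record
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)
    return records
```

Repeated tags such as `AU` accumulate into a list, so multi-author records survive intact for later mapping to repository fields.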
In the future, the project hopes to support more data transformations and to extend declarative mapping specifications to complex cases. It also plans infrastructure for reusing Filter and Modifier implementations. Finally, the project would like to study user experience to sort out the little things and make life easier.
Q: You’re using CSL and JS: are you running the JS client-side or server-side?
A: JS on server side. A modifier calls a JS server.