Jul 10, 2012

Today we are liveblogging from the OR2012 conference at Lecture Theatre 5 (LT5), Appleton Tower, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.


Topic: The Development of a Socio-technical infrastructure to support Open Access Publishing through Institutional Repositories
Speaker(s): Andrew David Dorward, Peter Burnhill, Terry Sloan


The team is trying to create an infrastructure before the open access revolution happens. That looks likely to be sooner rather than later, so they are building a template for the UK and Europe: RepNet.

RepNet aims to manage the human interaction that helps make good data happen. This is an attempt to justify the investment that JISC has made into open access and dissemination.

RepNet will use a suite of services that enable cost effective repositories to share what they have.

First, they mapped the funders, researchers, publishers, institutions to see where publications are made.

RepNet hopes to sit between open access and research information management by differentiating between various types of open access and between standards.

Through conversations with all the stakeholders, they’ve put together a catalog of every service and component that would go into a suite for running such a repository.

Funders’, subject, and institutional repositories will all sit upon the RepNet infrastructure. This will offer service support, helpdesk and technical support, and a service directory catalogue for anyone hoping to switch to open access. All of this will then utilize various innovations, hosting, services to get to users.

RepNet also has a testing infrastructure.

RepNet is past the preparation stages now, and moving into implementation of a wave one offering that integrates everything. The next iteration will take what wave one teaches the team and improve the offering further.

Deposit tools, benchmarking, aggregation and registry are already available, and wave two will bring together more and bigger services to do these things with repositories.

The component catalogue is getting quite comprehensive, with JISC helping to bring in and assess new ideas all the time.

RepNet is being based on the information context of today – policy and mandates, plus the strong desire for open access.

The UK is a great country to be in for Open Access; there’s quite a bit of political support in favor of moving in this direction.

If the market is to be truly transparent, gold open access payment mechanisms will have to be handled. This is something new that RepNet is working on figuring out.

The focus now is on optimizing wave one components: a very comprehensive set of tools and funder-publisher policies working with deposit and analysis tools to make everything easily accessible. REPUK, CORE, IRS, OpenDOAR, ROAR, and NAMES2 are all components being looked at, which wave two will.

ITIL is being used as the language for turning strategies and ideas into projects.

There is also a sustainability plan, submitted by SIPG members: subscriptions, contributions, payment for commercial use. Further JISC underpinning is being considered as well.

Part of RepNet will be a constant assessment of services: when one needs to be retired, it will move back into the innovation zone and be included again when there’s a demand.

RepNet provides an excellent service to support green and to further investigate gold open access. It will give us a great way of assessing repositories, better integration, and less human-intensive management of repositories.

The aim now is to move to data-driven infrastructure, letting different projects speak to each other through reporting mechanisms. This will make it more integrated and, ultimately, more useful.

Wave two will focus on micro services.

The sustainability plan will hopefully be put in place before 2013.

Q: Academics can see how all this works. Are there plans for making these sorts of information and services available to the public?

A: It’s all about integration with common search tools. There’s a vast gap between what has surfaced because of professional search tools, and what something like Google finds via its own crawler. It’s also important to make deposit accessible to everyone else, or at least to think beyond the academic lockdown instead of just focusing on the expert community.


Topic: Repository communities in OpenAIRE: Experiences in building up an Open Access Infrastructure for European research
Speaker(s): Najla Rettberg, Birgit Schmidt

OpenAIRE is rooted in the interests of the European Commission to make an impact on the community. Knowledge is the currency of the research community, and open access needs to be its infrastructure.

The hope is that something stronger than the green mandate comes about as the European Commission talks more about this.

OpenAIRE infrastructure aims to use publication and open data infrastructure to release research data. It involves 27 EU countries and 40 partners to pilot open access and measure impacts.

OpenDOAR, usage data, and EC funding have all fed the growth of the project. The result is a way to ingest publications, to search and browse them, to see statistics linked to the content and assess impact metrics.

OpenAIRE has three parts: technical, networking, and service. Networking brings together all the partners, stakeholders, and open access champions. They run a helpdesk and build the network by finding new users, researchers, and publications. This community of practice is very diverse. With everyone together there is an opportunity to link activities and find ways to improve the case for OA.

OpenAIRE provides access to national OA experts, researchers and project coordinators, managers.

Research administrators can consider a few things in their workflows to follow the open mandate and OpenAIRE shows them statistics about their open access data as they do so.

Everyone is invited to participate by registering with OpenDOAR and following the OpenAIRE guidelines. OpenAIRE offers a toolkit to project officers to get going. As of now there are about 10000 open access publications in the repository from 5000.

Part 2: OpenAIRE Phase 2

OpenAIRE phase 2 will link to other publications and funding outside of FP7, shifting from pilot to service for users.

300 OA publication repositories are being added, along with new data repositories and an orphan data repository. Don’t forget CRIS, OpenDOAR, ResearchID. All of this will go into the information space, using text mining to clean things up and make it all searchable.

Now there are OpenAIRE guidelines being built for data providers. These look at how to connect metadata to research data, and how to export it for use externally. It isn’t so much prescriptive as exploratory. With these in hand, other countries and organizations with less developed OA might be able to improve their own data offerings.

The scope of OpenAIRE’s work is wide. Most fields and types of data are welcome, so keep in touch.

OpenAIRE is building a prototype for ‘enhanced’ publications, letting users play with the data within. This will be cross-discipline, and can be exported to other data infrastructures. Also working on ways to represent enhanced publication more visually and accessibly.

What connects data to the publication? OpenAIRE is on the boundary exploring that question.

The repository landscape is very diverse, but so are the tools for bringing data and repositories together. OpenAIRE is aware of data initiatives, stakeholder AND researcher interests. OpenAIRE is running some workshops and will be at the poster sessions throughout the conference.

Q: CORE has done a lot of text mining work already. Have you spoken to them?

A: There has been discussion with CORE about repositories, but not text mining. OpenAIRE is working with several groups.

Q: You want to develop text mining services. In this area, with linking repos and content, CORE offers an API for finding those links and for reclassification. You aim to develop these services by 2013. Are you aiming to use other tools to do this, and are you happy to do so?

A: The technical folks are here for that very purpose, so keep an eye out for the OpenAIRE engineers.



Topic: Enhancing repositories and their value: RCAAP repository services
Speaker(s): Clara Parente Boavida, Eloy Rodrigues, José Carvalho

RCAAP is a Portuguese national initiative to promote open access for the sake of the visibility, accessibility, and dissemination of Portuguese work.

The project started in 2008 as a repository hosting service, moving forward to validation and Brazilian cooperation, then a statistics tool.

Overseen by the FCCN.

Learn more at projecto.rcaap.pt if you are a repo manager or journal publisher.

The strategy for SARI, a part of RCAAP, is creating a custom DSpace (DSpace++) for Portuguese academic users. It offers hosting, support, design services, and autonomous administration, all for free. 26 repos use this service, and because of the level of customisability offered, SARI repos all have their own unique look and feel.

Another service in RCAAP is Common Repository, for people who do not produce a lot of content but want it to be openly accessible in a shared area. 13 institutions are using this slimmed down repository tool.

RCAAP search portal enables users to search all open repositories in Portugal, and participating organizations in Brazil. 447934 documents from 42 resources, updated daily. Users can search by source, category, keywords.

Aggregated information is all OAI-PMH compliant. Further, it is an SRU provider. PDF and DOC files are full text searchable. Integration via various other tools.
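As a rough illustration of what OAI-PMH compliance means for anyone harvesting the aggregated records, the sketch below builds a ListRecords request URL. The request parameters (verb, metadataPrefix, from, set) are defined by the OAI-PMH protocol; the base URL and the function name are hypothetical and were not given in the talk.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; not RCAAP's real address.
BASE_URL = "https://repository.example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec is not None:
        params["set"] = set_spec  # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, from_date="2012-07-01")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumptionToken until the list is complete.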

A given entry can be shared, imported to reference managers. Author CVs and all related metadata are connected.

RCAAP Validator uses the DRIVER Guidelines to assess a URL based on validation options. A report will then be emailed, including statistics for openness and errors when checked against the DRIVER Guidelines. Errors are described. Assessment checks aggregation rules, queue validation, XML parsing and definitions, and also confirms that files in the repository all actually exist. Three types of metadata validation are done: existence of each element (title, author, date, etc), taxonomies (ISO and DRIVER types), and structure of metadata content (DRIVER prefixes, proper date formatting). Checking all of this ensures a good search experience.
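The three kinds of metadata check described above can be sketched roughly as follows. This is not the RCAAP Validator's code: the required fields, the type vocabulary, and the date pattern are all assumptions chosen to keep the example small, and the real DRIVER vocabularies are larger.

```python
import re

REQUIRED = ("title", "author", "date")  # check 1: element existence
# Check 2: taxonomy. Stand-in vocabulary (assumption), not the real DRIVER list.
ALLOWED_TYPES = {"article", "doctoralThesis", "book"}
# Check 3: structure. Accepts YYYY, YYYY-MM or YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def validate(record):
    """Return a list of human-readable errors for one metadata record."""
    errors = []
    for field in REQUIRED:
        if not record.get(field):
            errors.append(f"missing element: {field}")
    rtype = record.get("type")
    if rtype is not None and rtype not in ALLOWED_TYPES:
        errors.append(f"unknown type: {rtype}")
    date = record.get("date")
    if date and not DATE_RE.match(date):
        errors.append(f"badly formatted date: {date}")
    return errors


good = {"title": "T", "author": "A", "date": "2012-07-10", "type": "article"}
bad = {"title": "T", "date": "10/07/2012", "type": "poster"}
print(validate(good))
print(validate(bad))
```

Running the checks over every record in a repository and tallying the errors would give the kind of statistics the emailed report is said to contain.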

DSpace add-ons enable the included repositories and infrastructure to ensure openness and compliance with standards.

Add-ons include OAIextended, Minho Stats, Request Copy (for restricted access content), sharing bar, Degois (for Portuguese researchers), Usage Statistics, OpenAIRE projects Authority Control (auto-documentation of workflow activities), Document Type, and Portuguese Help.

Another service is SCEUR-IR. It aggregates usage statistics and allows the creation and subscription of graphic information.

The Journal Hosting Service uses the same strategy as the repository hosting service mentioned earlier, and allows total autonomy to publishers.

In Portugal and in Brazil, open access is very successful. The factors for that success are interoperability guidelines, community help and advocacy, and integration with research systems.

Q: What sort of numbers does a medium sized university in Portugal see for downloads, use?

A: Thousands of downloads/hits per day. Bigger universities will see 5000 or more.

Q: JISC is mandating metadata profiles and analyzing with the DRIVER Guidelines, and is looking to make a validator tool. Is the validator tool open source?

A: No, but information exchange is always welcome.

Q: Further on the validator, you are actually checking every record for every file – whether accessible or missing?

A: That is an option. When checking, the user would set which repository format they are using, and then run a check against that profile. Once a download attempt has run for 2 seconds, the entry is marked available. Failure to start the download is marked as an error.
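One possible reading of that file check, sketched under assumptions: try to start the download with a 2 second timeout, mark the entry available if any bytes arrive, and mark it as an error otherwise. The function names and the injected opener are hypothetical; the opener is passed in so the example runs without network access.

```python
import io


def check_file(url, opener, timeout=2.0):
    """Return "available" if a download of `url` starts, else "error".

    `opener(url, timeout)` must return a file-like object or raise OSError.
    """
    try:
        with opener(url, timeout) as stream:
            first_chunk = stream.read(1024)  # only the start of the file is needed
    except OSError:
        return "error"
    return "available" if first_chunk else "error"


def fake_opener(url, timeout):
    """Stand-in for a real HTTP client, used here for demonstration."""
    if url.endswith("missing.pdf"):
        raise OSError("not found")
    return io.BytesIO(b"%PDF-1.4 ...")


urls = ("http://repo.example/ok.pdf", "http://repo.example/missing.pdf")
statuses = {u: check_file(u, fake_opener) for u in urls}
print(statuses)
```

Against a live repository, the opener could be `lambda u, t: urllib.request.urlopen(u, timeout=t)`, whose failures (including timeouts) are subclasses of OSError.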


Topic: Shared and not shared: Providing repository services on a national level
Speaker(s): Jyrki Ilva

The National Library of Finland, an independent institute within the University of Helsinki, provides many services to the library network in Finland.

While there are 48 organizations with institutional repositories, only 10 public instances exist. Most of these repositories use the national library service.

The National Library provides its services to about 75% of Finnish repositories. They are not the only centralized service provider, but they are in a minority. The ‘do it yourself’ mentality has taken root, but with instancing of repositories anyone can and should have one. The same work does not need to be done over and over again in every organization.

‘Do it yourself’ does not always make sense. It is more expensive and often not as well executed as using other services. In Finland, there is much more sharing going on than in other countries.

Many countries started OA repositories in the 90s and 00s. The National Library started the idea of the digital object management system in 2003 – the first attempt at a proprietary software platform, which did not work as planned. The National Library chose DSpace instead, starting in 2006.

One of the challenges was trying to make one giant DSpace instance for all organizations. A single instance was not enough, and so the idea was not sustainable. Local managers were concerned about the threat of requirements and demands in the national repository system.

Fortunately, the National Library was chosen to be the service provider at the Rectors’ Conference for Finnish Universities of Applied Sciences in 2007.

Work is divided between customer organizations and the National Library. Curation and publication are done locally. The National Library develops and maintains the technical system.

Theseus, a multi-institutional repository, has seen much success. 25 universities, tens of thousands of entries.

Doria, another multi-institute platform, is technology neutral and allows more autonomy to its participant communities. This freedom allows for customizability, but results in lower-quality metadata and more confusion amongst users.

Separate repository instances are also provided at extra cost to customers. Some organizations just prefer their own instance. TamPub and Julkari are two examples.

Selling repository services comes down to defining strong practical needs amongst customer organizations. Little marketing has been done – customers have a demand and they find a supply. Long-term access, persistent addresses for content have been selling points. While not trying to make a profit, covering costs with a coherent pricing scheme is necessary. That said, many customers have relatively small needs, and so services must be kept affordable when necessary. National Library is also considering consultation as a service.

Negotiating user contracts is time consuming, though, and balancing customer projects requires a constant assessment of development in infrastructure in general.

Some Finnish universities will continue to host their own repositories, but cooperation benefits everything: technical and policy development can be improved.

Measuring success can be done on various levels. Existence of the repository is a start. Download metrics are another option. Impact assessment should be done. Looking at measures for success, National Library is still struggling with research data and self-archived publications. They’ve had success with dissertations, heritage materials, journals, but there is always room to improve.

Q: What is the relationship between the National Library and non-associated repositories? Is metadata being recorded? Metrics?

A: In most cases, organizations are reporting to an umbrella body, which keeps track of everything.

Q: Any details on the coherent pricing system?

A: Pricing has so far been based on the size of a given organization, how much data and how many hits they will have.

Q: Are you very service oriented, and is this something that the National Library does in general, or are repositories a special case?

A: Partly a special case, because funding is not guaranteed. It wasn’t so much intended as demanded for the sake of sustainability.


Topic: The World Bank Open Knowledge Repository: Open Access with global development impact
Speaker(s): Lieven Droogmans, Tom Breineder, Matthew Howells, Carlos Rossel

Publishers are not usually the pushers of open access, but the World Bank isn’t like many other organizations.

Why do this? The World Bank is trying to reach out and inform people of what the organization actually does, what its mission is: relieving poverty.

World Bank funds projects in the developing world, whether pragmatic solutions or research for the sake of outcomes.

When World Bank changes direction, it goes slowly but it really changes. The Access to Information Policy is wide and distinct within the organization. In particular, there is a focus on open data, research, and knowledge. This launched in 2010 with the object of ensuring that as much Bank data as possible was accessible, for the sake of transparency and further reach.

The Open Access Policy has been adopted, as of July 1st. It is an internal mandate for staff to deposit all research into the repository. This also applies to external research funded by the Bank. This data is on OKR and licensed under Creative Commons Attribution (CC BY). Externally published documents are made available as soon as possible, under a more restrictive Creative Commons license.

World Bank wants to join the Open Access community and lead other IGOs to do the same.

There are benefits externally and internally. Externally, policymakers and researchers gain data. Internally, authors and inter-department staff can access information that they did not easily have before.

World Bank was, until a few years ago, a very traditional publisher, but the desire that the Bank has to free its information for reuse has caused a shift.

That’s the why, and here’s the how.

Content included? Books, working papers, journal articles internally and externally. In the future, the Bank aims to include author profiles with incentivized submission, and the recovery of lost or orphaned or unshared legacy materials. Making the submission process easy and showing usage stats are also forthcoming. The latter will make entries more useful and visually appealing. Finally, there will be integration with Bank systems for interoperability of data.

This is all being deployed on IBM Websphere, integrated with Documentum, Siteminder, and the Bank’s website.

Along with the Open Development Agenda, the Bank is also contributing open source code, including facet search code.

Facet search allows results to be refined by metadata category (author, topic, date, etc). Each facet will show result counts.
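A minimal sketch of the idea, not the Bank's contributed code: count how many results carry each value of each facet, and refine the result list when a user picks a value. The sample records and function names are invented for illustration.

```python
from collections import Counter


def facet_counts(results, facets):
    """Count how many results carry each value, per facet."""
    counts = {facet: Counter() for facet in facets}
    for item in results:
        for facet in facets:
            for value in item.get(facet, []):
                counts[facet][value] += 1
    return counts


def refine(results, facet, value):
    """Keep only the results that carry `value` in the given facet."""
    return [item for item in results if value in item.get(facet, [])]


# Invented sample records
items = [
    {"title": "A", "author": ["Smith"], "topic": ["Health"]},
    {"title": "B", "author": ["Smith", "Jones"], "topic": ["Education"]},
    {"title": "C", "author": ["Jones"], "topic": ["Health"]},
]

counts = facet_counts(items, ["author", "topic"])
health_only = refine(items, "topic", "Health")
print(counts["topic"])
print([item["title"] for item in health_only])
```

After each refinement the counts would be recomputed on the narrowed list, so the numbers shown next to each facet value always match what the user would get by clicking it.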

World Bank uses a taxonomy of items that includes the full map of the category hierarchy. To get the facet search filter count right, World Bank checks entries against each other when showing results and returns the proper number of items related to a given filter selection.
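One way to get such counts right in a hierarchy, sketched here under assumptions, is to count each item at most once per taxonomy node, even when it is tagged with several children of the same parent. The taxonomy, the sample items, and the function names are hypothetical; the talk did not describe the Bank's actual algorithm in this detail.

```python
from collections import Counter

# Hypothetical taxonomy, stored as child -> parent (None marks a root)
PARENT = {
    "Malaria": "Health",
    "Nutrition": "Health",
    "Health": None,
    "Primary Education": "Education",
    "Education": None,
}


def with_ancestors(topic):
    """Return the topic together with all of its ancestors."""
    seen = set()
    while topic is not None and topic not in seen:
        seen.add(topic)
        topic = PARENT.get(topic)
    return seen


def hierarchical_counts(items):
    """Count items per taxonomy node, each item at most once per node."""
    counts = Counter()
    for item in items:
        nodes = set()
        for topic in item["topics"]:
            nodes |= with_ancestors(topic)
        counts.update(nodes)  # a set, so no double counting within one item
    return counts


items = [
    {"topics": ["Malaria", "Nutrition"]},  # two children of Health
    {"topics": ["Malaria"]},
    {"topics": ["Primary Education"]},
]
counts = hierarchical_counts(items)
print(counts["Health"])
```

Summing the child counts naively would report three items under Health (Malaria 2 plus Nutrition 1); collecting nodes per item first gives two, which is the number of distinct items a user would actually see.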

The problem now being worked on is drill-down in the search infrastructure: showing a multi-indexed browse of the entire hierarchy of a subject.


Q: Is it possible to showcase developing country research?

A: Yes. World Bank is looking for content in Africa right now, and attempting to gain access. This outreach is coming along slowly but surely.

Posted July 10, 2012 at 5:02 pm in LiveBlog, Updates
