July 11, 2012
 

Today we are liveblogging from the OR2012 conference at Lecture Theatre 4 (LT4), Appleton Tower, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.

Topic: Griffith’s Research Data Evolution Journey: Enabling data capture, management, aggregation, discovery and reuse.
Speaker(s): Natasha Simons, Joanne Morris

Griffith’s research group has grown into a project-driven organization over the last year.

Griffith University is young, founded in 1971, with five campuses and an international student body. The nearby beaches can tempt people away from study, but the university is very active in research.

The Research Hub is a metadata store solution based on VIVO, which pulls from various databases and stores data plus relationships. It can export that information and has researcher profile systems built in, showcasing outputs and data of authors.

Developing the hub was driven by the global push, enabled by improvements in technology, to manage large volumes of research data. Improving accessibility is a key feature, especially for making Griffith a world-class research institution.

The work was done with funding from the Australian National Data Service (ANDS), whose discovery portal pulls together Australian research and makes it available to the public. The Metadata Exchange Hub, which the Research Hub is built upon, collects the appropriate metadata and provides custom feeds from CMSs in a standard format. This improves discovery in a sustainable way.
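
The talk did not name the feed protocol, but ANDS harvesting of this kind is typically done over OAI-PMH with RIF-CS payloads. A minimal harvesting sketch under that assumption (the endpoint URL and the `rif` metadata prefix are hypothetical):

```python
# Minimal OAI-PMH harvest sketch; endpoint URL and metadata prefix
# are assumptions, not details given in the talk.
import requests
import xml.etree.ElementTree as ET

OAI = "https://hub.example.edu/oai"  # hypothetical endpoint
ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}

params = {"verb": "ListRecords", "metadataPrefix": "rif"}
while True:
    resp = requests.get(OAI, params=params, timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
        header = record.find("oai:header", ns)
        print(header.findtext("oai:identifier", namespaces=ns))
    # OAI-PMH pages results via resumption tokens
    token = root.find(".//oai:resumptionToken", ns)
    if token is None or not (token.text or "").strip():
        break
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```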

Where to get data, and information about it? Griffith already had a database and publications repository, as well as research management databases and a meta-directory for other services. The meta-directory made some private information stores accessible in an indirect way.

Sources go into the hub. From the hub, there is interaction with persistent IDs for authors and personnel. The hub pushes out to discovery environments run by governmental and educational institutions and beyond.

Uses the ISO 2146 standard as its Registry Interchange Format. Four different kinds of objects: collection, party, activity, service. Each of these objects can be related to the others.
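
As a rough illustration of those four object classes and their directed relationships (the field names here are illustrative, not the normative ISO 2146/RIF-CS schema):

```python
# Sketch of the four registry object classes and their links.
# Keys, relation names and fields are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class RegistryObject:
    key: str                      # unique registry key
    object_class: str             # collection | party | activity | service
    title: str
    related: list = field(default_factory=list)  # (relation_type, key) pairs

    def relate(self, relation_type: str, other: "RegistryObject"):
        # Relations are directed, e.g. a collection isOwnedBy a party
        self.related.append((relation_type, other.key))

dataset = RegistryObject("gu:coll/42", "collection", "Reef survey data")
author = RegistryObject("gu:party/7", "party", "A. Researcher")
grant = RegistryObject("gu:act/3", "activity", "ARC Discovery Project")
dataset.relate("isOwnedBy", author)
dataset.relate("isOutputOf", grant)
```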

Uses VIVO: a semantic-web, RDF triple-store approach to gathering and sharing the data.
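
A minimal sketch of that triple-store approach using rdflib, with generic FOAF/Dublin Core terms standing in for VIVO’s own ontology and a hypothetical base URI:

```python
# Triple-store flavour of the hub: entities and relationships as RDF.
# Generic FOAF/DC terms are used here; VIVO's actual ontology differs.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import FOAF, DC, RDF

GU = Namespace("https://hub.example.edu/individual/")  # hypothetical base URI

g = Graph()
person = GU["person7"]
dataset = GU["dataset42"]

g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal("A. Researcher")))
g.add((dataset, DC.title, Literal("Reef survey data")))
g.add((dataset, DC.creator, person))  # the relationship is itself a triple

# SPARQL over the same graph can drive profile pages and exports
for row in g.query(
    "SELECT ?t WHERE { ?d <http://purl.org/dc/elements/1.1/creator> ?p . "
    "?d <http://purl.org/dc/elements/1.1/title> ?t }"
):
    print(row.t)
```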

The research hub is a one-stop shop for all Griffith research activity, an open source and international solution. Enter data once and use it everywhere. It is automated, aggregates multiple sources, preserves.

Challenges: early versions of VIVO were not final products, making Griffith a guinea pig. Getting private information in, or scraping public information out, took some hacks and workarounds. There are some latency issues. Self-editing in the hub can prevent proper presentation: two versions of a piece of content then exist.

In the future, phase 2 will allow further export and visualization support. It will also begin to track citations in a data citation project. This will help show the value of the hub.

Q: Usefulness of scraped metadata? What is the feedback on usage?

A: It is a fairly rich bit of information, though statistics have not yet been collected on actual use. Data citation is not being used fully yet, so statistics are lacking thus far. It seems that cross-discipline researchers are using the data more.

Q: In terms of persistent IDs, can you explain what standard you are using, how?

A: VIVO has its own standard, as does ANDS. Other identifiers are supported, ones that are used at a national level. The DSpace repository uses handles, and DOIs are being minted.
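
The speakers did not say which minting service sits behind this, but a mint in the style of the DataCite MDS API would look roughly like the following (credentials, DOI prefix, file name and landing-page URL are all placeholders):

```python
# Hypothetical DOI-minting sketch in the style of the DataCite MDS API;
# the talk did not name the actual service Griffith uses.
import requests

MDS = "https://mds.datacite.org"          # DataCite Metadata Store
auth = ("DATACENTRE.SYMBOL", "password")  # placeholder credentials

doi = "10.9999/example.42"                # placeholder DOI under a test prefix
with open("dataset42-datacite.xml", "rb") as fh:
    metadata_xml = fh.read()              # a prepared DataCite XML record

# 1. register the metadata record for the DOI
r = requests.post(f"{MDS}/metadata", data=metadata_xml, auth=auth,
                  headers={"Content-Type": "application/xml;charset=UTF-8"})
r.raise_for_status()

# 2. mint the DOI by binding it to a landing-page URL
r = requests.put(f"{MDS}/doi/{doi}", auth=auth,
                 data=f"doi={doi}\nurl=https://hub.example.edu/dataset42",
                 headers={"Content-Type": "text/plain;charset=UTF-8"})
r.raise_for_status()
```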

Q: It seems that people are risk-averse in the UK when it comes to publication and citation. What returns are sought in Australia?

A: Tracking citations is further down the track for Griffith. Until automation can be figured out, the citation and sourcing problem will be a struggle.

 

Topic: Building an institutional research data management infrastructure
Speaker(s): Sally Rumsey

It’s all about collaboration with everyone.

The need for public data repositories seems to be coming from a different direction than library repos and data management originally did. At Oxford, many projects are underway, but this particular project takes bits from each and adds more for a core research data management infrastructure.

The demand comes from disappearing data, response times, citation tracking, etc.

The EPSRC is leading the way pushing for solutions to these demands.

The DaMaRo project has four strands: research data management policy (which Edinburgh is ahead of the curve on; Oxford’s is still in development); training, support and guidance – not just for students and researchers, but reskilling support staff to bridge university communities; technical development and maintenance; and sustainability, the business plan for life after funding.

Data governance is guiding the action. DataStage deposits via SWORD into DataBank, and together with ViDaaS these feed toward the Oxford DataFinder. DataFinder will be the hub of the whole infrastructure for the university: it holds metadata, relates content, assigns IDs, and pulls from regional DataFinders and Colwiz.
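
As a rough idea of what the DataStage-to-DataBank hop involves, here is a SWORDv2-style package deposit sketch; the collection URL, credentials and package file are hypothetical:

```python
# SWORDv2 package deposit sketch, as used between DataStage and DataBank.
# The endpoint, credentials and zip file are assumptions for illustration.
import requests

COLLECTION = "https://databank.example.ac.uk/swordv2/collection/datasets"

with open("dataset.zip", "rb") as fh:
    resp = requests.post(
        COLLECTION,
        data=fh,
        auth=("depositor", "secret"),
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": "filename=dataset.zip",
            "Packaging": "http://purl.org/net/sword/package/SimpleZip",
            "In-Progress": "false",   # a finished deposit, not a draft
        },
    )
resp.raise_for_status()
print(resp.headers.get("Location"))  # URI of the new deposit receipt
```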

The plan is to integrate with federated data banks and share with everyone. DataFinder is just the store for metadata – a convenient search and discovery tool. DataFinder is built on DataBank, so they are totally compatible and import-export functionality is seamless. DataFinder is metadata agnostic – no sense in being picky.

DataFinder can hold records for just about anything, including non-digital objects (papers, specimens, etc.). Populating it will require some manual entry as well as pulling in existing data. There will be a minimum metadata set for a given object, kept as small as possible for the sake of the researcher and for cross-discipline functionality. Optional information fields will also be included, and particular metadata fields will be contextually required depending on funding bodies and disciplines.
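
A toy sketch of how such a minimal set with funder-conditional fields might be checked (the field names and the EPSRC rule are invented for illustration):

```python
# Sketch of a minimal-record check with funder-conditional fields.
# CORE_FIELDS, CONDITIONAL and the EPSRC rule are illustrative assumptions.
CORE_FIELDS = {"title", "creator", "date"}          # keep the floor low
CONDITIONAL = {"EPSRC": {"access_statement", "retention_period"}}

def missing_fields(record: dict) -> set:
    required = set(CORE_FIELDS)
    for funder in record.get("funders", []):
        required |= CONDITIONAL.get(funder, set())
    # anything required but absent or empty is flagged
    return required - {k for k, v in record.items() if v}

rec = {"title": "Soil cores, Wytham Woods", "creator": "J. Smith",
       "date": "2012-07-11", "funders": ["EPSRC"]}
print(missing_fields(rec))  # -> {'access_statement', 'retention_period'}
```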

DaMaRo is not a cure-all; it is a foundation for research management in the future. By the end of March, some set-in-stone goals should have been met: it has to be good enough, not perfect. Just enough services, just enough metadata, and the capacity to meet immediate and flexible needs.

Q: Why is content format not a metadata requirement?

A: Hopefully it will be automatically picked up like the size of the content file. It could be asked for, for preservation purposes, but automatic is ideal.
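A minimal sketch of picking those properties up automatically, using only the Python standard library:

```python
# Derive format and size from the file itself rather than asking the
# depositor, as the answer above hopes to do.
import mimetypes
import os

def auto_metadata(path: str) -> dict:
    mime, _ = mimetypes.guess_type(path)   # None if the type is unknown
    return {
        "format": mime or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
    }

print(auto_metadata("dataset.zip"))
# -> {'format': 'application/zip', 'size_bytes': ...}
```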

Q: What is the status of the code for DataFinder?

A: Really new. Look around in the Autumn for something to play with.

Q: Did you weigh up the pros and cons of making researchers give more, better data? Is there a balance between ease of use and quality?

A: People don’t like to do this stuff manually. Researchers are encouraged to give as much information as possible, but mandating fields doesn’t necessarily work across disciplines. Certain disciplines have their own complex metadata stores as well, but starting with low barriers will give a realistic view of what people are willing to do.

Q: Will the cost of running this be passed on to researchers?

A: That seems inevitable. The cost of sharing outputs will have to be built into research funding. This will hopefully be covered in researcher training, but unfunded research poses a problem as well. How will that all be funded? Who knows. Most institutions will have to handle this problem.

 

Topic: Institutional Infrastructure for Research Data Management
Speaker(s): Anthony Errol Beitz, Paul Bonnington, Steve Androulakis, Simon Yu, David Saint

Why do researchers care more about data? We know they face an onslaught of data to access and share, whether their own or others’. There are legal obligations around privacy and grant requirements that infrastructure can help with. And better access to data will change the way researchers propose and work in general.

Research institutions with good data stores can attract additional funding and better people, and they can escape legal risk.

Researchers work in an interpretive mode, focusing on outcomes. They are open ended and thrive on ambiguity, and they are very responsive. Things are always shifting and so are they. Researchers will be loyal to their research community, if not their institution or their ICTs.

Universities that support researchers with IT divide that responsibility into administration, education, and research. Each works toward different ends but all need to work in the same space. Continuity is a general priority. IT groups work in an analytical mode, not an interpretive one, so there is a bit of a clash.

Data is growing at an exponential rate, and budgets are not. How to keep up is a big worry.

Data management planning fits into the design stage of research, while research data management covers everything from experimenting to publishing. Repositories and portals handle exposure.

Research data management is in its early days. Researchers are still using physical media to move their data around. That isn’t just inefficient, it’s dangerous. Providing one RDM solution for every field, every institution, will not work. It takes a good cultural fit, a community solution and not an institutional solution – it comes back to loyalty. That means universities need to adopt tools, adapt them, and develop from there. Creating unique tools is not sustainable, or even initially possible. Developing new things is expensive and it breaks the collaboration cycle of academic communities.

There are many deployment considerations to take into account: hosting options, ethical or legal or security obligations must be met. This just means institutions need to be flexible.

RDM platforms will help researchers capture, share and publish. Data capture infrastructure feeds the whole RDM platform, which is built on storage infrastructure foundations. Support infrastructure, the forward-facing aspect, puts all of the information into discovery services so it can be used in a meaningful way by everyone. Monash University is building infrastructure only when it fits into the data management planning that has already been laid down: go through the checklist, don’t act just for the sake of acting.

Uptake of interoperable effective RDMs in Australia, and particularly Victoria, is quite high because everything is as easy and functional as possible.

Ensuring fit-for-purpose at Monash University: the technical aspects of a given solution are not pushed – early engagement drives development, so Agile software development methodologies are adopted. This is a product; treat the researcher like a customer with lots of demands.

Promoting good adoption means creating a sense of ownership in the community. Let ‘them’ support it, raise awareness, and find funding for it. This ensures sustainability, and it’s been quite effective so far.

Supporting eResearch services requires a different psyche. Bespoke systems address unique needs and need unique support – there are no vendors or huge communities to lean on, so eResearch support groups fill the gap. And many groups need to be engaged.

Again, adopt first, then adapt, and develop your own solution as a last resort.

Q: You’re plugged into specific federated repositories, not a university-specific one. Are there ever cases where there is no disciplinary solution?

A: Yes, but then academics go to a national repository instead of an institutional one. Monash can host data and disseminate it to whatever repository fits. It doesn’t publicize the data itself; it exposes it to research portals.

Q: There’s a primary data policy at Monash – at what granularity? Only publication-associated data or…?

A: It’s pretty much all data, without discrimination. Researchers and institutions will choose, but as of now all is OK. Storage is a non-issue so far.

 
