Today we are liveblogging from the OR2012 conference at George Square Lecture Theatre (GSLT), George Square, part of the University of Edinburgh. Find out more by looking at the full program.
If you are following the event online please add your comment to this post or use the #or2012 hashtag.
This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.
John Howard, Chair of the Open Repositories Steering Committee, is introducing our opening keynote. I know there are lots of people who have been to Open Repositories previously. I think it’s fair to say that OR is a conference for people with the hearts and minds to make open repositories work: it’s for developers, suppliers, for everyone involved in the expanding ecosystem of repositories. It’s a learning opportunity, a networking opportunity, a way to step out of our day-to-day roles and make some new connections and gather new ideas.
This is our 7th OR conference. You will hear more about OR2013 in the closing plenary.
OR was started by people very much like yourselves who are passionate about repositories and wanted to share ideas and experience on an international scale. I want to thank the current OR Steering Committee, which steers the direction of the year and selects the programme chair. This year Kevin Ashley is the programme chair and he has been engaging in fortnightly calls with ourselves and the co-chairs of the local Host Organising Group: Stuart Macdonald and William Nixon. Thank you also to our User Group chairs: John Dunn, Robin Rice and William Nixon.
If you want to stay in touch with us throughout the year I would ask you to join our Google Group and follow our new twitter account: @ORConference. And if you have ideas about a logo for OR please do let us know.
Now to Kevin Ashley, Programme Chair of OR2012.
Welcome on behalf of my own organisation the DCC, and of EDINA and Edinburgh University! We have more attendees, more sponsorship and more sunshine at this year’s conference… one of those may be untrue!
This year we have tried to bring the spirit of the fringe to Open Repositories as we have, for many years, run the annual Repository Fringe event. So please join in the fringier aspects of the programme! And connected to that I want to remind you that ideas for the Developer Challenge must be in by 4pm today so submit them soon!
Cameron Neylon has been an advocate and activist for open research for years, and has just taken up a new role as Director of Openness at the Public Library of Science (PLoS).
I am going to talk about what we have learned about open repositories from a high-level overview.
Please do use, film, tweet and share anything I say today in whatever ways are useful to you!
So, what is the challenge that we face in making research effective for the people who fund it? As researchers we have this incredible level of frustration about being unable to deal with the demands of funders, of stakeholders, of colleagues to make the most of what we do. We often feel like we are shouting at each other – more broadcasting than mutual understanding of the issues.
There is a sense that something is missing. We have all these tools to do something new, but in terms of delivering what we can create and convert from the money we receive and the resources we have access to… there is something missing in this delivery pipeline that’s not getting us to where we should be.
So I’m going to structure this talk around a sort of 3-2-1 pattern. 3 things to change, wrapped in 2 conceptual changes, around one central principle which, for me, is the useful way to bring these issues and thoughts together.
Let’s start with my background. I’m now working for PLoS and I have been involved in open access advocacy for 7 years, but I’ve also been interested in open things – open data, open science etc. – for years.
I could make a public good argument for open research, but that’s not really the environment we are in today. We need more hard-nosed and pragmatic reasons to approach these problems. I shall take the tie-wearing approach: a business case. What do we have to deliver as a business, as a service provider, for the people who fund our salaries, our work?
I want to talk about quality of service, value for money, sustainability.
And if I’m shaping a business case then who are we serving? Who is the customer? Who are we marketing to?
You might think it’s policy makers… but they just funnel money through to us. Yes, it’s important to make arguments to government, but it’s much MORE important to make the case to taxpayers – that wider public that includes us. There is a sophisticated appreciation among the public of research and the time it takes to deliver. We saw that last week in the Higgs Boson announcement (albeit in Comic Sans). So the customer is the global public. And they want outcomes: not research outputs, but how we effectively translate those into meaningful outcomes.
So why are we having this conversation, why is it happening and why is it happening now? Well, we are going through the biggest change in communications technology since the printing press, perhaps since writing. Our ability to communicate has changed SO radically that we are in a totally different world than 20 years ago. Networks qualitatively change what we can do and achieve.
Most of you can remember a time without mobile phones. 20 years ago if I’d shown up and wanted to meet for a drink it would have been difficult or impossible. Email wasn’t useful back then either, as so few people had it. When you start with nodes and start joining up the network… for a long time little changes. You just let people communicate in the same way you did before… right up until everyone has access to a mobile phone. Or everyone has email. You move from a better-connected network to a network that can be traversed in new ways. For chemists this is a cooperative phase transition: the point where the network crystallises out of solution.
But that’s a really big concept. So let’s look at Tim Gowers, a mathematician and a blogger. He wondered if there was a new way to do academic mathematics. He posted a hard problem on the web. He said he didn’t intend to solve the problem himself, but he wanted to involve as many people as possible in commenting on his approach. He expected the problem to take 6 to 9 months. Six weeks later he felt his problem was solved, along with a much larger problem being approached in a new way. And it wasn’t as he expected. What happened was that a large group of mathematicians discussing the issue on a WordPress blog were able to think through approaches and solve together a problem that one of the world’s greatest mathematicians had not been able to solve alone. It allowed things to be done that were not possible before: a qualitative change in research capacity, mediated by a pretty ropey system in which conversations can take place.
I want to talk now about GalaxyZoo. Astronomy is very much driven by the idea of testing hypotheses, and that means looking through huge amounts of data. That’s a problem because you can only do about 100 sets of data a week, and you need about 10,000 classifications of galaxies to reach a level where you can publish a paper. Even a PhD student can only do around 50,000 in the course of his or her studies. And there is a further problem: lots of people look at the same data, so this is hugely inefficient. But that data is from the Sloan Digital Sky Survey – an open data source of sky data – and there were a million sets of data. Computers don’t classify this stuff effectively. So what did they do? They took that data, put it on a website, and created a simple classification mechanism. Those million galaxies were each checked 5 times over by 300,000 people in 6 months. That’s qualitatively different.
In both cases the change is because of scale, because of connectivity and mobility of data, and critically because of the efficient transfer of information. Galaxy Zoo could push high resolution images and data could be pushed back by users.
So the question as service providers has to be “how do we get some of that?” How do we make networks? And how do we deliver them so they are the right shape, the right size, the right connectivity for the right problems.
1. Scale and connectivity
2. Low friction
3. Demand-side filters
The first two of those are easy. We have the web. And really easy transfer of digital and even physical objects (as long as the metadata is good) is fast and efficient. But that last issue is hard, as our current approach is based on limiting and controlling access. And if you are doing that with research then you are delivering something that no one wants.
So how do we think about that? How can we reconfigure the way we do things?
So, this is a paper by Gunther Eysenbach (2011) JMIR 4:e123. It’s quite a controversial paper, but it starts from the principle that letting people know about research will increase how much it is used. If you can connect those who can and want to use research to that research, it makes sense. But it’s naive to think only about connecting up the research network: that overlooks the 400 million people on Twitter. There will be someone – more than one person – who will make that connection and help link research to those who can use it. This is a serendipity engine. You can do new things you haven’t thought of before, expand into new areas of research, and connect with people who do not work in the research process but are interested in it – and there are more of them than researchers at this scale.
The problem is that as we let people know that this research exists, those connections drop and the effect fades out, because of access to that research, to those publications. Each time you break that network you lose potential outcomes, you lose value, you fail to optimise the network. You guys know this. You know that open research and collaboration leads to more and better research…
But the problem is that we are used to thinking commercially. The analogy is that we take our car to be serviced, and then we rent it back. The garage has the ability to say that any loaning or renting out of the car breaches the contract; they have to find new ways to make money out of new opportunities. But if we turn that on its head – if you pay upfront, in kind or in cash – then the service provider’s interests can be aligned with those of the researcher or the public: the service provider does best by providing access to the most people possible.
When we talk about publications we need to talk about first-copy costs. But we can also look at recent research in Denmark on the economic cost to small businesses of not having access to research: equivalent to maybe £700m in the UK, let alone the saved costs to government etc.
So we talked about those three aspects.
1) Scale the network to make things available. This is being addressed as the old publication model ends. This service industry – ways of making content sharable and discoverable – is a great industry to be in.
2) We need to think about filtering at the demand side of the system. We are used to peer review as the filter, but that filtering is a friction if it’s on the supplier side. Whether peer review works or not, it can’t always be the right filter, and certainly not the right filter for everyone. The results you don’t share because they aren’t useful, I need in order to understand the methodology; the results you don’t share because they don’t support your argument, I need because I want the data; and that garbage paper you wrote, I need so I can learn how to do things better myself.
We need filters that we control to deal with the issue of filter failure. As a reader or user I want a way to discover what I want, for the purpose I want, at that point in time. Ideally I want to know about this stuff ahead of time. I think this is the biggest opportunity: to make everything available in a way that progresses research. This is what you do!
So what does this mean in practical terms?
Well, we were at a stage of putting things into the repository; we’ve moved beyond that to thinking about using things in repositories and understanding that use. How do we optimise the repository? What are the barriers? What is the friction? Licensing: just sort it out. Make open the default. But we also still have lots of broken connections – how can we connect them up? How can we aggregate data on usage and citation? What is the diversity in your repositories? How can we connect things to the wider graph and systems? How can we support social discovery? And how can we enable annotation and link it across resources? Annotation is a link, and it probably won’t come from depositors. Mostly it will come from fairly random people on the web!
And the other big shift is to think about quality assurance. Badge it: make the high-quality stuff clear, but share everything. Just badge and certify the good stuff. That saves you filtering it all down and allows all sorts of usage.
So repositories must be open, they must be accessible, and they have to be open to incoming connections from the global networks.
We are judged on research outcomes – on usage, when the right person finds it. And in that context a new connection could be more valuable than a new resource. This is a change to our way of thinking. We have to build those networks.
So again, 3 areas to deliver: scale and connectivity of the networks, reduced friction, and demand-side filters.
1. The old model of giving away our intellectual property to pay for the printing of it is dead! We need mechanisms, maybe through repositories, to make sure research is as effective as possible.
2. Filter on the demand side, probably even automated.
And that’s wrapped in one central idea. Think at the scale of networks. Assume that hundreds of thousands of people are looking at your work or want to. Assume that you cannot predict the most important use of your data. How you apply limited resources to engage with the fact that we are operating at a whole new scale is crucial.
We can’t build a system on the old truths. We could build a system on today’s truths, but it wouldn’t last long. The only thing we can do for the future is to build for things we don’t expect, to be ahead of trends. Innovators don’t follow markets; they build them. When we provide services for the general public as innovators, we need to build for the future. The network, its infrastructure, its systems and its capacity are our future.
Q1 – Brian Kelly) You talked about building, not following, markets. We have twitter etc. Should we build the open one?
A1) That’s a really good question. I’ve always been against a Facebook for Science or a GitHub for Science… the best Facebook is Facebook, the best GitHub is GitHub. But that was in a world where the web was more open. There could come a point where it is worth our while to build tools for connectivity.
We are probably a long way away from needing a new Twitter, but we need to be looking out for that. Twitter has 400 million people; whatever we build will have fewer. It has to get much worse to be worth shifting, but we should argue against it getting a lot worse.
Q2 – Les Carr) You talked about the web, about systems. But the web is a socio-technical framework full of people with their own agendas. The web is a disruptive technology, how do we create disruptive academic communities that will make a real difference rather than playing it softly as we have been?
A2) Part of the answer is getting in people’s faces more, and making opportunities for that. For me what will drive that is the way the government is monitoring outcomes and the use of outcomes. There is pressure on researchers to do that, but we are not used to it. The place to be disruptive is at the point of maximum pain. That’s coming soon for EPSRC-funded research. It might be coming soon with the implications of the Finch report for UK publications. Pick the point carefully, but in the next 6 to 24 months there will be the right pain point to be disruptive and to show that we can ease that pain. The time of sitting back and facilitating researchers has probably passed.
Q3 – Dave Tarrant) The New World Journal answer is to charge for journals. So how can we connect this community together, rather than still having the serials model where we charge for the good stuff and have other stuff out there?
A3) That issue of silo-ing is important, but it is solved for me by proper licensing, where people can pull content together in any form they want for free. That problem hopefully goes away. There are still technical barriers, but they can be overcome; it only works, though, if the content is properly licensed. The other problem is that we don’t want to exchange a problem with access on the read side for problems on the write side. Publishing, formatting and distributing research costs money, and anyone funding research really needs to ensure those costs are part of that funding. We have a lot of thinking to do around the transitional process. One way would be to shift the peer review process; if we could flip or change that model it would bring costs down. Contributions in kind should be considered – I’m not exactly sure what that looks like, but we need to think about it. And those of us on the publications side have to facilitate this. At PLoS when you publish we say how much this journal costs to publish in, and ask what you can afford to pay, even if that’s nothing. But that’s not a long-term solution for all; that becomes charity. We need to remember that a paper is just research and a publication is just a repository. If it’s not worth the cost of publication and the IR is the solution, then so be it. Publishers worry about value for money – otherwise you wouldn’t see embargoes. Open access publishers are not threatened about the value they add to the deposited copy.
Q4) I think that model works for research papers, and for software too. But for data? That’s much more complex.
A4) I was in New Zealand last week, and my default CC0 answer isn’t possible under their copyright law. There is licensing and there are legal instruments. We want stuff to be interoperable, and open licences allow the most reuse possible. The principle should be maximum interoperability with the most open licence you can use. So, adopting Susan Morrison’s work, there are licences to suggest for different sorts of objects. I hope that in CC version 4 the licence can deal properly with data internationally. Another problem is that licensing is used for social signalling: many people do not use licences as legal instruments, and that social signalling element has gotten tied up with legal instruments. I hope we can resolve that in the long term by thinking about transfer across the network and use of research, because it’s in people’s own best interests to see their stuff used. The end game has to be to say: please use this as much as possible, in as many ways as possible, and I’d like to hear about it. That’s what I’d like to see, and I think that should solve our problem.
And Kevin is closing the keynote with a big thank you to Cameron.