Software sustainability research with rOpenSci

  Daniel S. Katz


May 25, 2016

I’m happy to announce that I’ve started a project with rOpenSci under their recent award from the Helmsley Foundation.

My work with rOpenSci will focus on sustainability of the project itself. Sustainability can be defined as having the resources to do the necessary work to continue and grow rOpenSci. This is one of the most difficult challenges for rOpenSci and for many other research software projects.

rOpenSci has a very broad and very ambitious goal, as stated on their web site: “Transforming science through open data.” In practice, the work being done by rOpenSci is “creating packages that allow access to data repositories through the R statistical programming environment” with tools that “not only facilitate drawing data into an environment where it can readily be manipulated, but also one in which those analyses and methods can be easily shared, replicated, and extended by other researchers.” An interesting question is how far rOpenSci will choose to move beyond R to meet its goal; I would encourage it to do so as much as possible. I might actually rephrase the goal as “Transforming science through open data and open software,” which better matches what is now happening in the project while not calling out R, since I would like to affect the non-R science community as well.

Sustainability = Resources > Work

A variety of sustainability models have worked for other open source projects to bring in the resources needed to do the work, such as institutional support, diversified grants, membership fees, donations and gifts, fees for services, and volunteers. With the rOpenSci team, I will test elements of these models for rOpenSci. As part of this, I will also work with the rOpenSci leadership team to identify new funding opportunities for rOpenSci from public and private funding organizations.

I will discuss and document how other projects have increased their scale of impact without an equivalent increase in their staff, and how these models could work for rOpenSci. The rOpenSci team and I will then test elements of one or more of these activities. I will also seek new partnership opportunities for rOpenSci with academic, government, and private organizations.

Sustainability is a key challenge for many academic open source projects, as I saw when I was leading NSF’s SI2 program. One of the review criteria for proposals to this program was to describe the proposed sustainability plan. Few proposers gave fully convincing plans; the most common plan was probably to have a diverse set of funding sources. Additionally, many projects decided that they would leave sustainability until near the end of the project. While we don’t know what will lead to sustainability for academic open source software, it seems clear that waiting to address it is not likely to make the effort more successful.

If we can solve this for rOpenSci, or even just make progress and provide guidance for other projects about what might work and what probably won’t, this will be a significant contribution. The WSSSPE events will be a way to bring together some of the academic open source community to understand how our lessons apply to other projects.

Promoting Open Source

The last part of this work with rOpenSci, which overlaps some of my other work, involves ideas about how to develop the overall scientific software community. Working with the rOpenSci project and the wider scientific software community, I will examine incentives, citation/credit models, and metrics, including understanding contributions of people in teams; funding models; multi-disciplinary science; career paths for software developers and scientists who also develop some software; software engineering; software communities and sociology; training and education; software dissemination, publication, and peer review; catalogs, search, and review; portability; and reproducibility. At least some of this will be done under the WSSSPE umbrella.

I will also work with rOpenSci to promote and encourage software metrics. A number of projects, including sempervirens and Depsy, have begun to focus on particular parts of the problem space, but more general work and tools are needed, in particular to reach those outside the R and Python communities. There’s also some related work underway at NCSA and RENCI (that came out of WSSSPE3) to survey a set of developers on what impact metrics they find useful, and I think the results of the survey will also help this overall community.

The Culture around Open Source Needs Improvement

Finally, I want to try to change our cultural understanding around the idea of open source software. The fact that open source software is commonly viewed as free software is a problem, since to many people, free means of no value. In particular, I believe we need to try to get to the point where companies that make heavy use of open source software feel that they have an obligation to support either the software or its developers in some way. This is part of the larger sustainability discussion, since this is the way that open source projects can gain more resources and reduce the amount of work that they have to do internally.

Any thoughts, comments, or suggestions are welcome. This is also cross-posted on Dan’s blog.