Empowering Research: The Vital Role of Citing Research Software for Reproducibility and Innovation

11 minute read

Published: February 25, 2024

As I started my Ph.D. journey in numerical optimization back in 2014, I noticed something that really stood out to me: despite the abundance of scientific papers discussing algorithms and their numerical results, the availability of corresponding open-source codes lagged far behind.

In my opinion, this discrepancy posed significant challenges to reproducibility and to the continuous improvement of algorithms, ultimately hindering the advancement of knowledge in the field. Understanding the reasons behind this gap provides insight into the complexities inherent in code development:

Providing an open-source implementation means navigating platform choices and addressing questions and issues from users;
The journey from a basic implementation to a robust and efficient one requires a considerable investment of time and effort;
The process of writing, implementing and maintaining code wasn’t necessarily valued in scientific research.

While these challenges could be frustrating, they also presented an opportunity for transformation. The necessity of numerical results underscores the importance of investing in the development of code. By shifting our focus from rewriting similar codes for each publication to refining and enhancing existing implementations, we could pave the way for greater reproducibility and innovation in the field.

Furthermore, the potential impact extended beyond academia. By improving the accessibility and quality of optimization codes, we could empower practitioners to tackle real-world problems with greater confidence and efficacy. This shift toward research-level, super-powerful codes held the promise of addressing some of the most pressing challenges facing our society.

In essence, what initially seemed like a frustrating discrepancy evolved into a call to action. By embracing the challenges and opportunities inherent in code development, we could not only advance knowledge within the field but also make meaningful contributions to real-world problems. This, I believe, is one of the most important challenges in research and one that holds the potential to create lasting positive change.

WHY SOFTWARE MATTERS

Over the past decades, the way scientific software is developed, shared, and valued has shifted in several ways.

Increased Diversity and Accessibility

Open-source software has helped democratize access to scientific tools and resources, making them more accessible to researchers worldwide. Additionally, the open nature of these projects encourages diversity and inclusivity, as contributors from diverse backgrounds can participate in their development.

Increased Adoption of Open Source

There has been a significant rise in the adoption of open-source software in scientific research. Researchers increasingly recognize the benefits of open-source collaboration, including transparency, reproducibility, and community-driven development.

Personally, my research journey began at INRIA, renowned as one of France’s largest and most dynamic research centers in computer science and applied mathematics. At INRIA, the development of software is ingrained in the fabric of the research culture, reflecting a deep commitment to innovation and advancement in scientific computing.

Inria and open source software: unwavering support for nearly 20 years

Growing Emphasis on Reproducibility and Transparency

With the replication crisis in various scientific fields, there’s been a heightened focus on reproducibility and transparency. Open-source software plays a crucial role in enabling researchers to share their code and data, facilitating reproducibility and validation of results.

Towards Reproducibility in Research Software

Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research

Collaborative Development and Sharing

Platforms like GitHub have facilitated collaborative development and sharing of scientific software. Researchers can openly collaborate, contribute improvements, report issues, and share their work with the wider community, fostering innovation and knowledge exchange.

It is hard to believe now, but Git has only existed since 2005 (A short history of Git), and development of GitHub started in 2007 (The untold story of Github). Now, pretty much everyone hosts their open source projects on GitHub, including top companies like Google, Facebook, Twitter, and even Microsoft. The Julia programming language is hosted on GitHub and is entirely public.

In 2023, GitHub reported 284 million public repositories.

Integration of Best Practices

There’s been a greater emphasis on incorporating best practices in software development within the scientific community. Initiatives like the Software Carpentry and Data Carpentry programs offer training in software development skills tailored for researchers, promoting good coding practices, version control, and reproducible research. While it used to be common to criticize academic code as being poorly organized and in need of extensive refactoring, this perception is becoming less prevalent over time.

Recognition and Reward for Software Contributions

There’s a growing recognition of the importance of software contributions to research outcomes. Funding agencies, institutions, and journals are increasingly valuing and rewarding researchers for their software development efforts, whether through citations, funding, or career advancement.

Even though I classified this among the new practices, it has long been known in Montréal. The story of GENCOL and the company AD OPT is a Montréal classic of using research-level software to solve major challenges.

Overall, the past decade has witnessed significant progress in the development and adoption of open-source scientific software. These trends have played a crucial role in advancing scientific research, fostering collaboration, and promoting transparency and reproducibility within the scientific community.

OPEN SOURCE SOFTWARE AND THE ROLE OF JULIA

The past decade has seen the emergence of numerous specialized tools, libraries, and frameworks tailored for scientific computing and data analysis. These resources provide researchers with powerful and efficient tools for tackling complex scientific problems across various disciplines. One that stood out for me is the Julia programming language.

In this context, the platform and the language play a major role, and I have grown to think that Julia is a perfect match, particularly (but not only) in an academic context. In numerical optimization, and computational science in general, performant open-source codes have been developed in Fortran, C++, and to some extent R and Python. There have been numerous success stories, but without bringing together a large community of contributors.

The Julia-verse is mostly centered on GitHub, and the continuous integration tooling that benefits any Julia package keeps improving. This makes writing and maintaining a repository in Julia a better experience.

Version 1.0 of the Julia programming language appeared in 2018, turning this very promising language into a stable one on which large codes can rely without too much maintenance. Julia has had numerous success stories; an impressive early one was Julia joining the petaflop club (Julia Joins Petaflop Club). In other words, Julia is fast, which is good in general because you don’t want to invest time in code that will become obsolete. In my experience, and our recent paper Scalable Adaptive Regularization was another example, Julia does more than OK compared to solvers in Fortran or C++.

Finally, one of the most important points for me is the accessibility of the language. I have been teaching numerical optimization classes for the past years with laboratories in Julia (usually 6 blocks of 3 hours) to students who had never coded in Julia, but usually had some Matlab or Python experience. I loved that within a couple of hours students were able to start writing meaningful code, and we were able to cover very exciting topics. The same experience extended to the research teams where I have been involved.

To me the perspective of being able to write meaningful and efficient code with few lines, a clear syntax and without 10 years of experience sounds like modern and good science.

Citation File Format (CFF) FOR SOFTWARE

To give the coding process proper credit in scientific research, the now generally accepted approach is to make software citable. To do so, we register packages on Zenodo.org to generate a DOI that can be cited, although this does not count as a research paper. Some well-known research journals focus on code, either accompanying research or as the main contribution itself, including updates to existing code (Mathematical Programming C, INFORMS Journal of Scientific Software, …). A complementary approach is to write a paper about the code itself rather than the associated mathematical content. This is the purpose of journals like the Journal of Open Source Software or the Journal of Open Research Software. They are peer-reviewed journals where a short article is reviewed along with the package itself and its documentation. The peer-review process is not anonymous and takes place in a GitHub discussion, which I think is in the open-source spirit we discussed earlier.

What is a CITATION.cff file?

According to [1], CITATION.cff files are plain text files with human- and machine-readable citation information for software (and datasets). Code developers can include them in their repositories to let others know how to correctly cite their software. It is very easy to correctly cite a paper: all the necessary information (metadata) can be found on the title page or the article website. Software and datasets have no title page, so the relevant information is often less obvious. People who want to cite your software may ask questions like: What is the name of the software (it’s probably not my_scripts/run.m or analysis.py)? What label should I use to uniquely identify the version of the software I have used? What is the appropriate set of people that should be cited as authors? The person who wants to cite your software will probably not be able to answer these questions accurately and consistently themselves, but you can! Give them all the right information in a CITATION.cff file, and they can cite your software correctly.

Create a CITATION.cff file with ease using the form on the cffinit website.
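To make the format concrete, here is a minimal sketch in Python that assembles the text of a CITATION.cff file by hand. The package name, author, version, date, and URL are placeholders invented for illustration; the keys (cff-version, message, title, authors, version, date-released, repository-code) are taken from version 1.2.0 of the CFF schema. The cffinit form remains the easier route for a real project.

```python
def quote(value):
    """Wrap a value in double quotes so YAML always reads it as a string."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_cff(title, authors, version, date_released, repository):
    """Return the text of a minimal CITATION.cff file (CFF schema 1.2.0).

    `authors` is a list of (family_names, given_names) tuples.
    """
    lines = [
        "cff-version: 1.2.0",
        "message: " + quote("If you use this software, please cite it as below."),
        "title: " + quote(title),
        "authors:",
    ]
    for family, given in authors:
        lines.append("  - family-names: " + quote(family))
        lines.append("    given-names: " + quote(given))
    lines.append("version: " + quote(version))
    lines.append("date-released: " + date_released)  # ISO date, YYYY-MM-DD
    lines.append("repository-code: " + quote(repository))
    return "\n".join(lines) + "\n"


# Placeholder metadata, invented for illustration only.
cff_text = build_cff(
    title="MyOptimizationPackage.jl",
    authors=[("Doe", "Jane")],
    version="0.1.0",
    date_released="2024-02-25",
    repository="https://example.org/MyOptimizationPackage.jl",
)
print(cff_text)
```

Saving the printed text as CITATION.cff at the root of the repository is what lets tools pick it up.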

According to [1], the development of CFF is supported by The Institute for Software Technology of the German Aerospace Center (DLR), The Netherlands eScience Center, and The Software Sustainability Institute.

They chose CFF

GitHub: When you put a CITATION.cff file in the default branch of your GitHub repository, it is automatically linked from the repository landing page, and the citation information is rendered on the repository page, and also provided as BibTeX snippet which users can simply copy! This makes it easy for other users to cite your software project, using the information you’ve provided. https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-citation-files [2] If your repository is hosted on GitHub, they will show the citation information in the sidebar, which makes it easy for visitors to cite your software or dataset correctly.

Zenodo: When you have a CITATION.cff file in your GitHub repository, make a release and publish it on Zenodo via the Zenodo-GitHub integration, Zenodo will use the citation information you’ve provided to populate the publication entry! This makes it easier for software developers and maintainers to publish their software with complete and correct metadata. https://twitter.com/ZENODO_ORG/status/1420357001490706442 [2] When you publish your software on Zenodo via the GitHub-Zenodo integration, they will use the metadata from your CITATION.cff file.

Zotero: When you have a CITATION.cff file in your repository, and someone uses the Zotero browser plugin to import a reference to your repository into their Zotero reference manager, it will use the citation information you’ve provided to populate the reference entry! This makes it easier for users to get a complete and correct reference to your software, that they can use when they cite your software in their work! https://twitter.com/zotero/status/1420515377390530560 [2] People can import the correct reference to your software into the Zotero reference manager via a browser plugin.


Presenting at JuMP-dev 2024 and Publishing in JuliaCon 2023 Proceedings

3 minute read

Published: August 30, 2024

I’m thrilled to share two major milestones in my recent work within the Julia ecosystem. First, I presented the latest developments in optimization solvers at JuMP-dev 2024, and second, my paper on JSOSuite.jl was accepted in The Proceedings of the JuliaCon Conferences.

These two achievements highlight both the ongoing evolution of JuliaSmoothOptimizers (JSO) and its growing impact on large-scale nonlinear optimization problems.

JuMP-dev 2024: Advancing Nonlinear Optimization with JuliaSmoothOptimizers

This year’s JuMP-dev workshop, held independently from JuliaCon for the first time in Montreal, offered a focused platform for deep dives into JuMP and its surrounding tools. In my presentation, I discussed the latest progress within the JuliaSmoothOptimizers (JSO) ecosystem; my slides and the replay are available.

At the core of my talk was an introduction to new solvers and packages like AdaptiveRegularization.jl, which address the unique challenges of large-scale optimization problems with Adaptive Regularization with Cubics. I emphasized the following key innovations:

Automatic Differentiation (AD) support and integration with JuMP for easier problem modeling.
Memory pre-allocation for in-place solvers, reducing runtime overhead.
Support for multi-precision solvers and GPU-based computations, essential for modern large-scale applications.
The value of factorization-free solvers, which excel in tackling large, complex problems, such as those in discretized PDE-constrained optimization.

For newcomers to JSO, JSOSuite.jl serves as a critical entry point, simplifying solver selection and benchmarking through automatic algorithm matching. This tool eliminates the complexity of choosing from multiple solvers by providing a user-friendly interface that adapts to the problem at hand. My talk also touched on the broader adoption and longevity of JSO, which now spans over 50 registered packages, making it one of the most comprehensive platforms for numerical optimization.

JuliaCon 2023: JSOSuite.jl – Simplifying Continuous Optimization

While JuMP-dev 2024 focused on recent developments, my publication in The Proceedings of the JuliaCon Conferences looks at the core philosophy and implementation behind JSOSuite.jl. Titled JSOSuite.jl: Solving Continuous Optimization Problems with JuliaSmoothOptimizers, the paper introduces JSOSuite.jl as a package designed to bring ease-of-use to complex optimization challenges.

JSOSuite.jl covers a range of problem types—from unconstrained to generally-constrained and least-squares problems—and eliminates the need for users to understand the intricate details of individual solvers. Instead, the package conducts a preliminary analysis of the problem and automatically selects the most appropriate solver, offering significant advantages to both experienced practitioners and newcomers alike.

This paper builds on the innovations within JSO, reinforcing its versatility and ease of use across various fields and applications. The package is a natural fit for researchers who need efficient, reliable solvers without the overhead of manually configuring them for different problem types.

Looking Forward

Both my presentation at JuMP-dev 2024 and the publication of the JSOSuite.jl paper reflect the significant strides made by the JuliaSmoothOptimizers organization over the past year. The JSO ecosystem is positioned to continue driving innovation in the field of numerical optimization.

I’m excited to see how these advancements will be applied across diverse optimization problems in the coming years and look forward to continuing this journey with the JSO community.

New Preprint on HAL: Exploring Projected Dynamical Systems in Geochemical Reactions

2 minute read

Published: July 05, 2024

This project holds a special place in my heart as it touches on the very applications in geochemistry that first drew me into research. Equilibrium reactions, particularly in slow processes like the water cycle in aquifers, have always fascinated me. Moreover, this paper represents an important milestone for one of the authors, Bastien, as it was part of his Ph.D. thesis. The use of projected dynamical systems, a model I am particularly fond of, adds an additional layer of personal significance to this work.

Performance Profile Benchmarking Tool

16 minute read

Published: June 25, 2024

The Dolan-Moré performance profile is a method for comparing the performance of algorithms.
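As a rough illustration of the idea, the sketch below computes performance profiles in plain Python. For each problem, the cost of every solver is divided by the best cost on that problem, and the profile of a solver at a factor tau is the fraction of problems it solves within tau times the best cost. The function name, the cost table, and the tau grid are made up for the example; costs are assumed to be strictly positive, with math.inf marking a failure.

```python
import math


def performance_profile(costs, taus):
    """Compute performance profiles from a table of costs.

    costs[p][s] is the cost (e.g. CPU time) of solver s on problem p,
    with math.inf marking a failure. Returns rho, where rho[s][k] is the
    fraction of problems that solver s solves within a factor taus[k]
    of the best solver on that problem.
    """
    n_problems = len(costs)
    n_solvers = len(costs[0])
    ratios = []
    for row in costs:
        best = min(row)
        # If every solver failed on this problem, all ratios stay infinite.
        ratios.append([c / best if math.isfinite(best) else math.inf for c in row])
    return [
        [sum(ratios[p][s] <= tau for p in range(n_problems)) / n_problems for tau in taus]
        for s in range(n_solvers)
    ]


# Made-up costs: 4 problems (rows), 2 solvers (columns).
costs = [
    [1.0, 2.0],
    [4.0, 2.0],
    [3.0, math.inf],  # the second solver fails on this problem
    [5.0, 5.0],
]
taus = [1.0, 2.0, 4.0]
rho = performance_profile(costs, taus)
print(rho)  # [[0.75, 1.0, 1.0], [0.5, 0.75, 0.75]]
```

The value at tau = 1 is the fraction of problems on which a solver is the fastest (ties included), and the limit for large tau is the fraction of problems it solves at all.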

ARCqK published in Mathematical Programming

3 minute read

Published: December 15, 2023

I am thrilled to share that the article Scalable adaptive cubic regularization methods has been published in the journal Mathematical Programming, Series A. It has been an exciting journey with my co-authors Jean-Pierre Dussault and Dominique Orban, and I hope this work will help explore the numerical possibilities of ARC methods. The proposed implementation is a perfect fit for large-scale applications, as it solves the subproblem inexactly and only requires Hessian-vector products, so there is no need to evaluate and store the Hessian matrix. As usual, the code is written in Julia and is available in the folder paper in the GitHub repository AdaptiveRegularization.jl. The full text of the published version is available from here, enjoy!
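To illustrate why Hessian-vector products avoid storing the Hessian, here is a small Python sketch that approximates such a product by a forward difference of two gradients. This is only an illustration of the general idea, not the approach used in the paper or in AdaptiveRegularization.jl; the function names, the toy objective, and the step size eps are assumptions chosen for the example, and in practice automatic differentiation is a common alternative.

```python
def hessian_vector_product(grad, x, v, eps=1e-6):
    """Approximate H(x) v by a forward difference of gradients.

    Uses H(x) v ~ (grad(x + eps * v) - grad(x)) / eps, so the Hessian
    matrix itself is never formed or stored.
    """
    g0 = grad(x)
    x_shifted = [xi + eps * vi for xi, vi in zip(x, v)]
    g1 = grad(x_shifted)
    return [(b - a) / eps for a, b in zip(g0, g1)]


def grad_f(x):
    """Gradient of the toy function f(x) = (x0^4 + x1^4) / 4 + x0 * x1."""
    return [x[0] ** 3 + x[1], x[1] ** 3 + x[0]]


# The exact Hessian at (1, 2) is [[3, 1], [1, 12]], so H v = [3, 1] for v = (1, 0).
hv = hessian_vector_product(grad_f, [1.0, 2.0], [1.0, 0.0])
print(hv)  # close to [3.0, 1.0], up to finite-difference error
```

Each product costs two gradient evaluations and a few vectors of memory, whereas storing the Hessian of a problem with n variables requires on the order of n² entries.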

Tangi Migot
