Introduction to Open Code

Dr. Johanna Bayer

McGill University, CA, Donders Institute, NL

About myself

Postdoctoral Fellow at McGill University, CA, and the Donders Institute, NL.
Developing and using machine learning models to create normative models on large neuroimaging data sets.
Studying computer science.
Involved in several open source and open science initiatives (OSSIG, DataTalks.club, TOPS, scikit-learn, Brainhack.org, The Turing Way).
Cat lover.

Introduction to Open Code

Learning Objectives

After completing this lesson, you should be able to:

Define open-source software and distinguish it from closed-source software.
List common benefits and challenges to the production of open code and describe how researchers can respond to some of the challenges while maximizing openness when appropriate.
Describe the function and purpose of a Software Management Plan, and its utility as a guidebook for everyone involved in a scientific project.

What is Code vs Software?

Code:

structured way of conveying information.
term not necessarily computer-specific.
high-level code that a human can understand has to be translated by a compiler or interpreter into machine language (low-level code) that the computer can execute.

Software:

collection of programs, data, scripts and code that are bundled together and executed together.
Software can be open or closed.

Open-source software:

distributed with its source code without cost, making it available for others to use, modify, and distribute with its original rights and permissions.
often transparently shared in a public repository, and sometimes maintained through collaboration.
the basis for a vast range of research software packages.
is often protected by a license that governs the sharing and the use of the software.

History of computing

Principles behind open code - a bit of a manifesto

Principle	Description
Transparency	Whether we are developing software or solving a research problem, we all have access to the information and materials necessary for doing our best work. When these materials are accessible, we can build upon each other’s ideas and discoveries. We can make more effective decisions and understand how those decisions affect us.
Collaboration	When we’re free to participate, we can enhance each other’s work in unanticipated ways. When we can modify what others have shared, we unlock new possibilities. By initiating new projects together, we can solve problems that no one can solve alone. And when we implement open standards, we enable others to contribute in the future.
Share Early & Often	Rapid prototypes can lead to rapid discoveries. An iterative approach leads to better solutions faster. When you’re free to experiment, you can look at problems in new ways and seek answers in new places. You can learn by doing.
Inclusivity	Good ideas can come from anywhere, and the best ideas should win. Only by including diverse perspectives in our conversations can we be certain we’ve identified the best ideas, and good decision-makers continually seek those perspectives. We may not operate by consensus, but successful work determines which projects gather support and effort from the community.
Community	Communities form when different people unite around a common purpose. Shared values guide decision making, and community goals supersede individual interests and agendas.

Types of software - is what I write software or code?

You might have encountered different types of software:

Software type	Description	Example
General purpose Software	produced for wide use; can be open or closed	Linux kernel, GNU userspace, various Linux and UNIX distributions, web browsers, Android operating system
Operational/Infrastructure Software	used by data centers and large information technology facilities to provide data services	APIs, Web Apps
Libraries	generic, typically small tools for implementing well-known algorithms, statistical analysis, or visualization, which are incorporated into other software categories	scikit-learn, NumPy, pandas, ggplot2, etc.
Modelling and Simulation Software	implements solutions to mathematical equations given input data and boundary conditions, or infers models from data	OpenFOAM, MATLAB libraries, Stan
Analysis Software	developed to manipulate measurements or model results to visualize or gain understanding	R, SPSS
Single-Use Utility Software	written for use in unique instances, such as making a plot for a paper or manipulating data in a specific way. This code often uses libraries for analysis, plotting, or reading data, and is typically included in Open Science and Data Management Plans	plots for a paper, data analysis script

Licensing code

Licensing code is a whole can of worms and I am not a lawyer.

When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you. Source.

Licenses range from permissive (few restrictions) through protective (copyleft) to proprietary.

Type of license	Permissions	Examples
Public domain	Grants all rights	CC0
Permissive license	Grants use rights, forbids almost nothing (allows proprietization)	BSD, MIT, Apache
Copyleft (protective license)	Grants use rights, forbids proprietization	GPL, AGPL
Proprietary license	Traditional use of copyright; no rights need be granted	Proprietary software
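
A common way to state a file's license in machine-readable form is an SPDX license identifier. The sketch below groups a few standard SPDX short names by the license types in the table. The grouping and the helper names `license_category` and `spdx_header` are assumptions made for illustration, and none of this is legal advice.

```python
# Illustrative mapping from SPDX short identifiers to the license types in the table.
# The grouping mirrors the table; it is a teaching aid, not legal advice.
LICENSE_CATEGORIES = {
    "CC0-1.0": "public domain",
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "GPL-3.0-or-later": "copyleft",
    "AGPL-3.0-or-later": "copyleft",
}


def license_category(spdx_id: str) -> str:
    """Return the license type for a known SPDX identifier (hypothetical helper)."""
    return LICENSE_CATEGORIES.get(spdx_id, "unknown: check the license text")


def spdx_header(spdx_id: str) -> str:
    """Build the one-line comment commonly placed at the top of a source file."""
    return f"# SPDX-License-Identifier: {spdx_id}"


print(license_category("MIT"))
print(spdx_header("GPL-3.0-or-later"))
```

The header line is only a pointer; the full license text still belongs in a LICENSE file at the root of the repository.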

Make your code citable

Add a digital object identifier (DOI) to your repository, for example using Zenodo.
Add a CITATION.cff file to your repository.
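
To give a feel for what a CITATION.cff file contains, here is a minimal sketch that assembles one as plain text. The keys follow the Citation File Format (version 1.2.0); the function name `build_citation_cff` and all metadata values are placeholders, and the DOI is an example value rather than a real record for your project.

```python
def build_citation_cff(title, authors, version, doi, date_released):
    """Return the text of a minimal CITATION.cff file (CFF version 1.2.0).

    `authors` is a list of (family_name, given_name) tuples.
    Values are inserted as-is, so avoid double quotes inside them.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family_name, given_name in authors:
        lines.append(f'  - family-names: "{family_name}"')
        lines.append(f'    given-names: "{given_name}"')
    lines.append(f"version: {version}")
    lines.append(f"doi: {doi}")
    lines.append(f"date-released: {date_released}")
    return "\n".join(lines) + "\n"


# Placeholder metadata: replace every value with your own project details.
cff_text = build_citation_cff(
    title="My Research Software",
    authors=[("Lisa", "Mona"), ("Bot", "Hew")],
    version="2.0.4",
    doi="10.5281/zenodo.1234",
    date_released="2017-12-18",
)
print(cff_text)
```

Saving the printed text as CITATION.cff in the root of the repository is enough to get started; more fields (for example an ORCID per author) can be added later.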

Software Management plans

The best way to work with software.

A Software Management Plan (SMP) is a document that describes how a specific software project is developed, maintained, and curated.
It is written by the developers, maintainers, and/or other stakeholders of the software project.
The goal of an SMP is to ensure that the software is usable and maintainable in the long term.

Open code in the time of LLMs

Code as training data:

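Large amounts of public GitHub code have been used to train LLMs and coding assistants (e.g. GitHub Copilot) and to build training datasets (e.g. The Stack).
Opt-out mechanisms exist for some datasets (e.g. The Stack), but not for all.
Code you paste into commercial LLM services (e.g. the ChatGPT free tier) may be used as training data.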

LLM-generated code and licensing:

Code generated by LLMs is not clearly copyrightable.
Using LLM-generated code in a licensed repository raises unresolved legal questions.
Disclose LLM use in your methods when relevant.

Using LLMs for coding

Should I use LLMs for coding? YES.

LLMs are new: Everyone is learning how to use them. You can be a pioneer in your field by figuring out how to use them effectively and sharing your insights with others.

LLMs are fantastic teachers

“Can you explain this codebase to me?”
“What does this function do?”
“Explain this method section as if I were 5.”
“Give me example code to reproduce this graph.”

But you still need to own your code:

Be able to read, understand, and explain it.
Be able to maintain and debug it.
Never use LLMs on sensitive or patient data.

LLMs as a leveler:

Researchers without a formal CS background, non-native English speakers, and those from under-resourced institutions can now write and document code more confidently.

Getting started:

Tension between openness and LLMs

LLM output is probabilistic, not deterministic — unlike traditional code:

The same prompt can produce different output depending on the model version (and the data it was trained on), the sampling settings, and the conversation history.
The context window — the amount of context the LLM can consider, measured in tokens (words or subwords) — also influences output.
Even sharing the exact prompt does not guarantee reproducibility.
This creates a genuine tension with open science principles of transparency and replication.
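
The toy sampler below illustrates why the same prompt can lead to different output. It is a deliberately simplified sketch, not how any particular LLM is implemented: the scores in `toy_logits` are invented, and `sample_next_token` is a hypothetical helper that draws one token using temperature-scaled softmax sampling.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Sample one token from a dict of token -> score using temperature scaling."""
    tokens = list(logits)
    scaled = [logits[t] / temperature for t in tokens]
    # Softmax weights: subtract the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented scores for the token that follows some fixed prompt.
toy_logits = {"mean": 2.0, "median": 1.5, "mode": 0.5}

# Same "prompt", different random seeds: the chosen token can differ between runs.
for seed in range(5):
    rng = random.Random(seed)
    print(seed, sample_next_token(toy_logits, temperature=1.0, rng=rng))

# Fixing the seed makes this toy sampler repeatable.
first = sample_next_token(toy_logits, 1.0, random.Random(42))
second = sample_next_token(toy_logits, 1.0, random.Random(42))
print(first == second)
```

In this toy setting a fixed seed restores repeatability. Hosted LLM services may not expose a seed at all, and the underlying model can change between versions, which is why sharing the prompt alone is not enough.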

What helps:

Share the generated code itself, not just the prompt.
Pin and document the model version used.
Test and describe what the code actually does.

Treat LLM-generated code like code from Stack Overflow: a useful starting point, but you are responsible for understanding and validating it.
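
One possible way to act on the "What helps" list is to store a small provenance record next to the generated code. The sketch below is an assumption about what such a record could look like, not an established standard: the field names, the function `provenance_record`, and the model name and version string are all made up for illustration.

```python
import hashlib
import json
from datetime import date


def provenance_record(model, model_version, prompt, generated_code, used_on):
    """Bundle what a reader needs to know about one piece of LLM-generated code."""
    return {
        "model": model,
        "model_version": model_version,
        "date": used_on.isoformat(),
        "prompt": prompt,
        # The hash ties this record to the exact code committed to the repository.
        "code_sha256": hashlib.sha256(generated_code.encode("utf-8")).hexdigest(),
    }


# Placeholder values: the model name and version string are invented.
code = "def add(a, b):\n    return a + b\n"
record = provenance_record(
    model="example-llm",
    model_version="2025-01-01",
    prompt="Write a function that adds two numbers.",
    generated_code=code,
    used_on=date(2025, 1, 15),
)
print(json.dumps(record, indent=2))
```

The record complements the code and its tests; it does not replace them. Committing the generated code itself remains the most important step.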

Exercise: How do you use LLMs for coding?

Do you use LLMs for coding? If so, how? If not, why not?
What benefits or challenges have you encountered?
Do openness and LLMs conflict or complement each other?
Do you have tips to share with others?

Key Takeaways: Relating Principles to Benefits and Challenges

Making software more open has benefits and challenges, which are related.
Greater benefits typically come with greater challenges.
In most cases, individual scientists and society will both benefit from more open software.
LLMs can both accelerate and hinder the coding process, and may create tension with reproducibility.

Using Open Code

Learning Objectives

After completing this lesson, you should be able to:

Describe the process of using open code and know some key repositories to find open code.
Describe how, where, and under what circumstances one should acknowledge (cite) code.

Discovering open code

What locations do you already know where you can find code?

Software repositories


Software Heritage	Open Source Development Network (OSDN)	SourceForge	Free and Open-Source Software Hub (FOSSHUB)

Google Code	Comprehensive Perl Archive Network (CPAN)	PyPI	CRAN

What makes a good README?

The README is the first thing anyone sees when they visit your repository; it often determines whether your code gets used.

A good README includes:

Title and description: what the project does and why.
Installation instructions: dependencies and setup steps.
Usage example: at least one working, copy-pasteable example (if applicable). If you share code that accompanies a paper, describe how to use it to reproduce the paper’s results.
Citation: how to credit the software in a paper (if applicable).
Downloads, releases, and changelog: how to get the latest version and what has changed (if applicable).
Contributors and acknowledgements: who contributed to the project and who should be acknowledged (if applicable).
License: what others are allowed to do with the code (if applicable).
Contact and contribution info: who to contact and how to contribute, e.g. in a CONTRIBUTING.md file.

Rule of thumb: if a new lab member couldn’t run your code from the README alone, it needs more work.
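
As a rough self-check, the checklist above can be turned into a few lines of code. This is a crude sketch under stated assumptions: it assumes a Markdown README whose headings start with `#`, the keyword list `EXPECTED_SECTIONS` is one reading of the checklist, and the package and script names in the example README are hypothetical.

```python
# Section keywords taken from the checklist; the matching is deliberately crude.
EXPECTED_SECTIONS = ["installation", "usage", "citation", "license", "contributing"]


def missing_readme_sections(readme_text):
    """Return the expected section keywords that appear in no heading line."""
    headings = [
        line.lstrip("# ").lower()
        for line in readme_text.splitlines()
        if line.startswith("#")
    ]
    return [
        section
        for section in EXPECTED_SECTIONS
        if not any(section in heading for heading in headings)
    ]


# Hypothetical README for illustration only.
example_readme = """# My Research Software
Short description of what the project does and why.

## Installation
pip install my-research-software

## Usage
python run_analysis.py

## License
MIT
"""

print(missing_readme_sections(example_readme))
# prints ['citation', 'contributing']
```

A check like this only confirms that sections exist. Whether a new lab member can actually run the code from the README is still best tested with a real person.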

Citing Open Source Code and Software

When should open code be cited?

It has played a critical part in your research.
It provides something novel.
It impacts the results of your analysis.

If you run a simulation using a specific software package, cite that package. You do not need to cite the word processor you used to write the paper. However, if your research is on comparing word processors, then you would need to cite the word processor you used.

10 Tips for citing software

Describe any software that played a critical part in your research in enough detail for a peer to repeat and validate what you did.
Options for citing: footnotes, acknowledgements, methods sections, and appendices.
A licence may place you under an obligation to attribute the software.
Cite papers that describe software as a complement to, not a replacement for, citing the software itself.
In the first draft, always put software citations in references or bibliographies.
Be prepared to debate with reviewers: you are acknowledging the contribution of the software’s authors.
Inform reviewers if you are legally obliged to cite the software.
If a reviewer disagrees, you can still make a general reference in the paper.
Recommended citations may lack detail; add more yourself if needed.
If the software has a DOI, use it. Otherwise use the software’s URL.

Citation formats

Software purchased off-the-shelf:

ProductName. Version. ReleaseDate. Publisher. Location.
SuperScience. 1.2. December 2012. ResearchSoftware. Edinburgh, UK.

Software downloaded from the web:

ProductName. Version. ReleaseDate. Publisher. Location. DOIorURL. DownloadDate.
OGSA-DAI REST. 4.2.1. December 2012. OGSA-DAI Project. http://sourceforge.net/projects/ogsa-dai. 27/04/2012.

Software checked out from a public repository:

ProductName. Publisher. URL. CheckoutDate. RepositorySpecificCheckoutInformation.
OGSA-DAI REST. OGSA-DAI Project. http://sourceforge.net/projects/ogsa-dai. 27/04/2012. Check-out: ogsa-dai/branch/ogsadai4.1/, revision 1657.

Software on GitHub with a DOI:

Author. (Year). ProductName (Version) [Type of Work]. DOI/URL.
Lisa, M., & Bot, H. (2017). My Research Software (Version 2.0.4) [Computer software]. https://doi.org/10.5281/zenodo.1234.
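
If you cite many packages, the pattern above can be generated from metadata instead of being typed by hand. The helper below is a minimal sketch: `format_software_citation` is a hypothetical function, it assumes author names are already in "Family, G." form, and it covers only this one citation pattern.

```python
def format_software_citation(authors, year, title, version, doi):
    """Format a citation in the 'Software on GitHub with a DOI' pattern.

    `authors` is a list of names that are already formatted, e.g. "Lisa, M.".
    """
    if len(authors) > 1:
        author_text = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_text = authors[0]
    return (
        f"{author_text} ({year}). {title} (Version {version}) "
        f"[Computer software]. https://doi.org/{doi}"
    )


# Metadata taken from the example above.
citation = format_software_citation(
    authors=["Lisa, M.", "Bot, H."],
    year=2017,
    title="My Research Software",
    version="2.0.4",
    doi="10.5281/zenodo.1234",
)
print(citation)
```

In practice a reference manager or the repository's own recommended citation does this job; the sketch only makes the structure of the pattern explicit.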

AI tool:

AI Company Name. (year). Tool Name/Model in Italics and Title Case [Description; e.g., Large language model]. URL of the tool.
Anthropic. (2025). Claude Sonnet 4 [Large language model]. https://claude.ai/new

AI chat:

AI Company Name. (year, month day). Title of chat in italics [Description, such as Generative AI chat]. Tool Name/Model. URL of the chat
Anthropic. (2025, May 20). Essential grammar topics for high school graduates [Generative AI chat]. Claude Sonnet 4. https://claude.ai/share/329173b2-ec93-4663-ac68-4f65ea4f166d

Exercise: Pick one

One concrete first step towards sharing code

Think about a concrete example (research, project).

What are the next two concrete steps towards making your code (and its documentation) for this project shareable?

Citing your first software package

Think about a software package you have used in your research.

Look up the recommended citation for that package and add it to your reference manager (e.g. Zotero, Mendeley, EndNote) or your bibliography file (e.g. a .bib file).

Thank you so much!

This presentation is reproducible!

a = "Thank you!"
b = "What Questions Do You Have?"

print(paste(a, b))

[1] "Thank you! What Questions Do You Have?"
