A Primer for 
Applying and 
Interpreting Licenses 
for Research Data 
and Code

Authors: Lisa Chinn, Matthew Murray🦇, and Isaac Wink.

Mentor: Jennifer Huck

Affiliate Contributors: Talya Cooper, Laura Hjerpe, Katherine Klosek, Sophia Lafferty-Hess, 
Allison Langham-Puttrow, and Jiebei Luo.

Suggested Citation: Chinn, Lisa; Murray 🦇, Matthew; and Wink, Isaac. (2024). A Primer for 
Applying and Interpreting Licenses for Research Data and Code. Data Curation Network GitHub 
Repository.

Overview
Topic Description

Summary Evaluating and selecting copyright licenses 
applied to research datasets and code in a 
U.S. context.

Primary fields or areas of use Relevant to all fields in which data or code 
are shared.

Key questions for curation review ● Does copyright, a data use 
agreement, or terms of service impact 
how a researcher may reuse and 
disseminate previously created 
datasets or code?

● When choosing a copyright license for 
their own dataset or code, what 
copyright, institutional, and funder 
factors may impact a researcher’s 
choice of license?

Context-specific considerations Consider data use agreements, institutional 

1


policies covering data ownership, and funder 
requirements for data sharing. 

Tools for curation review Tables of commonly-applied copyright 
licenses to datasets and code. (see below)

Date created November 11, 2024

Created by Lisa Chinn, Matthew Murray 🦇, Isaac Wink 
DCN Mentor: Jennifer Huck

Date updated and summary of changes made Initial publication, Dec 2024

Overview
Introduction
Copyright Licensing for Datasets and Code
Common Licenses Applied to Datasets
Code Ownership

Who owns the copyright on computer code?
Common Licenses Applied to Code
What license should be used for computer code?
Code Documentation

Institutional Policies Impacting Data Licensing and Ownership
Institutional data governance does not always address research data
Data Ownership

Interpreting and Applying Dataset Licenses
A note on legal advice
Choosing a license for a newly created dataset

Factors impacting data sharing
Helping researchers choose a license
Navigating and Interpreting Licenses Applied to Datasets That Are Being Reused
Navigating and Interpreting Licenses Applied to Code That Is Being Reused

Challenges to Understanding Licenses
Conclusion
Bibliography and Further Reading
Additional Resources on Navigating Licenses

2

https://orcid.org/0000-0003-2801-2874
https://orcid.org/0000-0001-5799-8471
https://orcid.org/0009-0009-5750-2283


Introduction
This primer gives data curators an overview of the licenses that are commonly applied to 
datasets and code, familiarizes them with common requirements in institutional data policies, 
and makes recommendations for working with researchers who need to apply a license to their 
research outputs or understand a license applied to data or code they would like to reuse. While 
copyright issues are highly case-dependent, the introduction to the data copyright landscape 
and the general principles provided here can help data curators empower researchers to 
understand the copyright context of their own data.

Copyright law in the US exists to serve the public interest. Copyright is the legal framework that 
grants the creators of original works the right to control how those works are copied, adapted, 
and reproduced. Copyright law also includes exceptions and limitations allowing the use and 
reuse of copyrighted works for classroom teaching, scholarship, research, preservation, and 
accessibility, particularly in the nonprofit scholarly context. While copyright law can help 
encourage the creation of new works by allowing creators to maintain control and potentially 
profit from their own work, many scholars and other creators have sought to develop simple 
frameworks that make it easier for others to share and reuse their creations, such as text, 
datasets, and code.

Considering the copyright of datasets and code introduces a number of complexities, including 
ambiguity on whether or not a dataset constitutes a “creative” work, the ease with which 
datasets are copied and combined, and institutional policies impacting research data and code 
(such as copyright policies, data ownership policies, and data use agreements) that may lay 
claim to the intellectual property produced by researchers. Understanding copyright is also 
highly relevant to researchers working to comply with open data policies from funders and align  
their data with the FAIR principles (particularly Accessible and Reusable). 

Note that while the vast majority of countries use roughly similar copyright frameworks, there are 
important variations among them, including some that apply to datasets and code. This primer 
applies to the U.S. copyright context.

Copyright Licensing for Datasets and Code
Generally speaking, an individual who produces a creative work holds copyright to the work, 
meaning that others are not allowed to reproduce or republish the work without the original 
creator’s permission. When a creator allows someone else to reproduce or republish their work, 
they grant a license to do so. 

Copyright licenses are often part of a negotiated contract between parties (for example, a 
publishing house may pay an author for a license to publish their novel). Increasingly, however, 
producers of scholarly works attach a license to their outputs that apply to anyone who uses it in 
order to encourage reuse. For example, the author of an open educational resource may attach 
a license to their work that gives anyone the right to republish, reuse, or adapt the resource so 
long as they cite the original author (commonly expressed as a CC-BY license). 

3

https://www.copyright.gov/what-is-copyright/
https://www.arl.org/know-your-copyrights/
https://www.go-fair.org/fair-principles/
https://creativecommons.org/licenses/by/4.0/


Because the copyright framework depends on the production of creative or original works, 
licensing becomes less clear when applied to datasets, which may or may not be creative or 
original. In the United States, data that is factual is generally not copyrightable because no one 
may copyright facts or ideas (as established in Feist Publications, Inc. V. Rural Tel. Serv. Co).1 
However, the arrangement of facts (such as a dataset) may be copyrightable if it represents an 
original or creative structuring. A list of all the pizza restaurants in New York City would likely not 
be a copyrightable dataset, but it could be if the list also ranked the restaurants by their quality. 
The line separating copyrightable and non-copyrightable datasets is fuzzy and depends on 
contextual factors. 

To clear up any potential confusion, researchers can apply copyright licenses to their data and 
code. By including a license, researchers communicate to all future users the contexts in which 
they are or are not allowed to reuse and republish their datasets. Examples of common licenses 
applied to datasets and code are given below. 

Not providing a license in fact limits the data’s long-term reusability, because having no license 
means that the creator claims full copyright and thus full control over materials. Licensing acts 
as a tool of communication for the larger research community: researchers do not know how a 
dataset can be used or reused if no license information is provided. It is the curator’s 
responsibility to communicate this distinction to a researcher, and providing a researcher with 
information about how licensing benefits their research is an important part of any data sharing 
conversation. Conversely, when a researcher wishes to reuse a dataset that they did not create, 
it is also important that the curator help the researcher identify relevant license information so 
that they are empowered to comply with it.

Common Licenses Applied to Datasets
“Data” is a very broad term (you can look at the list of other DCN Primers to see some of the 
things it includes). Some data can be copyrighted and some data can’t be copyrighted, but 
regardless of its copyrightable status, it’s important to give any published datasets a license. 
Giving published datasets a license encourages researchers to reuse the data, since they will 
know explicitly what is and is not allowable. Licensing also promotes citation of the dataset, 
either through reuse or by providing the dataset as evidence in research.

Most data may be licensed through two major avenues: 1) Creative Commons licensing, or 2) a 
custom license or “Data Use Agreement (DUA)” agreed upon between the data holder and the 
data user. Computer code uses a different set of licenses (see below). Creative Commons 
licensing allows creators of all types of research outputs (including datasets) to tell the public 
how their work can be reused. It gives someone who wishes to reuse a work permission to 
reuse in a particular way.2 First, let’s go over Creative Commons licenses:

2 A dataset may also be released under multiple licenses. For example, a dataset may be released in a 
data repository under a non-commercial license, but the copyright holders may also license the dataset to 
an individual or organization specifically for commercial purposes.

1 A “fact” is something that is not created, but already exists and is discovered and recorded. This applies 
to scientific, historical, biographical, and news data. Examples include things like temperatures, dates, 
demographics, speeds, and weights (Compendium of U.S. Copyright Office Practices, 313.3(C) Facts).

4

https://supreme.justia.com/cases/federal/us/499/340/
https://datacurationnetwork.org/outputs/data-curation-primers/
https://creativecommons.org/share-your-work/cclicenses/
https://www.copyright.gov/comp3/


License Type License Logo What It does Allows for 
Commercial 

Use?

CC-BY Enables reusers to distribute, 
remix, adapt, and build upon 
the material in any medium 
or format, so long as 
attribution is given to the 
creator.

Yes

CC BY-SA Enables reusers to distribute, 
remix, adapt, and build upon 
the material in any medium 
or format, so long as 
attribution is given to the 
creator. If you remix, adapt, 
or build upon the material, 
you must license the 
modified material under 
identical terms.

Yes

CC BY-NC Enables reusers to distribute, 
remix, adapt, and build upon 
the material in any medium 
or format for noncommercial 
purposes only, and only so 
long as attribution is given to 
the creator.

No

CC BY-SA-NC Enables reusers to distribute, 
remix, adapt, and build upon 
the material in any medium 
or format for noncommercial 
purposes only, and only so 
long as attribution is given to 
the creator. If you remix, 
adapt, or build upon the 
material, you must license 
the modified material under 
identical terms.

No

CC BY-ND Enables reusers to copy and 
distribute the material in any 
medium or format in 
unadapted form only, and 

Yes

5


only so long as attribution is 
given to the creator. 

CC BY-NC-ND Enables reusers to copy and 
distribute the material in any 
medium or format in 
unadapted form only, for 
noncommercial purposes 
only, and only so long as 
attribution is given to the 
creator.

No

CC0 (CC Zero) CC0 (aka CC Zero) is a 
public dedication tool (rather 
than a license), which 
enables creators to waive 
their copyright and put their 
works into the worldwide 
public domain. CC0 enables 
reusers to distribute, remix, 
adapt, and build upon the 
material in any medium or 
format, with no conditions.
While use of CC0 does not 
require attribution, scholarly 
norms expect datasets 
available under these terms 
to be fully attributed.

Any conditions

Note: While other kinds of licenses, such as Open Data Commons licenses, may be applied to 
datasets, we focus mainly on Creative Commons licenses in this primer because they are 
commonly used across a variety of institutional, generalist, and domain-specific repositories.

Custom data licenses are usually negotiated at the time of data acquisition and may be used by 
your library if your library provides datasets as part of its collections strategy on campus.3 
Additionally, individuals, departments, or research centers might be responsible for licensing 
data, and contract offices may manage DUAs in addition to or in lieu of your library. As a curator, 
it is possible that you interact with data licensing only at the end of the curation process, when 
ingesting data into a repository. Nevertheless, it can be helpful to know the data licensing terms 
applied to datasets held by your library so that you can help researchers understand the terms 
by which they can reuse or adapt the datasets. 

3 Fair use rights can be preserved in database and software licenses, but if a library agrees in a license 
agreement or other contract to forego fair use rights when making a dataset available to its users, then 
fair use no longer applies.

6

https://opendatacommons.org/


Code Ownership
Computer code is an important type of research output and may take many forms from small 
scripts to full suites of software. Currently, there is limited agreement across funder and journal 
mandates about whether code created for research projects counts as research data. NASA 
specifically requires computer code and software to be shared as part of its data sharing 
guidelines. The NIH has Best Practices for Sharing Research Software (separate from its Data 
Management & Sharing Policy) and requires a statement listing software needed to access or 
work with datasets as part of Data Management and Sharing Plans. However, software is not 
considered research output by the NIH and is not required to be shared. General NSF policies 
encourage (but do not require) researchers to share software they have created, while specific 
NSF programs (such as the Office of Polar Programs) may have policies that require code and 
software to be shared. For any project that involves code, it is important to check the specific 
grant policies and requirements.

Institutional policies may cause computer code to be classified differently from other research 
data. For example, an institution might not claim ownership of a dataset but will claim ownership 
of code or software written to analyze that data. One reason for this is that software and code 
may be copyrightable and patentable in ways other datasets might not be. Institutions might 
then claim ownership of the code as intellectual property in order to exploit it commercially.

Another challenge researchers may face is the difference between “code” (the lines of 
programming text written in a specific language, such as Python) and “software” (an executable 
file or application) and what licenses apply to each. Additionally, it may not be possible to apply 
one code license to all code developed in a project as different pieces of code may require 
different licenses, depending on how they were developed.

There are also situations in which the Code of Best Practices in Fair Use for Software 
Preservation may be relevant.

Who owns the copyright on computer code?
Computer code is typically copyrightable. The owner of code created by a researcher depends 
on the academic institution’s policies. Some institutions claim ownership (or a share of 
ownership) for all code created by faculty or staff while employed by the institution or only code 
created using resources provided by the institution (such as a laptop or high-powered 
computing system), while others will not claim any ownership. This can be complicated when 
computer code includes contributions from multiple people on a research project, each of whom 
may have different statuses within an institution.

While for many works the owner of the copyright controls how it may be released, the 
collaborative nature of code means that the owner of the copyright may be limited in how they 
are permitted to release the code due to licenses that apply to projects as a whole. Consider the 
following scenarios:

7

https://www.nasa.gov/wp-content/uploads/2021/12/nasa-ocs-public-access-plan-may-2023.pdf
https://www.nasa.gov/wp-content/uploads/2021/12/nasa-ocs-public-access-plan-may-2023.pdf
https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq
https://sharing.nih.gov/data-management-and-sharing-policy/about-data-management-and-sharing-policies/data-management-and-sharing-policy-overview
https://sharing.nih.gov/data-management-and-sharing-policy/about-data-management-and-sharing-policies/data-management-and-sharing-policy-overview
https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#elements-to-include-in-a-data-management-and-sharing-plan
https://new.nsf.gov/policies/pappg/24-1/ch-11-other-post-award-requirements#ch11D4
https://www.nsf.gov/pubs/2022/nsf22106/nsf22106.jsp
https://www.arl.org/wp-content/uploads/2018/09/2019.2.28-software-preservation-code-revised.pdf
https://www.arl.org/wp-content/uploads/2018/09/2019.2.28-software-preservation-code-revised.pdf


● Code A is owned by a student because it was written on their own time using their own 
computer and is submitted to an open-source project.

● Code B is owned by an institution because it was written by a staff member during work 
hours using an institutionally supplied computer and is submitted to an open-source 
project.

In both of these scenarios, the copyright holder differs; however, because the open-source 
project requires contributions to be submitted under a specific license, the license for both will 
be the same. In this case, once the code has been released, the project license, and not the 
copyright holders, indicates how the code can be reused by others.

Common Licenses Applied to Code
There are many different licenses available for computer code. GitHub currently lists over 30, 
while the Open Source Initiative lists over 100. Thankfully, there are a smaller number of 
licenses that are more commonly used for code generally and academic code specifically. 
Helpful resources for comparing and selecting licenses include GitHub’s “Choose an open 
source license” website, the Open Source Initiative’s OSI Approved Licenses, and the European 
Commission’s Joinup Licensing Assistant. TLDRLegal provides software licenses in plain 
English.

Below are five of the most commonly used code licenses. Several of these licenses require that 
the entirety of the license be included within the code or software itself and may not be suitable 
for small pieces of code (under 300 lines). While some licenses are intercompatible, others are 
not. Permissive licenses allow users more freedom in how they reuse code and all of the 
following licenses allow for commercial use. 

8

https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository
https://opensource.org/licenses
https://choosealicense.com/
https://choosealicense.com/
https://opensource.org/licenses
https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses
https://www.tldrlegal.com/


License Name Type of License Requirements on Reuse of Code 
or Software

Apache License 2.0 Permissive License ● Full license and copyright 
information must be 
included.

● Changes to code must be 
specified.

BSD licenses (assorted 
variations)

Permissive License ● Full license and copyright 
information must be 
included.

CC0: Creative Commons 
Zero v1.0 Universal

Public Domain Waiver ● Does not need to include 
full license or copyright 
information.

GPL: GNU General Public 
License 3.0

Copyleft License ● Full license and copyright 
information must be 
included.

● Changes to code must be 
specified.

● The source code for any 
software must also be 
released.

MIT License (the most 
popular license on GitHub)

Permissive License ● Full license and copyright 
information must be 
included.

What license should be used for computer code?
The biggest concern for choosing a license for computer code will depend on whether the code 
is entirely original or based upon or expanding upon existing code. While some licenses are 
intercompatible, this is not always the case. There are also some licenses, such as GPL, that 
require any derivative works to be released under the same license (similar to how a CC BY-SA 
license works). It’s also possible that an institution will have a preferred license that they require 
code to be released under.

9

https://choosealicense.com/licenses/apache-2.0/
https://en.wikipedia.org/wiki/BSD_licenses
https://choosealicense.com/licenses/cc0-1.0/
https://choosealicense.com/licenses/cc0-1.0/
https://choosealicense.com/licenses/gpl-3.0/
https://choosealicense.com/licenses/gpl-3.0/
https://choosealicense.com/licenses/mit/
https://creativecommons.org/licenses/by-sa/4.0/


What is the researcher doing? What sort of license should be used?

Creating entirely new code from scratch The researcher gets to choose whichever 
license they want for their work. They should 
make sure that any collaborators working on 
the project with them are aware of the 
selected license.

Forking an existing project (copying an 
existing piece of code to develop it separately 
from existing projects/developers)

When creating a fork, check to see what 
license is being used for the existing project. 
Some licenses will require the use of the 
same license as the original projects, while 
others will allow a new license for the code to 
be used.

Creating new code that requires other pieces 
of code or libraries as dependencies

Some licenses require that any code that 
uses libraries or other code as dependencies 
must use the same license, while others allow 
a new license to be used. Some licenses may 
be modified from the default version with 
additional permissions or exceptions that 
allow them to be used as dependencies.

Contributing to an existing project Those contributing to an existing project 
should use whichever license the project 
already uses, otherwise, their code may not 
be accepted.

Code Documentation
Documentation for code and software may be found embedded within the code itself (in the 
form of code comments) or as separate files. When more robust documentation (for example, a 
user guide or help documentation) is provided, it is not included under the code license and can 
be released under a separate license. This may be the same license as the code, a Creative 
Commons license, or a documentation-specific license such as the GNU Free Documentation 
License. If using a separate license for documentation, make sure that any code samples within 
the documentation are released under the same license as the code as a whole.

10

https://www.gnu.org/licenses/gpl-faq.en.html#GPLIncompatibleLibs
https://www.gnu.org/licenses/fdl-1.3.en.html
https://www.gnu.org/licenses/fdl-1.3.en.html


Institutional Policies Impacting Data Licensing and 
Ownership
Curators should be aware of any institutional policies on data licensing. You may find policies 
organized under larger umbrella policies. There are a few places where one may find data 
licensing policies for a given institution, namely 1) institutional data governance policies and 2) 
other internal policies. You may also confront the reality of university or organizational data that 
is internal to that institution. In other words, universities are collecting and analyzing more and 
more data on their own students. This type of institutional data is out of the scope of this Primer, 
as we are focused on research data, but is an important distinction to note. We will take a look 
at these avenues below.

Institutional data governance does not always address research 
data
While institutional data governance policies may govern both research data and institutional 
data, most institutions separate institutional data governance from research data governance. If 
a curator is working with institutional data, or data that is derived from administrative research 
about the organization, then such guidelines can often be found on organizational websites. 
Such data usually includes data about campus admissions, matriculation of students, and other 
core institutional data. 

Some universities have institutional research data policies that have been approved by a Board 
of Regents/President of the University or the Chancellor/Provost of Research. Curators may find 
policies on a separate web page devoted exclusively to all policies as approved by the 
authorized parties. Check with your institution to see if such policies are housed in the same 
place. Some policies may be housed under an “institutional data governance” umbrella, while 
research data policies might be categorized and found under the auspices of the Office of 
Research (e.g., “Vice Chancellor of Research”, “Vice Provost of Research”, etc.).

Data Ownership
Research institutions often speak about “data ownership” to fit data within the framework of 
property; however, they may also use the term to communicate researchers’ responsibility to 
protect or disseminate data they produce. 

Curators may find policies at their own institutions dictating that the institution owns data 
collected, created, and disseminated as a part of research done at that institution. For instance, 
Harvard’s Research Data Ownership Policy states that “The University asserts ownership over 
research data for projects conducted at the University, under the auspices of the University, or 
with University resources.” Many institutions share this reading of data ownership, so it is 
imperative to read university policies implicating data ownership carefully to understand the 
landscape of data ownership and how it applies to data licensing within a particular data 
repository. Institutions may also have different rules in place for data generated by faculty, staff, 
and students or for institutional data and research data.

11

https://cpb-us-e1.wpmucdn.com/websites.harvard.edu/dist/6/18/files/2020/07/data_ownership_policy_08.06.19.pdf


While some institutions have a policy on data ownership, other institutions may instead have 
policies on data stewardship, while others only have an overarching data custodianship policy. 
Ownership, stewardship, and custodianship are part of a broader conversation about the 
relationship between data and property rights. All three are integral to conversations on the 
what, how, and who of data licensing. Licensing is fundamentally about sharing data, and each 
institution will have variations on how data may be shared beyond its walls. 

Research funders typically refer to data ownership to emphasize researcher responsibilities and 
highlight that data is frequently co-created with research participants. The National Institutes of 
Health’s National Center for Advancing Translational Science defines data ownership broadly 
and within the context of data registries: 

“Data ownership refers to both the possession of and responsibility for 
information. Data owners have the ability to access, create, modify, package, 
derive benefit from, sell, or remove data, as well as the right to assign these 
access privileges to others. Data in a registry traditionally has been owned by the 
registry sponsor. If there is more than one sponsor of the registry, ownership of 
the data should be clearly defined and legally documented. However, 
increasingly patient registries, especially those sponsored by rare and genetic 
disease patient groups and umbrella organizations, are providing more 
ownership rights to individual participants, including allowing participants to 
decide on a case by case basis who can view or access their data.” 

Here, the landscape for data ownership is context-specific: up to now, we have seen registry 
sponsors, most commonly disciplinary societies, institutions, or organizations, claim ownership 
of the data held in their repositories. However, a shift toward ownership of data originating from 
human subjects is currently changing discussions of ownership and access.

This shift towards patient-owned data is especially influenced by Indigenous data sovereignty, 
reflected in the CARE Principles for Indigenous Data Governance. The CARE Principles,4 as 
they are commonly known, emerged from the Global Indigenous Data Alliance and in response 
to the FAIR data principles. As a complement and critique of FAIR, CARE places the 
sovereignty of Indigenous people at the center of data practices and policies. Decisions about 
how, when, where, and what data are collected are connected to the historical and power 
differentials inherent in Indigenous communities across the globe. CARE is an acronym 
standing for Collective Benefit, Authority to Control, Responsibility, and Ethics. While CARE 
provides an ethical grounding for the shift towards patient-centered data, TK Labels provide 
licensing labels for accepted use of local, contextual data. As individual ownership of data may 
affect institutional policies, it is imperative to know and understand your institution’s data 
licensing and ownership policies.

Interpreting and Applying Dataset Licenses
Data curators have an important role to play in assisting researchers in interpreting and applying 
dataset licenses in two common cases: 1) suggesting the appropriate license for a researcher’s 

4 See related primer for more information: 
https://github.com/DataCurationNetwork/data-primers/blob/main/CARE%20Primer/care-primer.md 

12

https://toolkit.ncats.nih.gov/glossary/data-ownership/
https://toolkit.ncats.nih.gov/glossary/data-ownership/
https://toolkit.ncats.nih.gov/glossary/registry/
https://toolkit.ncats.nih.gov/glossary/data/
https://toolkit.ncats.nih.gov/glossary/registry/
https://toolkit.ncats.nih.gov/glossary/patient/
https://www.gida-global.org/care
https://localcontexts.org/labels/traditional-knowledge-labels/
https://github.com/DataCurationNetwork/data-primers/blob/main/CARE%20Primer/care-primer.md


original dataset, and 2) understanding licenses that have been applied to a dataset a researcher 
would like to reuse. This section will provide recommendations for both cases. 

A note on legal advice
There is a crucial principle that curators should keep in mind when discussing copyright: Under 
no circumstances should a data curator give legal advice to a researcher. Specifically, this 
means that the curator should not tell a researcher that the reuse of a particular dataset is 
definitely permissible, nor should they offer guarantees that a researcher has the required legal 
permissions to share certain data. The role of the data curator is to provide relevant information 
to the researcher on licenses that best meet their research aims and funder or institutional 
requirements for their data and code, but it is ultimately the responsibility of the researcher to 
determine the best course of action based on that information. The curator should avoid giving 
the impression that they are making a legal determination of the researcher’s situation with 
regards to copyright.

How to Avoid Giving Legal Advice

When a researcher asks… Curators should… Curators should not…

If they can republish an 
existing dataset as part of a 
new combined dataset.

Help the researcher locate 
and understand a license or 
data use agreement that 
applies to the dataset.

Tell the researcher they are 
definitely permitted to reuse 
the dataset.

If they have full copyright 
over a dataset they have 
produced.

Provide information on what 
other groups might have a 
copyright claim to a dataset in 
general.

Offer a determination that the 
researcher has exclusive 
copyright.

If they can include certain 
information (such as images, 
text of books, or social media 
posts) in a published dataset 
under fair use.

Offer resources on making a 
fair use determination.

Perform a fair use analysis 
for the researcher.

Choosing a license for a newly created dataset
Whenever a researcher shares a dataset, applying a clear and easy-to-find license is a crucial 
step in making data reusable because it unambiguously indicates to others how they are 
allowed to adapt and share the dataset. While the ultimate choice of what license to apply falls 
to the researcher, the curator can help identify any potential limitations on what licenses can be 
applied and support the researcher in making their data as open as is appropriate. 

Factors impacting data sharing
There may be some limits that impact a researcher’s ability to share some or all of their data or 
require them to use a particular license, and the curator should check for each of them in turn.

13


First, the curator should ask the researcher questions to help them determine if their dataset is 
copyrightable at all. In the U.S. context, whether or not a work can be copyrighted depends on 
“its originality rather than its creator’s effort.” This means that a dataset comprised of an 
unoriginal organization of facts cannot be copyrighted. While the line between a “creative” and 
“non-creative” dataset is not always clear, a useful point of comparison comes from the 
Supreme Court case Feist vs. Rural Telephone Service Company, which found that telephone 
books, as compilations of names and telephone numbers, are not sufficiently creative to be 
copyrightable. A data curator cannot definitively determine if a given dataset is sufficiently 
creative, but they can provide this context where relevant to researchers.

The US Copyright Office has stated that copyright only applies to work created by humans. 
Thus, works (including data) that lack human authorship (such as photographs taken by 
animals) are not copyrightable. Similarly, works produced by machines or mechanical processes 
that operate “randomly or automatically without any creative input or intervention from a human 
author“ are not copyrightable. The Copyright Office currently interprets this human authorship 
requirement to mean that content created by large language models (and other generative 
artificial intelligence systems) is not copyrightable, but notes that “a human may select or 
arrange AI-generated material in a sufficiently creative way” as to warrant a copyright claim. 
This remains a developing area of copyright law and policy.

Another factor relating to a researcher’s authority to copyright is whether or not the dataset was 
co-created with others. In research involving human participants, the researcher may have 
collected data to which others would have a copyright claim. For example, if a researcher 
interviewed individuals about their favorite childhood memories and wants to share transcripts of 
their stories as a dataset, then the interviewees would feasibly have a claim to copyright. When 
collecting data of a qualitative nature to which participants could have such a claim, researchers 
are advised to obtain informed consent agreements that permit broad data sharing and waive 
any potential copyright claim.5 

Lastly, institutional data ownership policies (discussed above) or policies set by funders may 
impact how a researcher may license their data. For example, some funders may require that 
datasets be licensed in open terms that facilitate reuse, while others, particularly private entities, 
may seek to limit sharing our reuse without the funder’s permission.

Assuming the dataset can in fact be copyrighted, the curator should ask if the dataset was 
created in whole or in part by reusing previously shared content generated by someone other 
than the researcher. If the researcher has merely subset publicly available data from another 
source without applying any new arrangement to it, then they should reshare the data under the 
same license previously applied to the data (or indicate that the data is in the public domain). 
Similarly, if the researcher accessed the content for their dataset by consenting to a data use 
agreement or terms of service for a database, then those agreements should be checked to 
determine if they put any restrictions on data sharing. 

Finally, the curator should ask if there are any non-copyright reasons to restrict the resharing of 
the original dataset. While making data easily accessible is a laudable goal, it must be balanced 
against potential harms, such as exposure of private or otherwise sensitive information. Other 
data curation primers such as the Human Participants Essentials primer and the CARE primer 
can be useful starting points for considering these non-copyright factors.

5 For more information on curating informed consent forms, review “Curation of Data Collected by 
Informed Consent” primer.

14

https://supreme.justia.com/cases/federal/us/499/340/
https://www.copyright.gov/comp3/chap300/ch300-copyrightable-authorship.pdf#page=21
https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute
https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute
https://copyright.gov/ai/ai_policy_guidance.pdf
https://github.com/DataCurationNetwork/data-primers/blob/main/Human%20Participants%20Data%20Essentials%20Data%20Curation%20Primer/human-participants-data-essentials-data-curation-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/Human%20Participants%20Data%20Essentials%20Data%20Curation%20Primer/human-participants-data-essentials-data-curation-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/CARE%20Primer/care-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/Consent%20Forms%20Data%20Curation%20Primer/consent-forms-data-curation-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/Consent%20Forms%20Data%20Curation%20Primer/consent-forms-data-curation-primer.md


Helping researchers choose a license
Once the potential constraints above have been navigated, the curator can help the researcher 
assess their available licensing options. In order to ensure that future data reusers can easily 
understand how reuse is permitted, curators should encourage researchers to choose one of 
the well-known licenses described in the “Common Licenses for Data” and “Common Licenses 
for Code” sections above. The curator should also steer the researcher away from choosing a 
nonsensical license for their use case, such as applying an MIT license (designed for software) 
to a tabular dataset. 

While the choice of copyright terms ultimately falls to the researcher and the control they would 
like to maintain over the reuse of their own data, the curator may also wish to explain the 
benefits of applying a CC0 waiver to the dataset, placing it in the public domain. Researchers 
without experience with copyright may instinctively want to exert at least some sort of copyright 
control over their data, figuring that it is better to be safe than sorry. However, bequeathing data 
to the public domain is often an appropriate choice for most datasets (in the absence of ethical 
constraints) for several reasons:

1. If the dataset could reasonably be considered a non-creative arrangement of facts, the 
research may not have a copyright claim anyway.

2. A CC0 waiver prevents the problem of “attribution stacking” and removes friction to 
future reuse by ensuring reusers that the researcher will not pursue a copyright 
infringement claim.

3. If the researcher is concerned about plagiarism or intellectual theft, maintaining a strong 
copyright claim is not the right tool for addressing that problem. An individual who 
attempts to pass off the researcher’s data as their own will have violated dominant 
norms of research integrity, regardless of whether or not the dataset was copyrighted. 
Applying a CC0 waiver to a dataset does not extinguish the researcher’s right to not be 
plagiarized. 

Explaining these factors to researchers may make them more comfortable with the idea of 
removing all copyright limitations on the reuse of their data, making it more easily reusable by 
others in the future. 

Relatedly, the license a researcher wishes to apply may impact which repositories they can use 
to host their data. Some data repositories may require the use of a specific license or tool such 
as the CC0 (public domain) waiver for deposited datasets. Some generalist repositories default 
to Creative Commons licensing, but allow users to select another license if needed. For 
instance, Harvard Dataverse defaults to using a CC0 waiver, but a user can work with 
authorized administrators of their instance of Harvard Dataverse to apply other licenses, given 
the data to be shared in the repository.

Navigating and Interpreting Licenses Applied to Datasets That Are Being 
Reused
First, curators should check whether a license has been applied to the dataset. For common 
licenses, curators can make use of the license descriptions provided in this primer in aiding 

15

https://creativecommons.org/public-domain/cc0/
https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/
https://dataverse.harvard.edu/


researchers in interpreting them. In addition to knowledge of licenses, the curator can also bring 
expertise in locating licenses, which may be clearly listed alongside the dataset, stated in a 
README file, or included in the dataset’s metadata. 

Second, the curator should investigate the provenance of the dataset to determine if it may have 
previously been shared under a different license. README files and other documentation or 
metadata may be helpful in determining if the dataset is a reshared or adapted version of a 
previous dataset. If this is the case, then the curator should check for a license applied to the 
previous dataset. Even if an individual has shared a dataset under a permissive license, such as 
CC-BY, if they adapted a dataset from a previous version, they may not have had the 
appropriate authority to apply a more permissive license, a documented problem among public 
artificial intelligence training datasets.

Finally, the curator should check if the researcher has accessed the dataset through a service 
that restricts reuse in its terms of service. Databases frequently license their content to users 
with restrictions on republishing. While these restrictions more commonly apply to articles, 
videos, or other creative content, they may apply to datasets as well.

Navigating and Interpreting Licenses Applied to Code That Is Being 
Reused
In general, researchers should be free to use any existing code or software that is legally 
available to perform analysis on their data. There are a few, rare exceptions of more limited 
licenses and it can be valuable to remind researchers in specific fields (such as nuclear or 
military research) or those intending to make commercial use of their research to check what 
limitations or restrictions licenses may require.

If a researcher intends to expand upon existing code, they should check to see what the existing 
licenses say and ensure that they follow any requirements.

Challenges to Understanding Licenses
Unfortunately, the copyright landscape for shared datasets is extremely uneven, so there will be 
many cases in which researchers need to determine if they can adapt an existing dataset 
without clear copyright information. Licenses will frequently be mis-applied or missing entirely, 
leading to circumstances in which an original creator’s intent is uncertain. (Sometimes, 
researchers may be able to contact the creator directly to ask about reuse, but this should not 
be relied upon.) Datasets containing purely factual information available on the open web (and 
accessible without agreeing to any terms of service) may contain warnings forbidding reuse that 
may not be actionable.

Curators can help ameliorate these problems by ensuring that researchers creating original 
datasets apply explicit and easily findable licenses to prevent future confusion. In considering 
the reuse of existing datasets, the curator can help explain what various licenses mean to a 
researcher so they can make their own determination on what they’re comfortable reusing. This 
uncertainty can be frustrating to researchers, and while curators can respond empathetically, 

16

https://spectrum.ieee.org/data-ai
https://spectrum.ieee.org/data-ai
https://spdx.org/licenses/BSD-3-Clause-No-Nuclear-License.html
https://www.cs.ucdavis.edu/~rogaway/ocb/license2.pdf


they should work to provide information that empowers researchers, not give them a false sense 
of certainty that particular reuses are permitted. 

Conclusion
Navigating copyright is not an easy task, especially in the gray areas of datasets and code. 
While individual cases may be exceedingly complex, general understandings of copyright 
principles and commonly applied licenses can provide much-needed clarity to many 
circumstances. Given that copyright can seem arcane or intimidating to many, the information 
that a data curator provides to a researcher can be essential in helping them understand 
licenses and choose the one that best matches their intent for their data. Ultimately, the role of 
the data curator when it comes to copyright is to inform the researcher. Final decisions 
regarding the reuse of data or which licenses to apply to new datasets and code belong to the 
researcher, even when the curator may disagree.

Bibliography and Further Reading
About CC Licenses. (n.d.). Creative Commons. Retrieved July 2, 2024, from 

https://creativecommons.org/share-your-work/cclicenses/

Barsness, S., Cummins, J., Fernandez, M., James, A., Pierce Farrier K., Pringle, J., Carroll, 
SR. Taitingfong, R., & Wieker, A. (2023). CARE Data Principles Primer. Data Curation 
Network. Retrieved July 2, 2024, from 
https://github.com/DataCurationNetwork/data-primers/blob/main/CARE%20Primer/care
-primer.md. 

Benson, S. R. (2019). Fear & Fair Use: Addressing the Affective Domain. Association of 
College and Research Libraries. https://hdl.handle.net/2142/105485.

Best Practices for Sharing Research Software | Data Science at NIH. (n.d.). National 
Institutes of Health. Retrieved November 11, 2024, from 
https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-soft
ware-faq

BSD 3-Clause No Nuclear License. (2009). Software Package Data Exchange. Retrieved 
November 11, 2024, from 
https://spdx.org/licenses/BSD-3-Clause-No-Nuclear-License.html

CARE Principles. (2023, January 23). Global Indigenous Data Alliance. 
https://www.gida-global.org/care

Chapter XI: Other Post Award Requirements and Considerations - Proposal & Award 
Policies & Procedures Guide (PAPPG) (NSF 24-1). (May 20, 2024). National Science 
Foundation. Retrieved November 11, 2024, from 
https://new.nsf.gov/policies/pappg/24-1/ch-11-other-post-award-requirements

17

https://creativecommons.org/share-your-work/cclicenses/
https://creativecommons.org/share-your-work/cclicenses/
https://www.go-fair.org/fair-principles/
https://github.com/DataCurationNetwork/data-primers/blob/main/CARE%20Primer/care-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/CARE%20Primer/care-primer.md
https://hdl.handle.net/2142/105485
https://hdl.handle.net/2142/105485
https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq
https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq
https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq
https://spdx.org/licenses/BSD-3-Clause-No-Nuclear-License.html
https://spdx.org/licenses/BSD-3-Clause-No-Nuclear-License.html
https://www.gida-global.org/care
https://www.gida-global.org/care
https://new.nsf.gov/policies/pappg/24-1/ch-11-other-post-award-requirements
https://new.nsf.gov/policies/pappg/24-1/ch-11-other-post-award-requirements


Code of Best Practices in Fair Use for Software Preservation. (2012). Association of 
Research Libraries. Retrieved November 11, 2024, from 
https://www.arl.org/wp-content/uploads/2014/01/code-of-best-practices-fair-use.pdf.

Compendium of U.S. Copyright Office Practices, 313.3(C) Facts. (n.d.). U.S. Copyright 
Office. Retrieved July 2, 2024, from https://www.copyright.gov/comp3/

Data Governance. (n.d.). University of Wisconsin-Madison. Retrieved July 2, 2024, from 
https://data.wisc.edu/data-governance/

Data ownership. (n.d.). National Center for Advancing Translational Sciences. Retrieved 
July 2, 2024, from https://toolkit.ncats.nih.gov/glossary/data-ownership

Darragh, Jen; Hofelich Mohr, Alicia; Hunt, Shanda; Woodbrook, Rachel; Fearon, Dave; 
Moore, Jennifer; and Hadley, Hannah. (2020). Human Subjects Data Essentials Data 
Curation Primer. Data Curation Network. Retrieved July 2, 2024, from 
https://github.com/DataCurationNetwork/data-primers/blob/main/Human%20Participant
s%20Data%20Essentials%20Data%20Curation%20Primer/human-participants-data-es
sentials-data-curation-primer.md.

Dear Colleague Letter: Office of Polar Programs Data, Code, and Sample Management 
Policy. (2022, July 14). National Science Foundation. Retrieved November 11, 2024, 
from https://www.nsf.gov/pubs/2022/nsf22106/nsf22106.jsp

FAIR Principles. (n.d.). GO FAIR. Retrieved July 2, 2024, from 
https://www.go-fair.org/fair-principles/

Fadler, M., & Legner, C. (2022). Data ownership revisited: Clarifying data accountabilities in 
times of big data and analytics. Journal of Business Analytics, 5(1), 123–139. 
https://doi.org/10.1080/2573234X.2021.1945961

Feist Publications, Inc. V. Rural Tel. Serv. Co. : 499 U.S. 340 (1991): Justia US Supreme 
Court Center. Retrieved July 2, 2024, from 
https://supreme.justia.com/cases/federal/us/499/340/

Frequently Asked Questions about the GNU Licenses. (n.d.). Free Software Foundation. 
Retrieved November 11, 2024, from 
https://www.gnu.org/licenses/gpl-faq.en.html#GPLIncompatibleLibs

Friedlander, A. (2023). NASA’s public access plan. National Aeronautics and Space 
Administration. Retrieved July 2, 2024, from 
https://www.nasa.gov/wp-content/uploads/2021/12/nasa-ocs-public-access-plan-may-2
023.pdf. .

Gent, E. Public AI Training Datasets Are Rife With Licensing Errors. (2023, November 8). 
IEEE Spectrum. Retrieved July 2, 2024, from https://spectrum.ieee.org/data-ai. 

18

https://www.arl.org/wp-content/uploads/2014/01/code-of-best-practices-fair-use.pdf
https://www.copyright.gov/comp3/
https://data.wisc.edu/data-governance/
https://data.wisc.edu/data-governance/
https://toolkit.ncats.nih.gov/glossary/data-ownership
https://www.go-fair.org/fair-principles/
https://github.com/DataCurationNetwork/data-primers/blob/main/Human%20Participants%20Data%20Essentials%20Data%20Curation%20Primer/human-participants-data-essentials-data-curation-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/Human%20Participants%20Data%20Essentials%20Data%20Curation%20Primer/human-participants-data-essentials-data-curation-primer.md
https://github.com/DataCurationNetwork/data-primers/blob/main/Human%20Participants%20Data%20Essentials%20Data%20Curation%20Primer/human-participants-data-essentials-data-curation-primer.md
https://www.nsf.gov/pubs/2022/nsf22106/nsf22106.jsp
https://www.go-fair.org/fair-principles/
https://www.go-fair.org/fair-principles/
https://doi.org/10.1080/2573234X.2021.1945961
https://doi.org/10.1080/2573234X.2021.1945961
https://supreme.justia.com/cases/federal/us/499/340/
https://supreme.justia.com/cases/federal/us/499/340/
https://www.gnu.org/licenses/gpl-faq.en.html#GPLIncompatibleLibs
https://www.gnu.org/licenses/gpl-faq.en.html#GPLIncompatibleLibs
https://www.nasa.gov/wp-content/uploads/2021/12/nasa-ocs-public-access-plan-may-2023.pdf
https://www.nasa.gov/wp-content/uploads/2021/12/nasa-ocs-public-access-plan-may-2023.pdf
https://spectrum.ieee.org/data-ai


GNU Free Documentation License v1.3. (2008). Free Software Foundation. Retrieved 
November 11, 2024, from https://www.gnu.org/licenses/fdl-1.3.en.html

Good data practices: Removing barriers to data reuse with CC0 licensing. (2023, May 30) 
Dryad. Retrieved July 2, 2024, from 
https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-r
euse-with-cc0-licensing/. 

Hollich, S. (2024, April 24). MJFF Data Community - Creative Commons Training: Copyright 
and Open Licensing [Video recording]. Zenodo. 
https://doi.org/10.5281/zenodo.11062207

Know Your Copyrights. (n.d.). Association of Research Libraries. Retrieved November 11, 
2024, from https://www.arl.org/know-your-copyrights/

Monkey selfie copyright dispute. (2024). In Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Monkey_selfie_copyright_dispute&oldid=125
4626044

Open Data Commons: Legal tools for open data. (n.d.). Open Data Commons. Retrieved 
November 11, 2024, from https://opendatacommons.org/

Research Data Ownership Policy. (2019). Harvard University Office of the Vice Provost for 
Research. 
https://cpb-us-e1.wpmucdn.com/websites.harvard.edu/dist/6/18/files/2020/07/data_ow
nership_policy_08.06.19.pdf. 

Saenen, B. (2024). Developing and Aligning Policies on Research Software: 
Recommendations for Research Funding and Research Performing Organisations. 
https://doi.org/10.5281/zenodo.13740999

TLDRLegal—Software Licenses Explained in Plain English. (n.d.). Retrieved November 11, 
2024, from https://www.tldrlegal.com/

What is Copyright? (n.d.). U.S. Copyright Office. Retrieved November 11, 2024, from 
https://www.copyright.gov/what-is-copyright/

Writing a Data Management & Sharing Plan | Data Sharing. (n.d.). National Institues of 
Health. Retrieved November 11, 2024, from 
https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-f
or-data-management-and-sharing/writing-a-data-management-and-sharing-plan#elem
ents-to-include-in-a-data-management-and-sharing-plan

Additional Resources on Navigating Licenses
Creative Commons Licenses: 

19

https://www.gnu.org/licenses/fdl-1.3.en.html
https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/
https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/
https://doi.org/10.5281/zenodo.11062207
https://doi.org/10.5281/zenodo.11062207
https://www.arl.org/know-your-copyrights/
https://en.wikipedia.org/w/index.php?title=Monkey_selfie_copyright_dispute&oldid=1254626044
https://en.wikipedia.org/w/index.php?title=Monkey_selfie_copyright_dispute&oldid=1254626044
https://en.wikipedia.org/w/index.php?title=Monkey_selfie_copyright_dispute&oldid=1254626044
https://opendatacommons.org/
https://cpb-us-e1.wpmucdn.com/websites.harvard.edu/dist/6/18/files/2020/07/data_ownership_policy_08.06.19.pdf
https://cpb-us-e1.wpmucdn.com/websites.harvard.edu/dist/6/18/files/2020/07/data_ownership_policy_08.06.19.pdf
https://cpb-us-e1.wpmucdn.com/websites.harvard.edu/dist/6/18/files/2020/07/data_ownership_policy_08.06.19.pdf
https://doi.org/10.5281/zenodo.13740999
https://doi.org/10.5281/zenodo.13740999
https://www.tldrlegal.com/
https://www.copyright.gov/what-is-copyright/
https://www.copyright.gov/what-is-copyright/
https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#elements-to-include-in-a-data-management-and-sharing-plan
https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#elements-to-include-in-a-data-management-and-sharing-plan
https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#elements-to-include-in-a-data-management-and-sharing-plan
https://sharing.nih.gov/data-management-and-sharing-policy/planning-and-budgeting-for-data-management-and-sharing/writing-a-data-management-and-sharing-plan#elements-to-include-in-a-data-management-and-sharing-plan


About CC Licenses. (n.d.). Creative Commons. Retrieved July 2, 2024, from 
https://creativecommons.org/share-your-work/cclicenses/.

Open Source Licenses for Software: 

Choose an open source license. (n.d.). Choose a License. Retrieved July 2, 2024, from 
https://choosealicense.com/

Licenses. (n.d.). Open Source Initiative. Retrieved November 11, 2024, from 
https://opensource.org/licenses

Joinup Licensing Assistant: 

JLA - Find and compare software licenses. (n.d.). Joinup. Retrieved July 2, 2024, from 
https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-an
d-compare-software-licenses

GitHub Repository Licenses: 

Licensing a repository. (n.d.). GitHub. Retrieved July 2, 2024, from 
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-featur
es/customizing-your-repository/licensing-a-repository

20

https://creativecommons.org/share-your-work/cclicenses/
https://choosealicense.com/
https://choosealicense.com/
https://opensource.org/licenses
https://opensource.org/licenses
https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses
https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses
https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository

	Overview
	Introduction
	Copyright Licensing for Datasets and Code
	Common Licenses Applied to Datasets
	Code Ownership
	Who owns the copyright on computer code?
	Common Licenses Applied to Code
	What license should be used for computer code?
	Code Documentation

	Institutional Policies Impacting Data Licensing and Ownership
	Institutional data governance does not always address research data
	Data Ownership

	Interpreting and Applying Dataset Licenses
	A note on legal advice
	Choosing a license for a newly created dataset
	Factors impacting data sharing
	Helping researchers choose a license
	Navigating and Interpreting Licenses Applied to Datasets That Are Being Reused
	Navigating and Interpreting Licenses Applied to Code That Is Being Reused

	Challenges to Understanding Licenses

	Conclusion
	Bibliography and Further Reading
	Additional Resources on Navigating Licenses