• Share this article:

Revising the Eclipse IP Policy: Third Party Content

Wednesday, October 9, 2019 - 23:07 by Wayne Beaton

The Eclipse Foundation is in the process of making a major update to our Intellectual Property Policy. A big part of this update is a change in the way that we will manage third party content. 

In the context of the Eclipse IP Policy, “third party content” is content that is leveraged by the Eclipse open source project, but not otherwise produced or managed by an Eclipse open source project. A library produced by, say, an Apache open source project, is considered to be third party content. Today, the IP Policy requires that all third party content must be vetted by the Eclipse IP Team before it can be used by an Eclipse Project. Pending approval from the Eclipse Board of Directors, we’re planning to turn this around.

Upon approval of these updates, project teams will be able to introduce new third party content during a development cycle without first checking with the IP Team. That is, a project team may commit build scripts, code references, etc. to third party content to their source code repository without first creating a contribution questionnaire (CQ) to request IP Team review and approval of the third party content. At least during the development period between releases, the onus is on the project team to–with reasonable confidence–ensure any third party content that they introduce is license compatible with the project’s license. Before any content may be included in any formal release the project team must validate that the third party content licenses are compatible with the project license.

Traditionally, releases are preceded by a release review and, as part of that release review, the Eclipse IP Team engages in a review of the project’s record of intellectual property contributions and third party content use (the IP Log). It is during that IP Log review that the IP Team will validate the state of license compatibility. I say “traditionally”, because we changed tradition, or more specifically, we changed the Eclipse Development Process in late 2018, to make it so that a project team may engage in any number of major and minor releases for an entire year following a successful release review. In the case where a release does not require a review (and so there is no trigger to engage in an IP Log review), the onus is on the project team to ensure the license compatibility of all referenced third party content. But we’re not leaving project teams high-and-dry: the IP Team can still help with the validation, even when a formal review is not required.

We’ve been experimenting with processes and tools to help us with license compatibility validation. These tools will be used by the IP Team during their evaluation, and will be made available to project teams as well. Ideally, project teams will integrate the license compatibility validation tool into their builds so that the tool may identify content that requires further scrutiny, and present it to the IP Team to resolve early in their development cycle so that–by the time we run the tool to validate the content at the end of the release cycle–all identified content will already have been resolved and the IP Team can just “rubber stamp” it. 

This should provide significant flexibility for project teams to experiment with different libraries and versions, while also making the IP due diligence process more streamlined and predictable.

An important part of making this work, is the leveraging of existing databases of information. Over the years, we’ve accumulated a significant amount of knowledge about a great many libraries, but others have also done a great deal of work. The new process will leverage other trusted sources of information (more on this in a future post). We’re going to get out of the business of scanning through every single bit of source code ourselves, and instead trust our own database and other sources of information (and contribute to these other sources of information). 

Our prototype tool focuses on a bill of materials. Each entry in the bill of materials identifies a particular third party library. To identify a particular third party library, we’ve decided to adopt the ClearlyDefined project’s five part identifiers which includes, the type of content, its software repository source, its namespace and name, and version. ClearlyDefined coordinates are roughly analogous to Maven coordinates which unambiguously identify a particular piece of software by groupid, artifactid, and version (e.g., org.junit.jupiter:junit-jupiter:5.5.2 unambiguously identifies content that is known to be under the EPL-2.0). The ClearlyDefined coordinate system adds the type of the content as “maven” and its source as “mavencentral”, so org.junit.jupiter:junit-jupiter:5.5.2 becomes maven/mavencentral/org.junit.jupiter/junit-jupiter/5.5.2 (note that, at least theoretically, the source could be a different Maven repository). We selected ClearlyDefined coordinates at least in part because we have projects that use languages that are not Java and software repositories that are not Maven Central; using these coordinates, we can also identify, for example, NPM content (e.g., npm/npmjs/@babel/generator/7.6.2). 

A bill of materials, then, is a list of ClearlyDefined coordinates (the prototype tool automatically translates a Maven dependency list or a node package-lock file into this coordinate system).

The Maven Dependency plugin can be used to generate a list of dependencies (as Maven coordinates):

$> mvn dependency:list -DoutputFile=project.deps -DappendOutput=true

The output from the Maven Dependency plugin takes the following form:

The following files have been resolved:
   org.apache.commons:commons-lang3:jar:3.4:compile
   org.slf4j:slf4j-api:jar:1.7.21:compile
   org.slf4j:slf4j-log4j12:jar:1.7.21:compile
   log4j:log4j:jar:1.2.17:compile
   commons-logging:commons-logging:jar:1.1.1:compile
   ...
The following files have been resolved:
  org.apache.commons:commons-lang3:jar:3.4:compile
   com.clearspring.analytics:stream:jar:2.9.6:compile
   ...

The prototype tool generates a bill of materials that looks something like the following:

maven/mavencentral/org.apache.commons/commons-lang3/3.4, Apache-2.0, approved
maven/mavencentral/org.slf4j/slf4j-api/1.7.21, MIT, approved
maven/mavencentral/org.slf4j/slf4j-log4j12/1.7.21, MIT, approved
maven/mavencentral/log4j/log4j/1.2.17, Apache-2.0, approved
maven/mavencentral/commons-logging/commons-logging/1.1.1, Apache-2.0, approved
maven/mavencentral/com.clearspring.analytics/stream/2.9.6, Apache-2.0, approved
...

The output actually also includes a handful of URLs with the output (e.g., pointers to the source code when they’re known), but I’ve removed them to focus on the important bits. You’ll also notice that content that is repeated in the dependency list only appears once in the output.

For each library, the tool determines the license and whether or not it is approved for use (I’ll discuss how it works in a future post).  Any content that is listed as restricted instead of approved must be reviewed and resolved by the IP Team before it can be included in any official project release (I’ll discuss this in a future post). 

At present, the prototype tool only identifies licenses and assesses compatibility to determine whether or not the content is approved. The output is easily parsed to identify problematic content, but our intent is to make it more immediately helpful without requiring advanced bash-fu skills. There’s plenty of opportunity for further automation, including rolling the prototype into a Maven plugin to incorporate directly into a build and support for other build systems.

This bill of materials becomes the third-party content part of the project’s IP Log. This has the added benefit of being 100% accurate without the need for things like piggyback contribution questionnaires (CQs). At least in the short term, the IPzilla system and CQs will remain the main means by which project committers interact with the IP Team, but only for content that requires investigation.