
Using Usage Data

Friday, June 6, 2008 - 18:00 by Mike Milinkovich

Continuing from yesterday’s post on the Usage Data Collector, I thought we should also describe how we plan to use the UDC data. In the official, lawyer-approved Terms of Use, we have stipulated the following:

  • The Eclipse Foundation may, in its sole discretion, make available to organizations and individuals, on a case-by-case basis, the data that it collects through the Usage Data Collector, whether in raw or aggregated form.
  • The Eclipse Foundation will publish summary reports based on the data obtained. These reports will be made available in machine readable format that will allow individuals and organizations to undertake further analysis.
  • Potential uses of the summary reports may include, but are not limited to: 1) Eclipse project committers who want to better understand how individuals are using their projects, 2) usage of Eclipse projects and third-party Eclipse plug-ins, and 3) an estimate of the number of individuals using Eclipse. It is expected that the summary reports and raw data may be used for other purposes that we have not envisioned at this point in time.

However, I think it would be useful to explain more about how the Foundation plans to use the data. First, I think it is important to reinforce what data we are collecting and how we are collecting it.

  1. UDC will only be included in the EPP packages. If someone does not want to download UDC, the ‘classic’ Eclipse SDK will be ‘udc-free’.
  2. UDC is opt-in, so each user must agree to send the data, along with any optionally selected general demographic information.
  3. UDC provides the ability to filter the data, so you can send only information about org.eclipse bundles or specifically not send information about bundles that have ‘xxx.yyyyy’. This is important if you have sensitive plug-ins that you don’t want to share data about.
  4. All the data that is collected is anonymous. For each participant, a unique ID is created by the UDC that allows us to aggregate data for that participant. The unique ID does not allow us to identify the participant in any way. We are not collecting any IP addresses and cannot aggregate the data by organizations or companies. To be really specific: we cannot trace the IP address from the upload to the keys contained in the uploaded files. The data is completely anonymous from the source.
  5. We do plan to capture the country location so we can report the data by geography.
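
To make the filtering in point 3 concrete, here is a minimal sketch in Python of include/exclude filtering on bundle IDs. It is an illustration only, not the UDC implementation: the function name `filter_events`, the `bundle_id` field, and the glob-style pattern syntax are all assumptions, since this post does not describe how UDC expresses its filters.

```python
from fnmatch import fnmatchcase


def filter_events(events, include=("org.eclipse.*",), exclude=()):
    """Return only the events that the user has agreed to upload.

    Hypothetical helper: an event is kept when its bundle ID matches at
    least one include pattern and does not match any exclude pattern.
    Patterns are shell-style globs (an assumption, not UDC's syntax).
    """
    kept = []
    for event in events:
        bundle_id = event["bundle_id"]
        included = any(fnmatchcase(bundle_id, p) for p in include)
        excluded = any(fnmatchcase(bundle_id, p) for p in exclude)
        if included and not excluded:
            kept.append(event)
    return kept


# Made-up sample events, for illustration only.
sample_events = [
    {"bundle_id": "org.eclipse.jdt.ui", "what": "activated"},
    {"bundle_id": "xxx.yyyyy.secret", "what": "activated"},
    {"bundle_id": "com.example.tool", "what": "activated"},
]

# Send only org.eclipse bundles (the default include pattern).
only_eclipse = filter_events(sample_events)

# Send everything except bundles whose ID starts with 'xxx.yyyyy'.
no_sensitive = filter_events(sample_events, include=("*",), exclude=("xxx.yyyyy*",))
```

In this sketch the filter runs before anything is uploaded, which matches the intent stated above: data about sensitive plug-ins never leaves the workstation.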

So what do we plan on doing with the data? The first priority is to provide a service for the committers and projects. We intend to create and publish a series of reports that will only include information about bundles that include ‘org.eclipse’. Information about bundles from other organizations will not be included in these reports. We intend to make these reports publicly available. The committer community will not have access to the raw data.
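
As a rough illustration of what such a report could involve, the following Python sketch aggregates raw usage records into per-bundle totals, keeps only org.eclipse bundles, and drops the per-participant IDs from the output. The record fields (`user_id`, `bundle_id`), the `summarize` function, and the prefix test are hypothetical; the actual report format has not been defined in this post.

```python
from collections import defaultdict


def summarize(records):
    """Build a per-bundle summary from raw usage records.

    Hypothetical sketch: only bundles whose ID starts with 'org.eclipse.'
    are reported, and the anonymous participant IDs are reduced to a count
    so that no per-participant data appears in the published summary.
    """
    event_counts = defaultdict(int)
    participants = defaultdict(set)
    for record in records:
        bundle_id = record["bundle_id"]
        if not bundle_id.startswith("org.eclipse."):
            continue  # bundles from other organizations are left out
        event_counts[bundle_id] += 1
        participants[bundle_id].add(record["user_id"])
    return {
        bundle_id: {
            "events": event_counts[bundle_id],
            "participants": len(participants[bundle_id]),
        }
        for bundle_id in sorted(event_counts)
    }


# Made-up raw records, for illustration only.
raw_records = [
    {"user_id": "a1", "bundle_id": "org.eclipse.jdt.ui"},
    {"user_id": "a1", "bundle_id": "org.eclipse.jdt.ui"},
    {"user_id": "b2", "bundle_id": "org.eclipse.jdt.ui"},
    {"user_id": "b2", "bundle_id": "com.example.tool"},
]

report = summarize(raw_records)
```

The resulting dictionary can be serialized with `json.dumps(report)` to produce a machine-readable file of the kind the Terms of Use mention.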

We have already been approached by a number of academics in universities who are interested in analyzing the data as part of their academic research. In principle we would like to support and encourage this type of research. One valuable result of UDC could be a better understanding of how people use IDEs and develop code. At this time, we don’t have a process to make the raw data available to academics, but if we do, the raw data would be made available under a confidentiality agreement that enforces the Eclipse Foundation privacy policy.

In the future we do think that organizations, in particular the Eclipse membership, will be interested in accessing the UDC data. We think they will be interested in understanding how their products are being used and how they compare with other products in the industry. If we provide this information, it would be in the form of machine-readable reports, not access to the raw data. The reports would be scrubbed to ensure only information about the relevant products is included.

Finally, I’d like to address the question of ‘selling the data’. We will not sell the raw data to anyone. Period. However, we may sell reports of the data to organizations. At this time, we have no idea if there is any commercial value for any reports. We do hope there is some commercial value and we can develop either a future revenue stream for the Eclipse Foundation, or use the UDC reports to generate increased value for memberships at Eclipse.

As a reminder, the Eclipse Foundation is a not-for-profit entity that is funded by membership dues. If we want to provide additional services to the Eclipse community and its projects, we need funds. If we are able to create either a new revenue stream or enhanced membership revenue in a way that respects the privacy and integrity of the Eclipse community, it could be a good thing for the entire community.

I hope this provides a bit more insight into our current thinking. A lot will depend on how many people participate in UDC and the type of information we can report from the data. I am optimistic that this will be a great service for the Eclipse community.