technology.scout

About this document

This document is an R notebook, generated dynamically from the data extracted for the project. It lists all datasets published for the project with basic numbers, figures and a quick summary, and serves as a test case to check that all required data is present and roughly consistent with requirements. All plots and tables are computed from the actual data as provided in the downloads.

To re-execute the document, start an R session, load rmarkdown and render the page with the project ID as a parameter:

require('rmarkdown')
render("datasets_report.Rmarkdown", params = list(project_id = "technology.scout"), output_format="html_document")

This website uses the blogdown R package, which provides a different output_format for the hugo framework.

This report was generated on 2021-04-25.

Downloads

All data is retrieved from Alambic, an open-source framework for development data extraction and processing.

This project’s analysis page can be found on the Alambic instance for the Eclipse forge, at https://eclipse.alambic.io/projects/technology.scout.

Downloads are composed of gzip’d CSV and JSON files. CSV files always have a header naming the fields, which makes them easy to import into analysis software like R:

data <- read.csv(file='myfile.csv', header=T)
names(data)
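
Since the downloads are gzip-compressed, it may help to see the import step without unpacking the archive first. The following is a minimal sketch: the temporary file and its three columns are made up for the demonstration and only stand in for a real download.

```r
# Write a tiny gzip'd CSV to a temporary file so the example is self-contained.
# File name and columns are illustrative, not taken from the real downloads.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, open = "w")
writeLines(c("date,commits,authors", "2021-04-23,3,2", "2021-04-24,1,1"), con)
close(con)

# gzfile() makes the decompression explicit; read.csv() reads from the connection.
demo <- read.csv(file = gzfile(tmp), header = TRUE)
names(demo)
nrow(demo)
```

The same call should work on any of the `.csv.gz` files listed below once downloaded locally.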

List of datasets generated for the project:

  • Git
    • Git Commits (CSV) – Full list of commits with id, message, time, author, committer, and added, deleted and modified lines.
    • Git Commits Evol (CSV) – Evolution of number of commits and authors by day.
    • Git Log (TXT) – the raw export of git log.
  • Bugzilla
  • Eclipse Forums
    • Forums Posts (CSV) – list of all forum posts for this project.
    • Forums threads (CSV) – list of all forum threads for this project.
  • Jenkins CI
  • Eclipse PMI
    • PMI Checks (CSV) – list of all checks applied to the Project Management Infrastructure entries for the project.
  • ScanCode

Git

Git commits

Download: git_commits_evol.csv.gz

data <- read.csv(file=file_git_commits_evol, header=T)

File is git_commits_evol.csv, and has 3 columns for 2571 entries.

library(xts)       # time series objects
library(dygraphs)  # interactive plots; also provides the %>% pipe

# Cumulative number of commits over time.
data$commits_sum <- cumsum(data$commits)
data.xts <- xts(x = data[,c('commits_sum', 'commits', 'authors')], order.by=as.POSIXct(as.character(data[,c('date')]), format="%Y-%m-%d"))

# Build an empty daily series covering the whole period.
time.min <- index(data.xts[1,])
time.max <- index(data.xts[nrow(data.xts)])
all.dates <- seq(time.min, time.max, by="days")
empty <- xts(order.by = all.dates)

# Merge with the real data and set days without commits to 0.
merged.data <- merge(empty, data.xts, all=T)
merged.data[is.na(merged.data)] <- 0

p <- dygraph(merged.data[,c('commits')],
        main = paste('Daily commits for ', project_id, sep=''),
        width = 800, height = 250 ) %>%
      dyRangeSelector()
p
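
The gap-filling step above (build a complete calendar, merge, replace missing values by zero) can also be sketched in base R without xts. This is only an illustration on invented values; the column names `date`, `commits` and `authors` follow the ones used in the chunk above.

```r
# Sparse daily counts: only days with at least one commit are present.
sparse <- data.frame(
  date    = as.Date(c("2021-04-20", "2021-04-23")),
  commits = c(4, 2),
  authors = c(2, 1)
)

# Full calendar between the first and the last day, then a left join.
calendar <- data.frame(date = seq(min(sparse$date), max(sparse$date), by = "day"))
filled <- merge(calendar, sparse, by = "date", all.x = TRUE)

# Days without activity come back as NA; treat them as zero.
for (col in c("commits", "authors")) {
  filled[[col]][is.na(filled[[col]])] <- 0
}

# Cumulative sum computed after filling, so it stays flat on quiet days.
filled$commits_sum <- cumsum(filled$commits)
filled
```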


Git log

Download: git_log.txt.gz

File is git_log.txt, and full log has 135195 lines.
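
A line count like the one above can be recomputed directly from the compressed file. The sketch below assumes that the export uses the default `git log` layout, where each commit starts with a `commit <40 hex digits>` line; the sample log is invented.

```r
# Invented two-commit log in the default `git log` layout (an assumption).
log_lines <- c(
  "commit 1111111111111111111111111111111111111111",
  "Author: Jane Doe <jane@example.org>",
  "Date:   Fri Apr 23 10:00:00 2021 +0200",
  "",
  "    Fix typo",
  "",
  "commit 2222222222222222222222222222222222222222",
  "Author: John Doe <john@example.org>",
  "Date:   Thu Apr 22 09:00:00 2021 +0200",
  "",
  "    Initial import"
)
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, open = "w")
writeLines(log_lines, con)
close(con)

# Read the compressed log back and count lines and commit headers.
lines <- readLines(gzfile(tmp))
n_lines <- length(lines)
n_commits <- sum(grepl("^commit [0-9a-f]{40}$", lines))
c(lines = n_lines, commits = n_commits)
```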


Bugzilla

Bugzilla issues

Download: bugzilla_issues.csv.gz

data <- read.csv(file=file_bz_issues, header=T)

File is bugzilla_issues.csv, and has 17 columns for 1988 issues.

Bugzilla open issues

Download: bugzilla_issues_open.csv.gz

data <- read.csv(file=file_bz_issues_open, header=T)

File is bugzilla_issues_open.csv, and has 17 columns for 9 issues (all open).

Bugzilla evolution

Download: bugzilla_evol.csv.gz

data <- read.csv(file=file_bz_evol, header=T)

File is bugzilla_evol.csv, and has 3 columns for 905 weeks.

Let’s try to plot the monthly number of submissions for the project:
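
A minimal sketch of how weekly counts could be rolled up into monthly totals before plotting. The column names `date` and `issues_created` are hypothetical, since the actual header of bugzilla_evol.csv is not shown here, and the values are invented.

```r
# Invented weekly series; column names are assumptions, not the real header.
weekly <- data.frame(
  date = as.Date(c("2021-03-01", "2021-03-08", "2021-03-29", "2021-04-05")),
  issues_created = c(2, 1, 3, 4)
)

# Label each week with its month, then sum the weekly counts per month.
weekly$month <- format(weekly$date, "%Y-%m")
monthly <- aggregate(issues_created ~ month, data = weekly, FUN = sum)
monthly
```

Passing `monthly$issues_created` to `barplot()` with `names.arg = monthly$month` would then draw a simple monthly chart.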

Versions

Download: bugzilla_versions.csv.gz

data <- read.csv(file=file_bz_versions, header=T)

File is bugzilla_versions.csv, and has 2 columns for 13 versions.

Components

Download: bugzilla_components.csv.gz

data <- read.csv(file=file_bz_components, header=T)

File is bugzilla_components.csv, and has 2 columns for 3 components.

data.sorted <- data[order(data$Bugs, decreasing = T),]

g <- gvisColumnChart(data.sorted, options=list(title='List of product components', legend="{position: 'none'}", width="automatic", height="300px"))
plot(g)

Eclipse Forums

Forums posts

Download: eclipse_forums_posts.csv.gz

data <- read.csv(file=file_forums_posts, header=T)

File is eclipse_forums_posts.csv, and has 6 columns for 7437 posts. The evolution of posts over time:

data$created.date <- as.POSIXct(data$created_date, origin="1970-01-01")
posts.xts <- xts(data, order.by = data$created.date)

time.min <- index(posts.xts[1,])
time.max <- index(posts.xts[nrow(posts.xts)])
all.dates <- seq(time.min, time.max, by="weeks")
empty <- xts(order.by = all.dates)

merged.data <- merge(empty, posts.xts$id, all=T)
merged.data[is.na(merged.data) == T] <- 0

posts.weekly <- apply.weekly(x=merged.data, FUN = nrow)
names(posts.weekly) <- c("posts")

p <- dygraph(
  data = posts.weekly[-1,],
  main = paste('Weekly forum posts for ', project_id, sep=''),
  width = 800, height = 250 ) %>%
  dyAxis("x", drawGrid = FALSE) %>%
  dySeries("posts", label = "Weekly posts") %>%
  dyOptions(stepPlot = TRUE) %>%
  dyRangeSelector()
p
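
As an alternative way to obtain weekly counts, the posts can be binned by calendar week in base R. This sketch assumes, as the chunk above does, that `created_date` holds seconds since 1970-01-01; the four timestamps are invented and the UTC time zone is an assumption.

```r
# Invented timestamps in epoch seconds (same unit as assumed for created_date).
created_date <- c(1617700000, 1619087555, 1619091088, 1619181982)
created <- as.POSIXct(created_date, origin = "1970-01-01", tz = "UTC")

# cut() with breaks = "week" assigns each date to the Monday starting its week
# and keeps levels for weeks that contain no post at all.
week_start <- cut(as.Date(created), breaks = "week")

# table() therefore reports a count of 0 for empty weeks.
posts_weekly <- as.data.frame(table(week = week_start), responseName = "posts")
posts_weekly
```

Counting through factor levels keeps quiet weeks in the series with a value of 0, which is usually what a step plot needs.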

The 10 most recent posts on the forums:

data$created.date <- as.POSIXct(data$created_date, origin="1970-01-01")
# html_url has to be part of the selection, otherwise posts.table$html_url is NULL and the links stay empty.
posts.table <- head(data[,c('id', 'subject', 'created.date', 'author_id', 'html_url')], 10)
posts.table$subject <- paste('<a href="', posts.table$html_url, '">', posts.table$subject, '</a>', sep='')
posts.table$html_url <- NULL  # only needed to build the link
posts.table$created.date <- as.character(posts.table$created.date)
names(posts.table) <- c('ID', 'Subject', 'Post date', 'Post author')

print(
    xtable(head(posts.table, 10),
        caption = paste('10 most recent posts on', project_id, 'forum.', sep=" "),
        digits=0, align="lllll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)

10 most recent posts on technology.scout forum.
ID | Subject | Post date | Post author
1840798 | Re: NullPointerException in AbstractRestClientHelper derived class | 2021-04-23 22:31:41 | 231363
1840797 | Re: NullPointerException in AbstractRestClientHelper derived class | 2021-04-23 22:26:34 | 231363
1840793 | Re: NullPointerException in AbstractRestClientHelper derived class | 2021-04-23 18:31:12 | 215671
1840781 | Re: Session Unload Error Page | 2021-04-23 14:46:49 | 77727
1840769 | Re: NullPointerException in AbstractRestClientHelper derived class | 2021-04-23 13:52:22 | 152412
1840755 | NullPointerException in AbstractRestClientHelper derived class | 2021-04-23 12:26:54 | 231363
1840707 | Re: Site page error after upgrading from Scout 9 to 10 | 2021-04-22 09:52:09 | 232030
1840706 | Re: Session Unload Error Page | 2021-04-22 09:36:36 | 228949
1840694 | Re: Site page error after upgrading from Scout 9 to 10 | 2021-04-22 07:31:28 | 152412
1840690 | Re: Site page error after upgrading from Scout 9 to 10 | 2021-04-22 07:12:35 | 232030


Forums threads

Download: eclipse_forums_threads.csv.gz

data <- read.csv(file=file_forums_threads, header=T)

File is eclipse_forums_threads.csv, and has 8 columns for 1654 threads. A wordcloud with the main words used in threads is presented below.

The 10 last active threads on the forums:

data$last.post.date <- as.POSIXct(data$last_post_date, origin="1970-01-01")
# html_url has to be part of the selection, otherwise threads.table$html_url is NULL and the links stay empty.
threads.table <- head(data[,c('id', 'subject', 'last.post.date', 'last_post_id', 'replies', 'views', 'html_url')], 10)
threads.table$subject <- paste('<a href="', threads.table$html_url, '">', threads.table$subject, '</a>', sep='')
threads.table$html_url <- NULL  # only needed to build the link
threads.table$last.post.date <- as.character(threads.table$last.post.date)
# last_post_id is the ID of the last post, not an author.
names(threads.table) <- c('ID', 'Subject', 'Last post date', 'Last post ID', 'Replies', 'Views')

print(
    xtable(threads.table,
        caption = paste('10 last active threads on', project_id, 'forum.', sep=" "),
        digits=0, align="lllllll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)

10 last active threads on technology.scout forum.
ID | Subject | Last post date | Last post ID | Replies | Views
1107775 | NullPointerException in AbstractRestClientHelper derived class | 2021-04-23 22:31:41 | 1840798 | 4 | 141
1107748 | Site page error after upgrading from Scout 9 to 10 | 2021-04-22 09:52:09 | 1840707 | 4 | 389
1107743 | Session Unload Error Page | 2021-04-23 14:46:49 | 1840781 | 3 | 514
1107711 | Registration page | 2021-04-19 09:28:28 | 1840552 | 2 | 318
1107684 | Errors while using CodeType | 2021-04-19 08:29:24 | 1840551 | 4 | 382
1107627 | Filechooser: how to get the directory containing the file(s) | 2021-04-12 06:43:18 | 1840301 | 1 | 264
1107612 | AbstractChartControl - how to show chart | 2021-04-21 07:20:42 | 1840636 | 7 | 1340
1107518 | Error java.sql.SQLException: No suitable driver found for jdbc:mysql:XXXXXXXXXXXX | 2021-04-01 15:25:26 | 1839943 | 4 | 258
1107508 | JS - Java questions | 2021-03-30 17:10:40 | 1839872 | 4 | 338
1107506 | Scout Content Assist does not work anymore with 2021.03 | 2021-04-07 07:58:17 | 1840127 | 6 | 395
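
The rows in this table follow the order of the file (descending thread ID) rather than the date of the last post. If a strict ordering by activity is wanted, a sort along these lines could be applied first. This is a sketch on three rows copied from the table, with the UTC time zone being an assumption.

```r
# Three threads copied from the table; the time zone is an assumption.
threads <- data.frame(
  id = c(1107775, 1107748, 1107743),
  last.post.date = as.POSIXct(
    c("2021-04-23 22:31:41", "2021-04-22 09:52:09", "2021-04-23 14:46:49"),
    tz = "UTC"
  )
)

# Most recently active thread first.
recent <- threads[order(threads$last.post.date, decreasing = TRUE), ]
recent$id
```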

Jenkins

Builds

Download: jenkins_builds.csv.gz

data <- read.csv(file=file_jenkins_builds, header=T)

File is jenkins_builds.csv, and has 7 columns for 662 builds.

ID | Name | Time (ms since epoch) | Result
5 | Change permissions recursively to 0664 (rw-rw-r) #5 | 1.571155e+12 | SUCCESS
2017-12-19_02-12-59 | Change permissions recursively to 0664 (rw-rw-r) #4 | 1.513668e+12 | FAILURE
2017-06-23_05-39-12 | Change permissions recursively to 0664 (rw-rw-r) #3 | 1.498211e+12 | SUCCESS
2017-06-23_05-39-04 | Change permissions recursively to 0664 (rw-rw-r) #2 | 1.498211e+12 | SUCCESS
2017-06-23_05-38-32 | Change permissions recursively to 0664 (rw-rw-r) #1 | 1.498211e+12 | FAILURE
58 | Copy to archive.eclipse.org #58 | 1.553066e+12 | SUCCESS
57 | Copy to archive.eclipse.org #57 | 1.553066e+12 | SUCCESS
56 | Copy to archive.eclipse.org #56 | 1.553066e+12 | SUCCESS
55 | Delete from download.eclipse.org #55 | 1.575653e+12 | SUCCESS
54 | Delete from download.eclipse.org #54 | 1.572969e+12 | SUCCESS
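
The build times are printed in scientific notation and look like milliseconds since 1970-01-01. Under that assumption, they can be turned into readable dates as sketched below, using three values copied from the table.

```r
# Values copied from the time column of the builds table.
build_time_ms <- c(1.571155e+12, 1.513668e+12, 1.498211e+12)

# Assumption: milliseconds since 1970-01-01 UTC, so divide by 1000 first.
build_time <- as.POSIXct(build_time_ms / 1000, origin = "1970-01-01", tz = "UTC")
format(build_time, "%Y-%m-%d")
```

The second and third values convert to 2017-12-19 and 2017-06-23, which agrees with the date-style build IDs shown in the same rows and supports the millisecond reading.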


Jobs

Download: jenkins_jobs.csv.gz

data <- read.csv(file=file_jenkins_jobs, header=T)

File is jenkins_jobs.csv, and has 15 columns for 66 jobs.

Name | Colour | Last build time | Health report
Change permissions recursively to 0664 (rw-rw-r) | blue | 1.571155e+12 | 60
Copy to archive.eclipse.org | blue | 1.553066e+12 | 100
Delete from download.eclipse.org | blue | 1.575653e+12 | 100
org.eclipse.scout.rt.branch-10.0_continuous | blue | 1.619024e+12 | 100
org.eclipse.scout.rt.branch-10.0_continuous_pipeline | blue | 1.596454e+12 | 100
org.eclipse.scout.rt.branch-11.0_continuous_pipeline | blue | 1.619024e+12 | 100
org.eclipse.scout.rt.branch-22.0_continuous_pipeline | yellow | 1.619023e+12 | 99
org.eclipse.scout.rt.branch-6.0_continuous | blue | 1.595511e+12 | 60
org.eclipse.scout.rt.branch-6.1_continuous | blue | 1.593102e+12 | 100
org.eclipse.scout.rt.branch-7.0_continuous | blue | 1.589200e+12 | 100


PMI

PMI Checks

Download: eclipse_pmi_checks.csv.gz

data <- read.csv(file=file_pmi_checks, header=T)

File is eclipse_pmi_checks.csv, and has 3 columns for 17 checks.

checks.table <- head(data[,c('Description', 'Value', 'Results')], 10)

print(
    xtable(checks.table,
        caption = paste0('Extract of the first 10 PMI checks for ',
                         project_id, '.'),
        digits=0, align="llll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)

Extract of the first 10 PMI checks for technology.scout.
Description | Value | Results
Checks if the URL can be fetched using a simple get query. | https://bugs.eclipse.org/bugs/enter_bug.cgi?product=Scout | OK: Create URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. | https://bugs.eclipse.org/bugs/buglist.cgi?product=Scout | OK: Query URL could be successfully fetched.
Sends a get request to the given CI URL and looks at the headers in the response (200 404..). Also checks if the URL is really a Hudson instance (through a call to its API). | | Failed: could not get CI URL [].
Checks if the Dev ML URL can be fetched using a simple get query. | https://dev.eclipse.org/mailman/listinfo/scout-dev | OK: Dev ML URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. | http://eclipsescout.github.io/ | OK: Documentation URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/downloads/eclipse-packages/ | OK: Download URL could be successfully fetched.
Checks if the Forums URL can be fetched using a simple get query. | http://www.eclipse.org/forums/eclipse.scout | OK. Forum [eclipse.scout] correctly defined. OK: Forum [eclipse.scout] URL could be successfully fetched.
Checks if the URL can be fetched using a simple get query. | http://eclipsescout.github.io/10.0/beginners-guide.html | OK: Documentation URL could be successfully fetched.
Checks if the Mailing lists URL can be fetched using a simple get query. | | Failed: no mailing list defined.
Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for plan.
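
Every result string starts with either `OK` or `Failed`, so a quick pass/fail summary can be derived from that prefix. The sketch below uses six result strings copied from the table; relying on the prefix is an assumption about the format, not a documented rule.

```r
# Result strings copied from the checks table.
results <- c(
  "OK: Create URL could be successfully fetched.",
  "OK: Query URL could be successfully fetched.",
  "Failed: could not get CI URL [].",
  "OK: Dev ML URL could be successfully fetched.",
  "Failed: no mailing list defined.",
  "Failed: no URL defined for plan."
)

# Classify each check by the prefix of its result message.
status <- ifelse(grepl("^OK", results), "OK", "Failed")
table(status)
```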

ScanCode

Authors

Download: scancode_authors.csv.gz

data <- read.csv(file=file_sc_authors, header=T)

File is scancode_authors.csv, and has 2 columns for 18 authors.

Author | Count
unknown | 5740
BSI Business Systems Integration AG - initial | 46
Andreas Hoegger | 38
Matthias Villiger | 8
BSI AG | 4
Ivan Motsch | 2
Adrian Moser | 1
DateField.js | 1
Henry Algus | 1
Tim Down | 1

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Authors for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Copyrights

Download: scancode_copyrights.csv.gz

data <- read.csv(file=file_sc_copyrights, header=T)

File is scancode_copyrights.csv, and has 2 columns for 16 copyrights.

Copyrights | Count
Copyright (c) BSI Business Systems Integration AG. | 4648
unknown | 1124
Copyright (c) year BSI Business Systems Integration | 46
Copyright (c) year BSI Business Systems Integration AG. | 46
Copyright The Android Open Source Project | 16
Copyright (c) Pivotal Labs | 5
Copyright (c) license.git.copyrightYears BSI Business Systems Integration AG. | 2
Copyright Tim Down | 2
Copyright jQuery Foundation and other contributors | 2
1. BSI Business Systems Integration AG | 1

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Copyrights for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Holders

Download: scancode_holders.csv.gz

data <- read.csv(file=file_sc_holders, header=T)

File is scancode_holders.csv, and has 2 columns for 14 holders.

Holders | Count
BSI Business Systems Integration AG. | 4650
unknown | 1124
$ year BSI Business Systems Integration AG. | 46
$ year BSI Business Systems Integration AG.& 13 | 46
The Android Open Source Project, Inc. | 16
Pivotal Labs | 5
$ license.git.copyrightYears BSI Business Systems Integration AG. | 2
Tim Down | 2
jQuery Foundation and other contributors | 2
Henry Algus | 1

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Holders for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Licences

Download: scancode_licences.csv.gz

data <- read.csv(file=file_sc_licences, header=T)

File is scancode_licences.csv, and has 2 columns for 13 licences.

Licence | Count
epl-1.0 | 4741
unknown | 1100
cpl-1.0 AND other-permissive | 27
apache-2.0 | 20
ofl-1.1 | 14
mit | 12
unknown | 8
epl-2.0 | 2
proprietary-license | 2
public-domain | 2

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

p <- gvisPieChart(data,
              options = list(
                title=paste("Licences for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Programming Languages

Download: scancode_programming_languages.csv.gz

data <- read.csv(file=file_sc_pl, header=T)

File is scancode_programming_languages.csv, and has 2 columns for 8 programming languages.

Programming Language | Count
Java | 3563
unknown | 1282
JavaScript | 819
LessCss | 124
HTML | 46
Bash | 8
CSS | 4
Python | 2

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

p <- gvisPieChart(data,
              options = list(
                title=paste("Programming languages for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')
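
The pie chart conveys proportions visually; the same shares can be computed as numbers. The sketch below uses the eight counts from the table above; the column names `language` and `count` are assumptions about the CSV header.

```r
# Counts copied from the programming languages table; column names are assumed.
langs <- data.frame(
  language = c("Java", "unknown", "JavaScript", "LessCss", "HTML", "Bash", "CSS", "Python"),
  count = c(3563, 1282, 819, 124, 46, 8, 4, 2)
)

# Share of each language in percent of all scanned files, rounded to one decimal.
langs$share_pct <- round(100 * langs$count / sum(langs$count), 1)
langs[order(langs$count, decreasing = TRUE), ]
```

With these numbers Java accounts for roughly 61% of the 5848 counted files.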


Special files

Download: scancode_special_files.csv.gz

data <- read.csv(file=file_sc_sf, header=T)

File is scancode_special_files.csv, and has 2 columns for 60 special files.

File | Type
pom.xml | manifest
README.md | readme
license_files/copyright.txt | legal
org.eclipse.scout.dev.jetty/pom.xml | manifest
org.eclipse.scout.dev.jetty.test/pom.xml | manifest
org.eclipse.scout.dev.jetty.test.affix/pom.xml | manifest
org.eclipse.scout.dev.jetty.test.affix/README.txt | readme
org.eclipse.scout.jaxws.apt/pom.xml | manifest
org.eclipse.scout.json/LICENSE | legal
org.eclipse.scout.json/NOTICE | legal
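
A quick way to summarise this listing is to count files per type. The sketch below uses six rows copied from the table; the column names `file` and `type` are assumptions about the CSV header.

```r
# Six rows copied from the special files table; column names are assumed.
special <- data.frame(
  file = c("pom.xml", "README.md", "license_files/copyright.txt",
           "org.eclipse.scout.dev.jetty/pom.xml",
           "org.eclipse.scout.json/LICENSE", "org.eclipse.scout.json/NOTICE"),
  type = c("manifest", "readme", "legal", "manifest", "legal", "legal")
)

# Number of special files per type.
counts <- table(special$type)
counts
```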