Files
releasing-research-code/notebooks/code_checklist-analysis.Rmd
Robert Stojnic f67b4c72f6 Fix graph
2020-04-07 20:32:26 +01:00

100 lines
2.8 KiB
Plaintext

---
title: "ML Code Completeness Checklist Analysis"
output:
pdf_document: default
html_notebook: default
---
This notebook contains the ML Code Completeness analysis for NeurIPS 2019 repositories.
For a run & rendered version of this notebook please see: [code_checklist-analysis.pdf](code_checklist-analysis.pdf).
Official repositories for NeurIPS 2019 papers fetched from: https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019
A random 25% sample has been selected and manually annotated according to the 5 critera of the ML Code Completness Checklist. The result has been saved into `code_checklist-neurips2019.csv`.
```{r}
library(tidyverse)
library(ggplot2)
library(MASS)
library(RColorBrewer)
t = read_csv("code_checklist-neurips2019.csv")
cat("Number of rows:", nrow(t), "\n")
```
We'll focus only on Python repositories, since this is the dominant language in ML and repositories in other languages tend to have a smaller number of stars just because the community is smaller.
```{r}
t = t[t$python==1,]
cat("Number of rows:", nrow(t), "\n")
```
Next, we calculate the score as a sum of of individual checklist items and calculate summary stats.
```{r}
t$score = rowSums(t[,4:8])
```
We group repositories based on their score and calculate summary stats.
```{r}
cat("Spread of values in each group:\n")
summaries = tapply(t$stars, t$score, summary)
names(summaries) = paste(names(summaries), "ticks")
print(summaries)
cat("Proportion of repos in each group:\n")
props = tapply(t$stars, t$score, length)
props = props/sum(props)
names(props) = paste(names(props), "ticks")
print(props)
# Extract medians
medians = unlist(lapply(tapply(t$stars, t$score, summary), function(x) x["Median"]))
names(medians) = paste(sub(".Median", "", names(medians)), "ticks")
```
Generate summary graphs.
```{r}
par(oma=c(0,1,0,1))
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(3,2))
barplot(medians,
xlab="",
ylab="Median GitHub stars", ylim=c(0,200),
col=brewer.pal(6, "Blues"), cex.axis=0.6, cex.names=0.6)
mtext("GitHub repos grouped by number of ticks on ML code checklist", side=1, line=3, cex=0.6)
pie(rev(props), col=rev(brewer.pal(6, "Blues")), cex=0.6)
mtext("Proportion of repositories in each group", side=1, line=3, cex=0.6)
```
Compare using box plots.
```{r}
tp = t
tp$score = as.factor(tp$score)
par(mfrow=c(1,1))
boxplot(stars~score, data=t, ylim=c(0,200), col=brewer.pal(6, "Blues"),
xlab="ML code checklist ticks", ylab="Github stars")
```
Fit robust regression and test significance of results
```{r}
print(summary(rlm(stars~training+evaluation+pretrained_model+results+dependencies, data=t)))
for(i in 0:4){
cat("\nScore5 vs Score", i, "\n")
print(wilcox.test(t$stars[t$score==5], t$stars[t$score==i]))
}
```
### Session information
```{r}
sessionInfo()
```