100 lines
2.8 KiB
Plaintext
100 lines
2.8 KiB
Plaintext
---
|
|
title: "ML Code Completeness Checklist Analysis"
|
|
output:
|
|
pdf_document: default
|
|
html_notebook: default
|
|
---
|
|
|
|
This notebook contains the ML Code Completeness analysis for NeurIPS 2019 repositories.
|
|
|
|
For a run & rendered version of this notebook please see: [code_checklist-analysis.pdf](code_checklist-analysis.pdf).
|
|
|
|
Official repositories for NeurIPS 2019 papers fetched from: https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019
|
|
|
|
A random 25% sample has been selected and manually annotated according to the 5 critera of the ML Code Completness Checklist. The result has been saved into `code_checklist-neurips2019.csv`.
|
|
|
|
```{r}
|
|
library(tidyverse)
|
|
library(ggplot2)
|
|
library(MASS)
|
|
library(RColorBrewer)
|
|
|
|
t = read_csv("code_checklist-neurips2019.csv")
|
|
cat("Number of rows:", nrow(t), "\n")
|
|
```
|
|
|
|
We'll focus only on Python repositories, since this is the dominant language in ML and repositories in other languages tend to have a smaller number of stars just because the community is smaller.
|
|
|
|
```{r}
|
|
t = t[t$python==1,]
|
|
cat("Number of rows:", nrow(t), "\n")
|
|
```
|
|
|
|
Next, we calculate the score as a sum of of individual checklist items and calculate summary stats.
|
|
|
|
```{r}
|
|
t$score = rowSums(t[,4:8])
|
|
```
|
|
|
|
We group repositories based on their score and calculate summary stats.
|
|
|
|
```{r}
|
|
|
|
cat("Spread of values in each group:\n")
|
|
summaries = tapply(t$stars, t$score, summary)
|
|
names(summaries) = paste(names(summaries), "ticks")
|
|
print(summaries)
|
|
|
|
cat("Proportion of repos in each group:\n")
|
|
props = tapply(t$stars, t$score, length)
|
|
props = props/sum(props)
|
|
names(props) = paste(names(props), "ticks")
|
|
print(props)
|
|
|
|
# Extract medians
|
|
medians = unlist(lapply(tapply(t$stars, t$score, summary), function(x) x["Median"]))
|
|
names(medians) = paste(sub(".Median", "", names(medians)), "ticks")
|
|
```
|
|
|
|
Generate summary graphs.
|
|
|
|
```{r}
|
|
par(oma=c(0,1,0,1))
|
|
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(3,2))
|
|
barplot(medians,
|
|
xlab="",
|
|
ylab="Median GitHub stars", ylim=c(0,200),
|
|
col=brewer.pal(6, "Blues"), cex.axis=0.6, cex.names=0.6)
|
|
mtext("GitHub repos grouped by number of ticks on ML code checklist", side=1, line=3, cex=0.6)
|
|
|
|
pie(rev(props), col=rev(brewer.pal(6, "Blues")), cex=0.6)
|
|
mtext("Proportion of repositories in each group", side=1, line=3, cex=0.6)
|
|
```
|
|
|
|
Compare using box plots.
|
|
|
|
```{r}
|
|
tp = t
|
|
tp$score = as.factor(tp$score)
|
|
par(mfrow=c(1,1))
|
|
boxplot(stars~score, data=t, ylim=c(0,200), col=brewer.pal(6, "Blues"),
|
|
xlab="ML code checklist ticks", ylab="Github stars")
|
|
```
|
|
|
|
Fit robust regression and test significance of results
|
|
|
|
```{r}
|
|
print(summary(rlm(stars~training+evaluation+pretrained_model+results+dependencies, data=t)))
|
|
|
|
for(i in 0:4){
|
|
cat("\nScore5 vs Score", i, "\n")
|
|
print(wilcox.test(t$stars[t$score==5], t$stars[t$score==i]))
|
|
}
|
|
```
|
|
|
|
### Session information
|
|
|
|
```{r}
|
|
sessionInfo()
|
|
```
|