---
title: "ML Code Completeness Checklist Analysis"
output:
  pdf_document: default
  html_notebook: default
---

This notebook contains the ML Code Completeness analysis for NeurIPS 2019 repositories.

For an executed and rendered version of this notebook, please see: [code_checklist-analysis.pdf](code_checklist-analysis.pdf).

Official repositories for NeurIPS 2019 papers were fetched from: https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019

A random 25% sample was selected and manually annotated according to the 5 criteria of the ML Code Completeness Checklist. The result was saved to `code_checklist-neurips2019.csv`.

```{r}
library(tidyverse)
library(ggplot2)
library(MASS)
library(RColorBrewer)

# Load the manually annotated sample of repositories
t = read_csv("code_checklist-neurips2019.csv")
cat("Number of rows:", nrow(t), "\n")
```

We focus only on Python repositories: Python is the dominant language in ML, and repositories in other languages tend to have fewer stars simply because their communities are smaller.

```{r}
# Keep only rows flagged as Python repositories
t = t[t$python==1,]
cat("Number of rows:", nrow(t), "\n")
```

Next, we calculate the score as the sum of the individual checklist items.

```{r}
# Columns 4 to 8 are taken to hold the five checklist items
t$score = rowSums(t[,4:8])
```

We group repositories based on their score and calculate summary stats.

```{r}
cat("Spread of values in each group:\n")
summaries = tapply(t$stars, t$score, summary)
names(summaries) = paste(names(summaries), "ticks")
print(summaries)

cat("Proportion of repos in each group:\n")
props = tapply(t$stars, t$score, length)
props = props/sum(props)
names(props) = paste(names(props), "ticks")
print(props)

# Extract medians
medians = unlist(lapply(tapply(t$stars, t$score, summary), function(x) x["Median"]))
names(medians) = paste(sub(".Median", "", names(medians), fixed=TRUE), "ticks")
```

Generate summary graphs.
```{r}
# Two panels side by side: median stars per group (left), group proportions (right)
par(oma=c(0,1,0,1))
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(3,2))

barplot(medians,
        xlab="", ylab="Median GitHub stars", ylim=c(0,200),
        col=brewer.pal(6, "Blues"), cex.axis=0.6, cex.names=0.6)
mtext("GitHub repos grouped by number of ticks on ML code checklist",
      side=1, line=3, cex=0.6)

pie(rev(props), col=rev(brewer.pal(6, "Blues")), cex=0.6)
mtext("Proportion of repositories in each group", side=1, line=3, cex=0.6)
```

Compare using box plots.

```{r}
# Treat the score as a categorical variable for plotting
tp = t
tp$score = as.factor(tp$score)

par(mfrow=c(1,1))
boxplot(stars~score, data=tp, ylim=c(0,200),
        col=brewer.pal(6, "Blues"),
        xlab="ML code checklist ticks", ylab="GitHub stars")
```

Fit a robust regression and test the significance of the results.

```{r}
# Robust linear model of stars on the five checklist items
print(summary(rlm(stars~training+evaluation+pretrained_model+results+dependencies, data=t)))

# Wilcoxon rank-sum test: repositories with 5 ticks vs. each lower-scoring group
for(i in 0:4){
  cat("\nScore5 vs Score", i, "\n")
  print(wilcox.test(t$stars[t$score==5], t$stars[t$score==i]))
}
```

### Session information

```{r}
sessionInfo()
```
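
The score computed earlier in this notebook selects the checklist columns by position (`t[,4:8]`), which depends on the column order of the CSV. The following is a minimal sketch of a name-based alternative, assuming the five checklist columns carry the names used in the regression formula (`training`, `evaluation`, `pretrained_model`, `results`, `dependencies`). The `toy` data frame and its values are made up for illustration and are not taken from the annotated sample. The block uses a plain `r` fence so that it is displayed but not executed when the notebook is rendered.

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  repo = c("a", "b", "c", "d"),
  stars = c(10, 200, 35, 80),
  training = c(1, 1, 0, 1),
  evaluation = c(0, 1, 0, 1),
  pretrained_model = c(0, 1, 0, 0),
  results = c(0, 1, 0, 1),
  dependencies = c(1, 1, 0, 1)
)

# Assumed column names, taken from the regression formula in this notebook
checklist_cols <- c("training", "evaluation", "pretrained_model",
                    "results", "dependencies")

# Fail early if a checklist column is missing or renamed
stopifnot(all(checklist_cols %in% names(toy)))

# Score = number of checklist items ticked, selected by name
toy$score <- rowSums(toy[, checklist_cols])

# Median stars per score group, mirroring the grouping used above
median_stars <- tapply(toy$stars, toy$score, median)
print(median_stars)
```

Selecting by name makes the score robust to columns being added or reordered in the CSV, at the cost of failing if a column is renamed; the `stopifnot` check makes that failure explicit.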