Teaching statistics for the future

The MOOC revolution and beyond

Brian Caffo
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Outline of the talk

  1. What I supposedly do during the day
  2. A brief taxonomy and history of online educational models
  3. Massive Open Online Courses (MOOCs)
  4. JHU Biostat involvement in Coursera
  5. Novel, moving-target directions of the field of statistics
  6. Data Science series
  7. SWIRL

Core team


Plus generous contributions from the

What I supposedly do during the day

(Incomplete) characteristics of educational systems

  • Online / in person / blended
  • Active/participatory/interactive learning
  • Scalable / non-scalable
  • Low cost / high cost / freemium
  • Student paced / teacher paced
  • Open / restricted access
  • Flipped / lecture style / blended
  • Open / closed source content
  • Instructor interaction
  • Credentialing
  • Funding model


  • Traditional in-person teaching generally has the characteristics of: in person, lecture style, non-scalable, high cost, restricted access, with a large amount of instructor interaction and valued credentialing
  • "Traditional" online courses are online, or blended online and in person, and otherwise try to approximate traditional in-person classes
  • Online interactive learning (OIL; e.g. Code School, Codecademy): primary characteristics are student-paced, interactive learning
  • Intelligent tutoring systems (e.g. swirl) are like OILs, just not necessarily online
  • Khan Academy is online, interactive, scalable and low cost
  • Other modalities: iTunes U, OpenCourseWare, Udemy


Primary characteristics are open access, low cost, scalable, online

(every letter is negotiable, from Wikipedia citing Mathieu Plourde)


Most visible MOOC instruction sites


Several university/organization-specific sites, platforms and content delivery systems

  • Stanford, CMU, Duke, Harvard, MIT, Google ...
  • Varying degrees of content/delivery
  • The edX platform has been open sourced
  • Google Course Builder (now contributing to edX)
  • Massive amount of development going into platforms and instruction sites/portals

Coursera platform, videos


Coursera platform, quizzes


Coursera platform, peer grading


Coursera platform, forums

You're on the internets

(Forums can be brutal)


Johns Hopkins Biostat Coursera classes

Original three

  • Brian Caffo, Roger Peng, Jeff Leek
  • Run 09/2012, 09/2012, 01/2013

Since then

  • Scott Zeger introduced the class Case-Based Introduction to Statistics
  • Brian introduced MBBC2
  • Martin Lindquist introduced Statistical Analysis of fMRI Data
  • John McGready introduced Statistical Reasoning for Public Health

Enrollments by class and offering

Statements of accomplishment by offering

Over time

Over time, MBBC 1

Completion rates

  • Students participate in MOOCs for a variety of reasons
  • Numerous students sign up for a course, but do not actively participate
  • Recent MBBC 1
    • 17K students
    • 10K accessed the course site ever
    • 7K watched any video ever
    • 2K submitted any quiz
    • 196 earned a SOA
    • 129 signature track
    • 128 posted to the forums
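The drop-off above can be turned into rates with a few lines of arithmetic. The following is a minimal sketch in Python, assuming the rounded counts on the slide (17K, 10K, 7K, 2K) are close enough to the true figures; the function name `funnel_rates` is illustrative and not part of any course tooling. The signature track and forum counts are left out because they are not sequential stages of the same funnel.

```python
# Approximate counts taken from the "Recent MBBC 1" list above.
# The K-values are rounded on the slide, so the rates are approximate too.
funnel = [
    ("enrolled", 17_000),
    ("accessed the course site", 10_000),
    ("watched any video", 7_000),
    ("submitted any quiz", 2_000),
    ("earned a SOA", 196),
]


def funnel_rates(stages):
    """Return (label, count, share of enrolled, share of previous stage)."""
    total = stages[0][1]
    prev = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / total, count / prev))
        prev = count
    return rows


rows = funnel_rates(funnel)
for label, count, of_total, of_prev in rows:
    print(f"{label:<26} {count:>6}  {of_total:6.1%} of enrolled  "
          f"{of_prev:6.1%} of previous stage")
```

On these numbers, roughly 1.2% of enrolled students earned a statement of accomplishment, but close to 10% of those who submitted any quiz did, which illustrates why the choice of denominator matters when quoting MOOC completion rates.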

Some summary statistics

  • Classes considered are MBBC1, MBBC2, CDA, DA, Case, StatReas, fMRI, Tools, Rprog, Clean
  • A total of 776,691 students enrolled
  • 19 class offerings

  • Average of 40,878 students per class

  • Minimum class size of 6,742 for class MBBC2 offering 2

  • Maximum class size of 101,747 for class DA offering 1
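As a quick consistency check on the figures above, the reported average should equal total enrollment divided by the number of offerings. A minimal sketch in Python, using only the numbers stated on this slide (the variable names are illustrative):

```python
# Figures as stated on the slide.
total_enrolled = 776_691
n_offerings = 19
min_class = 6_742     # MBBC2, offering 2
max_class = 101_747   # DA, offering 1

mean_class = total_enrolled / n_offerings
print(f"mean class size: {mean_class:,.0f}")

# The mean must lie between the smallest and largest offering.
print(min_class <= mean_class <= max_class)
```

The computed mean rounds to the 40,878 reported above, and it sits much closer to the minimum than to the maximum, which suggests a right-skewed distribution of class sizes with a few very large offerings.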

Signature track

  • Verification based on typing patterns (highly unique according to Coursera)
    • Secondary verification via webcams
  • Coursera gives student aid
  • Revenue distribution via agreement between Coursera and partner institution
  • 155 signature track for 20k students in MBBC1 V4
  • Much higher retention and completion rates for signature track
  • Between 1% and 12% Signature track rate for classes (depending on a variety of factors)
  • All of our classes are $49 for sig track

Interesting fallout from doing this

Positive things

  • Jeff's class was listed as one of the top 10 most enrolled Coursera classes
    • Roger's class has since eclipsed it in cumulative enrollment
  • Brian, Roger, Martin, John have run signature track ($49 per student)
  • Brian's class featured on the front page of the Washington Post
  • Roger interviewed on Anderson Cooper and NPR
  • Increase in PhD applications, many citing the Coursera efforts

Negative things

  • Lots of complaining on the forums
  • Uprisings
  • Negative reviews (internal and external)
  • Complaints
  • No internal mechanisms for support
  • Fatigue

MOOCs by university

Case studies


Statistics, big data, data science / cost of training the workforce



Johns Hopkins Data Science Specialization

Codirected and taught by Roger Peng, Jeff Leek and Brian Caffo




Specialization certificate


Unique aspects of the program

  • Completely redesigned stat curriculum
  • 9 signature track courses
  • 1 capstone project course
  • Total cost (modular) $490
    • $49 per sig track for 10 classes
  • Each class is four weeks
  • Quizzes, in-video quizzes and peer-assessment projects
  • Run monthly after initial rollout
  • All content open source
  • Toolbox, R Programming and Getting and Cleaning Data have all run (enrollments of 64k, 62k and 34k, respectively)
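The modular pricing above is simple arithmetic. A minimal sketch in Python, using the figures on this slide; the assumption that a student takes one class at a time, back to back, is mine and is only there to give a rough duration:

```python
# Figures from the slide: 9 courses plus 1 capstone, $49 each on signature track.
n_courses = 9
n_capstone = 1
price_per_class = 49   # USD, signature track
weeks_per_class = 4

n_classes = n_courses + n_capstone
total_cost = n_classes * price_per_class

# Assumption (not from the slide): classes are taken one at a time, back to back.
sequential_weeks = n_classes * weeks_per_class

print(f"total cost: ${total_cost}")
print(f"sequential duration: {sequential_weeks} weeks")
```

Because the cost is modular, a student who takes only some of the classes pays $49 for each one taken rather than the full $490.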

Platform choices

  • Everything done on Coursera
  • All programming in R
  • All lecture notes done in Slidify (common theme)
  • All content open source
  • Version control through Git and GitHub
  • (Students will learn and use git)
  • RStudio as an IDE
  • knitr for reproducible documents and report writing

Standard and non-standard stat content

  • Basic probability and math stat
  • Statistical inference
    • Hypothesis tests, confidence intervals, likelihood
    • Brief intro to Bayesian analysis
  • Regression and generalized linear models
  • Statistical machine learning
  • EDA
  • Data analysis
  • Reproducible research, report generation
  • Presentations
  • Data cleaning, obtaining data
  • R Programming
  • Plotting, interactive graphics
  • Shiny, yhat

Shiny example: John Muschelli's app

Rawan Allozi's algorithm

RStudio is giving free hosted Shiny apps to students in the concentration!

Cost comparison with other programs


A friendlier way to learn R...

Self-paced and interactive

Learn R and stats, together

swirl + Coursera

Get Coursera credit, automatically

Make your own interactive content