Reproducible Data Science with Clojure

Welcome to Clojure for data science

This book is a published, more curated version of the workshop first presented live at Bobkonf 2025 in Berlin on March 14, 2025.

Abstract

Despite sophisticated tooling, data scientists still battle fundamental day-to-day challenges such as reproducibility, maintainability, and sharing their work. While traditional notebooks offer interactivity and quick feedback, they're plagued by hidden state dependencies, version control complexity, and production deployment hurdles. Converting notebook-based analyses into production-ready code often requires extensive refactoring, untangling implicit dependencies, debugging hidden state issues, and deciphering sparse documentation. And that's before tackling today's reality of excessively large, unstructured data dumps that typically lack any metadata or explanation, which makes it difficult to find useful data in the first place.

Clojure's data science ecosystem has matured rapidly in recent years. With its stable toolkit, immutable data structures, and functional paradigm, Clojure offers a compelling alternative to traditional data science workflows. Imagine knowing exactly which version of your code produced which dataset. Or seamlessly deploying the same code you used in an exploratory analysis to production. And imagine that same code running in a state-of-the-art literate programming environment and, simultaneously, in your own familiar IDE.

This hands-on workshop will introduce a new way of thinking about working with data, demonstrating how Clojure’s libraries and tooling solve many pain points in current data science workflows.

Setup

Follow the installation guide to get Clojure running locally.
