Ariel ("Ari") Rabkin
Software Engineer
I am currently a software engineer at Cloudera, helping build tools and services for the support organization. My background is as a researcher who works on techniques for designing, debugging, and configuring large software systems. My work focuses especially on the software systems used to support "big data" processing; I've made a number of contributions to Hadoop and related software systems. My research has been published in both systems and software engineering venues.
I've been fortunate to work with some amazing people along the way. Before starting at Cloudera, I was a postdoctoral researcher at Princeton, working with Mike Freedman. Before that, I received my PhD from UC Berkeley in 2012, working in the AMP lab advised by Randy Katz.
The best way to contact me is by email, at asrabkin at gmail.com.
This is an annotated list of things I've worked on, with links to papers. For a straight-up publication list, see below.
The folks in Mike Freedman's group at Princeton and I built a system called JetStream, for wide-area stream processing. The vision was to build a practical, useful system for this emerging area. An abstract about the system appeared at LADIS 2012. A workshop paper about the overall vision appeared at HotOS '13. The full paper was at NSDI 2014. [slides]
Fellow Berkeleyan Leo Meyerovich and I have been looking at social influences on programming language adoption. Most of the visualizations and associated materials are available here. A vision paper appeared at Onward! 2012 (the "new ideas" track associated with SPLASH/OOPSLA). Our results appeared at OOPSLA 2013. We described our methodological challenges at PLATEAU 2012.
My dissertation studied using program analysis to understand software configuration. I particularly looked at two major applications: explaining errors in terms of configuration, and "configuration spellcheck". Error explanation is the problem of finding the option that most usefully explains a failure, given a program and an error message. Configuration spellcheck is the problem of catching configuration errors before the program ever runs.
A paper about automatically finding and classifying options (and hence Configuration Spellcheck) appeared at ICSE 2011. A followup paper about inferring the root cause of error messages appeared at ASE 2011. This work was the core of my dissertation, "Using Program Analysis to Reduce Misconfiguration in Open Source Systems Software".
My configuration debugging work used JChord, a program analysis framework developed primarily by Mayur Naik, formerly of Intel Research Berkeley and now of Georgia Tech.
If you're interested in using the Configuration Spellchecker, you should check out the JChord SVN repository and look in the
I'm one of the lead developers for Chukwa, an open-source log collection and monitoring system. Chukwa was started while I was working at Yahoo! It's currently an Apache Software Foundation incubation project. It's in use at several companies, including CBS Interactive and Selective Media.
I've also done some work on the question of what to log. I published a paper about graphical representation for log structure at SLAML 2010, the workshop on managing systems via system log analysis and machine learning.
I was a coauthor of Above the Clouds: A Berkeley View of Cloud Computing. This was a white paper written by the RAD Lab faculty and a number of the systems graduate students. It's gotten a great deal of response, and on the whole, we've been very happy with it. A version of this paper later appeared in CACM.
Some while back, I did some work on bank security questions. The paper, published at SOUPS '08, is available here. The tagged data supporting the paper is available in a gzipped archive here. Slides for my conference talk are available as PDFs and also in PowerPoint format.
I have a total of 8 semesters of TA experience (6 at Cornell as an undergraduate and master's student, 2 at UC Berkeley). I helped teach operating systems at Cornell, and algorithms and machine organization at Berkeley.
As a teaching assistant, I helped with the overhaul of UC Berkeley's lower-division computer organization course (CS 61C) to refocus it on parallelism. A paper about the MapReduce unit of the course appeared at SIGCSE 2012, with a longer version appearing in ACM Transactions on Computing Education.
Ariel Rabkin is a researcher interested in techniques for building and debugging complex software systems. He is currently a software engineer at Cloudera and was previously a postdoctoral researcher at Princeton University. He received his PhD in Computer Science from UC Berkeley in May 2012. He is also professionally interested in security and cloud computing. He previously attended Cornell University (AB 2006, MEng 2007). He is a contributor to several open source projects, including Hadoop, the Chukwa log collection framework, and the JChord program analysis toolset.