Instantly provision Spark clusters

One typical challenge in using a distributed system like Apache Spark is setting up a cluster on demand. Spark requires a significant number of configuration steps, and repeating the same set of configurations time and again is inefficient.

AutoSpark provisions and configures Apache Spark clusters on common cloud providers like Amazon and DigitalOcean. We're working on supporting other platforms.


Automatically discover defects related to upgrading to different API versions

Often classes depend on libraries whose new releases may introduce breaking changes at any time. For example, an application deployed using libusb 1.0.8 may break when a developer compiles it against a later version such as libusb 1.0.9. Instead of passively waiting for issue reports to reveal the problem, an ApiMonkey continuously monitors new versions of the libraries a class depends on, then builds and tests the application against them to check for breaking changes. Where possible, it attempts to patch the application to handle simple upgrade changes, such as renamed functions or upgraded dependencies.
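
One small part of this idea, spotting symbols that disappeared in a new library version and guessing likely renames, could be sketched as below. This is only an illustration under stated assumptions, not ApiMonkey's implementation: the function `find_breaking_changes`, its parameters, and the symbol names in the usage note are hypothetical, and the sets of exported symbols are assumed to have been extracted already. The rename guess relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib


def find_breaking_changes(used, old_api, new_api, cutoff=0.6):
    """Report symbols the application uses that vanished in the new API.

    used, old_api and new_api are sets of symbol names (hypothetical inputs).
    Returns a dict mapping each missing symbol to a suggested replacement,
    or to None when no sufficiently similar new symbol exists.
    """
    missing = sorted(s for s in used if s in old_api and s not in new_api)
    # Only symbols introduced by the new version are rename candidates.
    added = sorted(new_api - old_api)
    report = {}
    for symbol in missing:
        matches = difflib.get_close_matches(symbol, added, n=1, cutoff=cutoff)
        report[symbol] = matches[0] if matches else None
    return report
```

For instance, if the application uses a made-up symbol `get_device_list` that the new version replaces with `get_device_list2`, the report maps the first name to the second. A symbol mapped to `None` has no close match and would need a human to look at it.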


Automatically discover defects using state-of-the-art testing techniques

This project investigates ways in which automated test generation techniques can be integrated into continuous integration. The advantage of such an approach is that it minimizes developer involvement in creating test cases while increasing the discovery of defects. Several test generation tools exist, such as Microsoft SAGE and Randoop, but they work as standalone tools that require manual configuration and effort. This project instead aims to automate the process of test generation and to integrate the generation, running, and reporting of test cases with the build and deployment process.

The primary goal of the research project is to understand how automated test generation can be adopted within continuous integration.
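
To make the idea concrete, here is a minimal sketch of plain random testing, which is far simpler than what SAGE or Randoop do and is shown only to illustrate what a build step could run without developer-written tests. All names (`random_test`, `mean`, `random_int_list`) are hypothetical, and the fixed seed is an assumption made so that repeated CI runs are reproducible.

```python
import random


def random_test(func, make_input, trials=1000, seed=0):
    """Call func on randomly generated inputs and collect the failures.

    make_input(rng) builds one input from a random.Random instance.
    Returns a list of (input, exception) pairs, each a candidate
    regression test that a CI job could report.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    failures = []
    for _ in range(trials):
        value = make_input(rng)
        try:
            func(value)
        except Exception as exc:  # any uncaught exception counts as a defect
            failures.append((value, exc))
    return failures


def mean(values):
    """Toy function under test: fails on an empty list."""
    return sum(values) / len(values)


def random_int_list(rng):
    """Hypothetical generator: lists of 0 to 3 small integers."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 3))]
```

Calling `random_test(mean, random_int_list)` returns the inputs that made `mean` raise. A CI job could fail the build when that list is non-empty and attach the failing inputs to its report.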


Automatically create Docker images from a code repository

Docker allows services to be isolated in standalone containers, which can be provisioned and run with less overhead than full virtual machine images. Porting traditional applications to run within Docker containers is not always easy. Docker containers often require that applications remain stateless, so if two services need to interact via the file system, some degree of porting is required.

DockerizeMe is a project to automatically create Docker images from existing code repositories. We aim to support:

  • porting dependencies into the Docker image.
  • handling simple cases of file system access mapping between two services.
  • running this process automatically from a build server (Jenkins).
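
The first goal, porting dependencies into the image, might look roughly like the sketch below once the dependencies are known. It is an illustration rather than DockerizeMe's actual code: `render_dockerfile` and its parameters are hypothetical, the dependency list is assumed to have been inferred from the repository by some earlier step, and the base image is assumed to be Debian-based so that `apt-get` is available.

```python
import json


def render_dockerfile(base_image, system_packages, start_command):
    """Build Dockerfile text for a repository (illustrative only).

    base_image: e.g. "debian:stable" (assumed to provide apt-get)
    system_packages: package names inferred from the repository
    start_command: list of argv strings used to start the service
    """
    lines = [f"FROM {base_image}", "WORKDIR /app"]
    if system_packages:
        pkgs = " ".join(sorted(system_packages))
        lines.append(
            "RUN apt-get update && apt-get install -y " + pkgs
            + " && rm -rf /var/lib/apt/lists/*"
        )
    lines.append("COPY . /app")
    # The exec form of CMD is a JSON array, so json.dumps gives valid quoting.
    lines.append("CMD " + json.dumps(start_command))
    return "\n".join(lines) + "\n"
```

A build server job could write the returned text to a `Dockerfile` at the repository root and then run `docker build` on it.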

Massive Refactoring

Refactor all the things

Some software organizations systematically refactor codebases across the entire company in order to accommodate changes to API functions, discourage problematic code (e.g., code with performance or security problems), simplify code, or fix bugs.

What tools and systems are needed to support such large-scale refactorings?

Commit Summarization and Change Idioms

How can we characterize and categorize different software changes?

Software projects undergo countless changes. Many times, these changes can be expressed more simply. Rather than showing that 30 files and 600 lines have changed, it is sometimes easier to state that a function was simply renamed. Other times, changes may have interesting semantics attached. For example, if executing a block of code was blocking the user interface and a programmer added a thread for running that block of code, then we could give that change a standard name, such as "Encapsulate With Thread".
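
The simplest case above, recognizing that a whole change amounts to one rename, can be sketched as follows. This is a rough token-level heuristic under stated assumptions, not a proposed design: `summarize_rename` is a hypothetical name, and the input is assumed to be the removed and added lines of a diff, already paired in order.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def summarize_rename(removed_lines, added_lines):
    """Return 'Rename X to Y' if the change is one consistent identifier swap.

    removed_lines and added_lines are the '-' and '+' lines of a diff,
    paired in order (hypothetical input format). Returns None otherwise.
    """
    if not removed_lines or len(removed_lines) != len(added_lines):
        return None
    rename = None
    for old, new in zip(removed_lines, added_lines):
        # Everything except identifiers must be unchanged.
        if IDENTIFIER.sub("", old) != IDENTIFIER.sub("", new):
            return None
        old_tokens = IDENTIFIER.findall(old)
        new_tokens = IDENTIFIER.findall(new)
        if len(old_tokens) != len(new_tokens):
            return None
        for a, b in zip(old_tokens, new_tokens):
            if a == b:
                continue
            if rename is None:
                rename = (a, b)
            elif rename != (a, b):
                return None  # more than one kind of change
    if rename is None:
        return None
    return f"Rename {rename[0]} to {rename[1]}"
```

A real summarizer would work on syntax trees rather than raw tokens, and richer idioms such as "Encapsulate With Thread" would need pattern matching well beyond this sketch.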

In general, what would a catalogue of changes look like? Can we use such a catalogue to produce a more natural-language summary of software changes? If a service to summarize software changes existed, it could be used widely across an organization's processes and workflows, for example to help developers understand commits during code reviews or to help automated processes (such as a refactoring agent) understand the contents of commits.