Tyro: a first step towards automatically generating parallel programs from sequential programs.

Access rights: Worldwide access

MapReduce is currently the standard approach to automatically parallelizing programs. However, MapReduce restricts programs to a simple framework with limited parallelism, yet still requires the user to understand how parallelism works within that framework. In this thesis, we present Tyro, a new tool that automatically translates a sequential Python program into a parallel PySpark program. Tyro identifies code fragments that can potentially be parallelized and translates them: it uses Abstract Syntax Trees (ASTs) to detect these fragments and gradual program synthesis to convert the Python operations into PySpark operations. Tyro also verifies the generated code against user-supplied test cases. We evaluated Tyro by automatically converting several real-world sequential Python programs into PySpark programs. The resulting PySpark programs run up to 9x faster than the originals on 9 parallel machines. These promising results on our benchmarks show how Tyro can use gradual synthesis and operation translation to go beyond MapReduce in automatic parallelization.
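The abstract does not spell out Tyro's detector or synthesizer, so the following is only a minimal, hypothetical sketch of the general pipeline it describes, written with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It recognizes a single loop shape (a sum accumulation), emits an equivalent PySpark `parallelize(...).map(...).sum()` chain as source text, and checks the rewrite against test inputs. All names (`find_sum_loops`, `to_pyspark`, `verify`, `_LocalContext`) are illustrative assumptions, not Tyro's API; `sc` stands for an existing SparkContext, and `_LocalContext` is a tiny stand-in so the check runs without a Spark installation.

```python
import ast

# Hypothetical sequential input: an accumulator loop of the kind such a tool might target.
SOURCE = """
total = 0
for x in data:
    total += x * x
"""


def find_sum_loops(tree):
    """Yield (accumulator, loop_var, iterable, expr) for every top-level
    `acc = 0` immediately followed by `for v in xs: acc += expr`."""
    body = tree.body
    for prev, node in zip(body, body[1:]):
        if not (isinstance(node, ast.For) and isinstance(node.target, ast.Name)
                and len(node.body) == 1 and not node.orelse):
            continue
        stmt = node.body[0]
        if not (isinstance(stmt, ast.AugAssign) and isinstance(stmt.op, ast.Add)
                and isinstance(stmt.target, ast.Name)):
            continue
        acc = stmt.target.id
        # The accumulator must be initialised to 0 right before the loop.
        if not (isinstance(prev, ast.Assign) and len(prev.targets) == 1
                and isinstance(prev.targets[0], ast.Name)
                and prev.targets[0].id == acc
                and isinstance(prev.value, ast.Constant)
                and prev.value.value == 0):
            continue
        # Reject bodies that read the accumulator: those are not a plain sum.
        if any(isinstance(n, ast.Name) and n.id == acc for n in ast.walk(stmt.value)):
            continue
        yield acc, node.target.id, ast.unparse(node.iter), ast.unparse(stmt.value)


def to_pyspark(acc, var, iterable, expr):
    """Return PySpark source text equivalent to the matched loop.
    `sc` is assumed to be an existing SparkContext in the target program."""
    return f"{acc} = sc.parallelize({iterable}).map(lambda {var}: {expr}).sum()"


class _LocalRDD:
    """Single-process stand-in for the two RDD methods the translation uses."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, f):
        return _LocalRDD(f(x) for x in self.items)

    def sum(self):
        return sum(self.items)


class _LocalContext:
    """Stand-in for a SparkContext, used only to test the translation locally."""

    def parallelize(self, items):
        return _LocalRDD(items)


def verify(original_src, translated_src, test_inputs,
           input_name="data", result_name="total"):
    """Run both programs on each test input and report whether they agree."""
    for data in test_inputs:
        env_orig = {input_name: data}
        exec(original_src, env_orig)
        env_new = {input_name: data, "sc": _LocalContext()}
        exec(translated_src, env_new)
        if env_orig[result_name] != env_new[result_name]:
            return False
    return True


if __name__ == "__main__":
    for match in find_sum_loops(ast.parse(SOURCE)):
        translated = to_pyspark(*match)
        print(translated)
        print(verify(SOURCE, translated, [[1, 2, 3], [], [-4, 5]]))
```

Running the script prints `total = sc.parallelize(data).map(lambda x: x * x).sum()` followed by `True`. The sketch emits `RDD.sum()` rather than `reduce(lambda a, b: a + b)` because PySpark's `reduce` raises an error on an empty RDD, whereas the original loop returns 0. The rewrite is only valid because addition is associative and commutative; with floating-point data, reordering the additions can change low-order bits, so an exact-equality check like `verify` is safest with integer test cases.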

Program synthesis. Parallel programming. PySpark. MapReduce. Distributed computing.