Tyro : a first step towards automatically generating parallel programs from sequential programs.

Access rights

Worldwide access

Abstract

MapReduce is currently the standard approach to automatic parallelization of programs. However, it restricts programs to a simple framework with limited parallelism, and it still requires the user to understand how parallelism works within that framework. In this thesis, we present Tyro, a new tool that automatically translates a sequential Python program into a parallel PySpark program. Tyro identifies code fragments that can be parallelized and translates them. It uses abstract syntax trees (ASTs) to detect these fragments and gradual program synthesis to convert Python operations into PySpark operations. Tyro also verifies the generated code against user-provided test cases. We evaluated Tyro by automatically converting a range of real-world sequential Python programs into PySpark programs. The resulting PySpark programs run up to 9x faster than the originals on 9 parallel machines. These promising benchmark results show how Tyro can use gradual synthesis and operation translation to take automatic parallelization beyond MapReduce.
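To make the idea of AST-based fragment detection concrete, the following is a minimal sketch using Python's standard `ast` module. It is not Tyro's actual detection logic, which the abstract does not describe in detail. The helper `find_candidate_loops`, the sample program in `SOURCE`, and the heuristic itself (flag `for` loops whose body consists only of augmented assignments to a plain variable) are assumptions made for illustration.

```python
import ast

# Hypothetical sequential program to analyze (not taken from the thesis).
SOURCE = '''
total = 0
for x in data:
    total += x * x

for line in lines:
    print(line)
'''


def find_candidate_loops(source):
    """Return (line number, accumulator name) pairs for simple reduction loops.

    Assumed heuristic: a for-loop is a candidate if every statement in its
    body is an augmented assignment (e.g. `total += ...`) to a plain name.
    """
    candidates = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.For):
            continue
        body = node.body
        is_reduction = all(
            isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name)
            for stmt in body
        )
        if body and is_reduction:
            for stmt in body:
                candidates.append((node.lineno, stmt.target.id))
    return candidates


candidates = find_candidate_loops(SOURCE)
print(candidates)  # [(3, 'total')]
```

Under this heuristic, the first loop is reported because it only accumulates into `total`, a pattern that could in principle be expressed as a map followed by a reduce. The second loop is skipped because its body performs output rather than accumulation.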

Keywords

Program synthesis. Parallel programming. PySpark. MapReduce. Distributed computing.
