C and/or C++

Banshee

Saturday, December 22nd, 2007

BANSHEE is a toolkit for specifying program analyses as set or term constraint problems. From the web page:

BANSHEE is a toolkit that simplifies the task of building constraint-based program analyses. Program analyses are widely used in compilers and software engineering tools for discovering or verifying specific properties of software systems, such as type safety and opportunities for program optimization. To use BANSHEE, the analysis designer provides a short specification file describing the kinds of constraints used in the analysis. From this specification, BANSHEE builds a customized constraint resolution engine which solves those constraints very efficiently. BANSHEE also builds a customized interface for that engine which is easy to use.
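
For readers who have not seen an analysis phrased as set constraints, here is a minimal Python sketch of the idea. It is not BANSHEE's specification language or its generated engine; the function name and the constraint encoding are made up for illustration. It solves only the simplest kind of constraints, "constant c is in X" and "X is a subset of Y", by propagating constants along subset edges with a worklist.

```python
from collections import defaultdict, deque

def solve_inclusions(base, subset_edges):
    """Least solution of simple inclusion constraints (hypothetical encoding).

    base:         iterable of (constant, var) meaning {constant} is a subset of var
    subset_edges: iterable of (x, y) meaning x is a subset of y
    """
    solution = defaultdict(set)
    successors = defaultdict(set)
    for x, y in subset_edges:
        successors[x].add(y)

    worklist = deque()
    for const, var in base:
        if const not in solution[var]:
            solution[var].add(const)
            worklist.append((const, var))

    # Push each newly learned constant along every outgoing subset edge.
    while worklist:
        const, var = worklist.popleft()
        for nxt in successors[var]:
            if const not in solution[nxt]:
                solution[nxt].add(const)
                worklist.append((const, nxt))

    return {var: set(consts) for var, consts in solution.items()}

result = solve_inclusions(
    base=[("a", "X"), ("b", "Y")],
    subset_edges=[("X", "Y"), ("Y", "Z")],
)
```

In this toy run, `a` flows from `X` into `Y` and `Z`, while `b` starts in `Y` and reaches only `Z`. Real analyses add constructors and projections on top of this, which is where a generated solver earns its keep.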

CodeSurfer

Tuesday, October 5th, 2004

Mark Zarins of GrammaTech writes about the CodeSurfer tool, which is freely available for academic researchers:

CodeSurfer is an advanced source-code analysis platform for C/C++ that understands pointers, indirect function calls, and whole-program effects. GrammaTech is pleased to offer CodeSurfer at no cost to qualified academic computer science researchers.

CodeSurfer builds a dependence-graph program representation and provides both a GUI and an API for exploring this web. The dependence graph includes forward and backward links between each assignment statement and possible uses of the values stored by that assignment (data flow). Pointer analysis is used so that indirect loads and stores through pointers are taken into account, as well as indirect function calls. Dataflow analysis is used so that links between unrelated assignments and uses are excluded. Operations that highlight forward and backward slices show the impact of a given statement on the rest of the program (forward slicing), and the impact of the rest of a program on a given statement (backward slicing). Operations that highlight paths between nodes in the dependence graph (chops) show ways in which the program points are interdependent (or independent).
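
The slicing and chopping operations described above can be pictured as plain graph reachability. The Python sketch below is only that picture: it does not use CodeSurfer's API or scripting language, the function names and the tiny dependence graph are invented, and it ignores calling context, which a whole-program tool has to handle.

```python
from collections import defaultdict

def _adjacency(dep_edges, reverse=False):
    """Build an adjacency map from (source, target) dependence edges."""
    adj = defaultdict(set)
    for src, dst in dep_edges:
        if reverse:
            src, dst = dst, src
        adj[src].add(dst)
    return adj

def _reachable(start, adj):
    """All nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def forward_slice(dep_edges, node):
    """Statements that may be affected by `node`."""
    return _reachable(node, _adjacency(dep_edges))

def backward_slice(dep_edges, node):
    """Statements that may affect `node`."""
    return _reachable(node, _adjacency(dep_edges, reverse=True))

def chop(dep_edges, source, target):
    """Statements lying on some dependence path from source to target."""
    return forward_slice(dep_edges, source) & backward_slice(dep_edges, target)

# Hypothetical program, one node per statement:
#   1: x = input()   2: y = x + 1   3: z = 5   4: w = y * z   5: print(z)
DEPS = [(1, 2), (2, 4), (3, 4), (3, 5)]
```

With this graph, `chop(DEPS, 1, 4)` contains statements 1, 2, and 4, while `chop(DEPS, 1, 5)` is empty, which is the "independent" case mentioned above.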

CodeSurfer’s scripting language, which provides access to the dependence graph program representation, can be used to build batch program-analysis applications, or to integrate CodeSurfer with other tools. The following program representations are calculated:

  • Abstract Syntax Trees (ASTs) [with a pattern matching API]
  • Preprocessor Expansions and Include Trees
  • Points-To and Pointed-To-By Sets
  • Control Flow Graphs (CFGs)
  • Call Multi-Graphs [direct & indirect]
  • Def, Use, and Conditional-Kill Sets [per statement]
  • Non-Local Def and Use Sets [per procedure]
  • Control and Data Dependences (PDGs) [per statement]
  • Transitive in/out data dependences [per procedure]

More information about the programming API can be found here:
http://www.grammatech.com/products/codesurfer/overview_prog.html

More information about the CodeSurfer Academic Program can be found here:
http://www.grammatech.com/products/codesurfer/academic.html

Important notes: Under the academic program, use of CodeSurfer is limited to non-profit use within academic units only. In addition, although you may use CodeSurfer to implement source-to-source transformations, you may not use it to implement a compiler, i.e., a direct translation to machine code.

ICD-C Compiler framework

Thursday, June 17th, 2004

Thanks to Jörg Eckart for a pointer to the ICD-C Compiler framework, which provides, among other things, a C99 parser, a high-level intermediate representation, control and data flow analyses, call graph analyses, and a framework for analyses spanning multiple compilation units.

ACE CoSy

Thursday, May 6th, 2004

Thanks to Joseph van Vlijmen for a pointer to ACE CoSy, a C and C++ compiler infrastructure for conventional, DSP, VLIW, and experimental processors and microcontrollers. From the web page:

CoSy is the highly flexible, easy-targetable compiler development system from ACE Associated Compiler Experts, which has been successfully deployed by over 40 industrial customers and partners world-wide, creating high-quality, high-performance compilers for a broad spectrum of DSP, NPU, RISC, VLIW and 8/16/32-bit microcontroller-architectures. Based upon its highly modular design, surrounding a generic, extensible intermediate representation (IR) and extensive use of generators, the CoSy environment enables construction of production-quality performance compilers in a highly efficient manner, reducing time-to-market, time-to-performance and development and maintenance costs. CoSy’s DSP-C language extensions allow DSP compiler developers to address specific characteristics of the target architecture and generate optimal code. In addition, CoSy’s configurability and retargetability make it a particularly effective environment for exploration of compiler effects on possible architecture variations, thus enabling true HW/SW co-design.

Elkhound: A GLR Parser Generator

Friday, April 23rd, 2004

Thanks to David Wagner for a pointer to Elkhound, which is a GLR (Generalized LR) parser generator. It can generate parsers in C++ and OCaml, and includes Elsa, a grammar for C++.

DMS Software Reengineering Toolkit

Sunday, April 18th, 2004

Thanks to Ira Baxter for a pointer to the DMS Software Reengineering Toolkit. According to the web page,

The DMS Software Reengineering Toolkit is a set of tools for automating customized source program analysis, modification or translation or generation of software systems, containing arbitrary mixtures of languages (“domains”). The term “software” for DMS is very broad and covers any formal notation, including programming languages, markup languages, hardware description languages, design notations, data descriptions, etc.

Predefined frontends exist for many programming, hardware, and markup languages, including Ada, C, C++, C#, Cobol, Fortran, HTML, IDL, Java, Mathematica, Matlab, Motorola 68k assembly, Pascal, PHP, Verilog, VHDL, Visual Basic, and XML; it is apparently also possible to define new frontends.

Compiler Generator Coco/R

Thursday, April 15th, 2004

Thanks to Mykola Rabchevskiy for a pointer to Coco/R, a scanner/parser generator from the University of Linz. It generates source for C# and Java; there are also (apparently unsupported) versions that generate code for Oberon, Pascal, Modula-2, C, C++, Delphi, and Unicon. Coco/R generates recursive descent parsers and scanners from an attributed grammar and is distributed under the GNU GPL.
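
As a reminder of what a recursive descent parser driven by an attributed grammar looks like, here is a small hand-written Python sketch. It is not Coco/R output and does not use Coco/R's grammar notation; the grammar, class, and function names are made up for illustration. Each grammar rule becomes one method, and the "attribute" computed by each rule is simply the integer value of the expression.

```python
import re

# Grammar (illustrative):
#   Expr   = Term   { ("+" | "-") Term }
#   Term   = Factor { ("*" | "/") Factor }
#   Factor = number | "(" Expr ")"
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    def factor(self):
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

A generator saves you from writing the `expr`/`term`/`factor` methods and the scanner by hand; the shape of the result is much the same.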

iburg, A Tree Parser Generator

Thursday, April 15th, 2004

iburg is most often used as a code-generator generator. From the web page:

iburg is a program that generates fast tree parsers for cost-augmented tree grammars. iburg is useful for writing code generators and for teaching computer science compiler courses.
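
To give a rough feel for what matching a cost-augmented tree grammar involves, here is a minimal Python sketch. It is not iburg's input format or its generated code: the rule table, the nonterminal names (`reg`, `imm`), the operators, and the costs are all invented for illustration, and chain rules are left out. Each tree node is labelled bottom-up with the cheapest cost of deriving it from each nonterminal.

```python
from functools import lru_cache

# (result nonterminal, pattern, cost). A pattern is a tuple (operator, *kids),
# where each kid is a nonterminal name or a nested pattern. All hypothetical.
RULES = [
    ("reg", ("CONST",), 1),                       # load immediate into a register
    ("imm", ("CONST",), 0),                       # constant used as an immediate
    ("reg", ("ADD", "reg", "reg"), 1),            # add r, r
    ("reg", ("ADD", "reg", "imm"), 1),            # add r, #imm
    ("reg", ("LOAD", "reg"), 2),                  # load from address in register
    ("reg", ("LOAD", ("ADD", "reg", "imm")), 2),  # load with reg+offset addressing
]

@lru_cache(maxsize=None)
def label(tree):
    """Map each nonterminal to the cheapest cost of deriving `tree` from it."""
    best = {}
    for nonterm, pattern, cost in RULES:
        matched = match_cost(pattern, tree)
        if matched is not None and matched + cost < best.get(nonterm, float("inf")):
            best[nonterm] = matched + cost
    return best

def match_cost(pattern, tree):
    """Cost of the subtrees left over after matching, or None if no match."""
    if isinstance(pattern, str):          # nonterminal: reuse the subtree's label
        return label(tree).get(pattern)
    op, *kids = pattern
    if tree[0] != op or len(tree) - 1 != len(kids):
        return None
    total = 0
    for sub_pattern, sub_tree in zip(kids, tree[1:]):
        cost = match_cost(sub_pattern, sub_tree)
        if cost is None:
            return None
        total += cost
    return total

# LOAD(ADD(CONST, CONST)): trees are nested tuples (operator, *children).
TREE = ("LOAD", ("ADD", ("CONST",), ("CONST",)))
```

For `TREE`, the reg+offset rule wins with a total cost of 3, against 4 for computing the address into a register first. The sketch only reports costs; a real code generator would also record which rule produced each minimum and then emit code in a second pass.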

SableVM

Thursday, April 15th, 2004

SableVM is a free/Free spec-compliant and extensible JVM. It includes a JIT for PPC, SPARC and x86, but runs on several more architectures. It supports several different interpreter dispatch models (switched, threaded, inlined) and has an efficient runtime system. It is implemented in C with extensive use of M4 macros.

treecc

Friday, April 9th, 2004

treecc is a tool for building programs that operate on trees (as such, it does not fit neatly into the two “Tools:” categories I have placed it in). It will generate code for C, C++, C#, and Java, and is notable for its aspect-oriented approach to compiler construction.

EDG C++, Java, and Fortran front-ends

Friday, April 9th, 2004

The Edison Design Group makes a standards-compliant C++ front end, as well as Java and Fortran front-ends. While their primary market is compiler and program transformation tool vendors, their FAQ suggests they are willing to license these to university researchers on a case-by-case basis.

SUIF

Friday, April 9th, 2004

The venerable SUIF system is “a free infrastructure designed to support collaborative research in optimizing and parallelizing compilers.” It is fairly straightforward to write your own passes on the SUIF IR in C++. Frontends exist for C, C++, Fortran, and Java, and SUIF can interoperate with Zephyr; there are MIPS and C backends.

ANTLR Parser Generator and Translator Generator Home Page

Thursday, April 1st, 2004

The ANTLR Parser Generator and Translator Generator is a well-regarded tool to aid in generating compiler frontends in Java, C# or C++. From the web page:

ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C#, or C++ actions. ANTLR provides excellent support for tree construction, tree walking, and translation.

Thanks to Mulhern for the suggestion.

WARTS — the Wisconsin Architectural Research Tool Set

Thursday, April 1st, 2004

Thanks to Manoj Plakal for a link to WARTS, a toolset developed at Wisconsin. WARTS includes the Executable Editing Library (EEL), which enables analysis and modification of SPARC Solaris executable programs using a C++ API. Distributed with the library are two tools built on EEL — a path profiler and an instruction count profiler. The path profiler is based on the technique described in “Efficient Path Profiling” by Thomas Ball and James Larus (ps link).

WARTS also includes a framework for memory system simulators, a cache profiler, two cache simulators, and the Wisconsin Wind Tunnel simulator.

CIL — Infrastructure for C Program Analysis and Transformation

Thursday, April 1st, 2004

CIL is a tool and library that takes C source code as input and outputs simplified C that is more amenable to program analysis. It can generate CFGs and can perform several analyses and transformations out of the box. See the CIL documentation for more details; some notables follow:

  • points-to analysis
  • various buffer-overrun protection transformations
  • transform all subprograms to have at most one return statement
  • conversion of switch and continue constructs to simple branches
  • partial evaluation and constant folding
  • conversion of C code to three-address code
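
To illustrate the last item in the list, here is a minimal Python sketch of flattening a nested expression into three-address code. It does not use CIL (which is an OCaml library) or reflect CIL's actual output; the tuple encoding of expressions and the `t1`, `t2` temporary names are assumptions made for the example.

```python
from itertools import count

def to_three_address(expr):
    """Flatten a nested binary expression into three-address instructions.

    expr is either a variable name (str) or a tuple (op, left, right).
    Returns (instructions, name_holding_the_result).
    """
    code = []
    temps = count(1)

    def lower(node):
        if isinstance(node, str):      # a plain variable needs no instruction
            return node
        op, left, right = node
        lhs = lower(left)              # operands first, so temporaries are
        rhs = lower(right)             # defined before they are used
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    result = lower(expr)
    return code, result

# a + b * c
code, result = to_three_address(("+", "a", ("*", "b", "c")))
```

Here `code` becomes `["t1 = b * c", "t2 = a + t1"]` and `result` is `"t2"`: every instruction has at most one operator and two operands, which is what makes the form convenient for later analyses.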

FrontC — C front-end for OCaml

Thursday, April 1st, 2004

FrontC is a C front-end written in OCaml. From the web page:

FrontC is an OCAML library providing a C parser and lexer. The result is a syntactic tree easy to process with usual OCAML tree management.

It provides support for ANSI C syntax, old-C K&R style syntax and the standard GNU CC attributes.

It provides also a C pretty printer as an example of use.

ckit — tool for source-to-source translations of C code

Thursday, April 1st, 2004

Thanks to Suan Yong for a link to ckit, which is a Standard ML library for parsing C code into an AST. Suan also mentions that ckit does not include a CFG generator, and requires preprocessed code.

Some documents on using GCC as a backend

Thursday, April 1st, 2004

Compilation of Functional Programming Languages using GCC is Andreas Bauer’s master’s thesis, which especially treats implementing tail calls in a compiler that uses C and GCC as a backend.
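
Tail calls are awkward when compiling to a target that does not guarantee them, because a long chain of calls can exhaust the stack. One well-known workaround is a trampoline, sketched below in Python. This is offered as general background only; it is not a summary of the approach taken in the thesis, and the function names are invented.

```python
def trampoline(func, *args):
    """Run func(*args), then keep calling any thunk it returns.

    Assumes (for this sketch) that a final result is never itself callable.
    """
    result = func(*args)
    while callable(result):
        result = result()
    return result

# Mutually recursive functions written in "return a thunk instead of
# making the tail call" style. The stack depth stays constant.
def is_even(n):
    return True if n == 0 else (lambda: is_odd(n - 1))

def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))

answer = trampoline(is_even, 100000)
```

Written as ordinary mutual recursion, `is_even(100000)` would exceed Python's default recursion limit; with the trampoline each "call" returns to the driver loop first. The cost is an extra allocation and indirect call per step.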

The Mercury Project has several papers on compiling Mercury to C: Compiling logic programs to C using GNU C as a portable assembler (.ps.gz), Code generation for Mercury (.ps.gz) and Compiling Mercury to high-level C code (.ps.gz).

The GNU Compiler Collection

Thursday, April 1st, 2004

Manoj Plakal points out that I have not mentioned GCC. GCC, of course, has frontends for many languages, including C, C++, Objective-C, Java, Fortran, Pascal, and Ada; and backends for nearly every computer architecture ever created as well as a great many that weren’t. It has a reputation for being difficult to use for research, but that hasn’t stopped many people from doing so.

Feel free to TrackBack this entry if you’re using GCC for programming languages research, or if you have tips for using GCC for programming languages research.

DAISY — Architecture Emulation thru Dynamic Compilation

Thursday, April 1st, 2004

DAISY is a dynamic recompiler that translates PowerPC executables into VLIW code on-the-fly, a page at a time, following program execution. It is implemented in C, runs on AIX/PPC machines, and includes a simulator for the DAISY VLIW architecture. Apparently, it can deal with self-modifying code, precise exceptions, and other thorny issues.

Scale

Wednesday, March 31st, 2004

Scale is a modular compiler from the ALI group at Massachusetts. It provides frontends for C, Java, and Fortran, and a backend that produces C. Scale supports alias analysis (including implementations of Shapiro-Horwitz, Steensgaard, and a simple algorithm), SSA, and a battery of scalar optimizations (redundancy elimination, value numbering, etc.). Scale uses an IR called Scribble and supports annotating the IR with information for additional passes. It is implemented in Java.
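
As an example of the kind of scalar optimization mentioned above, here is a minimal Python sketch of local value numbering over a single basic block. It is a textbook-style illustration, not Scale's implementation (Scale is written in Java and works on its own IR); the instruction tuples and the `copy` pseudo-operation are assumptions made for the example.

```python
from itertools import count

COMMUTATIVE = {"+", "*"}

def local_value_numbering(block):
    """Replace recomputations within one basic block by copies.

    block: list of (dest, op, arg1, arg2) with variable names as strings.
    """
    fresh = count()
    var_vn = {}    # variable -> value number it currently holds
    expr_vn = {}   # (op, vn1, vn2) -> value number of that expression
    holder = {}    # value number -> a variable that was assigned that value
    out = []

    def number_of(name):
        if name not in var_vn:
            var_vn[name] = next(fresh)
            holder.setdefault(var_vn[name], name)
        return var_vn[name]

    for dest, op, a, b in block:
        key = (op, number_of(a), number_of(b))
        if op in COMMUTATIVE:                  # x + y and y + x are the same value
            key = (op, *sorted(key[1:]))
        vn = expr_vn.get(key)
        home = holder.get(vn)
        if vn is not None and var_vn.get(home) == vn:
            out.append((dest, "copy", home, None))   # value is still available
        else:
            if vn is None:
                vn = next(fresh)
                expr_vn[key] = vn
            holder[vn] = dest
            out.append((dest, op, a, b))
        var_vn[dest] = vn
    return out

BLOCK = [
    ("a", "+", "x", "y"),
    ("b", "+", "y", "x"),   # same value as a
    ("x", "*", "a", "b"),   # x is redefined here
    ("c", "+", "x", "y"),   # not redundant: x has a new value
]
optimized = local_value_numbering(BLOCK)
```

In the result, only the second instruction is turned into `("b", "copy", "a", None)`; the last one is kept because `x` was overwritten in between.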

Open64 and derivatives

Wednesday, March 31st, 2004

Open64 (formerly Pro64) is an open-source compiler for C++ and Fortran, targeting IA-64. It uses an intermediate language called WHIRL, and includes “inter-procedural analysis and optimizations, loop-nest optimizations, scalar global optimizations, and code generation.” The mailing list for Open64 developers seems to get between 10 and 30 messages a month.

The CAPSL group at Delaware has developed a compiler based on Open64 called Kylin, which targets Intel’s XScale architecture. Unfortunately, as of this writing, the source code for Kylin is not available.

Intel and the Chinese Academy of Sciences have collaborated to produce the Open Research Compiler (ORC). ORC is a derivative of Open64; it includes more advanced IA-64 optimizations and has been refactored in order to “facilitate future research.” It includes SSA, region-based compilation, and edge, value, and memory profiling.

Open Runtime Platform

Wednesday, March 31st, 2004

The Open Runtime Platform is a modular virtual machine infrastructure, enabling creation of JIT and GC components that are not tightly coupled and can be swapped out. It is implemented in C and C++ and runs on IA-32 Linux and Windows. Papers about the JIT and GC strategies that it includes are available from Intel.

Zephyr Compiler Infrastructure

Wednesday, March 31st, 2004

Zephyr Compiler Infrastructure provides a means to define an intermediate representation and write passes on it in several languages; it also provides a hardware description language to power a code-generator generator. From the web site:

If you describe your intermediate forms using Zephyr’s Abstract Syntax Description Language (ASDL), we can generate data-structure definitions in C, C++, Java, Standard ML, and Haskell. Your IR can be serialized on disk and freely exchanged among compiler passes written in these languages…

[Zephyr] generate[s] the machine-dependent parts from descriptions of instructions’ semantics, of binary representations, or of other properties. Zephyr’s Computer Systems Description Languages (CSDL) let you describe as much or as little as you need for your application.

Zephyr also seems to provide a reasonable set of built-in optimizations.

Trimaran

Wednesday, March 31st, 2004

Trimaran consists of a C frontend and a composable series of back-end passes, targeting an ISA called HPL-PD. Trimaran also includes a parameterizable simulator for HPL-PD. It is targeted towards research in compiling for explicitly-scheduled ILP architectures.

FLEX compiler infrastructure

Wednesday, March 31st, 2004

The FLEX compiler infrastructure transforms Java programs and is implemented in Java. It specifically targets embedded applications and has MIPS, StrongARM, and C backends. FLEX supports region-based memory allocation, multiple thread models, and optimization for space.

The LLVM Compiler Infrastructure Project

Wednesday, March 31st, 2004

The LLVM Compiler Infrastructure Project is a compiler infrastructure and virtual machine designed to enable compile-time, link-time, run-time, and offline optimization of code. It has front-ends for C/C++ and provides a typed SSA RISC-like instruction set. It is implemented in standard C++ (the FAQ page says “with heavy use of STL”).