< Introduction
Writing PPXs >

How It Works

General Concepts

The Driver

ppxlib sits in between the PPXs authors and the compiler toolchain. For the PPX author, it provides an API to define the transformation and register it to ppxlib. Then, all registered transformations can be turned into a single executable, called the driver, that is responsible for applying all the transformations. The driver will be called by the compiler.

The PPX authors register their transformations using the Driver.register_transformation function, as explained in the Writing PPXs section. The different arguments of this function corresponds to the different kinds of PPXs supported by ppxlib or the phase, at which time they will be executed.

The driver is created by calling either Driver.standalone or Driver.run_as_ppx_rewriter. Note that when used through Dune, none of these functions will need to be called by the PPX author. As we will see, Dune will be responsible for generating the driver after all required PPXs from different libraries have been registered. These functions will interpret the command line arguments and start the rewriting accordingly.

The Driver.standalone function creates an executable that parses an OCaml file, transforms it according to the registered transformations, and outputs the transformed file. This makes it suitable for use with the -pp option of the OCaml compiler. It is a preprocessor for sources and is standalone in the sense that it can be called independently from the OCaml compiler (e.g., it includes an OCaml parser).

On the other hand, the Driver.run_as_ppx_rewriter-generated driver is a proper PPX, as it will read and output a Parsetree marshalled value directly. This version is suitable for use with the -ppx option of the OCaml compiler, as well as any tool that requires control of parsing the file. For instance, Merlin includes an OCaml parser that tries hard to recover from errors in order to generate a valid AST most of the time.

Several arguments can be passed to the driver when executing it. Those arguments can also be easily passed using Dune, as explained in its manual. PPX authors can add arguments to their generated drivers using Driver.add_arg. Here are the default arguments for respectively standalone and run_as_ppx_rewriter generated drivers:

Standalone driver
driver.exe [extra_args] [<files>]
  -as-ppx                     Run as a -ppx rewriter (must be the first argument)
  --as-ppx                    Same as -as-ppx
  -as-pp                      Shorthand for: -dump-ast -embed-errors
  --as-pp                     Same as -as-pp
  -o <filename>               Output file (use '-' for stdout)
  -                           Read input from stdin
  -dump-ast                   Dump the marshaled ast to the output file instead of pretty-printing it
  --dump-ast                  Same as -dump-ast
  -dparsetree                 Print the parsetree (same as ocamlc -dparsetree)
  -embed-errors               Embed errors in the output AST (default: true when -dump-ast, false otherwise)
  -null                       Produce no output, except for errors
  -impl <file>                Treat the input as a .ml file
  --impl <file>               Same as -impl
  -intf <file>                Treat the input as a .mli file
  --intf <file>               Same as -intf
  -debug-attribute-drop       Debug attribute dropping
  -print-transformations      Print linked-in code transformations, in the order they are applied
  -print-passes               Print the actual passes over the whole AST in the order they are applied
  -ite-check                  (no effect -- kept for compatibility)
  -pp <command>               Pipe sources through preprocessor <command> (incompatible with -as-ppx)
  -reconcile                  (WIP) Pretty print the output using a mix of the input source and the generated code
  -reconcile-with-comments    (WIP) same as -reconcile but uses comments to enclose the generated code
  -no-color                   Don't use colors when printing errors
  -diff-cmd                   Diff command when using code expectations (use - to disable diffing)
  -pretty                     Instruct code generators to improve the prettiness of the generated code
  -styler                     Code styler
  -output-metadata FILE       Where to store the output metadata
  -corrected-suffix SUFFIX    Suffix to append to corrected files
  -loc-filename <string>      File name to use in locations
  -reserve-namespace <string> Mark the given namespace as reserved
  -no-check                   Disable checks (unsafe)
  -check                      Enable checks
  -no-check-on-extensions     Disable checks on extension point only
  -check-on-extensions        Enable checks on extension point only
  -no-locations-check         Disable locations check only
  -locations-check            Enable locations check only
  -apply <names>              Apply these transformations in order (comma-separated list)
  -dont-apply <names>         Exclude these transformations
  -no-merge                   Do not merge context free transformations (better for debugging rewriters). As a result, the context-free transformations are not all applied before all impl and intf.
  -cookie NAME=EXPR           Set the cookie NAME to EXPR
  --cookie                    Same as -cookie
  -deriving-keep-w32 {impl|intf|both}
                               Do not try to disable warning 32 for the generated code
  -deriving-disable-w32-method {code|attribute}
                               How to disable warning 32 for the generated code
  -type-conv-keep-w32 {impl|intf|both}
                               Deprecated, use -deriving-keep-w32
  -type-conv-w32 {code|attribute}
                               Deprecated, use -deriving-disable-w32-method
  -deriving-keep-w60 {impl|intf|both}
                               Do not try to disable warning 60 for the generated code
  -unused-code-warnings {true|false|force}
                               Allow ppx derivers to enable unused code warnings (default: false)
  -unused-type-warnings {true|false|force}
                               Allow unused type warnings for types with [@@deriving ...] (default: false)
  -help                        Display this list of options
  --help                       Display this list of options

and

Ppx rewriter driver
driver.exe [extra_args] <infile> <outfile>
  -loc-filename <string>      File name to use in locations
  -reserve-namespace <string> Mark the given namespace as reserved
  -no-check                   Disable checks (unsafe)
  -check                      Enable checks
  -no-check-on-extensions     Disable checks on extension point only
  -check-on-extensions        Enable checks on extension point only
  -no-locations-check         Disable locations check only
  -locations-check            Enable locations check only
  -apply <names>              Apply these transformations in order (comma-separated list)
  -dont-apply <names>         Exclude these transformations
  -no-merge                   Do not merge context free transformations (better for debugging rewriters). As a result, the context-free transformations are not all applied before all impl and intf.
  -cookie NAME=EXPR           Set the cookie NAME to EXPR
  --cookie                    Same as -cookie
  -help                       Display this list of options
  --help                      Display this list of options

Exception handling

In general, raising an exception in a registered transformation will make the ppxlib driver crash with an uncaught exception error. However, when spawned with the -embed-errors or -as-ppx flags (that's the case when Merlin calls the driver) the ppxlib driver still handles a specific kind of exception: Located exceptions. They have type Location.Error and contain enough information to display a located error message.

During its execution, the driver will run many different rewriters. In the case described above, it will catch any located exception thrown by a rewriter. When catching an exception, it will collect the error in a list, take the last valid AST (the one that was given to the raising rewriter) and continue its execution from there.

At the end of the rewriting process, the driver will prepend all collected errors to the beginning of the AST, in the order in which they appeared.

The same mechanism applies for the context-free rewriters: if any of them raises, the error is collected, the part of the AST that the rewriter was responsible to rewrite remains unmodified, and the context-free phase continues.

Cookies

Cookies are values that are passed to the driver via the command line, or set as side effects of transformations, which can be accessed by the transformations. They have a name to identify them and a value consisting of an OCaml expression. The module to access cookies is Driver.Cookies.

Integration With Dune

The Dune build system is well integrated with the ppxlib mechanism of registering transformations. In every dune file, Dune will read the set of PPXs that are to be used (i.e. the PPXs in `(preprocess (pps <list of PPXs that are to be used>))`). For a given set of rewriters, it will generate a driver using Driver.run_as_ppx_rewriter that contains all registered transformations. Using a single driver for multiple transformations from multiple PPXs ensures better composition semantics and improves the speed of the combined transformations. Moreover, ppxlib communicates with Dune through .corrected files to allow for promotion, for instance when using [@@deriving_inline]. A PPX author can also generate its own promotion suggestion using the Driver.register_correction function.

Compatibility With Multiple OCaml Versions

One of the important issues with working with the Parsetree is that the API is not stable. For instance, in the OCaml 4.13 release, the following two changes were made to the Parsetree type. Although they are small changes, they may break any PPX that is written to directly manipulate the (evolving) type.

This instability causes an issue with maintenance. PPX authors wish to maintain a single version of their PPX, not one per OCaml version, and ideally not have to update their code when an irrelevant (for them) field is changed in the Parsetree.

ppxlib helps to solve both issues. The first one, having to maintain a single PPX version working for every OCaml version, is done by migrating the Parsetree. The PPX author only maintains a version working with the latest version, and the ppxlib driver will convert the values from one version to another.

For example, say a deriver is applied in the context of OCaml 4.08. After the 4.08 Parsetree has been given to it, the ppxlib driver will migrate this value into the latest Parsetree version, using the Astlib module. The "latest" here depends on the version of ppxlib, but at any given time, the latest released version of ppxlib will always use the latest released version of the Parsetree.

After the migration to the latest Parsetree, the driver runs all transformations on it, which ends with a rewritten Parsetree of the latest version. However, since the context of rewriting is OCaml 4.08 (in this example), the driver needs to migrate back the rewritten Parsetree to an OCaml 4.08 version. Again, ppxlib uses the Astlib module for this migration. Once the driver has rewritten the AST for OCaml 4.08, the compilation can continue as usual.

Context-Free Transformations

ppxlib defines several kinds of transformations whose core property is that they can only read and modify the code locally. The parts of the AST given to the transformation are only portions of the whole AST. In this regard, they are usually called context-free transformations. While being not as general-purpose as plain AST transformations, they are more than often sufficient and have many nice properties such as a well-defined semantics for composition. The two most important context-free transformations are derivers and extenders.

Derivers

A deriver is a context-free transformation that, given a certain structure or signature item, will generate code to append after this item. The given code is never modified. A deriver can be very useful to generate values depending on the structure of a user-defined type, for instance a converter for a type to and from a JSON value. A deriver is triggered by adding an attribute to a structure or signature item. For instance, the folowing code:

type t = Int of int | Float of float [@@deriving yojson]

let x = ...

would be rewritten to:

type ty = Int of int | Float of float [@@deriving yojson]

let ty_of_yojson = ...
let ty_to_yojson = ...

let x = ...

Extenders

An extender is a context-free transformation that is triggered on extension nodes, and that will replace the extension node by some code generated from the extension node's payload. This can be very useful to generate values of a DSL using a more user-friendly syntax, e.g., to generate OCaml values from the JSON syntax.

For instance, the following code:

let json =
  [%yojson
    [ { name = "Anne"; grades = ["A"; "B-"; "B+"] }
    ; { name = "Bernard"; grades = ["B+"; "A"; "B-"] }
    ]
  ]

could be rewritten into:

let json =
  `List
    [ `Assoc
        [ ("name", `String "Anne")
        ; ("grades", `List [`String "A"; `String "B-"; `String "B+"])
        ]
    ; `Assoc
        [ ("name", `String "Bernard")
        ; ("grades", `List [`String "B+"; `String "A"; `String "B-"])
        ]
    ]

Advantages

There are multiple advantages of using context-free transformations. First, they provide the PPX user a much clearer understanding of the AST parts that will be rewritten, rather than a fully general AST rewriting. Secondly, they provide a much better composition semantic, which does not depend on the order. Finally, context-free transformations are applied in a single phase factorising the work for all transformations, resulting in a much faster driver than when combining multiple, whole AST transformations. More details on the execution of this phase are given in its dedicated section.

See the Writing PPXs section for how to define derivers and extenders.

The Execution of the Driver

The actual rewriting of the AST is done in multiple phases:

  1. The linting phase
  2. The preprocessing phase
  3. The first instrumentation phase
  4. The context-free phase
  5. The global transformation phase
  6. The last instrumentation phase

When registering a transformation through the Driver.register_transformation function, the phase in which the transformation has to be applied is specified. The multiplicity of phases is mostly to account for potential constraints on the execution order. However, most of the time there are no such constraints, and in this case, either the context-free or the global transformation phase should be used. (Note that whenever possible, which should be almost always, context-free transformations are possible and better.) If you register in another phase, be sure to know what you are doing.

The Linter Phase

Linters are preprocessors that take as input the whole AST and output a list of "lint" errors. Such an error is of type Driver.Lint_error.t and includes a string (the error message) and the location of the error. The errors will be reported as preprocessors warnings.

This is the first phase, so linting errors can only be reported for code handwritten by the user.

An example of a PPX registered in this phase is ppx_js_style.

The Preprocessing Phase

The preprocessing phase is the first transformation that actually alters the AST. In fact, the property of being the "first transformation applied" is what defines this phase, and ppxlib will thus ensure that only one transformation is registered in this phase; otherwise, it will generate an error.

You should only register a transformation in this phase if it is really strongly necessary, and you know what you are doing. Your PPX will not be usable at the same time as another one registering a transformation in this phase.

An example of a PPX registered in this phase is metapp.

The First Instrumentation Phase

This phase is for transformations that need to be run before the context-free phase. Historically, it was meant for instrumentation-related PPXs, hence the name. Unlike the preprocessing phase, registering to this phase provides no guarantee that the transformation is run early in the rewriting, as there is no limit in the number of transformations registered in this phase, which are then applied in the alphabetical order by their name.

If it is not crucial for a transformation to run before the context-free phase, it should be registered to the global transformation phase.

The Context-Free Phase

The execution of all registered context-free rules is done in a single top-down pass through the AST. Whenever the top-down pass encounters a situation that triggers rewriting, the corresponding transformation is called. For instance, when encountering an extension point corresponding to a rewriting rule, the extension point is replaced by the rule's execution, and the top-down pass continues inside the generated code. Similarly, when a deriving attribute is found attached to a structure or signature item, the result of the deriving rule’s application is appended to the AST, and the top-down pass continues in the generated code.

Note that the code generation for derivers is applied when "leaving" the AST node, that is when all rewriters have been run. Indeed, a deriver like this:

type t = [%my_type] [@@deriving deriver_from_type]

would need the information generated by the my_type extender to match on the structure of t.

Also note that in this phase, the execution of the context-free rules are intertwined altogether, and it would not make sense to speak about the order of application, contrary to the next phase.

The Global Transformation Phase

The global transformation phase is the phase where registered transformations, seen as function from and to the Parsetree, are run. The applied order might matter and change the outcome, but since ppxlib knows nothing about the transformations, the order applied is alphabetical by the transformation's name.

The Last Instrumentation Phase

This phase is for global transformation to escape the alphabetical order and be executed as a last phase. For instance, bisect_ppx needs to be executed after all rewriting has occurred.

Note that only one global transformation can be executed last. If several transformations rely on being the last transformation, it will be true for only one of them. Thus, only register your transformation in this phase if it is absolutely vital to be the last transformation, as your PPX will become incompatible with any other that registers a transformation during this phase.

< Introduction
Writing PPXs >