Introduction

CodeLogician helps your LLMs and AI developer assistants formally reason about the code they write by first translating it into math and then using powerful logic-based AI (automated reasoning) to analyze it for correctness (i.e. formal verification), generate test cases and much more. Under the hood, CodeLogician is an agent (implemented in LangGraph) that uses LLMs, the ImandraX automated reasoning engine and various other tools (e.g. static analyzers and vector databases). Users can interact with it via low-level commands that guide each step of the reasoning process, tap into high-level agentic flows that automate the underlying steps, or combine the two approaches. If an agentic flow gets “stuck” (for example, when CodeLogician finds an inconsistency in your source program), it may request human feedback. Using LangGraph also means that you can integrate CodeLogician into your own agent via its RemoteGraph API.

Your first neurosymbolic AI experience (or not!)

Before we dive deep into the details, let’s get you warmed up with a quick example. We’ll also use this as an opportunity to get you set up with the CodeLogician VS Code extension!

We’ll work with this Python program:

def f(x: int) -> int:
  return x * x

# Verify that this function always returns a positive value and generate test cases for it
def g(x: int) -> int:
  if x < 0:
    return f(x)
  else:
    return x

Reminder: The demo will require a valid Imandra Universe API key (go to https://universe.imandra.ai to obtain one!)

  • Step 1: Install the CodeLogician and ImandraX extensions
  • Step 2: Paste the Python program above into a new file (call it simple.py)
  • Step 3: Click the ‘Formalize’ button in the CodeLogician window - this will create the formal model of the code in IML
  • Step 4: Once it’s done, open the resulting simple.iml - this is the mathematical model of the code, written in IML. You can view it with the ImandraX extension
  • Step 5: Then, back in the CodeLogician extension, click the ‘Extract reasoning requests’ button - this will convert the comment above function g (that it should return a positive number) into a Verification Goal, a statement about the model expressed in IML
  • Step 6: Open the resulting file once again. And there you have it - your first neurosymbolic AI experience! (A sketch of roughly what the generated IML might look like follows this list.)
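
To make this concrete, here is a rough sketch of what the generated simple.iml and the extracted Verification Goal might look like. Treat it as illustrative only - the exact model and the way the comment is rendered (e.g. strict positivity vs. non-negativity) depend on the choices CodeLogician makes:

(* illustrative sketch of a model CodeLogician might produce for simple.py *)
let f (x : int) : int = x * x

let g (x : int) : int =
  if x < 0 then f x else x

(* one possible rendering of the comment above g as a Verification Goal *)
verify (fun x -> g x >= 0)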

To recap, you have just:

  • Written a bunch of code in Python (going forward, we’ll call this the source program, written in a source language)
  • Used CodeLogician to turn it into a formal (mathematically precise) model (a process known as autoformalization), along with a verification goal for ImandraX
  • Used ImandraX to verify the goal and confirm it holds (this process is referred to as formal verification)
  • Asked CodeLogician to extract test cases from ImandraX’s analysis of the model’s state space

Behind the scenes, many individual steps were performed, including gathering formalization information about your source code; we’ll describe these in detail later. CodeLogician was specifically designed to give you the ultimate flexibility when working with the agent - sometimes you want to provide specific guidance about a fragment of your program, and at other times you just want it to work automatically and ask you for help when it gets stuck. In the rest of this tutorial we’ll show you how this works.

Why you should be using CodeLogician

There are a handful of reasons why combining the “statistical” and “symbolic” AI approaches will ultimately drive the scalability of AI-assisted software development. Fundamentally, by creating a formal logical model, you can answer questions with certainty, perform deep analysis and rigorously generate test cases. It automates a lot of the kind of thinking about code you’d like to do anyway, but it does it (unlike “pure” LLM-based techniques) with rigor and explanations! The process described in this tutorial and codified within CodeLogician comes from years of pioneering application of formal methods to large industrial projects.

More specifically, CodeLogician:

  • Helps you formalize the requirements - it forces you (or your agent/AI assistant) to be very precise about the code you’d like to write, be explicit about your assumptions and formulate the properties the code should have (aka the specification)
  • Verifies (using ImandraX) deep properties about the code using mathematical logic, covering infinitely many possible inputs
  • Rigorously automates test case generation - considerably speeding up the process in a way that neither a human nor an LLM (or other statistical methods) can reproduce

Make sure you don’t miss something

Is this approach relevant for every bit of code that you write? Probably not, but it’s indispensable for critical logic, UI state transitions, etc. It’s also unmatched at test case generation - we’ll get to this later in detail. As an example, consider a case study we did for the investment bank UBS: we demonstrated that if you encode the logic of how their dark pool matched orders (the order priority calculation) the way they described it in their regulatory filings, it would ultimately lead to a violation of best execution principles in a very non-trivial way. This is the type of logical error that would not be detected by a human (an LLM would not be able to identify it either) and could cost many millions of dollars in fines and reputational costs. The full notebook is here: UBS Case Study.

The main bit of logic there - order_higher_ranked (side, o1, o2, mkt) - encodes the order-ranking logic described in the document the bank sent to the Securities and Exchange Commission (“Form ATS”); the actual English prose is provided in the notebook. That logic contains a fundamental flaw. The flaw concerns a basic condition that must hold for any ranking function - whether you’re ranking integers, characters or orders within an order book. This condition is called “transitivity”: if a is greater than b, and b is greater than c, then a must be greater than c. If you were implementing a sorting algorithm (let’s say “quicksort” in C++) and used a ranking function that violates this condition, the code would still run but the results would effectively be random. This would be especially problematic in a regulated environment.

We’ll encode the check for transitivity like this:

let rank_transitivity side o1 o2 o3 mkt =
    order_higher_ranked(side,o1,o2,mkt) &&
    order_higher_ranked(side,o2,o3,mkt)
    ==>
    order_higher_ranked(side,o1,o3,mkt)

rank_transitivity is an example of a Verification Goal, a property we want to verify of the model written in IML. In this case, it takes the side of the order, three order values o1, o2, o3 and a variable mkt representing current market conditions (best bid/offer on the primary exchange). When ImandraX is presented with this VG, it tries to prove it but fails, because the property does not, in fact, hold. So ImandraX produces a counterexample (shortened for brevity, please see the documentation page for the full example):

- : order_side -> order -> order -> order -> mkt_data -> bool = <fun>
module CX :
  sig
    val side : order_side
    val o1 : order
    val o2 : order
    val o3 : order
    val mkt : mkt_data
  end
Counterexample (after 0 steps, 0.028s):
let side : order_side = BUY
let o1 : order =
  {id = 9; peg = NO_PEG; client_id = 13; order_type = MARKET; qty = 7720;
   min_qty = 14; leaves_qty = 1; price = 1; time = 1; src = 10;
   order_attr = RESIDENT; capacity = Principal; category = C_ONE;
   cross_restrict =
   {cr_self_cross = false; cr_ubs_principal = false;
    cr_round_lot_only = false; cr_no_locked_nbbo = false;
    cr_pegged_mid_point_mode = 12; cr_enable_conditionals = false;
    cr_min_qty = false;
    cr_cat_elig =
    {c_one_elig = false; c_two_elig = false; c_three_elig = false;
     c_four_elig = false}};
   locate_found = false; expiry_time = 11}
 (* omitted *)
let mkt : mkt_data = {nbb = 8857; nbo = 8858; l_up = 10655; l_down = 8856}

ImandraX computed this counterexample by actually analyzing the logic, not by using an LLM trained on similar examples or some other statistical technique (e.g. QuickCheck). If the property were actually true, ImandraX would have produced a proof object that can be independently checked. Our documentation site has many similar examples where our reasoning engine uncovers deep problems that are virtually impossible for a human to uncover.
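
For contrast, here is a minimal sketch (ours, not taken from the UBS notebook) of a property that does hold; presented with a goal like this, ImandraX proves it and produces a proof object rather than a counterexample:

(* a trivially true property: squaring an integer never yields a negative result *)
verify (fun x -> x * x >= 0)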

Similar techniques have become bedrock in industries like hardware manufacturing, where a bug in the design of a microprocessor is very expensive!

Rigorous way to generate test cases

If you write source code professionally, you’re surely familiar with writing test cases. CodeLogician will help you write “functional” tests - those seeking to ensure the implementation meets requirements. A natural goal when writing those is to “cover all edge cases” - here we’ll describe what that means from a mathematical (logical) perspective and how automated reasoning can help you with it.

One of the core features of ImandraX is Region Decomposition, the ability to decompose the behavior of a model into a finite number of symbolic regions, where each region contains a set of constraints on the inputs and a description of the outcome (called the invariant). That was a handful! Another way to think about it is exactly what we described at the top - the ability to take a model and identify all of its distinct behaviors (or edge cases). An edge case corresponds to a “region” of the model’s state space, and the state space corresponds to all possible inputs and outputs. We define a “region” by describing what the inputs will look like through a list of constraints. Each region then carries an “invariant” which describes what the model will output if the inputs satisfy that region’s constraints. When ImandraX generates these regions, it does so with mathematical precision. Furthermore, it generates a proof that when you combine all of those regions, you recover the behavior of the original function.
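
As a toy illustration (ours, not taken from the ImandraX documentation), consider this small IML function and the two regions its decomposition would naturally produce:

(* a tiny model with two distinct behaviors *)
let clamp_pos (x : int) : int =
  if x < 0 then 0 else x

(* Region 1: constraint x < 0,  invariant: result = 0 *)
(* Region 2: constraint x >= 0, invariant: result = x *)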

Consider this example of analyzing all of the edge cases of the SIX Swiss pricing logic (for a particular segment of the Swiss market): SIX Swiss Exchange Pricing Notebook. That code must process effectively infinitely many possible inputs (we come from the world of math, so we always qualify the word “infinite” :)), but even from a brief look you can tell that there’s a finite number of edge cases. How do we analyze this efficiently?

This is where ImandraX’s region decomposition comes in - it is a symbolic analysis that decomposes the behavior of the model into a finite number of regions, where each region may contain infinitely many possible inputs. These “regions” are the edge cases of the model. The technique is inspired by Cylindrical Algebraic Decomposition, which we’ve lifted to algorithms at large.

For the SIX Swiss example above, ImandraX identified 44 different regions. Here’s one example:

Constraints:
- (List.hd ob.buys).order_qty = (List.hd ob.sells).order_qty
- (List.hd ob.buys).order_type = Market
- (List.hd ob.sells).order_type = Market
- (List.hd (List.tl ob.buys)).order_type <> Market
- ref_price >= (List.hd (List.tl ob.buys)).order_price
- (List.hd (List.tl ob.sells)).order_type <> Market
- ref_price > (List.hd (List.tl ob.sells)).order_price

Invariant:
- F = Known (List.hd (List.tl ob.sells)).order_price

The constraints and invariant are both in IML - here are the relevant functions that appear:

  • List.hd refers to the first element of the list (ob.buys is a list containing orders and ob is the value representing the order book)
  • List.tl refers to the list except for the first element. So, List.hd (List.tl ob.buys) is the second element of the list.

In a sense, the generation of these regions is also generative AI, but the results are derived solely from the logic of the model - no statistics involved! If you’re using CodeLogician through a tool like Cursor, it’s quite easy to convert them to English prose (or other formats), or to the testing framework and language of your choice.

These constraints have to be satisfied for the code to be in that specific region; they are generated automatically by ImandraX, purely from the logical information encoded into the model. The invariant represents the outcome of calling the function on inputs that satisfy the constraints; together, the constraints and the invariant define the region. ImandraX also generates a proof that if you combine all of the regions together, you recover the original behavior of the model.

CodeLogician works with ImandraX to extract regions (and corresponding sample points) to generate high-coverage test suites.

We’ll cover Region Decomposition in more detail later, but you should hopefully already appreciate the value proposition of this approach - just that example would take many hours for humans to work through themselves (another question is whether they’d be able to enumerate the regions at all). An LLM would not be able to generate this by itself!

Application to vibe coding - “Verify your vibes!”

If you’re into “vibing”, then CodeLogician should be one of the tools at your disposal. With the rapidly rising popularity of coding assistants like Cursor and the proliferation of MCP (Model Context Protocol), CodeLogician becomes an indispensable part of the developer’s workflow.

High-level overview of CodeLogician

The agent gathers information about what’s in your program and then uses that information to generate the formal model (“model”) that will be used for reasoning.

CodeLogician Process overview

All of the information about this process is contained within the CodeLogician state (“state”) - it is the data structure that contains the current source program, any refactoring information and much more (defined below).

State contains the following (a rough sketch in code follows this list):

  • Source program - the original program you submitted
  • Formalization data - information about the program that will be used to perform formalization
  • The generated IML model
  • Model’s status (whether it was successfully admitted, whether there are errors, etc.)
  • Other artefacts like verification goals, decomposition requests, etc.
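
To make the shape of the state concrete, here is a rough, hypothetical sketch of it as an IML/OCaml record; the field names are illustrative and don’t necessarily match CodeLogician’s actual schema:

(* hypothetical sketch of the CodeLogician state; field names are ours *)
type model_status = NotSubmitted | Admitted | AdmissionError of string

type state = {
  src_program            : string;        (* the original source program *)
  formalization_data     : string list;   (* information gathered for formalization *)
  iml_model              : string option; (* the generated IML model, if any *)
  model_status           : model_status;  (* admission status and errors *)
  verification_goals     : string list;   (* extracted reasoning requests *)
  decomposition_requests : string list;   (* other artefacts *)
}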

Then there are several functions that operate on the state which fall into 3 categories:

  • Accessing and changing the state:
    • Changing the state explicitly, for example updating the source program
  • Agentic flows:
    • Performing the full formalization cycle - generating the model, submitting it to ImandraX and iterating until it’s admitted
  • Low-level commands:
    • Generating formalization from source program (identifying various mechanisms on how that could be done)
    • Submitting the model to ImandraX and getting feedback
    • Generating Verification Goals and Decomposition Requests

Autoformalization and model refinement

The objective of the agent is to get the model admitted into ImandraX; an extra bonus is to make it executable (free of opaque functions, as discussed further below). If the model is admitted by ImandraX, it means there is sufficient information available - no missing definitions, and the model type-checks.

With the process of refinement, you’re incrementally creating a model of your source code. The two phases of using ImandraX are as follows:

  • Model admitted to ImandraX - there are no missing definitions and type-checking passes
  • Model admitted to ImandraX and executable - there are no opaque functions (either they’re fully implemented or substituted with approximations)

We’ll go into approximations in more detail later, but for now you can think of them as mocking functionality you don’t have access to (e.g. third-party services or libraries) or computation that is difficult to reason about (e.g. the exp function).
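
As a minimal sketch (ours, with illustrative names), an approximation might replace a hard-to-reason-about function with a simpler, executable stand-in inside the model:

(* hypothetical stand-in for a third-party exp over scaled integers;
   a truncated series keeps the model executable and easy to reason about *)
let exp_approx (x : int) : int =
  1 + x + (x * x) / 2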

One way to describe this process is model-based software engineering or, alternatively, working with a digital twin. The idea is the same - you’re working with some non-trivial source code and you want to make sure you have implemented the specification correctly, covered all the edge cases, and can explain the behavior and generate intelligent test cases. Doing so directly in Python or other source languages (e.g. Java) is very difficult (for various reasons, beginning with their type systems), so a useful approach is to create a model of the code you’re implementing that you can reason about mathematically. Think of using Matlab to simulate an airplane control algorithm before implementing it as part of the actual physical airplane. With the help of LLMs and formal tools (e.g. static analyzers), we transform your code into a model that our reasoning engine, ImandraX, can reason about.

The interface to the agent is very flexible - you can ask it to perform the full process of model creation, verification and test case generation, or ask it to perform individual steps. The latter is especially useful when you’re using it via the MCP server through a coding assistant like Cursor.

Can you use these models directly? Absolutely! The models are written in IML (more on this later), which is (mostly, with a few directives) a subset of OCaml (www.ocaml.org). And because of their precision (type information, etc.), we can translate them back to the source language pretty easily.

How do you know the model is right?

When creating a formal model of source code, a natural question arises: “How do you know the model is correct?” We think of model correctness from the perspective of functional equivalence. In other words, two programs are equivalent if they produce the same outputs for the same inputs. To generate a rigorous set of inputs and outputs to compare against, we use region decomposition, which we described above.
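
For a flavor of what functional equivalence means, here is a minimal sketch (ours, with made-up names) of an equivalence goal between two IML renderings of the same logic; in the CodeLogician workflow the comparison is against the original source program, using inputs drawn from region decomposition:

(* two renderings of the same logic *)
let g_model (x : int) : int = if x < 0 then x * x else x
let g_refactored (x : int) : int = if x >= 0 then x else x * x

(* functional equivalence expressed as a verification goal *)
verify (fun x -> g_model x = g_refactored x)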