Analysing Machine Learning Models With Imandra

In this notebook we show how Imandra can be used to analyse and reason about models that have been learnt from data, an important and exciting topic bridging the gap between formal methods and machine learning (ML). Brief notes and some links are included, but for a fuller explanation (with code snippets included for reference) see our corresponding Medium post. You can also find all of our code for both learning and analysing our models on GitHub.

To illustrate this approach we'll be looking at (relatively simple) examples from two of the most common tasks within supervised learning (and ML more generally): classification and regression. In particular, we'll show how two of the most common kinds of model used to perform these tasks, random forests and neural networks, can be analysed using Imandra. For each task we'll use a real-world benchmark dataset from the UCI Machine Learning Repository and create our models using Python with some standard ML libraries.

We'll mostly be working with reals in this notebook so we'll start by installing a pretty printer so that we're not overrun with digits.

In [1]:
let pp_approx fmt r = CCFormat.fprintf fmt "%s" (Real.to_string_approx r) [@@program]
#install_printer pp_approx
Out[1]:
val pp_approx : Format.formatter -> real -> unit = <fun>

Classification

In a classification task we want to learn to predict the label of a datapoint based on previous data. In the classic Wisconsin Breast Cancer (Diagnostic) dataset) the task is to predict whether the cancer is benign or malignant based on the features of cell nuclei. In the dataset we have the following variables:

1. ID number
2. Diagnosis (malignant or benign)
3-32. Real values for the mean, standard error, and the 'worst' value for each cell nucleus'
      a) Radius
      b) Texture
      c) Perimeter
      d) Area
      e) Smoothness
      f) Compactness
      g) Concavity
      h) Concave points
      i) Symmetry
      j) Fractal dimension

As is standard practice we pre-process the data before learning. First we standardise each variable to have zero mean and unit variance, then remove all but one from sets of highly correlated variables, along with those that have low mutual information with respect to the target variable. The data is split into training (80%) and test (20%) sets and we use Scikit-Learn to learn a random forest of 3 decision trees of maximum depth 3. As this is a relatively straightforward problem even this simple model achieves a fairly high accuracy. Using a short Python script each tree is then converted to Imandra Modelling Language (IML) and can be reasoned about using Imandra.

In [2]:
let tree_0 (f_0 : real) (f_1 : real) (f_2 : real) (f_3 : real) (f_4 : real) (f_5 : real) (f_6 : real) = let open Real in
  if f_2 <=. (-0.10815) then
    if f_0 <=. (0.26348) then
      if f_6 <=. (-0.06176) then
        (236.0, 1.0)
      else
        (17.0, 5.0)
    else
      if f_3 <=. (-0.54236) then
        (8.0, 2.0)
      else
        (3.0, 7.0)
  else
    if f_6 <=. (0.09812) then
      if f_6 <=. (-0.17063) then
        (24.0, 0.0)
      else
        (4.0, 2.0)
    else
      if f_2 <=. (2.65413) then
        (6.0, 128.0)
      else
        (7.0, 5.0);;

let tree_1 (f_0 : real) (f_1 : real) (f_2 : real) (f_3 : real) (f_4 : real) (f_5 : real) (f_6 : real) = let open Real in
  if f_5 <=. (-0.05799) then
    if f_0 <=. (0.68524) then
      if f_1 <=. (-0.83180) then
        (110.0, 3.0)
      else
        (137.0, 0.0)
    else
      if f_3 <=. (0.45504) then
        (1.0, 8.0)
      else
        (0.0, 7.0)
  else
    if f_0 <=. (-0.18668) then
      if f_6 <=. (0.45214) then
        (39.0, 0.0)
      else
        (2.0, 11.0)
    else
      if f_6 <=. (-0.00009) then
        (8.0, 4.0)
      else
        (5.0, 120.0);;

let tree_2 (f_0 : real) (f_1 : real) (f_2 : real) (f_3 : real) (f_4 : real) (f_5 : real) (f_6 : real) = let open Real in
  if f_2 <=. (0.10459) then
    if f_5 <=. (-0.38015) then
      if f_5 <=. (-0.60659) then
        (139.0, 1.0)
      else
        (44.0, 3.0)
    else
      if f_6 <=. (-0.07927) then
        (38.0, 2.0)
      else
        (25.0, 17.0)
  else
    if f_6 <=. (0.46888) then
      if f_3 <=. (0.41642) then
        (28.0, 3.0)
      else
        (1.0, 4.0)
    else
      if f_2 <=. (1.74327) then
        (3.0, 122.0)
      else
        (4.0, 21.0);;

let rf (f_0, f_1, f_2, f_3, f_4, f_5, f_6) = let open Real in
let (a_0, b_0) = tree_0 f_0 f_1 f_2 f_3 f_4 f_5 f_6 in
let (a_1, b_1) = tree_1 f_0 f_1 f_2 f_3 f_4 f_5 f_6 in
let (a_2, b_2) = tree_2 f_0 f_1 f_2 f_3 f_4 f_5 f_6 in
let a = a_0 + a_1 + a_2 in
let b = b_0 + b_1 + b_2 in
(a, b);;
Out[2]:
val tree_0 :
  real -> real -> real -> real -> real -> real -> real -> real * real = <fun>
val tree_1 :
  real -> real -> real -> real -> real -> real -> real -> real * real = <fun>
val tree_2 :
  real -> real -> real -> real -> real -> real -> real -> real * real = <fun>
val rf : real * real * real * real * real * real * real -> real * real =
  <fun>

We can create a custom input type in Imandra for our model, so that we can keep track of the different features of our data.

In [3]:
type rf_input = {
  radius_mean : real;
  compactness_mean : real;
  concavity_mean : real;
  radius_se : real;
  compactness_worst : real;
  concavity_worst : real;
  concave_points_worst : real;
}
Out[3]:
type rf_input = {
  radius_mean : real;
  compactness_mean : real;
  concavity_mean : real;
  radius_se : real;
  compactness_worst : real;
  concavity_worst : real;
  concave_points_worst : real;
}

However, remember that we also processed our data before learning. To make things easier we'll add in a function applying this transformation to each input variable. Here we simply use some multiplicative and additive scaling values extracted during our data pre-processing stage. After that we can define a full model which combines these pre-processing steps and the random forest.

In [4]:
let process_rf_input input = let open Real in
let f_0 = (input.radius_mean          - 14.12729) / 3.52405 in
let f_1 = (input.compactness_mean     - 0.10434)  / 0.05281 in
let f_2 = (input.concavity_mean       - 0.08880)  / 0.07972 in
let f_3 = (input.radius_se            - 0.40517)  / 0.27731 in
let f_4 = (input.compactness_worst    - 0.25427)  / 0.15734 in
let f_5 = (input.concavity_worst      - 0.27219)  / 0.20862 in
let f_6 = (input.concave_points_worst - 0.11461)  / 0.06573 in
(f_0, f_1, f_2, f_3, f_4, f_5, f_6)

let process_rf_output c =
let (a, b) = c in
if a >. b then "benign" else "malignant"

let rf_model input = input |> process_rf_input |> rf |> process_rf_output
Out[4]:
val process_rf_input :
  rf_input -> real * real * real * real * real * real * real = <fun>
val process_rf_output : real * real -> string = <fun>
val rf_model : rf_input -> string = <fun>

As our model is fully executable we can both query it as well as find counterexamples, prove properties, apply logical side-conditions, decompose its regions, and more. As a quick sanity check to make sure everything is working, let's run a datum from our dataset through the model. In particular, we'll input (17.99, 0.2776, 0.3001, 1.095, 0.6656, 0.7119, 0.2654) which is classified as malignant in the data.

In [5]:
let x = {
  radius_mean = 17.99;
  compactness_mean = 0.2776;
  concavity_mean = 0.3001;
  radius_se = 1.095;
  compactness_worst = 0.6656;
  concavity_worst = 0.7119;
  concave_points_worst = 0.7119;
}

let y = rf_model x
Out[5]:
val x : rf_input =
  {radius_mean = 17.99; compactness_mean = 0.2776; concavity_mean = 0.3001;
   radius_se = 1.095; compactness_worst = 0.6656; concavity_worst = 0.7119;
   concave_points_worst = 0.7119}
val y : string = "malignant"

Great, just what we'd expect. Now we'll use Imandra to generate an example datapoint for us given that diagnosis is benign.

In [6]:
instance (fun x -> rf_model x = "benign")
Out[6]:
- : rf_input -> bool = <fun>
module CX : sig val x : rf_input end
Instance (after 0 steps, 0.021s):
let x : rf_input =
  {radius_mean = (-890472096653/500000000);
   compactness_mean = (-2926469793679/500000000);
   concavity_mean = (-224459910859/500000000); radius_se = 2282;
   compactness_worst = 9; concavity_worst = (-3044271784029/5000000000);
   concave_points_worst = (-653506988241/78125000)}
Instance
proof attempt
ground_instances:0
definitions:0
inductions:0
search_time:
0.021s
details:
Expand
smt_stats:
num checks:2
arith assert lower:7
arith tableau max rows:1
arith tableau max columns:39
arith pivots:2
rlimit count:3530
mk clause:63
datatype occurs check:25
mk bool var:148
arith assert upper:15
decisions:21
propagations:42
datatype accessor ax:28
arith num rows:1
datatype constructor ax:4
num allocs:7660708
final checks:1
added eqs:93
arith eq adapter:6
time:0.009000
memory:16.880000
max memory:17.440000
Expand
  • start[0.021s]
      let (_x_0 : real) = ….2 in
      let (_x_1 : (Real.t * Real.t)) = if _x_0 <= (-2163/20000) then … else …
      in
      let (_x_2 : (Real.t * Real.t))
          = if ….5 <= (-5799/100000) then … else …
      in
      let (_x_3 : (Real.t * Real.t)) = if _x_0 <= 10459/100000 then … else …
      in
      (if _x_1.0 +. _x_2.0 +. _x_3.0 > _x_1.1 +. _x_2.1 +. _x_3.1 then "benign"
       else "malignant")
      = "benign"
  • simplify

    into:
    let (_x_0 : real) = ( :var_0: ).concavity_mean in
    let (_x_1 : real) = ( :var_0: ).concave_points_worst in
    let (_x_2 : real) = ( :var_0: ).radius_se in
    let (_x_3 : real) = ( :var_0: ).radius_mean in
    let (_x_4 : (Real.t * Real.t))
        = if _x_0 <= 40089141/500000000
          then
            if _x_3 <= 7527903347/500000000
            then if _x_1 <= 8636759/78125000 then … else …
            else if _x_2 <= 636920371/2500000000 then … else …
          else
          if _x_1 <= 302648569/2500000000
          then if _x_1 <= 1033944901/10000000000 then … else …
          else if _x_0 <= 750968109/2500000000 then … else …
    in
    let (_x_5 : real) = ( :var_0: ).concavity_worst in
    let (_x_6 : (Real.t * Real.t))
        = if _x_5 <= 1300460631/5000000000
          then
            if _x_3 <= 8271055011/500000000
            then
              if ( :var_0: ).compactness_mean <= 30206321/500000000 then …
              else …
            else if _x_2 <= 166049107/312500000 then … else …
          else
          if _x_3 <= 6734710173/500000000
          then if _x_1 <= 721645811/5000000000 then … else …
          else if _x_1 <= 1146040843/10000000000 then … else …
    in
    let (_x_7 : (Real.t * Real.t))
        = if _x_0 <= 242844787/2500000000
          then
            if _x_5 <= 192883107/1000000000
            then if _x_5 <= 728215971/5000000000 then … else …
            else if _x_1 <= 1093995829/10000000000 then … else …
          else
          if _x_1 <= 181786853/1250000000
          then if _x_2 <= 2603237151/5000000000 then … else …
          else if _x_0 <= 569433711/2500000000 then … else …
    in
    (if _x_4.0 +. _x_6.0 +. _x_7.0 <= _x_4.1 +. _x_6.1 +. _x_7.1 then "malignant"
     else "benign")
    = "benign"
    expansions:
    []
    rewrite_steps:
      forward_chaining:
      • Sat (Some let x : rf_input = {radius_mean = (Q.of_string "-890472096653/500000000"); compactness_mean = (Q.of_string "-2926469793679/500000000"); concavity_mean = (Q.of_string "-224459910859/500000000"); radius_se = (Q.of_nativeint (2282n)); compactness_worst = (Q.of_nativeint (9n)); concavity_worst = (Q.of_string "-3044271784029/5000000000"); concave_points_worst = (Q.of_string "-653506988241/78125000")} )
      In [7]:
      CX.x
      
      Out[7]:
      - : rf_input =
      {radius_mean = -1780.94419331; compactness_mean = -5852.93958736;
       concavity_mean = -448.919821718; radius_se = 2282.; compactness_worst = 9.;
       concavity_worst = -608.854356806; concave_points_worst = -8364.88944948}
      

      This looks a bit funny however; notice how the unspecified input variables are unbounded in a way that doesn't make sense with respect to the data. In general we might only care about the performance of our model when some reasonable bounds are placed on the input (for example, the mean radius can't be negative, and if the values for this variable in our dataset range between 6.98 and 28.11 we wouldn't really expect any value greater than, say, 35). Using the description of each variable in the dataset we can form a condition describing valid and reasonable inputs to our model. In machine learning more generally, we are typically only interested in the performance and quality of a model over some particular distribution of data, which we often have particular prior beliefs about.

      In [8]:
      let is_valid_rf input =
        5.0 <=. input.radius_mean && input.radius_mean <=. 35.0 &&
        0.0 <=. input.compactness_mean && input.compactness_mean <=. 0.4 &&
        0.0 <=. input.concavity_mean && input.concavity_mean <=. 0.5 &&
        0.0 <=. input.radius_se && input.radius_se <=. 3.5 &&
        0.0 <=. input.compactness_worst && input.compactness_worst <=. 1.2 &&
        0.0 <=. input.concavity_worst && input.concavity_worst <=. 1.5 &&
        0.0 <=. input.concave_points_worst && input.concave_points_worst <=. 0.35
      
      instance (fun x -> rf_model x = "benign" && is_valid_rf x)
      
      Out[8]:
      val is_valid_rf : rf_input -> bool = <fun>
      - : rf_input -> bool = <fun>
      module CX : sig val x : rf_input end
      
      Instance (after 0 steps, 0.018s):
      let x : rf_input =
        {radius_mean = 32857980652003/2500000000000;
         compactness_mean = 8487976201/5000000000000;
         concavity_mean = 22890899511/2500000000000; radius_se = 40971/20000;
         compactness_worst = 321/5000;
         concavity_worst = 326968970979/50000000000000;
         concave_points_worst = 5259786231/781250000000}
      
      Instance
      proof attempt
      ground_instances:0
      definitions:0
      inductions:0
      search_time:
      0.018s
      details:
      Expand
      smt_stats:
      num checks:2
      arith assert lower:14
      arith tableau max rows:1
      arith tableau max columns:39
      arith pivots:3
      rlimit count:4512
      mk clause:63
      datatype occurs check:25
      mk bool var:162
      arith assert upper:22
      decisions:21
      propagations:42
      datatype accessor ax:28
      arith num rows:1
      datatype constructor ax:4
      num allocs:15468356
      final checks:1
      added eqs:93
      arith eq adapter:6
      time:0.009000
      memory:17.440000
      max memory:18.090000
      Expand
      • start[0.018s]
          let (_x_0 : real) = ….2 in
          let (_x_1 : (Real.t * Real.t)) = if _x_0 <= (-2163/20000) then … else …
          in
          let (_x_2 : (Real.t * Real.t))
              = if ….5 <= (-5799/100000) then … else …
          in
          let (_x_3 : (Real.t * Real.t)) = if _x_0 <= 10459/100000 then … else …
          in
          let (_x_4 : real) = ( :var_0: ).radius_mean in
          let (_x_5 : real) = ( :var_0: ).compactness_mean in
          let (_x_6 : real) = ( :var_0: ).concavity_mean in
          let (_x_7 : real) = ( :var_0: ).radius_se in
          let (_x_8 : real) = ( :var_0: ).compactness_worst in
          let (_x_9 : real) = ( :var_0: ).concavity_worst in
          let (_x_10 : real) = ( :var_0: ).concave_points_worst in
          ((if _x_1.0 +. _x_2.0 +. _x_3.0 > _x_1.1 +. _x_2.1 +. _x_3.1 then "benign"
            else "malignant")
           = "benign")
          && ((5 <= _x_4)
              && ((_x_4 <= 35)
                  && ((0 <= _x_5)
                      && ((_x_5 <= 2/5)
                          && ((0 <= _x_6)
                              && ((_x_6 <= 1/2)
                                  && ((0 <= _x_7)
                                      && ((_x_7 <= 7/2)
                                          && ((0 <= _x_8)
                                              && ((_x_8 <= 6/5)
                                                  && ((0 <= _x_9)
                                                      && ((_x_9 <= 3/2)
                                                          && ((0 <= _x_10)
                                                              && (_x_10 <= 7/20))))))))))))))
      • simplify

        into:
        let (_x_0 : real) = ( :var_0: ).concavity_mean in
        let (_x_1 : real) = ( :var_0: ).concave_points_worst in
        let (_x_2 : real) = ( :var_0: ).radius_se in
        let (_x_3 : real) = ( :var_0: ).radius_mean in
        let (_x_4 : (Real.t * Real.t))
            = if _x_0 <= 40089141/500000000
              then
                if _x_3 <= 7527903347/500000000
                then if _x_1 <= 8636759/78125000 then … else …
                else if _x_2 <= 636920371/2500000000 then … else …
              else
              if _x_1 <= 302648569/2500000000
              then if _x_1 <= 1033944901/10000000000 then … else …
              else if _x_0 <= 750968109/2500000000 then … else …
        in
        let (_x_5 : real) = ( :var_0: ).compactness_mean in
        let (_x_6 : real) = ( :var_0: ).concavity_worst in
        let (_x_7 : (Real.t * Real.t))
            = if _x_6 <= 1300460631/5000000000
              then
                if _x_3 <= 8271055011/500000000
                then if _x_5 <= 30206321/500000000 then … else …
                else if _x_2 <= 166049107/312500000 then … else …
              else
              if _x_3 <= 6734710173/500000000
              then if _x_1 <= 721645811/5000000000 then … else …
              else if _x_1 <= 1146040843/10000000000 then … else …
        in
        let (_x_8 : (Real.t * Real.t))
            = if _x_0 <= 242844787/2500000000
              then
                if _x_6 <= 192883107/1000000000
                then if _x_6 <= 728215971/5000000000 then … else …
                else if _x_1 <= 1093995829/10000000000 then … else …
              else
              if _x_1 <= 181786853/1250000000
              then if _x_2 <= 2603237151/5000000000 then … else …
              else if _x_0 <= 569433711/2500000000 then … else …
        in
        let (_x_9 : real) = ( :var_0: ).compactness_worst in
        ((if _x_4.0 +. _x_7.0 +. _x_8.0 <= _x_4.1 +. _x_7.1 +. _x_8.1
          then "malignant" else "benign")
         = "benign")
        && (5 <= _x_3) && (_x_3 <= 35) && (0 <= _x_5) && (_x_5 <= 2/5) && (0 <= _x_0)
        && (_x_0 <= 1/2) && (0 <= _x_2) && (_x_2 <= 7/2) && (0 <= _x_9)
        && (_x_9 <= 6/5) && (0 <= _x_6) && (_x_6 <= 3/2) && (0 <= _x_1)
        && (_x_1 <= 7/20)
        expansions:
        []
        rewrite_steps:
          forward_chaining:
          • Sat (Some let x : rf_input = {radius_mean = (Q.of_string "32857980652003/2500000000000"); compactness_mean = (Q.of_string "8487976201/5000000000000"); concavity_mean = (Q.of_string "22890899511/2500000000000"); radius_se = (Q.of_string "40971/20000"); compactness_worst = (Q.of_string "321/5000"); concavity_worst = (Q.of_string "326968970979/50000000000000"); concave_points_worst = (Q.of_string "5259786231/781250000000")} )
          In [9]:
          CX.x
          
          Out[9]:
          - : rf_input =
          {radius_mean = 13.1431922608; compactness_mean = 0.0016975952402;
           concavity_mean = 0.0091563598044; radius_se = 2.04855;
           compactness_worst = 0.0642; concavity_worst = 0.00653937941958;
           concave_points_worst = 0.00673252637568}
          

          This looks much better. Now let's move on to reasoning about our model in more interesting ways. One thing we can do is check the validity of certain constraints we might want our model to satisfy. For example, if the surface of a cell nucleus has many, large concave sections then is a particularly negative sign indicating that the cancer is likely to be malignant. We can use Imandra to easily verify that our model always classifies a sample of highly concave cells as malignant.

          In [10]:
          verify (fun x -> is_valid_rf x
                  && x.concavity_mean >=. 0.4
                  && x.concavity_worst >=. 1.0
                  && x.concave_points_worst >=. 0.25
                  ==> rf_model x = "malignant")
          
          Out[10]:
          - : rf_input -> bool = <fun>
          
          Proved
          proof
          ground_instances:0
          definitions:0
          inductions:0
          search_time:
          0.019s
          details:
          Expand
          smt_stats:
          num checks:2
          arith assert lower:30
          arith tableau max rows:1
          arith tableau max columns:40
          arith pivots:2
          rlimit count:4525
          mk clause:50
          mk bool var:171
          arith assert upper:16
          decisions:17
          propagations:28
          conflicts:2
          datatype accessor ax:28
          arith conflicts:2
          arith num rows:1
          datatype constructor ax:4
          num allocs:26253115
          added eqs:102
          del clause:4
          arith eq adapter:8
          time:0.009000
          memory:17.860000
          max memory:18.540000
          Expand
          • start[0.019s]
              let (_x_0 : real) = ( :var_0: ).radius_mean in
              let (_x_1 : real) = ( :var_0: ).compactness_mean in
              let (_x_2 : real) = ( :var_0: ).concavity_mean in
              let (_x_3 : real) = ( :var_0: ).radius_se in
              let (_x_4 : real) = ( :var_0: ).compactness_worst in
              let (_x_5 : real) = ( :var_0: ).concavity_worst in
              let (_x_6 : real) = ( :var_0: ).concave_points_worst in
              let (_x_7 : real) = ….2 in
              let (_x_8 : (Real.t * Real.t)) = if _x_7 <= (-2163/20000) then … else …
              in
              let (_x_9 : (Real.t * Real.t))
                  = if ….5 <= (-5799/100000) then … else …
              in
              let (_x_10 : (Real.t * Real.t)) = if _x_7 <= 10459/100000 then … else …
              in
              (5 <= _x_0)
              && ((_x_0 <= 35)
                  && ((0 <= _x_1)
                      && ((_x_1 <= 2/5)
                          && ((0 <= _x_2)
                              && ((_x_2 <= 1/2)
                                  && ((0 <= _x_3)
                                      && ((_x_3 <= 7/2)
                                          && ((0 <= _x_4)
                                              && ((_x_4 <= 6/5)
                                                  && ((0 <= _x_5)
                                                      && ((_x_5 <= 3/2)
                                                          && ((0 <= _x_6)
                                                              && (_x_6 <= 7/20)))))))))))))
              && ((_x_2 >= 2/5) && ((_x_5 >= 1) && (_x_6 >= 1/4)))
              ==> (if _x_8.0 +. _x_9.0 +. _x_10.0 > _x_8.1 +. _x_9.1 +. _x_10.1
                   then "benign" else "malignant")
                  = "malignant"
          • simplify

            into:
            let (_x_0 : real) = ( :var_0: ).radius_mean in
            let (_x_1 : real) = ( :var_0: ).compactness_mean in
            let (_x_2 : real) = ( :var_0: ).concavity_mean in
            let (_x_3 : real) = ( :var_0: ).radius_se in
            let (_x_4 : real) = ( :var_0: ).compactness_worst in
            let (_x_5 : real) = ( :var_0: ).concavity_worst in
            let (_x_6 : real) = ( :var_0: ).concave_points_worst in
            let (_x_7 : (Real.t * Real.t))
                = if _x_2 <= 40089141/500000000
                  then
                    if _x_0 <= 7527903347/500000000
                    then if _x_6 <= 8636759/78125000 then … else …
                    else if _x_3 <= 636920371/2500000000 then … else …
                  else
                  if _x_6 <= 302648569/2500000000
                  then if _x_6 <= 1033944901/10000000000 then … else …
                  else if _x_2 <= 750968109/2500000000 then … else …
            in
            let (_x_8 : (Real.t * Real.t))
                = if _x_5 <= 1300460631/5000000000
                  then
                    if _x_0 <= 8271055011/500000000
                    then if _x_1 <= 30206321/500000000 then … else …
                    else if _x_3 <= 166049107/312500000 then … else …
                  else
                  if _x_0 <= 6734710173/500000000
                  then if _x_6 <= 721645811/5000000000 then … else …
                  else if _x_6 <= 1146040843/10000000000 then … else …
            in
            let (_x_9 : (Real.t * Real.t))
                = if _x_2 <= 242844787/2500000000
                  then
                    if _x_5 <= 192883107/1000000000
                    then if _x_5 <= 728215971/5000000000 then … else …
                    else if _x_6 <= 1093995829/10000000000 then … else …
                  else
                  if _x_6 <= 181786853/1250000000
                  then if _x_3 <= 2603237151/5000000000 then … else …
                  else if _x_2 <= 569433711/2500000000 then … else …
            in
            not
            ((5 <= _x_0) && (_x_0 <= 35) && (0 <= _x_1) && (_x_1 <= 2/5) && (0 <= _x_2)
             && (_x_2 <= 1/2) && (0 <= _x_3) && (_x_3 <= 7/2) && (0 <= _x_4)
             && (_x_4 <= 6/5) && (0 <= _x_5) && (_x_5 <= 3/2) && (0 <= _x_6)
             && (_x_6 <= 7/20) && (_x_2 >= 2/5) && (_x_5 >= 1) && (_x_6 >= 1/4))
            || ((if _x_7.0 +. _x_8.0 +. _x_9.0 <= _x_7.1 +. _x_8.1 +. _x_9.1
                 then "malignant" else "benign")
                = "malignant")
            expansions:
            []
            rewrite_steps:
              forward_chaining:
              • Unsat

              The nested if ... then ... else statements in how the trees are defined mean that they are a prime candidate for Imandra's region decomposition functionality. As well as the total model we can of course also decompose the individual trees making up the ensemble.

              In [11]:
              Modular_decomp.top ~assuming:"is_valid_rf" "rf_model"
              
              Out[11]:
              - : Modular_decomp_intf.decomp_ref = <abstr>