Another way of encoding type identity for BuckleScript libraries without using a big functor

October 16, 2019

Note: this article is aimed at library authors. It goes into more depth than is needed for day-to-day BuckleScript use.

When we build a generic data structure, abstracting over the type of a function is not enough. For example, a type-safe generic balanced AVL tree relies not only on the type of its comparison function but also on the identity of that function. Two balanced AVL trees that are initialized with comparison functions of the same type still must not be mixed.

module Eq1 = {
  let eq = (x, y) => x == y;
};

module Eq1 = struct 
  let eq x y = x = y
end

module Eq2 = {
  let eq = (x, y) => x * x == y * y;
};

module Eq2 = struct 
  let eq x y = x * x = y * y 
end

Take the two modules above as an example: they have the same type, but we need a way to mark their identity so that data structures instantiated with them cannot be mixed.
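To make the risk concrete, here is a minimal sketch of what goes wrong without such a marker. The unprotected `add` helper and the names `built_with_eq2` and `mixed` are hypothetical and exist only for illustration; `Eq1.eq` is pinned to `int` so that both functions really do have the same type.

```ocaml
module Eq1 = struct
  let eq (x : int) (y : int) = x = y
end

module Eq2 = struct
  let eq x y = x * x = y * y
end

(* Hypothetical unprotected collection: a plain list plus whichever
   equality function the caller happens to pass in. *)
let add eq coll e =
  if List.exists (fun x -> eq x e) coll then coll else e :: coll

(* Built consistently with Eq2: -2 counts as a duplicate of 2. *)
let built_with_eq2 = add Eq2.eq (add Eq2.eq [] 2) (-2)

(* The type checker accepts switching to Eq1 midway, which silently
   breaks the "no duplicates under Eq2" invariant. *)
let mixed = add Eq1.eq built_with_eq2 (-2)
```

Nothing in the types prevents the last call, which is exactly the gap that the encodings in this article are meant to close.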

A traditional way is to use a functor:

module Make (Cmp : sig 
  type t 
  val eq : t -> t -> bool
end) = 
(struct 
  open Cmp
  type key = t
  type coll = key list 
  let empty = []
  let add  (y : coll) (e : key) = 
    if List.exists (fun x -> eq x e) y then
      y
    else      
      e::y
end : sig 
  type key = Cmp.t 
  type coll
  val empty : coll
  val add : coll -> key -> coll
end )

module Ins1 = Make(struct
  type t = int 
  let eq x y = x = y 
end)
module Ins2 = Make(struct
  type t = int 
  let eq x y = x * x = y * y
end)

module Make = (
  Cmp: {
    type t;
    let eq: (t, t) => bool;
  }) : {
 type key = Cmp.t;
 type coll;
 let empty: coll;
 let add: (coll, key) => coll;
} => {
  open Cmp;
  type key = t;
  type coll = list(key);
  let empty = [];
  let add = (y: coll, e: key) =>
    if (List.exists(x => eq(x, e), y)) {
      y;
    } else {
      [e, ...y];
    };
};

module Ins1 = Make({
  type t = int;
  let eq = (x, y) => x == y;
});
module Ins2 = Make({
  type t = int;
  let eq = (x, y) => x * x == y * y;
});

By marking coll as an abstract type, each application of the functor produces a distinct type: Ins1.coll and Ins2.coll are no longer the same.

let v = [Ins1.empty; Ins2.empty]

let v = [Ins1.empty, Ins2.empty];

When we mix them together, we get a type error:

File ..., line 31, characters 21-31:
Error: This expression has type Ins2.coll
       but an expression was expected of type Ins1.coll

This encoding has some issues:

From the runtime point of view, Ins1 is initialized at runtime and its implementation is one big closure, so even if you use only one function from the Ins1 module, all of its functions are linked in.
From the user's point of view, people have to call Ins1.add and Ins2.add instead of a single Ins.add, which makes the code less polymorphic.
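The second issue can be sketched as follows. This is only an illustration under a few assumptions: the `EQ` signature name, the `size` observer and the `AddAll` helper functor are hypothetical additions that are not part of the code above.

```ocaml
module type EQ = sig
  type t
  val eq : t -> t -> bool
end

module Make (Cmp : EQ) : sig
  type key = Cmp.t
  type coll
  val empty : coll
  val add : coll -> key -> coll
  val size : coll -> int (* hypothetical observer, added for the example *)
end = struct
  type key = Cmp.t
  type coll = key list
  let empty = []
  let add y e = if List.exists (fun x -> Cmp.eq x e) y then y else e :: y
  let size = List.length
end

module Ins1 = Make (struct type t = int let eq x y = x = y end)
module Ins2 = Make (struct type t = int let eq x y = x * x = y * y end)

(* There is no single [add] to call, so a reusable helper cannot be a
   plain function: it has to be a functor over the instance as well. *)
module AddAll (C : sig
  type key
  type coll
  val add : coll -> key -> coll
end) = struct
  let add_all c xs = List.fold_left C.add c xs
end

module A1 = AddAll (Ins1)
module A2 = AddAll (Ins2)

let n1 = Ins1.size (A1.add_all Ins1.empty [1; -1; 2])
let n2 = Ins2.size (A2.add_all Ins2.empty [1; -1; 2])
```

Every helper written in this style needs one more functor application per instance, which is the loss of polymorphism described above.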

Now we introduce another encoding. Note that it is quite sophisticated and is recommended only for library authors:

module Cmp: {
  type cmp('a, 'id);
  let eq: (cmp('a, 'id), 'a, 'a) => bool;
  module Make: (
    M: {
       type t;
       let eq: (t, t) => bool;
     }
  ) => {
    type identity;
    let eq: cmp(M.t, identity);
  };
} = {
  type cmp('a, 'id) = ('a, 'a) => bool;
  module Make = (
    M: {
     type t;
     let eq: (t, t) => bool;
    }
  ) => {
    type identity;
    include M;
  };
  let eq = (cmp, x, y) => cmp(x, y); /* This could be inlined by using externals */
};

open Cmp;

module Coll: {
  type coll('k, 'id);
  let empty: cmp('k, 'id) => coll('k, 'id);
  let add: (coll('k, 'id), 'k) => coll('k, 'id);
} = {
  type coll('k, 'id) = {
    eq: cmp('k, 'id),
    data: list('k),
  };

  let empty = (type t, type identity, eq: cmp(t, identity)) => {
    data: [],
    eq,
  };
  let add = (x: coll('k, 'id), y: 'k) =>
    if (List.exists(a => Cmp.eq(x.eq, a, y), x.data)) {
      x;
    } else {
      {
        data: [y, ...x.data],
        eq: x.eq,
      };
    };
};

module Cmp : sig 
  type ('a, 'id) cmp 
  val eq : ('a,'id) cmp -> 'a -> 'a -> bool
  module Make : functor (M : 
    sig type t 
      val eq : t -> t -> bool 
    end
  ) -> sig 
    type identity
    val eq :  (M.t, identity) cmp
  end 

end = struct 
  type ('a, 'id) cmp = 'a -> 'a -> bool
  module Make (M: sig 
    type t 
    val eq : t -> t -> bool  
  end) = struct 
    type identity
    include M
  end 
  let eq cmp x y = cmp x y (* This could be inlined by using externals *)
end 

open Cmp 

module Coll : sig 
  type ('k, 'id) coll
  val empty : ('k, 'id) cmp -> ('k,'id) coll
  val add : ('k, 'id) coll -> 'k -> ('k,'id) coll 
end = struct 
  type ('k, 'id) coll = {
    eq :   ('k,'id) cmp;
    data :  'k list 
  }

  let empty (type t) (type identity) (eq : (t,identity) cmp) = {
    data = [];
    eq = eq 
  }

  let add (x : ('k,' id) coll) (y : 'k) =  
    if List.exists (fun a -> Cmp.eq x.eq a y) x.data then x
    else {
      data = y:: x.data;
      eq = x.eq 
    }
end

The key is the construction of the Cmp module: we create an abstract type cmp that carries a phantom type parameter as its identity, and that identity is unique each time a user creates one by applying the Make functor. We are still using a functor here, but it is a small one.

The usage is as below:

module S0 = Make (struct 
  type t = int
    let eq x y = x = y
  end)

module S1 = Make (struct 
  type t  = int
    let eq x y = x * x = y * y 
  end)

let v0 = Coll.empty S0.eq 
let v1 = Coll.empty S1.eq 

let a0 = Coll.add v0 1 
let a1 = Coll.add v1 1

module S0 = Make({
  type t = int;
  let eq = (x, y) => x == y;
});

module S1 = Make({
  type t = int;
  let eq = (x, y) => x * x == y * y;
});

let v0 = Coll.empty(S0.eq);
let v1 = Coll.empty(S1.eq);

let a0 = Coll.add(v0, 1);
let a1 = Coll.add(v1, 1);

In practice, we can use first-class modules to hide the functors from end users; this is left as an exercise for readers.
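One possible answer to that exercise is sketched below. It is not the actual Belt implementation: the module type `S`, the `make` helper, the `size` observer and `add_all` are hypothetical names chosen for this example, and `Coll.empty` is changed to accept a packed module instead of a bare `cmp` value.

```ocaml
module Cmp : sig
  type ('a, 'id) cmp
  val eq : ('a, 'id) cmp -> 'a -> 'a -> bool

  module type S = sig
    type t
    type identity
    val eq : (t, identity) cmp
  end

  (* Hypothetical helper: wraps a plain function into a first-class
     module that carries a fresh abstract [identity]. *)
  val make : ('a -> 'a -> bool) -> (module S with type t = 'a)
end = struct
  type ('a, 'id) cmp = 'a -> 'a -> bool
  let eq cmp x y = cmp x y

  module type S = sig
    type t
    type identity
    val eq : (t, identity) cmp
  end

  let make (type a) (f : a -> a -> bool) : (module S with type t = a) =
    (module struct
      type t = a
      type identity
      let eq = f
    end)
end

module Coll : sig
  type ('k, 'id) coll
  val empty :
    (module Cmp.S with type t = 'k and type identity = 'id) -> ('k, 'id) coll
  val add : ('k, 'id) coll -> 'k -> ('k, 'id) coll
  val size : ('k, 'id) coll -> int (* hypothetical observer *)
end = struct
  type ('k, 'id) coll = { eq : ('k, 'id) Cmp.cmp; data : 'k list }

  let empty (type k) (type id)
      (m : (module Cmp.S with type t = k and type identity = id)) :
      (k, id) coll =
    let module M = (val m) in
    { eq = M.eq; data = [] }

  let add x y =
    if List.exists (fun a -> Cmp.eq x.eq a y) x.data then x
    else { x with data = y :: x.data }

  let size x = List.length x.data
end

(* End users unpack a value; they never apply a functor themselves. *)
module S0 = (val Cmp.make (fun (x : int) y -> x = y))
module S1 = (val Cmp.make (fun (x : int) y -> x * x = y * y))

(* One plain polymorphic helper works for every identity. *)
let add_all c xs = List.fold_left Coll.add c xs

let a0 = add_all (Coll.empty (module S0)) [1; -1]
let a1 = add_all (Coll.empty (module S1)) [1; -1]

(* Still rejected by the type checker, since S0.identity <> S1.identity:
   let v = [a0; a1] *)
```

Each unpacking of `Cmp.make` yields its own abstract `identity`, so the phantom parameter keeps doing its job while the functor application stays hidden inside the library.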

When we mix a0 and a1, we get a type error:

File ..., line 71, characters 13-15:
Error: This expression has type (int, S1.identity) Coll.coll
       but an expression was expected of type (int, S0.identity) Coll.coll
       Type S1.identity is not compatible with type S0.identity

As you can see, with this encoding the data structure is more general from the user's point of view. The generated JS code is also not wrapped in one big closure, so dead code elimination works better on it.

This style is used extensively in Belt; we encourage you to have a look at its implementation for more ideas.

Release 5.2.0/6.2.0

September 23, 2019

bs-platform 5.2.0/6.2.0 is released. It contains several major enhancements that we would like to share with you.

You can install it via npm i -g bs-platform@5.2.0

Local module compiled into object

OCaml has an advanced module system that lets people structure large-scale applications. It supports first-class modules and higher-order modules, which is unique compared to other ML-like languages such as F# and Haskell.

In previous versions, BuckleScript compiled local modules into JS arrays, whereas global modules (modules produced by a file) were transformed into JS objects.

When a local module is compiled into a JS array, the field names are stripped away, which makes debugging and JS interop difficult. To make the debugging experience better, we instrumented the array with field names in debug mode. This mitigated the debugging issue but still presented challenges for JS interop.

In this release, the compiler generates a uniform representation for global and local modules: an idiomatic JS object. This makes OCaml's module system more valuable on the JS target.

Below is an image showing the diff in this release

As you can see, the id module changed from an array into a JS object.

Pattern match code generation with annotations

BuckleScript aims to generate readable code.

OCaml has a sophisticated pattern-match compiler that generates well-optimized code. However, for complex pattern matching the constructor names are lost on the native backend. This is also one of the very few cases where we generate magic numbers in the JS backend, which makes debugging particularly challenging for large, complex pattern matches.

In this release, we made this information available to the JS backend so that the generated JS code is annotated with the constructor names.

Below is an image showing the diff in this release

In the future, we will explore if we can produce such annotation in the runtime without losing efficiency.
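To see where the magic numbers come from, the sketch below inspects how the standard native/bytecode OCaml runtime represents constructors, using the `Obj` module. The `shape` type and the `describe` function are hypothetical, and the exact JS that BuckleScript emits is not shown here; this only illustrates that constructor names do not survive compilation.

```ocaml
type shape =
  | Point                  (* constant constructor *)
  | Circle of float        (* constructor with a payload *)
  | Rect of float * float
  | Empty

(* Report the runtime representation: constant constructors become
   immediate integers, the others become blocks carrying a numeric tag.
   Each kind is numbered separately, in declaration order. *)
let describe (s : shape) : string =
  let r = Obj.repr s in
  if Obj.is_int r then Printf.sprintf "immediate %d" (Obj.obj r : int)
  else Printf.sprintf "block tag %d" (Obj.tag r)

let () =
  List.iter
    (fun s -> print_endline (describe s))
    [Point; Circle 1.0; Rect (1.0, 2.0); Empty]
```

The names `Point`, `Circle`, `Rect` and `Empty` appear nowhere in that output, which is why annotating the generated code with them helps debugging.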

Code generation improvement in various places

We care about the quality of the generated code and will keep improving it, regardless of how good it already is.

In this release, we improved code generation in quite a lot of places, including lazy evaluation, if branches and pattern matching.

In particular, we added a data-flow pass to eliminate unnecessary staticfail cases.

Important bug fixes

This release also comes with a couple of important bug fixes, in particular #3805 (the stale build issue) and #3823 (the interaction with the Reason language service).

Upcoming breaking changes

In the next release, we plan to remove deprecated getters.

A detailed list of changes is available here

Happy hacking!

What's new in release 5.1.0

August 12, 2019

bs-platform 5.1.0 (for OCaml 4.02.3) and 6.1.0 (for OCaml 4.06.1) are ready for testing.

You can install it via npm i -g bs-platform@5.1.0 (or npm i -g bs-platform@6.1.0-dev.6).

A detailed list of changes is available here

Some feature enhancements are described as follows:

Introducing `bsc` to public

bsc is the underlying compiler invoked by bsb. In this release we simplified it a bit so that users can call it directly for simple tasks. It is available once bs-platform is installed.

Suppose you have a file called test.re:

let rec fib (n) = switch n {
    | 0 | 1 => 1;
    | n => fib (n -1) + fib(n-2);
};
Js.log (fib (0));

You can compile it directly via bsc test.re, producing the following output:

bucklescript.github.io>bsc test.re
// Generated by BUCKLESCRIPT, PLEASE EDIT WITH CARE
'use strict';
function fib(n) {
  if (n === 0 || n === 1) {
    return 1;
  } else {
    return fib(n - 1 | 0) + fib(n - 2 | 0) | 0;
  }
}
console.log(fib(0));
exports.fib = fib;
/*  Not a pure module */

You can also get the inferred signature directly via bsc -i test.re

let fib: int => int;

Or even better, you can write a one-liner with bsc via the -e option:

bucklescript>bsc -i -e 'let id = x => x'
let id: 'a => 'a;

Note: bsc supports vanilla OCaml syntax as well. Using bsc directly is only recommended for toying around; for serious development, bsb is recommended.

`bstracing` to visualize build profile

After the build process finishes, you can run bstracing directly. This generates a data file called tracing_${hour}_${minute}_${second}.json, which can be loaded into Chrome via chrome://tracing.

Below is a profile image that shows the tracing graph for a large project:

And you can zoom-in to see more details:

Support of ppx with arguments

We extended the schema to support ppx with arguments:

        "ppx-specs": {
            "type": "array",
            "items": {
                "oneOf" : [
                    {
                        "type": "string" // single command
                    },
                    {
                        "type" : "array", // command with args
                        "items": {
                            "type" : "string" 
                        }
                    }
                ]
            }
        },

Respect NODE_PATH when resolving dependent modules

Previously, bsb was tied to npm package structures by searching node_modules. In this release, bsb also tries to search paths listed in NODE_PATH so that bsb is no longer tied to the npm or yarn package manager.

Build performance improvement

Yes, performance is increased with each release!

Quite a lot of work went into housekeeping in this release. We changed the internal data representation to a more compact format. Here is a comparison, made with bstracing, of clean builds of a large project (around 2 * 5 * 5 * 5 * 5 = 1250 files):

Version 5.0.6 (around 4.8s)

Version 5.1.0 (around 4.2s)

Happy hacking!

Release 5.0.5 and 6.0.2

June 26, 2019

bs-platform 5.0.5 (for OCaml 4.02.3) and 6.0.2 (for OCaml 4.06.1) are released.

A detailed list of changes is available here

It contains some critical bug fixes, so we suggest that users upgrade.

Some feature enhancements are described below:

User land C stubs polyfill

Previously, existing OCaml libraries that rely on C primitives had to be patched at the source level. In this release, users can provide such support independently, without patching the source code.

Suppose you have an OCaml module that relies on a C primitive, as below:

external ff : int -> int -> int = "caml_fancy_add"

external ff : (int,int) => int = "caml_fancy_add" ;

caml_fancy_add is a C function in native code. We can now provide it from a JS file instead; all the user needs to do is register caml_fancy_add in a global registry like this:

/**
 * @param {int} x 
 * @param {int} y
 * @returns {int}
 * 
 */
require('bs-platform/lib/js/caml_external_polyfill.js').register('caml_fancy_add',function(x,y){
  return + ((""+x ) + (""+y))
})

Note that this is an experimental feature, and we don't suggest using it extensively; it is provided as an escape hatch. We are also looking for feedback on how to improve it, so there might be some backward-incompatible changes.

A new warning number 105

Previously, there were some scenarios in which the JS function name was inferred during interop.

For example:

external f : int -> int = "" [@@bs.val]

[@bs.val] external f : int => int = ""

Here the JS function name is inferred as f, the same as the OCaml function name.

Such an FFI declaration is fragile under refactoring: renaming f on the OCaml side also changes the name of the JS function, which is probably not what the user expected.

Warning number 105 warns against this case (it is turned on by default).

Simplified debugger mode

Previously, users had to add the -bs-g flag to bsc-flags in bsconfig.json and add one line of code to the main module. The code change is no longer needed; only the flag is.

Build performance improvement

We improved the build performance and significantly simplified the design of the build in this release. We will have a separate post about it.

Happy hacking!

A high level overview of BuckleScript interop with JavaScript

May 21, 2019

When users start to use BuckleScript to develop applications on JS platform, they have to interop with various APIs provided by the JS platform.

In theory, like Elm, BuckleScript could ship a comprehensive library that contains what most people would like to use daily. This, however, is particularly challenging, given that JS runs on so many platforms (for example Electron, Node and the browser) and each platform is still evolving quickly. So we have to provide a mechanism that allows users to bind to the native JS API quickly in userland.

There are lots of trade-offs when designing such an FFI bridge between OCaml and the JavaScript API. Below, we list a few key items which we think have an important impact on our design.

Interop design constraints

BuckleScript is still OCaml

We are not inventing a new language. In particular, we can not change the concrete syntax of OCaml. Luckily, OCaml introduced attributes and extension nodes since 4.02, which allows us to customize the language to a minor extent. To be a good citizen in the OCaml community, all attributes introduced by BuckleScript are prefixed with bs.

Bare metal efficiency should always be possible for experts in pure OCaml

Efficiency is at the heart of BuckleScript's design philosophy, in terms of both compilation speed and runtime performance. While there were other strongly typed functional languages running on the JS platform before we made BuckleScript, one thing in particular that confused me was that in those languages, people have to write native JS to gain performance. Our goal is that when performance really matters, it is still possible for experts to write pure OCaml without digging into native JS, so users don't have to make a choice between performance and type safety.

Easy interop using raw JS

BuckleScript allows users to insert raw JS using extension nodes directly. Please refer to the documentation for details. Here we only talk about one of the most used styles: inserting raw JS code as a function.

let getSafe : int array -> int -> int = fun%raw a b -> {| 
    if (b>=0 && b < a.length) {
        return a [b]
     }
     throw new Error("out of range")
  |} 

let v = getSafe [|1;2;3|] (-1)

let getSafe: (array(int), int) => int = [%raw
  (a, b) => {|
    if (b>=0 && b < a.length) {
        return a [b]
     }
     throw new Error("out of range")
  |}
];

let v = [|1, 2, 3|]->getSafe(-1);

Here the raw extension node asks the user to list the parameters and function statement in raw JS syntax. The generated JS code is as follows:

function getSafe (a,b){ 
    if (b>=0 && b < a.length) {
        return a [b]
     }
     throw new Error("out of range")
  };

var v = getSafe(/* array */[
      1,
      2,
      3
    ], -1);

Inserting raw JS code as a function has several advantages:

It is relatively safe; there is no variable name polluting.
It is quite expressive since the user can express everything inside the function body.
The compiler still has some knowledge about the function, for example, its arity.

Some advice about using this style:

Always annotate the raw function with explicit type annotation.
When annotating raw JS, you can use polymorphic types, but don’t create them when you don’t really need them. In general, a non-polymorphic type is safer and more efficient.
Write a unit test for the function.

Note that a nice thing about this mechanism is that no separate JS file is needed, so no change to the build system is necessary in most cases. By using this mechanism, BuckleScript users can already deal with most bindings.

Interop via attributes

If you are a developer busy shipping, the mechanism above should cover almost everything you need. A minor disadvantage of that mechanism is that it comes with a cost: a raw function can not be inlined since it is JavaScript, so the BuckleScript compiler does not have deep knowledge about the function.

To demonstrate interop via attributes, we are going to show a small example of binding to JS date. There are lots of advanced topics in the documentation; here we are only talking about one of the most-used methods for interop.

The key idea is to bind your JS object as an abstract data type where a data type is defined by its behavior from the point of view of a user of the data, instead of the data type’s concrete representations.

type date
external fromFloat : float -> date = "Date" [@@bs.new]
external getDate : date -> float = "getDate" [@@bs.send]
external setDate : date -> float -> unit = "setDate" [@@bs.send]

let date = fromFloat 10000.
let () = setDate date 3.
let d = getDate date

type date;

[@bs.new]
external fromFloat : float => date = "Date" ;
[@bs.send]
external getDate : date => float = "getDate" ;
[@bs.send]
external setDate : date => float => unit = "setDate";

let date = fromFloat (10000.0);
date->setDate (3.0);
let d = date -> getDate;

The preceding code generates the following JS. As you can see, the binding itself is zero cost and serves as formal documentation.

var date = new Date(10000);
date.setDate(3);
var d = date.getDate();

A typical workflow is that we create an abstract data type, create bindings for a “maker” using bs.new, and bind methods using bs.send.

Thanks to native support of abstract data types in OCaml, the interop is easy to reason about.
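The same discipline can be sketched in pure OCaml, without any JS involved. In the example below, `Mock_date` and its record representation are hypothetical stand-ins; they do not model real JS Date semantics and only show that clients see the operations, never the representation.

```ocaml
(* A pure-OCaml stand-in for the abstract [date] binding above. *)
module Mock_date : sig
  type t (* abstract: clients cannot see or depend on the representation *)
  val from_float : float -> t
  val get_date : t -> float
  val set_date : t -> float -> unit
end = struct
  (* Hypothetical representation; it could be swapped out freely
     without touching any client code. *)
  type t = { mutable day : float }
  let from_float day = { day }
  let get_date d = d.day
  let set_date d v = d.day <- v
end

let date = Mock_date.from_float 10000.
let () = Mock_date.set_date date 3.
let d = Mock_date.get_date date

(* Rejected by the type checker, because the record is hidden:
   let _ = date.day *)
```

With an external binding the situation is the same from the client's side: the type is known only through the functions that operate on it.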

Some advice when using this style:

Again, you can use polymorphic types in your annotations, but don't create polymorphic types when you don't need them.
Write a unit test for each external.

As a comparison, we can create the same binding using raw:

type date
let fromFloat : float -> date = fun%raw d -> {|return new Date(d)|}
let getDate : date -> float = fun%raw d -> {|return d.getDate()|}
let setDate : date -> float -> unit = fun%raw d v -> {|
   d.setDate(v);
   return 0; // ocaml representation of unit 
|}

let date = fromFloat 10000.
let () = setDate date 3.
let d = getDate date

type date;
let fromFloat: float => date = [%raw d => {|return new Date(d)|}];
let getDate: date => float = [%raw d => {|return d.getDate()|}];
let setDate: (date, float) => unit = [%raw
  (d, v) => {|
   d.setDate(v);
   return 0; // ocaml representation of unit
|}
];

let date = fromFloat(10000.);
date->setDate( 3.);
let d = date->getDate;

The generated JS is as follows, and you can see the cost:

function fromFloat (d){return new Date(d)};

function getDate (d){return d.getDate()};

function setDate (d,v){
   d.setDate(v);
   return 0; // ocaml representation of unit 
};

var date = fromFloat(10000);

setDate(date, 3);

var d = getDate(date);

Architecture changes in bs-platform v5.0.4 and v6.0.1

April 22, 2019

We are going to release bs-platform@5.0.4 and bs-platform@6.0.1; these releases mostly contain bug fixes.

At the same time, we are introducing an internal change which should be fine for average users.

If you are a tooling author, here are some details. Previously, we shipped the react_jsx ppx as stand-alone binaries, for example reactjs_jsx_ppx_v2.exe and reactjs_jsx_ppx_v3.exe. Recently we started a closer integration with Reason, so the ppx is now absorbed into bsc.exe itself, and we introduced a flag to turn react-jsx on or off (note that these are internal flags, not expected to be exposed to average users):

bsc.exe -bs-jsx 2 # turn on reactjs_jsx_ppx_v2
bsc.exe -bs-jsx 3 # turn on reactjs_jsx_ppx_v3

As before, we also ship a stand-alone bsppx.exe, which now absorbs reactjs_jsx_ppx as well.

bsppx.exe -bs-jsx 2 # turn on reactjs_jsx_ppx_v2
bsppx.exe -bs-jsx 3 # turn on reactjs_jsx_ppx_v3

The benefit of this change is that it significantly reduces the size of the prebuilt binaries and also shaves 5~10ms off the compilation of each file.

Another minor change is that we introduced an environment variable, BS_VSCODE, to make error messages better suited to VSCode; see here for more details.

Release 5.0.1

April 9, 2019

A preview of bs-platform@5.0.1 is available; try npm i -g bs-platform@beta-4.02! A detailed list of changes is available here.

Some notable new features in this release:

React JSX v3 is available, which means zero-cost React bindings (@rickyvetter will talk about it in a separate post).
bs.inline for library authors. Our compiler has pretty good inlining heuristics by default; in this release, we allow some user input for fine-tuned inlining behavior. Read this (https://github.com/BuckleScript/bucklescript/issues/3472) for more use cases. A typical usage is shown below:
```
module Platform = struct
    let ios = "ios" [@@bs.inline]
end    
```
If the user wants to write an interface, it has to carry the payload, though:
```
module Platform : sig
    val ios : string [@@bs.inline "ios"]
end = struct
    let ios = "ios" [@@bs.inline]
end    
```
It is a bit verbose for library authors, but this should be transparent to library users.

We are also actively working on a new official release targeting OCaml 4.06 for the forthcoming reason-conf. Below is the proposed release schedule:

We are going to support OCaml 4.06 and 4.02 at the same time for a while.

The corresponding versions of bs-platform will be 5.* (targeting OCaml 4.02) and 6.* (targeting OCaml 4.06).

5.* is recommended for production usage, and bug fixes are prioritized (pre-releases are tagged as beta-4.02).

6.* is expected to have some issues, but we encourage you to experiment with it until we officially announce that it is ready for production (pre-releases are tagged as beta-4.06).

bs-platform release v6.0+dev

March 31, 2019

bs-platform@6.0.0-dev.1 is released; you can try it with npm i -g bs-platform@next! (If you have permission issues, try sudo npm i --unsafe-perm -g bs-platform@next.)

This is the first release in which the BuckleScript compiler uses the OCaml 4.06.1 type checker.

It also means that most language features from OCaml will trickle down automatically.

It is not yet ready for production, but we recommend that you try it (especially the new OCaml features added between 4.02.3 and 4.06.1). We expect users to run into some issues while experimenting, and feedback is welcome!

bs-platform release v5

March 21, 2019

bs-platform@5.0.0 is released! There are quite a few bug fixes in this release and refmt is synced up; a detailed list of changes is available here.

Several new features are introduced in this release:

First-class bs.variadic support, documented here.
Prebuilt binaries for Windows, macOS and Linux, which will significantly reduce your CI build time. For exotic OSes, the install falls back to building from source.
A light config for bs.deriving. For people who prefer short names, we make it configurable:

type t = {
    x : int
} [@@bs.deriving {abstract = light}]

let f (obj : t) = obj |. x

This is the last major release targeting OCaml 4.02.3; future major releases will target OCaml 4.06.

Happy Hacking!

First-class `bs.variadic` Support in the Next Release

March 1, 2019

In previous releases, when a bs.variadic external (called bs.splice prior to version 4.08) is present, its tail arguments need to be applied statically. In other words, an external marked with bs.variadic, when used, requires a literal array:

external join : string array -> string = "join"
[@@bs.module "path"][@@bs.variadic]

let _ = join [|"a"; "b"|] (* this is ok *)
let f b = join b (* compiler error when you try to abstract `join` *)

[@bs.module "path"][@bs.variadic]
external join: array(string) => string = "join"

let _ = join([|"a", "b"|]) /* this is ok */
let f = b => join(b) /* compiler error when you try to abstract `join` */

More importantly, such compilation error was leaky in cases such as this one:

let f = join

let f = join

In the next release, we are going to lift this restriction. You'll be able to call an external marked with bs.variadic with an array reference, not just a literal array.

Caveat: it's unclear how to support such first class bs.variadic call in conjunction with bs.new, so an external declaration that contains both will trigger a compilation error. We'll try to figure out this particular case in the future too.
