Implement compiled mode for Perlang

In https://github.com/perlang-org/perlang/discussions/396, I described the recent events leading up to me trying out what LLVM can do for us, in terms of making it possible to run Perlang programs completely independent of the .NET platform.

After writing that comment, and after some discussions with an old friend of mine (@diwic - thanks a lot to you! 🙏), I started hacking on this as a little experiment: How hard would it be to write a compiler for Perlang, which emits C++ code, compiles this code, and then runs the end result? This is obviously not the "final solution" in any way and it is admittedly a bit clumsy. Still, if it was good enough for Bjarne Stroustrup, it ought to be good enough for me as well. (Naturally, Stroustrup's preprocessor and later Cfront compiler didn't emit C++, but you get the picture.)

I'm setting the milestone for this to 0.4.0. Given the sheer size of this task, the compiler will in no way be complete by then, but it'll probably work to the point where I feel comfortable about pushing it out to the public.

Rough steps

  • Implement a compiler which translates the syntax tree for all/most valid Perlang programs into valid C++ code, and compiles and runs the result: https://github.com/perlang-org/perlang/pull/409
  • Implement a C/C++-based stdlib to support the above: !407 (merged).
    • Make it possible to write unit and/or integration tests for this. We probably have to write these in C or C++ for now. cmocka is a useful unit test library for C that I have used elsewhere.
    • Add support for BigInt: #415 (closed)
    • Distribute the (compiled) stdlib along with snapshot builds. This involves a bit of complexity, since native C++ code has historically only been compilable for the same platform that the CI job is running on. We'll need to investigate if clang makes this easier for us.
      • In line with the next point, I think it's fine if we are Linux and amd64-only at this point (in compiled mode). In other words, we'll provide a Linux amd64 binary of the stdlib for now and emit an error message on other platforms stating that experimental compilation is not yet supported.
    • We will keep things simple in the 0.4.0 milestone and only support compiled mode on Linux. This makes the above easier. Going forward, we'll need to start building releases separately on each platform (i.e. build macOS binaries on a macOS CI runner, build Linux binaries on Linux and so forth). I'll create a separate issue for this at some point and add a link to it here.
    • Implemented as of !445 (merged), with the above limitation (Linux-only).
  • Make sure PerlangCompiler uses the stdlib artifacts (.so/.a files and .h/.hpp header files) when executed from a snapshot build.
    • The only thing that will prevent this from happening is if $PERLANG_ROOT is set. $PERLANG_ROOT is still used when running Perlang from source, so let's leave this as-is for now.
  • Once this is stable enough, consider dropping interpreted mode (to avoid having to always make "two implementations" for all new functionality going into the library). Challenge: this will make it hard/impossible to support the REPL though, so ideally we would keep this until we can reimplement the REPL on top of LLVM instead.
    • I am currently (2023-11-03) leaning towards dropping (parts of) the REPL soon, perhaps in the 0.5.0 or 0.6.0 release. This will make things simpler and free us from having to keep it working all the time, since it won't be working in compiled mode anyway (for quite a long time, realistically speaking). Once the Perlang compiler is mature enough to be able to interface with LLVM to generate machine code for an arbitrary Perlang expression tree, we can reimplement the REPL on top of this.

      Suggested approach: make some "glue tooling" for interfacing between Perlang and C++ (and perhaps between Perlang and C# in the intermediate stage), so that we can expose the Perlang AST types to a little C++ helper library. The helper library will then consume the LLVM headers and emit machine code for the Perlang AST.

  • Figure out how to answer hard questions, like how to cast an ASCIIString to String (https://github.com/perlang-org/perlang/pull/451/files#r1548516040)
    • Fixed (or worked around) by !453 (merged), which should be "good enough" for now. As the compiler matures (and we can eventually move away from relying too much on C++), we can rework this to use more stack-based ASCIIString instances where possible, to reduce the number of heap allocations.
  • Implement some of the obvious missing string-related operations
  • Make it possible to call methods from Perlang code
    • This is a limitation we currently have. You cannot call length() on an array, for example, which is quite an important limitation that we need to address fairly soon.
  • Implement some mechanism for multi-file projects (like a "build system" of some form, like MSBuild or cargo)
    • TODO: Definitely deserves an issue of its own. A quick-and-dirty approach could be to support a perlang . or perlang <some-directory> invocation, i.e. compile all files in a given directory; this seems to be similar to how https://vlang.io/ does it. The easy way here would be to just emit a single C++ file; if we do it like this, I think we can postpone the "build system" question for (perhaps much) later.
  • Implement a way to call Perlang code from C#, by compiling the Perlang code to one or more .so (subsequently .dll on Windows) files.
  • Implement a way to do "reverse P/Invoke", i.e. expose Perlang code as native functions for calling them from managed C# code.
  • Once the compiler is in place and we have the required mechanics for creating native libraries with Perlang, start planning on gradually rewriting the Perlang compiler in Perlang. The "easiest" way is probably to start rewriting some isolated part of it, and call into the Perlang (native) code from C#.
    • The bootstrapping can be done using a "stable" version of the "compile-via-C++" compiler.
    • Once we have that bootstrapped, we can then move to depend on the first "stable" version which can compile to native code without any dependency on C++; our only dependency will be on the LLVM libraries at this point. (Challenge: consuming LLVM from non-C++ languages can be impractical. We might need to write some C++-based glue code in the Perlang compiler to make this happen, as described in one of the previous points.)
    • Should also have an issue of its own: #454.
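
To make the first rough step a bit more concrete (translating a syntax tree into C++ source, which is then compiled and run), here is a minimal sketch of the idea. It is only an illustration under stated assumptions: the names Expr, IntLiteral, Binary, emit_cpp and emit_program are hypothetical and not taken from the Perlang code base, and the toy tree only covers integer literals and binary operators.

```cpp
// Hypothetical sketch: none of these types exist in the Perlang code base.
#include <memory>
#include <string>
#include <utility>

// Base class for a toy expression tree.
struct Expr {
    virtual ~Expr() = default;
    // Returns C++ source text which evaluates this expression.
    virtual std::string emit_cpp() const = 0;
};

// An integer literal, emitted as its decimal representation.
struct IntLiteral : Expr {
    long long value;
    explicit IntLiteral(long long v) : value(v) {}
    std::string emit_cpp() const override { return std::to_string(value); }
};

// A binary operation such as `1 + 2`.
struct Binary : Expr {
    std::unique_ptr<Expr> left;
    char op;
    std::unique_ptr<Expr> right;

    Binary(std::unique_ptr<Expr> l, char o, std::unique_ptr<Expr> r)
        : left(std::move(l)), op(o), right(std::move(r)) {}

    std::string emit_cpp() const override {
        // Always parenthesize, so the shape of the tree is preserved
        // regardless of C++ operator precedence.
        return "(" + left->emit_cpp() + " " + op + " " + right->emit_cpp() + ")";
    }
};

// Wraps an expression in a complete C++ translation unit which prints its value.
std::string emit_program(const Expr& expr) {
    return "#include <iostream>\n"
           "int main() {\n"
           "    std::cout << " + expr.emit_cpp() + " << std::endl;\n"
           "    return 0;\n"
           "}\n";
}
```

Building a tree for 1 + 2 * 3 and calling emit_cpp() on it yields the text (1 + (2 * 3)). The string returned by emit_program() could then be written to a file and handed to a C++ compiler such as clang++, which is the "compile this code, then run the end result" part of the experiment described at the top.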
Edited by Per Lundberg