

Hi all! I'm looking for some advice on things to read or patterns to use related to an optimizing compiler. For some quick background: like several other people I've seen on this subreddit, I read through the Crafting Interpreters book and caught the PL bug. I've continued to work on my language to the point where I'd now call it a close relative of Lox: [Laythe](https://github.com/Laythe-lang/Laythe).

My long-term goal is to get a TypeScript-like experience. Essentially, I'd like a program to either run immediately, with no type checking and very simple codegen to my VM's bytecode, or run through the type checker and maybe some relatively simple optimization passes.

So far I've replaced my single-pass compiler with an actual parsing step, where the program is transformed into an AST-like format and codegen is then performed on that AST. Currently the end result is much the same as before, but with a clearer separation between parsing and codegen.

To elaborate on a few of the optimizations I'd like to look into: as an example, if I'm going to access a property or a method, I'm currently doing a hash lookup. I emit an instruction like `GetProperty(u16)`, which takes an unsigned 16-bit argument. What currently happens is something like this:

1. Look up the string constant in the constant table at offset `u16`
2. Using that string, do a hash lookup on the instance object to get the value of the property

Ideally, I think I should be able to have the compiler figure out the offset in the instance instead, avoiding the hash lookup. I'm not sure what sort of passes or data structures would help me track that some identifier `Foo` is a `Bar`, and that property `Baz` is therefore at offset 3.
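To make the question concrete, here's a minimal sketch in Rust of what that static resolution could look like. All the names (`ClassLayout`, `GetPropertySlot`, `emit_get_property`) are hypothetical, not Laythe's actual types: the compiler keeps a per-class field layout, and when the static type of the receiver is known it emits a slot-based instruction instead of the generic hash-lookup one.

```rust
use std::collections::HashMap;

// Hypothetical per-class layout info the compiler tracks: declaration
// order fixes each field's slot in the instance.
struct ClassLayout {
    fields: Vec<String>,
}

impl ClassLayout {
    fn offset_of(&self, field: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == field)
    }
}

// Two hypothetical instructions: the slow hash-lookup form and a fast
// slot-based form the optimizer can emit when the type is known.
#[derive(Debug, PartialEq)]
enum Op {
    GetProperty(u16),     // operand: constant-table index of the field name
    GetPropertySlot(u16), // operand: field offset resolved at compile time
}

// If the checker proved the receiver has class `static_type`, emit the
// slot form; otherwise fall back to the generic hash lookup.
fn emit_get_property(
    classes: &HashMap<String, ClassLayout>,
    static_type: Option<&str>,
    field: &str,
    const_index: u16,
) -> Op {
    if let Some(layout) = static_type.and_then(|t| classes.get(t)) {
        if let Some(slot) = layout.offset_of(field) {
            return Op::GetPropertySlot(slot as u16);
        }
    }
    Op::GetProperty(const_index)
}
```

The hard part, of course, is populating `static_type`, which is exactly what the type checker's results would feed in.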

Another relatively simple optimization I could see would be on my current bytecode itself. As an example, I have a `Drop` instruction and a `DropN(u8)` instruction for dropping values off the stack. I'm not sure what the best approach would be to condense something like

```
Drop
DropN(2)
Drop
```
into
```
DropN(4)
```
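One common way to do exactly this kind of condensing is a peephole pass over the emitted bytecode. Here's a sketch in Rust under the assumption that the instruction set looks roughly like the `Drop`/`DropN` examples above (the `Op` enum and `Return` placeholder are illustrative, not Laythe's real definitions): runs of drops are accumulated and flushed as a single `DropN` whenever a non-drop instruction is reached.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Drop,
    DropN(u8),
    Return, // stand-in for "any other instruction"
}

// Fold every run of Drop / DropN into the fewest DropN instructions.
fn coalesce_drops(code: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(code.len());
    let mut pending: u16 = 0; // stack slots queued for dropping
    for &op in code {
        match op {
            Op::Drop => pending += 1,
            Op::DropN(n) => pending += n as u16,
            other => {
                flush(&mut out, &mut pending);
                out.push(other);
            }
        }
    }
    flush(&mut out, &mut pending);
    out
}

fn flush(out: &mut Vec<Op>, pending: &mut u16) {
    while *pending > 0 {
        // Respect the u8 operand limit: split very long runs.
        let n = (*pending).min(u8::MAX as u16) as u8;
        out.push(if n == 1 { Op::Drop } else { Op::DropN(n) });
        *pending -= n as u16;
    }
}
```

Note the pass is only safe across straight-line code; a jump target between two drops would have to end the run, which is one reason people reach for a real IR with basic blocks.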

Would it make sense to have another intermediate representation between the AST and the final VM bytecode? How are these sorts of transformations usually made?

Anyways I would appreciate any advice!


Posted by jonnyboyC13


5 Comments

L8_4_Dinner · September 18, 2020 at 4:25 pm

There’s a weekly online (Zoom) meetup on the topic of compilers hosted by the guy who wrote the Java Hotspot compiler, Cliff Click. Shoot him a DM on Twitter at [twitter.com/cliff_click/](https://twitter.com/cliff_click/) and ask him for an invite; it’s every Friday at 10am PT (1pm ET). That’s in 30 minutes, so if you hurry … 😉

bart7796 · September 18, 2020 at 4:28 pm

>Ideally, I think I should be able to instead have the compiler figure out the offset in the instance to avoid the hash lookup.

So, this sounds like it is a bytecode compiler. Is the language statically typed?

If not, then you may not be able to determine such offsets at compile time for an access such as A.B, since you don't know what type A will be.

However, you don’t really need to look up a string “B” in a table either, unless this is ultra-dynamic like Python with arbitrary attribute names.

If there is only a small set of candidates for `.B` across the program (e.g. only 6 classes define an attribute `B`), then at runtime you only have to search a small table of 6 entries to find the offset that matches the type of A.

(If only one `.B` is defined, then you can output its offset directly at compile time.)
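A minimal sketch of that small-table lookup in Rust (the `PropertyTable` name, class ids, and slot layout are all hypothetical): one tiny table per property name, with one entry per class that defines it, searched linearly at runtime.

```rust
// Hypothetical per-property dispatch table: every class that defines
// the property, paired with the slot where that class stores it.
struct PropertyTable {
    entries: Vec<(u32 /* class id */, usize /* slot */)>,
}

impl PropertyTable {
    // A linear scan is fine: the table has one entry per defining
    // class, so it is typically only a handful of entries long.
    fn lookup(&self, class_id: u32) -> Option<usize> {
        self.entries
            .iter()
            .find(|&&(id, _)| id == class_id)
            .map(|&(_, slot)| slot)
    }
}
```

With exactly one entry, the compiler can skip the table entirely and bake the slot into the instruction, which is the single-definition case described above.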

dinfuehr · September 18, 2020 at 4:48 pm

Hi,

Your first optimization sounds like an inline cache, which is probably the most important optimization for dynamic languages (I'll assume your language is dynamic here). Look for inline caches/shapes/maps/hidden classes; there is lots of documentation and explanation around. The idea of an inline cache is to remember the (object type, offset) pair for each GetProperty site at runtime. If the current object matches the last-seen object type, you can just use the recorded offset; if it doesn't match, do the full lookup instead and record (object type, offset) again. From your explanation above it sounds like you want to do this statically in the compiler. In dynamic languages you usually can't do that statically for many GetProperty operations, while the inline cache works for all of them. Since you probably have limited time, I would rather spend it on implementing inline caches than on that optimization pass; that will most likely give you higher speed-ups.

Your second optimization was to optimize the generated bytecode. I wouldn't do that, TBH. Spending CPU cycles on optimizing bytecode that might only be executed once is not worth it. Even javac didn't optimize bytecode at all, the last time I looked at its output. For dynamic languages it is even more important to start execution sooner. Runtimes usually only optimize functions/bytecode found to be hot (e.g. a function that was executed x times); then it is worth spending quite some CPU cycles to optimize that function. You can then also use the data from the inline caches to generate better code (in fact, you even need that data to generate good code).

So personally I would start with implementing inline caches/type feedback, and only then start trying to optimize hot bytecode. For simplicity you could generate optimized bytecode first instead of machine code. Here I would use an additional IR for the optimization. Dynamic languages usually don't have an IR between the AST and the bytecode; it makes sense if either type checking is easier on an IR than on the AST, or for optimization purposes. But it doesn't sound like that's the case for you.

Hope that makes sense…

bart7796 · September 18, 2020 at 7:21 pm

> Another relatively simple optimization that I could see would just be on my current bytecode itself. As an example, I have a `Drop` instruction and a `DropN(u8)` instruction for dropping values off the stack. I'm not sure what would be the best approach to condensing something like
>
> Drop
> DropN(2)
> Drop
>
> into
>
> DropN(4)

As previously pointed out, bytecode optimisations can wait (until you know what is slowing things down). But tidy bytecode is a different matter.

Here I’d have only one bytecode, DropN. So the first Drop yields:

DropN(1)

A subsequent DropN(2) can see that the last bytecode was DropN(1), and just change it to DropN(3). And the last Drop further turns it into DropN(4).

(This is actually exactly what I do. My equivalent bytecode is `k_free`, which takes one count operand (inline, not on the stack) and which also recovers any resources used by each stack item.)
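This emit-time folding is simple to sketch in Rust. As an assumption, the emitter appends to a `Vec<Op>` and only ever needs to peek at the last instruction emitted (names are hypothetical):

```rust
#[derive(Debug, PartialEq)]
enum Op {
    DropN(u8),
    Return, // stand-in for "any other instruction"
}

// If the previous instruction is already a DropN, bump its count
// instead of appending another one.
fn emit_drop_n(code: &mut Vec<Op>, n: u8) {
    if let Some(Op::DropN(prev)) = code.last_mut() {
        if let Some(sum) = prev.checked_add(n) {
            *prev = sum;
            return;
        }
    }
    code.push(Op::DropN(n));
}
```

The nice property of doing it at emit time is that the bytecode never contains the redundant sequence in the first place, so no separate cleanup pass is needed.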

yorickpeterse · September 18, 2020 at 8:04 pm

> Another relatively simple optimization that I could see would just be on my
> current bytecode itself. As an example have a Drop instruction and a DropN(u8)
> instruction for drop values off the stack. I’m not sure what would be the best
> approach to condensing something like
>
> Drop
> DropN(2)
> Drop
>
> into
>
> DropN(4)

This is known as a "super instruction": an instruction that combines two or more
instructions into one. While the optimisation itself is fine, I will say this:
avoid super-instructions unless you have found a real need for them. I myself fell
into the trap of combining instructions too eagerly, only to find out
months later that it limited me in some ways.
