Compiler as a black box

I'd like to share some personal thoughts and observations about compilers and their integration into an IDE. This topic might be interesting for those who design programming tools and IDE add-ins. A disclaimer: I'm not an expert in IDE design, but I'm learning, so if you have something to say, I welcome your feedback. I'm probably saying trivial and well-known things, so please bear with me.

Observation number one: in some IDEs, the compiler is a black box. The interface between the compiler and the IDE is text-based; source code goes in, and binaries come out. Compiler errors and warnings are returned as plain text, with line and column numbers that indicate where in the source code each problem occurs.
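To make the text-based interface concrete, here is a sketch of what the IDE side ends up doing: scraping diagnostics out of plain text with a regular expression. The message format (`file(line,col): severity code: message`), the error code `E001` and the sample output are assumptions loosely modeled on typical command-line compilers, not any particular product's specification.

```python
import re
from dataclasses import dataclass

# Assumed diagnostic format (hypothetical, for illustration):
#   <file>(<line>,<col>): <error|warning> <code>: <message>
DIAGNOSTIC_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\): "
    r"(?P<severity>error|warning) (?P<code>\w+): (?P<message>.*)$"
)

@dataclass
class Diagnostic:
    file: str
    line: int
    col: int
    severity: str
    code: str
    message: str

def parse_diagnostics(output: str) -> list[Diagnostic]:
    """Recover structured diagnostics from the compiler's plain-text output.

    Lines that don't match the assumed pattern are silently dropped: the IDE
    only knows what it can reverse-engineer from the text."""
    diagnostics = []
    for raw in output.splitlines():
        m = DIAGNOSTIC_RE.match(raw.strip())
        if m:
            diagnostics.append(Diagnostic(
                m["file"], int(m["line"]), int(m["col"]),
                m["severity"], m["code"], m["message"],
            ))
    return diagnostics

sample = "Example.src(3,5): error E001: ';' expected\nBuild failed: 1 error"
for d in parse_diagnostics(sample):
    print(f"{d.file}:{d.line}:{d.col} [{d.severity} {d.code}] {d.message}")
```

Any change in the wording or layout of the messages breaks a scraper like this, and the position is all the IDE gets back: no tree, no symbols.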

This would be a good approach if all we needed was to bind the compiler to a simple text editor: one could swap in another editor or another compiler without even recompiling the whole system. Nowadays, however, compiler functionality is also needed outside the command-line compiler, for features such as code completion and refactoring. I think it no longer makes sense to encapsulate the compiler in a black box with only input and output; instead, its internals should be exposed to the rest of the IDE and its tools.

The classic Dragon Book pipeline splits compilation into phases (scanner, parser, resolver, code generator). Each phase has its own input and output and is itself a smaller, finer-grained black box. For example, the parser receives a stream of tokens and outputs an abstract syntax tree (AST). With this architecture, we can plug in additional steps (e.g. tree transformations) between compiler phases, for instance between the resolver and the code generator. This would be highly useful for tools that want to extend the language or the IDE (AOP, design-by-contract, code generation, etc.).
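The phase pipeline can be sketched as a list of functions, each consuming the previous phase's output. This is a toy under stated assumptions: a hypothetical expression language with integer literals, names, `+` and `*`, where the "resolver" merely binds names to slot numbers and the code generator emits instructions for an imaginary stack machine. The point is the last call: a custom tree transformation (constant folding, chosen here only as an example) is plugged in between the resolver and the code generator without modifying either.

```python
import re
from dataclasses import dataclass
from typing import Callable, Union

# Toy AST for a hypothetical expression language: integers, names, '+' and '*'.
@dataclass
class Num:
    value: int

@dataclass
class Name:
    ident: str
    slot: int = -1  # filled in by the resolver

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, Name, BinOp]

def scan(src: str) -> list[str]:
    """Scanner: source text -> token stream."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*]", src)

def parse(tokens: list[str]) -> Node:
    """Parser: tokens -> AST. Recursive descent; '*' binds tighter than '+'."""
    pos = 0

    def next_token() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom() -> Node:
        tok = next_token()
        return Num(int(tok)) if tok.isdigit() else Name(tok)

    def binary(op: str, operand: Callable[[], Node]) -> Node:
        node = operand()
        while pos < len(tokens) and tokens[pos] == op:
            next_token()
            node = BinOp(op, node, operand())
        return node

    return binary("+", lambda: binary("*", atom))

def make_resolver(symbols: dict[str, int]) -> Callable[[Node], Node]:
    """Resolver: binds each name to a storage slot, rejecting unknown names."""
    def resolve(node: Node) -> Node:
        if isinstance(node, Name):
            if node.ident not in symbols:
                raise NameError(f"undefined name {node.ident!r}")
            return Name(node.ident, symbols[node.ident])
        if isinstance(node, BinOp):
            return BinOp(node.op, resolve(node.left), resolve(node.right))
        return node
    return resolve

def generate(node: Node) -> list[str]:
    """Code generator: AST -> instructions for an imaginary stack machine."""
    if isinstance(node, Num):
        return [f"push {node.value}"]
    if isinstance(node, Name):
        return [f"load {node.slot}"]
    return generate(node.left) + generate(node.right) + ["add" if node.op == "+" else "mul"]

def fold_constants(node: Node) -> Node:
    """A plug-in tree transformation: evaluate operations on two literals."""
    if isinstance(node, BinOp):
        left, right = fold_constants(node.left), fold_constants(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            value = left.value + right.value if node.op == "+" else left.value * right.value
            return Num(value)
        return BinOp(node.op, left, right)
    return node

def run_pipeline(phases: list[Callable], source: str):
    """Feed each phase's output into the next one."""
    result = source
    for phase in phases:
        result = phase(result)
    return result

resolve = make_resolver({"a": 0, "b": 1})
print(run_pipeline([scan, parse, resolve, generate], "a + 2 * 3"))
# ['load 0', 'push 2', 'push 3', 'mul', 'add']
print(run_pipeline([scan, parse, resolve, fold_constants, generate], "a + 2 * 3"))
# ['load 0', 'push 6', 'add']
```

Because every phase has the same shape (one input, one output), inserting a step is a one-line change to the list of phases; a compiler that hides its phases offers no such seam.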

However, some compilers hide this pipeline from us, encapsulating the entire compilation process in a single black box; someone even coined the term "monolithic compiler" for this. Many tool developers express a growing need for compilers to expose these internal compilation steps and to provide hooks for plugging custom functionality into the compilation process.

Once these steps are exposed through classes and interfaces (a compiler API), one could apply various design patterns to extend and modify the compilation process. For example, one could wrap the parser in a decorator that performs additional tree transformations, or wrap the code generator to append additional output. The compiler as a whole would provide a factory that produces scanners, parsers, code generators, etc., and one could plug one's own parts into this factory to replace the defaults.
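Here is a sketch of the decorator and factory ideas with hypothetical, deliberately tiny components: a whitespace scanner, a parser for left-associative `+` chains, and a postfix code generator. `TransformingParser` wraps any parser and rewrites its tree, `AppendingGenerator` wraps any code generator and appends output, and a plug-in overrides the factory methods to hand out decorated components. All class names, the `add.checked` op and the trailing `ret` instruction are invented for the example and don't correspond to any real compiler API.

```python
from typing import Callable, Protocol, Union

Tree = Union[str, tuple]  # a leaf token, or (op, left, right)

class Parser(Protocol):
    def parse(self, tokens: list[str]) -> Tree: ...

class CodeGenerator(Protocol):
    def generate(self, tree: Tree) -> list[str]: ...

class InfixParser:
    """Default parser: 'a + b + c' -> ('+', ('+', 'a', 'b'), 'c')."""
    def parse(self, tokens: list[str]) -> Tree:
        tree: Tree = tokens[0]
        for i in range(1, len(tokens) - 1, 2):
            tree = (tokens[i], tree, tokens[i + 1])
        return tree

class PostfixGenerator:
    """Default code generator: emits the tree in postfix (stack-machine) order."""
    def generate(self, tree: Tree) -> list[str]:
        if isinstance(tree, tuple):
            op, left, right = tree
            return self.generate(left) + self.generate(right) + [op]
        return [tree]

class TransformingParser:
    """Decorator: wraps any Parser and rewrites the tree it returns."""
    def __init__(self, inner: Parser, transform: Callable[[Tree], Tree]):
        self.inner, self.transform = inner, transform

    def parse(self, tokens: list[str]) -> Tree:
        return self.transform(self.inner.parse(tokens))

class AppendingGenerator:
    """Decorator: wraps any CodeGenerator and appends extra output."""
    def __init__(self, inner: CodeGenerator, extra: list[str]):
        self.inner, self.extra = inner, extra

    def generate(self, tree: Tree) -> list[str]:
        return self.inner.generate(tree) + self.extra

class CompilerFactory:
    """Produces compiler components; plug-ins replace defaults by overriding."""
    def create_parser(self) -> Parser:
        return InfixParser()

    def create_generator(self) -> CodeGenerator:
        return PostfixGenerator()

    def compile_source(self, source: str) -> list[str]:
        tokens = source.split()  # trivial whitespace scanner
        return self.create_generator().generate(self.create_parser().parse(tokens))

def use_checked_add(tree: Tree) -> Tree:
    """Example transformation: replace '+' with a hypothetical 'add.checked' op."""
    if isinstance(tree, tuple):
        op, left, right = tree
        new_op = "add.checked" if op == "+" else op
        return (new_op, use_checked_add(left), use_checked_add(right))
    return tree

class PluginFactory(CompilerFactory):
    """A plug-in decorates the default components instead of rewriting them."""
    def create_parser(self) -> Parser:
        return TransformingParser(super().create_parser(), use_checked_add)

    def create_generator(self) -> CodeGenerator:
        return AppendingGenerator(super().create_generator(), ["ret"])

print(CompilerFactory().compile_source("a + b + c"))
# ['a', 'b', '+', 'c', '+']
print(PluginFactory().compile_source("a + b + c"))
# ['a', 'b', 'add.checked', 'c', 'add.checked', 'ret']
```

The default components stay untouched; the plug-in only decides which objects the factory hands out, so several plug-ins could in principle stack their decorators on top of each other.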

This is one important step towards compile-time reflection (which would enable things like syntactic macros, quasi-quotation and metaprogramming in general). My favorite example of these technologies is the Nemerle language.

Another advantage of exposing a compiler API would be the reuse of compiler functionality. With a monolithic compiler, one would need to duplicate functionality to implement code completion, refactoring, etc., and an IDE would be full of places that round-trip from code to AST and back. With an extensible compiler, the AST would become the main data structure of the IDE. I plan to blog soon about the AST as the primary data structure of the IDE, as opposed to the source code as text.
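As an illustration of reusing the compiler's tree instead of re-parsing text, here is a minimal code-completion sketch. It assumes a hypothetical AST in which each scope node records its character span and the names it declares; a real compiler's tree would carry types, symbols and much more, and `Scope`, `scopes_at` and `complete` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Scope:
    """Hypothetical AST node: a block spanning offsets [start, end) that declares names."""
    start: int
    end: int
    names: list[str]
    children: list["Scope"] = field(default_factory=list)

    def contains(self, offset: int) -> bool:
        return self.start <= offset < self.end

def scopes_at(root: Scope, offset: int) -> list[Scope]:
    """Return the chain of scopes enclosing `offset`, outermost first."""
    chain: list[Scope] = []
    node: Optional[Scope] = root
    while node is not None and node.contains(offset):
        chain.append(node)
        node = next((c for c in node.children if c.contains(offset)), None)
    return chain

def complete(root: Scope, offset: int, prefix: str) -> list[str]:
    """Answer a completion request straight from the AST; no text is re-parsed."""
    return sorted({name
                   for scope in scopes_at(root, offset)
                   for name in scope.names
                   if name.startswith(prefix)})

# A file-level scope with two nested blocks.
tree = Scope(0, 100, ["count", "compute"],
             [Scope(10, 50, ["counter", "x"]), Scope(60, 90, ["cache"])])
print(complete(tree, 20, "co"))  # caret inside the first block
# ['compute', 'count', 'counter']
```

The completion logic only walks the tree the compiler already built; with a monolithic compiler, the IDE would need its own parser just to find out which block the caret is in.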

Please note that I'm not talking about any concrete IDE implementations, because I haven't had a chance to look at them closely. From what I've heard, Eclipse does a pretty good job of sharing its AST with plug-ins. In a future post I'll talk about SharpDevelop, the IDE I have some experience with.

In the meantime, here are some links for those who found this topic interesting:

I'd love to hear your opinions and feedback on this. Thanks!


James Swaine said...

Are there any plans to expose this sort of API in the .NET production compilers (C#, VB, etc.)?

We have limited ability to plug into the compilation process in ASP.NET via Build Providers, and I think this could be extremely useful in broader scenarios where we can access the compiler itself.

Kirill Osenkov said...

Thanks James!

Actually, this was the first question I asked when I started on the C# team. It turns out Microsoft is receiving a lot of feedback like this, so we are definitely aware of your needs. However, we are now in the process of shipping Visual Studio "Orcas" and just moving on to planning the next release, so it is probably too early to say anything definite at this point. But I can assure you that your scenario reaches the right people, and we'll definitely consider your feedback as we move towards the next version of Visual Studio.


Anonymous said...

forget it, of course they've known that since the dawn of time, but they will never really do anything about it

M$ will never ever expose a complete API, 'cause that would mean that you could do something REALLY useful with one of the most important things MS owns - without MS or the need to upgrade or pay for support ...

MS's tactic has always been to open one area X while secretly closing area Y or leaving area Z behind, and the other way round from release to release.

Anonymous said...

yep, this has become painfully obvious since this was supposedly a planned feature for 2.0, but here we are after the 3.5 release and no word on it.

the funny thing is that I really think it would be trivial to offer this kind of functionality... maybe I'm wrong.

Kirill Osenkov said...

Hi anonymous, this is Kirill from 2015, check out https://github.com/dotnet/roslyn.

LOL I guess we delivered on this one.