9/19/07

Flags Enum

Imagine we have an enum:

[Flags]
enum BitField
{
    ZeroBit = 1,
    OneBit = 2,
    TwoBit = 4,
    ThreeBit = 8
}

I thought about syntactic sugar to simplify working with bits.
Consider the following:

BitField b = BitField.TwoBit | BitField.ThreeBit;

Testing bits:

// how about this sugar?
bool secondBitSet = b.TwoBit;
// instead of:
bool secondBitSet = (b & BitField.TwoBit) == BitField.TwoBit;

Setting bits:

// how about
b.TwoBit = true;
// instead of:
b |= BitField.TwoBit;

// Same for clearing and inverting bits:
b.TwoBit = false;
// or
b.TwoBit = !b.TwoBit;
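
Until (and unless) such sugar exists, C# 3.0 extension methods on the enum type get reasonably close. Here's a rough sketch; the IsSet and With names are my own, not a proposed API:

public static class BitFieldExtensions
{
    // true if all bits of the given flag are set
    public static bool IsSet(this BitField value, BitField flag)
    {
        return (value & flag) == flag;
    }

    // returns a copy with the flag bits turned on or off
    public static BitField With(this BitField value, BitField flag, bool on)
    {
        return on ? (value | flag) : (value & ~flag);
    }
}

// usage:
// bool secondBitSet = b.IsSet(BitField.TwoBit);
// b = b.With(BitField.ThreeBit, true);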


If you like it, you can vote on Microsoft Connect: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=247537

9/15/07

Why do we need a "where T : enum" generic constraint

Here's an example from my code where I wish an "enum" generic constraint were available. Basically, it is a combobox for choosing a value from an enum's available values. It is automatically filled with the values of the enumeration and has a strongly typed Value property.

public class Client {
    // use DriveTypeCombo as a usual Combobox control
    public class DriveTypeCombo : EnumSelectorCombo<DriveType> { }

    ...

    // and use the Value property like this:
    void Foo() {
        driveTypeCombo1.Value = DriveType.CDRom;
    }
}

public class EnumSelectorCombo<TEnum> : EnumSelectorComboBox
    // where TEnum : enum
{
    public EnumSelectorCombo() : base(typeof(TEnum)) { }

    [DesignerSerializationVisibility(DesignerSerializationVisibility.Hidden)]
    public TEnum Value {
        get {
            return (TEnum)Enum.Parse(
                typeof(TEnum),
                this.SelectedItem.ToString());
        }
        set {
            this.SelectedItem = value.ToString();
        }
    }
}

public partial class EnumSelectorComboBox : ComboBox {
    public EnumSelectorComboBox() : base() { }

    public EnumSelectorComboBox(Type enumeration) : this() {
        this.Enumeration = enumeration;
    }

    private Type mEnumeration;

    [DesignerSerializationVisibility(DesignerSerializationVisibility.Hidden)]
    public Type Enumeration {
        get {
            return mEnumeration;
        }
        set {
            if (value != null && value != mEnumeration) {
                mEnumeration = value;
                FillItems();
            }
        }
    }

    private void FillItems() {
        this.Items.Clear();
        this.Items.AddRange(Enum.GetNames(Enumeration));
        this.DropDownStyle = ComboBoxStyle.DropDownList;
    }
}
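
Until a real constraint exists, here's one possible workaround (my own sketch, not part of the control above): use the closest constraint the compiler does allow, where TEnum : struct, plus a runtime check that fails fast when the type is first used with a non-enum argument:

public class EnumSelectorCombo<TEnum> : EnumSelectorComboBox
    where TEnum : struct // the closest the compiler allows today
{
    static EnumSelectorCombo()
    {
        // runtime substitute for the missing "where TEnum : enum" constraint
        if (!typeof(TEnum).IsEnum)
        {
            throw new InvalidOperationException(
                typeof(TEnum).Name + " is not an enum type.");
        }
    }

    public EnumSelectorCombo() : base(typeof(TEnum)) { }

    // ... the Value property stays exactly as above ...
}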



9/14/07

Making C# enums more usable - the Parse() method

I'll try and accumulate some feedback and thoughts about using enums in C#. There are several issues I see with the current (C# 2.0) enum API:

  • Methods like Parse() are not strongly typed

  • Working with flags and bits is a little cumbersome

  • There is no generic constraint where T: Enum

Today I'll start with the first post about the Parse method. Here's how to use it:

MyEnum enumValue = (MyEnum)Enum.Parse(typeof(MyEnum), stringValue);

This is somewhat ugly. Christopher Bennage proposes a solution with generics which I like (see his post). Also, there are a lot of other links about the lack of generic methods on the Enum class (see posts by Scott Watermasysk, CyrusN, Dustin Campbell etc.)
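
His approach boils down to something like the following (my own sketch, not necessarily his exact code):

public static class EnumHelper
{
    public static TEnum Parse<TEnum>(string value)
    {
        return (TEnum)Enum.Parse(typeof(TEnum), value);
    }
}

// usage:
// MyEnum enumValue = EnumHelper.Parse<MyEnum>(stringValue);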

Still, a generic solution is not perfect from the readability point of view. What I mean is, wouldn't it be nice if the C# compiler could generate a strongly typed Parse method directly on the MyEnum type, so that it could read:

MyEnum enumValue = MyEnum.Parse(stringValue);

See also a nice suggestion at MS Connect: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=96897

But the problem is that you'd most probably have to change the CLR to achieve this behavior, not to mention all the tools that assume enum types cannot have members. I'm not exactly sure whether the C# compiler can generate custom methods on enum types, or whether the CLR would have to change for it. The reason is that enums (which inherit from the special class System.Enum) are treated a little differently than other classes. I used ILDASM.exe to view the IL for the following code:

namespace TestEnum
{
    public enum MyEnumEnum
    {
        A,
        B
    }

    public sealed class MyEnumClass
    {
        public static MyEnumClass A;
        public static MyEnumClass B;
        public int value__;
    }
}

In ILDASM, it looks like this:


See, internally an enum looks almost like a class - maybe it wouldn't be that difficult to add methods to it?

Anyway, you can vote for such features to be implemented:
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=98356
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=293587

However, we can use extension methods in C# 3.0! Like this:

public static T ParseAsEnum<T>(this string value)
    // where T : enum
{
    if (string.IsNullOrEmpty(value))
    {
        throw new ArgumentException(
            "Can't parse an empty string", "value");
    }

    Type enumType = typeof(T);
    if (!enumType.IsEnum)
    {
        throw new InvalidOperationException(
            "Here's why you need enum constraints!!!");
    }

    // warning, can throw
    return (T)Enum.Parse(enumType, value);
}

It could be used like:

DriveType disk = "Network".ParseAsEnum<DriveType>();


Update: Thomas Watson also proposed adding an extension method directly to the Enum class in the comments here. Cool!

9/10/07

C# 3.0 Collection Initializers, Duck Typing and ISupportsAdd

Ilya Ryzhenkov from the ReSharper team has an interesting post: C# 3.0 Collection Initializers - Incomplete Feature? The problem is:

... restrictions are too strong - type being constructed should implement IEnumerable and have instance method "Add". IEnumerable is not of a big deal, but inability to use extension methods for Add is deal breaker.
I think that Ilya makes a very good point and that this feature indeed could be made more universally applicable. However, it is important to clearly understand the semantics of this feature to use it correctly - developers should know precisely what happens behind the curtains, otherwise unexpected side-effects can occur.
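
For reference, here is roughly what the compiler does behind the curtains for a collection initializer (a sketch; the compiler-generated temporary is not visible in source code):

// what you write:
List<int> numbers = new List<int> { 1, 2, 3 };

// what gets compiled, approximately:
List<int> temp = new List<int>();
temp.Add(1);
temp.Add(2);
temp.Add(3);
List<int> numbers = temp;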

Now let's brainstorm a little bit. Recently I wrote about more fine-grained interfaces. I exaggerated a little, but the idea was that an interface should define a minimal contract. Specifically, I expressed regret that there is no ISupportsAdd interface, because adding stuff is a widely used ability of many entities, not necessarily collections.

I think it would help a lot if we had an ISupportsAdd/ISupportsAdd<T> interface with a single void Add(T item) method (you could also name it IFillable, IAllowsAdd, ICanAdd, you name it). For places in the framework where Add methods are named differently (e.g. AddPermission), this interface could be implemented explicitly.
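
Just to make the idea concrete, here's a sketch of what such an interface and a non-collection implementer could look like (the names are only suggestions):

// the minimal contract: "you can add things to me"
public interface ISupportsAdd<T>
{
    void Add(T item);
}

// a hypothetical implementer that is not a collection in the usual sense,
// yet adding to it is perfectly meaningful
public class Log : ISupportsAdd<string>
{
    public void Add(string message)
    {
        Console.WriteLine(message);
    }
}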

This leads us to another problem: there are already lots of shipped types that do not implement this "ISupportsAdd" interface. How do you add an implementation of an interface to a type without changing the type (and its declaring assembly)? We'd need something like extension methods ("extension interfaces", anyone?). Orion Edwards makes (in my opinion) a terrific suggestion:

Define an alternative to an interface called "requirement". It would work and behave exactly the same as an interface EXCEPT that it would use duck typing instead of static typing. For example:
public requirement Closeable
{
    void Close();
}

public void TestMethod( Closeable c )
{
    c.Close();
}

TestMethod( new System.Windows.Forms.Form() );

Collection initializers are another case where duck typing is sneaking into the C# language. Guess what the first case is? Right, the foreach statement. I was surprised, too, when I read a post by Krzysztof Cwalina about Duck Notation. It turns out, foreach doesn't necessarily require IEnumerable - it can also use duck typing to recognize iterable entities.
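
To illustrate the foreach case, here's a minimal example of my own: no IEnumerable in sight, yet foreach compiles, because the compiler only looks for a GetEnumerator() method returning something with a Current property and a MoveNext() method:

public class Countdown
{
    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(3);
    }
}

public class CountdownEnumerator
{
    private int current;

    public CountdownEnumerator(int start)
    {
        current = start;
    }

    public int Current
    {
        get { return current; }
    }

    public bool MoveNext()
    {
        current--;
        return current >= 0;
    }
}

// prints 2, 1, 0 even though neither class implements IEnumerable:
// foreach (int i in new Countdown())
// {
//     Console.WriteLine(i);
// }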

However, I'm not a professional language designer and I have no idea how duck typing would behave in a statically typed language. The known problem with duck typing is that it keys on the spelling of members, not their semantics. There could be cases where members are named identically but have totally different semantics, so duck typing would destroy the benefits of static type checking by allowing semantically incompatible assignments. But with this explicit "requirement" duck typing, who knows, maybe it's a good idea.

I'm just saying that I already had a lot of cases where I regretted that a shipped type doesn't implement some interface - and I couldn't add this interface to the type declaration because I'm the consumer, and not the producer of the type. I believe that "Extension interfaces" or "explicit duck typing" would really help here.

P.S. Oh, and can anyone explain why we need to implement IEnumerable to be able to use collection initializers? I'm probably overlooking something obvious, but I thought I'd take the risk of sounding stupid and ask anyway :)


9/9/07

Compiler as a black-box

I'd like to share some personal thoughts and observations about compilers and their integration into an IDE. This topic might be interesting for those who design programming tools and IDE add-ins. A disclaimer: I'm not an expert in IDE design, but I'm learning, so if you have something to say, I welcome your feedback. Probably I'm saying trivial and well-known things, so please bear with me.

Observation number one - in some IDEs, a compiler is a black-box, meaning the interface between the compiler and the IDE is text-based: source code is input, and binaries are output. Compiler error messages and warnings are returned as plain text, with line and column numbers which indicate the position of an error in the source code.

This would be a good approach if we were to bind the compiler to a simple text editor - one could swap in another editor or another compiler without even recompiling the whole system. But nowadays, compiler functionality is also required outside the command-line compiler - for features like code completion, refactoring, etc. I think it no longer makes sense to encapsulate the compiler in a black box with input and output; instead, the compiler internals should be exposed to the rest of the IDE and its tools.

The classical Dragon Book pipeline splits the whole compilation process into phases (scanner - parser - resolver - code generator). Each phase has input and output and presents a finer-grained black box. For example, the parser receives a stream of tokens and outputs an abstract syntax tree (AST). With this architecture, we can plug in additional steps (e.g. tree transformations) between compiler phases, for example between the resolver and the code generator. This would be highly useful for tools that want to extend the language or the IDE (AOP, design-by-contract, code generation etc.).

However, some compilers hide this pipeline from us, encapsulating the compilation process in a single black box. Someone even coined the term monolithic compiler. Many tool developers express a growing need for compilers to expose these internal compilation steps and to provide hooks for plugging custom functionality into the compilation process.

Once the compiler is exposed through classes and interfaces (a compiler API), one could apply various design patterns to extend and modify the compilation process. For example, one could wrap the parser in a decorator that performs additional tree transformations, or append additional output to the code generator. The whole compiler would provide a factory that produces scanners, parsers, code generators, etc. One could plug one's own parts into this factory to replace the defaults.
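
To sketch what I mean (the types here are hypothetical, not any real compiler API):

// placeholder types, just to make the example compile
public class TokenStream { }
public class SyntaxTree { }

public interface IParser
{
    SyntaxTree Parse(TokenStream tokens);
}

// a decorator that plugs a tree transformation in right after parsing
public class TransformingParser : IParser
{
    private readonly IParser inner;
    private readonly Func<SyntaxTree, SyntaxTree> transform;

    public TransformingParser(IParser inner, Func<SyntaxTree, SyntaxTree> transform)
    {
        this.inner = inner;
        this.transform = transform;
    }

    public SyntaxTree Parse(TokenStream tokens)
    {
        // parse as usual, then apply the plugged-in transformation
        return transform(inner.Parse(tokens));
    }
}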

This is one important step towards compile-time reflection (which would enable things like syntactic macros, quasi-quotation and metaprogramming in general). My favorite example of these technologies is the Nemerle language.

An advantage of exposing a compiler API would be the reusability of the compiler functionality. With a monolithic compiler, one would need to duplicate functionality to implement code completion, refactoring etc. An IDE would be full of places that round-trip from code to AST and back. With an extensible compiler, the AST becomes the main data structure of the IDE. Soon I plan to blog more about the AST as the primary data structure of the IDE, as opposed to the source code as text.

Please note, I'm not talking about any concrete IDE implementations, because I haven't had a chance to look at them more closely. From what I've heard, Eclipse does a pretty good job of sharing its AST with plug-ins. In some future post I'll talk about SharpDevelop - the IDE I have some experience with.


I'd love to hear your opinions and feedback on this. Thanks!

9/5/07

Traits/mixins

Today just a quick post about language design.

SecretGeek pointed me to this article: Create Mixins with Interfaces and Extension Methods, and I really liked the idea. It reminded me of Haskell's type classes, where you can implement part of the interface based on the other part. Then you don't have to implement the whole interface - the "default" part gets implemented automatically.
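
The pattern looks roughly like this (my own sketch with made-up names, not the article's exact code):

// the "state" part that implementers must provide
public interface INameable
{
    string FirstName { get; }
    string LastName { get; }
}

// the "default" part, written once as extension methods
public static class NameableMixin
{
    public static string FullName(this INameable self)
    {
        return self.FirstName + " " + self.LastName;
    }
}

// any class in any hierarchy picks up the behavior by implementing the interface
public class Person : INameable
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// usage:
// string name = new Person { FirstName = "Ada", LastName = "Lovelace" }.FullName();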

Generally, I'm fond of the idea of traits/mixins; it would be useful in quite a number of situations, especially when you'd like to share some functionality across different class hierarchies. Here's a popular link to the research on traits: Traits — Composable Units of Behavior


9/3/07

Static analysis and source code querying

Professionally, I am very interested in developer tools, especially in how to develop them properly. One kind of developer tool lets developers analyse code to extract statistics or other characteristics about it. This is called static analysis, because the information about the code is obtained at compile time, without the code ever running.

Today, I'd like to write about three tools in this area that I'm interested in.

1. SemmleCode


Released by Semmle as a free product, this Eclipse plug-in lets you write and execute queries against the source code base using the .QL query language. .QL is a specially developed SQL/LINQ-like query language whose interesting properties are extensibility and object orientation. An example:

from Field f
where f.hasModifier("public")
  and not f.hasModifier("final")
select f.getDeclaringType().getPackage(),
       f.getDeclaringType(),
       f

This query returns all public non-final fields, and for each field it also returns the type and package where the field is defined.

How can SemmleCode be useful? The website gives six main usage scenarios:
  1. Search and Navigate code
  2. Find bugs
  3. Compute metrics
  4. Enforce coding conventions
  5. Generate charts and graphs
  6. Share your queries

How does SemmleCode work? First, it walks the entire source code and parses it into an intermediate representation. Eclipse is kind enough to provide tools with a Java parser and full access to the AST, so the Semmle folks didn't have to write their own Java parser. Note how great it is when the IDE takes such good care of its tools and lets them become part of the IDE family.
Anyway, SemmleCode then dumps the AST into a relational database, although only class and member information is stored. Currently they don't go down to the statement level and mostly do inter-procedural analysis (not intra-procedural). However, method calls still land in the DB, which is a good thing.

When you execute a query, it is internally rewritten into Datalog, a dialect of Prolog. (Prolog is a terrific eye-opener and deserves a separate post in the future.) Finally, the Datalog is converted to very efficient, highly optimised SQL, which is then run against the DB engine.

To sum up, Semmle emphasizes flexible, arbitrary querying against the code model. This is a somewhat different usage pattern from checking code against fixed, predefined rules, as FxCop does, for example. SemmleCode is more about discovery and analysis, while FxCop is more about automated quality control and checking.

That's about it. The tool is great, .QL is expressive, and Semmle is moving forward with promising regularity. Watch them at QCon in San Francisco later this year.

2. .NET tools


OK, Eclipse is good, but what about the rest of us, the .NET folk? Well, first there is NDepend, which I still haven't had a chance to look at (sorry, Patrick!). But it looks like a good tool, and I should definitely give it a try in my spare time.

Then, there is FxCop, the widely used one. FxCop contains a library of distilled developer experience formulated as rules. The code is checked against the rules and FxCop annoys developers until they either fix the code or finally lose their temper and just turn the offending rule off :) It is noteworthy that FxCop doesn't parse the source code - it goes in the reverse direction and analyses the compiled assemblies.

But today I'd especially like to write about NStatic, a promising tool I'm really excited about. Wesner Moise is the talented developer behind it, and he applies AI and algebraic methods to code analysis. As of now, NStatic hasn't been released yet, but I'm closely watching Wesner's blog, which is a real wealth of insightful information. Besides that, Wesner seems to like the idea of structured editing, which also happens to be my own passion.

3. Sotograph


Last but not least, another product I'm interested in: http://www.software-tomography.com. This tool emphasizes visualization of large systems, and the metaphor behind it comes from medicine. Just as tomography lets you peek inside the human body to see what exactly is wrong, Sotograph lets you visualize large software systems to analyse dependencies and find architecture flaws.

Software Tomography recently introduced a highly-efficient C# parser specially developed at the University of Linz, Austria - home of Prof. Hanspeter Mössenböck, the creator of Coco/R, a .NET parser generator. This is also a good topic for a separate post.

One possible usage scenario for such tools is determining dependencies between subsystems, for example when planning a large refactoring or other massive code changes. Static analysis tools let us peek into the future and see which dependencies are going to break if we do this or that. We can also conduct targeted searches using source code querying. Whatever we do, we do it, in the end, to increase code quality and plan for future maintenance and scalability.

Update: see also my del.icio.us links about static analysis: http://del.icio.us/KirillOsenkov/StaticAnalysis

9/2/07

A usage scenario for empty "marker" interfaces

There is well-known advice (probably originating from the Framework Design Guidelines book) to avoid interfaces with no members. Such interfaces are mostly used to mark types, and testing whether a type is marked is done with the "is" operator, like this:

if (myObject is INamespaceLevel)

The suggestion is to use attributes instead, for example like this:

if (!obj.GetType().IsDefined(
    typeof(ObsoleteAttribute), false))
    ...

The advantages of attributes are:
  1. they can have parameters
  2. you can precisely control how attributes are inherited. You can easily turn the attribute off on a derived type, whereas you can't erase an interface from a derived type's inheritance tree if a base type already implements it (see the sketch after these lists).

The disadvantages of attributes are:
  1. the clumsy syntax
  2. and the runtime costs of checking (reflection is slower than the is operator).
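
Here is a small sketch of advantage 2 (the attribute name is made up):

[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public class NamespaceLevelAttribute : Attribute { }

[NamespaceLevel]
public class NamespaceBlock { }

public class DerivedBlock : NamespaceBlock { }

// with Inherited = false the mark does not flow down to derived types:
// typeof(NamespaceBlock).IsDefined(typeof(NamespaceLevelAttribute), true) -> true
// typeof(DerivedBlock).IsDefined(typeof(NamespaceLevelAttribute), true)   -> false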

A possible usage scenario for marker interfaces
I had one situation so far where marker interfaces seem quite useful and look neat. Frankly, as I write this, I realize that I could have used attributes as well, but I've already written too much of this post, so it would be a pity to throw it away. When I started this post, I was a strong believer that marker interfaces are good; now I think attributes deserve a chance as well :-)

Anyway, here's the example that I originally wanted to post (you can judge for yourself whether marker interfaces are justified here). In the C# code editor I am building, language constructs are modeled by types inheriting from the Block class, for example like this (a small subtree of the entire AST):



Now, some blocks are allowed to be nested in other blocks. For example, a class can be nested within a namespace or another class, and a method with a body can be nested in a class or a struct. To determine where a language construct can be used, I introduced a set of marker interfaces:

public interface INamespaceLevel { }
public interface IClassLevel { }
public interface IMethodLevel { }

Now, when we drag-and-drop or copy-paste blocks, determining whether a block can be dropped within a container is easy. Each container has a list of allowed interfaces that can be accepted (not necessarily a single interface - we want to be flexible). Once we drag a block over the container, we check whether the dragged block implements any of the interfaces we can accept:

bool foundAssignable = false;
foreach (Type acceptableType in AcceptableBlockTypes)
{
    if (acceptableType.IsAssignableFrom(dragged.GetType()))
    {
        foundAssignable = true;
        break;
    }
}

We fill the AcceptableBlockTypes list like this:

AddAcceptableBlockTypes<IClassLevel>();

And here's the definition for AddAcceptableBlockTypes:

private Set<Type> AcceptableBlockTypes = new Set<Type>();

public void AddAcceptableBlockTypes(params Type[] acceptableBlockTypes)
{
    foreach (Type t in acceptableBlockTypes)
    {
        if (!AcceptableBlockTypes.Contains(t))
            AcceptableBlockTypes.Add(t);
    }
}

public void AddAcceptableBlockTypes<T1>()
{
    AddAcceptableBlockTypes(typeof(T1));
}

public void AddAcceptableBlockTypes<T1, T2>()
{
    AddAcceptableBlockTypes(typeof(T1), typeof(T2));
}

public void AddAcceptableBlockTypes<T1, T2, T3>()
{
    AddAcceptableBlockTypes(typeof(T1), typeof(T2), typeof(T3));
}

Now I wonder how sane this is and whether I should really have used attributes instead. I like the usability of the current API, and the approach seems to work fine for my editor. Let's wait and see how it scales as I modernize the editor to support C# versions more recent than 1.0 :) I'll keep you posted.

More about Interface usage in .NET

In a recent post I shared some personal experiences about when to use interfaces or abstract classes.

As it turns out, the internet is full of information and advice about it:
  • Evan Hoff has an interesting post dividing interface usage scenarios into three groups: interfaces modeling object characteristics, capabilities, and complex entities. To reiterate, the only reason I see to model a complex entity as an interface is that implementations may need different base classes as the roots of their class hierarchies.
  • Thomas Gravgaard, in the post Random Ramblings and Rumblings: The Interface Tax, confirms my experience about duplicating members in both a class and its interface. I totally agree with him that an interface in this case is mostly redundant and not justified. I also learned the cool YAGNI acronym. If you speak German, I like this description more.
  • Advice to avoid marker interfaces is ubiquitous, although I still can't find a real justification for it. I'll disagree with this advice in a future post.