Column Apply in Deedle

22 May 2016

The concept of Apply is pretty universal in data frame libraries. At a glance, it allows an operation to be performed on every element within a given subset (typically either all elements in a column or all elements in a row). Conceptually this is just a loop, but depending on the parameterization, the frame takes care of some other messiness as well.

In Deedle, ColumnApply allows you to invoke some logic on every column of a certain type. The challenge, though, is how to use it effectively when some columns are convertible to other types (e.g. most of your columns are double, but some are int, which is a trivial conversion in .NET). Deedle’s ColumnApply takes a ConversionKind parameter to address that point, and this post demonstrates each of its values.

All examples use the data frame below, which emulates a dataset with columns in a variety of data types.

var frame = Frame.FromRecords(new[] {
    new { Label = 1, RealAttr = 2.0, IntAttr = 1, BoolAttr = true, StringAttr = "a" },
    new { Label = 1, RealAttr = 3.1, IntAttr = 2, BoolAttr = true, StringAttr = "bb" },
    new { Label = 2, RealAttr = 0.9, IntAttr = 1, BoolAttr = false, StringAttr = "ccc" },
    new { Label = 2, RealAttr = 0.4, IntAttr = 2, BoolAttr = false, StringAttr = "dddd" },
});

In all cases, the function we’re going to apply simply doubles each value in the column.

Exact

Exact is precisely what it sounds like: unless the type of the column is an exact match for the type parameter to ColumnApply, the column will be skipped. Note that ColumnApply does not mutate the frame; it returns a new one, so to keep the result we have to either overwrite our frame reference or assign it to a new variable (here I just call Print() on the returned frame and then it’s lost):

frame.ColumnApply<double>(ConversionKind.Exact, series => series * 2).Print();
     Label RealAttr IntAttr BoolAttr StringAttr
0 -> 1     4.0      1       True     a
1 -> 1     6.2      2       True     bb
2 -> 2     1.8      1       False    ccc
3 -> 2     0.8      2       False    dddd

As you can see, RealAttr has each of its values doubled, but all other columns are unaffected.
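To keep the result rather than just print it, the returned frame can be assigned to a variable. The following is a minimal sketch that assumes the Deedle NuGet package is referenced and that the compiler supports top-level statements; the variable name doubled is only illustrative.

```csharp
using System;
using Deedle;

// Same sample frame as in the post.
var frame = Frame.FromRecords(new[] {
    new { Label = 1, RealAttr = 2.0, IntAttr = 1, BoolAttr = true, StringAttr = "a" },
    new { Label = 1, RealAttr = 3.1, IntAttr = 2, BoolAttr = true, StringAttr = "bb" },
    new { Label = 2, RealAttr = 0.9, IntAttr = 1, BoolAttr = false, StringAttr = "ccc" },
    new { Label = 2, RealAttr = 0.4, IntAttr = 2, BoolAttr = false, StringAttr = "dddd" },
});

// Capture the frame returned by ColumnApply ("doubled" is an illustrative name).
var doubled = frame.ColumnApply<double>(ConversionKind.Exact, series => series * 2);

doubled.Print(); // the frame with RealAttr doubled
frame.Print();   // the frame we started from, for comparison
```

Printing both frames side by side makes it easy to check which reference holds the doubled values.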

Flexible

Flexible conversion is intended to make maximum use of .NET type conversions through the static Convert class and similar methods. Its use in ColumnApply has an interesting effect:

frame.ColumnApply<double>(ConversionKind.Flexible, series => series * 2).Print();
     Label RealAttr IntAttr BoolAttr StringAttr
0 -> 2     4.0      2       2        a
1 -> 2     6.2      4       2        bb
2 -> 4     1.8      2       0        ccc
3 -> 4     0.8      4       0        dddd

As you can see, many more columns are affected this time than with Exact. Both Label and IntAttr have had their values doubled in addition to RealAttr. The really interesting one, though, is BoolAttr, which has gone from a boolean representation to a numeric one: True was converted to 1 and doubled to 2, while False was converted to 0 and stayed 0. StringAttr is still untouched.
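The BoolAttr result is easier to follow with the underlying .NET conversion in view. The sketch below uses only System.Convert and does not touch Deedle; that Deedle's Flexible conversion behaves the same way for booleans is an assumption based on the output above, not something taken from Deedle's source.

```csharp
using System;

// Assumption: Flexible conversion of bool to double matches System.Convert.
double fromTrue = Convert.ToDouble(true);   // true becomes 1
double fromFalse = Convert.ToDouble(false); // false becomes 0

// Doubling the converted values gives the 2 and 0 seen in the BoolAttr column.
Console.WriteLine(fromTrue * 2);  // prints 2
Console.WriteLine(fromFalse * 2); // prints 0

// By contrast, int to double is an implicit widening conversion in C#.
int label = 1;
double widened = label;
Console.WriteLine(widened * 2);   // prints 2
```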

Safe

Safe is the intermediate level between Exact and Flexible: it will allow numeric widening conversions, but no others:

frame.ColumnApply<double>(ConversionKind.Safe, series => series * 2).Print();
     Label RealAttr IntAttr BoolAttr StringAttr
0 -> 2     4.0      2       True     a
1 -> 2     6.2      4       True     bb
2 -> 4     1.8      2       False    ccc
3 -> 4     0.8      4       False    dddd

You’ll see that again Label and IntAttr are affected, which makes sense as those are basic widening conversions. However, BoolAttr is left alone and retains its boolean representation.

Wrap Up

I hope that better illustrates the uses of ColumnApply. Unfortunately, within the lambda you supply there is no way to determine which column is being processed, so you cannot do any kind of conditional apply (e.g. apply to all floating point and integer columns except one named Label).
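One possible way around that limitation is to skip ColumnApply and walk an explicit list of column names instead. The sketch below is an assumption-laden illustration rather than a recommended pattern: it relies on Deedle's Clone, GetColumn<T> and ReplaceColumn members, assumes GetColumn<double> will convert the int column, and hard-codes the column names to process.

```csharp
using System;
using Deedle;

// Same sample frame as in the post.
var frame = Frame.FromRecords(new[] {
    new { Label = 1, RealAttr = 2.0, IntAttr = 1, BoolAttr = true, StringAttr = "a" },
    new { Label = 1, RealAttr = 3.1, IntAttr = 2, BoolAttr = true, StringAttr = "bb" },
    new { Label = 2, RealAttr = 0.9, IntAttr = 1, BoolAttr = false, StringAttr = "ccc" },
    new { Label = 2, RealAttr = 0.4, IntAttr = 2, BoolAttr = false, StringAttr = "dddd" },
});

// Work on a copy so the original frame is left alone.
var result = frame.Clone();

// Hypothetical selection: the numeric columns we want, deliberately excluding Label.
var columnsToDouble = new[] { "RealAttr", "IntAttr" };

foreach (var key in columnsToDouble)
{
    // Assumption: GetColumn<double> converts the int column to double for us.
    var doubledColumn = result.GetColumn<double>(key) * 2;

    // Swap the doubled series in under the same key (IntAttr becomes a double column).
    result.ReplaceColumn(key, doubledColumn);
}

result.Print();
```

The trade-off is that the type-based selection ColumnApply provides is replaced by a hand-maintained list of names.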