Version from Directory.Build.props in .NET Framework projects

Posted on December 30, 2024 by Wubbi

Most of the time when working with multiple projects in a solution I want to set the version globally using the Directory.Build.props file. That way I only have to update the version in one file, rather than all project files.
While this works great for SDK style projects, it doesn’t work well with .NET Framework projects. Basically, whatever value I have set in the .props file is read once when I open the solution and any changes are ignored until I close and reopen the solution.

As a workaround I use this small target in Directory.Build.targets to actively read the Version property from the .props file and write it into a separate version info file in every Framework project before every build.

Here’s a quick reminder regarding the two Directory.Build files.

<Target Name="SetAssemblyVersion" BeforeTargets="BeforeBuild"  Condition="'$(TargetFramework)'==''">
  <!-- Read the Version property from the .props file -->
  <XmlPeek XmlInputPath="$(MSBuildThisFileDirectory)Directory.Build.props" Query="/n:Project/n:PropertyGroup/n:Version/text()"
           Namespaces="&lt;Namespace Prefix='n' Uri='http://schemas.microsoft.com/developer/msbuild/2003' /&gt;">
    <Output TaskParameter="Result" ItemName="PropVersion"/>
  </XmlPeek>
 
  <!-- Prepare the attribute -->
  <ItemGroup>
    <AssemblyAttributes Include="AssemblyVersion">
      <_Parameter1>@(PropVersion)</_Parameter1>
    </AssemblyAttributes>
  </ItemGroup>
 
  <!-- Write the attribute -->
  <Message Importance="high" Text="Setting assembly version to '@(PropVersion)'" />
  <WriteCodeFragment Language="C#" OutputFile="Properties\AssemblyVersion.cs" AssemblyAttributes="@(AssemblyAttributes)" />
</Target>

Since MSBuild files are basically just XML, we can use XmlPeek to read the content of the Version property. Don’t forget the namespace!
Since the .props and .targets files are right next to each other, we can use the predefined MSBuildThisFileDirectory property to find the path to the .props file.

Then we take that information and write it to a standalone file next to the AssemblyInfo.cs file you get by default. Keep in mind that the target is executed for the project, meaning the path is relative to the .csproj file.
Also remember to remove the AssemblyVersion from the existing AssemblyInfo.cs to avoid errors.

Lastly, we don’t want to run this target for SDK style projects. One of the easiest ways to avoid that is to check the TargetFramework property in the condition.

Understanding Unicode encoding

Posted on July 23, 2024 by Wubbi

Unicode is basically just a standardized list of various symbols, which assigns a specific integer value to each of them. These are called code points (not to be confused with code units, which come later).
They are grouped into 17 planes, with 2¹⁶ code points each. The first one (plane 0) is called Basic Multilingual Plane, the BMP.

To store text that follows the Unicode standard, you simply have to store a sequence of code points, which is just a sequence of integer numbers.
The specifics of how these numbers are stored can vary and are determined by the text’s encoding format.

UTF-32

All of the integer values currently defined in Unicode can be expressed in 21 bits or less.
Therefore the simplest approach would be to store each code point as a 24 bit (3 byte) integer.
While no official 24 bit encoding for Unicode exists (so far), there is UTF-32, in which each code point is stored as a 32 bit integer.
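
To see the fixed width in practice, here is a minimal C# sketch built on the framework's UTF32Encoding class. The names Utf32Demo, Encode and ReadCodePoint are made up for this example, and it assumes little endian byte order without a byte order mark.

```csharp
using System.Text;

public static class Utf32Demo
{
    //Encodes the text as UTF-32 (little endian, no byte order mark)
    public static byte[] Encode(string text)
        => new UTF32Encoding(bigEndian: false, byteOrderMark: false).GetBytes(text);

    //Reads the code point stored in the 4 bytes starting at the given index
    public static int ReadCodePoint(byte[] utf32, int index)
        => utf32[index]
           | (utf32[index + 1] << 8)
           | (utf32[index + 2] << 16)
           | (utf32[index + 3] << 24);
}
```

Encoding an "A" followed by U+1F600 this way gives 8 bytes, 4 for each code point, no matter how small or large its value is.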

UTF-8

Due to the order of symbols in Unicode, the code points of the most common characters all have rather small values. UTF-8 acts similar to a compression algorithm that makes more common characters use less memory, but then uses more memory for larger code points.
Up to a value of 127 only a single byte is used, which (not coincidentally) contains all ASCII characters.
Once you need more than 7 bits, you have to split them among two or more bytes. This is where the term code unit comes in. Every byte in UTF-8 represents a single code unit. All code units combined give you the code point. Of course the code point can also be represented by a single code unit. The already discussed UTF-32 is an encoding where that is always the case.

The way the data is split is rather simple:
If a single byte is enough, its most significant bit is 0 and the remaining 7 bits hold the code point.
Otherwise, the highest order byte has to fill the most significant bits with as many 1s as there are bytes in total, followed by a single 0. The rest of its bits are space for your code point.
All other bytes have to set only their most significant bit to 1, also followed by a single 0, leaving 6 bits of space each.

Input	UTF-8
`abcdefg`	`0abcdefg`
`abc` `defghijk`	`110abcde` `10fghijk`
`abcdefgh` `ijklmnop`	`1110abcd` `10efghij` `10klmnop`
`abcde` `fghijklm` `nopqrstu`	`11110abc` `10defghi` `10jklmno` `10pqrstu`

UTF-8 bit arrangement
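
The bit arrangement above can be sketched in C# with nothing but shifts and masks. This is only an illustration, not a replacement for System.Text.Encoding.UTF8. The name Utf8Sketch is made up, and the sketch assumes the input is a single valid code point (surrogate values are rejected).

```csharp
using System;

public static class Utf8Sketch
{
    //Encodes a single Unicode code point into its UTF-8 code units (bytes)
    public static byte[] Encode(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        //Up to 7 bits: 0abcdefg
        if (codePoint <= 0x7F)
            return new[] { (byte)codePoint };

        //Up to 11 bits: 110abcde 10fghijk
        if (codePoint <= 0x7FF)
            return new[]
            {
                (byte)(0b1100_0000 | (codePoint >> 6)),
                (byte)(0b1000_0000 | (codePoint & 0b11_1111))
            };

        //Up to 16 bits: 1110abcd 10efghij 10klmnop
        if (codePoint <= 0xFFFF)
            return new[]
            {
                (byte)(0b1110_0000 | (codePoint >> 12)),
                (byte)(0b1000_0000 | ((codePoint >> 6) & 0b11_1111)),
                (byte)(0b1000_0000 | (codePoint & 0b11_1111))
            };

        //Up to 21 bits: 11110abc 10defghi 10jklmno 10pqrstu
        return new[]
        {
            (byte)(0b1111_0000 | (codePoint >> 18)),
            (byte)(0b1000_0000 | ((codePoint >> 12) & 0b11_1111)),
            (byte)(0b1000_0000 | ((codePoint >> 6) & 0b11_1111)),
            (byte)(0b1000_0000 | (codePoint & 0b11_1111))
        };
    }
}
```

For example, the euro sign (U+20AC) needs 14 bits and therefore ends up in the three byte form as 0xE2 0x82 0xAC.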

UTF-16

UTF-16 uses 16 bit code units, which just like UTF-8 allows you to use a single code unit for more common characters, but requires 2 units for larger code points.
Unlike UTF-8, it requires some math and cannot be done by just moving bits around.

Up to a value of 65,535 (0xFFFF), you can store the code point as a 16 bit integer directly.
If the value is larger than this, you first have to subtract 65,536 (0x10000) from it.
Then take the lowest 10 bits (0-9) of the result and put the bit sequence 110111 in front to form the so called low surrogate, the less significant code unit.
Then take the next higher 10 bits (10-19) and prepend it with 110110 to get the high surrogate.
Combined they give you a surrogate pair.

Now you might be wondering: How do you distinguish the code units in a surrogate pair from two single-unit code points?
The answer lies in the possible integer values a surrogate can have. Since the most significant bits are fixed, the high surrogate can only represent values from 55,296 (0xD800) to 56,319 (0xDBFF), and the low only 56,320 (0xDC00) to 57,343 (0xDFFF).
The range from 55,296 to 57,343 in Unicode is reserved for surrogates and can never be occupied by a real character. To decode a UTF-16 encoded text you simply have to check if a code unit falls within the surrogate range. If not, you can use it as is. Otherwise it will be part of a pair (assuming the data is valid), where you can combine the lower 10 bits of each to form one 20 bit number to which you add 65,536 (0x10000).
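
The encoding and decoding steps described above could look like this in C#. It is a sketch for illustration only; in real code char.ConvertFromUtf32 and char.ConvertToUtf32 already do this. The names Utf16Sketch, Encode and DecodePair are made up for the example.

```csharp
using System;

public static class Utf16Sketch
{
    //Encodes a single code point into one or two UTF-16 code units
    public static char[] Encode(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        //Everything up to 0xFFFF fits into a single code unit
        if (codePoint <= 0xFFFF)
            return new[] { (char)codePoint };

        int value = codePoint - 0x10000;
        char high = (char)(0xD800 | (value >> 10));   //110110 + bits 10-19
        char low = (char)(0xDC00 | (value & 0x3FF));  //110111 + bits 0-9
        return new[] { high, low };
    }

    //Combines a surrogate pair back into the code point it represents
    public static int DecodePair(char high, char low)
    {
        if (high < 0xD800 || high > 0xDBFF)
            throw new ArgumentException("Not a high surrogate", nameof(high));
        if (low < 0xDC00 || low > 0xDFFF)
            throw new ArgumentException("Not a low surrogate", nameof(low));

        return (((high & 0x3FF) << 10) | (low & 0x3FF)) + 0x10000;
    }
}
```

As an example, U+1F600 minus 0x10000 is 0xF600, which splits into the high surrogate 0xD83D and the low surrogate 0xDE00.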

A different way to look at UTF-16:
Each code unit can represent 2¹⁶ different values, but 2,048 of those are not used to represent any symbols and are reserved for surrogates (1,024 for high and 1,024 for low surrogates).
By combining one high and one low surrogate, you get 1,024*1,024 = 2²⁰ possible values they can represent together. Combined with the code points a single unit can encode, that is enough to cover all of the (current) Unicode code points.
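
The arithmetic behind that claim, written down as C# constants so it can be checked. The class name Utf16Capacity is made up. The surrogate range 0xD800 to 0xDFFF holds 2,048 values, half of them high and half low surrogates.

```csharp
public static class Utf16Capacity
{
    //0xD800 to 0xDFFF: 1,024 high plus 1,024 low surrogates
    public const int SurrogateValues = 2048;

    //Values a single code unit can encode directly: 65,536 - 2,048 = 63,488
    public const int SingleUnitValues = (1 << 16) - SurrogateValues;

    //Values a surrogate pair can encode: 1,024 * 1,024 = 1,048,576
    public const int SurrogatePairValues = 1024 * 1024;

    //Everything UTF-16 can encode: 63,488 + 1,048,576 = 1,112,064
    public const int Total = SingleUnitValues + SurrogatePairValues;

    //17 planes with 2^16 code points each: 1,114,112 (this still includes the surrogate range)
    public const int UnicodeCodePoints = 17 * (1 << 16);
}
```

Total is exactly UnicodeCodePoints minus SurrogateValues, so every code point outside the surrogate range can be encoded.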

Sharing COM objects between processes

Posted on February 4, 2024 by Wubbi

A good 70% of this is taken more or less directly from the .NET Framework source code for the class IWbemClassObjectFreeThreaded and backed by an article from Raymond Chen.

With this code you can create a byte array based on a COM object with which you can create a reference to that COM object from a different process.

public static class ComMarshalHelper
{
    #region Externals
 
    [ResourceExposure(ResourceScope.None), DllImport("ole32.dll", PreserveSig = false)]
    private static extern void CoMarshalInterface([In] IStream pStm, [In] ref Guid riid, [In] IntPtr Unk, [In] uint dwDestContext, [In] IntPtr pvDestContext, [In] uint mshlflags);
 
    [ResourceExposure(ResourceScope.None), DllImport("ole32.dll", PreserveSig = false)]
    private static extern IntPtr CoUnmarshalInterface([In] IStream pStm, [In] ref Guid riid);
 
    [ResourceExposure(ResourceScope.None), DllImport("ole32.dll", PreserveSig = false)]
    private static extern IStream CreateStreamOnHGlobal(IntPtr hGlobal, int fDeleteOnRelease);
 
    [ResourceExposure(ResourceScope.None), DllImport("ole32.dll", PreserveSig = false)]
    private static extern IntPtr GetHGlobalFromStream([In] IStream pstm);
 
    [ResourceExposure(ResourceScope.None), DllImport("kernel32.dll", PreserveSig = true)]
    private static extern IntPtr GlobalLock([In] IntPtr hGlobal);
 
    [ResourceExposure(ResourceScope.None), DllImport("kernel32.dll", PreserveSig = true)]
    private static extern int GlobalUnlock([In] IntPtr pData);
 
    private enum MSHCTX
    {
        MSHCTX_LOCAL = 0,
        MSHCTX_NOSHAREDMEM = 1,
        MSHCTX_DIFFERENTMACHINE = 2,
        MSHCTX_INPROC = 3
    }
 
    private enum MSHLFLAGS
    {
        MSHLFLAGS_NORMAL = 0,
        MSHLFLAGS_TABLESTRONG = 1,
        MSHLFLAGS_TABLEWEAK = 2,
        MSHLFLAGS_NOPING = 4
    }
 
    #endregion //Externals
 
    private static readonly Type CoClassAttributeType = typeof(CoClassAttribute);
 
    public static byte[] MarshalComObject<T>(T comObject)
    {
        Type type = typeof(T);
        Guid objectId = type.GUID;
        if (!type.IsInterface)
        {
            foreach (Type comInterface in type.GetInterfaces())
            {
                if (!comInterface.IsDefined(CoClassAttributeType))
                    continue;
 
                objectId = comInterface.GUID;
                break;
            }
        }
 
        IntPtr iUnknown = IntPtr.Zero;
        IStream? stream = null;
        IntPtr lockedStreamPointer = IntPtr.Zero;
        try
        {
            iUnknown = Marshal.GetIUnknownForObject(comObject);
            stream = CreateStreamOnHGlobal(IntPtr.Zero, 1);
            CoMarshalInterface(stream, ref objectId, iUnknown, (uint)MSHCTX.MSHCTX_LOCAL, IntPtr.Zero, (uint)MSHLFLAGS.MSHLFLAGS_NORMAL);
            stream.Stat(out STATSTG streamInfo, 0);
            byte[] array = new byte[streamInfo.cbSize];
            lockedStreamPointer = GlobalLock(GetHGlobalFromStream(stream));
            Marshal.Copy(lockedStreamPointer, array, 0, array.Length);
 
            return array;
        }
        finally
        {
            if (iUnknown != IntPtr.Zero)
                Marshal.Release(iUnknown);
 
            if (lockedStreamPointer != IntPtr.Zero)
                GlobalUnlock(lockedStreamPointer);
 
            if (stream is not null)
                Marshal.ReleaseComObject(stream);
        }
    }
 
    public static T UnmarshalComObject<T>(byte[] data)
    {
        Type type = typeof(T);
        Guid objectId = type.GUID;
        if (!type.IsInterface)
        {
            foreach (Type comInterface in type.GetInterfaces())
            {
                if (!comInterface.IsDefined(CoClassAttributeType))
                    continue;
 
                objectId = comInterface.GUID;
                break;
            }
        }
 
        IntPtr streamPointer = IntPtr.Zero;
        IStream? stream = null;
        try
        {
            streamPointer = Marshal.AllocHGlobal(data.Length);
            Marshal.Copy(data, 0, streamPointer, data.Length);
            stream = CreateStreamOnHGlobal(streamPointer, 0);
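            IntPtr iUnknown = CoUnmarshalInterface(stream, ref objectId);
            try
            {
                return (T)Marshal.GetObjectForIUnknown(iUnknown);
            }
            finally
            {
                //The returned wrapper holds its own reference, so release the one we received
                Marshal.Release(iUnknown);
            }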
        }
        finally
        {
            if (stream is not null)
                Marshal.ReleaseComObject(stream);
 
            if (streamPointer != IntPtr.Zero)
                Marshal.FreeHGlobal(streamPointer);
        }
    }
}

What’s the deal with the GetInterfaces?
We need the GUID of the interface, not the CoClass. This code simply tries to backtrack from the CoClass to the interface.

Why make MarshalComObject generic?
GetType won’t always give you what you expect. Often it will just be System.__ComObject, even if the variable you passed in was the actual type. The generic is simply to ensure we know which type the object belongs to for determining the correct GUID.

Here’s an example using the Microsoft XML COM library (v6.0):

Process A creates a document and fills it with data before it is marshaled.

DOMDocument60 document = new();
document.loadXML("<Root>Some text to fill out this string</Root>");
byte[] data = ComMarshalHelper.MarshalComObject(document);

Sharing data can happen any way you like. Since it isn’t very big, one method might be to just convert it into Base64 and pass it on as a CLI argument.
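
The Base64 route could look roughly like this. The helper class MarshalDataTransport and its method names are made up for this sketch, and how the string actually gets from one process to the other is left open.

```csharp
using System;

public static class MarshalDataTransport
{
    //Process A: turns the marshaled bytes into a string that can be passed as an argument
    public static string ToArgument(byte[] data) => Convert.ToBase64String(data);

    //Process B: restores the marshaled bytes from the received argument
    public static byte[] FromArgument(string argument) => Convert.FromBase64String(argument);
}
```

Process A would call ToArgument on the array returned by MarshalComObject, and process B would call FromArgument on the string it received (for example the first entry of its args array) before unmarshaling.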

Process B can now unmarshal it and output the XML content.

DOMDocument60 document = ComMarshalHelper.UnmarshalComObject<DOMDocument60>(data);
Console.WriteLine(document.xml);//Outputs '<Root>Some text to fill out this string</Root>'

My method for static methods on generic type parameters

Posted on March 8, 2023 (updated March 29, 2023) by Wubbi

Sorry to disappoint the people most likely to stumble upon this, I have no fancy method for providing real static methods in a generic context.
Short version:

Find the method through reflection
Compile into a Func
Store in generic base class

The idea is to call a method from a base class, which redirects the call to the intended method.

Let me give you an example right away, below it you’ll find the explanation:

public abstract class Parent<T> where T : Parent<T>
{
    //Since this is a generic class, we'll have one Func for each T
    private static readonly Func<string, int> ChildGetCount;
 
    static Parent()
    {
        //Find the method
        Type[] types = { typeof(string) };
        MethodInfo methodInfo = typeof(T).GetMethod
        (
            nameof(GetCount), 
            BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.DeclaredOnly, 
            null, 
            types, 
            Array.Empty<ParameterModifier>()
        );
 
        //Verify the method exists and throw a dedicated exception if it doesn't (to make debugging easier)
        if (methodInfo is null)
            throw new MissingMethodException(typeof(T).FullName, nameof(GetCount));
 
        //Build an Expression to speed up performance
        ParameterExpression parameterExpression = Expression.Parameter(types[0]);
        MethodCallExpression methodCallExpression = Expression.Call(methodInfo, parameterExpression);
        Expression<Func<string, int>> expression = Expression.Lambda<Func<string, int>>
        (
            methodCallExpression, 
            parameterExpression
        );
 
        //Store the Func for later use
        ChildGetCount = expression.Compile();
    }
 
    public static int GetCount(string value) => ChildGetCount(value);
}
 
public class ChildA : Parent<ChildA>
{
    private new static int GetCount(string value) => value.Count(c => c == 'a');
}
 
public class ChildB : Parent<ChildB>
{
    private new static int GetCount(string value) => value.Count(c => c == 'b');
}
 
public class ChildC : Parent<ChildC>
{
    private new static int GetCount(string value) => value.Count(c => c == 'c');
}

One of the most important parts of this whole thing is the generic base class that is constrained to itself.
This way, due to the way generics work, we can force a unique parent class for each child type. And each of those parents will have the information about what exact type the child is.
By using reflection, we can now search for a specific method in that child type, and store a reference to it in the child’s unique parent.
To improve performance I use System.Linq.Expressions to compile the MethodInfo into a Func, but that’s not necessary to make all of this work.
Finally, we add a static method in the parent, which simply redirects the call to the stored Func.
Another unnecessary detail: I used the same name for the method in the parent and child classes.
This was mostly for cohesion, but needing the new keyword is also a good reminder that you have or haven’t “overridden” the required method yet.
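
Since the expression tree is optional, here is a sketch of the same pattern without it, binding the found method directly with MethodInfo.CreateDelegate. The names DelegateParent and CountsA are made up for this example and are not part of the code above. Note that it uses a static field initializer instead of a static constructor, so initialization may happen at a slightly different time.

```csharp
using System;
using System.Linq;
using System.Reflection;

public abstract class DelegateParent<T> where T : DelegateParent<T>
{
    //One delegate per T, created when the closed parent type is initialized
    private static readonly Func<string, int> ChildGetCount = CreateChildDelegate();

    private static Func<string, int> CreateChildDelegate()
    {
        //Find the private static method declared directly on the child
        var methodInfo = typeof(T).GetMethod
        (
            nameof(GetCount),
            BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.DeclaredOnly,
            null,
            new[] { typeof(string) },
            null
        );

        if (methodInfo is null)
            throw new MissingMethodException(typeof(T).FullName, nameof(GetCount));

        //Bind the method to a delegate, no expression tree involved
        return (Func<string, int>)methodInfo.CreateDelegate(typeof(Func<string, int>));
    }

    public static int GetCount(string value) => ChildGetCount(value);
}

public class CountsA : DelegateParent<CountsA>
{
    private new static int GetCount(string value) => value.Count(c => c == 'a');
}
```

With this, DelegateParent<CountsA>.GetCount("a ab abc") is redirected to the method in CountsA, just like in the expression based version.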

With the code above we can now do this:

const string value = "a ab abc";
 
int countA = ChildA.GetCount(value);//3
int countB = ChildB.GetCount(value);//2
int countC = ChildC.GetCount(value);//1

Which we could have done with plain static methods too, of course. But we can also do this:

const string value = "a ab abc";
 
int genericCountA = Parent<ChildA>.GetCount(value);//3
int genericCountB = Parent<ChildB>.GetCount(value);//2
int genericCountC = Parent<ChildC>.GetCount(value);//1

Which finally allows for things like these, which were the whole reason I needed all of this in the first place:

public static int GenericGetCount<T>(string value) where T : Parent<T>
    => Parent<T>.GetCount(value);

const string value = "a ab abc";
 
int genericCountA = GenericGetCount<ChildA>(value);//3
int genericCountB = GenericGetCount<ChildB>(value);//2
int genericCountC = GenericGetCount<ChildC>(value);//1

Downsides to this approach:

First of all, this needs a very specific generic structure, which cannot always be achieved, mostly because you might already use generics for other reasons. At least not without some additional complications.

Secondly, the compiler doesn’t verify anything. You can add additional checks in the static initializer, and a simple test framework can verify that all children are working correctly. But it’s still nowhere near the level of compiler-enforced safety that you get with proper inheritance. It also requires more awareness on the side of the developer, making it less comfortable to work with.

Lastly, the performance impact isn’t something you should blindly ignore. Reflection is still relatively costly. And the generic base class means we have a lot of additional classes once compiled (one for each child). Nothing too disruptive, but certainly something to keep in mind.
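
Regarding the second downside, a check for "all children are working" could be sketched like this: force the static initializer of every child's parent and collect the ones that fail. Everything here (ChildVerifier, CheckedParent, GoodChild, BadChild) is made up for the example. The sketch only looks at direct base types and assumes the parent throws in its static constructor when the method is missing.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class ChildVerifier
{
    //Returns all candidates whose closed parent type fails to initialize
    public static List<Type> FindBrokenChildren(Type openParent, IEnumerable<Type> candidates)
    {
        var broken = new List<Type>();
        foreach (Type child in candidates)
        {
            //Only direct children of the generic parent are checked
            Type baseType = child.BaseType;
            if (child.IsAbstract || baseType is null || !baseType.IsGenericType
                || baseType.GetGenericTypeDefinition() != openParent)
                continue;

            try
            {
                //Runs the static constructor of e.g. CheckedParent<GoodChild>
                RuntimeHelpers.RunClassConstructor(baseType.TypeHandle);
            }
            catch (TypeInitializationException)
            {
                broken.Add(child);
            }
        }

        return broken;
    }
}

//Minimal stand-in for the parent class, it only performs the lookup
public abstract class CheckedParent<T> where T : CheckedParent<T>
{
    static CheckedParent()
    {
        MethodInfo method = typeof(T).GetMethod("GetCount",
            BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.DeclaredOnly);

        if (method is null)
            throw new MissingMethodException(typeof(T).FullName, "GetCount");
    }
}

public class GoodChild : CheckedParent<GoodChild>
{
    private static int GetCount(string value) => value.Length;
}

//Forgot to add GetCount
public class BadChild : CheckedParent<BadChild>
{
}
```

In a test you could pass in the result of Assembly.GetTypes() as candidates and assert that the returned list is empty.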

Coding problems based on real scenarios

Posted on May 31, 2022 by Wubbi

Interesting problems I encountered while coding for my job.
These are simplified of course, so you get only the essential “challenge”.

I had two sequences of values, similar to enumerations, for which I needed to know the differences. In my specific case, just the number of unique items in each was enough. The items in both sequences were provided in ascending order. The number of items can differ between the two and can even be 0.

(long OnlyA, long OnlyB, long Both) CountDifferences<T>(IEnumerator<T> a, IEnumerator<T> b) where T : IComparable<T>

As an example:
a = [1, 2, 3, 5] b = [1, 4, 5] => OnlyA=2, OnlyB=1, Both=2

The tricky part here was: The sequences were long. Very long. Just storing them in some data structure would take too many resources. I had to compare the two without storing the values and without a second look at any item.
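
If you want to compare notes, here is one possible approach (spoiler ahead): walk both enumerators in lockstep and always advance the one with the smaller current item. This is just a sketch, not necessarily the solution used at the time. The class name SequenceComparer is made up, and it assumes neither sequence contains duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceComparer
{
    public static (long OnlyA, long OnlyB, long Both) CountDifferences<T>(IEnumerator<T> a, IEnumerator<T> b)
        where T : IComparable<T>
    {
        long onlyA = 0, onlyB = 0, both = 0;
        bool hasA = a.MoveNext();
        bool hasB = b.MoveNext();

        //As long as both have items, advance whichever is behind
        while (hasA && hasB)
        {
            int order = a.Current.CompareTo(b.Current);
            if (order < 0)
            {
                onlyA++;
                hasA = a.MoveNext();
            }
            else if (order > 0)
            {
                onlyB++;
                hasB = b.MoveNext();
            }
            else
            {
                both++;
                hasA = a.MoveNext();
                hasB = b.MoveNext();
            }
        }

        //Whatever is left in one sequence cannot be in the other
        while (hasA)
        {
            onlyA++;
            hasA = a.MoveNext();
        }

        while (hasB)
        {
            onlyB++;
            hasB = b.MoveNext();
        }

        return (onlyA, onlyB, both);
    }
}
```

Every item is looked at exactly once and nothing but the three counters is stored.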

I got a randomly ordered list of pairs of integers. What was required was a list of all those integers, ordered in such a way that the order of each pair still holds up. Which can of course only work if there are no cycles, so I had to catch that scenario if the data was corrupted and throw an Exception. The specific order didn’t matter, if there were multiple solutions.

int[] FlattenPairedOrder((int before, int after)[] individualOrders)

Example:

[(1, 2), (2, 3), (9, 4), (5, 7)] => [1, 2, 3, 5, 7, 9, 4]

In case you’re wondering: These pairs did in fact describe directed edges between nodes.

The input was a sequence of integers in ascending order without duplicates. I needed to split this sequence into chunks of continuous integers, meaning split at every point where two neighbors are more than one apart. Additionally, each block may not exceed a predefined maximum of numbers.

IEnumerable<(int From, int To)> SplitInBlocks(IEnumerator<int> individual, int maxBlockSize)

Example:

individual = [-1, 1, 2, 4, 5, 6, 7, 8, 10] maxBlockSize = 3 => [(-1, -1), (1, 2), (4, 6), (7, 8), (10, 10)]

If this doesn’t seem like much of a challenge, it’s because it isn’t. The part that made this stick in my mind was not the overall “how do I solve this problem”, but the specific “how do I translate that in code”.
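
One way to translate it into code (again a spoiler, and only a sketch) is an iterator that keeps track of the current block and closes it whenever the next number doesn't fit. The class name BlockSplitter and the local variable names are made up.

```csharp
using System;
using System.Collections.Generic;

public static class BlockSplitter
{
    public static IEnumerable<(int From, int To)> SplitInBlocks(IEnumerator<int> individual, int maxBlockSize)
    {
        //Since this is an iterator, the check only runs once enumeration starts
        if (maxBlockSize < 1)
            throw new ArgumentOutOfRangeException(nameof(maxBlockSize));

        if (!individual.MoveNext())
            yield break;

        int blockStart = individual.Current;
        int blockEnd = blockStart;

        while (individual.MoveNext())
        {
            int value = individual.Current;
            bool isContinuous = value == blockEnd + 1;
            bool hasRoom = blockEnd - blockStart + 1 < maxBlockSize;

            if (isContinuous && hasRoom)
            {
                //Extend the current block
                blockEnd = value;
                continue;
            }

            //Close the current block and start a new one
            yield return (blockStart, blockEnd);
            blockStart = value;
            blockEnd = value;
        }

        //The last block is still open at this point
        yield return (blockStart, blockEnd);
    }
}
```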

We had a situation where a large set of data contained some corruption. We didn’t know what kind, or where, or how much. All we had was the info that a certain operation, which takes all the data at once, failed. Splitting the data apart was easy, but testing for the problem could only be done by passing it into a method and seeing whether it fails. And that method unfortunately takes quite a bit of time, regardless of how much data is passed into it. The goal now was to find all defective parts of the data while keeping the number of checks to a minimum.

List<int> FindCorruptValues(List<int> values, Func<List<int>, bool> hasCorruptData)

Example:

values = [1, 4, 5, 10, 20, 22, 999, 1000] hasCorruptData = v => v.Any(i => i > 10 && i < 100) => [20, 22]

The obvious solution would be a form of binary search, just keep in mind that there can be several corrupt values, not just one!
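
A sketch of such a search (spoiler): split the data in halves and only descend into halves that fail the check. The class name CorruptionFinder and the Search helper are made up, and the sketch assumes that a set fails exactly when it contains at least one corrupt value. One check is saved whenever the first half turns out clean, because then the second half has to be the corrupt one.

```csharp
using System;
using System.Collections.Generic;

public static class CorruptionFinder
{
    public static List<int> FindCorruptValues(List<int> values, Func<List<int>, bool> hasCorruptData)
    {
        var corrupt = new List<int>();
        if (values.Count > 0 && hasCorruptData(values))
            Search(values, hasCorruptData, corrupt);

        return corrupt;
    }

    //Must only be called with values that are known to contain corruption
    private static void Search(List<int> values, Func<List<int>, bool> hasCorruptData, List<int> corrupt)
    {
        if (values.Count == 1)
        {
            corrupt.Add(values[0]);
            return;
        }

        int half = values.Count / 2;
        List<int> left = values.GetRange(0, half);
        List<int> right = values.GetRange(half, values.Count - half);

        bool leftIsCorrupt = hasCorruptData(left);
        if (leftIsCorrupt)
            Search(left, hasCorruptData, corrupt);

        //If the left half is clean, the right half must be corrupt and needs no check
        if (!leftIsCorrupt || hasCorruptData(right))
            Search(right, hasCorruptData, corrupt);
    }
}
```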

This last one isn’t exactly an abstract issue, but it is an interesting problem that I had to solve at one time. We have a system that has to trigger certain actions from time to time. To control when exactly this happens, we use cron syntax, which is essentially a filter that gives you a yes or no for a specific time. What I needed to know, however, was how long the system is going to sleep before triggering again.

This one won’t get any examples, I think it’s pretty clear what to expect.
I recommend you create your own cron implementation before solving the problem. The syntax isn’t hard to parse, and it will help you greatly in understanding exactly what values and combinations are possible. Also, try to find a solution other than simply checking every minute until one matches.

Semaphore implementation with adjustable concurrency limit

Posted on February 23, 2022 by Wubbi

A Semaphore can be used to block threads from continuing execution while another thread uses specific code or resources. But unlike a lock or Monitor, they can allow more than one thread access at a time.
For that purpose the existing implementation receives a limit when initializing, which represents the maximum number of threads that may “enter” before any thread has to wait. This limit cannot be changed afterwards, which is a feature I recently needed.
My solution was my own Semaphore class that mimics the behavior, but without me having to worry about the details of how to safely and efficiently block a thread until the counter ticks down.

/// <summary>
/// A simple alternative to <see cref="Semaphore"/> that allows for changes to the thread limit
/// </summary>
public class VariableLimitSemaphore : IDisposable
{
    private readonly EventWaitHandle _waitHandle;
    private readonly object _entryLock;
    private readonly object _counterLock;
    private int _limit;
    private int _counter;
 
    /// <summary>
    /// The current amount of threads that have been granted entry
    /// </summary>
    public int CurrentCounter
    {
        get
        {
            lock (_counterLock)
                return _counter;
        }
    }
 
    /// <summary>
    /// The maximum number of threads allowed entry
    /// </summary>
    public int Limit
    {
        get
        {
            lock (_counterLock)
                return _limit;
        }
        set
        {
            if (value < 1)
                throw new ArgumentOutOfRangeException(nameof(value));
 
            lock (_counterLock)
            {
                _limit = value;
                if (_limit <= _counter)
                    _waitHandle.Reset();
            }
        }
    }
 
    /// <summary>
    /// Creates a new <see cref="VariableLimitSemaphore"/>
    /// </summary>
    /// <param name="initialLimit">The initial limit for concurrent threads</param>
    /// <exception cref="ArgumentOutOfRangeException"><paramref name="initialLimit"/> is less than 1</exception>
    public VariableLimitSemaphore(int initialLimit)
    {
        if (initialLimit < 1)
            throw new ArgumentOutOfRangeException(nameof(initialLimit));
 
        _limit = initialLimit;
        _counter = 0;
 
        _waitHandle = new EventWaitHandle(true, EventResetMode.AutoReset);
        _entryLock = new object();
        _counterLock = new object();
    }
 
    /// <summary>
    /// Blocks the current thread until entry is permitted
    /// </summary>
    public void Wait()
    {
        lock (_entryLock)
        {
            _waitHandle.WaitOne();
            lock (_counterLock)
            {
                if (++_counter < _limit)
                    _waitHandle.Set();
            }
        }
    }
 
    /// <summary>
    /// Frees up a single entry for use by another (waiting) thread
    /// </summary>
    public void Release()
    {
        lock (_counterLock)
        {
            if (--_counter < _limit)
                _waitHandle.Set();
        }
    }
 
    /// <inheritdoc />
    public void Dispose()
    {
        _waitHandle?.Dispose();
    }
}

The core of this is an EventWaitHandle, which is used to let threads pass in a controlled manner, one by one. Each time a thread “enters” it increments the counter, compares it to the limit, and lets another thread enter if the limit allows it.
Similarly, when a thread “leaves”, the counter is decremented, compared with the limit and another thread might be granted access.
The only tricky part was eliminating race conditions. For example, if I didn’t use _entryLock to queue up threads even before the WaitHandle, threads could enter, pause before incrementing the counter, and that way make another thread that is leaving believe that there is extra space for yet another thread.

Things you might want to change are overloads for Wait, to include cancellation or timeouts.
Also, this implementation starts accepting threads as soon as it’s initialized, but could easily be extended for more options.
Furthermore, if you have a look at the System.Threading.Semaphore source code, you’ll notice a lot of work went into making sure that thing runs reliably. That is a lot of consideration that I didn’t put into this. This is enough for my simple use cases, but I wouldn’t trust it with critical code!

QueuePacked

Programming and Software Development

Author: Wubbi
