List

Posted on January 19, 2020June 18, 2020 by Wubbi

The following references the implementation of System.Collections.Generic.List<T> in the .NET Framework 4.8, as it is shown here: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,cf7f4095e4de7646
The code below is simplified to show the ideas. No verification, exception handling, optimizations, etc. In other words: unsafe.
Please do not use in your project as is.

What problem is List trying to solve?
Basically, the fixed size of arrays.
To do so, List uses an array to store data and offers convenient methods to copy it into a larger arrays, should the old one not have enough space anymore. Simple, right?
Right.
No, really. There’s no catch here. It really is that simple.

However, that doesn’t mean you can’t benefit from knowing some of the implementation details!
So let’s create our own version of List and try to see how the “official” version handles different problems.

We’re going to start with a naive approach:

public class List<T>
{
    private T[] _container;
 
    public T this[int index]
    {
        get => _container[index];
        set => _container[index] = value;
    }
 
    public int Size => _container.Length;
 
    public List()
    {
        _container = new T[0];
    }
 
    public void Add(T item)
    {
        T[] newContainer = new T[_container.Length + 1];
 
        Array.Copy(_container, newContainer, _container.Length);
 
        newContainer[_container.Length] = item;
 
        _container = newContainer;
    }
}

Technically it works, but allocating space for a new array takes time.
Since we’re probably going to add quite a few items over the lifetime of this List, we should always allocate a bit more space than needed, so we have a buffer that can fill up instead without bothering the memory so much.

This is the first detail you might be interested in:
How much extra space do we allocate?
If we take too much, we’re wasting memory.
If we take too little, we might need to allocate space again, wasting time (and for a brief time occupy all the old and new memory).

In a specific use case you might be able to predict the usage of your List, in which case you can benefit from your own implementation.
For a general purpose library however, there is no ideal answer.

The Framework has a rather straightforward solution:
Need more space? Double what you have!

public class List<T>
{
    private T[] _container;
 
    private int _usedSpace;
 
    public T this[int index]
    {
        get => _container[index];
        set => _container[index] = value;
    }
 
    public int Size => _usedSpace;
 
    public List()
    {
        _container = new T[0];
        _usedSpace = 0;
    }
 
    public void Add(T item)
    {
        if (_usedSpace < _container.Length)
        {
            _container[_usedSpace++] = item;
            return;
        }
 
        T[] newContainer = new T[_usedSpace < 1 ? 1 : _usedSpace * 2];
 
        Array.Copy(_container, newContainer, _container.Length);
 
        newContainer[_usedSpace++] = item;
 
        _container = newContainer;
    }
}

That’s it. A new field is needed to keep track of how much of the array is actually used, and at which index the buffer starts.
Don’t forget our List might contain 0 items. Add a minimum initial size for adding items!
(The Framework uses a default of 4 items, by the way)

On a different note, did you notice how we move the data between arrays?
The Framework uses the same approach. Pretty much every operation is using Array.Copy.

One more thing that doesn’t come up often:
How do we handle adding/removing several items at once?
The official List has two (and a half) methods that accept IEnumerable. What happens to those?

The important part first: The default approach is to enumerate through all items and process each as it’s own thing. This way isn’t exactly efficient, but in some cases there is no other way (when you can’t enumerate again and don’t know how many additions you’re going to make).

However, if your IEnumerable is actually a type of ICollection, things get easier. Now we have access to Count, which means we can plan ahead. At most we need to create a bigger array once and can use a single Array.Copy to transfer the data.

public void Add(T item)
{
    IncreaseSize(1);
 
    _container[_usedSpace++] = item;
}
 
public void Add(IEnumerable<T> items)
{
    if (items is ICollection<T> collection)
    {
        if (collection.Count == 0)
            return;
 
        IncreaseSize(collection.Count);
 
        collection.CopyTo(_container, _usedSpace);
 
        _usedSpace += collection.Count;
    }
    else
    {
        foreach (T item in items)
            Add(item);
    }
}
 
private void IncreaseSize(int additionalSpaceNeeded)
{
    if (_usedSpace + additionalSpaceNeeded <= _container.Length)
        return;
 
    int newSize = _container.Length * 2;
 
    if (_usedSpace + additionalSpaceNeeded > newSize)
        newSize = _usedSpace + additionalSpaceNeeded;
 
    T[] newContainer = new T[newSize];
 
    Array.Copy(_container, newContainer, _container.Length);
 
    _container = newContainer;
}

As you can see, we’ve added an overload for Add that accepts an IEnumerable, as well as a dedicated method for increasing our container space (to avoid duplicate code).

Inserting items at specific positions or removing them is basically just moving the values inside the container back and forth using Array.Copy, nothing fancy to see here.

And with that you already have a simple List.
The official version has of course more than this.
That means on one side more care for special cases, like the size growing beyond the limit of an int. On the other it includes fancy methods for searching and sorting – which you might find are being delegated to Array, since under the hood you’re using one anyway.

Conclusion:
The List implementation is rather straightforward. No big surprises or mind benders.
Personally, what i took away from it are the growth rate, and how it can pay off to use the TrimExcess method to keep it in check, as well as a renewed confidence in AddRange.

Tidbits

Semantic Versioning is amazing!

Posted on November 28, 2019June 28, 2020 by Wubbi

This post will be nothing but me repeating what’s already beeing said on semver.org, so have a look over there. If you want me to tell you how incredibly useful it is and why you definitely (maybe) should use it, you can still come back later to get a quick reminder.

Sematic Versioning is a set of simple rules that tell you how to change the version of whatever library you’re developing, depending on the kind of changes you’re making. Though while libraries are where it shines, i did find myself using it for fully fledged programs before, in sligthly adapted forms (but for similar reasons).
To explain these rules we’re going to use an example: A calculator.
Our initial code might look like this

public static class Calc
{
    public static int Add(int a, int b)
    {
        return a + b;
    }
 
    public static int Sub(int a, int b)
    {
        return a + b;
    }
}

Nothing but a static class with two static methods for addition and substraction.
Now, with Semantic Versioning, this might be our version 1.0.0 (we’re going to ignore suffixes for now). Assume this to be the state you’re releasing your library in, which will now get used by people and integrated into their own projects.

Even if “people” includes only you yourself, proper versioning can still prevent a lot of frustration. Having more people involved makes it only more useful.

But what’s this? Your substraction performs an addition instead? No time to figure out why there was no automated test to catch this obvious error before. We need to fix this and release a newer version!

public static class Calc
{
    public static int Add(int a, int b)
    {
        return a + b;
    }
 
    public static int Sub(int a, int b)
    {
        return a - b;
    }
}

There, much better. Now we need to make this public in a way that shows others, that there was a change – we need to update our version.
While our change had a huge impact on what our library can do, it doesn’t require anyone to change how they use it. It’s a fix to restore behaviour to what was expected already, meaning anybody using our library will have included it in a way matching our update. Therefore we make the least significant change to our version number possible: 1.0.1
Simple, right?

After some time passed, and more people play arround with your library, you might get a request for additional features. And you’re happy to add them, beeing unsatisfied with your meager two methods yourself.

public static class Calc
{
    public static int Add(int a, int b)
    {
        return a + b;
    }
 
    public static int Sub(int a, int b)
    {
        return a - b;
    }
 
    public static int Mul(int a, int b)
    {
        return a * b;
    }
 
    public static int Div(int a, int b)
    {
        return a / b;
    }
}

Wow, two whole new features. People are gonna love this! Of course we need to update the version, too. This one’s a bit different though:
To make use of the new features, someone using your library needs to make changes to their own code as well. So we need to mark this update as something bigger than just a fix: 1.1.0

And why stop here? There are plenty of operations waiting to be added! Modulo? 1.2.0. Sinus? 1.3.0. Messed up the sinus and made a fix for it? 1.3.1.

At some point however, you’ll look back at your library and notice it blowing up in the wrong way. You can simplify it, maybe add options for custom operations, overall clean up your code. Time to refactor!
And you start by unifying all those methods from before.

public static class Calc
{
    public static int Calculate(IOperation operation, int a, int b)
    {
        return operation.GetResult(a, b);
    }
}
 
public interface IOperation
{
    int GetResult(params int[] values);
}

Looks much better than all those static methods. Now people can write their own operations!
Unfortunately, unlike the changes we made before, this one can actually break whatever someone using your library made. That Add-method that they called before? It doesn’t exist anymore. Even if they get arround the compiler by loading your files at runtime, it will fail eventually.
To make sure they know to rewrite their things, even if they don’t plan to use any new features, we need to make a big change when updating our version: 2.0.0
With this, people can see they absolutely need to rewrite parts of their projects (there are exceptions of course, but those still require them to at least verify compatibility).

By the way, in a situation like this, where you retire parts of your code and add replacements, it’s a good idea to make a hybrid version first (if possible). This way people can quickly grab the newest update to your library without it causing issues, but it also gives them the new code they then can safely migrate to in the time it takes for your next release. Just make sure to mark your code as deprecated!

public static class Calc
{
    [Obsolete]
    public static int Add(int a, int b)
    {
        return a + b;
    }
 
    [Obsolete]
    public static int Sub(int a, int b)
    {
        return a - b;
    }
 
    [Obsolete]
    public static int Mul(int a, int b)
    {
        return a * b;
    }
 
    [Obsolete]
    public static int Div(int a, int b)
    {
        return a / b;
    }
 
    public static int Calculate(IOperation operation, int a, int b)
    {
        return operation.GetResult(a, b);
    }
}
 
public interface IOperation
{
    int GetResult(params int[] values);
}

All in all, with the most basic interpretation of Semantic Versioning, changes in the version (in the format Major.Minor.Patch) reflect these changes to the code:

Patch – Bugfixes, optimzation, etc.
None of what you did requires action by someone using your library.

Minor – New features
Your old code does the same as always, but new things are available. Users only need to make changes to their code if they want to use any of it.

Major – Removed or changed functionality
You removed things that people used or had direct access to before, or something changed the way it behaves.
Existing code (that depends on your library) requires rewrites to work with these changes.

Something i didn’t mention at all before:
Additional markers to differentiate various stages in your development cycle may be added to the version number. For example 1.0.0-beta might sound like a full release going by the numbers alone, but the suffix tells us it’s not. Instead your first release might be 1.0.1-release.
Basically, since the numbers tell us how the code changed, but not the more abstract state of your library as a whole, we can add some better qualifiers in the end. Suffixes can also contain build numbers, should you prefer to include them.

To close this off, let us have look at Semantic Versioning from the other side:

Imagine yourself implementing a library into your own project. Everything worked, you’re happy with it as is and you stop thinking about it. At some point you encounter a bug and want to update to a newer version. But several releases happend since you last checked. Which one do you use, assuming you want to release an update of your own as soon as possible?
You could read through the changelogs (which is a good idea anyway) to find out what version is still compatible with your code. Timeconsuming and not necessarily reliable.
But if they use Semantic Versioning it’s easy – just take the newest version that still falls into the Major group you’re currently using. Done.
You can still look at the changelogs up to that version, and maybe you’ll find interesting features you want to try. If there are Major releases you can also start planning the migration, while already releasing an updated version of your own project.

Semantic Versioning becomes even more important if you want to automate updates (to a certain degree). What if the library isn’t added before compiling, while you still have the control, but sometime later, when it is used by someone else? This might be the case if it is loaded dynamically at runtime. Or we might be talking about NuGet.

When you create a NuGet package which itself depends on other NuGet packages, you need to make sure whoever installs it has the right version of your dependencies.
Of course you could require them to use a very specific version only. In that case, if there is an update of the package you depend upon, you’d need to make a new version of your own package as well to reflect the newer dependency, even if you didn’t change anything in your code.
The better solution is to allow a range of versions, including ones that haven’t been released yet. The most simple case would be allowing versions starting at the one you used, up to any bigger version in the same Major group.

So that’s Semantic Versioning. Pretty useful, isn’t it? And you can apply the idea to other things as well.
What if your program allows plugins? You can leave the programs own version in whatever format you like, but give the interface a version that plugins can read and evaluate, to see if they still work.
Or maybe add a semantic version to your custom file format, so different versions of a program know whether or not they can read from or write to a file.
You could even apply it to a fully fledged program, where it is intended to inform the enduser about the scope of changes: Bugfix? Added any new features? Any menus gone?

Just be creative!

Uncategorized

Let’s give it a try, i guess?

Posted on November 23, 2019March 24, 2020 by Wubbi

The temptation was too big, the entry barrier too low. So….

I always come across interesting things when tinkering with my little projects, or when a certain video hosting platform recommends yet another talk from whatever conference was held the week before.

And whenever i find something new to play with, i want to share it with others. Tell them how much i enjoy my new toy.

Unfortunately the chances to talk about this kind of stuff are surprisingly rare. Pestering random strangers (or people i care about) with my ramblings is not the way to go…..
…unless those random strangers are on the internet. No need to “read the room” if there is no room!

In any case, i’m hoping to write down some of my thoughts and findings on this site. For now it’s more of an experiment to see how much time i’d actually invest in it.

QueuePacked

QueuePacked

Programming and Software Development

Author: Wubbi

Semantic Versioning is amazing!

Let’s give it a try, i guess?