Explained

List

The following references the implementation of System.Collections.Generic.List<T> in the .NET Framework 4.8, as it is shown here: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,cf7f4095e4de7646
The code below is simplified to show the ideas. No verification, exception handling, optimizations, etc. In other words: unsafe.
Please do not use in your project as is.


What problem is List trying to solve?
Basically, the fixed size of arrays.
To do so, List uses an array to store data and offers convenient methods to copy it into a larger arrays, should the old one not have enough space anymore. Simple, right?
Right.
No, really. There’s no catch here. It really is that simple.

However, that doesn’t mean you can’t benefit from knowing some of the implementation details!
So let’s create our own version of List and try to see how the “official” version handles different problems.


We’re going to start with a naive approach:

public class List<T>
{
    private T[] _container;
 
    public T this[int index]
    {
        get => _container[index];
        set => _container[index] = value;
    }
 
    public int Size => _container.Length;
 
    public List()
    {
        _container = new T[0];
    }
 
    public void Add(T item)
    {
        T[] newContainer = new T[_container.Length + 1];
 
        Array.Copy(_container, newContainer, _container.Length);
 
        newContainer[_container.Length] = item;
 
        _container = newContainer;
    }
}

Technically it works, but allocating space for a new array takes time.
Since we’re probably going to add quite a few items over the lifetime of this List, we should always allocate a bit more space than needed, so we have a buffer that can fill up instead without bothering the memory so much.

This is the first detail you might be interested in:
How much extra space do we allocate?
If we take too much, we’re wasting memory.
If we take too little, we might need to allocate space again, wasting time (and for a brief time occupy all the old and new memory).

In a specific use case you might be able to predict the usage of your List, in which case you can benefit from your own implementation.
For a general purpose library however, there is no ideal answer.

The Framework has a rather straightforward solution:
Need more space? Double what you have!

public class List<T>
{
    private T[] _container;
 
    private int _usedSpace;
 
    public T this[int index]
    {
        get => _container[index];
        set => _container[index] = value;
    }
 
    public int Size => _usedSpace;
 
    public List()
    {
        _container = new T[0];
        _usedSpace = 0;
    }
 
    public void Add(T item)
    {
        if (_usedSpace < _container.Length)
        {
            _container[_usedSpace++] = item;
            return;
        }
 
        T[] newContainer = new T[_usedSpace < 1 ? 1 : _usedSpace * 2];
 
        Array.Copy(_container, newContainer, _container.Length);
 
        newContainer[_usedSpace++] = item;
 
        _container = newContainer;
    }
}

That’s it. A new field is needed to keep track of how much of the array is actually used, and at which index the buffer starts.
Don’t forget our List might contain 0 items. Add a minimum initial size for adding items!
(The Framework uses a default of 4 items, by the way)

On a different note, did you notice how we move the data between arrays?
The Framework uses the same approach. Pretty much every operation is using Array.Copy.


One more thing that doesn’t come up often:
How do we handle adding/removing several items at once?
The official List has two (and a half) methods that accept IEnumerable. What happens to those?

The important part first: The default approach is to enumerate through all items and process each as it’s own thing. This way isn’t exactly efficient, but in some cases there is no other way (when you can’t enumerate again and don’t know how many additions you’re going to make).

However, if your IEnumerable is actually a type of ICollection, things get easier. Now we have access to Count, which means we can plan ahead. At most we need to create a bigger array once and can use a single Array.Copy to transfer the data.

public void Add(T item)
{
    IncreaseSize(1);
 
    _container[_usedSpace++] = item;
}
 
public void Add(IEnumerable<Titems)
{
    if (items is ICollection<T> collection)
    {
        if (collection.Count == 0)
            return;
 
        IncreaseSize(collection.Count);
 
        collection.CopyTo(_container_usedSpace);
 
        _usedSpace += collection.Count;
    }
    else
    {
        foreach (T item in items)
            Add(item);
    }
}
 
private void IncreaseSize(int additionalSpaceNeeded)
{
    if (_usedSpace + additionalSpaceNeeded <= _container.Length)
        return;
 
    int newSize = _container.Length * 2;
 
    if (_usedSpace + additionalSpaceNeeded > newSize)
        newSize = _usedSpace + additionalSpaceNeeded;
 
    T[] newContainer = new T[newSize];
 
    Array.Copy(_container, newContainer, _container.Length);
 
    _container = newContainer;
}

As you can see, we’ve added an overload for Add that accepts an IEnumerable, as well as a dedicated method for increasing our container space (to avoid duplicate code).


Inserting items at specific positions or removing them is basically just moving the values inside the container back and forth using Array.Copy, nothing fancy to see here.

And with that you already have a simple List.
The official version has of course more than this.
That means on one side more care for special cases, like the size growing beyond the limit of an int. On the other it includes fancy methods for searching and sorting – which you might find are being delegated to Array, since under the hood you’re using one anyway.


Conclusion:
The List implementation is rather straightforward. No big surprises or mind benders.
Personally, what i took away from it are the growth rate, and how it can pay off to use the TrimExcess method to keep it in check, as well as a renewed confidence in AddRange.

Leave a comment