C# LINQ - Under the hood

LINQ is a powerful tool. To use it to its full potential, you need to understand how it works under the hood.

Introduction to LINQ

LINQ stands for Language Integrated Query. It enables SQL-like query syntax on data collections in C#. Filtering, projection, sorting, etc. can be done using either the query syntax (which looks a lot like SQL) or the method syntax. LINQ was introduced in .NET Framework 3.5, in the System.Linq namespace.
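To make the two syntaxes concrete, here is a minimal sketch that writes the same query both ways. The class and method names (SyntaxDemo, EvenSquaresQuery, EvenSquaresMethod) are made up for illustration and are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: the same query in both LINQ syntaxes.
public static class SyntaxDemo
{
    // Query syntax: reads like SQL; the compiler translates it into method calls.
    public static List<int> EvenSquaresQuery(IEnumerable<int> numbers)
    {
        var query = from n in numbers
                    where n % 2 == 0
                    orderby n descending
                    select n * n;
        return query.ToList();
    }

    // Method syntax: extension methods chained together with lambdas.
    public static List<int> EvenSquaresMethod(IEnumerable<int> numbers)
    {
        return numbers
            .Where(n => n % 2 == 0)
            .OrderByDescending(n => n)
            .Select(n => n * n)
            .ToList();
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(", ", EvenSquaresQuery(numbers)));  // 36, 16, 4
        Console.WriteLine(string.Join(", ", EvenSquaresMethod(numbers))); // 36, 16, 4
    }
}
```

Both forms produce the same result, so which one to use is mostly a readability choice.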

A dive into LINQ

Do all C# collection types support LINQ?

LINQ works with any type that implements IEnumerable<T>, such as List<T>, arrays, and Dictionary<TKey, TValue>. Most of these collections reside in the System.Collections.Generic namespace (arrays derive from System.Array).

Knowing some specifics about the different collections can be helpful. For instance, List<T> implements both IEnumerable<T> and ICollection<T>; ICollection<T> adds members such as Add, Remove, Contains, and Count.

Performance considerations with LINQ

Most LINQ operators use lazy (deferred) evaluation: defining a query does not run it. Let's take a look at the following example:

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public int ExecCount { get; set; } = 0;
    public static void Main()
    {
        Program program = new Program();
        program.Execute();
    }

    public void Execute()
    {
        IEnumerable<int> numbers = new List<int>(){1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        var evenNumbers = numbers.Where(isDivisibleByTwo);

        Console.WriteLine($"1. isDivisibleByTwo is executed {ExecCount} times.");

        ExecCount = 0;
        int first = evenNumbers.First();
        Console.WriteLine($"2. isDivisibleByTwo is executed {ExecCount} times, First value is {first}.");

        ExecCount = 0;
        IEnumerable<int> threeEvenNumbers = evenNumbers.Take(3);
        Console.WriteLine($"3.a. isDivisibleByTwo is executed {ExecCount} times.");
        IList<int> evaluatedThreeEvenNumbers = threeEvenNumbers.ToList();
        Console.WriteLine($"3.b. isDivisibleByTwo is executed {ExecCount} times, evaluatedThreeEvenNumbers count is {evaluatedThreeEvenNumbers.Count}.");

        ExecCount = 0;
        IList<int> evaluatedEvenNumbers = evenNumbers.ToList();
        Console.WriteLine($"4. isDivisibleByTwo is executed {ExecCount} times, evaluatedEvenNumbers count is {evaluatedEvenNumbers.Count}.");
    }

    public bool isDivisibleByTwo(int number)
    {
        ExecCount++;
        return number % 2 == 0;
    }
}

// The output of this program is:
// 1. isDivisibleByTwo is executed 0 times.
// 2. isDivisibleByTwo is executed 2 times, First value is 2.
// 3.a. isDivisibleByTwo is executed 0 times.
// 3.b. isDivisibleByTwo is executed 6 times, evaluatedThreeEvenNumbers count is 3.
// 4. isDivisibleByTwo is executed 10 times, evaluatedEvenNumbers count is 5.

The code above takes a list of numbers (1 to 10) and defines a query that filters the even numbers. The predicate used to filter this list is called isDivisibleByTwo.

1. isDivisibleByTwo is executed 0 times: the predicate was not executed at all because nothing has requested data from the query yet.

2. isDivisibleByTwo is executed 2 times, First value is 2: First() needs only one item from the query. The first even number is the second element of the numbers list, which is why the predicate was executed twice.

3.a. isDivisibleByTwo is executed 0 times: Take() is deferred as well, so calling it does not execute the predicate.

3.b. isDivisibleByTwo is executed 6 times, evaluatedThreeEvenNumbers count is 3: ToList() evaluates the query, and it takes six predicate executions to find the first three even numbers.

4. isDivisibleByTwo is executed 10 times, evaluatedEvenNumbers count is 5: ToList() re-evaluates the whole query from the start. Results from the earlier enumerations are not cached.

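Tip: avoid calling ToList() or ToArray() just to assign the result of a LINQ query to a list or array when you do not need one, because these methods evaluate the whole query. Conversely, a deferred query runs again every time it is enumerated, so materialize it once if you need to go through the results several times.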
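The cost of re-enumerating a deferred query can be sketched as follows. As point 4 of the example shows, each enumeration re-runs the predicate; the class and member names here (MaterializeDemo, PredicateCalls, and so on) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: compares enumerating a deferred query twice
// with materializing it once.
public static class MaterializeDemo
{
    public static int PredicateCalls;

    private static bool IsEven(int n)
    {
        PredicateCalls++;
        return n % 2 == 0;
    }

    // The deferred query is enumerated twice (Sum, then Max),
    // so the predicate runs twice per source element.
    public static int CallsWhenEnumeratedTwice()
    {
        PredicateCalls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 10).Where(IsEven);
        _ = query.Sum();
        _ = query.Max();
        return PredicateCalls;
    }

    // ToList() runs the predicate once per element; Sum and Max then
    // read the in-memory list without touching the predicate again.
    public static int CallsWhenMaterializedOnce()
    {
        PredicateCalls = 0;
        List<int> evens = Enumerable.Range(1, 10).Where(IsEven).ToList();
        _ = evens.Sum();
        _ = evens.Max();
        return PredicateCalls;
    }

    public static void Main()
    {
        Console.WriteLine(CallsWhenEnumeratedTwice());  // 20
        Console.WriteLine(CallsWhenMaterializedOnce()); // 10
    }
}
```

Whether to materialize is a trade-off: ToList() costs memory and a full pass up front, while a deferred query costs a new pass on every enumeration.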

Any special cases?

Count() generally has to evaluate the whole query to know the item count. However, it is optimized to read the Count property directly when the source implements ICollection<T>.

There is no point in listing all the LINQ methods here; that knowledge is best acquired with practice.
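One way to observe the Count() shortcut is to wrap a list in a collection that records how often it is enumerated. The sketch below assumes a made-up read-only wrapper, CountingCollection; it is not a framework type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only collection that counts calls to GetEnumerator().
public sealed class CountingCollection : ICollection<int>
{
    private readonly List<int> _items;

    public CountingCollection(IEnumerable<int> items) { _items = new List<int>(items); }

    public int EnumerationCount { get; private set; }
    public int Count => _items.Count;
    public bool IsReadOnly => true;

    public IEnumerator<int> GetEnumerator()
    {
        EnumerationCount++;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public bool Contains(int item) => _items.Contains(item);
    public void CopyTo(int[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
}

public static class CountDemo
{
    public static void Main()
    {
        var collection = new CountingCollection(Enumerable.Range(1, 10));

        // The source is an ICollection<int>, so Count() reads the Count property.
        int total = collection.Count();
        Console.WriteLine($"{total} items, enumerated {collection.EnumerationCount} times");
        // 10 items, enumerated 0 times

        // A filtered query has no stored count, so the predicate runs for every item.
        int calls = 0;
        int evens = collection.Where(n => { calls++; return n % 2 == 0; }).Count();
        Console.WriteLine($"{evens} even items, predicate ran {calls} times");
        // 5 even items, predicate ran 10 times
    }
}
```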

How LINQ is implemented internally

Microsoft did not design LINQ with the expectation that developers would know its internals: developers express their intent through queries, and LINQ takes care of a certain level of optimization.

⚠️ This is a popular interview question!!!

A plain return statement inside a for or foreach loop exits the loop along with the enclosing method. With yield return, the method becomes an iterator: it hands one item to the caller, pauses, and resumes from the same spot when the next item is requested. Below is the previous example tweaked to use a custom definition of the Where method:

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public int ExecCount { get; set; } = 0;
    public static void Main()
    {
        Program program = new Program();
        program.Execute();
    }

    public void Execute()
    {
        IEnumerable<int> numbers = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        var evenNumbers = numbers.customWhere(isDivisibleByTwo);

        Console.WriteLine($"1. isDivisibleByTwo is executed {ExecCount} times.");

        ExecCount = 0;
        int first = evenNumbers.First();
        Console.WriteLine($"2. isDivisibleByTwo is executed {ExecCount} times, First value is {first}.");

        ExecCount = 0;
        IEnumerable<int> threeEvenNumbers = evenNumbers.Take(3);
        Console.WriteLine($"3.a. isDivisibleByTwo is executed {ExecCount} times.");
        IList<int> evaluatedThreeEvenNumbers = threeEvenNumbers.ToList();
        Console.WriteLine($"3.b. isDivisibleByTwo is executed {ExecCount} times, evaluatedThreeEvenNumbers count is {evaluatedThreeEvenNumbers.Count}.");

        ExecCount = 0;
        IList<int> evaluatedEvenNumbers = evenNumbers.ToList();
        Console.WriteLine($"4. isDivisibleByTwo is executed {ExecCount} times, evaluatedEvenNumbers count is {evaluatedEvenNumbers.Count}.");
    }

    public bool isDivisibleByTwo(int number)
    {
        ExecCount++;
        return number % 2 == 0;
    }
}

public static class LinqExtensions
{
    public static IEnumerable<T> customWhere<T>(this IEnumerable<T> sourceCollection, Predicate<T> predicate)
    {
        foreach (T item in sourceCollection)
            if (predicate(item))
                yield return item;
    }
}

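LINQ methods are extension methods on IEnumerable<T>, defined in the static System.Linq.Enumerable class. Note that the built-in Where takes a Func<TSource, bool> rather than the Predicate<T> used by customWhere above.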
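To see how an iterator method pauses and resumes, the following sketch steps through one by hand with an enumerator. YieldDemo, CountToThree and Log are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: records which parts of an iterator have run.
public static class YieldDemo
{
    public static readonly List<string> Log = new List<string>();

    // Iterator method: the body only runs while a caller is pulling items.
    public static IEnumerable<int> CountToThree()
    {
        Log.Add("start");
        yield return 1;
        Log.Add("after 1");
        yield return 2;
        Log.Add("after 2");
        yield return 3;
        Log.Add("end");
    }

    public static void Main()
    {
        IEnumerable<int> sequence = CountToThree();
        Console.WriteLine(Log.Count); // 0, calling the method ran none of its body

        using (IEnumerator<int> enumerator = sequence.GetEnumerator())
        {
            enumerator.MoveNext(); // runs the body up to the first yield return
            Console.WriteLine($"{enumerator.Current}: {string.Join(" | ", Log)}");
            // 1: start

            enumerator.MoveNext(); // resumes right after the first yield return
            Console.WriteLine($"{enumerator.Current}: {string.Join(" | ", Log)}");
            // 2: start | after 1
        }
    }
}
```

This pull-based behavior is what lets First() and Take(3) in the examples above stop early: they simply stop asking for more items.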

Parallel LINQ or PLINQ

As the name suggests, PLINQ allows for parallel query execution and is a great feature to use for performance-focused projects. This MSDN article provides a great starting point to learn how to use PLINQ.
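As a minimal sketch of what PLINQ looks like in practice, the example below counts primes sequentially and in parallel; the only change is the AsParallel() call. PlinqDemo and its methods are illustrative names, and any real speed-up depends on the workload and the machine.

```csharp
using System;
using System.Linq;

// Hypothetical demo class: the same query run sequentially and with PLINQ.
public static class PlinqDemo
{
    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; (long)i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // Regular LINQ to Objects: runs on the calling thread.
    public static int CountPrimesSequential(int max) =>
        Enumerable.Range(1, max).Count(IsPrime);

    // PLINQ: AsParallel() lets the runtime split the work across threads.
    public static int CountPrimesParallel(int max) =>
        Enumerable.Range(1, max).AsParallel().Count(IsPrime);

    public static void Main()
    {
        Console.WriteLine(CountPrimesSequential(100)); // 25
        Console.WriteLine(CountPrimesParallel(100));   // 25
    }
}
```

Counting is order-independent, so it parallelizes cleanly. For queries where the order of results matters, PLINQ does not preserve source order unless AsOrdered() is added.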

Final words

LINQ is a strong selling point for C#, and I consider it a GOAT feature. It also explains why C# developer interviews focus on this subject.

Thanks for reading!

I hope this article was a one-stop shop for you to start using LINQ comfortably. Feel free to add a Reaction, comment and subscribe to the newsletter for more similar content!

Support Sami Mejri by becoming a sponsor. Any amount is appreciated!