jehugaleahsa/LTT.md

Learn to Love LINQ

I've been using LINQ for years and I've come to deeply respect it as a technology. Unfortunately, it has not been adopted by the .NET community as much as I would have expected. I honestly believe it is the single feature that gives C# a competitive edge over many of the other general-purpose languages out there today. This document will go over some of the tricks I have learned to make the most of LINQ's powerful features. Hopefully you'll learn at least one trick along the way.
Query vs Method syntax

Yup, you can write your queries as a series of method calls or you can use the query syntax. Method syntax is fine when you're first starting out, but query syntax is more expressive in the long run. My experience has been that operations such as where, select, join, group by and order by should be expressed using query syntax and "reduce" operations should be done in method syntax. Unavoidably, you sometimes have to use method calls inside of your queries simply because there's no equivalent query syntax. When you finally need to generate a result or jump back into boring imperative programming style, you must use a LINQ method call (or a foreach) to get your results back.
I prefer to write my LINQ like this:
var query = from customer in context.Customers
            where customer.CustomerId == customerId
            select customer;
var result = query.SingleOrDefault();
return result;
rather than like this:
var customer = context.Customers
    .Where(c => c.CustomerId == customerId)
    .SingleOrDefault();
return customer;
Some people don't like to switch to query syntax until the query gets past a certain length. I like to start out with query syntax and only use method calls at the end, no matter how much more typing it involves. Query syntax is often more verbose but also more expressive and readable. It also makes it easier to transition an initially simplistic query into something more complicated.
Use multiple wheres to break up long conditionals

Another controversial practice is using separate where clauses rather than using && everywhere. First of all, combining conditionals with && usually results in long lines of code. With LINQ's already heavily indented syntax, keeping filters short keeps the code readable. Moreover, it emphasizes the declarative/functional style of programming: applying filter/map/reduce to a stream. Unfortunately, there are plenty of Stack Overflow posts about the minute performance differences between the two, and apparently a billionth of a second justifies using &&. Pfft. I still prefer:
var query = from customer in customers
            where customer.IsActive
            where customer.ActivationDate >= new DateTime(2017, 01, 01)
            select customer;
over this:
var query = from customer in customers
            where customer.IsActive && customer.ActivationDate >= new DateTime(2017, 01, 01)
            select customer;
Unfortunately, this syntactic sugar doesn't work if you need to || your conditions; those still have to live in a single where clause. Nonetheless, I don't consider it a breakdown in symmetry, just an annoyance.
Use where anywhere

In SQL, your queries must always take the form of a SELECT list, followed by FROM, some JOINs, a WHERE clause, GROUP BY, HAVING, ORDER BY and so on. There's not a lot of flexibility in the order in which those sections can appear. LINQ is very flexible and you can have where clauses just about anywhere. Not having to wait until the end of the query actually helps you to avoid forgetting conditions. Consider:
var query = from customer in customers
            where customer.CustomerId == 123
            from order in customer.Orders
            where order.IsOpen
            select order.TotalAmount;
It would be really easy to forget to filter by the customer ID, here. Once your mind switches over to thinking about "orders" you aren't thinking about the "customer" anymore. With this approach, you can perform all your operations on the customer before moving on to orders. It's such a small thing but I'm sure it's saved me from hundreds of silly mistakes.
Uniqueness on multiple keys

Have you ever wanted to filter out duplicates using a property of your type? By default, LINQ's Distinct will use the default equality comparison for a type (using Equals and GetHashCode). Creating a custom IEqualityComparer<User> is a pain, but what else can you do? Try this:
var query = from user in users
            group user by user.UserName into unGroup
            select unGroup.FirstOrDefault();
This trick basically groups the users by UserName and takes the first item, eliminating any duplicates. You can add an orderby if you have a preference over which user is returned. The same trick is really useful when you need to determine uniqueness on multiple fields. The version below not only groups by two fields, but prefers active users over inactive ones:
var query = from user in users
            group user by new { Domain = user.Domain, UserName = user.UserName } into unGroup
            let orderedUsers = from user in unGroup
                               orderby user.IsActive descending
                               select user
            select orderedUsers.FirstOrDefault();
Using into and let

Notice that my previous example used the into keyword. This allowed me to name the results of the group by operation so I could continue to operate on the group (singular) further down the line (via order by). Similarly, the let keyword will also let you name a result so you can reference it later. A common need within queries is to see if any related entities satisfy a condition:
var query = from customer in customers
            let openOrders = from order in customer.Orders
                             where order.IsOpen
                             select order
            let hasOpenOrders = openOrders.Any()
            where hasOpenOrders
            select customer;
This intimidating query simply retrieves all customers with open orders. The let keyword allowed storing the results of a sub-query so we could then inspect them further down in the query (via Any). The alternative would be to use parentheses:
var query = from customer in customers
            where (from order in customer.Orders
                   where order.IsOpen
                   select order).Any()
            select customer;
This is bizarre looking and is harder to read. Another nice thing about let is that it can help to avoid unnecessary processing. In the example below, the sub-query isn't evaluated unless it needs to be:
var query = from customer in customers
            let activeOrders = from order in customer.Orders
                               where order.IsOpen
                               select order
            where customer.IsActive || activeOrders.Any()
            select customer;
Even though activeOrders appears first, it will not be evaluated unless the customer is inactive.
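To make that claim concrete for in-memory collections (LINQ to Objects), here is a minimal sketch that counts how many orders the sub-query actually inspects. The Customer and Order classes and the InspectIsOpen helper are hypothetical, made up for this demonstration. A database provider translates the whole query to SQL and makes its own decisions, so this only illustrates the in-memory behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model, just for this demonstration.
public class Order
{
    public bool IsOpen { get; set; }
}

public class Customer
{
    public bool IsActive { get; set; }
    public List<Order> Orders { get; } = new List<Order>();
}

public static class LetLazinessDemo
{
    // Counts how many orders the sub-query actually inspected.
    public static int OrdersInspected;

    private static bool InspectIsOpen(Order order)
    {
        OrdersInspected++;
        return order.IsOpen;
    }

    public static List<Customer> ActiveOrWithOpenOrders(IEnumerable<Customer> customers)
    {
        var query = from customer in customers
                    let openOrders = from order in customer.Orders
                                     where InspectIsOpen(order)
                                     select order
                    // || short-circuits, so openOrders is only enumerated
                    // when customer.IsActive is false.
                    where customer.IsActive || openOrders.Any()
                    select customer;
        return query.ToList();
    }
}
```

Passing one active customer and one inactive customer, the counter only moves for the inactive customer's orders, and even then Any() stops at the first open order it finds.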
DefaultIfEmpty and left-joins

There's a weird method in LINQ called DefaultIfEmpty. If you call it on a collection with items, it just returns the items. However, calling it on an empty collection results in a single value being returned. So,
new List<int>() { 1, 2, 3 }.DefaultIfEmpty(); // 1, 2, 3
new List<int>().DefaultIfEmpty(); // 0
You can provide a different "default" value if you want.
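As a small sketch of that overload, the hypothetical helper below (ScoresOrFallback is a name invented for this example) passes an explicit fallback to DefaultIfEmpty:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptyDemo
{
    // Returns the scores unchanged, or a single fallback value when there are none.
    public static int[] ScoresOrFallback(IEnumerable<int> scores, int fallback)
    {
        return scores.DefaultIfEmpty(fallback).ToArray();
    }
}
```

Calling ScoresOrFallback(new List<int>(), -1) yields a single -1, while ScoresOrFallback(new[] { 1, 2, 3 }, -1) yields 1, 2, 3 untouched.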
The main use of this method is to support "left joins". So, in SQL, a LEFT JOIN will return a row for each row in the right table matching a row in the left table; if it can't find a matching row in the right table, it just returns nulls for the columns in the right table. Therefore at least one row is returned for every row in the left table. Entity Framework abstracts away the fact that it is working against tabular data (relational data) and allows you to write queries in terms of objects. You don't have the combined columns from two tables; you have a pair of objects. Rather than null columns, you have a null object. Well, null is exactly what DefaultIfEmpty will return for reference types.
That's why this code works exactly like a LEFT JOIN:
var query = from customer in customers
            let customerOrders = from order in orders
                                 where order.CustomerId == customer.CustomerId
                                 select order
            from order in customerOrders.DefaultIfEmpty()
            select new { Customer = customer, Order = order };
Joins and alternatives

The more conventional way to write a left join is as follows:
var query = from customer in customers
            join order in orders on customer.CustomerId equals order.CustomerId into left
            from order in left.DefaultIfEmpty()
            select new { Customer = customer, Order = order };
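To see the left-join shape without a database, here is a minimal in-memory sketch of the same join. The Customer and Order classes and the Describe helper are assumptions made up for this example, not part of any real model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model, just for this demonstration.
public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
}

public static class LeftJoinDemo
{
    // Produces one line per customer/order pair; a customer without orders
    // still appears once, paired with a null order.
    public static List<string> Describe(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        var query = from customer in customers
                    join order in orders on customer.CustomerId equals order.CustomerId into left
                    from order in left.DefaultIfEmpty()
                    select customer.Name + ": " + (order == null ? "no orders" : "order " + order.OrderId);
        return query.ToList();
    }
}
```

With in-memory objects the null check on order is your job; forgetting it is the usual source of a NullReferenceException in left joins.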
When working with Entity Framework, you can often avoid explicit joins using navigation properties. If a Customer has an Orders navigation property, the following will generate the equivalent LEFT JOIN:
var query = from customer in customers
            from order in customer.Orders.DefaultIfEmpty()
            select new { Customer = customer, Order = order };
This example is a little contrived since you'd just use navigation properties rather than create pairs. Ignoring that, notice you still have to say DefaultIfEmpty so the inner from will iterate at least once. This is a significant reduction in the amount of code you need to write! The additional benefit is that it keeps the details about how entities are related limited to your entity configurations. This might, for example, allow you to convert a one-to-many relationship to a many-to-many relationship without modifying your LINQ queries.
You'd really want to implement the code above as simply:
customers.Include(c => c.Orders)
Include always at the end

If you are writing queries with Entity Framework, do yourself a favor and always put .Include lists at the end, especially if you are using the System.Data.Entity.QueryableExtensions extension method accepting an Expression<Func<TEntity, TProp>>. Internally, that overload of .Include will look at the underlying IQueryable and see if it has an .Include method accepting a string. If not, it just ignores the call. The real issue is that the shape of the final result must match the shape of the query when .Include is called. Operations further down the line might simply wipe previous .Include calls out. Keeping .Include near the end gives a nice consistency to your queries.
Avoiding null tests

How often do you write code like this?
private string getCustomerName(int customerId)
{
    var query = from customer in context.Customers
                where customer.CustomerId == customerId
                select customer;
    Customer result = query.SingleOrDefault();
    if (result == null)
    {
        return null;
    }
    else
    {
        return result.Name;
    }
}
That's a lot of code, even with the ?. operator. Check out this simplification:
var query = from customer in context.Customers
            where customer.CustomerId == customerId
            select customer.Name;
return query.SingleOrDefault();
This version maps a customer to its name before calling SingleOrDefault. This saves you from checking if the customer is null before grabbing its name. Furthermore, if this is querying a database, the database only has to return a single column instead of an entire customer. Nice!
Any() && !Skip(1).Any()

The SingleOrDefault method is pretty awesome. I often see FirstOrDefault used exclusively and I think this is unfortunate. SingleOrDefault communicates the expectation that at most one value will be found, such as when grabbing a database record by its primary key. If more than one value is found, a rather nasty exception will be thrown. A similar situation is finding records with exactly one related value. This is how you could grab a customer with exactly one order:
var query = from customer in customers
            where customer.Orders.Any()  // at least one
            where !customer.Orders.Skip(1).Any()  // no more than one
            select customer;
Checking this way is particularly efficient when checking against lazily evaluated collections that can have thousands of records. While this is a pretty neat trick, be careful not to overuse it. It can have a negative impact if used to generate SQL. Many of the LINQ operators return collections that can calculate Count() in constant time (like IGrouping), so it is sometimes just as efficient and way easier to read to simply check Count() == 1.
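As a rough illustration of the difference on a lazy in-memory sequence, the sketch below counts how many elements are pulled from a generator. LazyNumbers, HasExactlyOne and the Pulled counter are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExactlyOneDemo
{
    // Number of elements pulled from the lazy source so far.
    public static int Pulled;

    // A lazily generated sequence that records every element it hands out.
    public static IEnumerable<int> LazyNumbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Stops as soon as a second element shows up, but enumerates the source twice.
    public static bool HasExactlyOne<T>(IEnumerable<T> source)
    {
        return source.Any() && !source.Skip(1).Any();
    }
}
```

On LazyNumbers(100000), HasExactlyOne only pulls a handful of elements before answering, whereas LazyNumbers(100000).Count() == 1 has to pull all 100,000. The price is that the source is started twice, which is exactly what hurts when each start means another round trip to a database.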
UniqueOrDefault

Building off the previous example, what if someone gave you a list and you needed to know if that list contained exactly one unique value? Imagine you were given a collection of messages. If all the messages were from the same person, you want to display their name; otherwise, you just want to list "multiple people". Here's an initial approach:
var names = people.Select(p => p.Name).Distinct().ToArray();
if (names.Count() == 0)
{
    return "nobody";
}
else if (names.Count() == 1)
{
    return names.Single();
}
else
{
    return "multiple people";
}
If people had 10,000 items in it, this could be really slow. Every person would need to be mapped to its Name and then passed through Distinct. The following implementation avoids enumerating the entire collection, but could potentially re-execute a SQL or other slow query multiple times (so it is probably an even worse implementation):
var names = people.Select(p => p.Name).Distinct();
if (!names.Any())  // forces evaluation the first time
{
    return "nobody";
}
string name = names.First();  // re-evaluate!
if (names.Skip(1).Any())  // re-evaluate again!
{
    return "multiple people";
}
return name;
Now consider this code that will jump out immediately if more than one value is found:
var names = people.Select(p => p.Name).Distinct();
using (IEnumerator<string> nameEnumerator = names.GetEnumerator())
{
    if (!nameEnumerator.MoveNext())
    {
        return "nobody";
    }
    string name = nameEnumerator.Current;
    if (nameEnumerator.MoveNext())
    {
        return "multiple people";
    }
    return name;
}
Working directly with the IEnumerator avoids potentially re-evaluating the enumerable but adds an uncomfortable amount of complexity. In case you are wondering, the Distinct method was written very intelligently so it doesn't search for the next unique value until it has to.
This final version only evaluates once, is less typing and supports checking for an arbitrary size:
var names = people.Select(p => p.Name).Distinct().Take(2).ToArray();  // desired amount + 1
if (names.Length == 0)
{
    return "nobody";
}
else if (names.Length == 1)
{
    return names.Single();
}
else
{
    return "multiple people";
}
It just goes to show it's still useful to know a little bit about algorithms even if you are using LINQ. What's humorous about this code is that it only performs efficiently in the case that there are multiple unique values. Although, it makes sense if you think about it, since the only way to verify uniqueness is to enumerate the entire collection. Here's a generic UniqueOrDefault method. Note it doesn't discern between "none" and "many":
public static T UniqueOrDefault<T>(this IEnumerable<T> source, T defaultValue = default(T), IEqualityComparer<T> comparer = null)
{
    if (comparer == null)
    {
        comparer = EqualityComparer<T>.Default;
    }
    var results = source.Distinct(comparer).Take(2).ToArray();
    return results.Length == 1 ? results[0] : defaultValue;
}
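Here is a short usage sketch for the messages scenario. The extension method is repeated (condensed) so the snippet compiles on its own; DescribeSenders and its fallback text are hypothetical, and the case-insensitive comparer is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueExtensions
{
    // Same logic as the UniqueOrDefault method above, condensed.
    public static T UniqueOrDefault<T>(this IEnumerable<T> source, T defaultValue = default(T), IEqualityComparer<T> comparer = null)
    {
        var results = source.Distinct(comparer ?? EqualityComparer<T>.Default).Take(2).ToArray();
        return results.Length == 1 ? results[0] : defaultValue;
    }
}

public static class MessageSummary
{
    // Hypothetical helper: names the sender if every message shares one,
    // otherwise returns the fallback text (for both "none" and "many").
    public static string DescribeSenders(IEnumerable<string> senderNames)
    {
        return senderNames.UniqueOrDefault("multiple people or nobody", StringComparer.OrdinalIgnoreCase);
    }
}
```

Because the method can't tell "none" from "many", the fallback text has to cover both cases; if you need to distinguish them, the Take(2).ToArray() version earlier is the better fit.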
Enumerable.Repeat(x, 1) and Enumerable.Empty().DefaultIfEmpty(x)

I've always felt it was an omission that there's no way to create an enumerable from a list of values. Consider:
public static IEnumerable<T> Of<T>(params T[] values)
{
    foreach (T value in values)
    {
        yield return value;
    }
}
This is equivalent to just iterating over values, but slower, which might explain why it's not available in LINQ. Although, it would be useful to just say Enumerable.Of(0, 1, 2) rather than new int[] { 0, 1, 2 }. Fortunately, you can build up collections using Empty, DefaultIfEmpty, Repeat and/or Concat. Here are a few ways to generate 0 through 2:
Enumerable.Empty<int>().DefaultIfEmpty(0).Concat(Enumerable.Empty<int>().DefaultIfEmpty(1)).Concat(Enumerable.Empty<int>().DefaultIfEmpty(2));
Enumerable.Repeat(0, 1).Concat(Enumerable.Repeat(1, 1)).Concat(Enumerable.Repeat(2, 1));
Enumerable.Repeat(0, 3).Select((i, ix) => i + ix);
Enumerable.Range(0, 3);
While this example demonstrates why you wouldn't want to use this pattern for long lists, it does show ways to represent single values as collections. Many elegant solutions involve using collections containing a single value.
Building up an IEnumerable

Most LINQ operations that return a sequence are lazily evaluated. That means nothing will happen until you try to evaluate a result. IEnumerables and IQueryables are really building blocks for creating complex queries in pieces. Here are two ways to filter by an optional ID:
private static IEnumerable<Order> getOrders(int? customerId = null)
{
    IQueryable<Order> query = context.Orders;
    if (customerId != null)
    {
        query = query.Where(o => o.CustomerId == customerId);
    }
    return query.ToArray();
}

private static IEnumerable<Order> getOrders(int? customerId = null)
{
    var query = from order in context.Orders
                where customerId == null || order.CustomerId == customerId
                select order;
    return query.ToArray();
}
The same techniques can be used to add order by clauses, etc. When filters and sorting options come from the user, this is a slightly harder problem to solve. I wrote a helper class that converts named fields to lambda expression trees to build up a query at runtime: https://gist.github.com/jehugaleahsa/2405c3eece2fc2b0653d. I have a similar class in production that does filtering, as well.
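As a minimal sketch of adding an order by the same way, the version below takes the IQueryable as a parameter so it can run against an in-memory list. The Order class, the TotalAmount property and the largestFirst flag are assumptions for illustration; with Entity Framework you would pass context.Orders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, just for this demonstration.
public class Order
{
    public int CustomerId { get; set; }
    public decimal TotalAmount { get; set; }
}

public static class OrderQueries
{
    public static Order[] GetOrders(IQueryable<Order> orders, int? customerId = null, bool largestFirst = false)
    {
        IQueryable<Order> query = orders;
        if (customerId != null)
        {
            query = query.Where(o => o.CustomerId == customerId);
        }
        // Sorting is appended the same way: each branch just extends the query.
        query = largestFirst
            ? query.OrderByDescending(o => o.TotalAmount)
            : query.OrderBy(o => o.TotalAmount);
        return query.ToArray();  // nothing executes until here
    }
}
```

For a quick check without a database, call it with someList.AsQueryable().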
Conclusion

I am sure there are many more techniques I have picked up over the years. I will keep this document up-to-date as I stumble upon them again. I hope you found some of these tricks useful.