Thursday, June 4, 2009

Running Parallel Tasks

.NET 4.0 introduces a higher-level construct than threads for writing parallel programs: System.Threading.Tasks.Task. A Task is a unit of work that can be scheduled onto a CPU core. It is a versatile and comprehensive API: you can wait for (join) a Task, cancel it, continue with another task when it completes, and more. When you need this functionality, this is the class to try. The Task class has a static property, Factory, of type TaskFactory, which provides methods to create tasks with different options. The simplest way to create a task is to call the StartNew(Action action) method on Task.Factory, passing an Action delegate.
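Here is a minimal sketch of Task.Factory.StartNew(Action) followed by a Wait(). The class name StartNewDemo and its Run() method are hypothetical scaffolding for this example:

```csharp
using System.Threading.Tasks;

static class StartNewDemo
{
    // Starts one task with Task.Factory.StartNew(Action) and waits for it.
    public static int Run()
    {
        int result = 0;

        // The Action delegate runs on a thread chosen by the task scheduler.
        Task task = Task.Factory.StartNew(() =>
        {
            result = 6 * 7;
        });

        task.Wait();   // block until the delegate has finished
        return result; // 42
    }
}
```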

By calling ContinueWith(Action<Task>) on a task, we can specify another task that should run after that task has completed.
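A small sketch of a continuation, assuming the released .NET 4.0 signature ContinueWith(Action<Task>). The continuation receives the finished ("antecedent") task as its argument. ContinueWithDemo is a hypothetical name:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ContinueWithDemo
{
    // Returns the order in which the two pieces of work ran.
    public static string Run()
    {
        var log = new ConcurrentQueue<string>();

        Task first = Task.Factory.StartNew(() => log.Enqueue("first"));

        // The continuation is scheduled only after 'first' has completed.
        Task second = first.ContinueWith(antecedent => log.Enqueue("second"));

        second.Wait();
        return string.Join(",", log.ToArray()); // "first,second"
    }
}
```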
You can wait for a task to complete by calling its Wait() method:
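A minimal sketch of both Wait() and the timeout overload Wait(int), which returns whether the task finished in time. WaitDemo is a hypothetical name:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class WaitDemo
{
    public static bool Run()
    {
        Task task = Task.Factory.StartNew(() => Thread.Sleep(100));

        // Wait() blocks the calling thread until the task completes.
        task.Wait();

        // Wait(int) waits at most the given number of milliseconds and returns
        // true if the task completed; here it has already finished.
        bool completed = task.Wait(0);
        return completed && task.IsCompleted;
    }
}
```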

You can cancel a task asynchronously by calling Cancel() on it:
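Task.Cancel() is not part of the released .NET 4.0 API. The final release uses cooperative cancellation through CancellationTokenSource instead, so this sketch uses that API. Calling cts.Cancel() returns immediately, and the task's delegate notices the request the next time it checks the token. CancelDemo is a hypothetical name:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class CancelDemo
{
    // Requests cancellation without waiting for the task to stop.
    public static bool Run()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        var started = new ManualResetEventSlim(false);

        Task worker = Task.Factory.StartNew(() =>
        {
            started.Set();
            // Cooperative cancellation: the delegate polls the token itself.
            while (!token.IsCancellationRequested)
            {
                Thread.Sleep(10);
            }
        }, token);

        started.Wait();
        cts.Cancel(); // returns at once; the worker exits on its next poll
        return cts.IsCancellationRequested;
    }
}
```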


You can cancel a task synchronously by calling CancelAndWait():
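CancelAndWait() is likewise absent from the released .NET 4.0. There, the equivalent is a cts.Cancel() followed by Wait(). This is a sketch under that assumption. When the delegate throws OperationCanceledException on the same token that was passed to StartNew, the task ends in the Canceled state, and Wait() reports this as an AggregateException. CancelAndWaitDemo is a hypothetical name:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancelAndWaitDemo
{
    public static TaskStatus Run()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        var started = new ManualResetEventSlim(false);

        Task worker = Task.Factory.StartNew(() =>
        {
            started.Set();
            for (int i = 0; i < 1000; i++)
            {
                // Throws OperationCanceledException once Cancel() has been called.
                token.ThrowIfCancellationRequested();
                Thread.Sleep(10);
            }
        }, token);

        started.Wait();
        cts.Cancel();
        try
        {
            worker.Wait(); // block until the task has actually stopped
        }
        catch (AggregateException ex)
        {
            // TaskCanceledException derives from OperationCanceledException.
            ex.Handle(inner => inner is OperationCanceledException);
        }
        return worker.Status; // TaskStatus.Canceled
    }
}
```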

You can run a task synchronously by calling RunSynchronously():
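A minimal sketch of RunSynchronously(). The task is created with the Task constructor, which does not start it, and is then executed on the calling thread. The caveat is that this holds when the current scheduler allows inlining, as the default one normally does. RunSynchronouslyDemo is a hypothetical name:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class RunSynchronouslyDemo
{
    // Returns true if the task body ran on the calling thread.
    public static bool Run()
    {
        int callerThreadId = Thread.CurrentThread.ManagedThreadId;
        int taskThreadId = -1;

        // Create the task without starting it...
        var task = new Task(() => taskThreadId = Thread.CurrentThread.ManagedThreadId);

        // ...then execute it here; the call returns when the body has finished.
        task.RunSynchronously();

        return taskThreadId == callerThreadId;
    }
}
```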

Tuesday, June 2, 2009

From LINQ To Parallel LINQ

With .NET 4.0, Microsoft is introducing a new flavor of Language Integrated Query (LINQ): Parallel LINQ (PLINQ). With PLINQ, we can query in-memory collections and objects in parallel. Note that PLINQ parallelizes queries over in-memory objects, not LINQ to SQL queries: parallelizing LINQ to SQL would only shift the work to SQL Server, and for that we have to wait for a new version of SQL Server.
PLINQ is implemented by the System.Linq.ParallelEnumerable class, which defines parallel versions of all the query operators provided with LINQ. Writing a PLINQ query is just as simple. Consider a LINQ query:

This is our old friend from C# 3.0. To make it a PLINQ query, we only call the AsParallel() extension method on the collection after the in keyword:
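The same hypothetical prime query with AsParallel() added after in. PLINQ does not guarantee result order unless you ask for it (for example with orderby or AsOrdered()). As a result, the primes may come back in any order. PlinqDemo is a hypothetical name:

```csharp
using System.Linq;

static class PlinqDemo
{
    // Hypothetical trial-division helper used only for this example.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static int[] Run()
    {
        // The only change from the LINQ version is the AsParallel() call.
        var primes = from n in Enumerable.Range(1, 30).AsParallel()
                     where IsPrime(n)
                     select n;
        return primes.ToArray(); // same ten primes, order not guaranteed
    }
}
```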

The AsParallel() method returns a ParallelQuery<TSource>. The where, select, group by and orderby clauses then map to the extension methods of ParallelEnumerable instead of Enumerable. For this code, I used the old prime determination algorithm defined as:
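A sketch of that mapping, with a hypothetical trial-division IsPrime standing in for the prime algorithm. The query-syntax version and the explicit method-call version produce the same sequence. Because the source is a ParallelQuery<int>, Where and OrderBy bind to ParallelEnumerable's overloads, which return ParallelQuery<int> and OrderedParallelQuery<int>. PlinqMethodSyntaxDemo is a hypothetical name:

```csharp
using System.Linq;

static class PlinqMethodSyntaxDemo
{
    // Hypothetical stand-in for the prime test: trial division up to sqrt(n).
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static int[] Run()
    {
        // Query syntax over a parallel source.
        var viaQuery = from n in Enumerable.Range(1, 50).AsParallel()
                       where IsPrime(n)
                       orderby n
                       select n;

        // The equivalent method calls, with the ParallelEnumerable types spelled out.
        ParallelQuery<int> source = Enumerable.Range(1, 50).AsParallel();
        OrderedParallelQuery<int> viaMethods = source
            .Where(n => IsPrime(n))
            .OrderBy(n => n);

        int[] a = viaQuery.ToArray();
        int[] b = viaMethods.ToArray();
        // Both forms yield the same ordered primes; return an empty array otherwise.
        return a.SequenceEqual(b) ? b : new int[0];
    }
}
```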


Monday, June 1, 2009

Parallel.ForEach and Parallel.Invoke

Parallel.ForEach is the parallel equivalent of the "serial" foreach loop. It iterates over a collection whose elements are of the generic type parameter TSource. You can pass three different collection types: System.Collections.Generic.IEnumerable<TSource>, System.Collections.Concurrent.Partitioner<TSource> and System.Collections.Concurrent.OrderablePartitioner<TSource>. The partitioner classes are abstract and exist so that you can write custom partitioning code. That topic is beyond the scope of this post, so I will leave it for the future.
Here is a quick example of how to use this overload of Parallel.ForEach:
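A minimal sketch using the names ints and a from the text. Because the body may run on several threads at once, the shared total is updated with Interlocked.Add rather than +=. ForEachDemo is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ForEachDemo
{
    public static long Run()
    {
        IEnumerable<int> ints = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        long sum = 0;

        // The body runs once per element, possibly concurrently.
        Parallel.ForEach(ints, a =>
        {
            Interlocked.Add(ref sum, a);
        });

        return sum; // 55
    }
}
```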

Here ints is an IEnumerable of integers and a is an element of it. To get Stop and Break semantics, replace the second parameter with an Action<TSource, ParallelLoopState> delegate. Then call ParallelLoopState.Stop() or ParallelLoopState.Break() to exit from the "loop" at the earliest convenience.
Lastly, let's see what Parallel.Invoke has to offer. It takes a variable number of Action delegates and executes them in parallel. Here is a quick example:
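This sketch uses placeholder methods Task1 to Task4 that only increment a counter. Their bodies are hypothetical. Parallel.Invoke accepts them as a params Action[] and returns only after all of them have finished:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class InvokeDemo
{
    static int completed;

    // Placeholder methods matching the Action delegate: no parameters, void return.
    static void Task1() { Interlocked.Increment(ref completed); }
    static void Task2() { Interlocked.Increment(ref completed); }
    static void Task3() { Interlocked.Increment(ref completed); }
    static void Task4() { Interlocked.Increment(ref completed); }

    public static int Run()
    {
        completed = 0;
        Parallel.Invoke(Task1, Task2, Task3, Task4);
        return completed; // 4
    }
}
```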

Task1, Task2, Task3 and Task4 are dummy methods whose signatures match the Action delegate, just for the sake of the example.

Sunday, May 31, 2009

I would like to close this month with a comparison of Parallel.For and a serial for loop. For the comparison, I took an INEFFICIENT algorithm to determine whether a given number is prime. Here is the algorithm:
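The original algorithm was lost. This is a hypothetical reconstruction of a deliberately inefficient prime test: it tries every divisor from 2 to n - 1 instead of stopping at the square root.

```csharp
static class InefficientPrime
{
    // Deliberately slow: tries every divisor from 2 to n - 1.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d < n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```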


Here is the serial for loop to find the primes from 2 to 100000:
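This sketch of the serial loop is not the author's exact code. It counts primes rather than printing each one, and it takes the upper bound as a parameter so it can also be tried on small ranges. It prints the thread ID and the elapsed time measured with Stopwatch. SerialPrimes is a hypothetical name, and the prime test repeats the slow divisor loop so that the block is self-contained:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SerialPrimes
{
    // Deliberately slow prime test: every divisor from 2 to n - 1.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d < n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static int Run(int upperBound)
    {
        var watch = Stopwatch.StartNew();
        int count = 0;

        // Every iteration runs on the calling thread.
        for (int i = 2; i <= upperBound; i++)
        {
            if (IsPrime(i)) count++;
        }

        watch.Stop();
        Console.WriteLine("Thread {0}: {1} primes in {2} ms",
            Thread.CurrentThread.ManagedThreadId, count, watch.ElapsedMilliseconds);
        return count;
    }
}
```

Calling SerialPrimes.Run(100000) reproduces the experiment in the post. Timings will of course depend on your machine.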

And here is its output:


Note the thread ID, and that it took 24511 milliseconds. And here is the equivalent parallel code:
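A parallel counterpart to the serial sketch, under the same assumptions (counting instead of printing, hypothetical class name). Parallel.For's second argument is exclusive, hence upperBound + 1. The shared counter uses Interlocked.Increment, and the distinct thread IDs are collected in a ConcurrentDictionary so you can see how many threads took part:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class ParallelPrimes
{
    // Deliberately slow prime test: every divisor from 2 to n - 1.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d < n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static int Run(int upperBound)
    {
        var watch = Stopwatch.StartNew();
        int count = 0;
        var threadIds = new ConcurrentDictionary<int, bool>();

        // The end index is exclusive, so add 1 to include upperBound.
        Parallel.For(2, upperBound + 1, i =>
        {
            threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            if (IsPrime(i)) Interlocked.Increment(ref count);
        });

        watch.Stop();
        Console.WriteLine("{0} primes on {1} thread(s) in {2} ms",
            count, threadIds.Count, watch.ElapsedMilliseconds);
        return count;
    }
}
```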



Here comes the output:

Did you notice that 6 different threads ran the code in parallel and that it took 18220 milliseconds? The Parallel class uses the Task class behind the scenes to parallelize the loop iterations. Depending on the number of cores in your CPU, it decides how many threads to use and assigns the tasks to them.

This program finds the prime numbers in a big range of 100000 numbers. But if you run both programs on a smaller range, say 5000 or 10000 numbers, the serial for loop will be faster. The overhead of creating and scheduling Task objects then outweighs the time spent solving the problem. So, if your problem is small, avoid parallelizing your code.

Saturday, May 30, 2009

Breaking and Stopping Parallel Loop in Task Parallel Library

What if you want to exit from Parallel.For early, as you could from a for loop with break? You cannot write code like this:

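The problem is that For is a method of the Parallel class, not a loop, and the loop body is a delegate. A break statement inside the delegate does not compile, because there is no enclosing loop. A return statement only exits the delegate for the current iteration.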
The Solution:
The TPL team came up with a great idea: you can pass in a delegate that takes two parameters. One is the loop index, and the other is of type ParallelLoopState. ParallelLoopState has two methods for exiting from the loop:
ParallelLoopState.Break()
The loop state is shared with all other concurrent threads participating in the loop's execution. After Break() is called, no iterations past the caller's iteration will be started, and other parallel workers stop at their earliest convenience. Iterations with lower indexes than the caller's still run to completion.


ParallelLoopState.Stop()
The loop state is shared with all other concurrent threads participating in the loop's execution. After Stop() is called, no additional iterations will be started on any thread, and other parallel workers stop at their earliest convenience.

So, to stop the loop you should write the code as follows:
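A minimal sketch of stopping Parallel.For. The condition i == 500 is a hypothetical stand-in for "we found what we were looking for". The body also checks IsStopped and uses return, which only ends the current iteration, so that iterations already in flight bail out quickly. StopDemo is a hypothetical name:

```csharp
using System.Threading.Tasks;

static class StopDemo
{
    public static bool Run()
    {
        int found = -1;

        ParallelLoopResult result = Parallel.For(0, 1000000, (i, loopState) =>
        {
            // Long-running bodies can poll IsStopped; 'return' ends only this iteration.
            if (loopState.IsStopped) return;

            if (i == 500)
            {
                found = i;
                loopState.Stop(); // no new iterations will start on any thread
            }
        });

        // IsCompleted is false because Stop() ended the loop early.
        return !result.IsCompleted && found == 500;
    }
}
```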


For a comprehensive overview of the Parallel class, visit this blog post.

Friday, May 29, 2009

Parallel Programming with .NET Framework 4.0

         In mscorlib.dll version 4.0.0.0, there are three main namespaces for parallel programming: System.Threading, System.Threading.Tasks and System.Collections.Concurrent. System.Threading contains types like ThreadLocal<T>, ParallelOptions, ParallelLoopState, ParallelLoopResult, Parallel and ManualResetEventSlim, to name a few.
         System.Threading.Tasks contains Task, Task<TResult>, TaskFactory and so on for task-level parallelism. System.Collections.Concurrent contains generic collections for working with data in a thread-safe manner.
         To get started with TPL, the simplest class in .NET Framework 4.0 is Parallel. Let's see what it has to offer. It has four main methods: Parallel.For, Parallel.For<>, Parallel.ForEach<> and Parallel.Invoke. As their names imply, the first three are provided to parallelize loops, and they have many overloads to cover a wide range of scenarios. Parallel.Invoke has two overloads; it takes an array of Action delegates and invokes them in parallel.
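To illustrate what "thread-safe" buys you, here is a small sketch using ConcurrentBag<T> from System.Collections.Concurrent. Its Add method may be called from many threads at once. A plain List<int> in its place could lose items or throw. ConcurrentBagDemo is a hypothetical name:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ConcurrentBagDemo
{
    public static int Run()
    {
        var squares = new ConcurrentBag<int>();

        // Many loop iterations add to the same bag concurrently.
        Parallel.For(0, 100, i => squares.Add(i * i));

        return squares.Count; // 100
    }
}
```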
         And here is an example of the simplest For overload:
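A minimal sketch of Parallel.For(int, int, Action<int>). Each iteration writes only to its own array slot, so no locking is needed. SimpleForDemo and the squaring body are hypothetical:

```csharp
using System.Threading.Tasks;

static class SimpleForDemo
{
    public static int[] Run()
    {
        int[] results = new int[10];

        // From 0 (inclusive) to results.Length (exclusive); iterations may run concurrently.
        Parallel.For(0, results.Length, i =>
        {
            results[i] = i * i;
        });

        return results;
    }
}
```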

Parallel.For takes three arguments: two integers (the inclusive start index and the exclusive end index) and an Action<int> delegate. The delegate is executed once for each index, in parallel on a multi-core machine.

Thursday, May 28, 2009

Task Parallel Library (TPL) in .NET 4.0

Until now, I have written about major enhancements in C# 4.0. Today, I want to talk about what is coming with .NET Framework 4.0. There are a lot of new things coming with .NET 4.0: WPF 4.0, WCF 4.0, WF 4.0 and Code Contracts, to name a few. But in this post, I would like to shed light on the Task Parallel Library (TPL from now on, to save me typing).
TPL is an enhancement to the .NET Framework that makes concurrent programming on multi-core systems easier. Previously, our option for writing parallel programs was threads, but they were "expensive" and difficult. By "expensive", I mean that you have to create a thread object, which incurs system overhead. And by "difficult", I mean that it is very hard to synchronize access to shared resources, find deadlocks and so on.
The .NET Framework 4.0 provides another "high-level" layer on top of the stack to make our lives easier. Now we don't need to create threads ourselves. We only create a Task and pass in the delegate or method to execute. The underlying framework decides whether to create a new thread for a particular task or to run it on an existing thread. Before I go into the details of TPL, I would like you to read this post by Daniel Moth, in which he describes the "major overhaul" of the ThreadPool in .NET 4.0.