The aim of technology has always been solving problems facing humanity. The aim isn’t only to
solve problems, but to solve them efficiently in the least possible time. With time problems
become more complex to solve (i.e.: better hardware needed to solve these problems quickly).
In the field of Computer Science, the hardware component responsible for how fast will a
problem be solved is the processor.
In the past hardware corporations like Intel used to introduce faster processor each year. But at
some point they figured out that they wouldn’t be able to introduce new uni-core processors
with more transistors; and since the speed of a processor is directly proportional with the
number of transistors, it wasn’t feasible to have faster uni-core processors.
Hardware corporations found that the way to introduce faster processors is to introduce multicore
processors. Introducing multi-core processors resulted in a problem that former
programming models don’t support multi-core processors programming. So, new programming
models and languages were introduced for programmers to be able to utilize the presence of
multi-cores in one processor. The article demonstrates one of these new models which is
Introduction:-The aim of technology has always been solving problems facing humanity. The aim isn’t only tosolve problems, but to solve them efficiently in the least possible time. With time problemsbecome more complex to solve (i.e.: better hardware needed to solve these problems quickly).In the field of Computer Science, the hardware component responsible for how fast will aproblem be solved is the processor.In the past hardware corporations like Intel used to introduce faster processor each year. But atsome point they figured out that they wouldn’t be able to introduce new uni-core processorswith more transistors; and since the speed of a processor is directly proportional with thenumber of transistors, it wasn’t feasible to have faster uni-core processors.Hardware corporations found that the way to introduce faster processors is to introduce multicoreprocessors. Introducing multi-core processors resulted in a problem that formerprogramming models don’t support multi-core processors programming. So, new programmingmodels and languages were introduced for programmers to be able to utilize the presence ofmulti-coresin one processor. The article demonstrates one of these new models which isCilk++.
Cilk++ VS C++:-
Cilk++ development started in 1994 in of the MIT labs. Cilk++ is based on C++. So, writing Cilk++
is exactly the same as writing C++ with the ability of programming parallel applications with the
use of some new keywords specifically introduced to Cilk++ to enable parallel programming.
The new keywords are: cilk, spawn, sync, inlet and abort. The “cilk” keyword is added to the
header of any function to notify the compiler that this function may include parallel
programming (i.e.: one of the other Cilk++ specific keywords might be included in this function).
Below is a detailed description for the rule of each of the other keywords.
The “spawn” keyword can be added to the start of any line of code to notify the processor that
it can execute this line of code on a separate core of the processor if possible. This line of code
might be a call for function or even a set of functions giving the ability to run a piece of code
The “sync” keyword is closely related to the “spawn” keyword. After one or more lines of code
has been “spawned” the “sync” keyword can be put afterwards to notify the processor that
should stop executing the code till all spawned processes finish processing. The advantage of
“sync” is to synchronize parallel processes in order to ensure secure coding (i.e.: no problem
will take place as a result of a process demanding resources that are being used by another
process running in parallel). Below is a figure showing the functionality of “sync”.
The “inlet” keyword is used in advanced programming. Whenever a parent process spawns
other processes, these child processes are supposed to return results to the parent process.
Inlet is used in order to make sure that no write conflicts will take place as a result of multiple child processes writing in a variable in the same parent process. An “inlet” is like a function that
can be written inside another function to control how return values of child processes are to be
written to the parent process and to ensure that only one child process can write to the return
value of the parent process at a time. The code below shows how “inlet” can be used in a
function used to calculate Fibonacci numbers.
cilk int fib (int n)
int x = 0;
inlet void summer (int result)
x += result;
if (n<2) return n;
summer(spawn fib (n-1));
summer(spawn fib (n-2));
The “abort” keyword is also used in advanced programming. Sometimes a process is spawned
in multiple processes, and at some point the result is reached and no further processing of
parallel processes is needed, so the abort keyword is used to stop the unneeded processing of
these other processes. For example, if a search algorithm works by means of having multiple
processes working in parallel on different parts of an array of integers or so, if one of the
processes finds the element the algorithm is searching for, it notifies the processor to stop the
execution of the rest of the processes in order not to waste time and resources.
Parallelism is an attribute whose value can be calculated to show how much beneficial was
Cilk++ for an algorithm over running the same algorithm in the ordinary sequential way. To
calculate the value of parallelism for an algorithm both work and depth has to be calculated for
this algorithm. Work is equivalent to the total number of operations done by the algorithm.
Depth is the length of the longest sequential dependence chain. Parallelism is equivalent to the
value of “work” over the value of “depth”. The figure below shows an example for parallelism.
Cilk work-stealing scheduler:-
The Cilk work-stealing scheduler is one of the most beneficial features of Cilk. The aim from it is
to ensure maximum utilization of processor’s capabilities for programs to execute in shorter
time. For example, in case of a processor with X cores and a program is running X processes on
its X cores waiting for them to finish in order to output the result; if one of these processes
finishes before the others, it starts taking tasks from the bottom of other process’s stack. The
figure below shows two processes running in parallel on two different cores, where one of
them finishes before the other then use the work-stealing feature to help the unfinished
process in its work.
Calculating Fibonacci numbers is a good example for an operation that would run in much less
time if processed in parallel on multiple cores of a processor. As shown below in the code and
figure, the recursion of the function is processed in parallel which resulted in executing the
function in much less time.
int fib (int n)
if (n <= 2) return 1;
int x, y;
x = fib (n-1);
y = fib (n-2);
return x + y;
int fib (int n)
if (n <= 2) return 1;
int x, y;
x = cilk spawnfib (n-1);
y = cilk spawnfib (n-2);
return x + y;
Quick sort is regarded as one of the efficient algorithms for sorting data structures of data
according to a specific criterion for sorting. Quick sort is one of those algorithms that would run
in much less time if implemented using Cilk++ to run on multi-core processors. The idea behind
quick sort is to take the 1st element of data and put it its correct order then repeat the same
thing for the items less than it and the items greater than it till all the elements are sorted.
Below is the algorithm along with a figure to demonstrate how a multi-core processor is utilized
to run it when implemented using Cilk++.
c static void QuickSort(index low, index high)
if (low < high) then
index p = Partition(low, high);
cilk spawn QuickSort (low, p -1);
cilk spawn QuickSort (p +1, high);
public static index Partition(index low, index high)
pivot = S[low];
j = low;
for (i = low + 1; i <= high; i++) do
if (S[i] < pivot) then
Exchange S[i] and S[j];
p = j;
Exchange S[low] and S[p];
Cilk++ VS OpenMP:-
OpenMP is another programming model for parallel programming. Below is a diagram showing
difference in execution time between Cilk++ and OpenMP when executing the quick sort code
for both the adaptive and parallel models.
Cilk++ is available for windows and linux operating systems and can be downloaded from the
following link: http://software.intel.com/en-us/articles/download-intel-cilksdk/
Cilk++ for MAC was a project under development that was never completed and not under
 Prof. Richard (Rich) Vuduc, Introduction to Cilk++. Georgia Tech, College of Computing,
 Mingdong Feng1, Charles E. Leiserson2, Efficient Detection of Determinacy Races in Cilk
Programs. 1 National University of Singapore, 2 MIT Laboratory for Computer Science.
 Matteo Frigo, Multithreaded Programming in Cilk. CILK ARTS.
 Charles E. Leiserson, Aske Plaat, Programming Parallel Applications in Cilk. MIT Laboratory
for Computer Science.
 Cilk 5.4.6 Reference Manual. MIT Laboratory for Computer Science.
 Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H.
Randall, Yuli Zhou, Cilk: An Efficient Multithreaded Runtime System. MIT Laboratory for