Showing posts with label Concurrency. Show all posts

Monday 15 April 2013

Multicore Processors – Then, Now And Beyond

 


I still remember the day HTC first released its Windows Pocket PC phone, the XDA II, circa 2003! I spent hours fiddling around with Windows CE and the .NET Compact Framework. I even implemented a double-buffering mechanism for rendering sprites on Windows Mobile (the complete source code is still available on my blog). The computing power of today's phones puts all of that to shame: I now play GTA 3 on my iPhone effortlessly! Back in 2003, I had to buy a decent graphics card [PCI Express] for my PC just to cope with the massive area the game rendered as you explored the city.

Today (at the time of publishing this article), the Samsung Galaxy S4 vs. the HTC One is the headline battle in the quad-core-and-above smartphone category [17]. Other multicore products include Apple's iPad 3 (quad-core graphics with a dual-core CPU), the Xbox 360 with its three-core IBM Xenon CPU, and Microsoft's Surface with NVIDIA's quad-core Tegra 3, to name a few.

Why are multicore processors becoming so common these days? And what lies beyond multicore processors?

Let's take a look at a bit of multicore history. From early 2007 onwards, we knew that Moore's law [1] was coming close to an end and that the era of parallel programming had begun to take shape: rather than ever-faster single cores, we began to see more cores on each chip. Why did this happen? Why didn't manufacturers simply keep spending the extra transistors on making a single core faster?
[Figure: Moore's law chart]

As microprocessor manufacturers tried to improve processor performance by adding transistors and logic to their CPUs and by increasing clock frequencies, the complexity and cost of manufacturing rose too. The semiconductor industry had predicted that clock rates would reach 12GHz by 2013. Take Intel's Pentium processor, for instance. As the troubled Pentium so clearly illustrated, the laws of nature and diminishing returns were catching up with Moore's law. According to the SciDAC report, the complex logic required to implement execution pipelines and other throughput techniques consumed lots of expensive silicon real estate and was highly prone to post-manufacture errors. [2]

Let's look at another example: Intel's Tejas processor, which was being built as the successor to its Pentium 4 class of processors. As Intel tried to increase the clock speed, power consumption climbed to 100 watts (TDP). This was a big issue, because processors draw more current as they are pushed to multi-gigahertz clock speeds, eating up wattage and becoming difficult to cool with conventional heat-sink technology. A processor with this specification was simply not viable to produce. [3]
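A textbook first-order model of CMOS switching power helps show why raising the clock was so costly. This is a general approximation, not a figure taken from the Tejas data: α is the fraction of the circuit switching each cycle, C the switched capacitance, V the supply voltage and f the clock frequency. Under the idealized assumption that a higher frequency needs a proportionally higher voltage, power grows roughly with the cube of the clock rate:

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f,
\qquad V \propto f \;\Longrightarrow\; P_{\text{dyn}} \propto f^{3},
\qquad \frac{P_{\text{dyn}}(2f)}{P_{\text{dyn}}(f)} \approx 2^{3} = 8 .
```

Leakage current and practical voltage limits complicate this picture, but the cubic trend is a reasonable intuition for why doubling the clock of one core cost far more power than adding a second core at the same clock.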

It was clear to chip makers that it was becoming exceedingly impractical (and sometimes impossible) to verify all that complex logic; the cost of these designs was reaching into the hundreds of millions of dollars. The SciDAC report cited "A View of the Parallel Computing Landscape,” a paper published in 2006 by a team of computer scientists from Lawrence Berkeley National Laboratory (LBNL) and the University of California (UC), Berkeley. The Berkeley scientists offered the following solutions for these processor issues: [2]

Power: Parallelism is an energy-efficient way to achieve performance. Many simple cores offer higher performance per unit area for parallel codes than a comparable design employing smaller numbers of complex cores.

Design Cost: The behaviour of a smaller, simpler processing element is much easier to predict within existing electronic design-automation workflows and more amenable to formal verification. Lower complexity makes the chip more economical to design and produce.

Defect Tolerance: Smaller processing elements provide an economical way to improve defect tolerance by providing many redundant cores that can be turned off if there are defects.[2]
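The "Power" argument above can be checked with a little arithmetic. The sketch below is a modern C# top-level program that uses the textbook relation P ≈ C·V²·f (switching activity folded into C) and assumes, optimistically, that halving a core's clock also lets its supply voltage be halved; these normalized numbers are illustrative assumptions, not values from the Berkeley report. For perfectly parallel code, two half-speed cores deliver the same total throughput as one full-speed core:

```csharp
using System;

// Assumption: dynamic power P = C * V^2 * f (activity factor folded into C).
static double DynamicPower(double capacitance, double voltage, double frequency)
    => capacitance * voltage * voltage * frequency;

const double C = 1.0; // normalized switched capacitance per core
const double V = 1.0; // normalized supply voltage
const double F = 1.0; // normalized clock frequency

// Option A: one core at full frequency and full voltage.
double singleCore = DynamicPower(C, V, F);

// Option B: two cores at half frequency. Idealized assumption: voltage can
// also be halved. Total throughput (2 * F/2) equals option A for parallel code.
double dualCore = 2 * DynamicPower(C, V / 2, F / 2);

Console.WriteLine($"Single core power: {singleCore:F3}");
Console.WriteLine($"Dual core power:   {dualCore:F3}");
Console.WriteLine($"Ratio:             {dualCore / singleCore:F3}");
```

With these idealized inputs the dual-core option needs a quarter of the power. Real voltage reductions are smaller, and serial parts of a program cannot be spread across cores (Amdahl's law), so actual savings are more modest, but the direction is the same.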

Soon after, chip manufacturers like Intel were clear from their experiments, experience and tests that the future of computing was heading towards multiple cores. Paul Otellini, Intel's President, announced (circa 2004) that “All of our microprocessor development going forwards is multi-core. We’ll add multi-cores into all products – notebooks, desktops and servers.” The idea of parallelizing existing tasks, or dividing work into multiple tasks, to gain efficiency had caught on. Microsoft was keen to capitalize on multicore technology and went ahead with this strategy on a big scale, shipping the Windows operating system codenamed Longhorn (Vista), which supported it. [3]

For Intel, this multi-core direction eventually led to the Arrandale range of processors. Arrandale is the codename for a dual-core mobile processor used on Intel 5 Series chipset-based platforms codenamed Calpella. Arrandale consists of a 32nm processor core and a 45nm graphics and memory controller integrated in a multi-chip package. It is part of the 32nm processor family codenamed Westmere, based on the Intel microarchitecture codenamed Nehalem, and was targeted for production in Q4 2009, with availability in Q1 2010. The table below lists the TDPs of the Arrandale range of processors. [4]

Product Name                      Cache  Clock     Status    Embedded Options Available  Max TDP  Recommended Customer Price
----------------------------------------------------------------------------------------------------------------------------
Intel® Pentium® Processor U5600   3M     1.33 GHz  Launched  No                          18 W     N/A
Intel® Pentium® Processor U5400   3M     1.20 GHz  Launched  No                          18 W     N/A
Intel® Pentium® Processor P6300   3M     2.27 GHz  Launched  No                          35 W     TRAY: $134.00
Intel® Pentium® Processor P6200   3M     2.13 GHz  Launched  No                          35 W     TRAY: $134.00
Intel® Pentium® Processor P6100   3M     2.00 GHz  Launched  No                          35 W     TRAY: $134.00
Intel® Pentium® Processor P6000   3M     1.86 GHz  Launched  No                          35 W     N/A
Intel® Core™ i7-680UM Processor   4M     1.46 GHz  Launched  No                          18 W     N/A
Intel® Core™ i7-660UM Processor   4M     1.33 GHz  Launched  No                          18 W     TRAY: $289.00
Intel® Core™ i7-660UE Processor   4M     1.33 GHz  Launched  Yes                         18 W     TRAY: $301.00
Intel® Core™ i7-660LM Processor   4M     2.26 GHz  Launched  No                          25 W     N/A
Intel® Core™ i7-640UM Processor   4M     1.20 GHz  Launched  No                          18 W     N/A
Intel® Core™ i7-640M Processor    4M     2.80 GHz  Launched  No                          35 W     N/A
Intel® Core™ i7-640LM Processor   4M     2.13 GHz  Launched  No                          25 W     N/A
Intel® Core™ i7-620UM Processor   4M     1.06 GHz  Launched  No                          18 W     N/A
Intel® Core™ i7-620UE Processor   4M     1.06 GHz  Launched  Yes                         18 W     TRAY: $289.00
Intel® Core™ i7-620M Processor    4M     2.66 GHz  Launched  Yes                         35 W     TRAY: $332.00
Intel® Core™ i7-620LM Processor   4M     2.00 GHz  Launched  No                          25 W     TRAY: $300.00
Intel® Core™ i7-620LE Processor   4M     2.00 GHz  Launched  Yes                         25 W     TRAY: $311.00
Intel® Core™ i7-610E Processor    4M     2.53 GHz  Launched  Yes                         35 W     TRAY: $320.00
Intel® Core™ i5-580M Processor    3M     2.66 GHz  Launched  No                          35 W     N/A
Intel® Core™ i5-560UM Processor   3M     1.33 GHz  Launched  No                          18 W     N/A
Intel® Core™ i5-560M Processor    3M     2.66 GHz  Launched  No                          35 W     N/A
Intel® Core™ i5-540UM Processor   3M     1.20 GHz  Launched  No                          18 W     TRAY: $250.00
Intel® Core™ i5-540M Processor    3M     2.53 GHz  Launched  No                          35 W     TRAY: $257.00, BOX: $269.00
Intel® Core™ i5-520UM Processor   3M     1.06 GHz  Launched  No                          18 W     N/A
Intel® Core™ i5-520M Processor    3M     2.40 GHz  Launched  Yes                         35 W     TRAY: $225.00
Intel® Core™ i5-520E Processor    3M     2.40 GHz  Launched  Yes                         35 W     TRAY: $224.00
Intel® Core™ i5-480M Processor    3M     2.66 GHz  Launched  No                          35 W     N/A
Intel® Core™ i5-470UM Processor   3M     1.33 GHz  Launched  No                          18 W     N/A
Intel® Core™ i5-460M Processor    3M     2.53 GHz  Launched  No                          35 W     N/A
Intel® Core™ i5-450M Processor    3M     2.40 GHz  Launched  No                          35 W     TRAY: $210.00
Intel® Core™ i5-430UM Processor   3M     1.20 GHz  Launched  No                          18 W     N/A
Intel® Core™ i5-430M Processor    3M     2.26 GHz  Launched  No                          35 W     N/A
Intel® Core™ i3-390M Processor    3M     2.66 GHz  Launched  No                          35 W     TRAY: $225.00
Intel® Core™ i3-380UM Processor   3M     1.33 GHz  Launched  No                          18 W     TRAY: $250.00
Intel® Core™ i3-380M Processor    3M     2.53 GHz  Launched  No                          35 W     TRAY: $225.00
Intel® Core™ i3-370M Processor    3M     2.40 GHz  Launched  No                          35 W     TRAY: $210.00
Intel® Core™ i3-350M Processor    3M     2.26 GHz  Launched  No                          35 W     TRAY: $225.00
Intel® Core™ i3-330UM Processor   3M     1.20 GHz  Launched  No                          18 W     N/A
Intel® Core™ i3-330M Processor    3M     2.13 GHz  Launched  No                          35 W     TRAY: $225.00
Intel® Core™ i3-330E Processor    3M     2.13 GHz  Launched  Yes                         35 W     TRAY: $177.00
Intel® Celeron® Processor U3600   2M     1.20 GHz  Launched  No                          18 W     N/A
Intel® Celeron® Processor U3405   2M     1.07 GHz  Launched  Yes                         18 W     TRAY: $134.00
Intel® Celeron® Processor U3400   2M     1.06 GHz  Launched  No                          18 W     N/A
Intel® Celeron® Processor P4600   2M     2.00 GHz  Launched  No                          35 W     TRAY: $86.00
Intel® Celeron® Processor P4505   2M     1.86 GHz  Launched  Yes                         35 W     TRAY: $86.00
Intel® Celeron® Processor P4500   2M     1.86 GHz  Launched  Yes                         35 W     TRAY: $86.00

 

By then, Intel had successfully shrunk its processors to a 32nm process to gain efficiency on the Core i3, i5 and i7 multicore processors. All of them delivered good results and are found in many laptops today, including the Mac. [5]

And now, with the Ivy Bridge chip [6], Intel has released a set of powerful and power-efficient processors produced on its 22nm process, built with the revolutionary 3-D Tri-Gate technology [7]. With this triple-gate transistor [9], Intel has reinvented the technology that powers every computing device (indeed, all of today's electronics) and in doing so has broken the size restrictions inherent in today's microprocessors. The breakthrough is not so much the Tri-Gate technology itself, which Intel first unveiled in 2002. What's significant is that the company has developed a process for mass-producing these circuits at a 22nm feature size (22 billionths of a meter), about a third smaller than the 32nm process it uses for the Sandy Bridge microprocessor architecture. [8] (A human hair is about 100,000 nm thick.)
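The "about a third" figure can be checked directly. Treating the node names as literal linear feature sizes is a simplification (process node names and physical dimensions no longer match exactly), but under that idealized assumption:

```latex
\frac{22\,\text{nm}}{32\,\text{nm}} = 0.6875 \;\;(\approx 31\%\ \text{smaller}),
\qquad
\left(\frac{22}{32}\right)^{2} \approx 0.47 .
```

So each linear dimension shrinks by roughly a third, and an ideal shrink would fit the same transistor into a little under half the area.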

According to Intel's announcement [10], Intel's 3-D Tri-Gate transistors operate at lower voltage and with lower leakage. "The additional control enables as much transistor current flowing as possible when the transistor is in the 'on' state (for performance), and as close to zero as possible when it is in the 'off' state (to minimize power), and enables the transistor to switch very quickly between the two states." With claimed performance increases of as much as 37 percent compared with 32-nm planar-transistor devices, while consuming less than half the power, the new parts will be highly suited to small handheld devices such as smartphones, medical devices, media players, portable gaming systems and anything that can benefit from the ability to switch quickly between high performance and low power consumption. In essence, everything these days. [11]

This capability will no doubt benefit the Microsoft Surface for Windows 8, a full-featured tablet built around Ivy Bridge. Perhaps more impressive is the Surface for Windows RT, which incorporates the NVIDIA Tegra 3 quad-core processor. Technically a system-on-chip, the Tegra 3 places four 1.4GHz Cortex-A9 cores on the die along with a fifth low-power core that can perform all functions during device standby. The Tegra 3 also includes a GeForce GPU with up to 12 graphics processor cores and supports a maximum resolution of 2,560 x 1,600 pixels. Google's Nexus 7 tablet also features the Tegra 3. [13]


Intel's MIC (Many Integrated Core) architecture [12] puts 32 cores on a single chip and makes them available to highly parallel high-performance computing applications such as climate simulation, energy research and genetic analysis. According to Intel, developers' applications written in standard programming languages can still take advantage of these extremely high levels of performance. The Intel MIC boards, code-named Knights Ferry, are even being used experimentally to run cloud-based ray tracing, a compute-intensive light-rendering technique used in video games that is currently limited to dedicated, high-end graphics processors. This would allow laptops, smartphones and other lightweight computing devices to run sophisticated games without a heavy GPU. Once such technology becomes mainstream, tasks that were once the province of supercomputers will be available from the average smartphone.


Now, let's take a look at how ARM processors have been shaping mobile [14] and hand-held low-powered devices, most noticeably starting with the iPhone in 2007. Apple's A4/A5/A5X, NVIDIA's Tegra, Samsung's Exynos and Texas Instruments' OMAP products all integrate ARM processors into what is known as a system-on-a-chip (SoC). SoCs merge many of the essential components of a computer (such as the CPU, RAM, ROM etc.) onto a single chip, which allows the devices that use them to be lightweight and compact. These SoCs have gone on to power blockbuster products such as Apple's iPhone and iPad and Samsung's Galaxy series of phones. ARM's importance as the CPU architecture of choice on mobile devices cannot be overstated: estimates put the number of ARM-based chips in the billions.


What's Qualcomm brewing? Qualcomm considers Snapdragon [18] a "platform" for use in smartphones, tablets and handhelds. Snapdragon is a family of mobile systems-on-chip (SoCs). The original Snapdragon CPU, dubbed Scorpion, is Qualcomm's own design. It shares many features with the ARM Cortex-A8 core and is based on the ARMv7 instruction set, but theoretically offers much higher performance for multimedia-related SIMD operations. The successor to Scorpion, found in Snapdragon S4 SoCs, is named Krait; it has many similarities with the ARM Cortex-A15 CPU and is also based on the ARMv7 instruction set.

All Snapdragon processors contain the circuitry to decode high-definition (HD) video at 720p or 1080p, depending on the chipset. Adreno, the company's proprietary GPU technology integrated into Snapdragon chipsets (and certain other Qualcomm chipsets), is Qualcomm's own design, built on assets the company acquired from AMD. The Adreno 225 GPU in Snapdragon S4 SoCs adds support for DirectX 9/Shader Model 3.0, which makes it compatible with Microsoft's Windows 8.

Compared to SoCs from many competitors, Snapdragon SoCs have been unique in integrating the cellular modem (baseband) on-die, so they do not require a separate external baseband chip on the PCB. Since Snapdragon S4, the majority of S4 SoCs also feature on-die Wi-Fi, GPS/GLONASS and Bluetooth basebands. This integration reduces the complexity and cost of the final design for the OEM. It also benefits from advances in the manufacturing process (for example, 28 nm in most S4 SoCs), giving the basebands and other dedicated circuitry lower power characteristics than external chips manufactured on older processes.


AMD, unable to mimic this success in recent years, has shifted its focus towards both enthusiast and budget-oriented system configurations. As a result, AMD is considered a viable alternative to Intel. Its current offerings are anchored by the Phenom series processors and the Fusion APUs. The Fusion APU (AMD A-Series) is a relatively new platform (introduced in 2011 and still evolving) that merges high-end graphics capabilities onto the same chip as the processor. This means that if your work or play requires a powerful graphics card, AMD can potentially offer a cost-effective alternative. [16]


The Future Of Computing: Quantum Theory or Atomic Physics. In my previous blog post I shared my thoughts on the future of nanotechnology, its constraints, and how the future of computing could take shape with quantum theory.


References


1. Moore’s law
2. crn.com
3. Intel’s Tejas Processor
4. Arrandale
5. Benchmarks for products with Arrandale
6. Ivy Bridge Chip
7. Making of the Intel Chip, 22nm / 3D Transistors (PDF)
8. Pentium flaws aid Intel in Sandy Bridge
9. Intel’s 3D Transistors
10. Intel reinvents transistors using new 3-D structure
11. Intel’s 3-D Transistor explained
12. Intel MIC
13. NVIDIA Tegra 4
14. ARM – Cortex, 2007 Annual Report
15. Mobile Virtualization
16. Intel, AMD & ARM Processors – Comparison
17. Galaxy S4 Vs HTC One
18. Qualcomm Snapdragon Processors


All views expressed here are based on my own study and interest, and are not biased towards any product or company. – Sudhir Murthy, 2013.

Thursday 31 July 2008

A Concurrent Programming Primer on Code Project

 

I found a good read on PLINQ (Parallel LINQ) and the TPL (Task Parallel Library) on CodeProject. It doesn't cover everything, but it gives you good examples and points out common pitfalls when using them.

The examples cover:

  • The Parallel class (Parallel.For and Parallel.Do) and the AsParallel construct
  • Exception handling for parallel programming constructs
  • Handling of compiler side effects

You need the TPL CTP download and, of course, VS 2008 to try out the samples.
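For readers on a current framework rather than the 2008 CTP, here is a small sketch of the constructs listed above, written as a modern C# top-level program against the Task Parallel Library as it shipped in .NET Framework 4 and later, where, as far as I know, the CTP's Parallel.Do became Parallel.Invoke. The workloads (a sum of squares, two trivial arithmetic actions, and an iteration that deliberately fails when i % 5 == 0) are made up purely for illustration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// 1. Parallel.For with per-worker partial sums: sum of i*i for i in [0, 1000).
long total = 0;
Parallel.For(0, 1000,
    () => 0L,                                     // localInit: each worker starts at 0
    (i, loopState, local) => local + (long)i * i, // body: accumulate without locking
    local => Interlocked.Add(ref total, local));  // localFinally: merge once per worker
Console.WriteLine($"Sum of squares: {total}");

// 2. Parallel.Invoke runs independent actions, potentially in parallel.
int a = 0, b = 0;
Parallel.Invoke(
    () => a = 6 * 7,
    () => b = 10 + 5);
Console.WriteLine($"a = {a}, b = {b}");

// 3. Exceptions thrown inside iterations surface as one AggregateException.
int caught = 0;
try
{
    Parallel.For(0, 10, i =>
    {
        if (i % 5 == 0) // hypothetical failure condition for the demo
            throw new InvalidOperationException($"Item {i} failed");
    });
}
catch (AggregateException ex)
{
    // The loop stops scheduling new iterations after a failure,
    // so the number of inner exceptions can vary from run to run.
    caught = ex.InnerExceptions.Count;
    foreach (var inner in ex.Flatten().InnerExceptions)
        Console.WriteLine($"Caught: {inner.Message}");
}
```

The thread-local overload of Parallel.For keeps the hot loop free of contention: each worker only touches the shared total once, through Interlocked.Add.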

Converting a LINQ query from sequential to parallel execution is as simple as calling AsParallel() on the data source:

   IEnumerable<T> data = ...;
   var q = from x in data.AsParallel() where p(x) orderby k(x) select f(x);




Here’s the link
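The two-line query in this post is schematic: data, p, k and f are placeholders, so it will not compile on its own. A self-contained version, with hypothetical stand-ins P (keep multiples of 3), K (sort descending) and F (square), might look like this as a modern C# top-level program on .NET Framework 4 or later, where PLINQ lives in System.Linq:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for p, k and f from the snippet in this post.
static bool P(int x) => x % 3 == 0; // filter: keep multiples of 3
static int K(int x) => -x;          // sort key: descending order
static int F(int x) => x * x;       // projection: square the value

int[] data = Enumerable.Range(1, 1000).ToArray();

// Sequential LINQ-to-Objects query.
int[] sequential = (from x in data where P(x) orderby K(x) select F(x)).ToArray();

// Same query; AsParallel() switches it to PLINQ.
int[] parallel = (from x in data.AsParallel() where P(x) orderby K(x) select F(x)).ToArray();

Console.WriteLine($"Results match: {sequential.SequenceEqual(parallel)}");
Console.WriteLine($"First: {parallel[0]}, count: {parallel.Length}");
```

Because the query contains an orderby, PLINQ preserves that ordering in the results, so the two arrays match element for element. Without an ordering operator (or AsOrdered()), PLINQ is free to return results in any order.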

Wednesday 2 July 2008

Running Queries on Multi-Core Processors


Multi-core processors are here. Once pervasive mainly in servers and desktop PCs, now multi-core processors are being used in mobile phones and PDAs, resulting in great benefits in power consumption. Responding to the increased availability of multi-processor platforms, Parallel Language Integrated Query (PLINQ) offers an easy way to take advantage of parallel hardware, including traditional multi-processor computers and the newer wave of multi-core processors.
PLINQ is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available. The change in programming model is tiny, meaning you don't need to be a concurrency guru to use it. In fact, threads and locks won't even come up unless you really want to dive under the hood to understand how it all works. PLINQ is a key component of Parallel FX, the next generation of concurrency support in the Microsoft® .NET Framework.
Using technologies like PLINQ will become increasingly crucial to ensuring the scalability of software on future parallel microprocessor architectures. By utilizing LINQ at choice places throughout your applications today—such as where you have data- or compute-intensive operations that can be expressed as queries—you will ensure that those fragments of your programs continue to perform better when PLINQ becomes available and the machines running your application grow from 2 to 4 to 32 processors and beyond. And even if you only run that code on a single-processor machine, the overhead of PLINQ is typically so small that you won't notice a difference. In addition, the data parallel nature of PLINQ ensures your programs will continue to scale as the size of your data sets increases.
More here
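As a rough illustration of the "compute-intensive operations that can be expressed as queries" case, the sketch below (my own example, not from the article, written as a modern C# top-level program) counts primes with a deliberately naive trial-division test. WithDegreeOfParallelism(1) caps PLINQ at a single worker, which loosely mimics running on a single-processor machine; all three counts should agree:

```csharp
using System;
using System.Linq;

// Deliberately naive trial division so that each element costs real CPU time.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

var numbers = Enumerable.Range(2, 99_998); // 2 .. 99,999

int sequentialCount = numbers.Count(IsPrime);            // plain LINQ
int parallelCount = numbers.AsParallel().Count(IsPrime); // PLINQ, all cores
int oneWorkerCount = numbers.AsParallel()
                            .WithDegreeOfParallelism(1)  // PLINQ limited to one worker
                            .Count(IsPrime);

Console.WriteLine($"Sequential: {sequentialCount}, parallel: {parallelCount}, one worker: {oneWorkerCount}");
```

Timing each call with System.Diagnostics.Stopwatch on a multi-core machine is an easy way to see the speed-up for yourself.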

Saturday 21 June 2008

Programming in the age of Concurrency!!

 

Enter Parallel Extensions for .NET (PFX), Microsoft's new take on parallel programming and concurrency in the .NET world. I watched an extensive 34-minute video on Channel 8 featuring the man himself, Anders Hejlsberg.

In summary, the PFX libraries let you take real advantage of the multi-core processors that are becoming common these days.

Here is the link
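To give a flavour of the task-based side of PFX, here is a sketch that splits a summation into one chunk per logical processor and runs each chunk as a Task<long>. It is written as a modern C# top-level program against the API as it shipped in later framework versions (Task.Run arrived in .NET 4.5), not the 2008 CTP shown in the video, and the chunking scheme is just one simple choice:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int[] values = Enumerable.Range(1, 1_000_000).ToArray();
int workers = Environment.ProcessorCount;
int chunkSize = (values.Length + workers - 1) / workers; // ceiling division

// One task per chunk; each task sums its own slice into a private local.
Task<long>[] tasks = Enumerable.Range(0, workers)
    .Select(w =>
    {
        int start = w * chunkSize;
        int end = Math.Min(start + chunkSize, values.Length);
        return Task.Run(() =>
        {
            long partial = 0;
            for (int i = start; i < end; i++)
                partial += values[i];
            return partial;
        });
    })
    .ToArray();

Task.WaitAll(tasks);                        // block until every chunk is done
long grandTotal = tasks.Sum(t => t.Result); // combine the partial sums

Console.WriteLine($"{workers} tasks, total = {grandTotal}");
```

Each task writes only to its own local partial sum, so no locking is needed; the results are combined after Task.WaitAll returns.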

Thursday 19 June 2008

Programming the Thread Pool in .Net

I discovered this excellent article on thread pooling in .NET on MSDN; it can be found here. If you are interested in understanding why thread pooling made it into the .NET platform, read this article. Also, check out this article on CodeProject, which discusses some behind-the-scenes issues with the .NET ThreadPool class.
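For a quick feel of the API those articles describe, the sketch below (a modern C# top-level program) queues twenty small work items with ThreadPool.QueueUserWorkItem and uses a CountdownEvent, which was added in .NET Framework 4 and so is newer than these posts, to wait for them all. The squaring "work" is a placeholder:

```csharp
using System;
using System.Threading;

const int itemCount = 20;
int[] results = new int[itemCount];

// CountdownEvent lets the main thread wait until every work item has signalled.
using var done = new CountdownEvent(itemCount);

for (int i = 0; i < itemCount; i++)
{
    int item = i; // copy the loop variable so each work item sees its own value
    ThreadPool.QueueUserWorkItem(_ =>
    {
        results[item] = item * item; // placeholder for real work
        done.Signal();               // report completion of this item
    });
}

done.Wait(); // block until all work items have run on pool threads
Console.WriteLine($"Last result: {results[itemCount - 1]}");
```

The copy of the loop variable matters: a C# for-loop variable is shared across iterations, so capturing i directly could let several work items see the same, later value.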