Europe wants to have one exascale supercomputer by 2023

On March 23rd, in Rome, seven European countries signed a joint declaration on High Performance Computing (HPC), committing to an initiative that aims to secure the required budget and develop the technologies necessary to acquire and deploy two exascale supercomputers in Europe by 2023. Other Member States will be encouraged to join this initiative.

Exascale computers, defined as machines that execute 10 to the 18th power operations per second, will be roughly ten times more powerful than the existing fastest supercomputer, the Sunway TaihuLight, which clocks in at 93 petaflop/s, or 93 times 10 to the 15th power floating point operations per second. No country in Europe has, at the moment, any machine among the 10 most powerful in the world. The declaration, and related documents, do not strictly require that these machines clock in at more than one exaflop/s, given that the requirements for supercomputers are changing with the technology, and floating point operations per second may not be the right measure.
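
The "roughly ten times" figure follows directly from the numbers quoted above; a quick check in Python:

    # One exaflop/s versus Sunway TaihuLight's 93 petaflop/s.
    exaflop = 1e18          # floating point operations per second
    taihulight = 93e15      # 93 petaflop/s
    print(exaflop / taihulight)   # ~10.8, i.e. roughly ten times faster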

This renewed interest of European countries in High Performance Computing highlights the significant role this technology plays in economic competitiveness and in research and development. Machines with these characteristics are used mainly in the simulation of complex systems, in physics, chemistry, materials science and fluid dynamics, but they are also useful for storing and processing the large amounts of data required to create intelligent systems, namely by using deep learning.

Andrus Ansip, European Commission Vice-President for the Digital Single Market remarked that: “High-performance computing is moving towards its next frontier – more than 100 times faster than the fastest machines currently available in Europe. But not all EU countries have the capacity to build and maintain such infrastructure, or to develop such technologies on their own. If we stay dependent on others for this critical resource, then we risk getting technologically ‘locked’, delayed or deprived of strategic know-how. Europe needs integrated world-class capability in supercomputing to be ahead in the global race. Today’s declaration is a great step forward. I encourage even more EU countries to engage in this ambitious endeavour”.

The European Commission press release includes additional information on the next steps that will be taken in the process.

Photo of the signature event, by the European Commission. In the photo, from left to right, the signatories: Mark Bressers (Netherlands), Thierry Mandon (France), Etienne Schneider (Luxembourg), Andrus Ansip (European Commission), Valeria Fedeli (Italy), Manuel Heitor (Portugal), Carmen Vela (Spain) and Herbert Zeisel (Germany).

 

IBM TrueNorth neuromorphic chip does deep learning

In a recent article, published in the Proceedings of the National Academy of Sciences, IBM researchers demonstrated that the TrueNorth chip, designed to perform neuromorphic computing, can be trained using deep learning algorithms.


The TrueNorth chip was designed to efficiently simulate spiking neural networks, a model for neurons that closely mimics the way biological neurons work. Spiking neural networks are based on the integrate-and-fire model, inspired by the fact that actual neurons integrate the incoming ion currents caused by synaptic firing and generate an output spike only when sufficient synaptic excitation has accumulated. Spiking neural network models tend to be less efficient than more abstract models of neurons, which simply compute a real-valued output directly from the real-valued inputs multiplied by the input weights.
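
As a rough illustration of the integrate-and-fire idea (a minimal sketch, not the TrueNorth implementation; the time constant, threshold and input values below are arbitrary choices):

    import numpy as np

    def lif_neuron(input_current, dt=1e-3, tau=20e-3,
                   v_rest=0.0, v_threshold=1.0, v_reset=0.0):
        """Leaky integrate-and-fire neuron: accumulate input, spike on threshold."""
        v = v_rest
        potentials, spikes = [], []
        for t, i_in in enumerate(input_current):
            # Leaky integration: decay towards rest while adding the input current.
            v += (dt / tau) * (v_rest - v + i_in)
            if v >= v_threshold:      # enough excitation has accumulated
                spikes.append(t)      # emit a spike ...
                v = v_reset           # ... and reset the membrane potential
            potentials.append(v)
        return np.array(potentials), spikes

    # A constant suprathreshold input makes the neuron fire periodically.
    _, spikes = lif_neuron(np.full(200, 1.5))
    print(f"{len(spikes)} spikes in 200 time steps")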

As IEEE Spectrum explains: “Instead of firing every cycle, the neurons in spiking neural networks must gradually build up their potential before they fire. To achieve precision on deep-learning tasks, spiking neural networks typically have to go through multiple cycles to see how the results average out. That effectively slows down the overall computation on tasks such as image recognition or language processing.”

In the article just published, IBM researchers have adapted deep learning algorithms to run on their TrueNorth architecture, and have achieved comparable precision, with lower energy dissipation. This research raises the prospect that energy-efficient neuromorphic chips may be competitive in deep learning tasks.

Image from Wikimedia Commons

Inching towards an exascale supercomputer

The Sunway TaihuLight became, as of June 2016, the fastest supercomputer in the world, when the Top 500 ranking was updated to place it ahead of TianHe-2 (also from China). Sunway TaihuLight clocked in at 93 petaflop/s (93,000,000,000,000,000 floating point operations per second) using its 10 million cores. This performance compares with the 34 petaflop/s of the 3 million core TianHe-2. An exascale computer would have a performance of 1000 petaflop/s.

What is perhaps even more important is that the new machine uses 14% less power than TianHe-2 (a mere 15.3 MW), which makes it more than three times as power efficient.
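
A back-of-the-envelope check of the efficiency claim, using the performance figures above and assuming TianHe-2 draws about 17.8 MW (a figure taken from the Top500 listings, not from the text):

    # Peak performance in petaflop/s divided by power in MW.
    taihulight_eff = 93 / 15.3    # ~6.1 petaflop/s per MW
    tianhe2_eff = 34 / 17.8       # ~1.9 petaflop/s per MW
    print(taihulight_eff / tianhe2_eff)   # ~3.2, more than three times as efficient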


As IEEE Spectrum reports, “TaihuLight uses DDR3, an older, slower memory, to save on power“. Furthermore, it relies on small amounts of local memory near each core instead of a more traditional (and power demanding) memory hierarchy. Other architectural choices were also aimed at reducing power consumption while preserving performance.

It is interesting to compare the power efficiency of this supercomputer with that of the human brain. Imagine that this supercomputer is used to simulate a full human brain (with its 86 billion neurons), using a standard neuron simulator package, such as NEURON.

Using some reasonable assumptions, it is possible to estimate that such a simulation would proceed about 3 million times slower than real time and would require about three trillion times more energy than the human brain to perform the equivalent computation. In terms of speed and power efficiency, it is still hard to compete with the 20 W human brain.
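
The energy figure follows from the slowdown estimate: simulating one second of brain activity would keep the 15.3 MW machine busy for about 3 million seconds, while the brain itself spends one second at roughly 20 W. A sketch of that arithmetic (the 3-million slowdown is taken as given from the estimate above):

    slowdown = 3e6             # simulation ~3 million times slower than real time
    machine_power = 15.3e6     # Sunway TaihuLight power draw, in watts
    brain_power = 20.0         # human brain power, in watts

    machine_energy = slowdown * machine_power   # joules per simulated second of brain time
    brain_energy = 1.0 * brain_power            # joules per real second of brain time
    print(machine_energy / brain_energy)        # ~2.3e12, i.e. trillions of times more energy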

 

Could a neuroscientist understand a microprocessor?

In a recent article, which has been widely commented on (e.g., in a wordpress blog and in Marginal Revolution), Eric Jonas and Konrad Kording, from UC Berkeley and Northwestern University, respectively, have described an interesting experiment.

They have applied the same techniques neuroscientists use to analyze the brain to the study of a microprocessor. More specifically, they used local field potentials, correlations between the activities of different zones, and the effects of single-transistor lesions, together with other techniques inspired by state-of-the-art brain science.

Microprocessors are complex systems, although they are much simpler than a human brain. A modern microprocessor can have several billion transistors, a number that is still small compared with the human brain, which has close to 100 billion neurons and probably more than one quadrillion synapses. One could imagine that, by applying techniques similar to the ones used in neuroscience, it would be possible to obtain some understanding of the role of the different functional units, how they are interconnected, and even how they work.


The authors conclude, not surprisingly, that no significant insights into the structure of the processor can be gained by applying neuroscience techniques. They did observe signals reminiscent of those obtained when applying NMR and other imaging techniques to live brains, and found significant correlations between these signals and the tasks the processor was executing, as in the following figure, extracted from the paper.


However, the analysis of these signals did not provide any significant knowledge about the way the processor works, nor about the different functional units involved. It did, however, provide significant amounts of misleading information. For instance, the authors investigated how transistor damage affected three chip “behaviors”, specifically the execution of the games Donkey Kong, Space Invaders and Pitfall. They were able to find transistors that uniquely crash one of the games but not the others. A neuroscientist studying this chip might thus conclude that a specific transistor is uniquely responsible for a specific game, and that there may exist a “Space Invaders” transistor and a “Pitfall” transistor.
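
A toy sketch of this kind of lesion analysis; the mapping from transistors to the behaviors they break is invented for the example and has nothing to do with the actual chip:

    # Hypothetical lesion study: disable one transistor at a time and see which
    # "behaviors" (games) still run. The broken_by table below is made up.
    behaviors = ["Donkey Kong", "Space Invaders", "Pitfall"]
    broken_by = {17: {"Space Invaders"},
                 42: {"Pitfall"},
                 99: {"Donkey Kong", "Space Invaders", "Pitfall"}}

    def runs_after_lesion(transistor, behavior):
        # In the real study this was decided by simulating the chip with the
        # transistor disabled; here it is just a lookup in the made-up table.
        return behavior not in broken_by.get(transistor, set())

    for t in [17, 42, 99, 123]:
        crashed = [b for b in behaviors if not runs_after_lesion(t, b)]
        if len(crashed) == 1:
            # The tempting (and wrong) conclusion: a game-specific transistor.
            print(f"transistor {t} looks like a '{crashed[0]}' transistor")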

This may be bad news for neuroscientists. Reverse engineering the brain, by observing the telltale signals left by working neurons, may forever remain an impossible task. Fortunately, that still leaves open the possibility that we may be able to fully reconstruct the behavior of a brain, even without ever fully understanding its inner workings.

First image: Chip layout of EnCore Castle processor, by Igor Bohem, available at Wikimedia commons.

Second image: Observed signals, in different parts of the chip.

Moore’s law is dead, long live Moore’s law

Google recently announced the Tensor Processing Unit (TPU), an application-specific integrated circuit (ASIC) tailored for machine learning applications that, according to the company, delivers an order of magnitude better performance per watt than existing general purpose processors.

The chip, developed specifically to speed up the increasingly common machine learning applications, has already powered a number of state of the art applications, including AlphaGo and StreetView. According to Google, this type of application is more tolerant of reduced numerical precision and can therefore be implemented using fewer transistors per operation. Because of this, Google engineers were able to squeeze more operations per second out of each transistor.
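
As a rough illustration of the reduced-precision idea (a generic 8-bit quantization sketch, not Google's actual TPU arithmetic):

    import numpy as np

    def quantize(x, num_bits=8):
        """Map float values to signed integers of the given width (symmetric scheme)."""
        scale = np.max(np.abs(x)) / (2 ** (num_bits - 1) - 1)
        return np.round(x / scale).astype(np.int32), scale

    # Random "weights" and "activations" in 32-bit floating point.
    w = np.random.randn(256).astype(np.float32)
    a = np.random.randn(256).astype(np.float32)

    w_q, w_scale = quantize(w)
    a_q, a_scale = quantize(a)

    exact = float(np.dot(w, a))
    # Integer dot product, rescaled back to the floating point range.
    approx = float(np.dot(w_q, a_q)) * w_scale * a_scale
    print(exact, approx)   # the two values are typically very close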

chip

The new chip is tailored for TensorFlow, an open source library that performs numerical computation using data flow graphs. Each node in the graph represents one mathematical operation that acts on the tensors that come in through the graph edges.
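
A minimal example of such a data flow graph, written against the TensorFlow 1.x graph API that was current at the time (the tensor values are arbitrary):

    import tensorflow as tf   # assumes the TensorFlow 1.x graph-mode API

    graph = tf.Graph()
    with graph.as_default():
        # Nodes are operations; the edges carry tensors between them.
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
        b = tf.constant([[5.0], [6.0]], name="b")
        product = tf.matmul(a, b, name="product")   # 2x2 matrix times 2x1 matrix

    with tf.Session(graph=graph) as sess:
        print(sess.run(product))   # [[17.], [39.]]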

Google stated that the TPU represents a jump of ten years into the future as far as Moore’s Law is concerned, a law that has recently been viewed as finally coming to a halt. Developments like this, based on alternative architectures or alternative ways to perform computations, are likely to keep delivering exponential improvements in computing power for years to come, compatible with Moore’s Law.

Whole brain emulation in a supercomputer?

The largest spiking neural network simulation performed to date modeled the behavior of a network of 1.8 billion neurons, for one second of real time, using the 83,000 processing nodes of the K computer. The simulation took 40 minutes of wall-clock time, with an average of 6,000 synapses per neuron.

This result, obtained by a team of researchers from the Jülich Research Centre and the Riken Advanced Institute for Computational Science, among other institutions, shows that it is possible to simulate networks with more than one billion neurons in fast supercomputers. Furthermore, the authors have shown that the technology scales up and can be used to simulate even larger networks of neurons, perhaps as large as a whole brain.
The simulations were performed using the NEST software package, designed to efficiently model and simulate networks of spiking neurons. If one extrapolates the use of this technology to whole brain emulation (with the brain's 86 billion neurons), a simulation performed on the K supercomputer would run about 100,000 times slower than real time.
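
The slowdown figure can be checked with a rough extrapolation from the numbers above, under the strong assumption that simulation time grows linearly with the number of neurons:

    # Figures from the text; linear scaling with network size is an assumption.
    sim_neurons = 1.8e9        # neurons in the K computer simulation
    brain_neurons = 86e9       # approximate number of neurons in a human brain
    wall_clock = 40 * 60       # 40 minutes of wall-clock time, in seconds
    simulated_time = 1.0       # one second of real time was simulated

    slowdown = wall_clock / simulated_time            # ~2,400 times slower than real time
    whole_brain_slowdown = slowdown * brain_neurons / sim_neurons
    print(whole_brain_slowdown)   # ~115,000, i.e. roughly 100,000 times slower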

The K-computer has an estimated performance of 8 petaflops, or 8 quadrillion (10 to the 15th power) floating point operations per second and is currently the world’s fourth fastest computer.

The end of Moore’s law?

Gordon Moore, scientist and chairman of Intel, first noticed that the number of transistors that can be placed inexpensively on an integrated circuit increased exponentially over time, doubling approximately every two years. This exponential growth has led to an increase by a factor of more than 1,000,000 in the number of transistors per chip over the last 40 years. Moore’s law has fueled the enormous developments in computer technology that have transformed technology and society in the last decades.
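
The factor of a million follows directly from the doubling period: forty years at one doubling every two years gives twenty doublings.

    doublings = 40 / 2        # forty years, doubling every two years
    print(2 ** doublings)     # 1048576.0: a factor of about a million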


A long-standing question is how long Moore’s law will hold, since no exponential growth can last forever. The Technology Quarterly section of this week’s edition of The Economist, summarized in this short article, analyzes the question in depth.

The conclusion is that, while the rate of increase in the number of transistors on a chip will become smaller and smaller, advances in other technologies, such as software and cloud computing, will take up the slack, providing increases in computational power that will not deviate much from what Moore’s law would have predicted.


Image of scientist and businessman Gordon Moore. The image is a screenshot from the Scientists You Must Know video, created by the Chemical Heritage Foundation, in which he briefly discusses Moore’s Law.