The Book of Why

“Correlation is not causation” is a mantra you may have heard many times. It calls attention to the fact that no matter how strong the relations one finds between variables, they are not conclusive evidence of a cause-and-effect relationship. In fact, most modern AI and Machine Learning techniques look for relations between variables to infer useful classifiers, regressors, and decision mechanisms. Statistical studies, with either big or small data, have also generally abstained from explicitly inferring causality between phenomena, except when randomized controlled trials are used, virtually the only case where causality can be inferred with little or no risk of confounding.

In The Book of Why, Judea Pearl, in collaboration with Dana Mackenzie, ups the ante and argues not only that one should not stay away from reasoning about causes and effects, but also that the decades-old practice of avoiding causal reasoning has been one of the reasons for our limited success in many fields, including Artificial Intelligence.

Pearl’s main point is that causal reasoning is not only essential for higher-level intelligence but is also the natural way we humans think about the world. Pearl, a researcher world-renowned for his work in probabilistic reasoning, has made many contributions to AI and statistics, including the well-known Bayesian networks, an approach that exposes regularities in joint probability distributions. Still, he thinks that all those contributions pale in comparison with the revolution he spearheaded on the effective use of causal reasoning in statistics.

Pearl argues that statistics-based AI systems are restricted to finding associations between variables, stuck in what he calls rung 1 of the Ladder of Causation: Association. Seeing associations leads to a very superficial understanding of the world, since it restricts the actor to observing variables and analyzing the relations between them. In rung 2 of the Ladder, Intervention, actors can intervene and change the world, which leads to an understanding of cause and effect. In rung 3, Counterfactuals, actors can imagine different worlds, namely what would have happened if the actor had done this instead of that.

This may seem a bit abstract, but that is where the book becomes a very pleasant surprise. Although it is a book written for the general public, the authors go deeply into the questions, getting to the point where they explain the do-calculus, a methodology Pearl and his students developed to calculate, under a set of dependence/independence assumptions, what would happen if a specific variable is changed in a possibly complex network of interconnected variables. In fact, graphic representations of these networks, causal diagrams, are at the root of the methods presented and are extensively used in the book to illustrate many challenges, problems, and paradoxes.
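To make the gap between rung 1 (seeing) and rung 2 (doing) concrete, here is a minimal Python sketch of one of the simplest adjustments a causal diagram can license, the back-door adjustment. It assumes a hypothetical three-variable diagram in which Z causes both X and Y, and X causes Y. All probabilities are invented for illustration and none come from the book.

```python
# Hypothetical causal diagram: Z -> X, Z -> Y, X -> Y (Z is a confounder).
# Every probability below is made up for illustration.

P_Z = {0: 0.5, 1: 0.5}               # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}      # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                    # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def seeing(x):
    """Rung 1: P(Y = 1 | X = x), obtained by plain conditioning."""
    joint_xy = sum(P_Z[z] * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in P_Z)
    marginal_x = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    return joint_xy / marginal_x


def doing(x):
    """Rung 2: P(Y = 1 | do(X = x)), averaging over Z with its own distribution."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in P_Z)


observed_gap = seeing(1) - seeing(0)   # association between X and Y
causal_gap = doing(1) - doing(0)       # effect of setting X by intervention
print(f"observed gap: {observed_gap:.2f}, causal gap: {causal_gap:.2f}")
```

In this toy setup the observed association between X and Y is more than twice the effect obtained by intervening on X, because part of the association flows through the common cause Z.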

The chapter on paradoxes is particularly entertaining, covering the Monty Hall, Berkson, and Simpson’s paradoxes, all of them quite puzzling. My favorite instance of Simpson’s paradox is the Berkeley admissions puzzle, the subject of a famous 1975 Science article. The paradox comes from the fact that, at the time, Berkeley admitted 44% of male applicants to graduate studies, but only 35% of female applicants. However, each particular department (departments decide the admissions at Berkeley, as in many other places) made decisions that were more favorable to women than to men. As it turns out, this strange state of affairs has a perfectly reasonable explanation, but you will have to read the book to find out.
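For readers who doubt that such a reversal is even arithmetically possible, here is a small Python sketch. The two departments and all applicant counts are hypothetical, chosen only to reproduce the pattern; they are not the Berkeley figures.

```python
# Hypothetical counts, invented for illustration only (NOT the Berkeley data).
# department -> group -> (admitted, applicants)
DATA = {
    "A": {"men": (480, 800), "women": (65, 100)},
    "B": {"men": (40, 200), "women": (225, 900)},
}


def rate(admitted, applicants):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applicants


def department_rates(data):
    """Admission rate of each group within each department."""
    return {
        dept: {group: rate(*counts) for group, counts in groups.items()}
        for dept, groups in data.items()
    }


def overall_rates(data):
    """Admission rate of each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (admitted, applicants) in groups.items():
            a, n = totals.get(group, (0, 0))
            totals[group] = (a + admitted, n + applicants)
    return {group: rate(a, n) for group, (a, n) in totals.items()}


by_dept = department_rates(DATA)
overall = overall_rates(DATA)
for dept, rates in by_dept.items():
    print(dept, {group: f"{r:.0%}" for group, r in rates.items()})
print("overall", {group: f"{r:.0%}" for group, r in overall.items()})
```

With these made-up counts, women have the higher admission rate in each department while men have the higher rate once the departments are pooled.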

The book contains many fascinating stories and includes a surprising number of personal accounts, making for a very entertaining and instructive read.

Note: the ladder of causation figure is from the book itself.

DeepMind presents Artificial General Intelligence for board games

In a paper recently published in the journal Science, researchers from DeepMind describe AlphaZero, a system that mastered three very complex games, Go, chess, and shogi, using only self-play and reinforcement learning. What is different in this system (a preliminary version was previously referred to in this blog), when compared with previous ones, like AlphaGo Zero, is that the same learning architecture and hyperparameters were used to learn different games, without any specific customization for each game.
Historically, the best programs for each game were heavily customized to use and exploit specific characteristics of that game. AlphaGo Zero, the most impressive previous result, used the spatial symmetries of Go and a number of other specific optimizations. Special-purpose chess programs like Stockfish took years to develop, use enormous amounts of field-specific knowledge and can, therefore, only play one specific game.
AlphaZero is the closest thing to a general-purpose board game player ever designed. It uses a deep neural network to estimate move probabilities and position values, and performs the search using a Monte Carlo tree search algorithm, which is general-purpose and not specifically tuned to any particular game. Overall, AlphaZero gets as close as ever to the dream of artificial general intelligence, in this particular domain. As the authors say, in the conclusions, “These results bring us a step closer to fulfilling a longstanding ambition of Artificial Intelligence: a general game-playing system that can master any game.”
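As a rough illustration of how move probabilities and position values can steer a Monte Carlo tree search, here is a minimal Python sketch of a PUCT-style selection step, in the spirit of the rule described in the AlphaGo Zero paper. The `Edge` class, the `select_move` function, the constant exploration weight `c_puct=1.5`, and the example statistics are all assumptions made for this sketch, not DeepMind's code or settings.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one candidate move (hypothetical structure)."""
    prior: float             # move probability suggested by the network
    visits: int = 0          # how many times the search tried this move
    value_sum: float = 0.0   # sum of the position values backed up through it

    @property
    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges, c_puct=1.5):
    """Return the move with the highest mean value plus exploration bonus."""
    total_visits = sum(edge.visits for edge in edges.values())

    def score(edge):
        # The bonus favors moves with a high prior and few visits so far.
        bonus = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.mean_value + bonus

    return max(edges, key=lambda move: score(edges[move]))


# Made-up statistics for three opening moves.
edges = {
    "e4": Edge(prior=0.6, visits=10, value_sum=5.0),
    "d4": Edge(prior=0.3, visits=2, value_sum=1.2),
    "a3": Edge(prior=0.1),
}
print(select_move(edges))
```

Nothing in this step refers to the rules of a particular game: the game-specific knowledge lives entirely in the priors and values, which the network learns from self-play.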
While mastering these ancient games, AlphaZero also teaches us a few things we didn’t know about them. For instance, in chess, white has a strong upper hand when playing the Ruy Lopez opening, or when playing against the French and Caro-Kann defenses. The Sicilian defense, on the other hand, gives black much better chances. At least, that is what the function learned by the deep neural network obtains…
Update: The NY Times just published an interesting piece on this topic, with some additional information.

AlphaZero masters the game of Chess

DeepMind, a company that was acquired by Google, made headlines when the program AlphaGo Zero managed to become the best Go player in the world, without using any human knowledge, a feat reported in this blog less than two months ago.

Now, just a few weeks after that result, DeepMind reports, in an article posted in, that the program AlphaZero obtained a similar result for the game of chess.

Computer programs have been the world’s best chess players for a long time now, basically since Deep Blue defeated the reigning world champion, Garry Kasparov, in 1997. Deep Blue, like almost all the other top chess programs, was deeply specialized in chess, and played the game using handcrafted position evaluation functions (based on grandmaster games) coupled with deep search methods. Deep Blue evaluated more than 200 million positions per second, using a very deep search (between 6 and 8 moves, sometimes more) to identify the best possible move.

Modern computer programs use a similar approach, and have attained super-human levels, with the best programs (Komodo and Stockfish) reaching an Elo rating higher than 3300. The best human players have Elo ratings between 2800 and 2900. This difference implies that they have less than a one in ten chance of beating the top chess programs, since a difference of 366 points in Elo rating (anywhere on the scale) means a probability of winning of 90% for the higher-rated player.
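The relation between rating difference and expected result can be sketched in a few lines of Python. This uses the widely quoted logistic form of the Elo formula, which is an assumption here: rating tables built on a normal distribution give slightly different values. Note also that the Elo expected score counts a draw as half a point, so it is not strictly a probability of winning. The function name `expected_score` is chosen for this sketch.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# A 366-point gap, as in the text: roughly 0.89 under the logistic model.
print(f"{expected_score(3300, 2934):.3f}")

# A 400-point gap: exactly 10/11 under the logistic model.
print(f"{expected_score(2400, 2000):.3f}")
```

Only the difference between the two ratings enters the formula, which is why the same gap means the same expected score anywhere on the scale.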

In contrast, AlphaZero learned the game without using any human-generated knowledge, by simply playing against another copy of itself, the same approach used by AlphaGo Zero. As the authors describe, AlphaZero learned to play at a super-human level, systematically beating the best existing chess program (Stockfish), and in the process rediscovering centuries of human-generated knowledge, such as common opening moves (Ruy Lopez, Sicilian, French and Reti, among others).

The flexibility of AlphaZero (which also learned to play Go and Shogi) provides convincing evidence that general purpose learners are within the reach of the technology. As a side note, the author of this blog, who was a fairly decent chess player in his youth, reached an Elo Rating of 2000. This means that he has less than a one in ten chance of beating someone with a rating of 2400 who has less than a one in ten chance of beating the world champion who has less than a one in ten chance of beating AlphaZero. Quite humbling…

Image by David Lapetina, available at Wikimedia Commons.

The last invention of humanity

Irving John Good was a British mathematician who worked with Alan Turing in the famous Hut 8 of Bletchley Park, contributing to the war effort by decrypting the messages coded by the German Enigma machines. After that, he became a professor at Virginia Tech and, later in life, he was a consultant for the cult movie 2001: A Space Odyssey, by Stanley Kubrick.

Irving John Good (born Isadore Jacob Gudak to a Polish Jewish family) is credited with coining the term intelligence explosion, to refer to the possibility that a super-intelligent system may, one day, be able to design an even more intelligent successor. In his own words:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

We are still very far from being able to design an artificially intelligent (AI) system that is smart enough to design and code even better AI systems. Our current efforts address very narrow fields, and obtain systems that do not have the general intelligence required to create the phenomenon I. J. Good was referring to. However, in some very restricted domains, we can see at work mechanisms that resemble that very same phenomenon.

Go is a board game, very difficult to master because of the huge number of possible games and high number of possible moves at each position. Given the complexity of the game, branch and bound approaches could not be used, until recently, to derive good playing strategies. Until only a few years ago, it was believed that it would take decades to create a program that would master the game of Go, at a level comparable with the best human players.
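A back-of-the-envelope calculation shows why exhaustive search is hopeless here. The Python sketch below estimates the size of a game tree as branching factor raised to the game length, using commonly quoted rough figures (about 250 moves per position over about 150 moves for Go, about 35 over about 80 for chess). These figures are approximations assumed for illustration, and the function name is hypothetical.

```python
import math


def log10_game_tree_size(branching_factor, depth):
    """Base-10 logarithm of branching_factor ** depth, a crude tree-size estimate."""
    return depth * math.log10(branching_factor)


go = log10_game_tree_size(250, 150)     # rough figures for Go
chess = log10_game_tree_size(35, 80)    # rough figures for chess
print(f"Go: about 10^{go:.0f} move sequences, chess: about 10^{chess:.0f}")
```

Under these assumptions the Go tree is larger than the chess tree by more than two hundred orders of magnitude.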

In January 2016, DeepMind, an AI startup (which had by then been acquired by Google for a sum reported to exceed 500M dollars), reported in an article in Nature that they had managed to master the complex game of Go by using deep neural networks and a tree search engine. The system, called AlphaGo, was trained on databases of human games and eventually managed to soundly beat the best human players, becoming the best player in the world, as reported in this blog.

A couple of weeks ago, in October of 2017, DeepMind reported, in a second article in Nature, that they had programmed a system that mastered the game without using any human knowledge, and became even more proficient at it. AlphaGo Zero did not use any human games to acquire knowledge about the game. Instead, it played millions of games (close to 30 million, in fact, played over a period of 40 days) against another version of itself, eventually acquiring knowledge about tactics and strategies that had been slowly created by the human race over more than two millennia. By simply playing against itself, the system went from a child level (random moves) to a novice level to a world champion level. AlphaGo Zero steamrolled the original AlphaGo by 100 to 0, showing that it is possible to obtain super-human strength without using any human-generated knowledge.

In a way, the computer improved itself, by simply playing against itself until it reached perfection. Irving John Good, who died in 2009, would have liked to see this invention of mankind. Which will not be the last, yet…

Picture credits: Go board, picture taken by Hoge Rielen, available at Wikimedia Commons.


IBM TrueNorth neuromorphic chip does deep learning

In a recent article, published in the Proceedings of the National Academy of Sciences, IBM researchers demonstrated that the TrueNorth chip, designed to perform neuromorphic computing, can be trained using deep learning algorithms.


The TrueNorth chip was designed to efficiently simulate spiking neural networks, a model for neurons that closely mimics the way biological neurons work. Spiking neural networks are based on the integrate-and-fire model, inspired by the fact that actual neurons integrate the incoming ion currents caused by synaptic firing and generate an output spike only when sufficient synaptic excitation has accumulated. Spiking neural network models tend to be less efficient than more abstract models of neurons, which simply compute the real-valued output directly from the values of the real-valued inputs multiplied by the input weights.
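The contrast between the two kinds of neuron model can be sketched in Python as follows. This is a textbook-style leaky integrate-and-fire neuron with made-up parameters (`threshold=1.0`, `leak=0.9`), set next to a plain weighted sum. It is only an illustration of the idea and says nothing about how TrueNorth itself implements neurons.

```python
def simulate_integrate_and_fire(input_currents, threshold=1.0, leak=0.9):
    """Return one 0/1 spike per time step for a leaky integrate-and-fire neuron."""
    potential = 0.0
    spikes = []
    for current in input_currents:
        # Integrate the incoming current; part of the old potential leaks away.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)     # enough excitation accumulated: fire
            potential = 0.0      # and reset the potential
        else:
            spikes.append(0)
    return spikes


def abstract_neuron(inputs, weights):
    """Abstract neuron: real-valued output computed in a single step."""
    return sum(x * w for x, w in zip(inputs, weights))


print(simulate_integrate_and_fire([0.3] * 10))
print(abstract_neuron([1.0, 2.0], [0.5, 0.25]))
```

With a constant input of 0.3, the spiking neuron needs four time steps to build up enough potential for each spike, whereas the abstract neuron produces its output in one step.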

As IEEE Spectrum explains: “Instead of firing every cycle, the neurons in spiking neural networks must gradually build up their potential before they fire. To achieve precision on deep-learning tasks, spiking neural networks typically have to go through multiple cycles to see how the results average out. That effectively slows down the overall computation on tasks such as image recognition or language processing.”

In the article just published, IBM researchers have adapted deep learning algorithms to run on their TrueNorth architecture, and have achieved comparable precision, with lower energy dissipation. This research raises the prospect that energy-efficient neuromorphic chips may be competitive in deep learning tasks.

Image from Wikimedia Commons

An autonomous car has its first fatal crash! Now what?

For the first time, an autonomously driven vehicle, a Tesla Model S, had a fatal crash. According to the manufacturer, the car hit a tractor trailer that crossed the highway where the car was traveling. Neither the autopilot, which was in charge, nor the driver noticed “the white side of the tractor trailer against a brightly lit sky”. In these difficult lighting conditions, the brakes were not applied and the car crashed into the trailer. The bottom of the trailer hit the windshield of the car, leading to the death of its only occupant.

This was the first fatal accident to happen with an autonomously driven car, and it happened after Tesla Autopilot had logged 130 million miles. On average, there is a fatal accident for every 94 million miles driven in the US, and for every 60 million miles worldwide, according to Tesla.
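How much can be read into a single crash in 130 million miles? A rough way to look at it, sketched below in Python, is to treat such crashes as a Poisson process running at the quoted US baseline of one per 94 million miles. The Poisson model and the helper names are assumptions made for this sketch, not part of Tesla's statement.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when `mean` events are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least(k_min, mean):
    """Probability of k_min or more events."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))


autopilot_miles = 130e6            # miles logged with Autopilot, from the text
baseline_miles_per_crash = 94e6    # quoted US average, from the text

expected_crashes = autopilot_miles / baseline_miles_per_crash
p_one_or_more = prob_at_least(1, expected_crashes)
print(f"expected at baseline rate: {expected_crashes:.2f}")
print(f"P(at least one crash): {p_one_or_more:.2f}")
```

Under this simple model, at least one such crash in 130 million miles would be expected about three times out of four even at the baseline rate, so a single event is far too little data to say whether the system is safer or more dangerous than human drivers.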


In its statement, Tesla makes clear that Autopilot “is an assist feature that requires you to keep your hands on the steering wheel at all times,” and that “you need to maintain control and responsibility for your vehicle” while using it.

Nonetheless, this crash is bound to be the first to raise significant questions, with very difficult answers. Who is to blame for the fact that the autopilot did not brake the car, in order to avoid the impact?

The programmers, who coded the software that was driving the vehicle at the time of the accident? The driver, who did not maintain control of the vehicle? The designers of the learning algorithms used to derive significant parts of the control software? The system architects, who did not ensure that the Autopilot was just an “assist feature”?

As autonomous systems, in general, and autonomous cars, in particular, become more common, these questions will multiply and we will need to find answers for them. We may be on the eve of a new golden age for trolleyology.

March of the Machines

The Economist dedicates this week’s special report to Artificial Intelligence and the effects it will have on the economy.


The first article in the report addresses a two-centuries-old question, which was called, then, the machinery question. First recorded during the industrial revolution, this question asks whether machines will replace so many human jobs as to leave a large fraction of humanity unemployed. The impact on jobs is addressed in more detail in another piece of the report, automation and anxiety, which includes a reference to a 2013 article by Frey and Osborne. This article reports that 47% of workers in America have jobs at high risk, including many white-collar jobs.


Countermeasures to these challenges are discussed in some detail, including the idea of universal basic income but, in the end, The Economist seems to side with the traditional opinion of economists, that technology will ultimately create more jobs than it will destroy.

Other pieces in the report describe the technology behind the most significant recent advances in AI, deep learning, and the complex ethical questions raised by the possibility of advanced artificial intelligences.

Images in this article are from the print edition of The Economist.



How deep is deep learning, really?

In a recent article, Artificial Intelligence (AI) pioneer and retired Yale professor Roger Schank states that he is “concerned about … the exaggerated claims being made by IBM about their Watson program”. According to Schank, IBM Watson does not really understand the texts it processes, and the IBM claims are baseless, since no deep understanding of the concepts takes place when Watson processes information.

Roger Schank’s argument is an important one and deserves some deeper discussion. First, I will try to summarize the central point of Schank’s argument. Schank has been one of the better known researchers and practitioners of “Good Old Fashioned Artificial Intelligence”, or GOFAI. GOFAI practitioners aimed at creating symbolic models of the world (or of subsets of the world) that were comprehensive enough to support systems able to interpret natural language. Roger Schank is indeed well known for introducing Conceptual Dependency Theory and Case Based Reasoning, well-known GOFAI approaches to natural language understanding.

As Schank states, GOFAI practitioners “were making some good progress on getting computers to understand language but, in 1984, AI winter started. AI winter was a result of too many promises about things AI could do that it really could not do.” The AI winter he is referring to, a deep disbelief in the field of AI that lasted more than a decade, was the result of the fact that creating symbolic representations complete enough and robust enough to address real world problems was much harder than it seemed.

The most recent advances in AI, of which IBM Watson is a good example, use mostly statistical methods, like neural networks or support vector machines, to tackle real world problems. Due to much faster computers, better algorithms, and much larger amounts of data available, systems trained using statistical learning techniques, such as deep learning, are able to address many real world problems. In particular, they are able to process, with remarkable accuracy, natural language sentences and questions. The essence of Schank’s argument is that this statistics-based approach will never lead to true understanding, since true understanding depends on having clear-cut, symbolic representations of the concepts, and that is something statistical learning will never provide.

Schank is, I believe, mistaken. The brain is, in essence, a statistical machine that learns the best way to react from statistics and correlations. Statistical learning, even if it is not the real thing, may get us very close to strong Artificial Intelligence. But I will let you make the call.

Watch this brief excerpt of Watson’s participation in the Jeopardy! competition, and answer for yourself: did IBM Watson understand the questions and the riddles, or did it not?

AlphaGo beats Lee Sedol, one of the best Go players in the world

AlphaGo, the Go playing program developed by Google’s DeepMind, scored its first victory in the match against Lee Sedol.


This win comes on the heels of AlphaGo’s victory over Fan Hui, the reigning three-time European Champion, but it has a deeper meaning, since Lee Sedol is one of the two top Go players in the world, together with Lee Changho. Go is viewed as one of the more difficult games to be mastered by computers, given the high branching factor and the inherent difficulty of position evaluation. It had been believed that computers would not master this game for many decades to come.

Ongoing coverage of the match is available on the AlphaGo website and the matches will be livestreamed on DeepMind’s YouTube channel.

AlphaGo uses deep neural networks trained by a combination of supervised learning from professional games and reinforcement learning from games it played against itself. Two different networks are used, one to evaluate board positions and another one to select moves. These networks are then used inside a special-purpose search algorithm.

The image shows the final position in the game, courtesy of Google’s DeepMind.

Computers finally excel at Go



Go is a beautiful game, with a very large branching factor that makes it extremely hard for computers. For decades, playing this game well was outside the reach of existing programs.

We just learned that computers finally mastered Go, in a paper published in the journal Nature. By using machine learning techniques and, in particular, deep learning, the program AlphaGo, created by Google’s company DeepMind, managed to beat Fan Hui, the European Go champion, five times out of five. Whether AlphaGo is sufficiently strong to beat the best players in the world remains to be seen. However, it already represents a very significant advance in the state of the art.

What was maybe the last bastion in board games still unconquered by computers is no more. Computers are now better than humans at all board games invented by humanity.