Artificial Intelligence (AI) has been broadly defined as intelligence exhibited by machines, and a branch of AI called Deep Learning (DL) has garnered a lot of attention in the past few years. DL, also known as hierarchical learning, is a subset of Machine Learning that mimics the function of the human neocortex — the part of the brain involved in complex functions such as sensory perception, motor commands, spatial reasoning, language, and cognition.
A company called DeepMind, which was acquired by Google in 2014, has been driving many of the breakthroughs in DL, fueled by a lot of the data Google has access to. Note that DL algorithms traditionally “learn” by digesting large amounts of data. As we will see, there are new ways in which DL is starting to learn.
The most recent breakthrough was achieved by Professor Tuomas Sandholm and a team of his graduate students at Carnegie Mellon University, using supercomputing hardware from HPE.
We will look at the three main breakthroughs that have occurred in the past 2+ decades, explain at a high level the different approaches that were used and discuss some exciting possibilities for how this emerging technology might be used in the future.
The First Breakthrough
Dial back to 1996-97 when a system called Deep Blue became the first AI program to beat a reigning world chess champion. This system was originally developed by Feng-hsiung Hsu at Carnegie Mellon University, but was completed after he joined IBM along with fellow team members Thomas Anantharaman and Murray Campbell. Chess has a relatively small playing board of 8×8 squares and while there are many possible moves at any point in a game, the problem could be solved by brute-force computing power, which was provided by IBM hardware at the time. That system was capable of evaluating 200 million positions per second.
The basic approach created a generalized algorithm with a lot of parameters. The system then calculated the optimal values of these parameters by analyzing hundreds of thousands of master and grandmaster games. The results were then fine-tuned by actual grandmasters themselves. The system would generally evaluate searches to a depth of 6-8 moves, with the ability to go 20 moves or more in certain cases.
The pure compute power applied to the problem made it appear to the human opponent, reigning world champion Garry Kasparov, that at times it was exhibiting deep intelligence and creativity. In reality, it was simply able to evaluate more data more quickly to mathematically calculate and present optimum moves.
In computer chess research today, the focus has shifted from raw hardware power to more optimized software. In a November 2006 match, for example, a modern chess program named Deep Fritz beat world chess champion Vladimir Kramnik while running on a common desktop computer with a dual-core Intel Xeon 5160 CPU. That machine could evaluate only 8 million positions per second, yet the program searched to an average depth of 17 to 18 moves using heuristic techniques.
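The search style described above can be sketched in a few lines. What follows is a generic depth-limited minimax search with alpha-beta pruning over a toy game tree — an illustration of the family of techniques classical chess engines are built on, not Deep Blue's or Deep Fritz's actual code. The tree, the scores and all function names here are invented for the example.

```python
# Depth-limited game-tree search with alpha-beta pruning. At the search
# horizon (depth 0 or no legal moves) a static evaluation function scores
# the position; above it, the maximizing and minimizing players alternate,
# and branches that cannot affect the final choice are pruned.

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Return the minimax value of `node`, pruning hopeless branches."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)              # static evaluation at the horizon
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:              # opponent would never allow this line
                break
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, children, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# A toy two-ply game tree with static scores at the leaves.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
scores = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

best = alphabeta("root", 2, float("-inf"), float("inf"), True,
                 lambda n: tree.get(n, []), lambda n: scores.get(n, 0))
print(best)  # 3: the maximizer picks branch "a", where the minimizer allows 3
```

A real engine replaces the toy tree with a move generator and the score lookup with a hand-tuned (or machine-tuned) evaluation function; the pruning is what lets a deep search avoid visiting every position.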
Advance Number Two Emerges
The next major advance came in March 2016, when an AI system developed by DeepMind called AlphaGo beat one of the highest-ranked world Go champions, Lee Sedol, in 4 of 5 games. Go is a more complex game than chess, with a 19×19 board and the added complexity of stones being captured and removed when surrounded by an opponent’s stones. AlphaGo applied DL techniques rather than the brute-force approach of Deep Blue. This approach leveraged a lot of data from prior human matches to train the AI.
The version that beat Lee, AlphaGo Lee, was replaced in late 2016/early 2017 by a version called AlphaGo Master, which reduced the compute power from 48 distributed TPUs to 4 TPUs running on a single machine. A TPU, Google’s Tensor Processing Unit, delivers 15-30x higher performance than contemporary CPUs and GPUs. AlphaGo Master won 60 of 60 online matches against a group that included most of the world’s best players, and in a formal match in May 2017 it beat world champion Ke Jie 3 games to 0.
In an article published in October 2017, the AlphaGo team announced AlphaGo Zero, a version that learned without any human data by playing only against itself — a technique known as reinforcement learning. AlphaGo Zero used a single neural network rather than the two used by earlier versions of AlphaGo: a “policy network” to select the next move to play and a “value network” to predict the winner of the game from each position. This new algorithm surpassed AlphaGo Lee in 3 days and AlphaGo Master in 21. By the 40th day it had surpassed all previous versions!
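AlphaGo Zero's actual pipeline (Monte Carlo tree search guided by a deep neural network) is far beyond a short snippet, but the self-play idea itself can be illustrated on a much simpler game. The sketch below uses tabular value learning on Nim — purely an invented illustration of learning from self-play with no human data, not DeepMind's method; all parameters and names here are made up for the example.

```python
import random

# Self-play on Nim: players alternate taking 1-3 stones from a pile, and
# whoever takes the last stone wins. The agent starts knowing nothing and
# sees no human games; it improves only by playing against a copy of
# itself and propagating each game's outcome back through the moves made.
random.seed(0)
Q = {}  # (stones_remaining, stones_taken) -> estimated value for the mover

def best_move(stones, eps=0.1):
    """Pick the highest-valued move, exploring a random one 10% of the time."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    if random.random() < eps:
        return random.choice(moves)
    return max(moves, key=lambda m: Q.get((stones, m), 0.0))

for _ in range(5000):                       # one self-play game per loop
    stones, history = 5, []
    while stones > 0:
        move = best_move(stones)
        history.append((stones, move))
        stones -= move
    reward = 1.0                            # the player who moved last won
    for state_action in reversed(history):
        old = Q.get(state_action, 0.0)
        Q[state_action] = old + 0.2 * (reward - old)
        reward = -reward                    # alternate between the two players

# With 5 stones, taking 1 (leaving a multiple of 4) is the provably winning move.
print(best_move(5, eps=0.0))
```

The same loop — play yourself, score the outcome, update the values, repeat — is, in vastly scaled-up form, what lets a system discover strong play without a single human game in its training data.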
In another paper released in December 2017, DeepMind claimed that it generalized AlphaGo Zero’s approach into an algorithm named AlphaZero, which achieved within 24 hours a superhuman level of play across the games of chess, shogi (also known as Japanese chess), and Go by defeating the respective world-champion programs – Stockfish, Elmo, and the 3-day version of AlphaGo Zero. Note that AlphaGo Zero and AlphaZero both ran on a single machine with 4 TPUs.
The Most Recent Milestone
While chess and Go each have their own level of complexity, they are both games of perfect information: both players can see all the pieces on the board at all times. The game of heads-up (or two-player), no-limit Texas Hold’em, on the other hand, is a game of imperfect information. A player cannot see the two face-down cards dealt to the other player, and the five community cards are revealed only gradually, over three rounds of play after the deal. To provide a sense of this complexity, a hand of heads-up no-limit Texas Hold’em can play out in roughly 10^160 ways. That’s a one followed by 160 zeroes — more than the number of atoms in the universe. This represents a problem that cannot readily be brute-forced by simply throwing compute power at it.
Around the same time that AlphaGo Master was making strides against its predecessor AlphaGo Lee in January 2017, an AI program named Libratus was pitted against four top human poker players – Jason Les, Dong Kim, Daniel McAulay and Jimmy Chou. After 20 days of play and 120,000 hands of poker, Libratus emerged as the winner.
Carnegie Mellon Professor Tuomas Sandholm and his graduate students developed Libratus as a successor to a prior version called Claudico (originally called Tartanian). Libratus used three different approaches that worked together, and this was its key differentiator.
First, it used reinforcement learning: the program learned by playing against itself through random trial and error, via an algorithm known as counterfactual regret minimization. Note that this technique appears to have subsequently been adopted by the latest versions of AlphaGo (AlphaGo Zero and AlphaZero). The technique ends up testing such a wide range of approaches that it finds optimized strategies humans would otherwise not think to try. In certain cases, this actually ended up throwing human opponents off.
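At the heart of counterfactual regret minimization is a simple update rule called regret matching: play each action with probability proportional to how much you regret not having played it in the past. The sketch below applies regret matching to rock-paper-scissors against a fixed, exploitable opponent — a toy illustration of the core update, not Libratus itself. Full CFR applies this update at every decision point of the poker game tree, and the opponent probabilities here are invented for the example.

```python
import random

# Regret matching on rock-paper-scissors. After each round we record, for
# every alternative action, how much better it would have done than the
# action we actually played. Future play is weighted toward the actions
# with the largest accumulated positive regret.
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats action b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def current_strategy(regrets):
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS          # no regrets yet: play uniform
    return [p / total for p in positives]

random.seed(0)
regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS
opponent_probs = [0.6, 0.2, 0.2]                  # exploitable: rock 60% of the time

for _ in range(10000):
    strat = current_strategy(regrets)
    for a in range(ACTIONS):
        strategy_sum[a] += strat[a]               # track the average strategy
    mine = random.choices(range(ACTIONS), weights=strat)[0]
    theirs = random.choices(range(ACTIONS), weights=opponent_probs)[0]
    for a in range(ACTIONS):                      # accumulate regret per action
        regrets[a] += payoff(a, theirs) - payoff(mine, theirs)

average = [s / sum(strategy_sum) for s in strategy_sum]
print(average)
```

Against this rock-heavy opponent, the average strategy concentrates on paper — the exploit. When both sides run the same update against each other, as in CFR self-play, the average strategies instead converge toward an unexploitable equilibrium, which is what makes the approach so effective in poker.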
The second approach was an “end-game solver” that would look at the current state of play and then help focus the counterfactual regret minimization algorithm. This was important because the primary algorithm no longer had to run through all possibilities. Libratus didn’t just learn from modeling prior match results; it also learned while it was playing.
The third approach dealt with scenarios where the opponent recognized certain patterns of play and began to exploit them. The third algorithm identified those patterns and removed them. This three-pronged approach was both innovative and effective, producing a system that could actually out-bluff a human.
Where Else Might This Apply?
These advances in AI are exciting for a few reasons. For one, they keep tackling more difficult problems. In addition, the speed at which these advances are being made appears to be accelerating. Finally, the algorithms are becoming more efficient so they require less compute resources to run.
There are numerous problems that can now be targeted with these new approaches. For the purposes of discussion, we will look at them in three distinct categories: pattern recognition, human behavior, and a combination of the two.
Pattern recognition is a key component of human perception, and it occurs within the neocortex. Any problem in which known patterns can be recognized, and anomalies against those patterns identified, can be tackled with these techniques.
Some examples include:
Cybersecurity – In this domain, network traffic represents vast amounts of data streaming in real time. Identifying known and acceptable patterns in this flow of data, as well as patterns that are not recognized, can help flag potential security breaches so that action can be taken to stop them immediately.
Medical Diagnoses – Whether identifying patterns of symptoms across medical records or spotting potential problems in medical imaging that may escape the human eye, these new AI techniques can help medical professionals provide even better levels of service and help save lives.
Legal – Think of all the time lawyers spend reviewing contracts. This is a perfect problem to be solved by AI, where known patterns of legal verbiage can be identified and exceptions can be highlighted for review by human lawyers.
Poker represented a problem with imperfect information and the need to identify patterns of human behavior. Below are some examples where modern AI might be applied in similar situations:
Capital Markets Trading – This is a place where historical data is combined with human traits such as fear and greed, with a little herd mentality added to the excitement. Some of the new AI techniques have the potential to improve performance and this could lead to significant profits.
Psychoanalysis – This is a field in which new services are appearing online to provide patients with more timely and cost effective solutions. AI techniques can be used to identify behavior patterns, coax information from patients and offer everything from advice to recommendations of appropriate medical professionals to consult with.
Business and Political Negotiations – There are lots of similarities between these situations and poker, including imperfect information, bluffing and attempting to maximize gain. Modern AI could be used to assist the humans involved in negotiations to help optimize individual outcomes.
Some problems involve a complex combination of pattern recognition, imperfect information and human behavior. One example involves the realm of Intelligence Collection. Many nations and non-governmental organizations are actively engaged in intelligence operations to collect information about other countries for the overall purpose of national security. This data can take many forms, including satellite imagery, voice/text/email communications and direct surveillance, to name just a few. The information is inherently imperfect as nations do their best to keep certain information secret. It also involves the added complexity of tracking human behavior, where in some cases that behavior is purposely misleading.
These newer advances in AI are well suited for making progress in this area. Digesting massive amounts of data and making sense of the patterns, identifying and initiating deceptive practices (bluffing), and helping to discern true signals from false positives are all helpful in making our intelligence services more effective in keeping us safe.
The latest advances in AI have provided innovative techniques that apply multi-pronged approaches to solve problems. As we’ve gained more insights along these lines, we’ve also made significant progress in making algorithms that are more generalized and can run effectively with less pure compute power.
In this article we’ve explored some of these recent advances in the context of prior breakthroughs. We’ve also taken a look at some of the many possibilities that present themselves as problems that can be addressed with some of these new approaches.
The results of these breakthroughs have been promising and exciting, both in terms of the capabilities achieved and the speed at which these improvements have emerged. It seems the better we get, the faster we get better!