Artificial Intelligence and Machine Learning: A Tale of Summers and Winters
The journey of artificial intelligence (AI) and machine learning (ML) through time unfolds like changing seasons: peaks of excitement and troughs of scepticism. These cycles, metaphorically termed "AI summers" and "AI winters," encapsulate the dynamic nature of the field.
Timeline of the evolution of artificial intelligence, machine learning and deep learning. (Source: NVIDIA)
The First Summer: Dawn of AI
The roots of AI can be traced back to the mid-20th century. The period from 1940 to 1960 saw a technological revolution, accelerated by World War II. By the 1950s, scientists, mathematicians, and philosophers were seriously entertaining the idea of machine intelligence, driven by a desire to bring machines and living minds closer together. In 1950, Alan Turing proposed a “learning machine”, laying the conceptual foundation for AI. The Turing machine, an abstract mathematical model he had devised earlier, remains a tool for understanding the limits and capabilities of computation, while the Turing test assesses a machine's ability to exhibit intelligent behaviour indistinguishable from that of a human. The test remains a pivotal point in philosophical discussions about intelligence, consciousness, and mind. However, the computers of the day lacked a crucial element for intelligence: they could execute commands but could not store them. Computing was also prohibitively expensive in the early 1950s.
In 1956, John McCarthy coined the term "artificial intelligence" at the Dartmouth Conference, often considered the birthplace of AI. The pioneers of the field were inspired by the idea of creating machines that could replicate human intelligence and take on tasks that typically required it, such as problem-solving, reasoning, and decision-making. Early successes and the advocacy of leading researchers led to a surge in funding, and AI flourished.
The optimism reached its pinnacle in the late 1950s and 1960s, often referred to as the first AI summer, a time of grand expectations. During this period, researchers were fuelled by the belief that machines could replicate human intelligence. At the same time, integrated circuits were developed, combining multiple transistors on a single semiconductor chip. Computers could now store more information and became faster and cheaper. There was abundant government funding for AI research, particularly in the US and UK. Researchers worked on symbolic AI, using rules and logic to create intelligent systems. Successes included programs that could play chess and the earliest examples of a chatbot.
In the late 1950s, the perceptron, an early type of neural network designed for pattern recognition, was introduced. The groundwork for neural networks had been laid in the 1940s, with models of neurons that convert continuous inputs into discrete outputs and the idea of strengthening connections between neurons. The perceptron was one of the earliest attempts to create a machine that could learn from experience, and its development was considered a breakthrough that gained wide media attention.
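The perceptron's learning rule is simple enough to sketch in a few lines of Python. The data, learning rate and number of epochs below are illustrative assumptions rather than anything from the historical implementation; the key idea is that the weights are nudged only when a prediction is wrong, which is the sense in which the machine "learns from experience".

```python
import numpy as np

# A minimal perceptron sketch on a toy, linearly separable problem (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])          # logical AND: separable by a single straight line

w = np.zeros(2)                     # weights
b = 0.0                             # bias
lr = 0.1                            # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0   # threshold activation
        error = target - pred
        w += lr * error * xi                # adjust weights only when the prediction is wrong
        b += lr * error

print([(1 if xi @ w + b > 0 else 0) for xi in X])   # expected: [0, 0, 0, 1]
```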
Marvin Minsky, Claude Shannon, Ray Solomonoff and other scientists at the Dartmouth conference (Photo: Margaret Minsky)
Initial enthusiasm surrounding the perceptron faded in 1969, when a book arguing that single-layer perceptrons could not solve certain classes of problems was published. Despite initial successes, rule-based systems struggled to handle the complexity and ambiguity of the real world. In 1973, the Science Research Council in the UK commissioned a comprehensive assessment of the state of AI: the Lighthill Report. The report was critical of progress in AI, emphasising the failure to meet commercial goals and the lack of practical applications. Funding for AI research dwindled as scepticism grew about the feasibility of creating truly intelligent machines. US defence agencies, which had funded much of the foundational work of the 1960s and early 1970s, withdrew support for open-ended AI research. Enthusiasm for AI collapsed under the mismatch between expectations and cold reality. This setback led to the first AI winter, from 1974 to 1980.
The 1984 conference of the Association for the Advancement of Artificial Intelligence (AAAI, founded in 1979) brought prominent figures in the AI community together to discuss the challenges and future prospects of the field; the term "AI winter" itself was coined in a panel discussion there. The discourse highlighted existing concerns about the field and a cautious outlook on the potential of artificial intelligence, which played a role in reshaping perceptions and welcoming change.
The Second Summer: Rediscovering Neural Networks
The 1980s witnessed a newfound promise in neural networks. This rediscovery was helped along by the advent of the first microprocessors in the 1970s. Neural networks, inspired by the structure and function of the human brain, an approach known as connectionism, were gaining attention. A Multilayer Perceptron (MLP) is a type of artificial neural network characterised by multiple layers of interconnected nodes: an input layer, one or more hidden layers, and an output layer. MLPs were trained using supervised learning and the backpropagation algorithm, adjusting weights to minimise the difference between predicted and actual outputs.
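The mechanics of backpropagation can be illustrated with a small NumPy sketch. Everything below, the network size, the toy data and the learning rate, is an illustrative assumption rather than a historical implementation: the network computes an output, measures its error against the target, and propagates gradients of that error backwards to adjust the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy task: XOR, which a single-layer perceptron cannot solve but an MLP can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # hidden -> output
lr = 0.5

for _ in range(5000):
    # Forward pass: compute the network's prediction.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the squared error, layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0]
```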
The second AI summer popularised neural-network learning techniques that allowed computers to learn from experience. In 1982, the Hopfield network, a recurrent neural network (RNN) that could learn and remember patterns, was created. RNNs could handle sequential data by maintaining a hidden state that retains information from previous steps. In the late 1980s, convolutional neural networks (CNNs) were developed. CNNs were able to extract features from input data, making them particularly effective in image-related tasks.
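The idea of a hidden state that carries information from one step of a sequence to the next can be shown in a few lines. The dimensions, random weights and toy sequence below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single vanilla RNN cell, stepped over a toy sequence (all sizes are illustrative).
input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "memory")
b_h = np.zeros(hidden_size)

sequence = rng.normal(size=(7, input_size))    # 7 time steps of 3-dimensional input
h = np.zeros(hidden_size)                      # hidden state starts empty

for x_t in sequence:
    # The new hidden state mixes the current input with the previous hidden state,
    # which is how information from earlier steps is retained.
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)

print(h)   # a compact summary of the whole sequence
```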
In the 1980s, expert systems were widely embraced. Expert systems are computer programs designed to emulate the decision-making ability of a human expert in a specific domain, typically by applying hand-crafted if-then rules to the facts of a case. Many companies and organisations invested heavily in developing expert systems for medicine, finance and engineering, and there was a spark of resurgence for AI. Unfortunately, inflated expectations once again collided with underwhelming returns on investment. The limitations of expert systems became apparent, and interest and funding shifted to other fields, once again leading to disillusionment. Thus, the second AI winter started in 1987 and ended as the 1990s progressed.
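The rule-based flavour of an expert system can be sketched with a toy forward-chaining engine. The rules and the medical-sounding facts below are invented for illustration only and are far simpler than real systems of the era.

```python
# A toy forward-chaining rule engine in the spirit of 1980s expert systems.
# The rules and facts here are invented for illustration.
rules = [
    ({"fever", "cough"}, "possible_flu"),
    ({"possible_flu", "high_risk_patient"}, "recommend_doctor_visit"),
    ({"rash"}, "possible_allergy"),
]

def infer(facts):
    """Repeatedly apply the rules until no new conclusions can be drawn."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"fever", "cough", "high_risk_patient"}))
# -> includes 'possible_flu' and 'recommend_doctor_visit'
```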
Garry Kasparov versus Deep Blue, 11 May 1997 (Photo: Adam Nadel)
Even with the lack of funding, some landmark achievements were made in the 1990s. Graphics Processing Units (GPUs) were developed and proved well suited to the computations involved in machine learning, leading to significant speedups for algorithms like neural networks. In 1995, the Random Forest algorithm and the Support Vector Machine were introduced. Long short-term memory (LSTM) recurrent neural networks were developed in 1997, forming the basis for advanced recurrent and sequential models. In the same year, the reigning world chess champion Garry Kasparov lost to Deep Blue, a computer, in a highly publicised match.
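Both of those 1990s algorithms are now a couple of lines in modern libraries. The snippet below is a present-day illustration using scikit-learn on a bundled toy dataset, not code from the period.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative comparison of a random forest and a support vector machine
# on scikit-learn's bundled iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
svm = SVC(kernel="rbf").fit(X_train, y_train)

print("random forest accuracy:", forest.score(X_test, y_test))
print("svm accuracy:", svm.score(X_test, y_test))
```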
With lessons learnt from previous winters, there was a keenness to adapt approaches and refine expectations. Technological progress towards the end of the 20th century helped reshape the landscape. Parallel processing gained prominence thanks to GPUs, addressing the challenges of speed and scalability. The growing presence and accessibility of the internet gave researchers access to vast amounts of training data through cloud services and public databases. These changes became the catalyst for a period of sustained thaw in the years to come, marking the beginning of a new AI summer.
The Glorious Summer of the 21st Century
The year 2009 proved to be a turning point, with deep learning entering the mainstream. An MLP with a single hidden layer is considered a shallow neural network, while those with a more extensive hierarchy of hidden layers are categorised as deep neural networks. Geoffrey Hinton had been advocating for such deep neural networks for many years. His approach allowed each layer of a deep network to be pre-trained independently and the whole stack to then be fine-tuned collectively. This technique proved to be a game-changer, overcoming the longstanding difficulty of training networks with many layers.
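The flavour of that layer-by-layer training can be sketched as follows. This is a simplified autoencoder-style version on invented toy data, not Hinton's original deep belief network procedure, and it assumes PyTorch is available: each hidden layer is first trained to reconstruct its own input, then the stacked layers are fine-tuned together on labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)                      # toy unlabelled data (illustrative)
y = (X[:, 0] > 0).long()                      # toy labels for the fine-tuning stage

sizes = [20, 16, 8]                           # input -> hidden1 -> hidden2
layers, inputs = [], X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    encoder = nn.Linear(d_in, d_out)
    decoder = nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
    for _ in range(200):                      # unsupervised pre-training of this layer alone
        opt.zero_grad()
        hidden = torch.relu(encoder(inputs))
        loss = nn.functional.mse_loss(decoder(hidden), inputs)
        loss.backward()
        opt.step()
    layers.append(encoder)
    inputs = torch.relu(encoder(inputs)).detach()   # the next layer trains on these activations

# Stack the pre-trained encoders, add a classifier head, and fine-tune collectively.
model = nn.Sequential(layers[0], nn.ReLU(), layers[1], nn.ReLU(), nn.Linear(sizes[-1], 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
```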
This paradigm shift focused on building systems that could learn from data, allowing them to improve and adapt without being explicitly programmed. In 2009, ImageNet, a large visual database that would grow to over 14 million labelled images reflecting real-world data, was created. This was the beginning of a watershed moment in computer vision and machine learning. AlexNet, a deep convolutional neural network (CNN) that required no layer-by-layer pre-training, was developed in 2012, demonstrating unprecedented results in image recognition. In the same year, Andrew Ng led the Google Brain team in building a large-scale deep learning system that explored unsupervised learning. Ten million unlabelled images were taken at random from YouTube, and the neural network demonstrated an inherent capability to autonomously learn and identify objects, such as pictures of cats. These achievements were instrumental in bringing deep learning into the spotlight.
Visual representation of a neural network (Copyright: immunitoAI)
Generative AI emerged in this era of innovation in the 2010s. Variational Autoencoders (VAEs) were introduced as a generative model in 2013. VAEs enabled the generation of new, varied outputs and were used in fields ranging from image generation to drug discovery. In 2014, Ian Goodfellow and his colleagues developed Generative Adversarial Networks (GANs), a powerful tool for generative AI that opened new horizons for image generation and data augmentation. Diffusion models were introduced in 2015 as a way to sample from highly complex probability distributions, and found applications in image processing, signal denoising, and the modelling of dynamic processes.
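The adversarial idea behind GANs, a generator learning to fool a discriminator that is simultaneously learning to spot fakes, can be sketched on a one-dimensional toy problem. The architecture, data and hyperparameters below are illustrative assumptions, and PyTorch is assumed to be available.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
real_data = lambda n: torch.randn(n, 1) * 0.5 + 2.0      # the "real" distribution: N(2, 0.5)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # 1) Train the discriminator to separate real samples from generated ones.
    real = real_data(64)
    fake = G(torch.randn(64, 8)).detach()
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()
    # 2) Train the generator to fool the discriminator.
    opt_g.zero_grad()
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # should drift towards ~2.0
```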
Reinforcement learning took centre stage in 2016 when Google DeepMind's AlphaGo defeated Go champion Lee Sedol. This victory showcased the prowess of machine learning in mastering complex games. Graph Neural Networks (GNNs) were introduced in 2009, but it was only in the mid-2010s that researchers began exploring in earnest how neural networks could be adapted to handle data represented as graphs. Graphs play a crucial role in mapping and weather forecasting by capturing complex spatial and meteorological relationships for accurate predictions and real-time insights. GNNs are also used to model user connections on social media platforms for personalised content suggestions.
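One round of "message passing", the core operation in many GNNs, amounts to averaging each node's features with those of its neighbours and then applying a learned transformation. The tiny graph and random weights below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 4-node graph (e.g. four users, edges = friendships); purely illustrative.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))          # one 3-dimensional feature vector per node
W = rng.normal(size=(3, 3))          # learnable weights (random here)

# Add self-loops and normalise so each node averages over itself and its neighbours.
A_hat = A + np.eye(4)
A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)

# One graph-convolution-style layer: aggregate neighbour features, transform, apply ReLU.
H_next = np.maximum(0, A_norm @ H @ W)
print(H_next)
```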
In 2017, Google introduced the Transformer architecture, a novel network design that replaced RNNs and revolutionised Natural Language Processing (NLP) tasks. Transformers became the backbone of state-of-the-art models like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) in 2018. GPT-3, the largest language model at the time, capable of performing a wide range of language tasks, was released by OpenAI in 2020. A year later, OpenAI extended this architecture to DALL-E, a model that generates images from textual descriptions. Large Language Models like GPT-4, released in 2023, exhibit advanced language understanding and can be fine-tuned for specific applications. These sophisticated models are at the forefront of the next wave, in which AI is reshaping the way we interact with the world.
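At the heart of the Transformer is scaled dot-product self-attention, in which every token weighs every other token when building its new representation. The snippet below is a bare-bones NumPy illustration with made-up dimensions, leaving out the multi-head projections, masking and positional encodings of the full architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 5, 8              # 5 tokens, 8-dimensional embeddings (illustrative)
X = rng.normal(size=(seq_len, d_model))

# Learned projection matrices for queries, keys and values (random here).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product attention: how strongly each token attends to every other token.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax

output = weights @ V                 # each row mixes information from the whole sequence
print(output.shape)                  # (5, 8)
```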