For centuries, scientists have formulated universal laws as mathematical equations that relate a set of inputs to an output. Biology, in its early days, did not follow this approach until the revolutionary work of Darwin and Mendel in the 1800s. Their work demonstrated that biological systems share common traits and patterns, which can be expressed as “models”. In the early 20th century, mathematical and statistical methods were used to integrate Mendelian genetics with Darwinian evolution. Prof. Mukund Thattai, a computational biologist at the National Centre for Biological Sciences, says, “There is a long tradition of using mathematical models in biology, for example in codifying the rules of evolutionary biology.” For instance, there are models that predict how a new mutation will affect a population. “The exciting thing is that now we can do experiments that track what these mutations do, and it fits with these theories”, he adds.
The models of evolution and heredity set the stage for thinking about biological processes in a systematic manner. While the concept of discrete units of inheritance had been developed, the actual nature of these units remained a mystery. The tools available at the time were inadequate for studying fundamental biological components such as proteins and DNA, even though the nature and role of these molecules were under active investigation. This created a pressing need for innovative approaches. As the mid-20th century unfolded, the motivation to bring together computation and biology intensified as our understanding of biological molecules grew.
The Code of Life
In 1950, Alan Turing, a young polymath, proposed the concept of machine intelligence. By exploring whether computers could think like humans, he laid the foundation for artificial intelligence (AI). The 1950s were an exciting era. The role of DNA as the molecule that encodes genetic information had been firmly established. The analogy with a code came from George Gamow, a physicist who inspired biologists to develop a framework describing the flow of information. The word ‘code’ highlighted that the sequence of DNA carries instructions for the synthesis of proteins, much as a code carries information in a language. The actual deciphering of the genetic code came in 1961, revealing the rules that dictate the translation of genetic information into proteins.
By that time, the first protein structure had been deciphered and the first protein sequence, that of insulin, had been published. As the size and variety of data grew, repositories were needed to organise the information, giving birth to the “Atlas of Protein Sequence and Structure”, first published by Margaret Dayhoff and her team in 1965. Around the same time, the notion of the protein folding “problem” emerged.