You’re not the customer, you’re the product!

The attention each of us pays to an item, and the time we spend on a site, article, or application, is the most valuable commodity in the world, as witnessed by the fact that the companies that sell it wholesale are among the largest in the world. Attracting and selling our attention is, indeed, the business of Google and Facebook but also, to a large extent, of Amazon, Apple, Microsoft, Tencent, and Alibaba. We may believe we are the customers of these companies but, in fact, many of the services they provide serve only to attract our attention and sell it to the highest bidder, in the form of publicity or personal information. In the words of Richard Serra and Carlota Fay Schoolman, later reused by a number of people including Tom Johnson, if you are not paying, “You’re not the customer; you’re the product.”

Attracting and selling attention is an old business, well described in Tim Wu’s book The Attention Merchants. First developed by newspapers, and then by radio and television, the attention market came to maturity with the Internet. Although newspapers, radio programs, and television shows have all been designed to attract our attention and use it to sell publicity, none of them had the potential of the Internet, which can attract and retain our attention by tailoring content to each person’s preferences.

The trouble is that excessive customization brings a significant and widespread problem. As sites, social networks, and content providers fight to attract our attention, they show us exactly the things we want to see, and not things as they are. Each person now lives in a reality that is different from everyone else’s. The creation of a separate reality for each person has a number of negative side effects, including the creation of paranoia-inducing rabbit holes, the radicalization of opinions, the inability to establish democratic dialogue, and the difficulty of distinguishing reality from fabricated fiction.

Wu’s book addresses this issue in no uncertain terms, but the Netflix documentary The Social Dilemma makes an even stronger point: customized content, as shown to us by social networks and other content providers, is unraveling society and creating a host of new and serious problems. Social networks are even more worrying than other content providers because they put pressure on children and young adults to conform to a reality that is fabricated and presented to them in order to retain (and resell) their attention.

Decoding the code of life

We have known, since 1953, that the DNA molecule encodes the genetic information that transmits characteristics from ancestors to descendants, in all types of lifeforms on Earth. Genes, in the DNA sequences, specify the primary structure of proteins, that is, the sequence of amino acids that make up the proteins, the cellular machines that do the jobs required to keep a cell alive. The secondary structure of proteins describes some of the ways a protein folds locally, in structures like alpha helices and beta sheets. Methods that can reliably determine the secondary structure of proteins have existed for some time. However, determining the way a protein folds globally in space (its tertiary structure, the shape it assumes) has remained mostly an open problem, outside the reach of most algorithms in the general case.

The Critical Assessment of protein Structure Prediction (CASP) competition, started in 1994, has taken place every two years since then and has made it possible for hundreds of competing teams to test their algorithms and approaches on this difficult problem. Thousands of approaches have been tried, with some success, but the precision of the predictions remained rather low, especially for proteins that were not similar to other known proteins.

A number of different challenges have taken place over the years in CASP, ranging from ab initio prediction to the prediction of structure using homology information, and the field has seen steady improvements over time. However, the entrance of DeepMind into the competition raised the stakes and revolutionized the field. As DeepMind itself reports in a blog post, the program AlphaFold 2, a successor of AlphaFold, entered the 2020 edition of CASP and obtained a score of 92.4%, measured on the Global Distance Test (GDT) scale, which ranges from 0 to 100. This value should be compared with the 58.9% obtained by AlphaFold (the previous version of this year’s winner) in 2018, and the 40% score obtained by the winner of the 2016 competition.
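To give a feeling for what a score on the GDT scale measures, here is a minimal sketch in Python. It assumes the commonly used GDT_TS definition (the average, over cutoffs of 1, 2, 4, and 8 ångström, of the percentage of residues whose predicted position lies within the cutoff of the experimental one) and it assumes that the two structures have already been superimposed. The real CASP evaluation also searches for the best superposition at each cutoff, which this sketch does not do. The function name and the toy coordinates are hypothetical, chosen only for illustration.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed lists of (x, y, z) points.

    Assumption: one point per residue (for example, the alpha carbon),
    listed in the same order in both structures.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # Average over the cutoffs, expressed on a 0 to 100 scale.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues, predicted 0.5, 1.5, 3.0 and 9.0 angstrom off.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

In the toy example one residue is within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, so the score is the average of 25, 50, 75, and 75. A perfect prediction would score 100.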

Figure: Structure of insulin

Even though the details of the algorithm have not yet been published, the DeepMind post provides enough information to realize that this result is a very significant one. Although the whole approach is complex and the system integrates information from a number of sources, it relies on an attention-based neural network, which is trained end-to-end to learn which amino acids are close to each other, and at which distance.
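For readers unfamiliar with the term, the basic building block of an attention-based network can be sketched in a few lines of Python. This is the generic scaled dot-product attention used in many neural networks, not AlphaFold 2's actual architecture, which the post does not detail. The toy two-dimensional "residue" vectors and the function names are hypothetical, for illustration only.

```python
import math


def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors.

    Returns (outputs, weights); weights[i][j] says how much position i
    attends to position j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Output is the weighted average of the value vectors.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy self-attention: three positions, each described by a 2-d vector.
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(residues, residues, residues)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1, and a position attends most strongly to the positions whose vectors are most similar to its own. In a trained network these vectors are learned, so the network itself decides which positions are relevant to each other.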

Given the importance of the problem in areas like biology, medical science, and pharmaceutics, this computational approach to protein structure determination can be expected to have a significant impact in the future. Once more, rather general machine learning techniques, developed over the last decades, have shown great potential in real-world problems.