Democratizing Bioinformatic And Biological R&D

At some point in their careers, most innovative founders talk about their lightbulb moments: a breakthrough discovery in the lab, a buried result in a data report they have reviewed dozens of times, or the clarity that comes when a product problem translates into a market opportunity.

For me, it wasn't a lightbulb. It was the moment I combined my two passions, molecular biology and computer science, to make plants glow.

When I was at university, one of my first projects was to genetically modify plants to express genes that enable bioluminescence. For a budding molecular biologist, this was a catalytic moment: a scientific project that demonstrated how truly remarkable plants are, and an attempt to push the boundaries of what is possible in synthetic and plant biology.

This project also demonstrated that one of the fundamental issues with natural bioluminescence is that the metabolic pathway necessary for the emission of light is largely inefficient; that is why most bioluminescent organisms in nature are quite dim and difficult to see with the naked eye.

These metabolic reactions are catalyzed by special proteins known as enzymes. By optimizing these enzymes, it is possible to achieve greater light output and therefore a brighter plant.

The challenge was that for the average enzyme, there are more possible mutated variants than there are atoms in the observable universe, and most mutations either make the enzyme worse or render it completely non-functional.
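
To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. It assumes the 20 standard amino acids, a hypothetical 300-residue enzyme, and the commonly cited order-of-magnitude estimate of about 10^80 atoms in the observable universe. The function names are illustrative and are not part of any Neurosnap tool.

```python
import math

AMINO_ACIDS = 20   # assumption: only the 20 standard amino acids
ATOMS_LOG10 = 80   # assumption: ~10^80 atoms, a rough order-of-magnitude estimate


def log10_sequence_space(length: int) -> float:
    """Base-10 logarithm of the number of possible sequences of a given length."""
    return length * math.log10(AMINO_ACIDS)


def single_mutants(length: int) -> int:
    """Number of variants that differ from the original at exactly one position."""
    return length * (AMINO_ACIDS - 1)


def shortest_length_exceeding_atoms() -> int:
    """Smallest chain length whose sequence space exceeds the atom estimate."""
    length = 1
    # Exact integer arithmetic avoids floating-point rounding at the boundary.
    while AMINO_ACIDS ** length <= 10 ** ATOMS_LOG10:
        length += 1
    return length


if __name__ == "__main__":
    enzyme_length = 300  # hypothetical enzyme size, chosen only for illustration
    print(f"Single-mutation variants: {single_mutants(enzyme_length)}")
    print(f"All possible sequences: about 10^{log10_sequence_space(enzyme_length):.0f}")
    print(f"Shortest chain beyond the atom estimate: {shortest_length_exceeding_atoms()} residues")
```

Under these assumptions, the variants one mutation away from a 300-residue enzyme number only 5,700, but the space explodes once mutations are combined: a chain of just 62 residues already has more possible sequences than the atom estimate.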

Most traditional approaches, like deep mutational scanning, essentially involve generating and screening large numbers of mutations until something satisfactory turns up. This process can cost hundreds of thousands of dollars, and finding a satisfactory mutation can take years.

It dawned on me that a lot of the research I was doing could be streamlined using machine learning models and tools. I also noticed a shortage of high-accuracy AI-based methods for enzyme optimization, which greatly piqued my interest.

This recognition led me to apply my knowledge of data and computer science, another passion of mine, to the large data sets obtained in molecular biology, notably in protein engineering and the genetic engineering of organisms.

And that directly led to the creation of the NeuroFold model, which was designed to make precise mutations that lead to enzymes with specific and desired properties.

The two key innovations behind NeuroFold are its multimodal approach to understanding the protein fitness landscape and its use of a functional baseline.

NeuroFold goes beyond traditional approaches to protein fitness prediction and enzyme optimization, which typically focus on a single modality, such as sequence, evolutionary information, or structure. It strategically leverages information from all three modalities concurrently, without "leaking" information from one modality into another. This gives NeuroFold a substantially richer understanding of the protein fitness landscape than previous models could capture. In doing so, we dramatically shortened the time and cost associated with finding the needle in the haystack.

To make NeuroFold even more efficient, we learned to "bias" the model using an existing template. Other protein-related models, especially protein language models, can at best generalize to a select few protein families without fine-tuning. NeuroFold, by contrast, constantly compares its candidates to the template, and these comparisons are critical to constraining the model so that it accurately captures the intricacies of the input structure. It was this tweak that enabled NeuroFold to achieve a 40-fold increase in performance over Meta's ESM-1v model.

Today, Neurosnap's second-generation Biology Suite includes over 50 innovative artificial intelligence-based tools and models designed to accelerate research across a broad range of tasks in molecular biology. Some of the most prominent changes are improvements and optimizations to existing tools like AlphaFold2, and the addition of new tools for drug and protein design.

Tools like Google DeepMind's AlphaFold2 were revolutionary, as they not only drastically improved scientists' ability to quickly reason about a protein's structure but also invigorated interest in computational biology.

But AlphaFold2 as designed was prohibitively expensive for larger proteins and complexes and far too technical for most researchers to use effectively. It could also require highly specialized personnel and equipment that would then need to be maintained, further adding to costs.

Using Neurosnap's AlphaFold2 implementation, which adds further confidence metrics on top of AlphaFold2's own, scientists can reliably assess whether a prediction is accurate in a much shorter time frame. Moreover, scientists using the Neurosnap Biology Suite do not need sophisticated coding skills.

Barely a year after the company was formed, Neurosnap's list of bioinformatic tools and services has grown to include drug design; inverse folding; molecular docking and dynamics; protein annotation, clustering, conformations, design, expression, folding, localization, solubility, and stability; RNA sequencing and transcription analysis; signal peptide detection; and toxicity prediction.

Already, our tools are enhancing research in drug design, the development of antibodies against specific targets, enzyme development (including industrial enzymes used in the food industry), and other life-saving and life-extending research.

It is truly humbling to know that the work resulting from a university project to make plants glow is lighting the way to democratizing essential research, the kind that is enabling the discoveries of tomorrow.

Keaun Amani is the CEO and Founder of Neurosnap.
