We have tested how introducing an unordered beam search algorithm, a staple in computer science, to modify the order of sequence editing, is compared to a more flexible, random ordering approach. We also create a new hybrid, gradient EVO, that enhances the indicated evolutionary algorithm by using model gradients to guide its mutations, and independently assess how important the gradient is to select a particular edit and how important it is to select a particular edit.
We also developed Adabeam, a hybrid adaptive beam search algorithm that combines the best-performing non-gradient design algorithm, Adalead, with the most effective elements of unordered beam search. Adaptive search algorithms are usually not considered randomly. Instead, their actions change as a result of searching to focus their efforts on the most promising areas of the sequence space. Adabeam's hybrid approach maintains a “beam”, or collection of the best candidate sequences ever found, greedily expanding particularly promising candidates until they are thoroughly investigated.
In reality, Adabeam starts with a group of candidate sequences and their scores. In each round, select a small group of highest score sequences that will first act as “parents.” For each parent, Adabeam generates a new set of “child” sequences by randomizing a random number of random but induced mutations. Then, following a short, greedy search path, the algorithm can quickly “walking” uphill in fitness situations. After thorough research, all newly generated children are pooled together, and the algorithm selects the absolute best child, forms the starting population for the next round, and repeats the cycle. This process of adaptive selection and target mutations allow Adabeam to efficiently concentrate on high-performance sequences.
Computer-aided design tasks create difficult engineering problems due to the very large search space. These difficulties become more severe as they design longer sequences, such as mRNA sequences, and guide the design using modern, large-scale neural networks. Adabeam is particularly efficient in long sequences by using stochastic sampling of fixed calculations rather than calculations that scale by sequence length. To enable Adabeam to work on large models, we reduce peak memory consumption during design by introducing a trick called “Gradient Concatenation.” However, existing design algorithms that do not have these capabilities are difficult to scale to long sequences and large models. Gradient-based algorithms are particularly affected. To facilitate fair comparisons, limit the length of the designed sequence, even if Adabeam can scale more. For example, even if the DNA expression prediction model Enformer is run with ~200k nucleotide sequences, it will limit the design to just 256 nucleotides.
