Automating Intact Mass Deconvolution: It’s About Time
Ever since Dr. John Fenn put “wings on elephants,” the phrase he used to describe his early work on electrospray mass spectrometry of macromolecules, we have been challenged to identify the most effective means of simplifying the multiply charged spectra that these large molecules generate. From the simple algebraic algorithm that he and his student Matthias Mann first demonstrated in the seminal 1989 protein electrospray mass spectrometry (ESI-MS) paper to more sophisticated maximum entropy and Bayesian processing algorithms, there have been continued attempts to produce the most accurate and comprehensive deconvoluted zero-charge mass spectra from the electrospray LC-MS analysis of biological macromolecules.
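To make the idea of an algebraic approach concrete, here is a minimal Python sketch of the textbook relationship between two adjacent charge states of the same protonated molecule. It is an illustration under stated assumptions, not a reproduction of the 1989 method: the function names are hypothetical, the only charge carriers are assumed to be protons, and the two peaks are assumed to differ by exactly one charge.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def simulate_mz(mass, z):
    """m/z of a neutral mass carrying z protons (used here to make test peaks)."""
    return (mass + z * PROTON_MASS) / z


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge of the peak at mz_high, assuming the peak at mz_low
    is the same molecule carrying exactly one more proton."""
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Zero-charge mass implied by a peak at m/z with charge z."""
    return z * (mz - PROTON_MASS)


if __name__ == "__main__":
    # Hypothetical 16,950 Da species observed at charges 15+ and 16+.
    mz_15 = simulate_mz(16950.0, 15)
    mz_16 = simulate_mz(16950.0, 16)
    z = charge_from_adjacent_peaks(mz_15, mz_16)
    print(z, neutral_mass(mz_15, z))
```

With real data the peak positions carry measurement error, which is one reason the statistical approaches described in this article replaced simple pairwise algebra.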
The Journey to Automating Biopharmaceutical LC-MS Analysis
Waters has been advancing intact mass analysis for an equally long time. In John Fenn’s 2002 Nobel lecture (John B. Fenn – Nobel Lecture – NobelPrize.org), he acknowledges Brian Green (*) of VG Analytical (a progenitor of Waters Mass Spectrometry) for inspiring the efforts to look at larger molecules, and for arranging “for VG to lend us a used quadrupole analyzer that could weigh ions with mass/charge ratios up to 1500” so they could investigate this possibility. Brian was the driving force for ESI development within VG and would subsequently form a key collaboration with Dr. John Skilling of the Cavendish Institute at the University of Cambridge and a founder of MaxEnt Solutions Ltd. A resulting publication in 1992 entitled “Disentangling ESI Spectra with Maximum Entropy” laid the basis for the first commercial maximum entropy-based macromolecule spectral deconvolution.
This effort to advance spectral deconvolution continued as Micromass emerged from VG/Fisons and finally became a fully integrated part of the Waters Corporation. There, Principal Research Scientist Dr. Keith Richardson continued the work with Dr. Skilling, updating the MaxEnt1 and MaxEnt3 algorithms for modern computing systems and optimizing them to support higher resolution time-of-flight (ToF) MS spectral data. These improvements became available not only in MassLynx, but also in BiopharmaLynx, the first software tool designed to streamline and automate the processing of intact protein and peptide mapping data.
The First Universal Deconvolution Algorithm
The collaboration between Dr. Skilling and Dr. Richardson, together with continual improvements in computing power, also resulted in the development of the BayesSpray deconvolution algorithm, the first “universal deconvolution algorithm.” This made way for a true Bayesian deconvolution process, capable of deconvolving mass spectra of both lower and higher molecular weight species, from both lower and higher resolution MS data. First appearing in an American Society for Mass Spectrometry (ASMS) 2010 poster, this algorithm was subsequently incorporated into the UNIFI informatics platform for processing of top-down and intact mass data.
Advancement in this area has not just been a story of growing computing power. Recognizing the need for these increasingly powerful algorithms to be self-sufficient and self-optimizing, we eliminated the reliance on user-entered parameters to achieve accurate and meaningful results.
The following sections detail the approach we’ve taken not only to speed up the processing of macromolecule electrospray MS data but also to build smarter deconvolution workflows that:
- automatically recognize chromatographic peaks
- determine the best parameters for processing the spectra under these peaks
- return a result that can be meaningfully interpreted by those not expert in protein mass spectrometry
Developing the New waters_connect INTACT Mass App: User-Centric Process
Before starting development of the new waters_connect INTACT Mass app, we talked to users of deconvolution algorithms in Biopharma, and met scientists who were dealing with hundreds to thousands of different macromolecules every week. We set ourselves the target of supporting users who need to provide a high-capacity, high-throughput service for mass confirmation and purity determination. We were challenged to provide a way to analyze a 384-well plate of samples where each sample is a unique macromolecule; to make the results for the whole plate available a minute or two after the LC-MS data acquisition is completed; and to make results immediately available for urgent samples even though data for other samples in the plate are still being acquired. To meet this goal, we made some major architectural changes to how we implement our deconvolution algorithms:
- Making use of multithreading for deconvolution, so that multiple mass spectra can be deconvolved simultaneously. We saw this as key to ensuring that multiple peaks could be deconvolved within the timespan of an acquisition.
- Enabling data processing to occur in parallel with data acquisition. This means that the results for a 384-well plate can be ready shortly after the data acquisition has finished. It also means that scientists can provide urgently needed results for samples when other samples in the same plate are still being analyzed.
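The two architectural ideas above can be sketched as a producer and consumer pipeline. The following Python example is a minimal illustration only and does not represent the INTACT Mass implementation: `acquire`, `deconvolve`, and `process_plate` are hypothetical names, the acquisition is simulated with a short sleep, and the "deconvolution" is a placeholder sum.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def deconvolve(sample_id, spectrum):
    """Placeholder for a real deconvolution call (hypothetical stand-in)."""
    return sample_id, sum(spectrum)


def acquire(wells, out_queue):
    """Simulated acquisition: emits one spectrum per well, then a sentinel."""
    for well in wells:
        time.sleep(0.001)  # stands in for the LC-MS run time of one sample
        out_queue.put((well, [1.0, 2.0, 3.0]))
    out_queue.put(None)  # signals that acquisition has finished


def process_plate(wells, workers=4):
    """Deconvolve spectra on a thread pool while acquisition is still running."""
    spectra = queue.Queue()
    producer = threading.Thread(target=acquire, args=(wells, spectra))
    producer.start()

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        while True:
            item = spectra.get()  # blocks until the next spectrum arrives
            if item is None:
                break
            # Submit immediately, so processing overlaps with acquisition.
            futures.append(pool.submit(deconvolve, *item))
        for future in futures:
            sample_id, result = future.result()
            results[sample_id] = result
    producer.join()
    return results


if __name__ == "__main__":
    plate = [f"A{i}" for i in range(1, 13)]
    print(process_plate(plate))
```

Because each spectrum is submitted as soon as it arrives, a result for an early well can be complete long before the last well is acquired. In CPython, threads suit work that releases the interpreter lock (I/O or native numerical code); a pure-Python numerical routine would more likely use a process pool instead.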
If turnaround times are to be fast, it is important that the speed of deconvolution is maintained during large runs. When analyzing a 384-well plate, significant quantities of data are being written to and read from a database while mass spectra are being deconvolved. To further complicate matters, the analysis grows throughout the run as more data is acquired, which can further impact read and write times. To ensure scientists can offer a fast turnaround time for deconvolved mass workflows, we extensively tested the integrated INTACT Mass workflow on a live system throughout development, checking the rate of deconvolution as the analysis proceeds.
User Challenges That We Addressed
- Challenges requiring new capabilities in routine intact mass workflows: supporting impurity analysis, unexpected components in samples, human error, artefacts in deconvolved spectra, and the high level of expertise required to use deconvolution algorithms.
- New implementation: we set ourselves another target of automating the production of correctly deconvolved spectra without requiring the user to supply expected masses, expected mass ranges, or the input m/z range. When we automated the setting of parameters to deconvolve mass spectra, we had in mind impurity analysis, degradation studies, the ability to use generic methods, and the ability to perform an untargeted analysis.
- Challenges arising from next-generation therapies: scientists in laboratories dealing with new modalities such as novel custom chemistry oligonucleotides, novel custom chemistry peptides, and conjugates explained the challenges of using existing deconvolution algorithms, which offer only a limited selection of isotope models, usually covering just types of natural macromolecule.
- New implementation: we decided to allow users to create their own isotope models. This improves mass accuracy, and when custom chemistries which include elements such as chlorine are used, the scientist has a simpler spectrum to review.
- Challenges arising from the diversity of biotherapeutic molecule classes: scientists also spoke to us about the pros and cons of different deconvolution algorithms and their preferences for monoisotopic and average mass spectra. It was clear that a choice of algorithms was needed.
- New implementation: we decided to offer users a choice of automatically created MaxEnt1 and BayesSpray spectra. The widely used MaxEnt1 algorithm, based on maximum entropy, produces excellent spectra for larger macromolecules and is important for comparisons to legacy results.
- Challenges of data quality: users reported the creation of artefacts as a drawback of the MaxEnt1 algorithm. A newer nested-sampling algorithm, BayesSpray, has important advantages for isotopically resolved data while still performing well for non-isotopically resolved data from large macromolecules.
- New implementation: we now offer users the choice of monoisotopic and average mass spectra for BayesSpray, and we considered processing speed in this new implementation.
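To illustrate why the isotope model and the choice between monoisotopic and average mass matter, here is a minimal Python sketch that computes both values from an elemental composition. It is an assumption-laden illustration, not the app's isotope model format: the table layout and function names are hypothetical, the isotope masses and natural abundances are rounded reference values, and only five elements are included.

```python
# Isotope table: element -> list of (isotope mass in Da, natural abundance).
# Rounded reference values; extend or replace this table for custom chemistries.
NATURAL_ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.0078250, 0.999885), (2.0141018, 0.000115)],
    "N": [(14.0030740, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991610, 0.00205)],
    "Cl": [(34.9688527, 0.7576), (36.9659026, 0.2424)],
}


def monoisotopic_mass(composition, isotopes=NATURAL_ISOTOPES):
    """Mass using only the most abundant isotope of each element."""
    total = 0.0
    for element, count in composition.items():
        most_abundant = max(isotopes[element], key=lambda iso: iso[1])
        total += count * most_abundant[0]
    return total


def average_mass(composition, isotopes=NATURAL_ISOTOPES):
    """Abundance-weighted mass over all isotopes of each element."""
    total = 0.0
    for element, count in composition.items():
        total += count * sum(mass * abundance for mass, abundance in isotopes[element])
    return total


if __name__ == "__main__":
    chloroethane = {"C": 2, "H": 5, "Cl": 1}  # small chlorine-containing example
    print(monoisotopic_mass(chloroethane), average_mass(chloroethane))
```

A single chlorine atom shifts the average mass by roughly half a dalton relative to the monoisotopic mass, because about a quarter of chlorine atoms are the heavier isotope. Passing a different `isotopes` table is how this sketch would accommodate a user-defined model.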
It is About Time
The realization of the goal to automatically process macromolecule electrospray MS data sets in the waters_connect INTACT Mass app has been a long time coming, now 30 years after the publication of the protein spectral data that opened our eyes to the possibilities. Improvements in personal computer processing power, advancements in deconvolution algorithms from simple algebraic processing to state-of-the-art Bayesian analysis, and the constantly improving capabilities of electrospray MS systems have brought us to the point where a zero-charge spectrum can be created successfully from multiply charged spectral data without the need for expert user intervention.
In the end, it is about time. It’s about the time saved in conducting an analysis and communicating the result. It’s about the time required to take someone unpracticed in the art of large molecule mass spectrometry and make them capable of generating quality results. It’s about the time saved by getting the right result the first time you analyze the sample. It is most certainly about time.
*We note with sadness the passing of Brian Green, OBE in December of 2021, and refer readers to the 1996 article in Rapid Communications that captures details of his early career – a scientific career that continued to progress long after his supposed retirement.