Recently, there has been a lot of talk about ChatGPT, OpenAI’s large language model (LLM) generative artificial intelligence engine. ChatGPT-3 (2020), trained on 45 TB (terabytes) of text data, was the first LLM with over 100 billion parameters (variables learned during the training process). It is a powerful and highly visible tool that is changing the way we use digital technology for learning and information retrieval. Or so it is said.
For the purposes of this exercise, I used ChatGPT-4 (March 2023) between 25th May and 3rd June 2023. The training dataset size and the number of parameters for this version have not been published.
But what have LLMs done for chromatographers? And can a digital refugee like me effectively use this tool to advance the science of separation?
I’m really new to this technology and only recently used it for the first time, under the tutelage of my son, a physics undergraduate who “couldn’t work without it”. Apparently, it is a great assistant for creating new computer code. Anyway, after a few basic lessons, I realized that an interaction with ChatGPT should be more of a conversation than a one-off question, so I started investigating. Unlike other articles on the subject, which present questions, ChatGPT’s answers, and commentary on the suitability of those answers, I have focused on what this LLM can offer the chromatography community and what it takes to get it. Of course, if you want to see the answer to each question, just type in the questions and conversations I’ve documented. More importantly, I intentionally did not make this a “narrow and deep” quantitative search. Rather, I wanted to explore different types of interaction that could yield useful information for chromatographers, which should become apparent as the work progresses.
My initial searches were related to troubleshooting issues and the questions were:
- What causes peak tailing for late-eluting compounds in gas chromatography?
- Specifically, what causes peak tailing only in late-eluting compounds?
- What if the compound is a polycyclic aromatic hydrocarbon?
- Does the manufacturer have any advice on dealing with peak tailing when analyzing acidic compounds?
- What are the recommended analytical conditions for 2-ethylhexanoic acid?
The answers were very general and offered nothing specific I could act on if I had this problem in my lab. Each question needed to be more specific to draw out actionable content. There was nothing here that I couldn’t get from a similar Google search.
- How often should the liners in my GC system be replaced?
- Do you have any specific advice on this subject regarding food analysis?
- Which liner do you recommend for pesticide screening of vegetable extracts?
- Could you tell me the specific part number?
These questions gave me some general information, but again nothing actionable. A search for a specific liner part number yielded only the names of major GC manufacturers and suggestions to contact them.
- How can I check the GC inlet for leaks?
- Will gas still flow if the GC is turned off?
- (A follow-up question referring back to the leak-test answer.)
- How can a leak detection solution detect a gas leak if there is no gas flow?
Here, it was clear that the LLM was aggregating information from various elements of its dataset (as of September 2021) and “building responses” on the fly. Essentially, I was told to turn off the GC (and thus the gas flow) and then apply a leak detection liquid around the inlet. With no gas flowing there is nothing for the liquid to detect, so this suggestion is unusable in practice for checking a GC inlet for leaks. In fact, I worry about the safety of some of the recommendations.
- What extractables might be observed when filtering 0.1M hydrochloric acid using a cellulose syringe filter?
- What extractables might be observed when filtering hexane using a cellulose syringe filter?
Although the answers to these two questions were still general, I must say that there were some useful suggestions about the types of extractables that could be seen when using each of the different solvents. I know this is a niche question. However, it is the kind of information that is probably difficult to access through a typical web search.
So far, I see no advantage in using AI to help troubleshoot problems in the lab. ChatGPT-4 says it does not have access to specific tables of information or specific literature that would allow a more specific response. The answers may give inexperienced chromatographers “directions for further investigation”, but they tend to be broad and sometimes contradictory, and the trust that must exist between user and technology was not established in these early exchanges. As a colleague recently explained to me, people tend to give more weight to someone’s response if they know that what that person told them in the past was right or helpful. I don’t feel that way about ChatGPT at the moment. Additionally, LLMs seem to “fold” very quickly when challenged. Even if the answer was more or less correct, a rebuttal earns an apology and another suggestion, which is often less accurate than the previous answer.
Let’s see what happens when searching for information that can help plan a separation and analyze data.
- What is the SMILES string for Loratadine?
- Are you sure this is correct? Is ChemSpider saying something different?
- How can that be correct if the molecular formula is C22H23ClN2O2?
- I think this is still wrong. Can you recalculate?
What I got back was mostly nonsense and “coaching” didn’t improve it. Let’s try something simpler.
- What is the SMILES string for octadecanoic acid?
Bingo: it got this right the first time. Further research is therefore needed on the limits of the model’s ability to generate useful SMILES information for more complex molecules.
- What is this compound? CN(C)C1=CC=C(C=C1)O
For the reader, this is the SMILES string for 4-dimethylaminophenol.
- Is this correct?
- Are you sure the SMILES string corresponds to norepinephrine?
Norepinephrine is (R)-4-(2-amino-1-hydroxyethyl)benzene-1,2-diol (noradrenaline) and does not look like the compound I was interested in at all. Once again, I am losing faith in the model’s ability.
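A quick sanity check makes the mismatch obvious without any chemistry software. The sketch below simply counts heavy (non-hydrogen) atoms in a SMILES string and compares them with a molecular formula. It is a rough illustration rather than a real SMILES parser: the function name `heavy_atom_counts` is my own, and it assumes bracket-free SMILES using only the common organic elements.

```python
import re
from collections import Counter

# Organic-subset atoms only. Two-letter symbols come first so that "Cl" is
# not read as carbon. Ring digits, bonds and parentheses are simply skipped.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms in a simple, bracket-free SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    # Aromatic atoms are written in lower case, so normalise the symbols.
    return Counter(symbol.capitalize() for symbol in ATOM_PATTERN.findall(smiles))


if __name__ == "__main__":
    query = heavy_atom_counts("CN(C)C1=CC=C(C=C1)O")
    # Heavy atoms of norepinephrine, taken from its formula C8H11NO3.
    norepinephrine = Counter({"C": 8, "N": 1, "O": 3})
    print("SMILES string:", dict(query))
    print("Norepinephrine:", dict(norepinephrine))
    print("Same heavy atoms?", query == norepinephrine)
```

The SMILES string contains one oxygen where norepinephrine has three, so the two cannot be the same compound. A match on heavy atoms would not prove identity, but a mismatch is enough to reject an answer.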
- What compound has CAS number 375-22-4?
- Is that really correct?
- I thought the CAS number was for heptafluorobutyric acid.
I’ve found that this model isn’t very good at looking up CAS numbers either. After suggesting dipropylene glycol and pyrroloquinoline quinone disodium salt in its two previous answers, it finally agreed with me. So I thought it would be fairer to ask about physicochemical data.
- What is the pKa of dodecanoic acid?
The answer was close to the literature value I found (Barratt MD; Toxicol In Vitro 10: 85-94 (1996)) and to the ACD Labs prediction found in ChemSpider (5.03).
- What is the LogD value at pH 7.4?
For comparison, I obtained predicted values from ACD Labs via ChemSpider (2.09) and from Chemaxon (2.06).
- Are you sure about the last answer?
The response indicated that a correct value could not be calculated.
- Considering the pKa value is 4.88, is it not possible to calculate the LogD value at pH 7.4?
- So what is the LogD value at pH 5.5?
For comparison, I obtained predicted values from ACD Labs via ChemSpider (3.89) and from Chemaxon (3.82).
I chose to include the full response from ChatGPT-4 here.
“To estimate the LogD value of dodecanoic acid (lauric acid) at pH 5.5, we need to consider its acid dissociation constant (pKa) and its ionization behavior at that pH.
Dodecanoic acid has a pKa of about 4.88. At pH 5.5, below the pKa, the majority of dodecanoic acid is in the protonated form (RCOOH) rather than the deprotonated form (RCOO-).
Since LogD represents the partition coefficient between octanol and water, the relative solubilities of the protonated and deprotonated forms in these phases must be considered.
Because dodecanoic acid becomes more soluble in the aqueous phase when protonated, the LogD value at pH 5.5 is expected to be lower than the LogP value. Although we do not have access to specific data or software to provide an exact LogD value at pH 5.5, this value may be close to, but lower than, the LogP value of 4.92.”
As you can see, this logic is flawed: pH 5.5 is above the pKa, not below it, so most of the acid is deprotonated, and it is the deprotonated form that is more soluble in water. At this point I decided to focus on other search types. I appreciate that I could have found all of the above information via a Google search. However, I wanted to see how far a discussion about optimizing the pH of the eluent could go. We never got out of the shallows.
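For what it is worth, the estimate ChatGPT said it could not make takes only a few lines. The sketch below uses the standard textbook relationship for a monoprotic acid, LogD = LogP - log10(1 + 10^(pH - pKa)), together with the LogP (4.92) and pKa (4.88) quoted in the response. It assumes that only the neutral form partitions into octanol, and the function names are my own.

```python
import math


def fraction_ionized(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid present as the anion (RCOO-) at a given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def log_d_monoprotic_acid(log_p: float, pka: float, ph: float) -> float:
    """Estimate LogD, assuming only the neutral acid partitions into octanol."""
    return log_p - math.log10(1.0 + 10 ** (ph - pka))


if __name__ == "__main__":
    LOG_P = 4.92  # value quoted in the ChatGPT response
    PKA = 4.88    # value quoted in the ChatGPT response
    for ph in (5.5, 7.4):
        ionized = fraction_ionized(PKA, ph)
        log_d = log_d_monoprotic_acid(LOG_P, PKA, ph)
        print(f"pH {ph}: {ionized:.0%} ionized, LogD = {log_d:.2f}")
```

With these inputs, about 81% of the acid is ionized at pH 5.5 and the estimated LogD is about 4.2, falling to about 2.4 at pH 7.4. The absolute numbers depend on the LogP and pKa assumed, so they need not match the software predictions quoted above, but the trend is the point: LogD drops as the pH rises above the pKa.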
Let’s try something more specific about HPLC method development.
- Can you suggest some starting conditions for separating dodecanoic acid from its closely eluting homologues on reversed-phase HPLC?
- This answer is very general, but can you suggest something specifically about dodecanoic acid and tetradecanoic acid?
The answers were very general and spectacularly misleading. For example, here are some recommendations:
“Adjust the composition of the mobile phase to optimize the separation. For example, you can start with a gradient program ranging from 70% to 90% water and 30% to 10% acetonitrile.”
- Do you have any citations where these compounds have been separated by HPLC?
Here I was pointed in the direction of PubChem and Google Scholar. Nothing credible or useful came out of this conversation.
Okay, I’ll try one last time. What about buffer calculations?
- Calculate the pH of a solution prepared by adding 0.0175 mol citric acid and 0.0025 mol dihydrogen citrate to 1.000 L water.
- Could you repeat the calculation using the Henderson-Hasselbalch equation?
Following some reasonable logic, the final part of the response looked like this:
“Substituting the given values:
pH = 3.128 + log((0.0025 + [H+])/0.0175)
To solve this equation, we need the concentration of H+ ions. However, without additional information about the pH of the solution or initial solution, the Henderson-Hasselbalch equation alone cannot be used to provide an accurate pH value.”
- If I said the correct answer was pH 2.63, would you know what I was doing wrong?
My head hurts.
There are many other types of search I’d like to try but, when it comes to trusting my correspondent’s responses, I think P. T. Barnum or Walter Mitty would be more reliable.
To be honest, I’m a total novice when it comes to so-called “AI”. Maybe I just haven’t figured out what LLMs are good for in terms of helping older chromatographers. If you have had a better experience with “AI” in the analytical sciences, please write and let me know. Perhaps I’m not focusing on the areas where the models can shine. I really hoped this exercise would be a rabbit hole to disappear into, and a source of conversations with my digitally savvy colleagues. Perhaps I became too fixated on test questions to build trust early on, or perhaps I’m just a digital refugee looking for a digital native to hold my hand and introduce me to the wonders of the new age. One thing is for sure: I will tell my son to check his code carefully. Who knows what will happen when the program runs.
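For the record, the expected answer can be reproduced in a few lines. The quoted equation is missing a term: the acid concentration also falls by [H+], so the exact form is pH = pKa + log((0.0025 + [H+])/(0.0175 - [H+])), which rearranges to a quadratic in [H+]. The sketch below solves it using the pKa of 3.128 from the response. It assumes a volume of 1 L (so moles equal mol/L), ideal behaviour, and that the second and third dissociations of citric acid can be ignored. The function names are my own.

```python
import math

PKA1_CITRIC = 3.128  # first pKa of citric acid, as used in the ChatGPT response


def ph_henderson_hasselbalch(pka: float, acid: float, base: float) -> float:
    """Naive estimate that ignores the H+ released when the acid dissociates."""
    return pka + math.log10(base / acid)


def ph_exact_first_dissociation(pka: float, acid: float, base: float) -> float:
    """Solve Ka = x(base + x)/(acid - x) for x = [H+], first dissociation only."""
    ka = 10 ** (-pka)
    # Rearranged: x^2 + (base + Ka) x - Ka * acid = 0; take the positive root.
    b = base + ka
    c = -ka * acid
    x = (-b + math.sqrt(b * b - 4.0 * c)) / 2.0
    return -math.log10(x)


if __name__ == "__main__":
    acid, base = 0.0175, 0.0025  # mol/L of citric acid and dihydrogen citrate
    print(f"Henderson-Hasselbalch: pH {ph_henderson_hasselbalch(PKA1_CITRIC, acid, base):.2f}")
    print(f"Quadratic solution:    pH {ph_exact_first_dissociation(PKA1_CITRIC, acid, base):.2f}")
```

The naive Henderson-Hasselbalch estimate gives pH 2.28, while the quadratic gives pH 2.63, the answer I was looking for. The difference arises because, at these low concentrations, the amount of acid that dissociates is comparable to the amount of salt added.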
