In my experience there are really only three true prompt engineering techniques:
- In Context Learning (providing examples, AKA one shot or few shot vs zero shot)
- Chain of Thought (telling it to think step by step)
- Structured output (telling it to produce output in a specified format like JSON)
Maybe you could add what this article calls Role Prompting to that. And RAG is its own thing where you're basically just having the model summarize the context you provide. But really everything else just boils down to telling it what you want in clear, plain language.
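In-context learning is mostly just string assembly. A minimal sketch of a few-shot prompt builder (the sentiment task and the Input/Output formatting are my own illustration, not any model's required format):

```typescript
// Build a few-shot prompt: each example pairs an input with the desired
// output, so the model infers the task and output format from the pattern.
interface Example {
  input: string;
  output: string;
}

function buildFewShotPrompt(task: string, examples: Example[], query: string): string {
  const shots = examples
    .map((ex) => `Input: ${ex.input}\nOutput: ${ex.output}`)
    .join("\n\n");
  // End at "Output:" so the model's completion is the answer itself.
  return `${task}\n\n${shots}\n\nInput: ${query}\nOutput:`;
}

const prompt = buildFewShotPrompt(
  "Classify the sentiment of each sentence as positive or negative.",
  [
    { input: "The build passed on the first try.", output: "positive" },
    { input: "The tests have been flaky all week.", output: "negative" },
  ],
  "Deploys are fast again."
);
```

Zero shot is the same thing with an empty examples array; one shot vs. few shot is just how many entries you pass.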
Dunno. I was working on a side project in TypeScript, and couldn’t think of the term “linear regression”. I told the agent, “implement that thing where you have a trend line through a dot cloud”, or something similarly obtuse, and it gave me a linear regression in one shot.
I’ve also found it’s very good at wrangling simple SQL, then analyzing the results in Bun.
I’m not doing heavy data processing, but so far, it’s remarkably good.
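For reference, the "trend line through a dot cloud" is a few lines of ordinary least squares. This is the generic textbook version, not the code the agent actually generated:

```typescript
// Ordinary least squares: fit y = slope * x + intercept to a set of points.
interface Point {
  x: number;
  y: number;
}

function linearRegression(points: Point[]): { slope: number; intercept: number } {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;
  // slope = covariance(x, y) / variance(x)
  let cov = 0;
  let varX = 0;
  for (const p of points) {
    cov += (p.x - meanX) * (p.y - meanY);
    varX += (p.x - meanX) ** 2;
  }
  const slope = cov / varX;
  return { slope, intercept: meanY - slope * meanX };
}
```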
Linear regression is a non-niche, well-understood topic that's used in many domains other than data science.
However, asking it to implement "that thing that groups data points into similar groups" needs a bit more context (I just tried it), as k-means is very much specific to machine learning.
I see that as applying to niche platforms/languages without large public training datasets - if Rust was introduced today, the productivity differential would be so stacked against it that I’m not sure it would hypothetically survive.
Even role prompting is totally useless imo. Maybe it was a thing with GPT3, but most of the LLMs already know they're "expert programmers". I think a lot of people are just deluding themselves with "prompt engineering".
Be clear with your requirements. Add examples, if necessary. Check the outputs (or reasoning trace if using a reasoning model). If they aren't what you want, adjust and iterate. If you still haven't got what you want after a few attempts, abandon AI and use the reasoning model in your head.
It's become more subtle, but it's still there. You can bias the model towards more "expert" responses with the right terminology. For example, a doctor asking a question will get a vastly different response than a normal person. A query with emojis will get more emojis back. Etc.
This is definitely something I’ve noticed — it’s not about naïve role-priming at all, but rather about language usage.
“You are an expert doctor, help me with this rash I have all over” will result in a fairly useless answer, but using medical shorthand — “pt presents w bilateral erythema, need diff dx” — gets you exactly what you’re looking for.
I get the best results with Claude by treating the prompt like a pseudo-SQL language, treating words like "consider" or "think deeply" like keywords in a programming language. Also making use of their XML tags[1] to structure my requests.
I wouldn't be surprised if in a few years from now some sort of actual formalized programming language for "gencoding" AI is gonna emerge.
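For what it's worth, the XML-tag approach is easy to mechanize. Something like this (the tag names here are just ones I happen to use; Anthropic's docs recommend XML tags in general but don't mandate a particular set):

```typescript
// Wrap each part of the request in a named XML tag so the model can
// distinguish instructions from context from examples.
function xmlSection(tag: string, body: string): string {
  return `<${tag}>\n${body}\n</${tag}>`;
}

function buildTaggedPrompt(instructions: string, context: string, examples: string[]): string {
  return [
    xmlSection("instructions", instructions),
    xmlSection("context", context),
    ...examples.map((ex) => xmlSection("example", ex)),
  ].join("\n\n");
}

const taggedPrompt = buildTaggedPrompt(
  "Summarize the context in one sentence. Think deeply before answering.",
  "Bun is a JavaScript runtime with a built-in SQLite driver.",
  ["Context about Postgres -> 'Postgres is a relational database.'"]
);
```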
One thing I've had a lot of success with recently is a slight variation on role-prompting: telling the LLM that someone else wrote something, and I need their help assessing the quality of it.
When the LLM thinks _you_ wrote something, it's nice about it, and deferential. When it thinks someone else wrote it, you're trying to decide how much to pay that person, and you need to know what edits to ask for, it becomes much more cut-throat and direct.
I've noticed this affects its tendency to just make things up in other contexts, too. I asked it to take a look at "my" github, gave it a link, then asked it some questions; it started talking about completely different repos and projects I'd never heard of. When I simply said take a look at `this` github and gave it a link, its answers had a lot more fidelity to what was actually there (within limits, of course - it's still far from perfect). [This was with Gemini 2.5 Flash on the web.] I've had similar experiences asking it to do style transfer from an example of "my" style versus "this" style, etc. Presumably this has something to do with the idea that in training, every text that speaks in first person is in some sense seen as being from the same person.
The main thing, I think, is people trying to do everything in "one prompt", or in one giant request that throws all the context at it. What you said is correct, but instead of making one massive request, break it down into parts: multiple prompts, each with a smaller context, that all have structured output you feed into each other.
Make prompts focused, with explicit output and examples, and don't overload the context. Then the 3 you said, basically.
>We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
Short answer: It’s a way to generate structured databases for (most) scientific topics. Why? Apply data driven methods to these databases. So what? It’s a powerful way to ask and investigate scientific questions/trends otherwise hidden inside a million scientific papers.
Example: Consider what PDB has done for our understanding of protein folding, as well as the ML/computational techniques they’ve enabled (eg, Alphafold). Most scientific questions and properties are not as data-rich as protein folding. What if they could be?
Longer answer: The last 15 years in computational/ML + science have shown that structured databases open up entirely new frontiers in discovery (eg Protein Data Bank, Materials Project). But most scientific topics/properties are NOT in structured DBs, they’re scattered about in millions of papers. It’s especially a huge problem in some topics in materials science. It’s not that these problems are data scarce, but that it’s hard to actually collate their data in a structured format. You literally cannot use most ML methods because structured DBs do not exist.
This paper is a way to generate massive structured databases of specialized, intricate, and hierarchical knowledge graphs from scientific literature. Fine tuning works, prompt engineering does not (at the time, perhaps this has changed). Once you have a database, you can analyze an entire subfield or topic in science with ML or stats methods.
Chain of Thought prompting loses much of its effectiveness on newer reasoning models like the GPT "o" series and Claude Sonnet.
As an exercise for the reader, I encourage you all to try the examples vs. control prompts in prompt engineering papers for chain of thought prompting, and you’ll see that the latest models have either been trained to or instructed to reason by default now - the outputs are close enough to equivalent.
CoT prompting was probably much more effective a few years ago on older, less powerful models.
You may find some benefit in telling it exactly how you want it to reason about a problem, but note that you may actually be limiting its capabilities that way.
I’ve found that most of the time, I will let it use its default reasoning capabilities and guide those rather than supplying my own.
You use a two-phase prompt for this. Have it reason through the answer and respond with a clearly-labeled 'final answer' section that contains the English description of the answer. Then run its response through again in JSON mode with a prompt to package up what the previous model said into structured form.
The second phase can be with a cheap model if you need it to be.
You can do this conversationally, but I've had the most success with API requests, since that gives you the most flexibility.
Pseudo-prompt:
Prompt 1: Do the thing, describe it in detail, end with a clear summary of your answer that includes ${THINGS_YOU_NEED_FOR_JSON}.
Prompt 2: A previous agent said ${CONTENT}, structure as JSON according to ${SCHEMA}.
Ideally, use a model in Prompt 2 that supports JSON schemas so you have a 100% guarantee that what you get back parses. Otherwise you can implement it yourself by validating locally and sending the errors back with a prompt to fix them.
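A sketch of the two-phase flow in TypeScript. `CallModel` stands in for whatever API client you use (it's a placeholder, not a real SDK type), and the validate-and-retry loop at the end is the hand-rolled fallback for models without native JSON-schema support:

```typescript
// Phase 1: free-form reasoning ending in a clearly labeled final answer.
// Phase 2: a (possibly cheaper) model packages that answer as JSON.
type CallModel = (prompt: string) => Promise<string>;

function phase1Prompt(task: string, fields: string[]): string {
  return (
    `${task}\nDescribe your reasoning in detail, then end with a section ` +
    `titled "FINAL ANSWER" that includes: ${fields.join(", ")}.`
  );
}

function phase2Prompt(previousAnswer: string, schema: string): string {
  return (
    `A previous agent said:\n${previousAnswer}\n\n` +
    `Return ONLY a JSON object matching this schema:\n${schema}`
  );
}

// Validate locally; on failure, send the bad reply back with a prompt to fix it.
async function getJson(
  callModel: CallModel,
  prompt: string,
  isValid: (parsed: unknown) => boolean,
  maxRetries = 3
): Promise<unknown> {
  let current = prompt;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const reply = await callModel(current);
    try {
      const parsed = JSON.parse(reply);
      if (isValid(parsed)) return parsed;
      current = `${prompt}\n\nYour last reply failed validation. Fix it:\n${reply}`;
    } catch {
      current = `${prompt}\n\nYour last reply was not valid JSON. Fix it:\n${reply}`;
    }
  }
  throw new Error("model never produced valid JSON");
}
```

In practice you'd call `getJson(callModel, phase2Prompt(finalAnswer, schema), validator)`, where `finalAnswer` is the "FINAL ANSWER" section extracted from the Phase 1 response.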