AI 'hallucinated' fake legal cases filed to B.C. court in Canadian first (globalnews.ca)
43 points by luu on Jan 24, 2024 | 112 comments


> In one case, a judge imposed a fine on New York lawyers who submitted a legal brief with imaginary cases hallucinated by ChatGPT — an incident the lawyers maintained was a good-faith error.

They need to be disbarred. Submitting legal filings that contain errors because you used ChatGPT to make up crap is the opposite of a "good-faith" error.


> They need to be disbarred.

In that NY case, they were only fined $5000 each, which seems like a slap on the wrist.

I think the court was sympathetic to them, as was I. They were the first lawyers to get burned by ChatGPT. How were they supposed to know that a highly publicized product from a major tech company would just make shit up?

But the penalty for lawyers who do it now, after the first case got so much publicity, should be more severe.


Hard disagree.

They should have taken a look at what they were submitting. That is literally their job.


When we had to register the ownership transfer of our family home, despite taking YEARS to prepare the documents, the notary hadn't even bothered to look up the name of the street on Google Maps and wrote the wrong name on the contract. So we had to waste an hour just to fix that.

They get paid a percentage of the value of the property being transferred, and can't even spend two minutes writing the correct street name.


By "taken a look at what they were submitting", I assume you mean use a different tech product, like LexisNexis? In the end you're still trusting technology.

And technology was usually trustworthy, until AI. If Google Scholar cited a legal case, I wouldn't doubt its existence.

It took some time for people to recalibrate for this new world where technology lies.


Whatever tool they used, even if it’s microfiche or a roll of sheepskin, their eyes should have looked at the contents of the case they were citing, before they submitted it to a judge.

If I had to submit code to a judge that decided the life of a person, I’d be reading through my NPM dependencies file by file, if not line by line.


In an idealized world, with infinite time and no financial constraints. In the real world, clients get screwed over all the time because they can't afford to pay lawyers that much.

It's a shame that ChatGPT isn't trustworthy, because if it were, it could really help reduce the cost of legal representation and create a fair legal system.


A small fish can get investors that put up the cash necessary to fight a big fish in a civil suit, as long as the ROI makes it worthwhile!


This is not hypothetical. This is a regular occurrence in civil suits in the US.

Don’t shoot the messenger?


> I think the court was sympathetic [...] They were the first lawyers to get burned by ChatGPT.

This makes it even more important to strongly discourage such conduct. Unless (the cynic in me says) the courts consider all the $5K fines they'll collect.


I'm not sure if disbarring is appropriate here, but then again I'm no legal professional.

I don't know what the appropriate response would be to a lawyer lying to the court and making up facts. I don't think something as bad as making up lawsuits even happens in normal legal proceedings. I'd presume fines and other types of punishment, depending on whether the lawyer is stupid enough to lie about using ChatGPT, as in the American case.

The person who made the mistake of hiring this lawyer will probably have grounds to sue them for malpractice, especially if they end up losing this case. I know I'd want my money back if my lawyer didn't even bother to read the paperwork they were filing.

This lawyer will now have "lawyer lied to the court" show up the moment you Google their name. I think that, plus a hefty fine, is more than enough punishment. Whether or not their future clients will trust them after this is up to them.


A fine is appropriate. There's no reason to destroy someone's life because of this. There are numerous forms of disciplinary actions available that don't involve needlessly and permanently destroying a person's livelihood.


Lawyers submitting fake briefs that they couldn't be bothered to review destroys someone's life.

This would just mean they can't be a lawyer.


Does disbarment really destroy someone's life? It considerably reduces their career options and means they'll probably end up much less wealthy than they otherwise would have been, but they can still find another job.

By contrast, if a lawyer makes a mistake that gets someone a criminal record they didn't deserve, that person has much more ground to say it destroyed their life.


[flagged]


I don't think cases like these are comparable to bumping into another car. Legal proceedings can have life-altering effects, and lawyers are trusted to take on that responsibility. In this case, we're talking about a family that wants to visit China, and I'm not sure if they'll be punished for their lawyer's incompetence, but civil lawyers also deal with life-changing amounts of money. This isn't just their own reputation and livelihood they're putting on the line.

I don't think this particular lawyer should be disbarred, but I do think submitting lies and confabulations to the court should be punished strictly. Attempts to deceive the court should not be tolerated, especially not when the lawyer didn't even do the work they put their signature under.

The justice system is screwed up more than enough; we don't need professionals getting away with this crap making it even worse.


Imagine if a doctor performed an operation based on the first result of a Google search, without verifying that it was the correct one or even presented correctly. Should they not lose their medical license?

Using an LLM to produce a document and then not verifying each part is wilful negligence. Either they do not care to do the right thing, or they are too ignorant. In both cases disbarring seems reasonable. They can always go work in fast food or something afterwards.


I found out that people in general are more evil and cruel than I believed. The internet just made it explicit at first, and later socially acceptable.


If the user didn't understand that ChatGPT sometimes makes up crap, despite the warnings everywhere about it (which they may not have read), it could still be a good-faith error to me. ChatGPT had only just been released.


If the user doesn't understand their tool then they should review what it spits out.

> I found a guy who claims to have the sum of all knowledge, and I've just copied and pasted code from him that I haven't reviewed.

Then either you're an idiot and should be disbarred, or you are negligent and should be disbarred. As a lawyer you have people's lives in your hands; this is not the time for "whoops, sorry, I just couldn't be arsed to read what I submitted".


If the user is a lawyer who officially referenced a case that a) they never read and b) never existed, then the user can’t be allowed to practice law anymore.


Every single lawyer references cases they never actually read in every filing.


Then we're discovering together that they should all be disbarred...


The demo: AI so powerful it's a literal existential risk for humanity!

v1.0: You should be disbarred for using our product and assuming the output would make any sense whatsoever.


If the user doesn't understand their tools, maybe they should delegate to someone who does before they rely on the results

Especially if they're a lawyer! Llmao


The danger of "AI" is that we actually believe the plausible fabrications it produces are "intelligent". The other day, I debated a guy who thought that the utopian future was governments run by AI. He was convinced that the AI would always make the perfect, optimal decision in any circumstance. The scary thing to me is that LLMs are probably really good at fabricating the kind of brain dead lies that get corrupt politicians into power.


> The danger of "AI" is that we actually believe the plausible fabrications it produces are "intelligent".

100% this!

I'm fed up with seemingly rational people who just can't comprehend that the AI hype is just sales talk.

I had to convince a customer the other day we cannot write whole apps with ChatGPT. As far as I know, not a single example exists of a full app written by ChatGPT. It just can't be done, because ChatGPT is not reasoning!!! It is not intelligent.

It can just spit out seemingly coherent, but often incorrect, snippets of code.

I think it's best described as a word-calculator, or autocomplete on crack, in the sense that it is great at guessing what comes next. But it has no reasoning behind its predictions, only statistics.

EDIT: The AI apologists are downvoting me LOL


Are we using the same ChatGPT?

It cannot write a full app, but that is a matter of context size.

It absolutely can "reason" within bounds, attention is a much more powerful system than you realize.

Chain-of-Thought is a well-established technique for LLM prompting which can produce great results. And the snippets of code GPT-4 generates for me are usually on point, at least they were until OpenAI dumbed the model down in the last few months with GPT-4-Turbo.

I get it not only to write novel code but to explain existing code all the time, and it reasons about my code very well.


Ahh.. I didn't pay for the premium version, that is why I am not convinced. /s


I don't know what to tell you if you think you're entitled to a completely free cutting-edge service that takes multiple dedicated GPUs to serve your requests.


> autocomplete on crack

I have had some really frustrating conversations about this very topic with people insisting on using it for complex tasks like designing business strategies. It doesn't know how to actually answer your question, it's just spitting out probable words associated with the topic in a plausible sequence! It's like Quora with spellcheck and an aura of legitimacy which makes it infinitely more seductive.


I’ve actually used a mate's Replit-chained GPT-4 app to build short single-function apps from a single command that work on first run, so it’s definitely possible, and it will only get better. But building code is just following a logical set of instructions down a pathway.


"Single function app".. how is that different from a snippet?

Show me a proper app generated by ChatGPT.

You can't, because ChatGPT can't.


It will only get better, based on what? A true believer just knows.


Reminds me of the latest excuse from OpenAI: https://news.yahoo.com/ai-needs-nuclear-power-breakthrough-1...

(It's not the tech that's wrong; we just don't have enough energy in the world.)


Sam Altman is also a non-trivial investor in various nuclear energy R&D companies, e.g. $375 million into Helion, so take that "need" with a pinch of salt. He made a bet, presumably because he already believes, but it's human nature to try to justify your decisions after the fact and not just before.


> I'm fed up with seemingly rational people who just can't comprehend that the AI hype is just sales talk.

The reason why you see rational people believe the AI hype is that AI is a lot better than you give it credit for. They are right.

> I had to convince a customer the other day we cannot write whole apps with ChatGPT. As far as I know, not a single example exists of a full app written by ChatGPT. It just can't be done

It can. When it first came out, I wanted a macOS menu bar app so that I could access ChatGPT conversations from the menu bar. I’d never written a macOS app before. I told it what I wanted, it wrote the whole thing in one go. There was one minor compile error (a type signature had changed from a previous version, if I remember correctly), which was a one-line fix. I iterated a couple of times, telling ChatGPT what improvements I wanted to make to the app. It did them.

Would I use it to build a complex app? No. But it is capable of building a whole app.


> The reason why you see rational people believe the AI hype is that AI is a lot better than you give it credit for. They are right.

I am sorry, but you are mistaken. There is no AI. The "AI" being mongered today is just a salesperson's buzzword for large language model technology, which is just that: a language model. It can be used to generate seemingly coherent text, but it cannot reason.

> Would I use it to build a complex app? No. But it is capable of building a whole app.

It only works for you because you accept lowering the bar of what would pass as an "app".

I mean of course any kind of normal app. Say Microsoft Paint for example.

Show me ChatGPT able to create a full app like that based on business logic you prompted.


> it cannot reason.

As someone with an A-level in philosophy: What is reason?

> I mean of course any kind of normal app. Say Microsoft Paint for example.

The original, or the current one? Ironically, I think either would be a case where ChatGPT works best. The features in a raster graphics tool are very well isolated from each other.

I'd expect ChatGPT to face-plant much more with any meta-level stuff, so a database that doesn't have a 1-1 correspondence with whatever appears in a UI for example.

Or physics sims where you've got several layers of indirection that all need to be correct. I've never been able to get a complete working Navier–Stokes simulator out of it, for example.


> As someone with an A-level in philosophy: What is reason?

Maybe you are better placed to answer that question, then. But I would define the ability to reason as the ability to draw logical conclusions based on input from a feedback loop.

Let's use programming as an example.

You write some code you think will work. You run it. It outputs the wrong values. You reason about the code, realize your mistake. Fix the code. Run again. It outputs the correct values.

ChatGPT cannot do any of this. All it can do is spit out text that it deems probable given the text already generated. This is based on probability, but it completely lacks a reasoning part and a feedback loop.

> The original, or the current one? Ironically, I think either would be a case where ChatGPT works best.

Show me a prompt that produces either one.


> ChatGPT cannot do any of this. All it can do is spit out text that it deems probable given the text already generated. This is based on probability, but it completely lacks a reasoning part and a feedback loop.

50% of the time, if you give it source code and an error, it can fix the source code and tell you why. The other half I still get paid for.

> Show me a prompt that produces either one.

K, if I remember by the time I've done the higher priority things, I'll send you a repo with as much of it as I can be arsed to ask for to prove the point, along with the share link to the chat session.


> Show me a prompt that produces either one.

Note how you only need to ask for each feature and in it goes. Sometimes there's a bug or three, but you can generally fix them by describing them.

https://chat.openai.com/share/e2a6b513-c088-449d-8d8f-4f4c73...

I won't ask for the rest of the features, because it's boring.

What I bring to the market right now might be merely knowing what to ask.

--

Oh yeah, I said repo as well. Note the commits are copy pasting blindly. I don't do JS professionally, I'm an iOS dev: https://github.com/BenWheatley/JSPaint


You must be kidding right?

Your chat demonstrates most of what I brought up:

* it only spits out snippets

* it cannot produce an app

* it is hallucinating

* it is creating more work than just doing the work yourself

And it's a web app, but that's on you for requesting that for some reason.

I asked for an MS Paint app. That would mean something that produces a Windows-compatible EXE, not some HTML fest.

> I won't ask for the rest of the features, because it's boring.

Or, because it's infeasible and you know it.


Doesn't need answering. All we need to know is that reason requires logic. LLMs do not employ logic, thus they are not able to reason. They sometimes happen to act as if they reason, but fundamentally they do not.


Transformer models are universal, which isn't very surprising given they're made of things which are also universal when configured right, so they absolutely can learn the rules of logic.

Some say that reason requires consciousness, which I'm… frustrated by given the 50 common meanings of the word "consciousness"; but to merely use logic as your standard here?

Why, of all the things people could object to in an LLM, why is logic what people want to pick on? It's the weakest possible objection IMO.


The set of models we're discussing haven't.

You were derailing with "reason" for no reason, so I pointed it out. That doesn't mean logic should be applied as some sort of universal standard.


> The set of models we're discussing haven't.

Haven't what?

> You were derailing with "reason" for no reason

I was quoting someone else who said LLMs can't reason, and I'm asking them what they meant by that because ChatGPT sure acts like it reasons no matter what's going on "inside". I assume the inside to be a Transformer model because otherwise the naming is weird, but whatever it is, it acts like it learned to reason.

And I'm saying that despite wanting this to be a repeat of Clever Hans so I can go back to feeling optimistic about my economic future.


What is reason? No matter.

And what is matter? Never mind.


> It only works for you because you accept lowering the bar of what would pass as an "app".

I asked it to create an app, it created the app I wanted. That’s not “lowering the bar”. You’re just redefining words to prop up your incorrect claims.


Okay. Enjoy your AI infused career where you can knock out hello world apps in just a few minutes.


Prove it.


Why didn't the customer write the whole app themselves with chatgpt?


They tried to show me what they meant.

ChatGPT proceeded to spit out a snippet of Rust code, starting with a dependency outdated by four years, proceeded to mix APIs from several versions of the dependency, and completely ignored most of the request.

The customer does not know Rust, so it looks amazing to them.

ChatGPT is a great con-man for the gullible.


> ChatGPT proceeded to spit out a snippet of Rust code, starting with a dependency outdated by four years, proceeded to mix APIs from several versions of the dependency, and completely ignored most of the request.

If that's what you meant, I think everyone here, including the people disagreeing with examples (e.g. me), would have agreed.

What I mean with my examples is: it can do all the bits. Is it perfect? Nope. Do you still need a human in the loop to understand the failure modes? Yup. That MS paint app you asked for 45 minutes ago? Features are decomposable, so it can do this step by step, so long as you're OK adding one feature at a time instead of trying to do it all in one go — 75% of the results are fully functional and do what you asked for (which isn't always what you wanted, "tool selector" can be the CSS or the widget itself).


Some people used to write crap code stitching together snippets from stack overflow, now the same people write crap code stitching together what ChatGPT spit out after trained on said stackoverflow clusterfuck of buggy or flat out incorrect solutions. You do you.


And those people are now your competitors and customers.

Worse (for all of us in this industry) sometimes there's a bug or three, but it can generally fix those bugs by you giving the source code (if the source didn't already come from the LLM) and simply describing the issue. Even when the bug is graphical in nature, which is pretty wild for a text based chat engine that can't see (like 3.5 is).

And that's without applying my experience of any language or frameworks, e.g. when asking it for solutions in languages I don't have meaningful professional experience in, like JavaScript.

When I can apply my experience, I cringe at the choices it comes up with, and find I want to use it as a rough guide rather than a complete answer. But you know who doesn't care about cringy code? Everyone who isn't a developer. Even QA only cares if it passes the tests.


> And those people are now your competitors and customers.

For now, yes.

> When I can apply my experience, I cringe at the choices it comes up with, and find I want to use it as a rough guide rather than a complete answer. But you know who doesn't care about cringy code? Everyone who isn't a developer. Even QA only cares if it passes the tests.

This is a race to the bottom, surely you must understand that?

I expect some of the biggest tech companies to start limiting developer use of LLMs soon due to the monumental technical debt they are creating.

More context: https://visualstudiomagazine.com/articles/2024/01/25/copilot...


What matters is if the tool is useful or not.

You cannot claim that this tool is not useful to people and use cases you are entirely unfamiliar with.

Debates of what is real intelligence are akin to running around in circles.


"""That's one of those irregular verbs, isn't it? I give confidential security briefings. You leak. He has been charged under section 2a of the Official Secrets Act.""" - Yes Minister.

To rephrase for the subject: We're only human, mistakes are to be expected; they are idiots, mistakes are to be expected; that thing is just a glorified calculator, mistakes are to be expected.

(There's a short story I found I couldn't bring myself to finish, “Zero for Conduct” by Greg Egan, where the lead character is bullied by someone who has a similar disregard for her intelligence; I know one cannot use fiction to learn about reality, so I will instead say that this disregard of human intelligence by other humans happens a lot in real life too, the racism and xenophobia of βαρβαρίζω can still be found today in all the people who insist that ancient structures like the pyramids couldn't possibly have been built by the locals and therefore it must have been aliens).

> AI hype is just sales talk

But where does the hype end and the reality begin?

> I had to convince a customer the other day we cannot write whole apps with ChatGPT. As far as I know, not a single example exists of a full app written by ChatGPT. It just can't be done, because ChatGPT is not reasoning!!! It is not intelligent.

I will agree ChatGPT does indeed make incoherent solutions — one test project was making a game in JS, it (eventually) gave me a vector class with methods for multiply(scalar) etc., but then tried to use mul(scalar).

But ironically, I've also made a functioning (basic, but functioning) ChatGPT API interface… by bolting together the output of ChatGPT. I won't claim it's amazing or anything, because I'm an iOS developer and the thing "I" made is a web app, but it works well enough for my needs (just don't paste HTML into the query section, because I stopped adding to it when it was good enough for my needs and therefore only have a very basic solution to code being shown in the chat list, there's a lot of stuff that would be improved by using simple libraries but I didn't want to).

> I think it's best described as a word-calculator, or autocomplete on crack.

And that's the same error in the opposite direction.

If I understand right, GPT-3 is literally the complexity of a mid-sized rodent. Thus the metaphor I use is:

Imagine a ferret was genetically modified to be immortal, had every sensory input removed except smell, which was wired up to a computer. The ferret and computer then spend 50,000 years going through 10% of all the text on the internet, where every sequence is tokenised and those tokens are turned into a pattern of olfactory nerves to stimulate, and the ferret is rewarded or punished based on how well it imagined the next token.

You're annoyed that this specific ferret's jokes are derivative, their code doesn't always compile, that they make mistakes when trying to solve algebraic problems, that their pecan pie recipe needs work, and that they make mistakes when translating Latin into Hindi.

I'm amazed the ferret can do any of these things.


> If I understand right, GPT-3 is literally the complexity of a mid-sized rodent.

You are not understanding things right, then.

Intelligence is comprised of multiple complex systems. ChatGPT only ever claimed to focus on the language part. It does not contain reasoning.

Even a rodent can reason.

Much less complex organisms can reason, too.


> You are not understanding things right, then.

GPT-3 has about the same number of free parameters as the number of synapses in a mid-sized rodent brain.

> Intelligence is comprised of multiple complex systems. ChatGPT only ever claimed to focus on the language part. It does not contain reasoning.

It (ChatGPT) is doing abstract symbol manipulation, unless you want to expand the idea of language to include chess positions (even if mediocrely), algebra, and the application of the rules of formal logic to the symbols used to represent those rules when it is so prompted.

> Even a rodent can reason.

> Much less complex organisms can reason, too.

For which values of the meaning of the word "reason"? From your other comment: "But I would define the ability to reason as the ability to draw logical conclusions based on input from a feedback loop."

1. This is a terrible waste of computing power, on the order of a multiple-trillions-to-one ratio, given that logic is the underlying thing used to represent the numbers and floating-point operations that approximate the linear algebra that in turn is used to build LLMs.

(A similar ratio is also found in human "logical" cognition, and is why we don't use logic all the time. Hence the classic example of "a baseball bat and ball cost $1.10 together, and the bat is $1 more expensive than the ball; how much does the ball cost?", which so many get wrong: the intuitive answer is 10 cents, but the ball is actually 5 cents and the bat $1.05.)

2. LLMs can do that anyway. Even literal flow charts are still doing that anyway, and LLMs can build flow charts.

It still makes mistakes, sure, yup, but then the question is "how many compared to a human?" and that's terrifyingly close to humans and I'm just hoping it will remain on the same side, slightly worse — I mean, your own example is humans being fooled by it, so clearly you also know humans can be wildly wrong.


I've wondered if the impression isn't so much caused by the answers themselves as by a trust level that has formed from its behavior. A core appeal of LLM bots is they're instructed to be non-judgemental and ever-helpful, so in most scenarios one won't 'butt heads' with them.

This obviously has some real positives for learning (if the responses could be accurate) since you're more likely to continue eagerly the less ashamed you are asking 'dumb' questions to more rapidly gain an understanding of something (anonymous accounts online similarly facilitate shame-free question asking).

I wonder though if it might be having an effect where people begin to prefer AI over humans since with real humans, whether online or IRL, there's no consistency of response—people could be cranky, disinterested in responding or criticize someone for even asking.

So the more one interacts with an ever-willing, ever-pleasant, non-judgemental, human-esque answer machine—even if it's hallucinating things—I could see how it could become more of a go-to and even a trust begin to grow from familiarity and its generally polite ability to listen.


> The danger of "AI" is that we actually believe the plausible fabrications it produces are "intelligent".

the thing I find scary is you see people on here believing this

if a website full of techies can't tell the difference what chance do the general public have?

I suspect the invention of the LLM will be the final nail in the coffin of liberal democracy (after the invention of social media)


People should never have blindly trusted anything on the internet in the first place. As far as I’m aware, most of the fears like this are overrated. People will live, learn and adapt as we always have. Constantly complaining about doom and having pessimism about anything seemingly scary is not a great way to live a fulfilling life.


There are entire subreddits of people who believe that ‘AI’ is by definition infallible. I feel myself getting sucked in and going slightly crazy just spending a few minutes reading it.

Those places seem to mostly be populated by disenfranchised people who are desperate for some ‘crazy alien tech’ to come along and overturn whatever systems they feel led them to their current (miserable) lives.

“Anyone who doesn’t think that in two years’ time AGI will have ushered in a new age where we all live on UBI and are free to explore our passions isn’t paying attention/has no idea what’s about to happen”, etc. etc.

Needless to say, none of this stuff has anything remotely to do with, you know… empirical evidence or theory. But reading endless reams of it for hours a day can certainly make you think it does.


> “Anyone who doesn’t think that in two years’ time AGI will have ushered in a new age where we all live on UBI and are free to explore our passions isn’t paying attention/has no idea what’s about to happen”, etc. etc.

I'd be surprised if it takes less than 6 years to fully generalise over all domains (what I expect to be hard is "learning from as few examples as a human"), and I'm not convinced there's the political will to roll out UBI fast enough, not even if AI that is sufficiently powerful as to require UBI to avoid economic collapse and popular revolution takes 20 years.

But AI may well break a lot of things in 2 years even without being either of those. Radical economic disruption doesn't need to replace all humans, even 5% would be huge, and it doesn't need to be as evidence-efficient as a human if there's lots of humans doing the same thing that the evidence-inefficient AI can learn from.

On the other hand I have just described the trucking industry, and yet even despite the ability to collect all that training data from all those drivers, Tesla has still not fully solved self-driving vehicle AI.

And also, I do mean "replacing humans" rather than just doing specific tasks that are currently done by humans: if those same humans can "just learn a different job", then so long as they can retrain faster than AI can learn the same jobs by watching humans collectively, this isn't so bad.

> empirical evidence or theory

Theory, I'd agree. But then, we don't really have any theory suitable to this task.

Empirical evidence? That's a weird take, to me. What it can already do, even despite the mistakes of the current generation, has me wondering how long I'm going to be economically relevant for — and hoping this is a repeat of Clever Hans.


The utopian future is governments run by AI.

The Culture series explores this to some degree.


Fiction isn't a good reference point for reality. The dystopian future is also governments run by AI. Terminator so much so it's a cliché, but also I Have No Mouth, and I Must Scream. Also, I've never read this, but people sometimes point it out as an example of "Can we, like, not do that?": https://tvtropes.org/pmwiki/pmwiki.php/Fanfic/FriendshipIsOp...


Science fiction is a great reference point for reality.

The people creating our reality today all read the same books I did as kids.

Humans make lousy governors and suck at minimizing mass suffering.

Barring I have no mouth and I must scream scenarios or Butlerian Jihad scenarios, AI can make a better world for us (or for whatever replaces us).

The alternative is for our monkey brains to keep ramming each other with cars, slamming projectiles into ships to draw political attention, getting angry at people with the wrong skin color, etc

I don’t want another 10,000 years of that.


> Barring I have no mouth and I must scream scenarios or Butlerian Jihad scenarios, AI can make a better world for us (or for whatever replaces us).

That's a fully generalisable statement: Barring ${environmental degradation and worker rights issues}, ${laissez faire economics} can make a better world. Barring ${racism}, ${colonialism} can make a better world. Barring ${dictatorships}, ${communism} can make a better world. Barring ${a Kessler cascade}, ${a Dyson swarm} can make a better world.

> The alternative is for our monkey brains to keep ramming each other with cars, slamming projectiles into ships to draw political attention, getting angry at people with the wrong skin color, etc

Our limited monkey brains are, unfortunately, also the reference example for all the AI we want to make. Human-like alignment with human-like interests and human-like values. The default is this plus bugs from the software not doing what we thought we wanted it to do, which means all those things but done at machine speed rather than biological speed plus a bunch of other things that just make no sense when they happen. The bugs in particular can be, and in some cases have already been, weird beyond human comprehension.

I hope the safety/alignment people don't get caught up in either a scandal or a philosophical cul-de-sac.


I imagine after the first car was invented if somebody described our current situation with cars most people would be as sceptical as you are about this.

If the purpose of Government is to govern as per the will of the people, AI's ability will likely surpass Humans in this task certainly in my lifetime.

Talking about current gen AI is like talking about the Benz Patent-Motorwagen, We haven't reached the Ford Model T and we can't even imagine a Tesla Model X.


Given how bad the Benz Patent-Motorwagen was, I think the DIY/"open source" LLMs would count as equivalents of that, and that OpenAI, Stable Diffusion, and Midjourney (especially given their popularity and economic disruptiveness to buggy whip manufacturers/artists) are equivalents of the Model-T. Cars have become more efficient, comfortable and safe since then, but the utility is similar.

But I also think cars are a terrible analogy. Internal combustion engines collectively are probably more apt, and for that case: OpenAI's various models may well be the 1776 Watt steam engine — a basic useful tool that displaces manual labor, which has a direct influence in its own right, but which would also see categorical replacement several times over.


> Given how bad the Benz Patent-Motorwagen was, I think the DIY/"open source" LLMs would count as equivalents of that, and that OpenAI, Stable Diffusion, and Midjourney (especially given their popularity and economic disruptiveness to buggy whip manufacturers/artists) are equivalents of the Model-T

It's difficult to put things on a scale when you don't know what the end outcome will be. There was over 20 years between the Benz Patent-Motorwagen and the Ford Model T.

I think the AI equivalent of the Model T won't be available for 5 - 10 years but it will follow some of the notes of the Model T (Mass Produced, lower cost of ownership, Changes most other industries, mass customisation, etc)


I think people under- and overestimate AI at the same time. E.g. I asked ChatGPT-4 to draw me a schematic of a simple buck converter (i.e. 4 components + load). In the written response it got the basics right. The drawn schematic is completely garbled nonsense.

I was expecting something like this maybe: https://en.wikipedia.org/wiki/Buck_converter#/media/File:Buc...

I got this: https://imgur.com/a/tEqprGq


> I got this: https://imgur.com/a/tEqprGq

Are you sure you asked ChatGPT4 to draw the schematic for a simple buck converter? I'm asking because that looks like a near perfect pre-Schneider coencarbulator-control oblidisk transistor, ingeniously aligned for use with theta arrays!

I'm quite out of date with the latest VX work, but it looks impressive enough that I think you should ask the VX community [0] in case any of them may use this design to get improved delta readings.

[0] https://old.reddit.com/r/VXJunkies/


But that is my point, ChatGPT connected the rockwell retro encabulator[0] in reverse, which will of course make the resonance tank split capacitor bridge oscillate out of sync with the active input bias triodes. This will of course immediately fry the dual-bifet transistor driver stage. Not sure why they call this AI.

[0]https://www.youtube.com/watch?v=RXJKdh1KZ0w


Is GPT-4 just passing a prompt to DALL-E to create an image? The garbled diagram makes sense since DALL-E isn't supposed to be that intelligent or an AGI.


Yes it did. And this was the description ChatGPT put underneath it:

"Here's the illustration of a basic buck converter schematic. This diagram includes the input voltage source (Vin), the switch (transistor), diode, inductor (L), capacitor (C), output voltage (Vout), and the load (represented as a resistor). The connections between these components and the flow of current are also shown. This should help you understand the basic operation of a buck converter."

Which is a pretty good description of the components. However, it then gaslights me into believing this is somehow depicted in the picture Dall-E produced.


So even if GPT-4 "understands" what needs to be drawn (in the sense that the higher-level concepts and relations between them are embedded in its weights), it's difficult to get DALL-E to draw it correctly because DALL-E doesn't have that same "understanding". OpenAI will need to combine these models somehow, or train DALL-E so that it has the same conceptual understanding.


AI version is so much better!


Right. It's the AI that is the problem.

I have another use case for LLMs I hadn't thought of before: absolution of responsibility. The public is already primed to focus on AI in such cases.


In the early 90s, when every business was moving full out into computing, I'd call for an issue.

Sometimes the person helping me would say "I'm sorry, the computer won't let me do that". This seemed to grow as a response, until, to protect myself, I started pushing back.

"I don't care, you guys owe me $x, and telling me your computer won't let you do it is absurd. Pay me."

Most people would then get a manager on the phone, who would ape the "computer was at fault" line, but after pushback... surprise! There was a way to resolve things.

Point is, abdicating responsibility is an easy out for the lazy and for corporate greed. And AI will surely be used in this fashion, because people love an out.

Amazon is already the worst of this, if you have an issue that doesn't fit in a premade solution, good luck getting any fix!

Taking humans out of the chain is always bad.

Just look at the UK post office scandal, for the dark side of this.


Hits very close to home...

It really feels like playing whack-a-mole, never really progressing towards any reasonable level of responsibility.


> absolution of responsibility

There are some still fighting the fight, but this is already happening¹ when it comes to failure to attribute and other licensing issues. “I didn't copy/misattribute/other your code, an AI trained in part by it was ‘inspired’ by it amongst other code”.

----

[1] unless you believe the assurances that an LLM will never regurgitate chunks of training material, like the image based models have been shown to


A hundredth the price and a quarter the quality means that this is here to stay. Might be a little early in the accuracy phase to start riding AI written briefs into court unchecked, but then I’ve never met a lawyer who didn’t try to make their billing efficient.

But logically, since all that is needed is improved accuracy, it's more likely that improved accuracy will be the answer rather than any change in human behavior.


> A hundredth the price and a quarter the quality means that this is here to stay

No, it's simply that those noobs don't know how to use LLMs. They'll eventually learn.

Basically, you don't use them to dig up new information, unless you're extremely careful about triple-checking that information. Google Scholar's legal database search is better for that. You use LLMs to write boilerplate, paraphrase, edit, and synthesize information from your own sources. Do it properly, and you'll never "hallucinate" a fake case in one of your legal filings, and you'll be able to write 'em in 5% of the time.
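For example, roughly this kind of workflow (a minimal sketch, assuming the openai Python client; the file of verified notes is hypothetical):

    # Sketch: ground the model in material you have already verified, instead of
    # asking it to dig up cases on its own.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical: notes, statutes, and cases you have actually read and checked.
    source_material = open("my_verified_notes.txt").read()

    prompt = (
        "Using ONLY the material below, draft a summary of the argument. "
        "Do not cite anything that does not appear in the material.\n\n"
        "MATERIAL:\n" + source_material
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)

The point being that anything the model is allowed to cite has already passed through your own hands.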


> Do it properly, and you'll never "hallucinate" a fake case in one of your legal filings, and you'll be able to write 'em in 5% of the time.

All fun and games for those who can, and good for them, but I'm betting the majority can't or won't. The result being that society pays for the latter's incompetence.


I've led a team building an LLM agent for customer service.

Our finding is that it's between 50% and 10% of the operational cost of a human for a case. This costing is based on the range of costs for offshore vs. nearshore workers and doesn't account for a lot of the overhead of a human powered service organisation (in the jargon this isn't fully loaded).

I believe that the real cost is about 20% if dev expenses are included - but that's just my view of where inbetween the bounds thing come to rest.

Now, that's not a hundredth. In terms of quality, there are things it can't do, and despite our architecture (which is aimed at managing the deficiencies of LLMs) we still see some hallucinations creeping through. For example our encoder has problems with directionality, as in it will write text like "average transaction value declined from $150 to $154 in October." We can catch (in our tests anyway) all the mistakes about the values, but the actual textual phrasing is hard to check - hard enough that I don't think the value of the system justifies the effort.

I think, from customer feedback, that this sort of thing will be ok for the apps we are building, but it is a real problem with this generation of models and it's not clear to me that it will be solved in the future (although like everyone else I was blindsided by the jump from GPT3 to 4 so who knows).


Really interesting insights and a really great comment.

I expect the technology to accelerate including dramatic leaps in accuracy and for LLM technology to make geometric improvements (just larger models and better hardware will improve them substantially, and that’s already coming to market in 2024-2025).


> since all that is needed is improved accuracy

Oh, is that all?

You speak as if truth and facts are just minor details, perhaps to be added in a .1 update.


Yeah, I believe it will progress geometrically rather than linearly, so basically that's how I see it.


Isn’t “hallucination” named after a human phenomenon? People, too, remember things that never happened.

Wouldn’t it be solvable with a second AI agent which checks the output of the first one and goes “bro, you sure about that? I never heard of it”?

In my experience with LLMs, they don’t insist when corrected, instead they apologize and generate a response with that correction in mind.


I don't like "hallucination" in this context. "Confabulation" seems much more accurate.

Last time I tried to correct AI, it assured me that I'm wrong, and for a short, funny moment, it started insulting me before the text got scrapped and a generic "something went wrong" placeholder appeared. I don't think putting two AIs together will solve the problem.

The legal references all need to follow relatively standardised formats, so those should be easy to check (see the sketch below). Of course there's no way to know for sure if the case is actually about the subject it's being cited to support, but at a bare minimum you could filter out AI responses containing fake cases.
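A minimal sketch of that kind of filter; the citation pattern and the database lookup are hypothetical, and real citation formats vary far more than this:

    import re

    # Very rough pattern for one common citation shape, e.g.
    # "Smith v. Jones, 123 F.3d 456 (2d Cir. 1999)". Illustrative only.
    CITATION_RE = re.compile(
        r"[A-Z][\w.'-]+ v\. [A-Z][\w.'-]+, \d+ [A-Za-z0-9.]+ \d+ \([^)]*\d{4}\)"
    )

    def extract_citations(text: str) -> list[str]:
        """Pull citation-shaped strings out of an AI-drafted filing."""
        return CITATION_RE.findall(text)

    def citation_exists(citation: str) -> bool:
        """Hypothetical hook: look the citation up in a real database (CanLII, Westlaw, ...)."""
        raise NotImplementedError

    draft = "... as held in Smith v. Jones, 123 F.3d 456 (2d Cir. 1999), the court ..."
    for cite in extract_citations(draft):
        print(cite)  # each of these still needs a database (and human) check before filing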

Because AI can't differentiate truth from fiction, it's essential that a real lawyer double checks the documents AI will generate, including reading referenced cases and verifying that they support the supposed argument. That way, they can skip a difficult part of their job without actually lying to the court when they submit "their" documents.

For important fields like law, you will always need a human verification step.


> they don’t insist when corrected

That’s just OpenAI’s RLHF and instruct tuning though. Bing Chat’s Sydney had the temperament of a moody teenager and would often contradict or accuse people of being wrong when pushed.

The correct way to solve this is to use retrieval-augmented generation (finding relevant cases using embeddings and then feeding the data along with the question), so that the model is grounded in the truth.
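In outline, something like this (a sketch, assuming the openai Python client; the corpus of verified case summaries is hypothetical):

    # Sketch of retrieval-augmented generation over a small corpus of verified case summaries.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
        return np.array([d.embedding for d in resp.data])

    # Hypothetical corpus: summaries of cases a human has confirmed actually exist.
    corpus = [
        "Case A: parental mobility rights ...",
        "Case B: best interests of the child ...",
    ]
    corpus_vecs = embed(corpus)

    question = "Which precedents are relevant to relocating with a minor child?"
    q_vec = embed([question])[0]

    # Cosine similarity against the verified documents; keep the top matches.
    sims = corpus_vecs @ q_vec / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q_vec))
    top = [corpus[i] for i in np.argsort(sims)[::-1][:2]]

    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Answer using ONLY these sources:\n" + "\n".join(top)
                       + "\n\nQuestion: " + question,
        }],
    )
    print(answer.choices[0].message.content)

The model then only has verified material to paraphrase, which is a very different failure mode from asking it to recall cases from its weights.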


Sure, a second agent can do fact and sanity checking with retrieval too.


It's absolutely solvable - and presumably this is the sort of functionality that more legal-specific AI applications offer (probably mixed with some sort of knowledge graph that contains facts).

Seems like the issue here was that a lawyer was just using OOTB ChatGPT.


I'm not so sure with generic LLMs. From what I've seen so far of how LLMs work, "hallucination" is a feature, not a bug. In other use cases it's called creativity, when the user wants new ideas generated or the missing spots filled in. In those cases people are really happy that it can hallucinate, but of course they don't call it that.

So the dilemma is that we need to eliminate the same thing in one case but improve it in others.

But I believe you are right that we can engineer a fine tuned system that can use tools to fact check using traditional techniques.


I wonder how many AI hallucinations stem from the fact that nowhere in the prompt was it said that the information has to be real. And this cannot be implicitly assumed, since humans like fiction in other contexts.


I haven't used or paid much attention to ChatGPT, but the other day I was reading a macOS question on Reddit, and one of the "answers" was completely bizarre, claiming that the Apple Launchpad app was developed by Canonical. I checked the commenter's bio, and sure enough, they were a prolific ChatGPT user. It also turns out that Canonical has a product called Launchpad, which was the basis of ChatGPT's mindlessly wrong answer.

The scary thing is that even though ChatGPT's response was completely detached from reality, it was articulate and sounded authoritative, easily capable of fooling someone who wasn't aware of the facts. It seems to me that these "AI tools" are a menace in a society already rife with misinformation. Of course the Reddit commenter didn't have the decency to preface their comment with a disclaimer about how it was generated. I'm not looking forward to the future of this.


It is spreading like wildfire. Yet the question about the repercussions remains.


Had the lawyer used a fine-tuned model with some RAG to go along with it, I assume it’d have been fine.

This is just people failing to use a technology correctly, like riding a bike without a helmet.


We never talked about the Internet's repercussions, and yet here we are.

Did we do a good job, or should we have left that stone unturned?


"Did we do a good job, or should we have left that stone unturned?"

Regarding the quality of the average website: no, but also no to the second. The internet's time simply arrived with no precedent, and now it is hyped, imperfect AIs that we have to deal with. I think the main problem in both cases is that most people don't have a clue at all how it works. (And too many of them are in positions of power.)


> Did we do a good job, or should we have left that stone unturned?

It's kind of a hangover, don't you think? The past decade, I mean, with the Internet.


We have been talking about them since day 1, and we still are. The DMCA, the existence of the EFF, and cryptocurrency regulations are all examples of this.


From TFA, "the case was a high-net-worth family matter" so probably not an existential threat to anyone.


The problem with AI is not the AI itself, it is people. Dumb people. Dumb people with credentials and power.


The title of the article should be: somebody faked a legal case using AI.


How is it newsworthy that language models do what they are known to do? This feels a bit like reporting that water is wet.


I guess the value in reporting it is that for most people, for us on HN as well, computing is considered accurate. You can trust the output if you trust the input and the program that processes the input. That is what we expect and value in computing - accuracy.

For LLMs that's not really the case anymore and it needs to be highlighted that "computers" no longer necessarily produce accurate output, to make sure not too much faith is put in what they produce.


> "computers" no longer necessarily produce accurate output

This was always the case. Just because a computer executes your model, doesn't mean your model has any bearing on reality. This is not a new phenomenon.


The story isn't about LLMs doing LLM stuff. It's about lawyers using LLMs as a shortcut for proper legal work, laboring under the delusion that it is entirely accurate, honest and 'intelligent', and the ramifications for the legal system.


Fair enough, that is newsworthy.




