
that sounds like one of the worst heuristics I've ever heard, worse than "em-dash=ai" (em-dash equals ai to the illiterate class, who don't know what they are talking about on any subject and who also don't use em-dashes, but literate people do use em-dashes and also know what they are talking about. this is called the Dunning-Em-Dash Effect, where "dunning" refers to the payback of intellectual deficit whereas the illiterate think it's a name)




The em-dash=LLM thing is so crazy. For many years Microsoft Word has AUTOCORRECTED the typing of a single hyphen to the proper syntax for the context -- whether a hyphen, en-dash, or em-dash.

I would wager good money that the proliferation of em-dashes we see in LLM-generated text is due to the fact that there are so many correctly used em-dashes in publicly-available text, as auto-corrected by Word...


Which would matter, but the entry box in no major browser does this.

The HN text area does not insert em-dashes for you and never has. On my phone keyboard it's a very deliberate action to add one (symbol mode, long press hyphen, slide my finger over to em-dash).

The entire point is that it's contextual - em-dashes where no accommodations make them likely.


Is this—not an em-dash? On iOS I generated it by double tapping dash. I think there are more iOS users than AIs, although I could be wrong about that…

Yeah, I get that. And I'm not saying the author is wrong, just commenting on that one often-commented-upon phenomenon. If text is being input to the field by copy-paste (from another browser tab) anyway, who's to say it's not (hypothetically) being copied and pasted from the word processor in which it's being written?

The audio artifacts of an AI generated video are a far more reliable heuristic than the presence of a single character in a body of text.

For now. A year ago there weren't even Gen AI videos. Give it a few months...

Well, it's probably lower false positive than em-dash but higher false negative, especially since AI generated video, even when it has audio, may not have AI generated audio. (Generation conditioned on a text prompt, starting image, and audio track is among the common modes for AI video generation.)

Thank you for saving me the time writing this. Nothing screams midwit like "Em-dash = AI". If AI detection was this easy, we wouldn't have the issues we have today.

Of note is the other terrible heuristic I've seen thrown around, where "emojis = AI", and now the "if you use not X, but Y = AI".

With the right context both are pretty good actually.

I think the emoji one is most pronounced in bullet point lists. AI loves to add an emoji to bullet points. I guess they got it from lists in hip GitHub projects.

The other one is not as strong, but if the "not X but Y" is somewhat nonsensical or unnecessary, this is a very strong indicator it's AI.


>I guess they got it from lists in hip GitHub projects.

I see this way more often on GitHub now than I did before, though.


Similarly: "The indication for machine-generated text isn't symbolic. It's structural." I always liked this writing device, but I've seen people label it artificial.

Em-dashes are completely innocent. “Not X but Y” is some lame rhetorical device, I’m glad it is catching strays.

When I see emojis in code, especially log statements, it is a 100% giveaway that AI was involved. Worse, it is an indicator the developer was lazy and didn't even try to clean up the most basic slop.

No one uses em dashes

If nobody used em-dashes, they wouldn’t have featured heavily in the training set for LLMs. They are used somewhat rarely (some people use them a lot, others not at all) in informal digital prose, but that’s not the same as being entirely unused generally.

Microsoft Word automatically converts dashes to em dashes as soon as you hit space at the end of the next word after the dash.

That's the only way I know how to get an em dash. That's how I create them. I sometimes have to re-write something to force the "dash space <word> space" sequence in order for Word to create it, and then I copy and paste the em dash into the thing I'm working on.

Option shift - in macOS (option - gives you an en dash).

Windows 10/11’s clipboard stack lets you pin selections into the clipboard, so — and a variety of other characters live in mine. And on iOS you just hold down -, of course.

Alt-0151 on the numpad in Windows.

Long-press on the hyphen on most Android keyboards.

Or open whatever "Character Map" application usually comes with your desktop OS, and copy it from there.


You can Google search "em-dash" then copy/paste from the resulting page.

Ctrl+Shift+U + 2014 (em dash) or 2013 (en dash) in Linux. Former academic here, and I use the things all the time. You can find them all over my pre-LLM publications.
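
For anyone wondering where the numbers in these shortcuts come from, here is a minimal Python sketch. The 2014 and 2013 are the Unicode code points in hex. For Alt-0151, I'm assuming the code is looked up in the Windows-1252 code page, which is the usual case on Western-locale Windows but not guaranteed everywhere. The variable names are just mine for illustration.

```python
import unicodedata

# Code points behind the Linux shortcut (Ctrl+Shift+U, then the hex digits).
EM_DASH = "\u2014"
EN_DASH = "\u2013"

# Alt-0151 means decimal 151 (0x97).
# Assumption: the Alt code is resolved through Windows-1252.
alt_0151 = bytes([151]).decode("cp1252")

print(unicodedata.name(EM_DASH))  # EM DASH
print(unicodedata.name(EN_DASH))  # EN DASH
print(alt_0151 == EM_DASH)        # True
```

So the Linux and Windows shortcuts end up at the same character, they just name it through different tables.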

I do—all the time. Why not?

I also use en dashes when referring to number ranges, e.g., 1–9


I didn't know these fancy dashes existed until I read Knuth's first book on typesetting. So probably 1984. Since then I've used them whenever appropriate.

Except for Emily Dickinson, who is an outlier and should not be counted.

Seriously, she used dashes all the time. Here is a direct copy and paste of the first two stanzas of her poem "Because I could not stop for Death" from the first source I found, https://www.poetryfoundation.org/poems/47652/because-i-could...

  Because I could not stop for Death –
  He kindly stopped for me –
  The Carriage held but just Ourselves –
  And Immortality.

  We slowly drove – He knew no haste
  And I had put away
  My labor and my leisure too,
  For His Civility –
Her dashes have been rendered as en dashes in this particular case rather than em dashes, but unless you're a typography enthusiast you might not notice the difference (I certainly didn't and thought they were em dashes at first). I would bet if I hunted I would find some places where her poems have been transcribed with em dashes. (It's what I would have typed if I were transcribing them).

Here is an image of the original manuscript page:

https://www.edickinson.org/editions/1/image_sets/12174893

Dickinson's dashes tended to vary over time, and were not typeset during her lifetime (mostly). Also, mid-19th century usage was different—the em-dash was a relatively new thing.


When I type in English and need an em dash, I simply switch to the Chinese IME.

Except for highly literate people, and people who care about typography.

Think about it— the robots didn’t invent the em-dash. They’re copying it from somewhere.


My impression of people that say they’re em dash users is that they’re laundering their Dunning-Kruger through AI.

Tell me you never worked with LaTeX and a university style guide without telling me you never worked with LaTeX and a university style guide.

Approximately no one writes internet comments or even articles in LaTeX.

But many have built their writing habits around LaTeX typing, and a – or even an — are hardcoded into their text editors / operating systems, much like other correct diacritics and ligatures may be.


