> [can’t] be reliably detected… only ~90% effective
I’m surprised to see these comments in conjunction; 90% is pretty good, and much higher than I expected. I wonder what the breakdown of false positives/false negatives is.
Edit: from the linked paper
> Of the 90 samples in which AI was used, it correctly identified 77 of them as having >1% AI generated text, an 86% success rate. The fact that the tool is more accurate in identifying human-generated text than AI-generated text is by design. The company realized that users would be unwilling to use a tool that produced significant numbers of false positives, so they “tuned” the tool to give human writers the benefit of the doubt.
This all seems exceptionally reasonable. Of the samples with AI, they correctly identify 86%. Of the samples without AI, they correctly identify a higher proportion, because of the nature of their service. This implies that if they _wanted_ to make a more balanced AI detection tool, they could get that 86% somewhat higher.
Anyone expelling a student over a single “AI” label from Turnitin alone is a complete idiot. Perhaps that happens occasionally, but that’s clearly the result of horrible decision making that isn’t really Turnitin’s fault.
Anyone who gives 10 seconds of thought to how this could help realizes that at 90% it’s a helpful first pass. Motivated students who really want to hide can probably squeak past more often than you’d like. And you know there will be false positives, so you do something like:
* review those more carefully, or send them to a TA to review if you have one
* keep track of patterns of positives from each student over time
* explain to the student it got flagged, say it’s likely a false positive, and have them talk over the paper in person
I’m sure decent educators can figure out how to use a tool like that. The bad ones are going to cause stochastic headaches for their students regardless.
That's not what 90% effective means. Tests don't work that way.
Tests can be wrong in two different ways: false positives and false negatives.
The 90% figure (which people keep rounding up from 86% for some reason, so I'll use that number from now on) is the sensitivity, or the ability to avoid false negatives. If there are 100 cheaters, the test will catch 86 of them, and 14 will get away with it.
The test's false positive rate, how often it says "AI" when there isn't any AI, is close to 0%, or equivalently, the test's "specificity" is close to 100%:
> Turnitin correctly identified 28 of 30 samples in this category, or 93%. One sample was rated incorrectly as 11% AI-generated[8], and another sample was not able to be rated.
The worst that would have happened, according to this test, is that one student out of 30 would be suspected of AI-generating a single sentence of their paper. None of the human-authored essays were flagged as likely AI-generated.
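The arithmetic above can be sketched from the sample counts quoted in the paper (90 AI-assisted samples with 77 flagged, 30 human samples with 28 rated correctly). Note the assumptions: the 10% cheating base rate is purely an illustrative figure, not from the paper, and lumping the one mis-rated and one unratable human sample together as "not cleared" is a simplification:

```python
# Confusion-matrix arithmetic from the sample counts quoted above.
tp, fn = 77, 90 - 77   # AI-assisted samples: flagged vs. missed
tn, fp = 28, 30 - 28   # human samples: cleared vs. not cleared (simplification)

sensitivity = tp / (tp + fn)   # chance an AI-assisted paper gets caught
specificity = tn / (tn + fp)   # chance a human-written paper gets cleared

print(f"sensitivity = {sensitivity:.0%}")   # ~86%
print(f"specificity = {specificity:.0%}")   # ~93%

# Positive predictive value at an ASSUMED 10% cheating base rate:
# of the papers the tool flags, what fraction actually used AI?
base = 0.10
ppv = (sensitivity * base) / (sensitivity * base + (1 - specificity) * (1 - base))
print(f"P(actually AI | flagged) = {ppv:.0%}")
```

Even with decent sensitivity and specificity, at a low base rate a meaningful fraction of flags can land on honest students, which is exactly why treating a flag as a first pass rather than a verdict matters.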
My son, who just finished his first semester at college, said the thing that surprised him the most was the blatant cheating all around him. He said it is rampant and obvious, and the professors don't seem all that eager to punish it. It pisses him off, because it puts him at a disadvantage because he doesn't want to cheat.
It's from a culture of people who cheat to get ahead, because they come from a society SO competitive, SO cutthroat, and SO obsessed with education & testing that cheating is encouraged and rewarded... because it's rewarded in the workplace, in the broader economy (up to a point), and in the political body.
Of course, there's also the Chinese, who cheat because they are international students paying several multiples of the tuition and the university doesn't want to upset that gravy train in the wake of several federal funding cuts. Also because your rank-and-file Chinese students at most American colleges suck at speaking English so they, except in pure STEM, need to cheat in order to pass in the first place.
The problem is that when professors are assessed on how their students do, instead of on how honestly they assess their performance, there's a strong disincentive to root out cheating. Universities have generally been marking their own homework on this front for a long time, and their morphing into businesses that sell degrees has turned this conflict of interest into a real problem.
>90% is pretty good, and much higher than i expected.
The problem with that at scale is that those who skirt by within that 10% might one day be your doctor, your lawyer, or your accountant, and you'd never know until it bit you in the ass.