How prone are plagiarism-detection tools to false-positives?

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

urza4315
Posts: 40
Joined: Thu Aug 25, 2011 9:49 am UTC

How prone are plagiarism-detection tools to false-positives?

Postby urza4315 » Tue Aug 06, 2013 4:16 am UTC

First let me say I'm NOT trying to commit plagiarism...

But today my prof was talking about how everyone's assignments are assessed with plagiarism-detection tools. Which got me thinking: with regard to programming, how prone would such tools be to false positives? If you were asked to write a relatively simple program, then surely there are only so many reasonable ways you could go about doing it (supposedly these tools are smart enough to notice when someone has just changed a few variable names here and there). How many different ways could you write a program to do the same thing before the solutions start to look like slightly modified versions of each other? Or do these tools employ other methods to detect copied work?

letterX
Posts: 535
Joined: Fri Feb 22, 2008 4:00 am UTC
Location: Ithaca, NY

Re: How prone are plagiarism-detection tools to false-positi

Postby letterX » Tue Aug 06, 2013 8:11 am UTC

Having been on the using end of these tools, my impression is that they generally allow a fairly high false-positive rate in exchange for a low false-negative rate, so that the tool can report all plausible cases of plagiarism to the instructor. Then it's up to the human to decide whether the (much smaller pool of) possible matches are actually cases of plagiarism.

If there isn't a human in the loop, then something is very wrong with the process.
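
The tuning described above can be sketched concretely. Below is a toy flagger (my own illustration, not how any real tool works) using Python's stdlib difflib as a stand-in similarity measure: a strict threshold quietly misses a renamed copy, while a loose one reports extra pairs and leaves the judgement to the instructor.

```python
from difflib import SequenceMatcher
from itertools import combinations

def flag_pairs(submissions, threshold):
    """Report every pair of submissions whose textual similarity
    meets the threshold; a human reviews whatever comes back."""
    return [(a, b)
            for (a, s1), (b, s2) in combinations(submissions.items(), 2)
            if SequenceMatcher(None, s1, s2).ratio() >= threshold]

subs = {
    "alice": "def add(a, b):\n    return a + b\n",
    "bob":   "def add(x, y):\n    return x + y\n",   # alice's code, renamed
    "carol": "def mul(a, b):\n    return a * b\n",   # genuinely different
}

strict = flag_pairs(subs, 0.95)  # misses even the renamed copy
loose  = flag_pairs(subs, 0.5)   # flags every pair, including carol's
```

Turning the threshold down trades precision for recall, which is only sensible when a human handles the resulting queue.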

Also, since when is there an 'i' in plag-i-arism? Things you learn from spell-check.

Xenomortis
Not actually a special flower.
Posts: 1456
Joined: Thu Oct 11, 2012 8:47 am UTC

Re: How prone are plagiarism-detection tools to false-positi

Postby Xenomortis » Tue Aug 06, 2013 8:16 am UTC

I remember at university, for one of our courses, all work was submitted to some online tool that did the plagiarism check. It would fail the upload if it detected "too much".
I can't remember what the consequences were, but no human was involved.

Although this was for a writing course, not a programming one.

headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: How prone are plagiarism-detection tools to false-positi

Postby headprogrammingczar » Tue Aug 06, 2013 11:06 am UTC

An easy way to trigger a false positive is to pepper the document with inline citations.
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you

Sheikh al-Majaneen
Name Checks Out On Time, Tips Chambermaid
Posts: 1075
Joined: Fri Jan 01, 2010 5:17 am UTC

Re: How prone are plagiarism-detection tools to false-positi

Postby Sheikh al-Majaneen » Wed Aug 14, 2013 2:28 pm UTC

A professor of mine last semester used a plagiarism detector, but didn't show interest until the plagiarism rating reached 33%. After telling us this, he followed with an anecdote in which he wrote a paper from scratch (with no sources), uploaded it, and received a 25% plagiarism rating. My submissions generally ranged from 10-15%. There was no coding in this class, though.

Poohblah
Posts: 53
Joined: Thu Feb 26, 2009 3:54 am UTC

Re: How prone are plagiarism-detection tools to false-positi

Postby Poohblah » Sat Sep 21, 2013 3:43 am UTC

As somebody else mentioned, a human is a necessary component in a plagiarism detection mechanism. This is because "plagiarism" is not well-defined. The boundary between plagiarism and more benign forms of copying is not really clear.

For instance, what if you really like a title that somebody else used for their novel, so you copy the title but write a totally different and unrelated book? Or what if you're a programmer under a time crunch, and you copy somebody else's code to bootstrap your project and get it off the ground faster? In these cases the substantial, important part of each project may not be the thing that was copied. A computer could easily detect that two titles or blocks of code are identical, but without a human to make a subjective judgement call, neither case can easily be ruled as benign or malicious copying.

On the other hand, copying somebody else's thesis and research would probably be considered plagiarism, but it would be really difficult for a computer to detect that one of the two resulting papers is an instance of plagiarism if the wording and structure of the papers are not identical. A human, however, could detect this fairly easily - but it would also be up to the human to determine whether one thesis was a direct copy of the other, or whether the two are simply similar arguments generated by similar independent research. To give another example, what if you copy your friend's idea to write a science fiction story about a girl who rescues a boy about to commit suicide by exiting a ship's air lock without wearing a space suit, but you change the names of your characters and the setting of the story? It would take a human's judgement to determine whether the stories are similar enough to be considered copies.

Simply put, to say a plagiarism-detection tool "generates false positives" is somewhat misleading, because the job of the tool is not to determine if something is plagiarized - that's the job of a human - but rather to easily and quickly find plagiarism candidates.
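
The candidate-finding job described above can be sketched with word shingles and Jaccard overlap. This is a toy stand-in for what real tools do (MOSS, for instance, fingerprints documents with a winnowing scheme; commercial tools are more elaborate still), and the function names and ranking scheme here are my own illustration.

```python
def shingles(text, n=4):
    """The set of overlapping word n-grams ("shingles") in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Overlap of two shingle sets: 0.0 means disjoint, 1.0 identical."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def rank_candidates(docs, n=4):
    """Score every pair of documents and return them most-suspicious-first.
    The tool only orders the review queue; deciding what actually
    counts as plagiarism is left to a human reader."""
    names = list(docs)
    sets = {k: shingles(v, n) for k, v in docs.items()}
    return sorted(((jaccard(sets[x], sets[y]), x, y)
                   for i, x in enumerate(names)
                   for y in names[i + 1:]),
                  reverse=True)
```

A pair of near-copies floats to the top of the queue; unrelated documents score near zero and need never be looked at.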

benkapparate
Posts: 7
Joined: Fri Oct 11, 2013 5:05 am UTC

Re: How prone are plagiarism-detection tools to false-positi

Postby benkapparate » Fri Oct 11, 2013 5:08 am UTC

I think a simple way to think about this is -- determining a "diff" between two pieces of code is a task whose complexity depends on the syntax of the languages. For example, it's trivial to change variable names in most languages while retaining the behavior of the program. Others allow trivial variations on the AST. So to create a "good" plagiarism detector would be just as hard as a good multi-language compiler + edit distance calculator.
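
The variable-renaming point is easy to make concrete. Here is a minimal sketch for Python source using the stdlib tokenize module (my own illustration; real detectors normalize much more aggressively, e.g. at the AST level): identifiers are replaced by canonical names in order of first appearance and formatting-only tokens are dropped, so a renamed copy compares equal while genuinely different structure does not.

```python
import io
import keyword
import tokenize

def canonical_tokens(src):
    """Token stream with identifiers renamed id0, id1, ... in order of
    first appearance, comments/whitespace dropped, and block structure
    (INDENT/DEDENT/NEWLINE) kept as markers."""
    rename = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append(rename.setdefault(tok.string, f"id{len(rename)}"))
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            continue  # pure formatting: carries no program structure
        elif tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            out.append(tokenize.tok_name[tok.type])  # keep block shape
        else:
            out.append(tok.string)  # keywords, operators, literals as-is
    return out

original = "total = 0\nfor x in nums:\n    total = total + x\n"
renamed  = "s = 0\nfor item in values:\n    s = s + item\n"
# The renamed copy canonicalizes to the identical token stream,
# while structurally different code does not.
```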

skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: How prone are plagiarism-detection tools to false-positi

Postby skeptical scientist » Fri Oct 11, 2013 3:07 pm UTC

benkapparate wrote:I think a simple way to think about this is -- determining a "diff" between two pieces of code is a task whose complexity depends on the syntax of the languages. For example, it's trivial to change variable names in most languages while retaining the behavior of the program. Others allow trivial variations on the AST. So to create a "good" plagiarism detector would be just as hard as a good multi-language compiler + edit distance calculator.

That's often not really what you want to do, though, because for a lot of simple tasks there is essentially a "correct" solution, so the only differences between solutions will be things that the compiler ignores or that result in identical program behavior, like white space and variable names. However, white space and variable names are still a strong signal for detecting plagiarism.

It sounds like what you have in mind is someone duplicating a larger and more complicated bit of code from the net and then changing the formatting and variable names to hide their plagiarism, so the only thing copied is the programmatic structure. This is definitely much harder to detect.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

