Monday, October 29, 2007

Plagiarism and technology

The following is something I wrote for the RISKS Forum that I thought others might find interesting. A recent discussion on the USACM (Public Policy Committee of the Association for Computing Machinery) mailing list prompted these thoughts.

It's obvious that the availability of so much information online makes plagiarism easier - it's impossible for a reader to know everything that could have been used without permission or attribution. On the flip side, tools like Google make it easier to find suspected instances. For example, when I'm reviewing an article for a journal or conference, I frequently put phrases that I suspect are stolen into Google, and on numerous occasions I have found that they were in fact taken verbatim without attribution. [Hint to the plagiarist: if you're going to use someone else's words without attribution, make sure they fit your writing style. This is particularly noticeable when choosing text written by someone whose native language differs from your own - if your native language is English and you copy something written by a native Chinese speaker, the mismatch will be fairly obvious, and the converse is equally true.]

For high school and college students, technology like TurnItIn is one way of finding plagiarism without teachers having to do extensive searching. Although I haven't personally seen the output, my understanding is that the student submits text which is automatically analyzed, and potential instances of plagiarism are noted in a message to the teacher. (If someone could provide a better explanation, I'd certainly appreciate it! I noticed that TurnItIn now puts emphasis on improving students' writing style, perhaps as a way to give students a feeling that they're getting something out of the deal.)
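To make the idea of automatic analysis a little more concrete, here is a minimal sketch of one common way to measure text similarity: word n-gram overlap. This is an assumption for illustration only, not TurnItIn's actual (proprietary) method; the function names and the default of 5-word sequences are hypothetical choices. It reports what percentage of the submission's distinct word sequences also appear in a collection of source texts, which is the kind of number a teacher might see as an overall "score".

```python
from __future__ import annotations

import re


def word_ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the distinct n-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_score(submission: str, sources: list[str], n: int = 5) -> float:
    """Percentage of the submission's distinct n-grams that also occur in any source.

    Hypothetical sketch, not TurnItIn's algorithm: it measures overlap only
    and cannot tell a properly quoted and cited passage from a copied one.
    """
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0
    seen: set[tuple[str, ...]] = set()
    for src in sources:
        seen |= word_ngrams(src, n)
    return 100.0 * len(sub & seen) / len(sub)
```

Requiring runs of several consecutive words (the `n` parameter) keeps scattered common words from counting as a match. Note that a correctly quoted and cited passage raises this score exactly as much as an unattributed one, which is why such a number can only flag text for a human to read.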

There are several problems with products of this sort:

(1) False positives. When my daughter was in high school, she noted several times that TurnItIn considered her a plagiarist because it was unable to distinguish between properly quoted and referenced text and unauthorized copying. Teachers who simply look at the overall "score" without reading the individual comments will tend to penalize the students who do the best job of citing background work! (I'm reasonably sure that TurnItIn is cautious enough not to deny that false positives occur, and that it strongly encourages teachers and students to examine the results rather than accept them at face value.)

(2) Copyright infringement. TurnItIn keeps copies of student papers in its database, for matching against future papers. This seems reasonable at first blush - after all, selling term papers is an old tradition, dating back well before the Web (although today's students may not believe that)! However, by keeping submissions for matching, TurnItIn may be violating copyright, as a recent lawsuit claims (see "McLean Students Sue Anti-Cheating Service", Washington Post, March 29, 2007). Additionally, students have effectively no option to refuse to have their papers added to the database, and are not compensated for their submissions.

So to bring this to RISKS, the issue is that we have competing risks: the risk of plagiarism being combated by TurnItIn and similar products vs. the risk of unfair accusations of plagiarism and copyright infringement - all of which is enabled by technology.


Blogger Bob said...

I am a college teacher and a user of Turnitin. I've used it for several years for term papers, and occasionally for shorter papers. I am very familiar with what teachers see when they use this product or its competitors.

False positives should never be a problem. Turnitin and its various competitors do not detect plagiarism; they detect similarity of text in the student's paper to text found elsewhere: on the Web, in certain publications, and in previously submitted papers. The teacher must then read the paper, checking for proper citation and, where appropriate, proper quotation. A teacher who does not do this is both lazy and intellectually dishonest.

It is perhaps unfortunate that Turnitin produces a "similarity score" expressed as the percentage of text that is similar to text found elsewhere, because it can facilitate lazy and intellectually dishonest behavior by teachers. However, it does help teachers detect something that's bad, but not plagiarism: the cut-and-paste paper. In such a paper, everything is cited and quoted properly; it's just that none of it, with the possible exception of some glue sentences, was written by the student. The material went through the Windows clipboard and not through the student's mind; no learning took place. I tell my students that the cut-and-paste paper is not plagiarism, but neither is it evidence of learning, and the best grade such a paper can earn is a D-minus. (I also help them to write good papers by talking and writing about the process.)

The argument that requiring submittal to Turnitin violates copyright is a red herring. Does the student who solves a series of math problems assigned by the teacher hold copyright in the answers? Of course not! I assign short ethics cases and the students write answers. That's more complicated because there is both a right answer and the expression of it. I'd argue that the student who gets the right answer has exhibited evidence of learning, but has not done creative work. In the case of a term paper or creative writing assignment, the student has (we hope) done some creative work, but it is generally work that would never have been done but for the assignment. It is a work made for hire, and the payment is evaluation by the teacher and a grade.

Further, Turnitin never "publishes" the papers that are uploaded, and publication is of the essence of copyright infringement. Teacher and student get to see the analysis, but no one else does. The only way to see what's in such a paper is to later submit a paper that is, at least in part, substantially identical. Those parts that are identical are called out, but what is highlighted is material in the *newly submitted* paper, not material in the stored paper. Turnitin does provide contact information for the teacher whose student submitted the original paper, and that teacher may then release a copy if the school's policies and procedures allow it.

I have not yet had a student object to using Turnitin on intellectual property grounds. If ever I do, I will ask how much money the student expects to make from the sale of the paper and whether the student would want a third party to earn a good grade by submitting a copy of the student's paper as his own.

(I am aware of the court cases. A Pennsylvania court decided that caller ID was an illegal wiretap, too. This issue is not yet decided, at least in the United States.)

The real value of a service like Turnitin is not in detecting plagiarism. I can do that better than any computer system I've seen so far because I know my students' intellectual capacities and writing styles. I have, in fact, detected plagiarism not detected by Turnitin.

The real value is in plagiarism prevention. Students do not believe that I can detect writing that's not their own. They do, however, believe that "the computer" can detect similarity with text on the Web, and the student who is tempted, but knows the paper will be submitted to Turnitin, is more likely to make a good decision than a bad one. While I have not done a controlled study, I have observed fewer instances of plagiarism when Turnitin is used in a class than when it is not, and that is what's valuable.

5:24 PM  
Blogger Gregor Ronald said...

I am the Turnitin administrator for a university in New Zealand, and I thoroughly concur with Bob's comments, especially about the need for discretion and careful interpretation of the results - you cannot just do a blanket "over 25% is bad" assessment. Turnitin is an indicator that the instructor should take a closer look at the student's work; it is not infallible by any means.

I can add a couple of minor details. First, there is an option for instructors to ignore quoted text and bibliographies, so instances of legitimate copying (e.g. quoting an author in English, quoting legislation in a Law course) are ignored. Secondly, we very quickly realised that when student papers are similar to ones that came from our campus, our lecturers may receive an email from Turnitin asking whether they are prepared to release the text of the original. We advise staff to ignore or decline these emails, in case of copyright/privacy concerns.
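The quote-and-bibliography exclusion described above can be approximated by filtering a paper before any matching is done. The sketch below is a hypothetical illustration, not Turnitin's implementation: it assumes quotations are marked with straight or curly double quotes and that the bibliography starts at a line reading "References", "Bibliography", or "Works Cited". The function and pattern names are made up for this example.

```python
import re

# Assumed headings that begin a bibliography section (hypothetical heuristic).
_BIBLIOGRAPHY_HEADING = re.compile(
    r"^\s*(?:references|bibliography|works cited)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

# A passage enclosed in straight or curly double quotes.
_QUOTED = re.compile(r'["“][^"”]*["”]')


def strip_quotes_and_bibliography(text: str) -> str:
    """Remove quoted passages and the bibliography before similarity checking."""
    # Keep only the text before the first bibliography heading, if there is one.
    body = _BIBLIOGRAPHY_HEADING.split(text, maxsplit=1)[0]
    # Replace each quoted passage with a space so the surrounding words stay separate.
    return _QUOTED.sub(" ", body)
```

For example, `strip_quotes_and_bibliography('As Smith wrote, "the quick brown fox" is a classic.\nReferences\nSmith, J. (2001).')` drops both the quotation and the reference entry. Heuristics like this are easily fooled (a cited paraphrase without quote marks is still counted, and a mid-paper heading named "References" would cut the text short), which is one reason the results still need careful reading.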

If you want to see screenshots of Turnitin reports etc, visit

2:39 PM  
Blogger wells said...

At my daughter's high school in Colorado several years back, a teacher checked the essay of a top student for plagiarism using Turnitin or a similar site. The essay came back as identical to another essay. After investigation, it turned out that the top student's essay was perfectly legitimate; however, the student (he must have thought it was pretty good) had posted his essay on a site selling essays, causing the "plagiarism" hit.

Ian Wells,
Christchurch, New Zealand

7:26 PM  
Blogger Kula said...

I have a comment and a question. First, I am using Turnitin as a student in college. Last week I researched a topic, quoted and cited two sources, and paraphrased a couple of others, all with proper (or as proper as I knew how) citation. The system said I had a similarity (including my bibliography and quoted material) of 33%. Most of these similarities were to papers by students a few years ago at other universities I had never heard of. In some cases, three words out of a sentence were picked out as similar even though they were not adjacent, for example: "there...was...a...". One of the top matches was the quoted material I used, matched against the site I got it from! It would have been plagiarism if I had used it without citing it, but in this case I did cite it. I think the software should be configurable to recognize properly cited material (and possibly to help students with citation if they make a mistake) and not count it in the total score. Also, my bibliography was flagged as similar to those of other students at the same school (we apparently used the same sources without knowing it) and at other universities. I decided to revise my paper voluntarily afterwards because of the campus's policy of "25% and below". However, I feel that my work was invalidated just because I used some of the same phrases that other students used.

My question is: if I later decide to publish my own paper on a personal web site (which I am in the process of copyrighting), will I have legal ownership of my work? I have already published some of my papers from high school and earlier in college online. Thanks for your time in considering this.

7:00 PM  

