Can the use of an online plagiarism-detection system reduce plagiarism in student research papers?

The paper "Plagiarism and Technology: A Tool for Coping with Plagiarism" describes a study in which randomly selected research papers submitted by students during five semesters were analyzed for plagiarism. For each paper, the percentage of plagiarized words in the paper was determined by an online analysis. In each of the five semesters, students were told during the first two class meetings that they would have to submit an electronic version of their research papers and that the papers would be reviewed for plagiarism.

Suppose that the number of papers sampled in each of the five semesters and the means and standard deviations for percentage of plagiarized words are as given in the accompanying table.

\begin{tabular}{|c|c|c|c|}
\hline Semester & [tex]$n$[/tex] & Mean & Standard deviation \\
\hline 1 & 36 & 6.33 & 3.75 \\
\hline 2 & 41 & 3.33 & 3.09 \\
\hline 3 & 31 & 1.77 & 3.25 \\
\hline 4 & 31 & 1.81 & 3.12 \\
\hline 5 & 33 & 1.50 & 2.38 \\
\hline
\end{tabular}

For purposes of this exercise, assume that the conditions necessary for the ANOVA F test are reasonable. Do these data provide evidence to support the claim that the mean percentage of plagiarized words is not the same for all five semesters? Test the appropriate hypotheses using [tex]$\alpha=0.05$[/tex].

1. Calculate the test statistic. (Round your answer to two decimal places.)
[tex]\[ F = \square \][/tex]

2. Use technology to find the [tex]$P$[/tex]-value. (Round your answer to four decimal places.)
[tex]\[ P\text{-value} = \square \][/tex]

3. What can you conclude?



Answer:

Sure, let's go through the steps to test if the mean percentage of plagiarized words is different across the five semesters using ANOVA (Analysis of Variance).

### Step 1: State the Hypotheses

- Null Hypothesis (H₀): μ₁ = μ₂ = μ₃ = μ₄ = μ₅, where μᵢ is the population mean percentage of plagiarized words in semester i; that is, the mean is the same for all five semesters.
- Alternative Hypothesis (H₁): At least two of the μᵢ differ; that is, at least one semester has a mean percentage of plagiarized words different from the others.

### Step 2: Gather the Data

Given data:
- Sample sizes ([tex]\(n\)[/tex]): [36, 41, 31, 31, 33]
- Sample means: [6.33, 3.33, 1.77, 1.81, 1.50]
- Sample standard deviations: [3.75, 3.09, 3.25, 3.12, 2.38]

### Step 3: Calculate the Overall Mean

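Because the semesters have different sample sizes, the overall (grand) mean must weight each semester's mean by its sample size; a simple average of the five sample means is not the grand mean here. With [tex]\( N = 36 + 41 + 31 + 31 + 33 = 172 \)[/tex]:

[tex]\[ \bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N} = \frac{36(6.33) + 41(3.33) + 31(1.77) + 31(1.81) + 33(1.50)}{172} = \frac{524.89}{172} \approx 3.0517 \][/tex]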
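A minimal Python sketch of this step, using only the summary statistics from the table (the variable names are our own). It also prints the unweighted average of the five means, which differs from the grand mean because the group sizes are unequal:

```python
# Summary statistics from the table (semesters 1 to 5)
n = [36, 41, 31, 31, 33]
means = [6.33, 3.33, 1.77, 1.81, 1.50]

N = sum(n)  # total number of sampled papers
# Weighted grand mean: each semester's mean counts in proportion to its sample size
grand_mean = sum(ni * xi for ni, xi in zip(n, means)) / N
# Unweighted average of the five means, shown only for comparison
unweighted = sum(means) / len(means)

print(N, round(grand_mean, 4), round(unweighted, 3))  # 172 3.0517 2.948
```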

### Step 4: Calculate SSB (Sum of Squares Between Groups)

The formula for SSB is:

[tex]\[ SSB = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2 \][/tex]

Where [tex]\( n_i \)[/tex] is the sample size of the [tex]\(i\)[/tex]-th group, [tex]\( \bar{X}_i \)[/tex] is the mean of the [tex]\(i\)[/tex]-th group, and [tex]\( \bar{X} \)[/tex] is the overall mean.

[tex]\[ SSB = 36(6.33 - 3.0517)^2 + 41(3.33 - 3.0517)^2 + 31(1.77 - 3.0517)^2 + 31(1.81 - 3.0517)^2 + 33(1.50 - 3.0517)^2 \approx 568.255 \][/tex]
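The same sum can be checked with a short Python sketch (stdlib only; names are illustrative). For comparison it also evaluates the formula at the unweighted average 2.948: the result is larger, because this sum of squared deviations is smallest at the size-weighted grand mean.

```python
n = [36, 41, 31, 31, 33]
means = [6.33, 3.33, 1.77, 1.81, 1.50]
N = sum(n)
grand_mean = sum(ni * xi for ni, xi in zip(n, means)) / N

# Between-groups sum of squares: size-weighted squared distance of each
# semester mean from the grand mean
ssb = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(n, means))

# Same formula evaluated at the unweighted average of the means (for comparison only)
ssb_unweighted = sum(ni * (xi - 2.948) ** 2 for ni, xi in zip(n, means))

print(round(ssb, 3), round(ssb_unweighted, 2))  # 568.255 570.1
```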

### Step 5: Calculate SSW (Sum of Squares Within Groups)

The formula for SSW is:

[tex]\[ SSW = \sum_{i=1}^{k} (n_i - 1) s_i^2 \][/tex]

Where [tex]\(s_i\)[/tex] is the standard deviation of the [tex]\(i\)[/tex]-th group.

[tex]\[ SSW = (36-1)3.75^2 + (41-1)3.09^2 + (31-1)3.25^2 + (31-1)3.12^2 + (33-1)2.38^2 = 1664.28 \][/tex]
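Since only the standard deviations are given, SSW is rebuilt from the identity [tex]\( s_i^2 = SS_i / (n_i - 1) \)[/tex]. A minimal Python sketch of that arithmetic (variable names are our own):

```python
n = [36, 41, 31, 31, 33]
sds = [3.75, 3.09, 3.25, 3.12, 2.38]

# Within-groups sum of squares from summary statistics:
# (n_i - 1) * s_i^2 recovers each group's sum of squared deviations
ssw = sum((ni - 1) * si ** 2 for ni, si in zip(n, sds))

print(round(ssw, 2))  # 1664.28
```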

### Step 6: Calculate Degrees of Freedom

- Between Groups ([tex]\( df_{B} \)[/tex]): [tex]\(k - 1 = 5 - 1 = 4\)[/tex]
- Within Groups ([tex]\( df_{W} \)[/tex]): [tex]\( \sum (n_i - 1) = N - k \)[/tex]

[tex]\[ df_{W} = (36-1) + (41-1) + (31-1) + (31-1) + (33-1) = 35 + 40 + 30 + 30 + 32 = 167 \][/tex]
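A quick Python check of both degrees of freedom, confirming that [tex]\( \sum (n_i - 1) \)[/tex] equals [tex]\( N - k \)[/tex] for these sample sizes (a sketch; names are illustrative):

```python
n = [36, 41, 31, 31, 33]
k = len(n)   # number of semesters (groups)
N = sum(n)   # total number of papers

df_between = k - 1
df_within = sum(ni - 1 for ni in n)
assert df_within == N - k  # the two ways of counting agree

print(df_between, df_within)  # 4 167
```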

### Step 7: Calculate Mean Squares

- Mean Square Between Groups ([tex]\( MSB \)[/tex]): [tex]\( \frac{SSB}{df_{B}} \)[/tex]

[tex]\[ MSB = \frac{568.255}{4} \approx 142.064 \][/tex]

- Mean Square Within Groups ([tex]\( MSW \)[/tex]): [tex]\( \frac{SSW}{df_{W}} \)[/tex]

[tex]\[ MSW = \frac{1664.28}{167} \approx 9.9657 \][/tex]

### Step 8: Calculate the F-statistic

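[tex]\[ F = \frac{MSB}{MSW} \approx \frac{142.064}{9.9657} \approx 14.26 \][/tex]

Rounded to two decimal places, [tex]\( F \approx 14.26 \)[/tex]. Carry several decimals through the intermediate steps: rounding MSB and MSW to two decimals first (142.06 / 9.97) gives 14.25 instead.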
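Steps 3 to 8 can be collected into one function that works from summary statistics alone. This is a minimal sketch; `anova_f_from_summary` is a hypothetical helper written for this answer, not a library function, and it assumes the same ANOVA conditions stated in the exercise:

```python
def anova_f_from_summary(n, means, sds):
    """One-way ANOVA F statistic from group sizes, means, and standard deviations.

    Hypothetical helper for illustration; returns (F, df_between, df_within).
    """
    k, N = len(n), sum(n)
    # Size-weighted grand mean
    grand = sum(ni * xi for ni, xi in zip(n, means)) / N
    # Between- and within-groups sums of squares
    ssb = sum(ni * (xi - grand) ** 2 for ni, xi in zip(n, means))
    ssw = sum((ni - 1) * si ** 2 for ni, si in zip(n, sds))
    df_b, df_w = k - 1, N - k
    # Mean squares and their ratio; no intermediate rounding
    msb, msw = ssb / df_b, ssw / df_w
    return msb / msw, df_b, df_w


F, df_b, df_w = anova_f_from_summary(
    [36, 41, 31, 31, 33],
    [6.33, 3.33, 1.77, 1.81, 1.50],
    [3.75, 3.09, 3.25, 3.12, 2.38],
)
print(round(F, 2), df_b, df_w)  # 14.26 4 167
```

Keeping every intermediate value at full floating-point precision is what makes the two-decimal answer come out as 14.26 rather than 14.25.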

### Step 9: Determine the p-value

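The P-value is the area to the right of the observed F under an F distribution with [tex]\( df_{B} = 4 \)[/tex] and [tex]\( df_{W} = 167 \)[/tex] degrees of freedom. Statistical software gives it directly; an F table can only bracket it.

[tex]\[ P\text{-value} = P(F_{4,\,167} > 14.26) \approx 0.0000 \][/tex]

The unrounded value is on the order of [tex]\( 10^{-10} \)[/tex], so it rounds to 0.0000 at four decimal places.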
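In practice this tail area would come from a routine such as `scipy.stats.f.sf(F, 4, 167)`. As a stdlib-only alternative, a sketch under one assumption: when the numerator degrees of freedom are even (here [tex]\( df_{B} = 4 \)[/tex]), the upper tail has an exact finite-sum form. It uses the standard fact that [tex]\( Y = d_1 F / (d_1 F + d_2) \)[/tex] follows a Beta([tex]\( d_1/2, d_2/2 \)[/tex]) distribution. The function name `f_upper_tail_even_df1` is hypothetical.

```python
def f_upper_tail_even_df1(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2); exact, but only valid when df1 is even.

    Uses Y = df1*F / (df1*F + df2) ~ Beta(a, b) with a = df1/2, b = df2/2.
    For integer a, the Beta upper tail is a finite sum:
        P(Y > y) = (1 - y)**b * sum_{j=0}^{a-1} [b(b+1)...(b+j-1) / j!] * y**j
    """
    if df1 % 2 != 0:
        raise ValueError("this closed form needs an even numerator df")
    a, b = df1 // 2, df2 / 2
    y = df1 * f / (df1 * f + df2)
    term, total = 1.0, 1.0  # j = 0 term of the sum
    for j in range(1, a):
        # Build term j from term j-1: multiply by (b + j - 1) / j * y
        term *= (b + j - 1) / j * y
        total += term
    return (1 - y) ** b * total


F_obs = 14.2552  # unrounded F statistic from Step 8
p = f_upper_tail_even_df1(F_obs, 4, 167)
print(f"{p:.4f}")  # 0.0000
print(f"{p:.2e}")
```

For [tex]\( df_{B} = 4 \)[/tex] the sum has just two terms, [tex]\( 1 + b\,y \)[/tex]; for an odd numerator df this shortcut does not apply and a general incomplete-beta routine is needed.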

### Step 10: Conclusion

At the [tex]\( \alpha = 0.05 \)[/tex] level of significance, we compare the p-value to [tex]\( \alpha \)[/tex]:
- Since [tex]\( p \approx 0.0000 < 0.05 \)[/tex], we reject the null hypothesis.

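Conclusion: There is convincing evidence that the mean percentage of plagiarized words is not the same for all five semesters. The F test does not identify which semesters differ, but the sample means fall from 6.33% in semester 1 and 3.33% in semester 2 to between 1.50% and 1.81% in semesters 3 to 5. This pattern is consistent with the online plagiarism-detection system helping to reduce plagiarism over time, although a comparison of semesters like this one is not a randomized experiment, so it cannot by itself establish that the system caused the decrease.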