DOCUMENT PLAGIARISM DETECTION USING DAMERAU LEVENSHTEIN ALGORITHM AND QUERY EXPANSION

Christian Sri Kusuma Aditya

Abstract


Plagiarism is treated as a criminal act under Law of the Republic of Indonesia Number 19 of 2002 on Copyright, and should therefore be avoided. Computerized detection is needed so that plagiarism of other people's work can be identified quickly and thereby reduced. One way plagiarists evade existing detection applications is by manipulating documents, replacing many words with synonyms. This research proposes a design for a plagiarism detection application that takes the synonyms of words in documents into account, so that it can recognize documents whose wording differs but whose meaning is the same. In the experiments conducted, the Damerau-Levenshtein and Levenshtein algorithms produced relatively similar similarity values across the test cases overall. For documents containing typographical errors, however, Damerau-Levenshtein distance recognized plagiarized documents better, as indicated by a higher similarity value of 77.32%. After integration with query expansion, which detects synonymous words, the results improved significantly, so the application can detect document plagiarism more effectively.
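The difference between the two distance measures, and the effect of handling synonyms, can be illustrated with a minimal Python sketch. It is not the application described in the abstract. Several parts are assumptions made for illustration: the restricted ("optimal string alignment") variant of Damerau-Levenshtein distance, the similarity formula (one minus the distance divided by the length of the longer input), the toy English synonym table SYNONYMS, and the helper names edit_distance, similarity and normalize. Synonym handling is also simplified here to mapping each word onto one canonical form before comparison, rather than a full query expansion step.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between two sequences (strings or lists of tokens).

    transpositions=False gives plain Levenshtein distance.
    transpositions=True also counts a swap of two adjacent items as a
    single edit (restricted Damerau-Levenshtein, an assumed variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items of a
    for j in range(cols):
        d[0][j] = j  # insert all j items of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                # adjacent transposition counted as one edit
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b, transpositions=True):
    """Assumed normalization: 1 - distance / length of the longer input."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b, transpositions) / longest


# Hypothetical toy synonym table: each word maps to one canonical form.
SYNONYMS = {"quick": "fast", "rapid": "fast"}


def normalize(text, synonyms=SYNONYMS):
    """Lowercase, split on whitespace, and replace known synonyms."""
    return [synonyms.get(word, word) for word in text.lower().split()]


if __name__ == "__main__":
    # A typo that swaps two adjacent letters.
    print(similarity("form", "from", transpositions=False))  # 0.5
    print(similarity("form", "from", transpositions=True))   # 0.75
    # Synonym replacement, compared token by token.
    print(similarity("a rapid test".split(), "a quick test".split()))
    print(similarity(normalize("a rapid test"), normalize("a quick test")))
```

In this sketch the swapped letters in "form" and "from" cost two edits under Levenshtein distance but only one under Damerau-Levenshtein distance, which is why the latter scores typographical errors as more similar. Likewise, two sentences that differ only by a synonym compare as identical once both are normalized.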

Keywords


Damerau Levenshtein Distance, plagiarism document, string matching



DOI: https://doi.org/10.22219/sentra.v0i4.2310



Secretariat

Faculty of Engineering

Universitas Muhammadiyah Malang Kampus III

Jl. Raya Tlogomas 246 Malang, 65144