Detecting Text Similarity: A Machine Learning Approach to Plagiarism Checking

19 Feb

Authors: Femenca Noronha, Kaif Khan

Abstract: This research introduces Plagiarism Checker, a plagiarism detection system that uses machine learning and NLP algorithms to detect textual similarity efficiently. The system uses TF-IDF for feature extraction and cosine similarity for scoring, which lets it identify plagiarism within documents with high accuracy. Alongside these, Jaccard similarity and n-gram analysis highlight common word patterns across documents and help identify paraphrased text. The web interface, built with Flask, supports seamless document upload and analysis. Text pre-processing techniques such as tokenization, stopword removal, and lemmatization are important for improving accuracy. The research also addresses errors that arise when processing AI-generated text in which obfuscation techniques are used. Future updates will incorporate deep learning models such as LSTMs and transformers to enhance the detection capabilities of Plagiarism Checker. The research contributes to the advancement of automated plagiarism detection, with the aim of ensuring originality and academic integrity.
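The scoring steps named in the abstract (pre-processing, TF-IDF with cosine similarity, and Jaccard similarity over n-grams) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stopword list, the smoothed IDF formula, the function names, and the sample sentences are all assumptions made for illustration, and lemmatization is left out because it would require an external NLP library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "over"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Term frequency times a smoothed inverse document frequency (assumed form).
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ngrams(tokens, n=2):
    """Set of contiguous n-token tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a, b, n=2):
    """Size of the shared n-gram set divided by the size of the combined set."""
    grams_a, grams_b = ngrams(tokenize(a), n), ngrams(tokenize(b), n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


# Hypothetical sample documents: two near-duplicates and one unrelated sentence.
docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick brown fox leaps over a lazy dog.",
    "Machine learning models need training data.",
]
vecs = tfidf_vectors(docs)
sim_close = cosine_similarity(vecs[0], vecs[1])  # near-duplicate pair
sim_far = cosine_similarity(vecs[0], vecs[2])    # unrelated pair
jac_close = jaccard_similarity(docs[0], docs[1])
```

In this sketch the near-duplicate pair scores well above the unrelated pair on cosine similarity, while the bigram Jaccard score drops more sharply when a single word is swapped. That difference is one reason a system might report both measures side by side.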

DOI: 10.61463/ijset.vol.13.issue1.156