Rethinking Literary Plagiarism in LLMs through the Lens of Copyright Laws

Huachen Tan; Moming Duan; Duo Liu; Haojie Lu; Yuexin Mu; Longyi Zhou; Ao Ren; Yujuan Tan; Kan Zhong

Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:1000-1015, 2025.

Abstract

The swift advancement of Generative Artificial Intelligence (AI) has outstripped the development of corresponding laws and regulations, making copyright infringement of books a significant public concern and sparking numerous legal disputes. Although the fair use doctrine provides an exemption for using copyrighted materials in training datasets without the copyright holder’s permission, content generated by such AI systems may still violate copyright laws. Previous research on copyright infringement has primarily focused on character-level analysis, which is narrower in scope than the comprehensive requirements of copyright law. To address this challenge, we developed an LLM-based similarity measurement mechanism. We guided the generative AI to produce relevant book content by employing carefully crafted prompts. Subsequently, we created datasets by comparing this generated content with the original texts from famous books. We conducted a range of experiments, including several similarity detection techniques and plot plagiarism detection. The experimental results show that the AI-generated content (AIGC) is 78.72% similar to the original text, confirming that generative AI has the potential to infringe upon copyrights. Moreover, our study examines copyright infringement issues related to the content generated by generative AI and to other domains such as code, images, and licensing. Our research will provide valuable insights for refining laws and regulations about generative AI.

Cite this Paper

BibTeX

@InProceedings{pmlr-v260-tan25c,
  title = 	 {Rethinking Literary Plagiarism in LLMs through the Lens of Copyright Laws},
  author =       {Tan, Huachen and Duan, Moming and Liu, Duo and Lu, Haojie and Mu, Yuexin and Zhou, Longyi and Ren, Ao and Tan, Yujuan and Zhong, Kan},
  booktitle = 	 {Proceedings of the 16th Asian Conference on Machine Learning},
  pages = 	 {1000--1015},
  year = 	 {2025},
  editor = 	 {Nguyen, Vu and Lin, Hsuan-Tien},
  volume = 	 {260},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {05--08 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v260/main/assets/tan25c/tan25c.pdf},
  url = 	 {https://proceedings.mlr.press/v260/tan25c.html},
  abstract = 	 {The swift advancement of Generative Artificial Intelligence (AI) has outstripped the development of corresponding laws and regulations, making copyright infringement of books a significant public concern and sparking numerous legal disputes. Although the fair use doctrine provides an exemption for using copyrighted materials in training datasets without the copyright holder’s permission, content generated by such AI systems may still violate copyright laws. Previous research on copyright infringement has primarily focused on character-level analysis, which is narrower in scope than the comprehensive requirements of copyright law. To address this challenge, we developed an LLM-based similarity measurement mechanism. We guided the generative AI to produce relevant book content by employing carefully crafted prompts. Subsequently, we created datasets by comparing this generated content with the original texts from famous books. We conducted a range of experiments, including several similarity detection techniques and plot plagiarism detection. The experimental results show that the AI-generated content (AIGC) is 78.72% similar to the original text, confirming that generative AI has the potential to infringe upon copyrights. Moreover, our study examines copyright infringement issues related to the content generated by generative AI and to other domains such as code, images, and licensing. Our research will provide valuable insights for refining laws and regulations about generative AI.}
}

Endnote

%0 Conference Paper
%T Rethinking Literary Plagiarism in LLMs through the Lens of Copyright Laws
%A Huachen Tan
%A Moming Duan
%A Duo Liu
%A Haojie Lu
%A Yuexin Mu
%A Longyi Zhou
%A Ao Ren
%A Yujuan Tan
%A Kan Zhong
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-tan25c
%I PMLR
%P 1000--1015
%U https://proceedings.mlr.press/v260/tan25c.html
%V 260
%X The swift advancement of Generative Artificial Intelligence (AI) has outstripped the development of corresponding laws and regulations, making copyright infringement of books a significant public concern and sparking numerous legal disputes. Although the fair use doctrine provides an exemption for using copyrighted materials in training datasets without the copyright holder’s permission, content generated by such AI systems may still violate copyright laws. Previous research on copyright infringement has primarily focused on character-level analysis, which is narrower in scope than the comprehensive requirements of copyright law. To address this challenge, we developed an LLM-based similarity measurement mechanism. We guided the generative AI to produce relevant book content by employing carefully crafted prompts. Subsequently, we created datasets by comparing this generated content with the original texts from famous books. We conducted a range of experiments, including several similarity detection techniques and plot plagiarism detection. The experimental results show that the AI-generated content (AIGC) is 78.72% similar to the original text, confirming that generative AI has the potential to infringe upon copyrights. Moreover, our study examines copyright infringement issues related to the content generated by generative AI and to other domains such as code, images, and licensing. Our research will provide valuable insights for refining laws and regulations about generative AI.

APA

Tan, H., Duan, M., Liu, D., Lu, H., Mu, Y., Zhou, L., Ren, A., Tan, Y. & Zhong, K. (2025). Rethinking Literary Plagiarism in LLMs through the Lens of Copyright Laws. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:1000-1015. Available from https://proceedings.mlr.press/v260/tan25c.html.
