Automated Code Documentation Using AI

Ryeisa Taskia, Laela Kurniawati, Muhammad Bagus Andra

Abstract


This study evaluates four artificial intelligence models (CodeT5, CodeBERT, StarCoder, and a simulated GPT-4) on the code summarization task, that is, generating summaries or documentation for simple Python code snippets. The dataset consists of pairs of Python comments and code, processed into a documentation-code format suitable for summarization. Model output was scored with the BLEU and ROUGE-L metrics, which measure the overlap between each model-generated summary and the original documentation. GPT-4 (simulated) performed best, with a BLEU score of 0.61 and a ROUGE-L score of 0.72, indicating a stronger grasp of context. Among the open-source models, CodeT5 achieved the highest scores (BLEU 0.42, ROUGE-L 0.55). CodeBERT produced intermediate scores, while StarCoder obtained the lowest, because it is optimized more for code completion than for code summarization. The study concludes that model selection should be tailored to the needs of the application. CodeT5 is recommended for open-source automated documentation systems, as it offers a good balance between performance and accessibility, while GPT-4 can serve as a reference model for applications that require high accuracy. This research contributes to software engineering by highlighting the potential of AI models to make code documentation more efficient and more automated.
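
To make the two metrics concrete, the following is a minimal sketch of how a generated summary could be scored against a reference docstring. It is not the authors' evaluation code. The function names, the whitespace tokenization, the lowercasing, the F1 form of ROUGE-L (beta = 1), and the add-one smoothing in BLEU are all assumptions chosen for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as the F1 of LCS-based precision and recall (beta = 1 is assumed)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with add-one smoothed n-gram precisions (assumed variant)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_counts = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical reference docstring and model output, for illustration only.
    reference = "return the sum of two numbers"
    generated = "return sum of numbers"
    print(f"BLEU:    {bleu(reference, generated):.2f}")
    print(f"ROUGE-L: {rouge_l_f1(reference, generated):.2f}")
```

Both functions return values between 0 and 1, the same scale as the scores reported above. Corpus-level results would typically average or aggregate such scores over every documentation-code pair in the test set; established packages may tokenize and smooth differently, so absolute values from this sketch are not comparable to the reported ones.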

Keywords


Code Summarization, Code Documentation, CodeT5, CodeBERT, StarCoder, GPT-4, BLEU, ROUGE-L.


DOI: http://dx.doi.org/10.33087/jiubj.v26i1.6385



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JOURNAL ADDRESS

JURNAL ILMIAH UNIVERSITAS BATANGHARI JAMBI (JIUBJ)
Published by Lembaga Penelitian dan Pengabdian kepada Masyarakat
Address: Jl. Slamet Ryadi, Broni-Jambi, Kec. Telanaipura, Postal Code: 36122, Email: jiubj.unbari@gmail.com, Phone: 0741-670700
