Project – Empirical Analysis of Git Repositories before and after ChatGPT/LLMs

Over the past few years, the introduction of large language models (LLMs) such as ChatGPT has changed coding practices and outcomes. This project aims to analyze these changes empirically by comparing selected metrics in Git repositories before and after the adoption of LLMs.

The project includes the following tasks: 

  • Review the literature on existing work, approaches, and tools related to the impact of LLMs on software development.
  • Define and select relevant metrics to analyze, such as code quality, complexity, comments/documentation, expressiveness, and general coding behavior. 
  • Collect and preprocess data from git repositories spanning multiple years, ensuring a clear division between pre-LLM and post-LLM periods. 
  • Conduct quantitative analysis to compare the selected metrics before and after the adoption of LLMs. 
  • Analyze the results to identify trends, improvements, or regressions in code quality, complexity, documentation, and coding behavior. 
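
The data split and metric comparison described in the tasks above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's prescribed method: the cutoff date (ChatGPT's public release on 30 November 2022), the helper names, and the toy comment-ratio metric are all hypothetical choices, and timestamps are assumed to be ISO 8601 strings with a UTC offset (as produced by `git log --pretty=format:%cI`).

```python
from datetime import datetime, timezone
from statistics import mean

# Assumption: ChatGPT's public release (30 Nov 2022) marks the period boundary.
# The project may well choose a different cutoff or a transition window.
CUTOFF = datetime(2022, 11, 30, tzinfo=timezone.utc)


def comment_ratio(source: str) -> float:
    """Toy metric: fraction of non-blank lines starting with '#'.

    Only recognizes Python-style line comments; a real study would use
    a language-aware parser.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(ln.startswith("#") for ln in lines) / len(lines)


def split_by_period(snapshots, cutoff=CUTOFF):
    """Partition (iso_timestamp, source) pairs into pre- and post-cutoff lists."""
    pre, post = [], []
    for stamp, source in snapshots:
        when = datetime.fromisoformat(stamp)  # expects an explicit UTC offset
        (pre if when < cutoff else post).append(source)
    return pre, post


def compare_metric(snapshots, metric=comment_ratio):
    """Return the mean metric value for each period (None if a period is empty)."""
    pre, post = split_by_period(snapshots)
    return {
        "pre": mean(map(metric, pre)) if pre else None,
        "post": mean(map(metric, post)) if post else None,
    }
```

For example, calling `compare_metric` on a list of `(timestamp, file_content)` pairs extracted from commit history yields one average per period; any difference would still need a proper statistical test and controls for confounders such as project age or team changes.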

If you are interested, please contact Isabelle Cuber. Read our collaboration guidelines beforehand: ➡️ Working on Projects/Theses with HASEL