Journal of Software Engineering Research and Development, 2023, 11:5, doi: 10.5753/jserd.2023.2582
This work is licensed under a Creative Commons Attribution 4.0 International License.

Naming Practices in Object-oriented Programming: An Empirical Study

Remo Gresta (https://orcid.org/0009-0007-7178-6759) [ Federal University of São João del-Rei | remoogg@aluno.ufsj.edu.br ]
Vinicius Durelli (https://orcid.org/0000-0002-5768-1850) [ Federal University of São João del-Rei | durelli@ufsj.edu.br ]
Elder Cirilo (https://orcid.org/0000-0003-1464-2314) [ Federal University of São João del-Rei | elder@ufsj.edu.br ]

Abstract

Research indicates that comprehending code takes up far more developer time than writing code. Most modern programming languages place little to no limitations on identifier names, so developers are free to choose identifier names at their own discretion, which makes the naming of identifiers a key aspect of code comprehension. Research on identifier naming shows that informative names are crucial to improving the readability and maintainability of programs: essentially, intention-revealing names make code easier to understand and act as a basic form of documentation. Poorly named identifiers tend to hurt the comprehensibility and maintainability of software systems. However, most computer science curricula emphasize programming concepts and language syntax over naming guidelines and conventions. Consequently, programmers lack knowledge about naming practices. This article is an extension of our previous study on naming practices. Previously, we set out to explore naming practices of Java programmers. To this end, we analyzed 1,421,607 identifier names (i.e., attribute, parameter, and variable names) from 40 open-source Java projects and categorized these names into eight naming practices. As a follow-up study to further investigate naming practices, we examined 40 open-source C++ projects and categorized 1,181,774 identifier names according to the previously mentioned eight naming practices.
We examined the occurrence and prevalence of these categories across C++ and Java projects, and our results also highlight the contexts in which identifiers following each naming practice tend to appear more regularly. Finally, we also conducted an online survey with 52 software developers to gain insight from the industry. All in all, we believe the results based on the analysis of 2,603,381 identifier names can help enhance programmers' awareness and contribute to improving educational materials and code review methods.

Keywords: Naming Identifiers, Program Comprehension, Mining Software Repositories

1 Introduction

Reading and comprehending source code plays a vital role in software development (Allamanis et al., 2014). Evidence suggests that choosing proper names for identifiers in software systems can positively impact code comprehension (Lawrie et al., 2007b; Fakhoury et al., 2018; Oliveira et al., 2020). Although giving meaningful names to identifiers is a widely accepted best practice, coming up with proper names is challenging (Deissenboeck and Pizka, 2006). As stated by Host and Ostvold (2007), even though naming is part of daily life for programmers, it entails a great deal of time and thought: names should convey to others the purpose of the code (Martin, 2008) and reflect the meaning of domain concepts (Marcus et al., 2004). Meaningful identifier names are key to bridging the gap between intention and implementation (Wainakh et al., 2021). Therefore, given that poorly chosen identifier names might hinder source code comprehension (Schankin et al., 2018), using meaningful identifier names is a recommended practice present in several coding style guides and conventions.

According to the Java language naming conventions¹, names should be “short yet meaningful”. In a similar fashion, the Google C++ style guide² states that names should be “as descriptive as possible”.
Martin (2008) argues that programmers should choose intention-revealing names as a way to avoid disinformation. He also advocates that names have to contain meaningful distinctions and be descriptive (not abbreviated). The GNU Coding Standards³ posit that programmers should not “choose terse names – instead, [they should] look for names that give useful information about the meaning of the variable”. Although programming communities and internationally renowned experts have proposed best practices related to naming identifiers, little is known about the extent to which programmers follow these naming practices (Arnaoudova et al., 2016).

¹ oracle.com/java/technologies/javase/codeconventions-namingconventions.html
² google.github.io/styleguide/cppguide.html

We argue that without proper guidance, programmers are more prone to resort to less than ideal naming practices, such as using number series or noise words. For example, bad naming practices can foster the sense that names such as Person person1 and Person person2 are intuitive and understandable. Careless naming practices might hinder not only code comprehension but also overall team communication. Therefore, we argue that it is crucial for software engineering researchers to learn how to support programmers by understanding how naming practices are used “in the wild” and, through this better understanding, defining naming guidelines for educational materials (Charitsis et al., 2021) and code review (Nyamawe et al., 2021).

In our previous study (Gresta et al., 2021), we set out to investigate naming practices in the context of Java programs; thus, we looked only into how Java programmers name attributes, parameters, and variables.
This article is an extension of our previous work on naming practices in which we also investigate naming practices in the context of C++ programs. To investigate how C++ and Java programmers name attributes, parameters, and variables, we carried out an empirical study in which we analyzed 1,421,607 identifier names from 40 open-source Java projects and 1,181,774 identifier names from 40 open-source C++ projects. We performed repository mining to determine how often eight categories of naming practices occur within and across these projects. We also looked at how prevalent these naming practices are in certain code contexts (i.e., ATTRIBUTE, PARAMETER, METHOD, FOR, WHILE, IF, and SWITCH).

³ www.gnu.org/prep/standards/

Gresta et al. 2023

Table 1. Java programs used in our experiment.

Project | LoC | Contributors | Commits | Kings (Total) | Kings (%) | Median (Total) | Median (%) | Ditto (Total) | Ditto (%) | Cognome (Total) | Cognome (%) | Diminutive (Total) | Diminutive (%) | Shorten (Total) | Shorten (%) | Index (Total) | Index (%) | Total
aeron | 108,442 | 86 | 14,409 | 606 | 6.34 | 450 | 4.71 | 5,205 | 54.46 | 933 | 9.76 | 1,932 | 20.21 | 114 | 1.19 | 318 | 3.33 | 9,558
androidutilcode | 39,030 | 32 | 1,317 | 179 | 7.74 | 21 | 0.91 | 1,170 | 50.56 | 385 | 16.64 | 73 | 3.15 | 77 | 3.33 | 409 | 17.68 | 2,314
archunit | 100,276 | 49 | 1,499 | 91 | 3.07 | 16 | 0.54 | 1,744 | 58.86 | 596 | 20.11 | 303 | 10.23 | 9 | 0.30 | 204 | 6.88 | 2,963
boofcv | 650,019 | 14 | 4,520 | 7,483 | 23.19 | 1,696 | 5.26 | 1,573 | 4.87 | 266 | 0.82 | 880 | 2.73 | 1,354 | 4.20 | 19,017 | 58.93 | 32,269
butterknife | 13,279 | 97 | 1,016 | 135 | 21.95 | 8 | 1.30 | 358 | 58.21 | 68 | 11.06 | 14 | 2.28 | 4 | 0.65 | 28 | 4.55 | 615
corenlp | 581,374 | 107 | 16,280 | 2,372 | 9.53 | 831 | 3.34 | 4,281 | 17.20 | 3,864 | 15.52 | 610 | 2.45 | 1,622 | 6.52 | 11,310 | 45.44 | 24,890
dropwizard | 74,215 | 364 | 5,789 | 53 | 1.85 | 14 | 0.49 | 1,993 | 69.64 | 343 | 11.98 | 269 | 9.40 | 29 | 1.01 | 161 | 5.63 | 2,862
dubbo | 179,477 | 386 | 4,681 | 754 | 6.39 | 81 | 0.69 | 6,983 | 59.19 | 1,096 | 9.29 | 644 | 5.46 | 369 | 3.13 | 1,870 | 15.85 | 11,797
eventbus | 8,369 | 20 | 507 | 4 | 1.33 | 0 | 0.00 | 195 | 65.00 | 59 | 19.67 | 23 | 7.67 | 1 | 0.33 | 18 | 6.00 | 300
fastjson | 179,996 | 158 | 3,863 | 8,205 | 49.88 | 77 | 0.47 | 4,255 | 25.87 | 1,264 | 7.68 | 243 | 1.48 | 387 | 2.35 | 2,019 | 12.27 | 16,450
glide | 76,418 | 129 | 2,583 | 105 | 2.77 | 22 | 0.58 | 2,442 | 64.47 | 629 | 16.61 | 194 | 5.12 | 45 | 1.19 | 351 | 9.27 | 3,788
guice | 72,980 | 59 | 1,931 | 178 | 2.85 | 46 | 0.74 | 3,871 | 61.92 | 1,043 | 16.68 | 216 | 3.45 | 51 | 0.82 | 847 | 13.55 | 6,252
hdiv | 30,631 | 11 | 1,086 | 106 | 9.72 | 11 | 1.01 | 573 | 52.52 | 63 | 5.77 | 177 | 16.22 | 31 | 2.84 | 130 | 11.92 | 1,091
ical4j | 24,130 | 35 | 2,303 | 132 | 11.22 | 15 | 1.28 | 682 | 57.99 | 167 | 14.20 | 48 | 4.08 | 2 | 0.17 | 130 | 11.05 | 1,176
j2objc | 1,810,274 | 75 | 5,284 | 5,523 | 10.13 | 866 | 1.59 | 9,302 | 17.06 | 4,750 | 8.71 | 1,276 | 2.34 | 3,978 | 7.30 | 28,827 | 52.87 | 54,522
jenkins | 175,150 | 654 | 31,156 | 658 | 6.15 | 161 | 1.51 | 3,273 | 30.61 | 794 | 7.43 | 314 | 2.94 | 185 | 1.73 | 5,308 | 49.64 | 10,693
jtk | 204,105 | 9 | 1,373 | 2,627 | 13.03 | 4,557 | 22.60 | 1,008 | 5.00 | 55 | 0.27 | 37 | 0.18 | 1,068 | 5.30 | 10,813 | 53.62 | 20,165
junit4 | 31,242 | 151 | 2,474 | 55 | 3.15 | 18 | 1.03 | 985 | 56.38 | 248 | 14.20 | 32 | 1.83 | 47 | 2.69 | 362 | 20.72 | 1,747
keywhiz | 23,337 | 32 | 1,538 | 89 | 5.67 | 23 | 1.46 | 1,036 | 65.99 | 178 | 11.34 | 90 | 5.73 | 14 | 0.89 | 140 | 8.92 | 1,570
libgdx | 272,510 | 505 | 14,661 | 49,315 | 47.83 | 21,653 | 21.00 | 11,800 | 11.44 | 1,831 | 1.78 | 2,041 | 1.98 | 2,252 | 2.18 | 14,215 | 13.79 | 103,107
litiengine | 75,877 | 20 | 3,324 | 316 | 11.86 | 46 | 1.73 | 771 | 28.94 | 448 | 16.82 | 253 | 9.50 | 21 | 0.79 | 809 | 30.37 | 2,664
lottie-android | 16,258 | 102 | 1,292 | 80 | 7.41 | 104 | 9.64 | 442 | 40.96 | 145 | 13.44 | 126 | 11.68 | 21 | 1.95 | 161 | 14.92 | 1,079
mockito | 55,751 | 220 | 5,523 | 234 | 9.87 | 12 | 0.51 | 1,288 | 54.35 | 285 | 12.03 | 126 | 5.32 | 38 | 1.60 | 387 | 16.33 | 2,370
mpandroidchart | 25,232 | 69 | 2,068 | 134 | 6.85 | 36 | 1.84 | 385 | 19.69 | 232 | 11.87 | 155 | 7.93 | 38 | 1.94 | 975 | 49.87 | 1,955
nutch | 141,710 | 43 | 3,215 | 236 | 7.68 | 28 | 0.91 | 1,353 | 44.01 | 467 | 15.19 | 113 | 3.68 | 164 | 5.34 | 713 | 23.19 | 3,074
okhttp | 48,465 | 235 | 4,848 | 455 | 16.01 | 39 | 1.37 | 1,902 | 66.92 | 161 | 5.67 | 126 | 4.43 | 21 | 0.74 | 138 | 4.86 | 2,842
orienteer | 55,681 | 12 | 2,274 | 63 | 2.68 | 27 | 1.15 | 1,122 | 47.77 | 584 | 24.86 | 395 | 16.82 | 22 | 0.94 | 136 | 5.79 | 2,349
picasso | 9,136 | 97 | 1,368 | 64 | 8.82 | 36 | 4.96 | 546 | 75.21 | 27 | 3.72 | 10 | 1.38 | 7 | 0.96 | 36 | 4.96 | 726
rest-assured | 73,511 | 105 | 2,020 | 121 | 5.85 | 32 | 1.55 | 1,440 | 69.57 | 288 | 13.91 | 107 | 5.17 | 14 | 0.68 | 68 | 3.29 | 2,070
rest.li | 523,972 | 89 | 2,617 | 2,158 | 9.26 | 533 | 2.29 | 10,054 | 43.16 | 4,712 | 20.23 | 3,458 | 14.84 | 237 | 1.02 | 2,143 | 9.20 | 23,295
retrofit | 26,513 | 152 | 1,865 | 60 | 2.49 | 7 | 0.29 | 1,691 | 70.14 | 352 | 14.60 | 18 | 0.75 | 6 | 0.25 | 277 | 11.49 | 2,411
riptide | 27,072 | 18 | 2,131 | 4 | 0.52 | 0 | 0.00 | 650 | 85.08 | 22 | 2.88 | 46 | 6.02 | 8 | 1.05 | 34 | 4.45 | 764
rxjava | 468,957 | 277 | 5,877 | 2,371 | 10.25 | 34 | 0.15 | 4,275 | 18.48 | 573 | 2.48 | 115 | 0.50 | 373 | 1.61 | 15,387 | 66.53 | 23,128
spring-boot | 343,138 | 804 | 32,096 | 443 | 2.74 | 95 | 0.59 | 10,868 | 67.24 | 1,354 | 8.38 | 3,002 | 18.57 | 91 | 0.56 | 309 | 1.91 | 16,162
tomcat | 343,703 | 61 | 23,140 | 1,142 | 6.68 | 263 | 1.54 | 7,374 | 43.16 | 1,675 | 9.80 | 696 | 4.07 | 846 | 4.95 | 5,089 | 29.79 | 17,085
twelvemonkeys | 99,418 | 42 | 1,334 | 379 | 8.43 | 123 | 2.73 | 912 | 20.28 | 808 | 17.96 | 588 | 13.07 | 327 | 7.27 | 1,361 | 30.26 | 4,498
unirest-java | 15,979 | 43 | 1,603 | 12 | 1.75 | 1 | 0.15 | 310 | 45.19 | 58 | 8.45 | 23 | 3.35 | 22 | 3.21 | 260 | 37.90 | 686
webmagic | 12,926 | 40 | 1,119 | 28 | 2.87 | 3 | 0.31 | 763 | 78.26 | 80 | 8.21 | 27 | 2.77 | 10 | 1.03 | 64 | 6.56 | 975
xchart | 24,406 | 50 | 1,451 | 119 | 7.93 | 31 | 2.07 | 628 | 41.84 | 338 | 22.52 | 50 | 3.33 | 26 | 1.73 | 309 | 20.59 | 1,501
zxing | 107,064 | 109 | 3,582 | 208 | 9.78 | 137 | 6.44 | 695 | 32.68 | 267 | 12.55 | 108 | 5.08 | 157 | 7.38 | 555 | 26.09 | 2,127
Total | 7,111,470 | 5,519 | 217,869 | 87,297 | 20.79 | 32,153 | 7.65 | 110,198 | 26.24 | 31,508 | 7.50 | 18,958 | 4.51 | 14,088 | 3.35 | 125,688 | 29.93 | 419,890

In this extended version, our results are based on two large samples of programs: the previous version of this study analyzed 40 open-source Java programs, and this extended version also includes the analysis of 40 open-source C++ projects. Moreover, to understand industry practices, we conducted an online survey to gain insight from software programmers. Through this survey, we gathered quantitative data on programmers’ perceptions about the use and occurrence of the investigated naming practices.
The online survey questionnaire ran from November 2021 to January 2022 and had 52 responses.

This extended version of our study makes the following contributions:

• Our results show that the naming practice categories (Kings, Median, Ditto, Diminutive, Cognome, Shorten, Index and Famed) appear in all 80 open-source projects and are prevalent in practice;
• We identified the most common names across projects. The Top-3 recurrent names are: value; result; and name. Many single-letter names are also commonly used in projects (e.g., i, e, s, c). We also observed that the majority of common names are associated with integer or string values;
• We perceived that programmers’ naming practices are context-specific. Single-letter names (Index and Shorten) seem to be more present in conditional or loop statements (IF, FOR, WHILE). In contrast, identifiers with the same name as their Types tend to appear in large-scope contexts (e.g., ATTRIBUTE);
• We noted that, in general, the project’s characteristics might not impact the prevalence of one particular naming practice category: there is no notable correlation between size, number of contributors, or number of commits and the predominance of any naming practice category;
• Finally, we observed that Diminutive is the naming practice category most adopted by survey respondents and Median is the least adopted. This result seems to align well with our observation about the prevalence of the naming practices in 80 open-source object-oriented programs.

The remainder of this paper is organized as follows. Section 2 presents the background and related work on naming practices. Section 3 details how we carried out our study.

Table 2. C++ programs used in our experiment.
Project | LoC | Contributors | Commits | Kings (Total) | Kings (%) | Median (Total) | Median (%) | Ditto (Total) | Ditto (%) | Cognome (Total) | Cognome (%) | Diminutive (Total) | Diminutive (%) | Shorten (Total) | Shorten (%) | Index (Total) | Index (%) | Total
asio | 196,656 | 53 | 3,034 | 135 | 3.65 | 27 | 0.73 | 1,664 | 44.99 | 32 | 0.87 | 657 | 17.76 | 220 | 5.95 | 964 | 26.06 | 3,699
assimp | 614,926 | 462 | 10,934 | 78 | 6.76 | 74 | 6.41 | 739 | 64.04 | 10 | 0.87 | 94 | 8.15 | 13 | 1.13 | 146 | 12.65 | 1,154
bitcoin | 541,474 | 853 | 32,661 | 46 | 4.58 | 27 | 2.69 | 621 | 61.79 | 8 | 0.80 | 11 | 1.09 | 39 | 3.88 | 253 | 25.17 | 1,005
bluematter | 812,822 | 2 | 5 | 3,972 | 29.20 | 1,350 | 9.92 | 1,893 | 13.91 | 1,560 | 11.47 | 506 | 3.72 | 685 | 5.03 | 3,639 | 26.75 | 13,605
calligra | 1,602,456 | 263 | 101,573 | 47 | 3.41 | 2 | 0.15 | 743 | 53.92 | 137 | 9.94 | 267 | 19.38 | 14 | 1.02 | 168 | 12.19 | 1,378
chaste | 587,473 | 25 | 5,384 | 2,954 | 40.46 | 882 | 12.08 | 673 | 9.22 | 667 | 9.14 | 470 | 6.44 | 14 | 0.19 | 1,641 | 22.48 | 7,301
citra | 428,966 | 222 | 9,141 | 27 | 5.11 | 19 | 3.60 | 255 | 48.30 | 4 | 0.76 | 36 | 6.82 | 27 | 5.11 | 160 | 30.30 | 528
clickhouse | 1,422,903 | 921 | 83,445 | 114 | 4.13 | 40 | 1.45 | 2,228 | 80.78 | 66 | 2.39 | 108 | 3.92 | 14 | 0.51 | 188 | 6.82 | 2,758
core | 9,262,610 | 25 | 3,058 | 4,044 | 5.29 | 1,516 | 1.98 | 45,465 | 59.47 | 10,741 | 14.05 | 10,799 | 14.13 | 420 | 0.55 | 3,459 | 4.52 | 76,444
freecad | 4,842,675 | 383 | 27,647 | 528 | 6.94 | 210 | 2.76 | 4,705 | 61.83 | 100 | 1.31 | 513 | 6.74 | 181 | 2.38 | 1,372 | 18.03 | 7,609
gacui | 504,062 | 3 | 2,238 | 8 | 0.62 | 50 | 3.91 | 576 | 45.00 | 44 | 3.44 | 294 | 22.97 | 15 | 1.17 | 293 | 22.89 | 1,280
gecko-dev | 28,303,180 | 4,910 | 785,724 | 1,116 | 4.57 | 1,548 | 6.34 | 11,737 | 48.11 | 2,567 | 10.52 | 4,805 | 19.69 | 311 | 1.27 | 2,314 | 9.48 | 24,398
godot | 4,976,013 | 1,590 | 41,538 | 525 | 9.87 | 270 | 5.08 | 1,711 | 32.17 | 128 | 2.41 | 1,934 | 36.36 | 107 | 2.01 | 644 | 12.11 | 5,319
gromacs | 1,680,900 | 74 | 20,825 | 89 | 5.03 | 104 | 5.88 | 994 | 56.16 | 38 | 2.15 | 250 | 14.12 | 54 | 3.05 | 241 | 13.62 | 1,770
grpc | 717,441 | 708 | 50,493 | 76 | 3.40 | 49 | 2.19 | 799 | 35.75 | 68 | 3.04 | 842 | 37.67 | 44 | 1.97 | 357 | 15.97 | 2,235
kdenlive | 205,469 | 94 | 15,645 | 4 | 0.43 | 0 | 0.00 | 671 | 72.93 | 66 | 7.17 | 36 | 3.91 | 34 | 3.70 | 109 | 11.85 | 920
kdevelop | 338,648 | 245 | 42,650 | 52 | 4.70 | 3 | 0.27 | 723 | 65.37 | 61 | 5.52 | 93 | 8.41 | 10 | 0.90 | 164 | 14.83 | 1,106
krita | 983,754 | 336 | 57,706 | 80 | 5.93 | 12 | 0.89 | 573 | 42.48 | 109 | 8.08 | 216 | 16.01 | 44 | 3.26 | 315 | 23.35 | 1,349
lammps | 1,626,808 | 185 | 29,307 | 281 | 11.35 | 56 | 2.26 | 1,272 | 51.37 | 199 | 8.04 | 169 | 6.83 | 85 | 3.43 | 414 | 16.72 | 2,476
mediapipe | 235,825 | 2 | 111 | 11 | 1.54 | 47 | 6.58 | 511 | 71.57 | 13 | 1.82 | 1 | 0.14 | 26 | 3.64 | 105 | 14.71 | 714
mlir | 75,845 | 2,285 | 415,644 | 9 | 5.70 | 18 | 11.39 | 83 | 52.53 | 24 | 15.19 | 8 | 5.06 | 2 | 1.27 | 14 | 8.86 | 158
mongo | 5,015,374 | 571 | 63,227 | 917 | 3.17 | 381 | 1.32 | 14,644 | 50.66 | 761 | 2.63 | 2,770 | 9.58 | 2,019 | 6.99 | 7,412 | 25.64 | 28,904
mysql-server | 3,733,193 | 88 | 170,220 | 803 | 6.94 | 124 | 1.07 | 7,941 | 68.60 | 713 | 6.16 | 949 | 8.20 | 141 | 1.22 | 904 | 7.81 | 11,575
obs-studio | 482,886 | 477 | 10,466 | 22 | 3.42 | 9 | 1.40 | 429 | 66.72 | 57 | 8.86 | 59 | 9.18 | 5 | 0.78 | 62 | 9.64 | 643
opencv | 2,166,493 | 1,360 | 31,603 | 1,598 | 11.96 | 859 | 6.43 | 5,672 | 42.45 | 367 | 2.75 | 376 | 2.81 | 730 | 5.46 | 3,761 | 28.14 | 13,363
openoffice | 6,894,647 | 21 | 7,657 | 3,977 | 5.82 | 1,703 | 2.49 | 39,683 | 58.06 | 9,796 | 14.33 | 9,453 | 13.83 | 335 | 0.49 | 3,397 | 4.97 | 68,344
percona-server | 3,777,210 | 238 | 185,334 | 849 | 7.35 | 127 | 1.10 | 7,887 | 68.32 | 712 | 6.17 | 913 | 7.91 | 142 | 1.23 | 914 | 7.92 | 11,544
proxysql | 121,989 | 90 | 4,680 | 7 | 1.38 | 12 | 2.37 | 219 | 43.20 | 10 | 1.97 | 46 | 9.07 | 37 | 7.30 | 176 | 34.71 | 507
pytorch | 1,792,819 | 2,155 | 43,944 | 56 | 2.10 | 111 | 4.15 | 1,472 | 55.07 | 35 | 1.31 | 164 | 6.14 | 115 | 4.30 | 720 | 26.94 | 2,673
qtbase | 2,714,097 | 783 | 55,238 | 185 | 4.51 | 89 | 2.17 | 2,403 | 58.54 | 258 | 6.29 | 229 | 5.58 | 132 | 3.22 | 809 | 19.71 | 4,105
rocksdb | 497,140 | 628 | 10,766 | 41 | 1.66 | 52 | 2.10 | 1,494 | 60.36 | 21 | 0.85 | 34 | 1.37 | 59 | 2.38 | 774 | 31.27 | 2,475
server | 1,967,124 | 300 | 195,145 | 22 | 1.59 | 2 | 0.14 | 874 | 63.01 | 40 | 2.88 | 172 | 12.40 | 33 | 2.38 | 244 | 17.59 | 1,387
tensorflow | 3,284,592 | 3,068 | 125,560 | 778 | 5.67 | 747 | 5.45 | 8,108 | 59.13 | 235 | 1.71 | 279 | 2.03 | 499 | 3.64 | 3,067 | 22.37 | 13,713
terminal | 360,717 | 313 | 2,855 | 159 | 3.69 | 49 | 1.14 | 2,640 | 61.20 | 118 | 2.74 | 311 | 7.21 | 124 | 2.87 | 913 | 21.16 | 4,314
vtk | 3,690,369 | 352 | 81,218 | 500 | 7.78 | 216 | 3.36 | 2,167 | 33.74 | 147 | 2.29 | 1,137 | 17.70 | 503 | 7.83 | 1,753 | 27.29 | 6,423
winget-cli | 305,116 | 317 | 539 | 64 | 2.56 | 62 | 2.48 | 1,252 | 50.00 | 65 | 2.60 | 111 | 4.43 | 312 | 12.46 | 638 | 25.48 | 2,504
xbmc | 1,094,954 | 785 | 59,641 | 42 | 9.77 | 2 | 0.47 | 208 | 48.37 | 29 | 6.74 | 83 | 19.30 | 20 | 4.65 | 46 | 10.70 | 430
yarp | 1,029,531 | 77 | 17,416 | 45 | 2.25 | 18 | 0.90 | 1,021 | 51.13 | 91 | 4.56 | 352 | 17.63 | 65 | 3.25 | 405 | 20.28 | 1,997
yuzu | 488,099 | 203 | 20,860 | 30 | 19.61 | 7 | 4.58 | 76 | 49.67 | 0 | 0.00 | 6 | 3.92 | 3 | 1.96 | 31 | 20.26 | 153
zerotierone | 137,784 | 58 | 5,409 | 34 | 2.05 | 64 | 3.85 | 975 | 58.70 | 12 | 0.72 | 62 | 3.73 | 56 | 3.37 | 458 | 27.57 | 1,661
Total | 99,515,040 | 25,525 | 2,830,541 | 24,325 | 7.28 | 10,938 | 3.27 | 177,801 | 53.24 | 30,109 | 9.01 | 39,615 | 11.86 | 7,689 | 2.30 | 43,444 | 13.01 | 333,921

Section 4 outlines the results of our empirical study and provides a general discussion. Section 5 describes the threats to validity. Finally, Section 6 presents some concluding remarks.

2 Background and Related Work

This section presents some background about names and related studies on naming identifiers. We introduce this section by presenting an overview of the role of names in software development.

2.1 Naming

Names identify classes, attributes, methods, variables, and parameters (Lawrie et al., 2006). They were originally designed to be pieces of code used to represent values in memory (Tofte and Talpin, 1997), and they have now become the primary source of information in software development (Lawrie et al., 2006; Ratiu and Deissenboeck, 2006): programmers rely on existing names in their code comprehension journey (Takang et al., 1996). Indeed, high-quality names have a significant influence on the comprehension of source code (Avidan and Feitelson, 2017). Arnaoudova et al. (2016) have acknowledged the critical role that the source code lexicon plays in the psychological complexity of software systems and coined the contradictory expression “Linguistic Antipatterns” (LAs) to denote poor practices in the naming, documentation, and choice of identifiers that might hinder program understanding. They argue that poor practices might lead programmers to make wrong assumptions and waste time understanding source code (Arnaoudova et al., 2016).

Deissenboeck and Pizka (2006) characterized a name as being a fully spelled word or even an abbreviation. Names can also be composed of two or more words, might include words that do not exist, or even be single alphabetical characters. However, the proper use of words in names is a significant issue in software development (Feitelson et al., 2020). In Martin’s book (Martin, 2008), Tim Ottinger drew up a series of simple rules to guide programmers on naming identifiers. According to Ottinger, programmers have to focus on creating intention-revealing names (the name by itself should be capable of conveying what the identifier does). They also have to avoid using non-informative words (e.g., words with multiple meanings, words with little differentiation between themselves, or number series). Ottinger also advocates that names should be pronounceable and searchable. For instance, it is impractical to discuss, in a code review session, source code composed of words that programmers cannot pronounce.

Coding style guides and conventions also aim to address the challenges of naming identifiers (dos Santos and Gerosa, 2018). However, their rules are usually hard to enforce, as are the others discussed in Martin’s book, Clean Code (Martin, 2008). Caprile and Tonella (2000) proposed an approach for improving the meaningfulness of identifier names. The approach entails the following steps: (i) extracting identifier names; (ii) normalizing identifier names; and (iii) applying the changes to the source code. The proposed rules for creating meaningful names aim to guarantee that each word composing a name belongs to a dictionary of standard words and is compliant with existing grammar. Deissenboeck and Pizka (2006) proposed a set of precise rules for constructing concise and consistent names. In the interest of preserving consistency, the authors advocate that a single name must represent only one concept. The rules, therefore, ensure that one concept will not be taken into consideration in multiple identifier names. In order to preserve conciseness, the rules ensure that names chosen by programmers stand for the concepts they are indeed trying to convey.
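The dictionary-based rule attributed above to Caprile and Tonella (2000) can be illustrated with a small sketch. It is not their implementation: the word list `STANDARD_WORDS`, the helper names, and the camelCase/underscore splitting rule are all assumptions made for illustration, and the grammar check is left out entirely. Python is used here only for brevity.

```python
import re

# Hypothetical stand-in for a dictionary of standard words (assumption).
STANDARD_WORDS = {"time", "zone", "name", "rule", "set", "fast", "to", "buffer"}

# One token per camelCase word or run of capitals; underscores and digits are skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(name):
    """Split an identifier into lowercase words."""
    return [word.lower() for word in _WORD.findall(name)]


def non_standard_words(name, dictionary=STANDARD_WORDS):
    """Return the words of `name` that are missing from the dictionary."""
    return [word for word in split_identifier(name) if word not in dictionary]


print(split_identifier("timeZone"))     # ['time', 'zone']
print(non_standard_words("nameStr"))    # ['str']
print(non_standard_words("NFRuleSet"))  # ['nf']
```

A name passes this simplified check when `non_standard_words` returns an empty list; abbreviations such as `str` or `nf` are flagged because they are absent from the assumed word list.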
More recently, Feitelson et al. (2020) suggested a three-step model to help programmers systematically come up with meaningful names. The model encompasses the following steps: (i) selecting the concepts to include in the name; (ii) choosing the words to represent each concept; and (iii) creating a name from these words. The authors demonstrated that programmers could use the model to guide them in choosing names that are superior (in terms of meaningfulness) to randomly chosen names.

2.2 Names in Software Quality

There have been many studies that examine how names affect comprehension and programmers’ efficiency. Avidan and Feitelson (2017) conducted an experiment involving ten programmers in hopes of understanding the impact of identifier names on program comprehension. They observed that, when changing identifier names from fully spelled words to single-letter ones, the fully spelled version was perceived as more understandable. Hofmeister et al. (2017) also concluded that abbreviations and single-letter names decrease code comprehension and could indicate low-quality code, as observed by Butler et al. (2010) and Kawamoto and Mizuno (2012). Butler et al. (2010) showed that source code containing poor-quality identifier names was associated with FindBugs warnings. Kawamoto and Mizuno (2012) also observed that concise identifier names have a substantial effect on fault-proneness in NetBeans.

Takang et al. (1996), based on a survey conducted with 89 computer science students, concluded that the combination of identifier names and comments in the code provides a minor improvement in code comprehension. Hence, improving identifier names seems to be a better option than including comments in the code. Spending more time choosing meaningful identifier names can result in less work during software maintenance (Lawrie et al., 2007a). Low-quality names can affect code negatively by causing confusion and misinformation.
The study conducted by Lawrie et al. (2007a) found that the quality of identifier names improves over time and is also related to the software license. Modern software systems contain more high-quality names, and proprietary ones include more abbreviations than open-source projects. Moreover, a study investigating the semantic nature of identifier names in four large-scale open-source projects showed that the number of commits and contributors tended to influence the quality of names. Projects with a high number of commits and contributors tend to have more identifier names that appear in a large text corpus of existing words (Gresta and Cirilo, 2020).

3 Empirical Study Setup

This section describes the empirical study design. We conducted an empirical study to characterize how C++ and Java programmers name attributes, parameters, and variables. Specifically, we analyzed 1,421,607 identifier names (i.e., attribute, parameter, and variable names) from 40 Java projects and categorized these names into eight naming practice categories. Afterwards, we expanded our analysis by selecting a sample of 40 C++ projects. Upon analyzing this sample, we found 1,181,774 identifier names, which we then categorized according to the aforementioned eight naming practice categories. We used the results of categorizing identifier names from these two samples to provide answers to the research questions discussed in the next subsection.

3.1 Goal and Research Questions

We set out to probe into how common eight naming practices (see Section 3.2) are “in the wild” (i.e., in real-world software systems). More specifically, our goal is to contribute towards a better understanding of their prevalence in the naming of attributes, parameters, and variables in C++ and Java.
We believe a more insightful interpretation of the results of our study can be obtained from the standpoint of a researcher interested in helping programmers by defining naming guidelines for educational material and code review. Our main goal is to provide answers to the following research questions (RQs):

• RQ1: How prevalent are the eight naming practice categories? We set out to investigate whether identifier names in open-source projects can be categorized according to eight naming practice categories and how common these naming practices are across C++ and Java projects;
• RQ2: Are there context-specific naming practice categories? We set out to examine whether specific naming practice categories tend to occur more often in certain contexts (e.g., ATTRIBUTE, PARAMETER, METHOD, IF, FOR, WHILE, SWITCH);
• RQ3: Do the naming practice categories carry over across different C++ and Java projects? We attempt to explore the prevalence of the categories spanning multiple C++ and Java projects and identify any correlation between software metrics and programmers’ naming practices;
• RQ4: What is the perception of software developers about the investigated naming categories? We set out to probe into programmers’ perceptions regarding the use and occurrence of the eight investigated naming practices.

3.2 Naming Practice Categories

The categories presented in this subsection are a compilation of programmers’ practices reported in several studies (Arnaoudova et al., 2016; Beniamini et al., 2017; Alsuhaibani et al., 2021) and books (Martin, 2008; DiLeo, 2019). Inspired by antipattern templates (Brown et al., 1998), in order to explain the naming practice categories, we frame the discussion of each category in terms of the following elements: category name, examples, motivation (why), consequences of the naming practice, and recommendations.

3.2.1 Kings

This category represents identifier names that end with numbers.
Example: String name1 and String name2, or Integer arg1 and Integer arg2, draw arbitrary distinctions by means of number series. Why: programmers often opt to employ names that fall into this category to distinguish between identifiers that appear in the same scope. Consequences: names with numbers at the end, however, are not very informative and do not represent intentional naming (Martin, 2008; DiLeo, 2019). Recommendation: usually, identifiers represent different things; whenever that is the case, they should be named accordingly (Martin, 2008).

3.2.2 Median

This category is a variation of the Kings category and comprises identifier names with numbers in the middle. Example: the names fastUInt64ToBuffer and base64Bytes contain numbers that might represent 64-bit values. Why: numbers in the middle, in general, are used to denote the value stored in the attribute/variable or even to provide some distinction among similar identifier names. Consequences: names with numbers in the middle can potentially be harder to search for in the source code, hard to pronounce, and very similar to other names that differ only in terms of the numbers that appear somewhere in the middle (Martin, 2008). Recommendations: programmers should use numbers only when necessary and surround numbers with pronounceable words (Martin, 2008).

3.2.3 Ditto

The category Ditto consists of identifier names spelled in the same way as their Types. Example: timeZone is spelled like its Type, TimeZone, in the same way that the name object has the same name as its Type (Object). Why: naming identifiers according to the respective type is an easy option to avoid mental mappings (which are usually associated with problem domain concepts).
Consequences: this naming practice might result in names that are harder to map to their purposes when used in larger scopes, and tends to cause misinformation when the type name changes but the identifier names do not (Martin, 2008; Alsuhaibani et al., 2021). Recommendations: avoid using Ditto-based names in very large scopes and/or in contexts in which other names can conflict with them (Martin, 2008).

3.2.4 Diminutive

This category encompasses identifier names that are a chunk of their respective Type name. Example: listener is an example of a name in this category when its associated Type is named EngineTestListener. In NFRuleSet ruleSet, the name ruleSet is likewise a chunk of its Type. Why: developers usually rely on short names to avoid overloading the reader with many concepts. Consequences: when used in large-scope contexts, names that fall into this category might impair code comprehension (Martin, 2008). Recommendations: programmers should use names that properly convey the identifier’s purpose within the local context and scope (Martin, 2008).

3.2.5 Cognome

Identifier names in this category contain the name of the respective Type as an additional suffix or prefix. Example: the identifier nameString includes in its name the respective Type name (String). Why: usually programmers resort to adding such suffixes or prefixes to names to help them remember the Types. Consequences: encoding the Type into names might place an extraneous cognitive load on the programmer (Martin, 2008; DiLeo, 2019). Recommendations: give identifiers names that are meaningful without having to resort to adding Type information to the names (Martin, 2008).

3.2.6 Index and Shorten

These categories represent similar naming practices: naming an identifier with a single-letter word. The Index category represents names with one arbitrary letter. Names in the Shorten category consist of the starting letter of their respective Types.
Example: the names Integer i and Integer j fall into the Index category, and Person p and String s are examples of Shorten names. Why: single-letter names are traditionally used to identify counters in loops. Consequences: single-letter names usually are not easy to locate in the source code (unsearchable) and, when employed in large scopes, can be hard to understand (Martin, 2008; DiLeo, 2019; Beniamini et al., 2017). Recommendations: use single-letter names only in local and small scopes; otherwise, intent-revealing names are better (Martin, 2008).

3.2.7 Famed

This category includes very common names; that is, names used when naming becomes arbitrary and programmers need to come up with convenient defaults. Famed names appear in almost every codebase, potentially in similar contexts, such as in loop statements (e.g., FOR). Example: the word i is a recurrent identifier name used in loops to denote counters. Why: very popular identifiers are part of the programmer mindset and can be quickly remembered and understood. Consequences: when used in an indiscriminate fashion, they may cause misinformation (Martin, 2008; Alsuhaibani et al., 2021). Recommendations: use intent-revealing names even in short-scope contexts (Martin, 2008; Alsuhaibani et al., 2021).

3.3 Data Extraction and Analysis

Projects selection. Our sample comprises 40 open-source Java projects and 40 C++ projects hosted on GitHub. These projects are listed in Tables 1 and 2. We included widely used projects, most of which have been under development for at least five years (e.g., fastjson, jenkins, junit4, mockito, retrofit, spring-boot, tomcat, pytorch, and tensorflow). Also, some projects were taken into account because they appear in a curated list of “awesome” projects⁴. Tables 1 and 2 give an overview of the examined projects. As shown in these tables, our Java and C++ samples cover somewhat small codebases (with fewer than 10K LoC) and large-scale ones (with over 100K LoC).
Overall, we selected heterogeneous Java and C++ projects from a broad range of domains: e.g., software testing, game design, web application development, image manipulation, and natural language processing. The selected projects also have a reasonable number of attribute, parameter, and variable names and were developed collaboratively by a diverse group of programmers. Therefore, we consider that we have selected a somewhat representative set of Java and C++ projects.

The Java projects were collected in July 2021 from GitHub by cloning and storing their respective repositories. In a similar fashion, we extracted the information from the selected C++ projects in January 2022. After storing the repositories, we extracted three common software metrics: (i) the total lines of code (we excluded non-functional code such as comments and white-spaces); (ii) the number of commits; and (iii) the number of contributors. To answer RQ3, we correlated these metrics with the prevalence of the categories in projects.

Names Extraction. In order to extract identifier names from each project, we created a parser based on the srcML tool (Collard et al., 2013). srcML is a multi-language parsing tool for the analysis and manipulation of source code. It turns source code into a document-oriented XML format (srcML5), which allows for queries using XPath. For example, the srcML format contains structural information (markup tags) about identifier declarations (), associated types (), and context ().

We extracted 2,603,381 names from the 80 collected projects. After applying the naming categorization (see Section 3.2), we obtained a total of 753,811 identifier names distributed across the categories (Kings, Median, Ditto, Diminutive, Cognome, Index, Shorten), as shown in Tables 1 and 2. The experimental package is available on GitHub.6
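To make the categorization step more concrete, the following is a minimal Python sketch of a rule-based classifier for a (name, Type) pair. It is not the parser used in the study: the function name classify, the case-insensitive comparison, and the order in which the rules are checked are assumptions made purely for illustration, and the Famed category is left out because it depends on frequencies over the whole dataset rather than on a single declaration.

```python
import re
from typing import Optional


def classify(name: str, type_name: str) -> Optional[str]:
    """Assign a (name, Type) pair to one naming category, or None.

    Hypothetical rule order: single-letter rules first, then digit-based
    rules, then rules that compare the name with its Type.
    """
    n, t = name.lower(), type_name.lower()

    # Single-letter names: first letter of the Type (Shorten) or arbitrary (Index).
    if len(n) == 1 and n.isalpha():
        return "Shorten" if t.startswith(n) else "Index"

    # Names ending with a number (Kings) or with a number elsewhere (Median).
    if re.search(r"\d$", name):
        return "Kings"
    if re.search(r"\d", name):
        return "Median"

    # Name spelled the same as its Type.
    if n == t:
        return "Ditto"

    # Name carries the whole Type as a prefix or suffix.
    if n.startswith(t) or n.endswith(t):
        return "Cognome"

    # Name is a chunk of its Type.
    if n in t:
        return "Diminutive"

    return None


if __name__ == "__main__":
    samples = [
        ("listener", "EngineTestListener"),
        ("ruleSet", "NFRuleSet"),
        ("nameString", "String"),
        ("p", "Person"),
        ("j", "Integer"),
    ]
    for name, type_name in samples:
        print(f"{type_name} {name}: {classify(name, type_name)}")
```

Note that under the case-insensitive assumption used here a declaration such as Integer i would be labelled Shorten rather than Index, so a faithful reimplementation would need the exact precedence rules from the experimental package.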
To investigate and get an overview of the elements in the Famed category, we used the entire dataset extracted from both programming languages. We examined the name of each extracted identifier and the associated Type to answer RQ1 and RQ3. Therefore, for each naming practice category we report the occurrences in the studied projects and across them. To answer RQ2 we analyzed the context where identifiers were declared.

Survey Design and Sampling. To answer RQ4 we designed an online questionnaire containing fifteen closed-ended questions related to naming practices. A brief description (in Portuguese) and an example accompanied these questions (see Appendix A). We also included two initial questions to collect the demographic information of the respondents. The respondents had to indicate their experience in software development as a single choice from four options (under two, two to five, six to ten, or over ten years) and also their education level (undergraduate, graduate, postgraduate).

We selected a web-based questionnaire to conduct our survey because it maximizes the number of possible respondents. Google Forms7 was chosen to host the questionnaire and enable data collection and pre-processing. The questionnaire was first trialed within the authors’ organizations, with one of the authors recording any issues observed. Some minor adjustments were made to ensure the consistency and clarity of the questions. Finally, the questionnaire link was posted to multiple websites (e.g., forums) and online groups (e.g., Discord, WhatsApp).

4 java-lang.github.io/awesome-java
5 srcml.org
6 github.com/rng-lab/naming-practices-analysis

4 Experimental Results

In this section, we present the results of our empirical study around the RQs described in the previous sections.

4.1 RQ1: How prevalent are the naming practice categories?
To answer RQ1, we analyzed the categories Kings, Median, Ditto, Diminutive, Cognome, Shorten, and Index regarding how commonly they appear in the projects in our samples. Tables 1 and 2 list how common each of these categories is across the 80 investigated projects. Considering the identifier names in the chosen Java projects, 20.79% end with numbers (Kings), 7.65% have numbers in their middle (Median), 26.24% are spelled the same as their Types (Ditto), 7.50% contain the whole Type as a sub-part (Cognome), 4.51% have in their spelling a sub-part of their respective Types (Diminutive), 3.35% are single-letter names composed of the first letter of their Types (Shorten), and 29.93% are arbitrary single-letter names (Index). As for the C++ projects in our sample, only approximately 7.28% of the identifier names fall into the Kings category, 53.24% of the identifiers are named according to their respective types (Ditto), around 9% follow the Cognome naming practice, 11.86% of the C++ identifier names are Diminutive, only 2.3% belong to the Shorten category, and approximately 13% of the C++ identifier names are single-letter names (Index).

Table 3. The top 10 names in Ditto category

Names     Num. Repetitions  Num. Projects
Ditto in Java programs
url       2,421             24
list      1,464             32
file      1,444             32
method    1,044             29
context   1,042             25
object    991               29
uri       968               25
node      844               21
type      593               30
date      526               25
Ditto in C++ programs
T         1,227             34
string    1,134             18
uint8_t   564               15
args      247               22
t         231               20
std       143               19
type      141               19
handle    96                17
mode      45                16

7 www.google.com/forms

These results indicate that the use of single-letter names (Index) is a widespread naming practice adopted in object-oriented programming. Indeed, Beniamini et al. (2017) have observed that single-letter names account for 9–20% of names in Java programs. As stated by them, the most commonly occurring single-letter name is i, and in some cases, j is also highly used.
In addition, we observed that single-letter names representing contractions of their respective Type are not so common (Shorten), but are prevalent across projects (see Section 4.3). Programmers seem to be conscious of the implications of single-letter names (Hofmeister et al., 2017), and thus avoid choosing such a naming practice: this category represents only 3.35% (14,088) of the examined Java names and 2.3% (7,689) of the identifier names in C++ projects.

Table 4. The most common names (Famed)

Names    Num. Repetitions  Num. Projects  Common Type           Num. Occurrences  Num. Different Types
Famed in Java programs
value    16,940            40             String                3,345             598
result   12,975            39             int                   1,924             887
name     11,374            40             String                10,208            116
i        11,172            39             int                   9,794             139
e        10,225            40             Throwable             1,851             589
index    8,224             38             int                   7,184             83
key      7,696             35             String                3,187             205
s        7,442             35             String                2,771             318
c        7,337             35             int                   1,468             441
t        6,989             37             Throwable             1,210             336
a        6,970             34             float                 739               575
b        6,511             38             int                   983               486
type     6,162             40             Class                 1,523             315
input    6,008             37             String                565               277
p        5,256             35             int                   381               443
source   5,025             37             String                765               263
n        5,010             34             int                   2,930             165
request  4,719             32             Request               1,489             212
context  4,437             37             Context               1,042             241
id       4,216             36             String                1,523             104
Famed in C++ programs
i        5,421             40             int                   2,362             151
value    3,912             40             double                427               268
x        3,856             36             double                858               250
result   3,771             40             T                     448               231
index    3,106             38             int                   869               88
n        3,027             37             int                   729               159
ctx      2,964             22             OpKernelConstruction  622               105
name     2,545             37             string                950               187
type     2,534             40             int                   306               426
b        2,370             39             bool                  386               219
p        2,351             37             void*                 190               412
size     2,285             39             size_t                619               119
context  2,279             34             OpKernelConstruction  501               133
s        2,254             35             Status                427               243
len      2,101             34             Uint32                463               47
node     2,093             30             Node                  154               286
v        1,983             38             double                118               253
data     1,832             37             void*                 441               211
val      1,821             35             int                   192               199
c        1,776             38             char                  246               199

Names that fall into the Ditto naming practice category make up the lion’s share of all identifier names in C++ projects (53.24%) and are the second most common naming practice in Java programs (26.24%). Even though it might be argued that Ditto is a sound naming practice, given that it leads to pronounceable names and many IDEs suggest names that include the identifier Type, in most cases the practice does not lead to the creation of intention-revealing names. Table 3 lists the ten most frequently recurring names in this category for Java and C++ projects. According to Table 3, identifier names such as list, object, args, uint8_t, and t are common, but these names do not reveal intentions. When the context is not explicit or is broad, programmers have to trace back what kinds of data are held by an identifier named list or t. These names are generic and hurt the reader’s understanding. Moreover, if the Type name changes, the identifier names become misleading, as in cases such as string and type. According to Avidan and Feitelson (2017), misleading names are the evil face of naming.

Figure 1. Naming practices distribution over Java programming statements (percentage of names per statement)

Statement Kings       Median      Ditto       Diminutive  Cognome     Index       Shorten
Attr      30.84%      13.56%      29.01%      6.20%       9.01%       10.63%      0.76%
For       17.49%      10.60%      13.15%      2.38%       6.32%       45.70%      4.36%
If        7.98%       2.07%       13.28%      2.84%       6.31%       52.99%      14.53%
Method    18.64%      3.46%       27.31%      5.43%       9.36%       32.78%      3.04%
Param     19.53%      8.90%       29.10%      2.95%       4.81%       32.46%      2.24%
Switch    12.16%      2.11%       14.32%      8.59%       4.62%       44.77%      13.42%
While     9.51%       0.91%       13.43%      2.11%       5.98%       55.79%      12.27%

The habit of choosing names that represent arbitrary sequential distinctions (Kings) also turned out to be a common practice among Java and C++ programmers.
However, number-series naming is considered a bad practice in object-oriented programming when creating meaningful names. Number-series naming is a non-informative option, which might disturb code comprehension and maintainability. The use of numbers in the middle of names, although present in the studied names, does not appear to be a recurrent naming practice. We observed that the most common numbers used in the middle of names are: (i) 0, 1, 2, 3, 4, 5, and 6, which denote some distinction; and (ii) 8, 16, 32, and 64, which identify names that might represent 8-, 16-, 32-, or 64-bit values, respectively.

Figure 2. Naming practices distribution over C++ programming statements (percentage of names per statement)

Statement Kings       Median      Ditto       Diminutive  Cognome     Index       Shorten
Attr      9.34%       9.16%       20.75%      24.27%      32.68%      3.41%       0.38%
For       21.98%      6.39%       26.81%      4.96%       4.78%       32.42%      2.65%
If        13.31%      3.75%       17.01%      6.42%       2.97%       44.80%      11.75%
Method    22.64%      6.14%       25.17%      8.44%       5.18%       28.61%      3.81%
Param     3.98%       1.68%       65.86%      10.54%      5.65%       10.32%      1.97%
Switch    9.59%       1.26%       20.25%      5.33%       1.26%       51.74%      10.56%
While     8.71%       3.42%       17.17%      7.49%       4.88%       48.90%      9.44%

The scenarios in which programmers choose names that are variants of their Type are also common. For example, names that contain their Type as a sub-part (Cognome) account for 7.50% of the identifier names in Java projects and around 9% in C++ programs. Often, these identifier names follow prefix/suffix (noise word) conventions, such as streetString, listPersons, and floatArg. Noise words are redundant and should never appear in names. In general, streetString is not better than street. Short names are in general easier to comprehend, and one of the first things a programmer can do to keep identifier names short is to avoid adding unnecessary information. In contrast, names that are part of their Type (Diminutive) are not so common.
These names are hard to search for and are not very meaningful in most contexts.

4.1.1 Very Common Names

Feitelson et al. (2020) observed that the probability of two programmers choosing the same name is low: the median probability was only 6.9%. At the same time, when a specific name is chosen, it is usually understood and often used by most programmers (Avidan and Feitelson, 2017; Swidan et al., 2017). In fact, we observed that there are some frequently used names. The Top-3 most common names in Java programs are (see Table 4): (i) value (16,940 occurrences); (ii) result (12,975 occurrences); and (iii) name (11,374 occurrences). It might be expected that i is a widespread name (Beniamini et al., 2017), but many other single-letter names are also commonly used across Java projects (e.g., e, s, c, t, a, b, p, n). Several of them are in the Top-10 most common names. Another interesting observation is that index and key are part of the Top-10 most common names. Overall, some of the common identifier names in Table 4 are popular in programmers’ vocabulary: value, result, name, index, key, type, input, source, request, context, id.

As for C++ programs, the three most common identifier names are (i) i (5,421 occurrences), (ii) value (3,912 occurrences), and (iii) x (3,856 occurrences). According to our results, many of the identifier names shown in Table 4 are widely common in programs written in both Java and C++: value, result, name, index, type, context, i, b, n, p, and s. It turns out that value appears among the top three most used identifier names in both Java and C++. Java programmers seem to have a slight preference for the names result and name in comparison to C++ programmers. As mentioned, some single-letter names are widely used by programmers in both languages, with i being the most commonly used single-letter name in Java and C++.
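The columns reported in Table 4 (repetitions, projects, most common Type, its occurrences, and number of different Types) can be derived from a flat list of declarations. The following Python sketch shows one way to do so; the function summarize_names, the record layout (project, name, Type), and the toy records are hypothetical and are not taken from the study's experimental package.

```python
from collections import Counter, defaultdict


def summarize_names(records):
    """Aggregate (project, name, type) records into Table 4-style rows."""
    types_by_name = defaultdict(Counter)   # name -> how often each Type occurs
    projects_by_name = defaultdict(set)    # name -> projects that use the name

    for project, name, type_name in records:
        types_by_name[name][type_name] += 1
        projects_by_name[name].add(project)

    rows = []
    for name, type_counts in types_by_name.items():
        common_type, occurrences = type_counts.most_common(1)[0]
        rows.append({
            "name": name,
            "repetitions": sum(type_counts.values()),
            "projects": len(projects_by_name[name]),
            "common_type": common_type,
            "occurrences": occurrences,
            "different_types": len(type_counts),
        })

    # Most repeated names first, as in Table 4.
    rows.sort(key=lambda row: row["repetitions"], reverse=True)
    return rows


# Toy, made-up records for illustration only.
RECORDS = [
    ("project-a", "i", "int"),
    ("project-a", "i", "int"),
    ("project-b", "i", "long"),
    ("project-a", "name", "String"),
    ("project-b", "name", "String"),
]

if __name__ == "__main__":
    for row in summarize_names(RECORDS):
        print(row)
```

Keeping one Counter per name makes it straightforward to report both the dominant Type and how many distinct Types a name is bound to, which is the comparison the discussion of Table 4 relies on.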
Further analysis of the names in Table 4 and their corresponding most common Types led to interesting results about programmers’ rationale when programming in Java and C++. As noted by Beniamini et al. (2017), analyzing this link yields interesting results because it makes it possible to understand the meaning related to names frequently used by programmers, especially single-letter names. We can observe that most identifier names are associated with int variables (e.g., result, i, index, c, b, p, n) or String Types (e.g., value, name, key, s, input, source, id). As shown in a survey conducted by Beniamini et al. (2017), single-letter names such as i and j are understood as counter variables (integer values) and most of the time used as loop control variables.

There are other interesting findings. For example, in Java programs the single-letter name e is usually correlated with error and exception (Beniamini et al., 2017). Our results show that e is mainly associated with the Throwable Type. In the same way, s is a single-letter name essentially associated with String (see Table 4). However, we also found some counter-intuitive results. For instance, contrary to our expectations, we observed that in programs written in Java the single-letter name b is not linked with boolean values (Beniamini et al., 2017) but with integer values. Additionally, the identifier name t is mainly associated with Throwable, which is somewhat counter-intuitive because t is also often used to name time-related constants and variables, or variables that hold temporary values (Beniamini et al., 2017).

Other names that seem to have meaningful associations are the following: type, which is generally associated with the Class Type; and context and request, which are often associated with the Context and Request Types. Our results would seem to suggest that the underlying meaning of the identifier names varies a lot.
For example, the name result was associated with 855 different Types. The name i, which intuitively is associated with an index (int), also takes on 139 other Types. Nevertheless, in most cases (9,794 out of 11,172), this name is associated with integer values. The name name seems to be usually associated with the String Type: 10,208 out of 11,374 occurrences are associated with String.

4.2 RQ2: Are there context-specific naming practices categories?

To answer RQ2, we investigated the predominance of the naming practice categories over particular contexts (ATTRIBUTE, PARAMETER, METHOD, FOR, WHILE, IF, and SWITCH). The results are presented in Figures 1 and 2. We found that while some naming conventions (Allamanis et al., 2014) acknowledge the use of single-letter words (Index and Shorten) to name a local, temporary, or loop variable, this practice is much more pervasive than any other. The exception is the naming of attributes in Java and C++, in which case Java programmers prioritize the use of the Ditto and Kings naming practices while C++ programmers tend to use Cognome, Ditto, and Diminutive. Surprisingly, names with numbers at the end appear 30,655 times in our study as Java attributes and only 4,066 times as class attributes in C++ projects. Especially in large-scope contexts, Kings names should always be avoided by programmers. In contrast, using Ditto names in such a case seems to be a reasonable choice. IDEs (e.g., Eclipse and IntelliJ IDEA) usually analyze the scope and generate suggestions from the current context, and these suggestions often include information regarding the respective Type.

Focusing on particular contexts, we might see that programmers’ practices are context-specific. For example, the use of practices that might result in meaningful names (e.g., Ditto) is more common in long-scope contexts (ATTRIBUTE and METHOD) than in short-scope ones (IF, FOR, WHILE, SWITCH). Especially in C++ projects, Ditto makes up the lion’s share of the parameter names. Java and C++ programmers seem to adopt less descriptive names in the context of switch and while statements. As shown in Figures 1 and 2, Index names appear more often inside contexts surrounded by if, for, switch, and while statements, where their occurrence is widely accepted (Kernighan and Pike, 1999; Beniamini et al., 2017). However, as observed by Avidan and Feitelson (2017), hiding plural names behind single-letter words may camouflage the meaning of the respective identifier. It might not be a natural interpretation that the identifier stores more than one object.

Table 5. Spearman correlation

            LoC                                 Commits                             Committers
            Java              C++               Java              C++               Java              C++
Category    Corr     p-value  Corr     p-value  Corr     p-value  Corr     p-value  Corr     p-value  Corr     p-value
Kings       0.337    0.038    0.391    0.014    0.150    0.365    0.199    0.222    0.053    0.748    0.090    0.583
Median      0.254    0.123    0.004    0.978    0.054    0.743    -0.197   0.226    -0.081   0.627    0.070    0.668
Ditto       -0.517   0.001    -0.049   0.763    -0.216   0.191    0.074    0.649    0.101    0.545    -0.041   0.801
Diminutive  -0.021   0.898    0.335    0.037    0.008    0.959    0.225    0.166    -0.171   0.304    -0.025   0.875
Cognome     -0.227   0.169    0.268    0.098    -0.300   0.066    0.188    0.250    -0.178   0.283    -0.103   0.532
Index       0.341    0.036    -0.330   0.040    0.133    0.421    -0.311   0.054    -0.098   0.554    0.010    0.950
Shorten     0.387    0.016    -0.196   0.229    0.124    0.453    -0.110   0.501    -0.068   0.681    0.128    0.435

The predominance of Kings and Index as parameter names does not agree with the findings of Avidan and Feitelson (2017). Their experiment indicated that parameter names contribute more to code comprehension than any other names (e.g., attributes or local variables).
Since parameters are part of the method header and the starting point of the comprehension task, programmers pay special attention to parameter names in order to better understand the method behavior (Avidan and Feitelson, 2017). However, every naming practice category we studied is used to name parameters, although, as observed by Avidan and Feitelson (2017), parameter names are often more carefully chosen by programmers.

4.3 RQ3: Do the naming practice categories carry over across different Java and C++ projects?

To answer RQ3, we analyzed the prevalence of the naming practices across multiple projects. Tables 1 and 2 list the categories by project. All selected projects turned out to have problematic names, which suggests that the investigated naming practice categories are probably not uncommon. Even the most popular projects have naming practices which might result in meaningless names (e.g., fastjson, jenkins, junit4, mockito, retrofit, spring-boot, tomcat, tensorflow, and pytorch).

As highlighted in Tables 1 and 2, Ditto and Index are very common naming practices. In particular, these practices are dominant (representing more than 50% of analyzed identifiers) in some projects. For example, Ditto names are widely used in Java and C++ programs, accounting for 85.08% of the identifier names in riptide (Java), 80.78% in clickhouse (C++), 78.26% in webmagic (Java), 72.93% in kdenlive (C++), 68.60% in mysql-server (C++), 68.32% in percona-server (C++), 65.99% in keywhiz, and 54.46% in aeron. The problem with Ditto is that when the Type changes, the identifier name might lose its meaning (Scalabrino et al., 2017). Index names appear to be more common in Java programs. For instance, these identifier names account for 58.93% of all identifiers in boofcv (Java) and 66.53% in rxjava (Java).
It would seem that Index names are not very common in C++: proxysql, which is the program in which Index names are most common, has around 34.7% of its identifier names following this naming practice. rocksdb and citra also include a substantial number of identifiers named according to the Index naming practice: 34.71% and 30.30%, respectively. In some isolated cases, one naming practice seems to be dominant, such as Kings in fastjson (49.88%) and libgdx (47.83%).

On the other hand, the naming practices Cognome, Diminutive, and Shorten are not dominant in any specific project. Specifically, Shorten seems to be a naming practice that most programmers try to avoid: programmers avoid naming identifiers using the first letter of the Type. As mentioned, Shorten names usually are not easy to search for in the source code and, when employed in large-scope contexts, they tend to be hard to understand.

To better comprehend whether a project’s characteristics may influence the prevalence of one practice, we looked at the correlation between common software metrics (e.g., lines of code, number of contributors, and number of commits) and the predominance of the naming practice categories. Table 5 summarizes the Spearman test results. The results show no strong correlation between the investigated project characteristics and the categories of naming practices. Overall, we can observe a low correlation between the number of contributors and the prevalence of any category. One might surmise that an increase in the number of programmers might be beneficial towards removing bad naming practices. However, this does not seem to be the case. The same rationale might be applied to the number of commits: as the project evolves, the quality of the identifier names might evolve or decay.
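For readers unfamiliar with the statistic reported in Table 5, the sketch below computes Spearman's rho as the Pearson correlation of average ranks. It is only an illustration of the technique: the helper names average_ranks and spearman_rho and the five-project toy data are hypothetical, the p-values shown in Table 5 are not computed here, and the statistical tooling actually used in the study is not stated in the text.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up values: project size (LoC) and share of Ditto names (%).
    loc = [8_000, 25_000, 60_000, 150_000, 400_000]
    ditto_share = [62.0, 48.5, 51.0, 30.2, 21.7]
    print(round(spearman_rho(loc, ditto_share), 3))
```

Because the statistic depends only on ranks, it is insensitive to the large spread in project sizes, which is one reason a rank correlation is a natural fit for metrics such as LoC.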
In contrast to Deissenboeck and Pizka (2006), who stated that identifier names are subject to decay during software evolution, our results suggest that this might not be the case. Looking specifically at LoC, we observe some noteworthy correlations. For example, there is a negative correlation (rho -0.517) between size and the category Ditto (for Java programs). Therefore, names spelled in the same way as their respective Types tend to be far more common in small projects. On the other hand, large Java projects might tend to contain names involving practices such as Index (rho 0.341) and Shorten (rho 0.387).

Figure 3. Respondents Demographics
(a) Respondents experience in software development: Less than 2 years, 5.8%; Between 2 and 5 years, 38.5%; Between 5 and 10 years, 32.7%; More than 10 years, 23.1%
(b) Respondents education level: Undergraduate, 26.9%; Graduate, 44.2%; Graduand, 28.8%

Figure 4. Naming practices distribution over programming statements (percentage of respondents per frequency of use)

Category    Never   Rarely  Occasionally  Often   Very Often
Kings       30.8%   48.1%   17.3%         3.8%    0.0%
Median      73.1%   19.2%   7.7%          0.0%    0.0%
Ditto       50.0%   9.6%    17.3%         15.4%   7.7%
Diminutive  11.5%   11.5%   46.2%         21.2%   9.6%
Cognome     36.5%   28.8%   19.2%         13.5%   1.9%
Index       25.0%   21.2%   26.9%         17.3%   9.6%
Shorten     40.4%   26.9%   19.2%         13.5%   0.0%

As shown in Table 1, Ditto and Index are the most dominant practices across Java projects. Taken together, these two categories account for 235,886 identifier names, representing 56.17% of all analyzed names in Java projects. These results are consistent with the findings of Beniamini et al. (2017). Although code conventions and style guides may constrain identifier naming practices, programmers seem to be heavily influenced by IDEs’ content assist capabilities. As programmers work in the editor, content assist analyzes their code and recommends elements to complete partially entered statements.
Therefore, it is indispensable to provide more sophisticated and context-aware capabilities to assist programmers in naming and renaming identifiers (Jiang et al., 2019; Isobe and Tamada, 2018; Peruma et al., 2018, 2019). Finally, programmers would seem to prioritize single-letter names in contexts where they are widely accepted (see Section 4.2).

4.4 RQ4: What is the perception of software developers about the investigated naming categories?

This section presents the results of our survey with 52 programmers. We start by characterizing the respondents (Section 4.4.1). Next, we assess the relevance of the naming practice categories by how often they are used by programmers (Section 4.4.2). We then analyze how the adoption of naming practice categories varies according to programming statements (Section 4.4.3).

4.4.1 Respondents’ Demographics

Figure 3 depicts the respondents’ experience in software development and the corresponding frequencies and percentages. A total of 5.8% of the respondents have less than two years of experience, while 55.8% have more than five years of experience, suggesting that most survey respondents are experienced programmers. Moreover, we seem to have collected a reasonably balanced distribution of programmers in terms of education level. Figure 3 shows the respondents’ education level. As the majority of the respondents (73%) have a graduate degree, we claim that it increases our confidence in the validity of the responses.

Figure 5. Naming practices distribution over programming statements (number of respondents per location)

Category    Attribute  Method  Loop  Conditional  None
Kings       14         24      8     4            16
Median      4          11      0     0            39
Ditto       19         23      13    12           22
Diminutive  27         39      20    18           7
Cognome     21         30      10    12           17
Index       4          6       43    13           9
Shorten     10         19      13    8            22

4.4.2 Most Commonly Used Naming Practices

The respondents were queried about how often they choose identifier names conforming to the naming practice categories.
A five-point Likert scale was used to capture respondent opinions, ranging from “Never” to “Very Often”. Figure 4 shows how frequently respondents have been using each naming practice category. In our sample, Diminutive is the most frequently used naming practice category (i.e., used “Often” or “Very Often”), followed by Index and Ditto. This result seems to align well with our observation about the prevalence of the naming practices in open-source object-oriented programming (see Section 4.1).

Notably, from the survey, we can make the following observations:

• All the respondents adopt at least one naming practice category “Occasionally” or “Often”, with 26% (13) of the respondents claiming to adopt at least one naming practice “Very Often”.
• Diminutive is the naming practice category most adopted by respondents. However, as we could observe, this naming practice category is not as prevalent in the analyzed object-oriented projects (see Section 4.1) as claimed by the surveyed programmers.
• Median is the least adopted naming practice category (see Figure 4), with just 26% (14) of the respondents using it “Rarely” or “Occasionally”. The lower use of this naming practice corroborates our observation that programmers seem to be conscious of this harmful practice in object-oriented programming.
• Ditto is not a widespread naming practice among the survey respondents. Only 12 out of 52 programmers (23%) indicated a tendency to write identifier names spelled in the same way as their Types, which does not corroborate our previous observations about the prevalence of Ditto across Java and C++ projects (see Section 4.3). This contrasting result suggests that programmers might not be aware of their general use of naming practices. Moreover, this might also be a sign that naming assistant features present in modern IDEs do not influence the respondents.
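The ranking stated above can be re-derived from the percentages reported in Figure 4. The short Python sketch below adds the “Often” and “Very Often” shares per category and converts a share back into a respondent count; the variable and function names (LIKERT, frequent_share, to_count) are illustrative choices, and the conversion assumes the 52 respondents reported for the survey.

```python
RESPONDENTS = 52

# Percentages from Figure 4, ordered as: Never, Rarely, Occasionally, Often, Very Often.
LIKERT = {
    "Kings":      [30.8, 48.1, 17.3, 3.8, 0.0],
    "Median":     [73.1, 19.2, 7.7, 0.0, 0.0],
    "Ditto":      [50.0, 9.6, 17.3, 15.4, 7.7],
    "Diminutive": [11.5, 11.5, 46.2, 21.2, 9.6],
    "Cognome":    [36.5, 28.8, 19.2, 13.5, 1.9],
    "Index":      [25.0, 21.2, 26.9, 17.3, 9.6],
    "Shorten":    [40.4, 26.9, 19.2, 13.5, 0.0],
}


def frequent_share(row):
    """Share of respondents answering 'Often' or 'Very Often' (in percent)."""
    return round(row[3] + row[4], 1)


def to_count(share):
    """Convert a percentage back to a number of respondents."""
    return round(share * RESPONDENTS / 100)


# Categories ordered from most to least frequently used.
ranking = sorted(LIKERT, key=lambda cat: frequent_share(LIKERT[cat]), reverse=True)

if __name__ == "__main__":
    for cat in ranking:
        share = frequent_share(LIKERT[cat])
        print(f"{cat}: {share}% (about {to_count(share)} respondents)")
```

Under these assumptions the top three categories come out as Diminutive, Index, and Ditto, and the Ditto share maps back to the 12 respondents mentioned in the last bullet.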
4.4.3 Most Commonly Used Naming Practices According to Context

In order to specify the location in which programmers mainly observe the occurrences of the naming practice categories, the respondents were allowed to select multiple locations (ATTRIBUTE, METHOD, LOOP, CONDITIONAL, and NONE). Respondents were expected to answer by recalling instances of the naming practice categories encountered in their software development work.

The two most common answers from the respondents were ATTRIBUTE and METHOD (see Figure 5). These findings share similarities with those presented in Section 4.2, wherein 56% of the names occur as ATTRIBUTE or are declared in the context of METHOD. One notable exception is Index, in which case 43 out of 52 respondents indicated that this naming practice occurs mainly inside contexts surrounded by LOOP statements (FOR or WHILE). Indeed, as observed by Beniamini et al. (2017), single-letter names can be used safely in a short-scope context. Finally, as expected, the majority of respondents (39 out of 52) indicated that they usually do not observe Median in their daily work (see Figure 5).

5 Threats to Validity

As with most empirical studies, our study has some practical limitations, i.e., it is subject to some threats to its validity. In this section, we present potential threats and how we tried to mitigate some of those issues.

Conclusion & External Validity. One potential threat is that the samples we used in our study might not be representative of the target population: our analysis took into account 40 open-source Java projects and 40 C++ projects. To mitigate this threat concerning the conclusion and generalization of the study results, we tried to select a heterogeneous sample.
We think the impact of this threat is minimal for three reasons: (i) Java and C++ are two popular programming languages;8 (ii) our sample covers somewhat small codebases (with less than 10K LoC) and large-scale ones (with over 100K LoC); and (iii) we selected projects from a broad range of domains. Thus, we argue that our study can be seen as an initial step towards identifying trends Java and C++ programmers follow when picking identifier names. However, given the sizes of our samples, we cannot rule out the possibility that our results do not reflect how Java and C++ programmers name identifiers. That is, the results might not be generalizable beyond the study samples and the participants that took part in our survey.

To understand the prevalence of naming categories across Java and C++ projects, we employed a set of metrics: program size (LoC), number of commits, and number of contributors. Nevertheless, as with many software metrics, one potential threat is that these measurements might not be sophisticated enough for our investigation. Thus, our findings might not carry over to other settings and similar programming languages.

It is also worth emphasizing that context and scope would seem to play an important role in determining identifier names. For instance, some of the most common identifier names listed in Table 3 would seem to be context-dependent, e.g., node. We surmise that is the case because programmers might want to include relevant domain information when turning concepts into names. Although we tried our best to maximize the sample heterogeneity during sample selection, we cannot rule out the fact that the most common domains (e.g., XML file parsing) from which the programs in our sample were extracted might have an impact on variable naming.

Finally, the representativeness of the survey respondents cannot be guaranteed.
Our target population was programmers, but we did not take any measures to verify the identity of the respondents. However, we have included two initial questions, which might have permitted us to filter out individuals not belonging to our target population. There might also exist some other factors that bias our conclusions. One example is the environment in which the respondents worked. Another one is whether or not respondents have a correct understanding of each category. To mitigate the latter, we included in the questionnaire a brief description and an example of the categories. Future studies can ask respondents to consider this factor and evaluate how it impacts the naming practice category adoption.

8 www.tiobe.com/tiobe-index/

Construct & Internal Validity. A threat to the construct validity of our study comes from the number of identifier names we analyzed in our study. It might be argued that a larger number of names may lead to better and more conclusive results. To mitigate this threat we analyzed 2,603,381 identifier names in highly diverse sets of Java and C++ projects. Additionally, another potential threat has to do with how well the naming practices we identified reflect extant research and current industry practices. We tried to mitigate this threat by drawing from previous research, which helped us to get a better understanding regarding whether or not some of the naming practices we identified are indeed recurring practices. We also conducted a survey with 52 participants in order to gather programmers’ perceptions about the use and occurrence of the investigated naming practices.

We tried to minimize possible threats to construct and internal validity associated with the survey by disseminating it online through multiple websites and online groups, and by introducing a brief description and an example for each question.

6 Conclusion

Coming up with proper identifier names is challenging (Brooks, 1983).
As stated by Host and Ostvold (2007), even though programmers have to name identifiers on a daily basis, it still entails a great deal of time and thought. To make matters more challenging, identifier names are pivotal for program comprehension: developers have to go over identifier names to comprehend the code that they need to update, and poorly chosen names might hinder source code comprehension (Avidan and Feitelson, 2017). Given that it has been estimated that identifiers contribute to about 70% of a software system’s codebase (Deissenboeck and Pizka, 2006), there is a clear need to define what makes up a good identifier as well as to assist developers in naming identifiers. Similarly, identifying practices that result in poor identifier names might enhance programmers’ awareness and contribute to improving educational materials and code review methods.

As an initial foray into creating an approach to optimal identifier naming (i.e., how to assign the proper words to an identifier), we investigated eight naming practice categories “in the wild”. The categories provide examples of naming practices from real-world software projects. We illustrated their possible consequences and also outlined their prevalence across projects and code contexts (i.e., ATTRIBUTE, PARAMETER, METHOD, FOR, WHILE, IF, and SWITCH). Our results, based on 2,603,381 identifier names extracted from 80 real-world Java and C++ projects and on a survey, would seem to suggest the following:

• The eight categories are recurrently found in practice, but two are more common in Java and C++ projects: naming identifiers after their Type (Ditto) and using single-letter names to denote counters (Index).
Specifically, Index and Ditto are by far the most frequently occurring naming practices across Java projects: Index occurrences account for approximately 30% of all naming practice occurrences in the examined Java projects, while Ditto occurrences amount to roughly 27%. As for C++ programs, Ditto is the most widely used naming practice, accounting for around 54% of all naming practice occurrences. Index and Diminutive are also popular among C++ coders, accounting for 13% and 11% of all naming practice occurrences, respectively. Shorten seems to be the least used naming practice among both Java and C++ programmers. Additionally, programmers seem to be hardly influenced by IDE-like features that help them to choose identifier names, although only 12 out of 52 surveyed programmers (23%) acknowledged a tendency to write identifier names spelled in the same way as their Types;
• There are several very common names (e.g., value, result, and name) and recurrent single-letter names (e.g., i, e, s, c) used in practice. The lion’s share of these names are used to denote identifiers that store either integer or string values. According to our results, single-letter identifiers are more commonly used by Java programmers: i, e, s, c, t, a, b, p, and n would seem to be widely used by programmers. In C++ (in contrast to Java), coders tend to prefer a smaller set of single-letter names: i, s, b, p, and n. Thus, differently from Java, in C++ e, c, t, and a do not rank among the most common single-letter identifier names;
• Programmers’ naming practices are context-specific: single-letter names (Index and Shorten) seem to be more common in short-scope contexts (IF, FOR, WHILE), although they can also be found in large-scope contexts (e.g., ATTRIBUTE).
Results from our survey questionnaire showed that programmers acknowledge that the Index naming practice occurs mainly inside contexts surrounded;
• Diminutive is the naming practice category most adopted by survey respondents, and Median is the least used one. All the respondents adopt at least one naming practice category “Occasionally” or “Often”.
• We could benefit from including poor naming practices in code reviews. Current practices follow extensive checklists, but none of them addresses naming issues. A more nuanced take is to consider variable names that depart from commonly used naming practices as a potential source of problems.

We believe our results have the potential to inspire several future research directions. Our work highlights the need for further research on how prevalent naming practices are in source code and how better names can be chosen. In this direction, an aspiring goal would be to devise tools capable of automatically evaluating names and suggesting renaming opportunities during code review. Similarly, code generation tools can capitalize on commonly used naming practices to generate names automatically. Additionally, since our results would seem to suggest that some identifier names are context-dependent, we believe that tools (e.g., IDE-based identifier name recommendation systems) can take advantage of context information during software development by constantly monitoring how programmers name identifiers, so that they can help developers new to a given project through the automated recognition of context- and project-specific naming conventions. Such an automated identifier naming assistant can support developers by identifying inappropriate naming choices and making recommendations. Accordingly, our long-term goal is to support the identification of opportunities to rename identifiers and to understand more about programmers’ naming practices.
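As a hedged illustration of the kind of automated assistance discussed above, the following Java sketch labels a (Type, name) pair with one of seven naming practice categories using simple string rules. It is not the tooling used in this study: the class NamingPracticeSketch, the classify method, and the order in which the rules are applied are hypothetical. The rules follow the example declarations in the survey questionnaire (Appendix A), under the assumption that items 1 to 7 correspond, in order, to Kings, Median, Ditto, Diminutive, Cognome, Index, and Shorten; Famed is left out of the sketch. Treating i, j, and k as counters regardless of their Type is a further simplifying assumption.

```java
import java.util.Locale;

// Hypothetical rule-based sketch; not the tooling used in the study.
class NamingPracticeSketch {

    // Seven of the eight categories; Famed is intentionally left out.
    public enum Practice { KINGS, MEDIAN, DITTO, DIMINUTIVE, COGNOME, INDEX, SHORTEN, NONE }

    // Labels a declaration such as "String nameString" from its Type and name.
    // Generic or qualified Types (e.g., List<String>) are not handled.
    public static Practice classify(String type, String name) {
        String t = type.toLowerCase(Locale.ROOT);
        String n = name.toLowerCase(Locale.ROOT);
        if (t.isEmpty() || n.isEmpty()) {
            return Practice.NONE;
        }
        // Single letters: conventional counters (assumed here: i, j, k) are Index;
        // otherwise the initial letter of the Type is Shorten, any other letter Index.
        if (n.length() == 1 && Character.isLetter(n.charAt(0))) {
            if ("ijk".contains(n)) {
                return Practice.INDEX;
            }
            return t.startsWith(n) ? Practice.SHORTEN : Practice.INDEX;
        }
        if (n.matches(".*\\d")) {
            return Practice.KINGS;       // number at the end, e.g., people1
        }
        if (n.matches(".+\\d.+")) {
            return Practice.MEDIAN;      // number in the middle, e.g., int2char
        }
        if (n.equals(t)) {
            return Practice.DITTO;       // same spelling as the Type, e.g., String string
        }
        if (n.length() > t.length() && (n.startsWith(t) || n.endsWith(t))) {
            return Practice.COGNOME;     // Type name as prefix or suffix, e.g., nameString
        }
        if (t.contains(n)) {
            return Practice.DIMINUTIVE;  // chunk of the Type name, e.g., listener
        }
        return Practice.NONE;
    }

    public static void main(String[] args) {
        // Example declarations taken from the survey questionnaire.
        String[][] declarations = {
            {"People", "people1"}, {"Char", "int2char"}, {"String", "string"},
            {"EngineExecutionTestListener", "listener"}, {"String", "nameString"},
            {"Integer", "j"}, {"People", "p"}
        };
        for (String[] declaration : declarations) {
            System.out.println(declaration[0] + " " + declaration[1]
                + " -> " + classify(declaration[0], declaration[1]));
        }
    }
}
```

The digit rules are checked before the Type-based rules so that a name such as int2char is labelled Median rather than Cognome; a real assistant would also need the code context (e.g., FOR or ATTRIBUTE), which this sketch ignores.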
Finally, as future work, we plan to perform a qualitative study on commits, code changes, and review discussions. Another possible future research avenue would be to account for the role of human factors in choosing identifier names by exploring how programmer experience, team size, and mood influence naming practices throughout different software projects.

Although our results give practitioners and researchers alike a good glimpse into the most common options for naming identifiers in C++ and Java, we did not investigate how each naming practice contributes, if at all, to improving code comprehension. Therefore, future research efforts should aim to better understand how these commonly used naming practices influence readability during code comprehension.

References

Allamanis, M., Barr, E. T., Bird, C., and Sutton, C. (2014). Learning natural coding conventions. In International Symposium on Foundations of Software Engineering.
Alsuhaibani, R. S., Newman, C. D., Decker, M. J., Collard, M. L., and Maletic, J. I. (2021). On the naming of methods: A survey of professional developers. In International Conference on Software Engineering.
Arnaoudova, V., Di Penta, M., and Antoniol, G. (2016). Linguistic antipatterns: What they are and how developers perceive them. Empirical Software Engineering, 21(1):104–158.
Avidan, E. and Feitelson, D. G. (2017). Effects of variable names on comprehension: An empirical study. In 25th International Conference on Program Comprehension.
Beniamini, G., Gingichashvili, S., Orbach, A. K., and Feitelson, D. G. (2017). Meaningful identifier names: The case of single-letter variables. In International Conference on Program Comprehension, pages 45–54.
Brooks, R. (1983). Towards a theory of the comprehension of computer programs. International Journal of Man-Machine Studies, 18(6):543–554.
Brown, W. H., Malveau, R. C., McCormick, H. W. S., and Mowbray, T. J. (1998).
AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis. John Wiley & Sons, Inc., USA, 1st edition.
Butler, S., Wermelinger, M., Yu, Y., and Sharp, H. (2010). Exploring the influence of identifier names on code quality: An empirical study. In 2010 14th European Conference on Software Maintenance and Reengineering, pages 156–165. IEEE.
Caprile, B. and Tonella, P. (2000). Restructuring program identifier names. In ICSM, pages 97–107.
Charitsis, C., Piech, C., and Mitchell, J. (2021). Assessing function names and quantifying the relationship between identifiers and their functionality to improve them. In Conference on Learning @ Scale.
Collard, M. L., Decker, M. J., and Maletic, J. I. (2013). srcML: An infrastructure for the exploration, analysis, and manipulation of source code: A tool demonstration. In 2013 IEEE International Conference on Software Maintenance, pages 516–519. IEEE.
Deissenboeck, F. and Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3):261–282.
DiLeo, C. (2019). Clean ruby.
dos Santos, R. M. and Gerosa, M. A. (2018). Impacts of coding practices on readability. In International Conference on Program Comprehension.
Fakhoury, S., Ma, Y., Arnaoudova, V., and Adesope, O. (2018). The effect of poor source code lexicon and readability on developers’ cognitive load. In International Conference on Program Comprehension.
Feitelson, D., Mizrahi, A., Noy, N., Shabat, A. B., Eliyahu, O., and Sheffer, R. (2020). How developers choose names. IEEE Transactions on Software Engineering.
Gresta, R. and Cirilo, E. (2020). Contextual similarity among identifier names: An empirical study. In Workshop de Visualização, Evolução e Manutenção de Software, pages 49–56. SBC.
Gresta, R., Durelli, V., and Cirilo, E. (2021). Naming Practices in Java Projects: An Empirical Study. In XX Brazilian Symposium on Software Quality, pages 1–10. ACM.
Hofmeister, J., Siegmund, J., and Holt, D. V. (2017).
Shorter identifier names take longer to comprehend. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 217–227. IEEE.
Host, E. W. and Ostvold, B. M. (2007). The programmer’s lexicon, volume i: The verbs. In International Working Conference on Source Code Analysis and Manipulation.
Isobe, Y. and Tamada, H. (2018). Are identifier renaming methods secure? In International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.
Jiang, L., Liu, H., and Jiang, H. (2019). Machine learning based recommendation of method names: How far are we. In International Conference on Automated Software Engineering.
Kawamoto, K. and Mizuno, O. (2012). Predicting fault-prone modules using the length of identifiers. In 2012 Fourth International Workshop on Empirical Software Engineering in Practice, pages 30–34. IEEE.
Kernighan, B. W. and Pike, R. (1999). The Practice of Programming. Addison-Wesley Longman Publishing Co., Inc.
Lawrie, D., Feild, H., and Binkley, D. (2007a). Quantifying identifier quality: an analysis of trends. Empirical Software Engineering, 12(4):359–388.
Lawrie, D., Morrell, C., and Feild, H. (2007b). Effective identifier names for comprehension and memory. Innovations Syst Softw Eng, 3(1):303–318.
Lawrie, D., Morrell, C., Feild, H., and Binkley, D. (2006). What’s in a name? a study of identifiers. In 14th IEEE International Conference on Program Comprehension.
Marcus, A., Sergeyev, A., Rajlich, V., and Maletic, J. I. (2004). An information retrieval approach to concept location in source code. In 11th Working Conference on Reverse Engineering, pages 214–223. IEEE.
Martin, R. C. (2008). Clean code: A handbook of agile software craftsmanship.
Nyamawe, A. S., Bakhti, K., and Sandiwarno, S. (2021). Identifying rename refactoring opportunities based on feature requests.
International Journal of Computers and Applications, pages 1–9.
Oliveira, D., Bruno, R., Madeiral, F., and Castor, F. (2020). Evaluating code readability and legibility: An examination of human-centric studies. In International Conference on Software Maintenance and Evolution.
Peruma, A., Mkaouer, M. W., Decker, M. J., and Newman, C. D. (2018). An empirical investigation of how and why developers rename identifiers. In 2nd International Workshop on Refactoring.
Peruma, A., Mkaouer, M. W., Decker, M. J., and Newman, C. D. (2019). Contextualizing rename decisions using refactorings and commit messages. In International Working Conference on Source Code Analysis and Manipulation.
Ratiu, D. and Deissenboeck, F. (2006). Programs are knowledge bases. In 14th IEEE International Conference on Program Comprehension (ICPC’06), pages 79–83. IEEE.
Scalabrino, S., Bavota, G., Vendome, C., Linares-Vásquez, M., Poshyvanyk, D., and Oliveto, R. (2017). Automatically assessing code understandability: How far are we? In International Conference on Automated Software Engineering.
Schankin, A., Berger, A., Holt, D. V., Hofmeister, J. C., Riedel, T., and Beigl, M. (2018). Descriptive compound identifier names improve source code comprehension. In International Conference on Program Comprehension.
Swidan, A., Serebrenik, A., and Hermans, F. (2017). How do scratch programmers name variables and procedures? In International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 51–60.
Takang, A. A., Grubb, P. A., and Macredie, R. D. (1996). The effects of comments and identifier names on program comprehensibility: an experimental investigation. J. Prog. Lang., 4(3):143–167.
Tofte, M. and Talpin, J.-P. (1997). Region-based memory management. Information and Computation, 132(2):109–176.
Wainakh, Y., Rauf, M., and Pradel, M. (2021). Idbench: Evaluating semantic representations of identifier names in source code.
In International Conference on Software Engineering.

Appendix A Survey Questionnaire

Education level
◦ Undergraduate ◦ Graduate ◦ Graduand

Experience in software development
◦ Under two years ◦ Two to five years ◦ Six to ten years ◦ Over ten years

1. How often do you choose identifier names with numbers at the end? Examples: People people1; People people2
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see identifier names with numbers at the end?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None

2. How often do you choose identifier names with numbers in the middle? Example: Char int2char
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see identifier names with numbers in the middle?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None

3. How often do you name identifiers after their Type names? Examples: String string, People people
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see identifier names spelled in the same way as their Types?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None

4. How often do you name identifiers as a chunk of their respective Type name? Examples: EngineExecutionTestListener listener
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see identifier names as a chunk of their respective Type name?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None

5. How often do you include in identifier names an additional suffix or prefix that is the name of the respective Type? Examples: String nameString
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see identifier names containing an additional suffix or prefix that is the name of the respective Type?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None

6. How often do you choose single-letter identifier names? Examples: Integer j
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see single-letter identifier names?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None

7. How often do you name identifiers with the starting letters that correspond to their respective Types? Examples: People p
◦ Never ◦ Rarely ◦ Occasionally ◦ Often ◦ Very often
Where do you usually see names which are the starting letters that correspond to their respective Types?
▭ Attributes ▭ Methods ▭ Loops ▭ Conditionals ▭ None