Journal of Software Engineering Research and Development, 2022, 10:9, doi: 10.5753/jserd.2022.1897
This work is licensed under a Creative Commons Attribution 4.0 International License.

Assessing the Credibility of Grey Literature: A Study with Brazilian Software Engineering Researchers

Fernando Kamei [ UFPE, IFAL | fernando.kenji@ifal.edu.br ]
Igor Wiese [ UTFPR | igor@utfpr.edu.br ]
Gustavo Pinto [ Zup Innovation & UFPA | gustavo.pinto@zup.com.br ]
Waldemar Ferreira [ UNICAP | waldemar.neto@unicap.br ]
Márcio Ribeiro [ UFAL | marcio@ic.ufal.br ]
Renata Souza [ UFPE | rmcrs@cin.ufpe.br ]
Sérgio Soares [ UFPE | scbs@cin.ufpe.br ]

In recent years, the use and investigation of Grey Literature (GL) have increased, particularly in Software Engineering (SE) research. However, its understanding is still scarce and sometimes controversial, for example regarding how to interpret GL types and assess their credibility. This study aimed to understand the credibility aspects that SE researchers consider in assessing GL and its types. To achieve this goal, we surveyed 53 SE researchers (who answered that they have used GL in our previous investigation), receiving a total of 34 valid responses. Our main findings show that: 1) being produced or cited by a renowned source is the main credibility criterion used to assess GL, 2) most of the GL types tend to have a Low to Moderate level of Control and Expertise, 3) there is a positive statistical correlation between the level of Control and Expertise for most GL types, and 4) the different respondent profiles shared similar opinions about the credibility criteria. Our investigation contributes to helping future SE researchers who intend to use GL with more credibility. Additionally, it shows the need for future studies to better understand the GL types in SE research.

Keywords: Grey Literature, Credibility, Empirical Software Engineering, Evidence-Based Software Engineering.
1 Introduction

Grey Literature (GL) refers to a kind of publication that does not go through a peer-review process before its publication (Petticrew and Roberts, 2006). Some areas of knowledge have used and investigated GL. For instance, in Management, Adams et al. (2016b) investigated how GL could be used with relevance for management and organization studies. In Information Science, there is an investigation of the term and concept of GL in scientific papers (Schöpfel and Prost, 2020).

In Software Engineering (SE), many researchers interpret GL as any material that was not formally peer-reviewed and published (Garousi et al., 2019). In recent years, SE researchers have increased their interest in investigating GL, motivated by the growth of social media and communication channels that SE practitioners use to communicate and to exchange problems and ideas (Storey et al., 2017), including, for instance, code hosting websites such as GitHub (Coelho et al., 2020) and communication platforms such as Slack (Stray and Moe, 2020).

In SE, several studies investigated and recognized the importance and usefulness of GL. For instance, Garousi et al. (2016) explored the benefits of GL for Multivocal Literature Reviews, showing what secondary studies gained when they considered GL and what was missed when they did not. Other studies (Williams and Rainer, 2017; Rainer and Williams, 2018) investigated the benefits and challenges of using blog content for SE research, and how to improve its use by selecting GL content with more credibility. Despite the increase in investigations in this field, there are some misunderstandings about GL and its diverse types (Tom et al., 2013; Kamei et al., 2021), and about how the set of credibility criteria investigated in previous studies (e.g., Williams and Rainer (2017)) could be used and interpreted for the diverse types of GL (Kamei et al., 2021).

According to Adams et al.
(2016a), the different types of GL can be classified in terms of the “shades” of grey, which group GL according to two dimensions: Control and Expertise. Garousi et al. (2019) explained these dimensions as follows: Control is the extent to which content is produced, moderated, or edited in conformance with explicit and transparent knowledge creation criteria. Expertise, on the other hand, is the extent to which we can determine the producer’s authority and knowledge.

In this paper, we begin by studying the different perceptions of SE researchers about GL. We then focus on studying how GL could be assessed considering its different types. For each study, we surveyed Brazilian SE researchers. In the first survey, which was published previously (Kamei et al., 2020), we investigated how Brazilian SE researchers use GL, focusing on understanding which criteria they employed to assess its credibility as well as the benefits and challenges they perceived. In the second survey (the novel contribution of this paper), we focused on how Brazilian SE researchers who previously used GL perceived the criteria to assess the different GL types according to Control and Expertise.

In the following, we list our main findings (S1 means Survey 1, while S2 means Survey 2):

S1: We identified the main GL sources used by the Brazilian SE researchers;
S1: We identified several motivations to use (or to avoid) GL;
S1, S2: We identified that the main criteria employed by Brazilian SE researchers to assess GL credibility are that the GL source is provided by renowned authors, institutions, or companies, or is cited by a renowned source;
S2: GL is not widely used as a reference in scientific studies;
S2: We identified different interpretations to assess GL types, showing the importance of considering each type in particular;
S2: We identified, for most of the GL types, strong to very strong positive correlations (p-value <= 0.05%) between the perceptions of the level of Control and Expertise;
S2: We did not find a significant correlation (p-value <= 0.05%) between the perceptions of Control and Expertise for GL types when considering the respondent’s profile;
S2: We perceived misunderstandings about whether a source type is considered a GL type or not, mainly related to the sources most often classified as High Control and High Expertise.

This paper is structured as follows: Section 2 presents the core concepts of this work. Section 3 shows the research questions explored with their rationales. Section 4 describes the methods employed to conduct, analyze, and synthesize the data collected. Section 5 summarizes the answers to the research questions (RQ1–RQ4) of the previous investigation (Kamei et al., 2020). Section 6 provides the answers to the research questions (RQ5–RQ6) specific to this investigation. Section 7 presents the discussion of the findings, lessons learned, and the threats to the validity of this research. Section 8 describes and compares the related work. Finally, Section 9 presents the conclusions and future work.

2 Background

Grey Literature (GL) has many definitions.
However, the best known is the Luxembourg definition (Garousi et al., 2019), approved at the Third International Conference on Grey Literature in 1997, which states: “[GL] is produced on all levels of government, academics, business, and industry in print and electronic formats, but which is not controlled by commercial publishers, i.e., where publishing is not the primary activity of the producing body.” Focusing on Software Engineering (SE) research, Garousi et al. (2019) recently proposed the following definition: “Grey literature can be defined as any material about SE that is not formally peer-reviewed nor formally published.”

These definitions convey a broad concept of what counts as GL, showing that it can be produced in different ways. However, this breadth may lead to misunderstandings. For this reason, Adams et al. (2016a) introduced some terms to distinguish the different concepts of grey, including grey literature, grey data, and grey information. The term “grey data” describes user-generated web content (e.g., tweets, blogs, videos). The term “grey information” describes content that is informally published or not published at all (e.g., meeting notes, emails, personal memories). However, the SE literature hardly distinguishes these terms. Similarly, we considered all forms of grey data and grey information as GL in our work.

Beyond the GL types, Adams et al. (2016b) classified GL according to “shades of grey”. In SE, Garousi et al. (2019) adapted these shades into three tiers, as shown in Figure 1. In this figure, the top of the pyramid is the “traditional literature”, with scientific articles from conferences and journals. The rest of the pyramid comprises what we call the three tiers of GL. These tiers vary along two dimensions: Control and Expertise. The first dimension runs between the extremes “low” and “higher”, and the second runs between the extremes “unknown” and “known”.
The darker the color, the less moderated or edited the source is in conformance with explicit and transparent knowledge creation criteria.

Figure 1. The “shades” of grey literature, adapted from Garousi et al. (2019).

Recently, GL has been used and investigated in SE research for many purposes. For instance, primary studies explored the GL available on several social media sources used by SE practitioners: Rainer and Williams (2018) assessed the importance of blog posts to SE research, and Oliveira et al. (2021) investigated several Java projects from GitHub to evaluate the developers’ skills based on the source code activities.

The presence of GL in secondary studies was notable in the investigations conducted by Zhang et al. (2020) and Kamei et al. (2021) and in the increase in studies based on Grey Literature Reviews (GLR) (e.g., Raulamo-Jurvanen et al. (2017) and Soldani et al. (2018)) and Multivocal Literature Reviews (MLR) (e.g., Garousi et al. (2017) and Saltan (2019)). To explain these types of study: a GLR is a secondary study that explores the evidence by looking only at GL sources, and an MLR is a secondary study that searches both GL and traditional literature.

Even with this increase in interest in GL, its use is recent in SE research (Zhang et al., 2020; Kamei et al., 2021), and there are some gaps and differing findings about GL in SE research. For instance, Kamei et al. (2021) identified that there is a lack of understanding of what is considered a GL type, and previous studies provide different criteria to assess GL credibility (Kamei et al., 2020; Williams and Rainer, 2019).

3 Research Questions

In this section, we state our research questions and the rationale for each.

RQ1: Why do Brazilian SE researchers use grey literature?
Rationale: Recently, SE practitioners have relied on social media and communication channels to share and acquire knowledge (Storey et al., 2017). On the one hand, some researchers try to take advantage of this in SE research. For instance, Rainer and Williams (2018) explored the benefits and challenges of blog articles as evidence in SE research. On the other hand, some concerns (e.g., lack of detail and lack of empirical methods) related to GL could make SE researchers skeptical about its credibility (Rainer and Williams, 2019). In this broad question, we intend (i) to understand if Brazilian SE researchers are using GL and, if so, (ii) what motivates them to use it, or, if not, (iii) the reasons that lead them not to use GL.

RQ2: What types of grey literature are used by Brazilian SE researchers?

Rationale: According to Adams et al. (2016a), GL has many forms, from traditional mediums such as question & answer websites and blogs to more dynamic mediums such as Telegram and Slack. For this reason, Bonato (2018) emphasized the importance of exploring the GL definition and its types for each research area. There is a lack of understanding of GL types, and in particular of which ones Brazilian SE researchers use. This research question sought to investigate which GL sources Brazilian SE researchers often use. A better understanding of the GL types could guide future research in this area.

RQ3: What are the criteria Brazilian SE researchers employ to assess grey literature credibility?

Rationale: Software Engineering research uses GL sources, such as data provided by practitioners retrieved from several social media and communication channels. However, as GL is, by nature, not peer-reviewed, SE practitioners are free to share their thoughts using social media, for instance, without worrying about methodological concerns. Thus, it is essential to assess GL sources to ensure the selected GL is appropriate for the study.
Answering this question will help us understand the credibility criteria that Brazilian SE researchers consider.

RQ4: What benefits and challenges do Brazilian SE researchers perceive when using grey literature?

Rationale: According to Storey et al. (2014), the SE research community has increased its interest in GL since the widespread adoption of social media and communication channels by SE professionals. For instance, exploring Stack Overflow, Zahedi et al. (2020) found some trends and challenges in continuous SE that researchers could better explore. In this question, we are interested in understanding the (i) benefits and (ii) challenges that researchers may face when resorting to GL. Answering this question is essential to understanding the potential benefits and challenges of researchers using GL more broadly.

RQ5: How do SE researchers prioritize a set of criteria to assess grey literature credibility?

Rationale: In our first investigation (Kamei et al., 2020), we provided a set of criteria used by Brazilian SE researchers to assess GL credibility. Previous literature (Williams and Rainer, 2019) also identified another set of criteria. In this question, we focused on understanding the importance of those criteria to assess GL credibility.

RQ6: What is the perception of Brazilian SE researchers about the different types of Grey Literature according to the perspective of Control and Expertise?

Rationale: Due to the diverse nature of the GL types, some studies suggested that GL needs to be assessed in different ways (Garousi et al., 2019). For this reason, Adams et al. (2016b) classified its types according to the shades of grey. This classification is based on two dimensions: Control and Expertise. Control refers to the rigor with which a source is produced. Expertise is the extent to which the knowledge and authority of the producer can be determined. Nevertheless, this understanding and classification are still unclear.
This research question sought to understand how Brazilian SE researchers commonly perceived the GL types according to (i) Control and (ii) Expertise.

4 Research Methods

In this work, we followed Linåker et al. (2015), using a survey methodology for data collection. The data was collected from a group of people sampled from a large population. We conducted two surveys. The first (Survey 1) aimed to understand the Brazilian SE researchers’ perceptions about GL. The second (Survey 2) investigated only the Brazilian researchers from the first survey who answered that they used GL.

In the following sections, we detail the procedures used to conduct Survey 1 with participants of a flagship SE conference in Brazil (Section 4.1). Then, we present the procedures used for Survey 2, which focused on the researchers that have experience using GL (Section 4.2). Finally, we provide the methods used for the analysis of both surveys (Section 4.3).

4.1 Survey 1: Initial investigation with the Brazilian SE researchers

In Survey 1, we intended to gather a broad perception of GL use by Brazilian SE researchers, focusing on understanding the motivations to use (or avoid) it, the types of GL used, the benefits and challenges, and the criteria used to assess its credibility.

4.1.1 Survey Design

We conducted our survey with participants of the 10th Brazilian Conference on Software: Practice and Theory (CBSoft), the largest Brazilian software conference, in which many SE researchers participate. It includes well-established and specialized satellite SE conferences in its domain. Our population comprises SE researchers who are potentially interested in using GL in their research. We chose our sample using non-probabilistic sampling by convenience (Baltes and Ralph, 2021).

Before sending the final survey version, an experienced researcher (Ph.D.
SE researcher with more than 15 years of experience in research) reviewed our draft. We also conducted a pilot study by randomly selecting two participants and explicitly asking for their feedback. The feedback suggested changing the order of the questions and rewriting some of them to make them more understandable to the target population.

We obtained the contact information of all 252 participants by asking the conference’s general chair whether s/he could share this information with us, which s/he kindly provided.¹

We used two approaches to invite the researchers to answer our questionnaire. First, we placed posters on the event’s walls and tables with a brief description of the work and the link to the online survey. Second, we sent the actual survey to the 250 remaining participants of the event. In the invitation email, we briefly introduced ourselves, presented the research’s purposes, highlighted that the invitation was addressed to CBSoft participants, and included the link to the online survey. We also mentioned that the participant was free to withdraw at any moment and that all information stored was confidential.

The survey was open for responses from September 26th to October 11th, 2019. We received a total of 76 valid answers (30.4% response rate). We did not consider the pilot survey answers.

4.1.2 Survey Respondents

Among the survey respondents, 48.7% have a Ph.D., 31.6% have a Master’s degree, 2.6% have a graduate specialization, 14.5% have a Bachelor’s degree, and 2.6% are undergraduates. Among them, 72.4% are men, and 27.6% are women. Table 1 presents the demographic information about the respondents and whether or not they had used GL. This table shows that most respondents with Ph.D. and Master’s degrees answered that they were using GL.

¹ In the period of this research, the Brazilian General Data Protection Law was not yet officially published.

Table 1. Demographic information of the Survey 1 respondents.
Gender | Highest academic degree | Used GL | Not used GL
Woman | Doctorate | 5 | 5
Man | Doctorate | 24 | 3
Woman | Master | 4 | 2
Man | Master | 15 | 3
Woman | Expert | 1 | 1
Man | Expert | 0 | 0
Woman | University graduate | 0 | 2
Man | University graduate | 2 | 7
Woman | Technical education | 0 | 0
Man | Technical education | 0 | 0
Woman | High school | 1 | 0
Man | High school | 1 | 0

4.1.3 Survey Questions

Our survey had 11 questions (three were required, and nine were open). We used a different question flow for those who used GL (who did not answer question 10) than for those who did not (who answered only questions 1 to 4 and questions 10 and 11). Table 2 presents the questions covered in this survey.

4.2 Survey 2: Investigating Brazilian SE researchers that use Grey Literature

In this follow-up survey, we intended to collect perceptions only from the Brazilian SE researchers from Survey 1 who answered that they had previously used GL. We focused on the perceptions of the different GL types concerning the dimensions of Control and Expertise.

4.2.1 Survey design

Using a non-probability sample by convenience (Baltes and Ralph, 2021), we once again invited by email the 53 researchers that participated in our Survey 1 and mentioned the use of GL.

We first drafted our questionnaire and improved it through three sequential steps: 1) a pilot study with five Ph.D. SE researchers; 2) an assessment of the questionnaire by another specialist SE researcher; and 3) feedback from a participant reporting a problem in the first hours after opening the survey. Because of this problem, we closed the survey to stop receiving answers. Then, we deleted all answers previously received and sent a new questionnaire version to the researchers.

We opened the survey for answers from February 10th to March 4th, 2021. We received a total of 34 valid answers (64.1% response rate). We did not consider the pilot survey answers.
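The response rates reported for the two surveys can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `response_rate` is hypothetical, and we assume that the Survey 1 rate uses the 250 participants who received the e-mail invitation as the denominator, which is consistent with the reported 30.4%.

```python
def response_rate(valid_answers: int, invited: int) -> float:
    """Return the percentage of invited people who gave a valid answer."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return 100 * valid_answers / invited


# Figures reported in Sections 4.1.1 and 4.2.1.
# Assumption: Survey 1 uses the 250 e-mailed participants as the denominator.
survey_1 = response_rate(valid_answers=76, invited=250)
survey_2 = response_rate(valid_answers=34, invited=53)

print(f"Survey 1 response rate: {survey_1:.2f}%")
print(f"Survey 2 response rate: {survey_2:.2f}%")
```

Under these assumptions, the Survey 2 value lies between 64.1% and 64.2%, in line with the rate given in the text.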
4.2.2 Survey Respondents

In this survey, as we drew our sample from the respondents of the previous survey who answered that they had used GL, we did not ask the same questions again (e.g., gender, academic degree). Instead, we collected information about their experience in SE research and in using GL in scientific articles.

The respondents’ profile was composed of 76.5% professors or researchers and 23.5% undergraduates. Regarding SE research experience, 55.9% of the respondents had more than ten years. Considering the experience using GL, 47% had conducted between 2 and 5 scientific studies using GL, although 26.5% were unable to answer.

Table 2. Questions covered in Survey 1.
# | Question | Type of question | Options of answers (for closed questions) | Required? | RQ
Q1 | What is your e-mail? | Open | - | No | -
Q2 | What is your gender? | Open | - | Yes | -
Q3 | Please list the highest academic degree you have received. | Closed | High school, Technical education, University graduate, Expert, Master’s degree, Doctorate. | Yes | -
Q4 | Have you used grey literature? If you never used, go to question Q10. | Closed | Yes, No. | Yes | RQ1
Q5 | What sources of grey literature did you use? | Open | - | No | RQ2
Q6 | In which conditions do you use grey literature? | Open | - | No | RQ1
Q7 | In which conditions do you do not use grey literature? | Open | - | No | RQ1
Q8 | Could you list any benefits in using grey literature? | Open | - | No | RQ4
Q9 | Could you list any challenges in using grey literature? | Open | - | No | RQ4
Q10 | If you answered ’no’ in question four, please state why did you never use or avoid use grey literature? | Open | - | No | RQ1
Q11 | What would be a reliable source of grey literature for you? | Open | - | No | RQ3

4.2.3 Survey Questions

Our second survey had ten questions (six were required, and four were open). Table 3 presents the questions covered in this survey.
Before question 4, we produced and included a video² to summarize and explain the “shades of GL” according to the level of Control and Expertise.

² Video explaining the “shades of GL” (in Portuguese): https://youtu.be/hGMkVXIApR0

4.3 Data Analysis and Synthesis

In both surveys, we employed a mixed-method approach based on both qualitative (Section 4.3.1) and quantitative (Section 4.3.2) methods to analyze data. We used a qualitative approach when we were interested in questions about “what” and “how”. For the quantitative analysis, we used descriptive statistics to discuss frequency and distribution, and correlation analysis between the dimensions of Control and Expertise for each GL type. We describe these methods in the following.

4.3.1 Qualitative analysis

We used a qualitative approach based on the thematic analysis technique (Braun and Clarke, 2006). This process involved three SE researchers with previous qualitative research experience (one Ph.D. student (R1) and two Ph.D. professors (R2–R3)) for both surveys.

For Survey 1, we performed an agreement analysis of the codes and categories generated by each researcher using the Kappa statistic (Viera and Garrett, 2005). The Kappa value was 0.749, indicating a Substantial Agreement level, according to the Kappa reference table (Viera and Garrett, 2005). For Survey 2, we did not calculate Kappa because the researchers worked together during the analysis process.

Figure 2 presents a general overview of the process employed. In the following, we detail the procedure used to analyze all the answers (adapted from Pinto et al. (2019)) of both surveys, showing the differences between the two surveys:

1. Familiarizing with data: The process starts with two independent researchers reading the answers of the survey respondents, as expressed in Figure 2-(a).
2. Initial coding: Then, for Survey 1, two independent researchers (R1 and R2) individually analyzed and added codes.
For Survey 2, the researchers analyzed, discussed, and coded together (R1 and R2, shown inside a dotted box). We used post-formed codes, that is, we labeled portions of text that expressed the meaning of the excerpts without any pre-formed code. The initial codes are temporary, since they still need refinement. We refined the emerging codes throughout all the analyses. An example of coding is presented in Figure 2-(b).

Table 3. Questions covered in Survey 2.
# | Question | Type of question | Options of answers (for closed questions) | Required? | RQ
Q1 | What is your occupation? | Closed | Professor/Researcher, Student (M.Sc. or Ph.D.), Other (open). | Yes | -
Q2 | How many years of experience did you have conducting SE research? | Closed | Until 1 year, From 1 and 3 years, From 4 to 6 years, From 7 to 9 years, 10 years or more. | Yes | -
Q3 | How many scientific studies have you conducted using GL as source of evidence? | Closed | I do not know, No one, Only one, From 2 and 5, From 6 and 10, More than 10. | Yes | -
Q4 | We are aware that the level of Control varies from source to source. For this reason, we ask you to consider your experience more frequent in relation to each source type in relation to the Control dimension of the production. | Closed | Source types: {adapted from Maro et al. (2018)}; Level of Control: I did not consider it as a GL type, Low Control, Moderate Control, High Control, No opinion. | Yes | RQ6
Q5 | Please, explain what did you consider to classify each source type with the Control criteria presented in Question 5. | Open | - | No | RQ6
Q6 | We are aware that the level of Expertise varies from source to source. For this reason, we ask you to consider your experience more frequent in relation to each source type in relation to the Expertise dimension of the production. | Closed | Source types: {adapted from Maro et al. (2018)}; Level of Expertise: I did not consider it as a GL type, Low Expertise, Moderate Expertise, High Expertise, No opinion. | Yes | RQ6
Q7 | Please, explain what did you consider to classify each source type with the Expertise criteria presented in Question 7. | Open | - | Yes | RQ6
Q8 | Considering a GL source with important information to your research, would you include a GL source if it is produced by/with. | Closed | Choices for Expertise criteria: Be produced by a renowned author, Be produced by a renowned institution, Be produced by a renowned company, Be cited by others renowned sources, Describe the methods of collection, Cites an academic reference, Cites a practitioner source, Presents information with rigor, Presents empirical data; Choices for answers: No opinion, No, Yes. | Yes | RQ5
Q9 | Could you cite any additional potential aspect to assess the credibility of a GL source that was not mentioned before? | Open | - | No | RQ6
Q10 | We are planning to conduct a future research about Quality Assessment in Grey Literature. Please, could you inform your mail to future contact? | Open | - | No | -

3. From codes to categories: Here, we already had an initial list of codes. For Survey 1, two researchers individually conducted this process (R1 and R2). For Survey 2, this process occurred with two researchers working together (R1 and R2). This process begins by looking for similar codes in the data. We grouped the codes with similar characteristics into broader categories. Eventually, we also had to refine the categories identified, comparing and re-analyzing them in parallel, using an approach similar to axial coding (Spencer, 2009). Figure 2-(c) presents an example of this process.
4. Categories refinement: Here, we have a potential set of categories. For both surveys, in a consensus meeting between R1 and R2 (Figure 2-(d)), the categories were evaluated, and disagreements of interpretation were resolved based on evidence that supported or refuted the categories found. We also renamed or regrouped some categories to better describe the excerpts they contain.
In cases where disagreements remained, we invited a third researcher (a Ph.D. professor) to review and resolve them for both surveys.

4.3.2 Quantitative analysis

We based our quantitative investigation on three samples: (i) the answers from 76 SE researchers to answer RQ1; (ii) the answers from the 53 researchers that mentioned using GL to answer RQ2, RQ3, and RQ4; and (iii) the answers from 34 researchers to answer RQ5 and RQ6.

For the descriptive statistics, we highlight that one respondent’s answer could be related to more than one category. For the investigation of the relation between the GL types and the dimensions of Control and Expertise, we present boxplots to show the differences in interpretation of each GL type.

We used Spearman’s rank correlation coefficient for the correlation analysis of the Control and Expertise perceptions for each GL type. For this, we transformed the answers related to the level of Control and Expertise (Low, Moderate, High) into non-linear scales: Low = 0, Moderate = 50, and High = 100.

For the quantitative data analysis, we used the R language and Python, the latter with the support of Google Colab³.

³ https://colab.research.google.com

5 Previous Results

In this section, we summarize the findings of our first study to present answers to RQ1–RQ4. For further details on these research questions, consider reading the previous study (Kamei et al., 2020).

For each RQ, we summarize the categories in tables, with the total number of occurrences of a given category in the column “#”. Two observations are important: 1) the researchers may have reported more than one answer per question, which may be grouped into different categories; and 2) some questions were not required. Thus, the overall results might not reach 100% of respondents.

RQ1: Why do Brazilian SE researchers use grey literature?

In our Survey 1, we identified 53 SE researchers using GL for research purposes.
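As an aside on the correlation analysis described in Section 4.3.2, the following Python sketch illustrates how answers on the Low/Moderate/High scale can be coded numerically and fed to Spearman’s rank correlation. It is not the script used in the study: the helper names `average_ranks` and `spearman_rho` are hypothetical, the six answers are invented for illustration, and the coefficient is implemented from scratch (Pearson correlation of tie-averaged ranks) so that the block is self-contained. Significance testing (the p-value) is deliberately left out; a library routine such as `scipy.stats.spearmanr` would typically provide it.

```python
from statistics import mean

# Numeric coding of the ordinal answers, as reported in Section 4.3.2.
SCALE = {"Low": 0, "Moderate": 50, "High": 100}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors.

    Assumes each variable has at least two distinct values; otherwise the
    denominator is zero and the coefficient is undefined.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical answers of six respondents for one GL type (not the study's data).
control = ["Low", "Low", "Moderate", "Moderate", "High", "High"]
expertise = ["Low", "Moderate", "Moderate", "High", "High", "High"]

rho = spearman_rho([SCALE[a] for a in control], [SCALE[a] for a in expertise])
print(f"Spearman rho = {rho:.3f}")
```

Because Spearman’s coefficient depends only on the ranks of the values, any order-preserving coding of Low, Moderate, and High (for example 1, 2, 3) yields the same coefficient as the 0, 50, 100 coding.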
Focusing on better understanding why and how SE researchers are using GL or avoiding its use, we asked questions that included the motivations to use GL or reasons to avoid it. In the following, we present a summary of (i) the motivations to use GL and (ii) the reasons to avoid or never use GL.

(i) Motivations to use

Table 4 presents the identified SE researchers’ motivations for using GL. In this table, the first column describes the motivation identified, followed by the number of respondents related to the category and the percentage relative to the total of SE researchers that used GL (n=53). In the following, we briefly describe some motivations.

Table 4. Motivations to use GL.
Motivation | # | %
To understand the problems | 28 | 52.8%
To complement research findings | 12 | 22.6%
To answer practical and technical questions | 10 | 18.9%
To prepare classes | 4 | 7.5%
To conduct government studies | 1 | 1.9%

To understand the problems was the most cited motivation to use GL. Several researchers noted using GL to understand or investigate a new topic, to search for something to solve problems, or to acquire specific information to deepen their knowledge.

To complement research findings was the second most cited motivation, mentioned when the knowledge gained from the traditional literature is not enough for the investigation. For instance, a researcher noted the use of GL to complement the findings of a Mapping Study.

To answer practical and technical questions was the third most cited motivation, related to the necessity to understand the state of the practice in SE.

Other motivations were mentioned to a lesser extent, such as To prepare classes and To conduct government studies.

(ii) Reasons to avoid/never use

Even though several motivations to use GL were identified, 50.9% of SE researchers (27/53) avoid using GL as a reference or to reinforce some claims in scientific studies.
We also found some researchers that never used GL (23/76 occurrences, 30.3%) in any research situation. We used this value to analyze the extent of each category of reasons to never use GL. Of the 23 respondents that never used GL, only 15 gave a reason. Table 5 presents the summary of the findings for this question. In the following, we briefly describe the reasons to avoid GL.

Figure 2. Example of a coding process used to analyze the questionnaire answers.

Table 5. Reasons to avoid/never use GL.
Reason                        #   %
Lack of reliability           6   26%
Lack of scientific value      3   13%
Lack of opportunity to use    3   13%

Lack of reliability was the main reason SE researchers gave for not using GL. It is related to the lack of rigor with which GL sources are written and published, which affects their credibility.

Lack of scientific value was another category mentioned: researchers were afraid that using GL would weaken a research paper when submitted to the peer-review process.

Lack of opportunity to use was related to the nature of the research previously conducted and to the fact that GL is recent in the context of SE.

Summary of RQ1: Brazilian SE researchers use GL mainly to understand new topics, find information about practical and technical questions, and complement research findings. However, some researchers avoid GL, particularly as references in scientific papers, due to its lack of reliability and scientific value.

RQ2: What types of grey literature are used by Brazilian SE researchers?

In this question, we explored the GL sources used by the 53 SE researchers that mentioned using GL. Table 6 lists these sources. In the following, we briefly present some of our findings.

Q&A websites were the most common source mentioned, used to interact with other users, create content, post comments, and assess the content. Examples of Q&A websites mentioned were Stack Overflow and Quora.
Blog posts were the second most common category. Respondents mentioned blogs from renowned practitioners and from companies that produce a diversity of material and content for SE and for software development in general.

Technical reports were mentioned by SE researchers that used technical experience, reports, and surveys derived from industry and from national and international research groups.

Companies websites provided by Google, Facebook, and ThoughtWorks, containing information regarding their technologies, methods, and practices, were mentioned as sources used. Some researchers reported browsing these websites to find news that helps decision-making about a specific technology.

Table 6. GL sources used by SE researchers.
Source                   #    %
Q&A websites             16   30.2%
Blog posts               15   28.3%
Technical reports        14   26.4%
Companies websites       8    15%
Preprints                5    9.4%
Books/Book chapters      5    9.4%
Software repositories    4    7.5%
Videos                   3    5.7%
Magazine articles        3    5.7%
News articles            2    3.8%

Summary of RQ2: Brazilian SE researchers are using several GL sources. The most common are Q&A websites, blog posts, technical reports, and companies websites.

RQ3: What are the criteria Brazilian SE researchers employ to assess grey literature credibility?

For this research question, we used the answers to one open-ended question to explore the criteria SE researchers use to assess GL credibility. Table 7 summarizes our findings. In the following, we briefly describe the criteria identified.

Renowned authors was the most cited criterion, in which SE researchers considered the author's experience and reputation concerning the topic. For instance, Martin Fowler was cited as a notable software engineer with much knowledge.

Renowned institutions was another crucial criterion, where SE researchers assess whether renowned institutions or renowned research groups provided the GL content.
Cited by others was a criterion mentioned by researchers that consider a source trustworthy when others (studies or people) cite it.

Renowned companies was a criterion under which a GL source is considered relevant when renowned software companies or portals produce it.

Table 7. Criteria to assess GL credibility.
Criteria                      #    %
Renowned authors              15   28.3%
Renowned institutions         14   26.4%
Cited by a renowned source    8    15%
Renowned companies            7    13.2%

Summary of RQ3: Who produces the GL content, whether a person, an institution, or a company, is a significant credibility criterion, provided that the producer is considered renowned.

RQ4: What benefits and challenges Brazilian SE researchers perceive when using grey literature?

In this research question, we explored the benefits and challenges of GL use mentioned by SE researchers. Table 8 summarizes the benefits and Table 9 the challenges. In the following, we briefly describe some of them.

Table 8. Benefits of the use of GL.
Benefit                                    #    %
Easy to access and read                    16   30.2%
Provide a Practical Evidence               13   24.5%
Knowledge acquisition                      13   24.5%
Updated information                        6    11.3%
Advance the state of the art/practice      5    9.4%
Different results from scientific studies  3    5.7%

Table 9. Challenges of the use of GL.
Challenge                              #    %
Lack of reliability                    34   64.2%
Lack of scientific value               15   28.3%
Difficult to search/find information   6    11.3%
Non-structured information             6    11.3%

(i) Benefits

Easy to access and read was the most common benefit mentioned, mainly because most GL sources are open access, are quickly retrieved by free search engines, and have contents that are usually easy to read.

Practical evidence was another essential benefit mentioned, showing that GL provides evidence from the SE industry to understand the state of the practice.

Knowledge acquisition was mentioned as a benefit, as GL allows expanding knowledge with information different from what is usually obtained in traditional literature.
Updated information was mentioned because GL content is produced quickly compared with traditional literature, especially technical content.

Advance the state of the art/practice was mentioned due to the importance of GL for better understanding the industry and for providing evidence to find relevant gaps in the practice.

Different results from scientific studies was mentioned because some researchers considered GL essential to provide additional knowledge not yet available in the research area.

(ii) Challenges

Lack of reliability was the main challenge the researchers perceived, where some questioned the reliability of the data retrieved from GL.

Lack of scientific value was the second most cited category. Some researchers mentioned that they did not feel comfortable using GL as a reference in scientific works because the research community does not recognize this kind of source.

Difficult to search/find information in GL sources was perceived as a challenge due to the diversity of sources. Each source has its own structure and way of providing access to the content, and it is not easy to replicate a study that used GL.

Non-structured information was mentioned due to the lack of a writing pattern and the large variety of formats in which GL sources are published, which makes it difficult to find information, for instance, using an automatic process.

Summary of RQ4: We found several benefits. The most common was that GL content is easy to access and read, which is important for knowledge acquisition, mainly by providing practical evidence derived from SE practitioners. The most cited challenges concerned using GL in scientific research, due to its lack of reliability and scientific value.

6 Results

In this section, we present answers to RQ5 and RQ6, both answered through the investigation of Survey 2.

Table 10. Prioritized criteria to assess GL credibility.
Criteria                                  #    %
Renowned authors                          30   88.2%
Renowned institutions                     30   88.2%
Cited by a renowned source                27   79.4%
Cites academic source (a)                 26   76.5%
Present empirical data (a)                26   76.5%
Renowned companies                        25   73.5%
Cites practitioner source (a)             16   47.1%
Rigor in presenting information (a)       12   35.3%
Describe the methods of collection (a)    6    17.6%
(a) Proposed in Williams and Rainer (2019).

RQ5: How do SE researchers prioritize a set of criteria to assess grey literature credibility?

In our second survey, we asked 53 researchers to prioritize the importance of a set of criteria to assess GL credibility. These criteria were derived from our first investigation and from the study of Williams and Rainer (2019). We received answers from 34 SE researchers.

Table 10 presents the resulting ranking of the credibility criteria, revealing that the criteria SE researchers perceive as essential are that the GL source is provided by Renowned authors or Renowned institutions, or is Cited by a renowned source.

We also investigated whether the SE researchers have any additional criteria to assess GL credibility not mentioned in the previous survey questions. By analyzing the answers, we did not find any new criterion unrelated to the criteria presented in Table 10. For instance, some researchers mentioned that a detailed description of the publication context is an important criterion. We considered this case already covered by the Rigor in presenting information criterion, previously mentioned by Williams and Rainer (2019). The author's experience with the topic was another criterion mentioned. We considered it related to the Renowned authors criterion identified in our first survey.

Summary of RQ5: We assessed the prioritization of credibility criteria identified in our first investigation, in addition to those identified in previous studies.
We found that the criteria most used by SE researchers are that the GL is produced by a renowned source, is cited by a renowned authority, cites an academic source, and presents empirical data.

RQ6: What is the perception of Brazilian SE researchers about the different types of Grey Literature according to the perspective of Control and Expertise?

Our last research question explored how the researchers perceived the different types of GL with respect to the dimensions of Control and Expertise. These dimensions are used to classify the tiers of the “shades of GL.” Each dimension could be evaluated at three levels (Low, Moderate, High). Figure 3 presents the classifications according to the level of Control, and Figure 4 shows the results for the level of Expertise.

Even though we are investigating different dimensions, interestingly, in some cases Figures 3 and 4 presented similar behaviors. For instance, for some GL types (e.g., blog posts, forums/list of discussions), the Low level was predominant in both dimensions. We also found similarities concerning the other levels for both dimensions. For instance, some types (e.g., materials training, news articles, software repositories, and tutorials) run from Low (1st Quartile) to Moderate (2nd Quartile). In several cases, however, the median varied.

We also found differences. For instance, considering the level of Control for cases/services descriptions and guidelines, the classifications run from Low (1st Quartile) to Moderate (2nd Quartile). In contrast, for the level of Expertise of these GL types, we found outliers at the Low level (1st Quartile) and outliers at the High level (3rd Quartile).

Other classifications caught our attention. For instance, regarding the Control dimension, the opinions about magazine articles are not uniform, as we identified some outliers at both extremes (Low and High).
We identified a similar classification for guidelines in the Expertise dimension.

In addition to classifying the levels (Low, Moderate, and High) of the dimensions (Control and Expertise), we offered the researchers the options “I did not consider it a GL type” and “I have no opinion.” We included these options because, even though previous studies (e.g., Maro et al. (2018)) presented the GL types for SE research, in our previous investigation (Kamei et al., 2021) we identified different interpretations, for instance, some types were not considered GL. Table 11 shows the results of these classifications.

Comparing the findings presented in Table 11 with the information presented in Figures 3 and 4, we perceived that most of the GL types classified with High Expertise and High Control were also often considered not to be a GL type (e.g., thesis, books/book chapters, and patents). Moreover, we identified that patents are still unknown to several researchers.

Rationale to employ classification of each dimension (Control and Expertise)

We asked the researchers why they classified each GL type as they did according to Control and Expertise. We identified four main reasons, which are summarized in Table 12 and described in the following.

Table 12. Reasons to classify GL types according to the level of Control and Expertise.
Reasons                  #    %
Rigor                    23   67.6%
Producer reputation      14   41.2%
Researcher experience    13   38.2%
Peer interaction         5    14.7%

Figure 3. Classification of each GL source type according to the level of Control. Each level of Control indicates: Low = 0; Moderate = 50; High = 100.

Rigor (23/34 occurrences). Researchers considered the rigor (control) of each source's production, for instance, the degree of formality present.
In this regard, one researcher pointed out: “Technical reports, for instance, present systematic studies with high control (of production).” This category was also related to the credibility dimension, as one researcher affirmed: “I consider that credibility is directly related to the rigor of the publication/availability of an artifact.”

Producer reputation (14/34 occurrences). The producer's reputation was considered an essential criterion to assess Control and Expertise, as one researcher pointed out: “The credibility relates to who is the author of the material and to the platform being conveyed.” Another one mentioned: “Depending on the publisher, I can consider high (e.g., Elsevier) or low (e.g., autonomously published book) control. The same applies to news: the credibility of the source influences the level of control regarding stricter editorial control in favor of the integrity of the information.”

Researcher experience (13/34 occurrences). Researchers used their own experience to make the classification. In this regard, one researcher pointed out: “I thought of the examples for each type that I have used and classified them according to my experience in dealing with each material.” Another one mentioned that: “I considered what I have read about grey literature.”

Peer interaction (5/34 occurrences). Another criterion considered for assessing GL Control and Expertise was the users' interactions in GL sources.
In this regard, one researcher mentioned: “Another point is that if I have a lot of people interacting and building the content (such as Q&A websites), I consider that it has a certain control in the final knowledge presented there.” Another one pointed out: “In general, I consider the control to be higher when there is a peer review in some way, as in the case of theses and Stack Overflow.”

Correlation analysis between the level of the dimensions (Control and Expertise) and each GL type

We analyzed the correlation between the two variables (Control and Expertise) for each GL type using the Spearman coefficient, which we interpreted according to Dancey and Reidy (2004). To pair the samples, we removed the answers in which a respondent answered “I did not consider it a GL type” or “I have no opinion” for at least one dimension of the same GL type.

Based on the results of Spearman's rank correlation presented in Table 13, we identified 13 GL types (13/19; 68.4%) with correlations that vary from strong to very strong positive correlations (p-value <= 0.05). This indicates that when the level of Control increases, the level of Expertise tends to increase.

Six GL types did not reach significance at the 95% confidence level. Among these types, 4 out of 6 (forums/list of discussions, cases/services descriptions, keynote speeches, materials training) had moderate correlations. For the remaining two (books/book chapters and magazine articles), we identified negligible correlations.

Figure 4. Classification of each GL source type according to the level of Expertise. Each level of Expertise indicates: Low = 0; Moderate = 50; High = 100.

Table 13. Types of Grey Literature: Control and Expertise correlation test. Notes: *Correlation is significant (strong) at the rho >= 0.4 and p-value <= 0.05 level; **p-value is not zero (we used three decimal places).
Type of Grey Literature    Spearman coefficient   P-value
Blog post                  .441*                  .017
Book/Book chapter          .106                   .607
Case/Soft. description     .341                   .082
Forum/Discussion list      .337                   .069
Guideline                  .518*                  .004
Keynote speeches           .305                   .101
Magazine article           .167                   .377
Manual                     .620*                  .000**
Material training          .308                   .104
News articles              .525*                  .003
Patent                     .550*                  .027
Q&A websites               .656*                  .000**
Slide presentation         .593*                  .001
Soft. Repository           .652*                  .000**
Technical report           .527*                  .005
Thesis                     .546*                  .013
Tutorial                   .688*                  .000**
Video                      .671*                  .000**
White paper                .769*                  .000**

Correlation analysis between the level of the dimensions (Control and Expertise) and the respondent profiles

We conducted a chi-square test of independence between the respondent profiles and their inclination to answer “I did not consider it a GL type” or “I have no opinion”. That is, we evaluated whether being a professor or not has any influence on not considering a type as GL or on not having an opinion. Table 14 presents our result.

Table 11. The types of GL for which SE researchers have no opinion regarding the level of Control and Expertise, or which they do not consider as GL (Not GL).
Type of source           Control: No opinion   Expertise: No opinion   Not GL
Thesis                   0                     1                       12
Patents                  7                     10                      7
Books/Book chapters      2                     1                       6
Magazine articles        1                     2                       3
Case/Serv. desc          1                     5                       3
Manuals                  1                     3                       3
Materials training       0                     3                       3
Software repositories    0                     3                       3
Blog posts               1                     3                       2
Forums / Lists           0                     2                       2
News articles            0                     3                       2
Slide presentations      0                     6                       2
Keynote speeches         0                     2                       2
Videos                   3                     4                       2
Technical reports        3                     2                       2
Q&A websites             1                     3                       1
Guidelines               1                     4                       1
Tutorials                0                     4                       1
White papers             2                     5                       1

Table 14. Chi-square test between respondent profiles and (i) Not considered as GL, (ii) No opinion - Control, and (iii) No opinion - Expertise.
Type of GL                 i      ii     iii
Blog post                  .769   .526   .959
Book/Book chapter          .925   .959   .526
Case/Soft. description     .959   .959   .439
Forum/Discussion list      .959   .999   .959
Guideline                  .526   .526   .579
Keynote speeches           .959   .999   .959
Magazine article           .959   .526   .959
Manual                     .769   .526   .769
Material training          .959   .999   .769
News articles              .959   .999   .769
Patent                     .883   .393   .726
Q&A websites               .769   .526   .959
Slide presentation         .959   .999   .925
Soft. Repository           .769   .999   .769
Technical report           .959   .769   .769
Thesis                     .526   .999   .194
Tutorial                   .526   .999   .579
Video                      .959   .769   .579
White paper                .959   .959   .711

As Table 14 shows, we did not find a statistically significant association (p < 0.05) between the respondent profile and the inclination to have no opinion regarding the level of Control and Expertise, or to not consider a type as GL. Therefore, based on our results, we did not reject any null hypothesis, i.e., either the respondent profile did not influence the answers, or our sample is not large enough to show this influence.

We performed another chi-square test to discover whether the respondent profile affects the opinion on the Low, Moderate, or High level of Control and Expertise. For each factor (Control or Expertise) and GL type (blog posts, books/book chapters, etc.), we populated a 2×3 contingency table with the respondent profile as rows and the opinion (Low, Moderate, or High) as columns. Table 15 presents the p-value of the chi-square test for each contingency table.

Table 15. Chi-square test between respondent profiles and (i) Expertise level and (ii) Control level.
Type of GL                 Expertise   Control
Blog post                  .785        .100
Book/Book chapter          .958        .722
Case/Soft. description     .632        .293
Forum/Discussion list      .720        .557
Guideline                  .769        .853
Keynote speeches           .185        .853
Magazine article           .539        .692
Manual                     .496        .069
Material training          .316        .690
News articles              .049        .205
Patent                     .651        .905
Q&A websites               .567        .289
Slide presentation         .478        .157
Soft. Repository           .387        .261
Technical report           .848        .743
Thesis                     .746        .844
Tutorial                   .132        .707
Video                      .755        .894
White paper                .925        .752

Table 15 shows the p-value of each chi-square test of independence. As we can see, there is no evidence that different respondent profiles have different opinions. The only exception regards the credibility of news articles. The contingency table (see Table 16) summarizes the comparison between the answers of professors/researchers and students for news articles. From this result, we conclude that students consider news articles more believable than professors/researchers do.

Table 16. Contingency table from respondent profiles and the levels of Expertise for news articles.
Respondent profile        Low   Moderate   High
Professors/researchers    7     1          0
Students                  8     13         0

Summary of RQ6: We identified similar behaviors when considering the same GL type in the two dimensions, Control and Expertise. Most GL types ran between the Low and Moderate levels in these dimensions. We also identified some differences, such as cases where the median of the answers was at the Low level for Control and at the Moderate level for Expertise. The production rigor, the producer's reputation, the researcher's experience, and the possibility of peer interaction are the criteria employed by the researchers to assess a GL source. Moreover, we found disagreement about whether some data sources are GL, mainly related to thesis, patents, magazine articles, and books/book chapters. Considering the correlation analysis, the correlation between the Control and Expertise dimensions varied from strong to very strong for most GL types, showing that when one dimension increases, the other tends to increase too, and likewise when it decreases.
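The news-article exception can be checked by hand from the counts in Table 16. The following is a minimal sketch, not the authors' script: it assumes that the all-zero High column is dropped (a chi-square test cannot be computed with an expected frequency of zero) and that Yates' continuity correction is applied to the resulting 2×2 table, neither of which is stated in the text. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Assumed continuity correction: shrink each deviation by 0.5.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from Table 16 as (Low, Moderate); the empty High column is omitted.
news_expertise = [
    [7, 1],   # Professors/researchers
    [8, 13],  # Students
]
statistic, p_value = chi_square_2x2(news_expertise)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

Under these assumptions the p-value falls just under 0.05, in line with the .049 reported for news articles in Table 15. Without the continuity correction the p-value is considerably smaller, so the borderline result depends on that choice.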
Considering the researcher profile, we did not find evidence that different researcher profiles have different opinions, except for news articles.

7 Discussion

In this section, we discuss each research question, relating them to previous studies (Section 7.1). Then, we discuss some findings outside the scope of the RQs that caught our attention (Section 7.2). We also present some advice to SE researchers based on the lessons learned with this research and previous knowledge (Section 7.3). Finally, we discuss some threats to the validity of this work (Section 7.4).

7.1 Revisiting findings

In this section, we discuss our findings for each RQ. Even though we addressed RQ1–RQ4 in our previous study (Kamei et al., 2020), in this work we include additional discussions and consider other related works not mentioned before.

(RQ1) Motivations to use or reasons to avoid GL

(i) Even though our first investigation showed several motivations and benefits in using GL, our second investigation shows that most researchers avoid its use as a reference in scientific papers.

(ii) We organized the motivations to use GL into five categories. Three of them were similar to previous works. For instance, Rainer and Williams (2019) and Zhang et al. (2020) also discussed the motivation to complement research findings. Another related motivation was to understand problems, identified in three studies (Rainer and Williams, 2019; Neto et al., 2019; Zhang et al., 2020).

(RQ2) Types of Grey Literature used

We did not find previous primary studies focusing on this research question. We found tertiary studies that investigated the most common GL types found in the selected studies. For instance, Zhang et al. (2020) identified that the most common GL types used in the list of selected secondary studies were (in order) technical reports, blog posts, books/book chapters, and thesis.
Considering the types of GL used by Brazilian SE researchers, the most common are Q&A websites (e.g., Stack Overflow), blog posts (e.g., from SE firms such as Netflix, Uber, Facebook), and technical reports (e.g., from SEI). Our investigation shows that most of these types are related to SE practice and are mainly retrieved from renowned firms or research institutions.

(RQ3) Criteria used to assess Grey Literature credibility

We found several criteria to assess GL credibility, most of them related to the GL producer being renowned (authors, institutions, and companies). These criteria caught our attention because we did not find any criterion that mentions assessing the GL content. However, the challenge of Lack of reliability is related to this, and previous work (Williams and Rainer, 2019) has investigated a set of criteria to assess GL content (e.g., rigor in presenting information, presenting empirical data, describing the methods of data collection).

(RQ4) Benefits and Challenges using Grey Literature

We identified some contradictory findings between the benefits and challenges of GL use. They are part of the trade-off between the nature of traditional literature and that of GL. For instance, on the one hand, SE researchers mentioned that it is Easy to access and read the GL content. On the other hand, they said it is Difficult to search/find information. The benefit is related to accessing GL content without paywall restrictions and to the informal language in which it is usually written. However, these characteristics hinder the use of automatic data extraction.

We identified another trade-off: despite the perceived benefit of Advance the state of the art/practice, several researchers avoid using GL due to the challenges of Lack of reliability and Lack of scientific value. In part, these trade-offs are expected, and they show the need for further investigations on how to improve the use of GL in SE research.
This research is one such investigation.

Even though we confirmed some findings of the literature, the main benefit identified (Easy to access and read) was not mentioned by previous studies (Williams and Rainer, 2017; Rainer and Williams, 2018, 2019; Garousi et al., 2016). The same occurred with the challenges. For instance, the Lack of scientific value was not identified in previous studies, even though it was the second most mentioned challenge in our investigation. The benefits identified in this study are related to the results of our tertiary study (Kamei et al., 2021). Regarding the challenges, some findings of previous works (Zhang et al., 2020; Kamei et al., 2021) did not appear in ours. For instance, the Uncertain availability of GL was not identified in our investigation.

(RQ5) Prioritizing the Criteria to Assess Grey Literature Credibility

This investigation confirmed some findings of Survey 1 (Kamei et al., 2020), showing that the most important credibility criteria are related to the GL source being produced by a renowned source. However, the prioritization partly contrasted with those findings: in the Survey 1 results, no criteria were related to assessing the GL content, whereas in Survey 2 several SE researchers considered the criteria of Citing academic sources and Presenting empirical data important.

The criteria of citing academic sources, describing the collection methods, and presenting empirical data caught our attention due to the emphasis on applying scientific perspectives to assess GL sources.
In our opinion, these criteria are difficult to use, for the following reasons: 1) according to Williams (2018), online articles and blogs produced by SE practitioners rarely mention academic sources; 2) GL sources are produced mainly by practitioners (Kamei et al., 2021), and consultants and companies express themselves differently from academics; and 3) most GL sources do not present empirical data. Instead, they are primarily based on their authors' opinions and beliefs (Rainer, 2017).

(RQ6) Types of Grey Literature vs. Dimensions of Control and Expertise

Some findings caught our attention because some GL types run across two and sometimes three levels of the classification of the dimensions, showing that different interpretations may occur for the same type. Nevertheless, the correlation analysis showed a strong correlation between these interpretations for most of the GL types investigated. Considering the respondents' profiles, contrary to what we expected, our statistical analysis based on the chi-square test showed that different respondent profiles shared similar opinions about whether each source type is GL and about its level of Control and Expertise.

The criteria used by SE researchers to classify these dimensions are mostly related to the rigor of the source, the researcher's experience, and the interaction permitted to users of each GL type. However, some of them considered it challenging to classify based only on the source type, without a real example to be assessed in depth, as one researcher pointed out: “(...) the credibility will depend on who produced that content.” Moreover, we perceived that sources produced by companies and institutions (e.g., technical reports, books/book chapters, thesis) were mostly considered to have a Moderate to High level of Control and Expertise.
In contrast, the sources commonly produced by SE practitioners (e.g., forums/list of discussions, blog posts, videos) have a Low level of Control and Expertise. These findings caught our attention because, in the RQ2 results, the most used GL sources run between the Low and Moderate levels. It appears that the benefits and the motivations to use GL outweigh the Low level of Control and Expertise of these sources.

With these findings, we reinforce the claim of Garousi et al. (2019) that it is complicated to assess the dimensions of Control and Expertise alone. Although these dimensions point in one direction, other essential criteria include identifying the GL's producer and content. For this reason, we advocate that SE researchers use the concept of the “shades of GL” to classify and assess a GL source because it recognizes the different perspectives on the nature of GL, although future investigations to set the limits between the tiers of the shades are essential.

Beyond that, we stress the importance of employing objective criteria to assess GL sources and to better support the classification of GL according to the shades. However, as our findings showed, it could be essential to propose intermediate shades between tiers.

7.2 Other discussions

In this section, we discuss some findings that are not tied to a specific research question. First, we discuss the relations among the researchers' perceptions of GL. Second, we describe the relationship between the credibility criteria and the dimensions of credibility investigated. Lastly, we discuss our findings on the perceptions of the different GL types.

Perceptions of Grey Literature

We identified relations between the perceptions of GL, as shown in Figure 5. For instance, we identified some motivations to use GL related to some of the benefits identified (dashed line) and some reasons to avoid GL related to some of the challenges of GL use (dotted line). In what follows, we discuss some of them.
Regarding motivations, “To complement research findings” is related to the benefit of using GL to provide “Different results from scientific studies”, as some respondents reported that including GL could provide evidence not explored or identified in the research area. Another motivation, “To answer practical and technical question”, is related to the benefit of “Practical evidence”, which was not perceived when using only traditional literature.

The reasons to avoid GL and the challenges identified are almost the same. The exception is the “Lack of reliability”, which hinders the replicability of the search for GL. It could be caused by the “Non-structured information” of a GL source.

Figure 5. Relationships identified between the Motivations to Use GL and the Benefits, and between the Reasons to Avoid GL and the Challenges.

Expertise criteria vs. Dimensions of Control and Expertise

The most important criteria identified to assess GL credibility are related to the “Producer reputation” and the “Rigor” presented in the GL source. The first concerns the source being produced by a renowned author or institution, or being cited by a renowned source. The second concerns how the information is presented, for instance, whether it describes the methods used to collect the data. Figure 6 presents these criteria.

We also identified some relations between the credibility criteria and some of the reasons given to classify the Control and Expertise dimensions, as shown in Figure 6. Control (dashed line) is related to “peer interaction”, “producer reputation”, and “rigor”. Expertise (dotted line) has the same relations as Control, plus “researcher experience”. The latter refers to the researchers’ own experience of using GL when assessing its credibility.

GL types interpretation

In our second investigation, we found some misunderstandings in interpreting GL types (see Table 11), even though those types were recognized as GL in some previous SE works (e.g., Maro et al. (2018), Zhang et al. (2020)). In the following, we present the most common types that were not considered GL: theses (11/34 occurrences), patents (6/34 occurrences), books/book chapters (6/34 occurrences), and magazine articles (3/34 occurrences). In this regard, for instance, one researcher pointed out: “I understand that thesis and dissertations are not Grey because external researchers formally assess them.”

We also found in previous studies some contradictions in interpreting whether a source type is a GL type. For instance, while Hosseinzadeh et al. (2018) considered books/book chapters a GL type, Berg et al. (2018) did not. Similarly, while Neto et al. (2019) considered theses a peer-reviewed source, Rodríguez-Pérez et al. (2018) classified them as GL. These misunderstandings were also identified in the previous investigation with secondary studies (Kamei et al., 2021).

In our opinion, these misunderstandings are reflected in each source’s classification regarding Control and Expertise. For instance, most researchers did not consider books/book chapters, technical reports, theses, and patents a GL type and related them to a High level of Control and Expertise (Figures 3 and 4). This shows that the boundary between peer-reviewed and grey literature is unclear when considering only the source type.

7.3 Lessons learned

With this investigation and the previous one (Kamei et al., 2020), we showed how GL could contribute to SE research. However, some advice is important so that this use can be improved.
For SE researchers, our findings highlight points that deserve attention when searching for, selecting, and using grey literature in SE research: 1) explore the GL sources before using them in research, as there are several types of GL sources, to understand what evidence each source could provide, how it could benefit the research, and how to retrieve information from it, given how difficult GL is to search for; 2) be aware of a set of credibility criteria that could be used to assess GL sources, for instance, by selecting data produced by renowned sources (e.g., authors, institutions) and understanding how each credibility criterion could better fit each type of GL; 3) consider using additional criteria to improve GL credibility, given the various interpretations of GL assessment related to the Control and Expertise aspects; and 4) understand how to improve the search for GL by using a systematic approach with methods and techniques that deal better with the content, aiming to reduce its lack of reliability.

Figure 6. Relationships identified between the Grey Literature Expertise criteria and the Dimensions of Control and Expertise.

7.4 Threats to Validity

This section discusses some limitations and threats to validity and what we have done to mitigate them.

Construct validity: Despite our efforts to improve our questionnaire, we identified two potential threats in our research: 1) The first concerns the questions in which we asked the participants to classify each source type with respect to the Control and Expertise dimensions. We mitigated this threat by informing the researchers that we know that Control and Expertise vary from source to source and by asking them to consider their most frequent experience with each data source. However, three researchers reported that assessing these GL types’ dimensions was difficult without considering the content and the producer.
This difficulty may have introduced some bias. 2) We used a non-probability convenience sample (Baltes and Ralph, 2021) because we intended to investigate only SE researchers with previous experience in GL use. Thus, we surveyed only the 53 Brazilian researchers we knew had this experience.

Internal validity: As our investigation relied on personal interpretation, we may have introduced biases during data extraction and analysis. We tried to minimize them by using a paired approach with constant discussion between the researchers and by involving a third researcher to revise the derived codes and categories.

External validity: Our first investigation used a sample of SE researchers from the largest SE conference in Brazil. In that investigation, our sample was representative of SE research because we had a 30.4% response rate with a diversity of researchers (1/3 are women, 50% have a Ph.D. in SE, and 30% a Master’s). In our second investigation, we surveyed the researchers from the first survey who mentioned they had used GL in SE research. We received a 64.1% response rate (34 of the 53 researchers surveyed). Of these, almost 60% are professors or researchers with more than ten years of SE research experience, and most have used GL in 2 to 5 scientific studies. Nevertheless, as we focused on the Brazilian SE research community in both surveys, the findings may not apply to other populations. We nonetheless used a peer review process throughout this research, aiming to improve the external validity and support more general conclusions.

Conclusion validity: Even with response rates of 30.4% and 64.1% in the two surveys, we may have lost some important information. For the first investigation, we mitigated this threat by comparing our results with previous studies conducted with different populations, which showed similar results.
Even though we reached a considerable response rate in the second investigation, our sample was too small, and focused only on the Brazilian SE researchers’ perspective, to permit generalizing the results. Another threat is related to the correlation analysis between the dimensions of Control and Expertise for each GL type because we did not explicitly ask the respondents about this correlation.

8 Related works

This section presents related studies that explored GL’s credibility and quality assessment in SE research. For each study presented, we show how it differs from our work.

The Grey Literature Review (GLR) conducted by Raulamo-Jurvanen et al. (2017) focused on understanding how SE practitioners choose a test automation tool by investigating the opinions and experiences that SE practitioners published in GL sources. They analyzed each GL source’s credibility during the quality assessment according to the number of readers, number of shares, number of comments, number of Google hits for the titles, and a backlink analysis (a backlink being a reference comparable to a citation). Our work differs because we provide different findings on assessing GL credibility. Moreover, we also aim to understand the prioritization of a set of criteria identified in previous investigations (Kamei et al., 2020; Williams and Rainer, 2019).

Soldani et al. (2018) conducted another GLR, investigating the pains and gains of using microservices. They perceived that the traditional literature on the topic is still at an early stage even though companies work with microservices day to day, as witnessed by the considerable amount of GL on the subject.
The authors considered a set of control factors as criteria to select GL sources: Practical Experience of the authors (5+ years), Industrial case-study, Heterogeneity (presenting information about at least 5 top industrial domains), and Implementation quantity (presenting detailed information). Our work differs from this because we focused on investigating and providing a set of general criteria that could be used to assess different types of GL sources.

Williams and Rainer conducted two studies to investigate how to improve the quality and credibility assessment of blog articles in SE research. The first study (Williams and Rainer, 2017) examined some criteria to evaluate blog articles as a source of SE research evidence through two pilot studies (a systematic mapping study and preliminary analyses of blog posts). The findings showed some criteria for selecting a blog article’s content (e.g., authentic, informative). The second study (Williams and Rainer, 2019) focused on finding credibility criteria to assess blog posts by selecting 88 candidate credibility criteria from a previous mapping study (Williams and Rainer, 2017). Then, to gather opinions on a blog post and evaluate those credibility criteria, they surveyed 43 SE researchers. The criteria found include, for instance, the presence of reasoning, reporting empirical data, and reporting data collection methods. As with the previous related works, our criteria differ in not being focused on a specific type of GL. Moreover, our identified criteria are different from Williams and Rainer’s, and we tried to understand what each SE researcher considered in assessing the different types of GL.

Most recently, we conducted a tertiary study of secondary studies in SE (Kamei et al., 2021), presenting a critical review of GL use in secondary studies. In total, 446 studies were investigated, of which 126 searched for or included GL as a primary source.
This finding showed that GL was not widely used in the analyzed studies, although its use increased over the years. The tertiary study explored the benefits, challenges, and motivations to use or avoid GL. Our work differs from this previous one because we asked the SE researchers directly, unlike investigations of published studies, where these questions are not directly explored and the authors are free to include that information or not.

Despite the similarity of these works to ours, there are differences in at least four points: i) we found a different set of credibility criteria (the source being provided by renowned institutions, provided by renowned companies, cited by others, and derived from academia); ii) we did not focus on a specific type of GL source; iii) we explored the experience of SE researchers to understand the perspectives on the credibility of different GL types and how SE researchers assess them; and iv) we investigated the prioritization of a set of criteria used to assess GL credibility.

9 Conclusions and Future Works

Although the use and investigation of Grey Literature in SE research have increased over the last years, they are still recent. In this work, we reported two investigations based on the Brazilian SE researchers’ perspective to present an overview of GL source usage, the potential benefits and challenges of its use, a set of criteria to assess GL credibility, and the perceptions about GL types concerning the Control and Expertise criteria. Our main findings show:

1. Blogs, community websites, and technical experience/reports are the most common GL sources used by SE researchers;
2. The main motivations to use GL are that its content could complement research findings by providing different results from scientific studies, and that it could answer practical and technical questions;
3. GL use is not widespread as a scientific reference due to some credibility and reliability constraints;
4. The use of the “shades of GL” can help SE researchers to assess GL and interpret the different GL types, although we identified that SE researchers have different interpretations of GL Control and Expertise;
5. The most relevant criteria used to assess GL credibility are that the GL source is provided by renowned authors, institutions, or companies, or is cited by a renowned source;
6. The most critical criteria to assess the Control and Expertise of a GL source are related to the producer reputation and the rigor of the GL content presented;
7. There is a positive correlation between the credibility dimensions of Control and Expertise for each GL type. It shows that when the level of Control increases, the level of Expertise tends to increase too;
8. We did not find significant differences between the opinions of graduate students and professors/researchers concerning the Control and Expertise dimensions analyzed for each GL type.

For replication purposes, all the data used in these investigations are available online at https://doi.org/10.5281/zenodo.5164714.

For future work, we plan i) to expand our view by investigating other SE research communities; and ii) to understand the GL credibility aspects in depth, focusing on building an objective quality assessment instrument that covers these several types.

References

Adams, J., Hillier-Brown, F. C., Moore, H. J., Lake, A. A., Araujo-Soares, V., and Summerbell, M. W. C. (2016a). Searching and synthesising ‘grey literature’ and ‘grey information’ in public health: critical reflections on three case studies. Systematic Reviews, 5(1):164.
Adams, R. J., Smart, P., and Huff, A. S. (2016b). Shades of grey: Guidelines for working with the grey literature in systematic reviews for management and organizational studies. International Journal of Management Reviews, 19(4):432–454.
Baltes, S. and Ralph, P. (2021). Sampling in software engineering research: A critical review and guidelines.
Berg, V., Birkeland, J., Nguyen-Duc, A., Pappas, I. O., and Jaccheri, L. (2018). Software startup engineering: A systematic mapping study. Journal of Systems and Software, 144:255–274.
Bonato, S. (2018). Searching the Grey Literature. Rowman & Littlefield.
Braun, V. and Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2):77–101.
Coelho, J., Valente, M. T., Milen, L., and Silva, L. L. (2020). Is this GitHub project maintained? Measuring the level of maintenance activity of open-source projects. Information and Software Technology, 1:1–35.
Dancey, C. P. and Reidy, J. (2004). Statistics Without Maths for Psychology: Using SPSS for Windows. Prentice-Hall, Inc., USA.
Garousi, V., Felderer, M., and Hacaloğlu, T. (2017). Software test maturity assessment and test process improvement: A multivocal literature review. Information and Software Technology, 85:16–42.
Garousi, V., Felderer, M., and Mäntylä, M. V. (2016). The need for multivocal literature reviews in software engineering: Complementing systematic literature reviews with grey literature. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, EASE ’16, pages 26:1–26:6, New York, NY, USA. ACM.
Garousi, V., Felderer, M., and Mäntylä, M. V. (2019). Guidelines for including grey literature and conducting multivocal literature reviews in software engineering. Information and Software Technology, 106:101–121.
Hosseinzadeh, S., Rauti, S., Laurén, S., Mäkelä, J.-M., Holvitie, J., Hyrynsalmi, S., and Leppänen, V. (2018). Diversification and obfuscation techniques for software security: A systematic literature review. Information and Software Technology, 104:72–93.
Kamei, F., Wiese, I., Lima, C., Polato, I., Nepomuceno, V., Ferreira, W., Ribeiro, M., Pena, C., Cartaxo, B., Pinto, G., and Soares, S. (2021). Grey literature in software engineering: A critical review. Information and Software Technology, page 106609.
Kamei, F., Wiese, I., Pinto, G., Ribeiro, M., and Soares, S. (2020). On the use of grey literature: A survey with the Brazilian software engineering research community. In Proceedings of the XXXIV Brazilian Symposium on Software Engineering, SBES 2020, New York, NY, USA. Association for Computing Machinery.
Linåker, J., Sulaman, S., Maiani de Mello, R., and Martin, H. (2015). Guidelines for conducting surveys in software engineering. Technical report, Lund University.
Maro, S., Steghöfer, J.-P., and Staron, M. (2018). Software traceability in the automotive domain: Challenges and solutions. Journal of Systems and Software, 141:85–110.
Neto, G. T. G., Santos, W. B., Endo, P. T., and Fagundes, R. A. A. (2019). Multivocal literature reviews in software engineering: Preliminary findings from a tertiary study. In Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’19, pages 1–6.
Oliveira, J. A., Viggiato, M., Pinheiro, D., and Figueiredo, E. (2021). Mining experts from source code analysis: An empirical evaluation. Journal of Software Engineering Research and Development, 9(1):1:1–1:16.
Petticrew, M. and Roberts, H. (2006). Systematic Reviews in the Social Sciences: A Practical Guide, volume 11. Blackwell Publishing Ltd.
Pinto, G., Ferreira, C., Souza, C., Steinmacher, I., and Meirelles, P. (2019). Training software engineers using open-source software: The students’ perspective. In Proceedings of IEEE/ACM 41st International Conference on Software Engineering: Software Engineering Education and Training, ICSE-SEET ’19, pages 147–157. Institute of Electrical and Electronics Engineers (IEEE).
Rainer, A. (2017). Using argumentation theory to analyse software practitioners’ feasible evidence, inference and belief. Information and Software Technology, 87:62–80.
Rainer, A. and Williams, A. (2018). Using blog articles in software engineering research: Benefits, challenges and case-survey method. In Proceedings of the 25th Australasian Software Engineering Conference, ASWEC ’18, pages 201–209.
Rainer, A. and Williams, A. (2019). Using blog-like documents to investigate software practice: Benefits, challenges, and research directions. Journal of Software: Evolution and Process, 31(11):e2197.
Raulamo-Jurvanen, P., Mäntylä, M., and Garousi, V. (2017). Choosing the right test automation tool: A grey literature review of practitioner sources. In Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, EASE ’17, pages 21–30. ACM.
Rodríguez-Pérez, G., Robles, G., and González-Barahona, J. M. (2018). Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the SZZ algorithm. Information and Software Technology, 99:164–176.
Saltan, A. (2019). Do we know how to price SaaS: A multi-vocal literature review. In Proceedings of the 2nd ACM SIGSOFT International Workshop on Software-Intensive Business: Start-Ups, Platforms, and Ecosystems, IWSiB 2019, pages 7–12. ACM.
Schöpfel, J. and Prost, H. (2020). How scientific papers mention grey literature: a scientometric study based on Scopus data. Collection and Curation.
Soldani, J., Tamburri, D. A., and Heuvel, W.-J. V. D. (2018). The pains and gains of microservices: A systematic grey literature review. Journal of Systems and Software, 146:215–232.
Spencer, D. (2009). Card sorting: Designing usable categories. Rosenfeld Media.
Storey, M.-A., Singer, L., Cleary, B., Filho, F. F., and Zagalsky, A. (2014). The (r)evolution of social media in software engineering. In Proceedings of the on Future of Software Engineering, FOSE ’14. ACM Press.
Storey, M.-A., Zagalsky, A., Filho, F. F., Singer, L., and German, D. M. (2017). How social and communication channels shape and challenge a participatory culture in software development. IEEE Transactions on Software Engineering, 43(2):185–204.
Stray, V. and Moe, N. B. (2020). Understanding coordination in global software engineering: A mixed-methods study on the use of meetings and Slack. Journal of Systems and Software, 170:110717.
Tom, E., Aurum, A., and Vidgen, R. (2013). An exploration of technical debt. Journal of Systems and Software, 86(6):1498–1516.
Viera, A. J. and Garrett, J. M. (2005). Understanding interobserver agreement: the kappa statistic. Family Medicine, 37(5):360–363.
Williams, A. (2018). Using reasoning markers to select the more rigorous software practitioners’ online content when searching for grey literature. In Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering, EASE ’18, pages 46–56. ACM.
Williams, A. and Rainer, A. (2017). Toward the use of blog articles as a source of evidence for software engineering research. In Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, EASE ’17, pages 280–285, New York, NY, USA. ACM.
Williams, A. and Rainer, A. (2019). How do empirical software engineering researchers assess the credibility of practitioner-generated blog posts? In Proceedings of the 23rd International Conference on Evaluation and Assessment in Software Engineering, EASE ’19, pages 211–220. ACM.
Zahedi, M., Rajapakse, R. N., and Babar, M. A. (2020). Mining questions asked about continuous software engineering: A case study of Stack Overflow. In Li, J., Jaccheri, L., Dingsøyr, T., and Chitchyan, R., editors, EASE ’20: Evaluation and Assessment in Software Engineering, Trondheim, Norway, April 15-17, 2020, pages 41–50. ACM.
Zhang, H., Zhou, X., Huang, X., Huang, H., and Babar, M. A. (2020). An evidence-based inquiry into the use of grey literature in software engineering. In Proceedings of the 42nd International Conference on Software Engineering, ICSE ’20.