The Role of Best Practices to Appraise Open Source Software

Electronic Communications of the EASST, Volume 48 (2011)
Proceedings of the Fifth International Workshop on Foundations and Techniques for Open Source Software Certification (OpenCert 2011)
Miguel Regedor, Daniela da Cruz and Pedro Henriques
17 pages
Guest Editors: Luis Soares Barbosa, Dimitrios Settas
Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer
ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122

The Role of Best Practices to Appraise Open Source Software

Miguel Regedor1, Daniela da Cruz2 and Pedro Henriques3
1 miguelregedor@gmail.com
2 danieladacruz@di.uminho.pt
3 prh@di.uminho.pt
Dep. de Informática / CCTC, Universidade do Minho, Braga, Portugal

Abstract: Thousands of open source software (OSS) projects are available for collaboration on platforms like GitHub or SourceForge. However, like traditional software, OSS projects have different quality levels. The developer, or the end-user, needs to know the quality of a given project before starting to collaborate on it or use it; they must, of course, trust the package before taking a decision. In the context of OSS, trustability is a much more sensitive concern; many end-users prefer to pay for proprietary software in order to feel more confident about the package quality. OSS projects can be assessed like traditional software packages, using the well-known software metrics. In this paper we want to go further and propose a finer-grained process for such quality analysis, precisely tuned for this unique development environment. As is known, over the last years open source communities have created their own standards and best practices. Nevertheless, the classic software metrics do not take into account the best practices established by the community. We feel that it could be worthwhile to consider this peculiarity as a complementary source of assessment data. Taking the Ruby OSS community and its projects as a framework, this paper discusses the role of best practices in measuring software quality.

Keywords: software metrics, static code analysis, open-source, program comprehension

1 Introduction

Nowadays, Open Source Software (OSS) is well disseminated. Thousands of OSS packages can be found online, free to download, on Open Source Project Hosting Websites (OSPHW) like SourceForge1, Google Code2, or GitHub3. Those websites, usually in conjunction with a Version Control System (VCS), make it easy for developers all around the globe to collaborate on Open Source Software Projects (OSSP), and also act as a way to make software available to users.

1 http://sourceforge.net/.
2 http://code.google.com/.
3 https://github.com/.

According to NetCraft4, in May 2010 the market share for top servers across the million busiest sites was 66.82% for the open source web server, Apache, much higher than the 16.87% for Microsoft web servers. Even governments started noticing open source during the last few years, and in some cases adopted it [Hah02]. The broad acceptance of OSS means that OSS is now used not only by computer specialists.
John Powell5 has declared that, measuring the savings that people are making in license fees, the open-source industry is worth 60 billion dollars. Matt Asay6 shares the view that, from the customers' perspective, open source can now be considered the largest software industry in the world. The full review can be found at CNET News7.

Usually, large industries have a strict organization model; that is not the way open source communities operate. Open source communities work in a kind of bazaar style. E. Raymond [Ray00] compares the traditional software development process to building cathedrals: a few specialized individuals working in isolation. Open source development, in contrast, seems to resemble a great babbling bazaar. But OSS is not developed in bazaar style all the time, and each community can have particular habits. Currently, big open source projects can have companies supporting them. However, most projects are not that big, and sometimes it is hard to distinguish the project developers from the project customers/users; because of that, bug reports and wanted features can become indistinguishable too. The specification of an open source software project evolves in an organic way [CM07].

Can software that is developed in such a chaotic way be trusted as a high quality product? The surprise is that, in fact, the bazaar style seems to work [HS02]. Some big projects, for instance Linux distributions such as Ubuntu8, are the proof of it. However, how can the quality of this software be measured?

The most basic meaning of software quality is commonly recognized as the lack of "bugs" and the meeting of the functional requirements. But quality is not simply based on that [GKS+07]. The quality of a software system depends, among other things, on update frequency, quantity of documentation, test coverage, number and type of its dependencies, and good programming practices. By analyzing those parameters a user can make a better choice when picking software for a specific task [MA07].

When a user/developer finds a new OSSP, for example on GitHub, the things that will most influence the time needed to understand the project, to use it, or to collaborate on it, are the quality of the documentation and the source code readability. Although the OSPHWs provide plenty of useful information about the hosted projects, currently they do not give a quick answer to the following questions: Does this project have good documentation? Does the code follow standards? How similar is it to other projects?

An OSSP is built up from hundreds, sometimes thousands, of files. It can be coded in many different computer languages. To analyze a software project manually is a very hard and time consuming task, and not all users have the ability to answer the previous questions by looking at the source code [CAH03].

4 http://news.netcraft.com/archives/2010/05/14/may_2010_web_server_survey.html/, accessed on 2010/12/21.
5 John Powell is CEO, President, and Co-founder, Alfresco Software Inc.
6 Matt Asay is chief operating officer at Canonical, the company behind the Ubuntu Linux operating system.
7 http://news.cnet.com/8301-13505_3-9944923-16.html/, accessed on 2010/12/21.
8 http://www.ubuntu.com. Ubuntu is a free & open source operating system.

However, open source communities are constantly creating and improving their working methodologies.
Even without noticing, communities create rules and best practices. By following those best practices, software projects increase their maintainability level. With that in mind, a system capable of analyzing and measuring a given OSSP, producing detailed quantitative and qualitative reports about it, would enable users to make better choices and, of course, developers to further improve the package.

This paper discusses the concept of quality when addressing an OSSP, and how to measure it using classic approaches (Section 3). After that, the notion of best practices is introduced and the impact of taking their use into account when assessing an OSSP is explored. To make this proposal clearer and stronger, the Ruby9 community is taken as a starting target (Section 4). Last but not least, to support our proposal a case study is presented in Section 5: seven Ruby OSSPs are measured and compared.

2 Open Source

Open source describes practices in production and development where everybody has access to the product source materials. Definition from http://opensource.org/:

Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in.

Generically, open source refers to computer software in which the source code is available, free of charge, to the general public for use, modification and redistribution. Nevertheless, in the past few years, the concept of open source has been widely used not just in computer software but in every industry. Actually, new concepts, like open design10 or open religions11, emerged from it. A simple metaphoric example: a restaurant would be open source if the chef revealed to the general public his cooking techniques and recipes. Consequently, by revealing his secrets, other people could start making the same dishes and even improve his techniques. That might not be a good bet for a restaurant, but it has proved to be a good one in software development.

2.1 Open Source Software Development

As said in the introduction, large industries have a strict organization model, and that is not the way open source communities operate. Open source communities work in bazaar style. This term was introduced by Raymond in [Ray00].

9 Ruby is an open source programming language. The Ruby community is relatively young but very focused on following best practices.
10 Open design is the development of physical products, machines and systems through use of publicly shared design information.
11 Open-source religions attempt to employ open-source methodologies in the creation of religious belief systems.

Of course, OSS is not developed in the same way all the time. Each community has particular habits. Different development and management methodologies, more traditional or more agile, can be used. Currently, the truth is that the most successful communities organize themselves in a similar way to professional, proprietary companies, and some of the big open source projects have big companies supporting them; but that is not charity. Imagine the consequences of having a handful of highly motivated eyes going through the code, constantly reviewing it, correcting and adding to it.
People work not because they were told to, but because it is their own will. Those communities are the strength of OSS and of the companies behind it. OS development makes it possible for a project to reach a high quality level in much less time and with less financial investment, compared to traditional software development. Nevertheless, those OSPs still follow the OS core rules, and those projects are community driven: the users and developers must feel engaged with them.

It is obvious that companies need to make money, but even if their software is free and open source, new sources of income can be explored, for example charging for support, related services, donations, etc. The well known open source browser Firefox is the descendant of Netscape's graphical web browser, whose roots go back to Mosaic, released in 1993. When Microsoft bundled Internet Explorer with Windows, it was obvious that Netscape was doomed. But they turned the project into open source, created the Mozilla Foundation, and the community gathered around it, helping the company regain the lost market. This and other examples show OS development as one of the most effective development models today. In fact, many companies are trying to explore these business models.

2.2 Open Source Project Hosting Platforms

The strength of the open source development model comes from the user base and the power given to it. Users should fill out bug reports, submit feature requests, etc. Because the developers can be spread all around the globe, there is the need for effective, administered communication channels, for better cooperation and coordination.

An open source project hosting platform is the central tool that supports and coordinates the development of an open source project; normally it takes the form of a website. Since 1999 (the year SourceForge was launched), many open source project hosting websites (OSPHWs) were created to host open source projects. OSPHWs offer different features, like codebase12 hosting (a project codebase is typically stored in a source control repository), code review, bug tracking, web hosting, wiki, mailing list, etc. [BHK06].

12 The term codebase means the whole collection of source code used to build a particular application or component.
13 http://git-scm.com/. Git is a free & open source, distributed version control system that Linus Torvalds developed to help manage Linux kernel development.

GitHub is one of the youngest OSPHWs (launched in 2008). However, in only two years it drew more than 500,000 users (one quarter of SourceForge's users) and is hosting more than 1,500,000 projects. The only version control system provided by GitHub is Git13. Because GitHub projects are in fact Git repositories, it is incredibly easy to make branches and merges on GitHub. Although branching was considered a big pain in older version control systems, it turned out that, using Git, it can in fact improve the developers' collaboration and organization. This happens because of the distributed philosophy and implementation of Git: hosting a Git repository is not hard, but coordinating the efforts of forking and merging amongst people is tough; with a system like GitHub, it becomes a lot easier [Coo08].

However, the main reason for GitHub's popularity is its social aspects. Users and projects have public profiles and activity feeds which display activity on public projects, such as commits, comments, forks, etc.
Furthermore, with so many high profile projects on board (jQuery, reddit, Sparkle, curl, Ruby on Rails, node.js, ClickToFlash, Erlang/OTP, CakePHP, Redis), it is easy to imagine that GitHub could be the next SourceForge. The reasons above allow us to believe that GitHub has a strong and growing open source community and that it is an important platform both for users and developers. Because of that, and because of the high number of Ruby on Rails projects hosted there, it was decided to focus this study on GitHub projects.

3 Assessing Open Source Software

The simplest operation in science and the lowest level of measurement is classification [Kan02]. By assessing OSS we mean to sort OSS projects into an ordinal scale14. This can be achieved by defining a ranking system15 and by placing OSS projects into quality categories with respect to certain quality attributes. First we need to find a way of quantifying those OSS quality attributes.

In software, quality is an abstract concept. It is commonly recognized as the lack of "bugs" and the meeting of the functional requirements. However, quality can be perceived and interpreted differently based on the actual context, objectives and interests of each project. Many software development companies monitor customer satisfaction as a quality index; for instance, IBM ranks their software products in levels of CUPRIMDSO [Kan02]:

14 Ordinal scale refers to the measurement operations through which the subjects can be compared in order.
15 Ranking system example: to classify a quality attribute, for instance the project documentation, according to its quality with five, four, three, two or one star.

• Capability/Functionality (refers to the software meeting its functional requirements)
• Usability (refers to the required effort to learn and operate the software)
• Performance/Efficiency (refers to the software performance and resource consumption)
• Reliability (refers to software fault tolerance and recoverability)
• Installability/Portability (refers to the required effort to install or transfer the software to another environment)
• Maintainability (refers to the required effort to modify the software)
• Documentation/Information (refers to the coverage and accessibility of the software documentation)
• Service (refers to the company monitoring and service)
Because of the OSP bazaar style and continuous development process, it is intuitive that the maintainability and documentation attributes have a big influence on the overall quality and continuous progress of an OSP (maintainability and documentation have support relationships with usability, reliability and availability attributes, but might be conflictive with functionality and performance attributes). Failure to meet functionality often leads to late changes and increased costs in the develop- ment process. The software industry and researchers have been mostly interested on testing methodologies that focus on functional requirements and pay little attention to non-functional requirements [CP09]. There are several challenges and difficulties, in assessing non-functional quality attributes for software projects. For example, security is a non-functional requirement that needs to be addressed in every software project. Therefore a badly-written software may be functional, but subject to buffer overflow attacks. Another example is the amount of codebase comments, if the code does not have any comments it will not affect the functional requirements, but it is obvious that it will decrease readability and maintainability [GKS+07]. 3.1 Classic Software Metrics To classify OSS with regards to a certain quality attribute, we need to find which factors influence it. Then we need a way to measure that attribute. If we need to measure we need metrics. Fortunately, there are around two thousand documented software metrics, but there is few information on how those metrics relate to each other. Most of them simply have different names but give similar information [FN99]. The major challenge is to discover how important 16 Conflictive, negative influence, if one attribute is high it makes the other one low. 17 Support, positive influence, if one attribute is high it makes the other one high too. Proc. OpenCert 2011 6 / 17 ECEASST the information given by those metrics is, if the calculation effort pays off, how to interpret their values and find correlations18 to assess the quality attributes of an OSP. 3.1.1 Lines of Code A line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements in the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable state- ments [CDS86]. 3.1.2 Cyclomatic Complexity The measurement of cyclomatic complexity [McC76] was designed to indicate a program’s testa- bility and maintainability. It is the classical graph theory cyclomatic number, indicating the number of regions in a graph. As applied to software, it directly measures the number of linearly independent paths through a program source code. 3.1.3 Fan-In and Fan-Out Fan-in and fan-out are perhaps the most common design structure metrics, which are based on the ideas of coupling [YC79]: • Fan-in is a count of the modules that call a given module • Fan-out is a count of modules that are called by a given module In general, modules with a large fan-in are relatively small and simple, and are usually located at the lower layers of the design structure. In contrast, modules that are large and complex are likely to have a small fan-in. There is also the theory that high fan-outs represent a high number of method calls and thus are undesirable, while high fan-ins represent a high level of reuse [WLCR07]. 
3.1.4 Object-Oriented Metrics

Classes and methods are the basic constructs of OO technology. The amount of functionality provided by OO software can be estimated based on the number of identified classes and methods or their variants. Therefore, it is natural that the basic OO metrics are related to classes, methods and their size. The pertinent question, therefore, is what the optimum value for OO metrics should be. There may not be one correct answer, but based on his experience in OO software development, Lorenz proposed eleven OO design metrics called rules of thumb [LK94]; a sketch of how some of them could be checked mechanically follows the list.

• Average Method Size (LOC): should be less than 8 LOC for Smalltalk and 24 LOC for C++.
• Average Number of Methods per Class: should be less than 20. Bigger averages indicate too much responsibility in too few classes.
• Average Number of Instance Variables per Class: should be less than 6. More instance variables indicate that one class is doing more than it should.
• Class Hierarchy Nesting Level (Depth of Inheritance Tree, DIT): should be less than 6, starting from the framework classes or the root class.
• Number of Class/Class Relationships in Each Subsystem: should be relatively high. This item relates to high cohesion of classes in the same subsystem. If one or more classes in a subsystem don't interact with many of the other classes, they might be better placed in another subsystem.
• Average Number of Comment Lines (per Method): should be greater than 1.
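A minimal sketch of such a mechanical check, assuming per-class statistics have already been collected by some parser (the class names, numbers, and rule table below are hypothetical; the thresholds restate the rules of thumb above, using the C++ limit for method size):

```ruby
# Hypothetical per-class statistics, as a source parser might emit them.
Stats = Struct.new(:name, :avg_method_loc, :methods_per_class, :instance_vars, :dit)

# Lorenz rules of thumb from the list above, expressed as predicates.
LORENZ_RULES = {
  "average method size < 24 LOC" => ->(s) { s.avg_method_loc < 24 },
  "methods per class < 20"       => ->(s) { s.methods_per_class < 20 },
  "instance variables < 6"       => ->(s) { s.instance_vars < 6 },
  "inheritance depth (DIT) < 6"  => ->(s) { s.dit < 6 }
}

classes = [
  Stats.new("Invoice", 12, 14, 4, 2),
  Stats.new("ReportBuilder", 31, 27, 9, 3)
]

classes.each do |c|
  broken = LORENZ_RULES.reject { |_, rule| rule.call(c) }.keys
  puts broken.empty? ? "#{c.name}: ok" : "#{c.name}: violates #{broken.join('; ')}"
end
```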
4 Best Practices in OSSP Development

Open source communities have a tendency to create coding rules, i.e., principles governing the conduct of programmers and serving as a basis of measure or judgment. It is a natural, evolutionary process for people surviving in an open space. We can call these natural rules best practices. Best practices are methods thought of as the best way of achieving something; they are spread through the community and everybody does things that way. It is obvious that when a developer follows well-established principles and best practices, the project's maintainability is increased. Consequently, project newcomers will find it easier to understand the project code [Dro02]. But there is more to it than that; following best practices discourages:

• Poor performance (due to bad patterns)
• Poor error checking (defensive programming)
• Inconsistent exception handling / poor maintainability (long-term quality)

To attain the same benefits, companies define standards, that is, something considered by an authority as a basis of comparison and a normal requirement for quality; in other words, an approved behavior model for their workers. However, these principles are defined by the few people at the top and then spread down the pyramid. Many times, those rules are not well thought out by the project leaders, and that can block progress. The other way around, the apparent chaos of open source also requires these rules; but contrary to the companies' coding standards, which work in a top-down way, best practices happen in a bottom-up and distributed mode: everybody can try different ways of doing things, but the ones with better results are the most likely to be copied. A simple metaphor exposes the difference: companies use traffic lights where open source communities use roundabouts.

Both the strict company standards (traffic lights) and the OSP best practices (roundabouts) are ways to regulate intersections. The result of traffic lights is easier to predict, but that regulation system does not depend much on the drivers' skills; because it is so restrictive, it will often happen that a driver is stopped alone at the crossroads waiting for a green light, losing precious time. On the other hand, the roundabout system is less restrictive and relies much more on the quality of the drivers, but it opens the possibility of a much more efficient way to avoid a traffic jam.

There is little work done on measuring coding standards by automatically analyzing source code. A plausible explanation is the fact that best practices are not a set of immutable rules; they are a continuous evolution and improvement of development methodologies. Communities are constantly creating rules and best practices, even without noticing it. It is not possible to write down a list of best practices without some ambiguities. Nevertheless, it is still possible to use source code metrics and, by analyzing their values, to find whether some methodological approaches were taken into account during the project development process.

At first glance, best-practice metrics might seem to stand to classic metrics as natural medicine stands to science. But that is not the case. In fact, classic metrics, on their own, do not give much information about a project. In many cases, best practices can be the key to understanding what the optimum value for a classic metric should be, for instance, to determine how many lines of code a Ruby method should have. Of course, those questions are subjective. However, by analyzing renowned projects, developers' opinions and so on, it is possible to find a best practice that gives a plausible answer in the search for the most favorable value. We believe that best practices can actually give meaning to metrics. Software projects can benefit a lot from using best practices. But what is a best practice after all?

4.1 Best Practices in RoR Projects

Ruby is a dynamic, object oriented, open source programming language created by Yukihiro Matsumoto and publicly released in 1995. It has an elegant syntax that is natural to read and easy to write. Ruby has drawn devoted coders worldwide, and in 2006 it achieved mass acceptance. The web framework Ruby on Rails is considered the main reason for Ruby's popularity (tens of thousands of Rails applications are online).

Ruby and Ruby on Rails community members are, in general, addicted to best practices. In reality, many of those best practices are studied development methodologies. For instance, the majority of Ruby on Rails book authors write about automated tests, written using specific DSLs like Cucumber or RSpec. It is also common to associate Ruby on Rails with Behaviour Driven Development (BDD) and agile methodologies. Because of all this, the Ruby community has great potential as a starting point to understand the role of best practices, their benefits, and how to measure them. In fact, there is already some work done.

The web site Rails Best Practices19 works in a similar way to a web forum, and its objective is
to engage developers in discussing which practices should be considered best practices to follow when building a RoR web application. The community involved with this web site is committed to building a gem20 that produces a report about a given project.

19 http://www.rails-bestpractices.com/ is a web site created by Richard Huang, inspired by a talk given by Wen-Tien Chang at Kungfu RailsConf 2009 in Shanghai. Slides can be found at http://www.slideshare.net/ihower/rails-best-practices.
20 Ruby libraries are called gems. Ruby gems can be easily managed using rubygems (rubygems is for Ruby as aptitude is for Debian or CPAN for Perl).

4.2 Ruby Best Practices Examples

But what is a best practice after all? Best practices can be related to code formatting:

• Use two spaces to indent code and no tabs; it is a matter of taste, but every worthy Ruby developer does it that way.
• Remove trailing whitespace; trailing whitespace makes noise in version control systems.

They can be related to syntax:

• Avoid return where not required.
• Suppress superfluous parentheses when calling methods, but keep them when calling "functions" (when you use the return value in the same line).

They can be related to naming:

• Use snake_case for methods.
• Other method naming conventions: use map over collect, find over detect, find_all over select, size over length.

And they can also be specific to a framework; Rails best practices:

• Law of Demeter: a model should only talk to its immediate associations.
• Move code into the controller: according to the MVC architecture, there should be no logic code in views.
• Isolate seed data: do not insert seed data during migrations; a rake task21 can be used instead.
• Do not use the default route when using a RESTful design: the default RoR routes can cause security problems.
• Replace complex creation with a factory method: sometimes you will build a complex model from params, the current user and other logic in a controller, but that makes the controller too big; you should move that code into the model, behind a factory method.

21 Rakefiles work in a similar way to Makefiles but are written in Ruby. They are a simple way to write code that automates repetitive tasks.
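To make one of these practices concrete, here is a minimal sketch of a Law of Demeter fix in a Rails model (the Invoice/User pair is a hypothetical example; `delegate` is the ActiveSupport helper commonly recommended for this):

```ruby
# Before: a view reaches through the association, violating the Law of Demeter:
#   <%= @invoice.user.name %>
class Invoice < ActiveRecord::Base
  belongs_to :user
  # After: delegate the attribute so callers only talk to the immediate
  # association; this defines Invoice#user_name, forwarding to user.name.
  delegate :name, :to => :user, :prefix => true
end
# The view now calls a method on Invoice itself:
#   <%= @invoice.user_name %>
```

Checkers such as the rails best practices gem, discussed next, flag exactly this kind of call chain.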
5 Assessing Ruby on Rails Projects

After deciding that some procedure is a best practice, it would be handy to find a way to automatically verify whether that practice is being followed by the developers of a given project. With that in mind, an open source Ruby gem was created (by the authors of the Rails best practices web site) with the objective of automatically producing a report that shows where, in the source code, a project is failing to obey consensual practices. At the time of writing, this gem can check for 28 kinds of best practices (of the 70 described on that web site). However, one of the first things we noticed when we applied this gem to OSS projects is that the biggest and most renowned projects have many more errors than the smaller and unknown projects. This apparent nonsense has a simple interpretation. Small projects (like the majority of RoR projects found on GitHub) are simple software packages, often developed by a single user. These applications are so simple that many times the code is almost entirely created by RoR code generators. Usually, when code is not written by humans, it has few mistakes concerning those recommendations.

5.1 First Study

Having taken the above into account, we decided to run the rails best practices gem on similar RoR (Ruby on Rails) projects. Seven time tracking or project management open source systems were chosen. After running the gem and counting the occurrences of not-followed best practices (NBPs)22, the results in Table 1 were obtained.

22 In fact, the Rails best practices gem does not find best practices in the source code. It does the opposite: it discovers when the code is not written according to a best practice; in other words, it identifies bad practices (similar to the detection of code smells). We decided to name those occurrences NBPs.

Rubytime seems to have the best results and Clockingit the worst. In fact, very good user reviews can be found about Rubytime. However, Tracks obtained an unexpectedly high score, since it has been very sparsely maintained (old code has a higher probability of not following the current best practices). As explained before, those values are not really measuring whether a project follows best practices but instead measuring when it fails. This should also be taken into consideration.

The most evident problem here is that the best practices are not being weighted, and neither is the size of the project considered. For instance, if the developers have the habit of leaving trailing whitespace, the number of occurrences will obviously be related to the size of the project. On the other hand, it is a best practice to remove the default route generated by Rails; independently of the project size, this is either true or false, since there is no way to leave the route in twice. So, if developers do not take those two best practices into account, then as the project grows the number of trailing whitespace occurrences will increase and the results will show more NBPs, but the default route will always count as only one NBP. Because of that, we can get skewed results. To avoid this, the projects were sized; the size attribute is based on the quantity of models and controllers in the project. After that, we divided the values previously obtained by the project size. By doing that, a new set of results emerged; see Table 2. Those results are much more likely to be helpful in understanding whether a project is following best practices. The numbers reflect both the community reviews and our own estimates much better.

Rails Best Practices Results

Best Practice                          A    B    C    D    F    G    H
Add model virtual attribute            -    2    7    -    -    5    4
Always add db index                    -    -    -   43    -    -   51
Isolate seed data                      -    -    -    -    -   79   17
Law of demeter                        20   38   45    6   30  164   85
Move code into controller              -    -    -    -    2    -    4
Move code into model                   -   26    -    7    1    3   19
Move model logic into model            -    -   76   11   11   98  100
Move finder to named scope             -    4    9    2    4   25    -
Needless deep nesting                  -    -    -    1    -    -    -
Not use default route                  -    1    1    -    1    1    1
Notes use query attribute              -    2    -    -    -    -    -
Overuse route customizations           -    -    2    4    -    2    2
Remove trailing whitespace            68   57  126  110  330  316  100
Use factory method                     -   15    9    5    1    8   19
Replace instance var with local var   13    -   70  239  142   31  100
Use before filter                      -    7    9    8    8   19   23
Wrong email content type               -    3    -    -    -    -    -
Use query attribute                    -    -   11    5    8   29    6
Use say with time in migrations        -    -   24    -   10   23   56
Use scopes access                      -    -    -    -    -    -    4
User model association                 -    -   12    9    -    1   21
Keep finders on their own model        8    4    1    -   11    -    -
Total                                109  156  402  450  559  834  864

A: Rubytime, B: Notes, C: Tracks, D: Handy Ant, F: Retrospectiva, G: Redmine, H: Clockingit
The figures shown represent the number of times a project does not follow a best practice; the smaller the number, the better the project is expected to be.
Table 1: Results obtained by running the best practices analyzer gem on the 7 open source projects chosen (data produced in April 2011).

Rails Best Practices Results (normalized)

Best Practice                                        A    B    C    D    F    G    H
Total                                              109  156  402  450  559  834  864
Total without trailing whitespace                   41   99  276  340  229  518  764
Project size                                        12   11   11   29   26   58   31
Total / project size                                 9   14   37   16   23   15   28
Total without trailing whitespace / project size     3    9   25   12    9    9   25

A: Rubytime, B: Notes, C: Tracks, D: Handy Ant, F: Retrospectiva, G: Redmine, H: Clockingit

Table 2: Results obtained by running the best practices analyzer gem on the 7 open source projects chosen, after normalization (data produced in April 2011).

5.2 Second Study

After the first study reported above, we felt that it was time for a bigger one; we should repeat the experiment over a larger sample. In addition, there was the need to define an objective quality metric against which to compare the metrics' results. As a second target for this new phase, it was decided to find an objective quality rate (a reputation ranking) for each project in the sample, so that it could be compared with the results computed for the best practices metrics.

For the second study, we selected 40 Ruby on Rails projects hosted on GitHub and decided to consider the number of followers23 and forks24 that each project has on GitHub as a project reputation metric. The objective was to prove that a negative correlation exists between the NBPs of a project and its followers and forks.

The previous study showed us the need to apply different weights to each NBP. By dividing the NBPs by the project size, in the first study, we seemed to get better results. However, not all NBPs depend on the project size. Therefore, we altered the rails best practices gem to make it possible to know how many project files were analyzed by each rails best practice checker. Basically, after collecting the GitHub URLs for each project, we followed the next steps (a sketch of the weighting step follows the list):

• Retrieve GitHub information; in this step we get the followers and forks (and more information that might be used in further analyses).
• Download the project repository.
• Run the rails best practices gem; at this point, we get the non-weighted NBPs and the files analyzed by each one of the 29 checkers.
• Calculate the Weighted Global NBPs; the evaluation algorithm consists in dividing the value returned by each NBP checker by the number of files checked, and then summing the results.

23 Number of users that want to receive notifications about the project.
24 Number of people that forked the project. This means that either they want to contribute to the project or to create a derived project.
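A minimal sketch of that last step, with hypothetical checker output; the division by files checked, the scaling by 1000, and the small constant guarding against division by zero mirror the notes under Table 3:

```ruby
# Sketch of the Weighted Global NBPs computation described above.
# Each checker reports its NBP count and the number of files it analyzed
# (hypothetical numbers). EPSILON avoids divisions by zero, as noted under Table 3.
EPSILON = 1e-9

checker_results = [
  { nbps: 24, files: 161 },
  { nbps: 0,  files: 134 },
  { nbps: 46, files: 842 }
]

weighted_global_nbps = checker_results.sum do |c|
  (c[:nbps] + EPSILON) / (c[:files] + EPSILON) * 1000.0
end

puts "Weighted Global NBPs: #{weighted_global_nbps.round}"
```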
Next, an excerpt of the obtained table is shown.

Rails Best Practices Results

Projects     Forks  Watchers   C1  C1 Files  W. C1   C2  C2 Files  W. C2  ...  T. NBPs  W. T. NBPs
Rails Admin     30      2478    0       141      0    0        37      0  ...       50         739
Rubytime        12        82   24       161    149    0       134      0  ...      146        1334
Redmine         30      1781   49       996     49    1       362      2  ...      884        1402
BrowserCMS      30       784   11       234     47    0       216      0  ...      268        1510
Tracks          17        87   46       842     54   15       271     55  ...      569        2810
...            ...       ...  ...       ...    ...  ...       ...    ...  ...      ...         ...

C(x): the rails best practices gem has 29 checkers (when this study was carried out); each one tries to find occurrences of a different NBP in the project.
C(x) Files: the number of files in the project where it tried to find NBPs (for instance, some checkers may only be concerned with HTML files, some other checkers' NBPs may only occur in model files, etc.).
W. C(x): weighted C(x) = C(x) / C(x) Files * 1000 (a really small number is added to each variable to avoid divisions by zero).

Table 3: Results obtained by running the best practices analyzer gem on the 40 open source projects chosen from GitHub (data produced in April 2011). The full table can be found at www.bestpracticesstudy.gorgeouscode.com.

5.3 Results

After building a table containing the results for the 40 projects, we easily found correlations between columns. We discovered that the average correlation index for the weighted C(x) columns is -0.2. Only three of the weighted C(x) columns do not have a negative correlation. This is quite good, considering that there is an explanation for it: those three checkers aimed at finding NBPs that almost none of the projects were committing, so there is no correlation. The most important results are in the next table:

Correlations   Total NBPs   Total Weighted NBPs
Forks                0.14                 -0.53
Watchers             0.07                 -0.40

Table 4: The full table can be found at www.bestpracticesstudy.gorgeouscode.com.

These correlation indexes show that if we just count the NBPs there is no relation between them and the number of forks and watchers. Nevertheless, the weighted NBPs have a quite perceptible negative correlation both with watchers and with forks. Observing that table, it is possible to notice that the forks correlation is stronger. We believe that this is because forking a project shows intentions of digging into the code and, of course, it is easier to understand others' code when it follows good practices.

As future work, we are considering more correlations with other variables that are already available but that we have not used yet. The most relevant ones are the number of committers, the starting date of the project, the last commit date, and the total number of commits. We believe that those variables can be strongly related to the forks, watchers and, of course, in the end, the quality of the project.
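For reference, the correlation index used above is the standard Pearson coefficient; a minimal Ruby sketch, fed with the five projects excerpted in Table 3 (weighted total NBPs vs. forks), would look like this:

```ruby
# Pearson correlation coefficient between two columns of the results table.
def pearson(xs, ys)
  n  = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx  = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy  = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

# The five projects excerpted in Table 3: weighted total NBPs vs. forks.
weighted_nbps = [739, 1334, 1402, 1510, 2810]
forks         = [30, 12, 30, 30, 17]

puts pearson(weighted_nbps, forks).round(2)
```

On only five projects the resulting value naturally differs from the -0.53 obtained over the full 40-project sample.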
6 Conclusion

Nowadays, thousands of open-source software packages can be found and freely downloaded online. GitHub is a web-based hosting service for projects that use the Git revision control system; it hosts more than 1 million open-source projects.

There is little work done concerning the measurement of coding standards by automatically analyzing source code. We strongly believe that some research and development should be done in this direction. Throughout the paper we gave arguments to make it evident that it is worthwhile to detect in the source code whether the authors follow the best practices recommended by the respective community. In this particular context, the Ruby community, there is already some work done. The reports generated by the existing source code analyzers can spot the occurrences of bad smells, but this is not enough. There is the need to interpret those results to end up with a high level quality statement. By comparing some projects, it was possible to start understanding how to interpret those values. For instance, the size of the project should be taken into consideration (it is intuitive that a project with 10 lines of code and 10 errors is worse than a project with 1000 lines and 20 errors). From the study we carried out, described in the paper, we also learned that each best practice has a different importance level (one error that affects security or performance is, for sure, worse than 10 errors related to indentation; and 10 errors related to naming conventions are worse than the indentation mistakes).

We do believe that, by analyzing a massive amount of open source projects, it is possible to create a new set of metrics capable of quantifying the standards followed by a given project, judging the impact of the metrics evaluated and, consequently, assessing its level of maintainability. The future work is to develop a system capable of automatically producing quality reports about a given OSSP, combining traditional software metrics with best practices analysis. This system will enable users to make better choices about what software to use and help developers to improve their software.

Acknowledgment

This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010049.

Bibliography

[Bev99] N. Bevan. Quality in use: meeting user needs for quality. Journal of Systems and Software 49(1):89–96, 1999.

[BHK06] D. Binkley, M. Harman, J. Krinke. Animated visualisation of static analysis: Characterising, explaining and exploiting the approximate nature of static analysis. In 6th International Workshop on Source Code Analysis and Manipulation (SCAM 06). Philadelphia, Pennsylvania, USA. Pp. 43–52. 2006.

[CAH03] K. Crowston, H. Annabi, J. Howison. Defining open source software project success. 2003.

[CDS86] S. Conte, H. Dunsmore, Y. Shen. Software engineering metrics and models. Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1986.

[CM07] A. Capiluppi, M. Michlmayr. From the Cathedral to the Bazaar: An Empirical Study of the Lifecycle of Volunteer Community Projects. International Federation for Information Processing - Publications - IFIP 234/2007:31–44, 2007.

[Coo08] P. Cooper. GitHub Officially Launches: Git Hosting A-Go-Go! 2008. http://www.rubyinside.com/github-officially-launches-git-hosting-a-go-go-853.html

[CP09] L. Chung, J. Prado Leite. On non-functional requirements in software engineering. Conceptual Modeling: Foundations and Applications, pp. 363–379, 2009.

[Dro02] R. Dromey. A model for software product quality. Software Engineering, IEEE Transactions on 21(2):146–162, 2002.

[FN99] N. Fenton, M. Neil. Software metrics: successes, failures and new directions. Journal of Systems and Software 47(2-3):149–157, 1999.

[GKS+07] G. Gousios, V. Karakoidas, K. Stroggylos, P. Louridas, V. Vlachos, D. Spinellis. Software quality assessment of open source software. In Proceedings of the 11th Panhellenic Conference on Informatics. Athens University of Economics and Business, Patission 76, Athens, Greece, 2007.

[Hah02] R. Hahn. Government policy toward open source software. Brookings Institution Press, Washington, DC, USA, 2002.

[HS02] T. Halloran, W. Scherlis. High quality and open source software practices. 2002.

[Kan02] S. Kan. Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.

[LK94] M. Lorenz, J. Kidd. Object-oriented software metrics: a practical guide. Prentice Hall, 1994.
[MA07] A. Marchenko, P. Abrahamsson. Predicting software defect density: a case study on automated static code analysis. Agile Processes in Software Engineering and Extreme Programming 4536/2007:137–140, 2007.

[McC76] T. McCabe. A complexity measure. IEEE Transactions on Software Engineering, pp. 308–320, 1976.

[Ray00] E. Raymond. The Cathedral and the Bazaar. Knowledge, Technology and Policy 12:1–35, 2000.

[WLCR07] Y. Wang, Q. Li, P. Chen, C. Ren. Dynamic fan-in and fan-out metrics for program comprehension. Journal of Shanghai University (English Edition) 11(5):474–479, 2007.

[YC79] E. Yourdon, L. Constantine. Structured design. Fundamentals of a discipline of computer program and systems design. Prentice Hall, 1979.