RATIO MATHEMATICA ISSUE N. 30 (2016) pp. 45-58
ISSN (print): 1592-7415 ISSN (online): 2282-8214

Dealing with randomness and vagueness in business and management sciences: the fuzzy-probabilistic approach as a tool for the study of statistical relationships between imprecise variables

Fabrizio Maturo
Department of Management and Business Administration
University G. d’Annunzio, Chieti - Pescara
f.maturo@unich.it

Abstract

In practical applications relating to business and management sciences, there are many variables that, by their very nature, are better described by a pair of ordered values (e.g. financial data). Summarizing such a measurement with a single value causes a loss of information; thus, in these situations, data are better described by interval values rather than by single values. Interval arithmetic studies and analyzes this type of imprecision; however, if the intervals have no sharp boundaries, fuzzy set theory is the most suitable instrument. Moreover, fuzzy regression models are able to overcome some typical limitations of classical regression because they do not need the same strong assumptions. In this paper, we present a review of the main methods introduced in the literature on this topic and introduce some recent developments regarding the concept of randomness in fuzzy regression.

Keywords: fuzzy data; fuzzy regression; fuzzy random variable; tools for business and management sciences
2010 AMS subject classifications: 62J05; 62J86; 03B52; 62A86; 97M10
doi: 10.23755/rm.v30i1.8

1 Introduction

Regression analysis offers a possible solution to study the dependence between two sets of variables. Classical statistical linear regression takes the form [27]:

$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_j x_{ij} + \dots + b_P x_{iP} + u_i \tag{1}$$

where:
• $i = 1, \dots, N$ indexes the observed units;
• $j = 1, \dots, P$ indexes the observed variables;
• $y_i$ is the dependent variable, observed on N units;
• $x_{ij}$ are the P independent variables, observed on N units;
• $b_0$ is the crisp intercept and $b_j$ are the P crisp coefficients of the P variables;
• $u_i$ are the random error terms that indicate the deviation of Y from the model;
• $y_i$, $x_{ij}$, $b_j$, $u_i$ are all crisp values.

In the classical regression model it is assumed that:
• $E(u_i) = 0$
• $\sigma^2_{u_i} = \sigma^2$
• $\sigma_{u_i, u_j} = 0 \quad \forall\, i, j$ with $i \neq j$

In matrix form, the classical regression model is expressed as:

$$y = Xb + u \tag{2}$$

where $y = (y_1, y_2, \dots, y_N)'$, $b = (b_0, b_1, b_2, \dots, b_P)'$ and $u = (u_1, u_2, \dots, u_N)'$ are vectors and X is the matrix:

$$X = \begin{pmatrix} 1 & x_{11} & \dots & x_{1P} \\ 1 & x_{21} & \dots & x_{2P} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \dots & x_{NP} \end{pmatrix}$$

The aim of statistical regression is to find the set of unknown parameters so that the model gives a good prediction of the dependent variable Y. The most widely used regression model is the Multiple Linear Regression Model (MLRM), and Ordinary Least Squares (OLS) [12] is the most widespread estimation procedure. Under the OLS assumptions the estimates are BLUE (Best Linear Unbiased Estimators), as stated by the Gauss-Markov theorem. OLS is based on the minimization of the sum of squared deviations:

$$\min_{b}\; (y - Xb)'(y - Xb) \tag{3}$$

The optimal solution of the minimization problem is the following vector:

$$\hat{b} = (X'X)^{-1} X'y \tag{4}$$

The OLS model is convenient but its assumptions are very restrictive. Several phenomena violate these assumptions, causing biased and inefficient estimators [9]. In particular, the assumption $u \mid X \sim N(0, \sigma^2 I)$ is very strong and is rarely satisfied by real phenomena. Moreover, in the case of "quasi" multicollinearity (many highly correlated explanatory variables), although no OLS assumption is violated, there is a bad impact on the variance of $\hat{b}$.
In these circumstances the OLS estimators are still efficient and unbiased but have large variance, making estimation useless from a practical point of view. The effects of quasi multicollinearity are more evident when the sample size is small [1]. The generally proposed solution consists in removing correlated explanatory variables. This solution is unsatisfactory in many application fields where the user would rather keep all variables in the model.

In general, we can observe that classical statistical regression has many useful applications but runs into trouble in the following situations [26]:
• The number of observations is inadequate (small data set);
• Distributional assumptions are difficult to verify;
• The relationship between input and output variables is vague;
• Events, or the degree to which they occur, are ambiguous;
• Linearization introduces inaccuracy and distortion.

Furthermore, there are many variables that, by their very nature, are better described by a pair of ordered values, like daily temperatures or financial data. Summarizing such a measurement with a single value causes a loss of information. In these situations data are better described by interval values rather than by single values. Interval arithmetic studies and analyzes this type of imprecision; but if the intervals have no sharp boundaries, fuzzy set theory is the better tool. In particular, fuzzy regression models are able to overcome some typical limitations of classical regression because they do not need the same strong assumptions. Furthermore, some nuanced concepts that exist in economic and social sciences need to be treated with linguistic variables, which are by nature imprecise.
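
As a crisp point of comparison for the fuzzy models that follow, the closed-form OLS solution of Eq. (4) can be sketched in a few lines. The block below is a minimal illustration in plain Python and is not part of the original paper: the function name `ols` and the toy data are assumptions chosen for the example, and the normal equations $(X'X)b = X'y$ are solved by Gauss-Jordan elimination rather than by forming the inverse explicitly.

```python
from typing import List

def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    for col in range(p):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity)")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Hypothetical data generated exactly by y = 1 + 2x (first column is the intercept).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
b_hat = ols(X, y)  # approximately [1.0, 2.0]
```

With two identical regressor columns the function raises `ValueError`, which is the limiting case of the quasi multicollinearity described above: as columns become more correlated, $X'X$ approaches singularity and the estimates become unstable.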
2 Fuzzy Linear Regression Models (FLR)

There are two general ways, not mutually exclusive, to develop a fuzzy regression model:
• Models where the relationship of the variables is fuzzy;
• Models where the variables themselves are fuzzy.

Therefore fuzzy linear regression (FLR) can be classified into:
• Partially fuzzy linear regression (PFLR), which can be further divided into:
  – PFLR with fuzzy parameters and crisp data;
  – PFLR with fuzzy data and crisp parameters;
• Totally fuzzy linear regression (TFLR), where data and parameters are both fuzzy.

Fuzzy least squares regression is closer to the traditional statistical approach. In fact, following the least squares line of thought [13], the aim is to minimize the distance between the observed and the estimated fuzzy data. This approach is referred to as Fuzzy Least Squares Regression (FLSR). In the case of one independent variable, the model takes the form:

$$\tilde{y}_i = b_0 + b_1 \tilde{x}_i + \tilde{u}_i, \qquad i = 1, 2, \dots, N \tag{5}$$

where:
• $i = 1, \dots, N$ indexes the observed units;
• $\tilde{y}_i$ is the dependent fuzzy variable, observed on N units;
• $\tilde{x}_i$ is the independent fuzzy variable, observed on N units;
• $b_0$ and $b_1$ are the crisp intercept and the crisp regression coefficient;
• $\tilde{u}_i$ are the fuzzy random error terms.

From a graphical point of view [26], the relation between output and input variables can be represented as shown in Fig. 1.

Figure 1: Relation between output and input variables

In the case of several independent variables, the model takes the form:

$$\tilde{y}_i = b_0 + b_1 \tilde{x}_{i1} + b_2 \tilde{x}_{i2} + \dots + b_j \tilde{x}_{ij} + \dots + b_P \tilde{x}_{iP} + \tilde{u}_i \tag{6}$$

where:
• $i = 1, \dots, N$ indexes the observed units;
• $j = 1, \dots, P$ indexes the observed variables;
• $\tilde{y}_i$ is the dependent fuzzy variable, observed on N units;
• $\tilde{x}_{ij}$ are the P independent fuzzy variables, observed on N units;
• $b_0$ is the crisp intercept and $b_j$ are the P crisp regression coefficients of the P fuzzy variables;
• $\tilde{u}_i$ are the fuzzy random error terms.

Limiting the reasoning to the first model, the error term can be expressed as follows:

$$\tilde{u}_i = \tilde{y}_i - b_0 - b_1 \tilde{x}_i, \qquad i = 1, 2, \dots, N \tag{7}$$

Therefore, from a least squares perspective, the problem becomes:

$$\min_{b_0, b_1} \sum_{i=1}^{N} \left[\tilde{y}_i - b_0 - b_1 \tilde{x}_i\right]^2 \tag{8}$$

Many criteria for measuring this distance have been proposed over the years; the two most common are:
• Diamond’s approach;
• The compatibility measures approach.

2.1 FLSR using distance measures

Diamond’s approach is also known as fuzzy least squares regression using distance measures. It is the approach closest to the traditional statistical one. Following the least squares line of thought, the aim is to minimize the distance between the observed and the estimated fuzzy data by minimizing the quadratic output error of the model. Since the model contains fuzzy numbers, the minimization problem considers distances between fuzzy numbers [5, 17, 20, 15, 19, 18].

Diamond defined an L2-metric between two triangular fuzzy numbers; it measures the distance between two fuzzy numbers based on their modes, left spreads and right spreads as follows:

$$d[(c_1, l_1, r_1), (c_2, l_2, r_2)]^2 = (c_1 - c_2)^2 + [(c_1 - l_1) - (c_2 - l_2)]^2 + [(c_1 + r_1) - (c_2 + r_2)]^2 \tag{9}$$

The methods of Diamond are rigorously justified by a projection-type theorem for cones on a Banach space containing the cone of triangular fuzzy numbers, where a Banach space is a normed vector space that is complete as a metric space under the metric $d(x, y) = \lVert x - y \rVert$ induced by the norm [25].
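
Eq. (9) is straightforward to evaluate numerically. The following sketch is illustrative and not taken from the paper: it represents a triangular fuzzy number as a tuple (c, l, r) of mode, left spread and right spread, and the names `diamond_d2`, `scale_shift` and `flsr_objective` are hypothetical. The helper `scale_shift` relies on the usual arithmetic for triangular fuzzy numbers, which is an assumption here since the text does not spell it out: multiplying by a negative crisp coefficient swaps the left and right spreads. The last function evaluates the least squares criterion of Eq. (8) with the squared Diamond distance as the measure of deviation.

```python
from typing import List, Tuple

Tri = Tuple[float, float, float]  # (mode c, left spread l, right spread r)

def diamond_d2(a: Tri, b: Tri) -> float:
    """Squared Diamond distance of Eq. (9): modes, left endpoints, right endpoints."""
    c1, l1, r1 = a
    c2, l2, r2 = b
    return ((c1 - c2) ** 2
            + ((c1 - l1) - (c2 - l2)) ** 2
            + ((c1 + r1) - (c2 + r2)) ** 2)

def scale_shift(b0: float, b1: float, x: Tri) -> Tri:
    """Return b0 + b1 * x for crisp b0, b1 (assumed triangular fuzzy arithmetic)."""
    c, l, r = x
    if b1 >= 0:
        return (b0 + b1 * c, b1 * l, b1 * r)
    # A negative coefficient mirrors the number, so the spreads swap sides.
    return (b0 + b1 * c, -b1 * r, -b1 * l)

def flsr_objective(b0: float, b1: float, xs: List[Tri], ys: List[Tri]) -> float:
    """Sum of squared Diamond distances between predicted and observed outputs."""
    return sum(diamond_d2(scale_shift(b0, b1, x), y) for x, y in zip(xs, ys))

# Worked example of Eq. (9) on two hypothetical triangular numbers.
a: Tri = (2.0, 1.0, 1.0)
b: Tri = (3.0, 1.0, 2.0)
d2 = diamond_d2(a, b)  # (2-3)^2 + (1-2)^2 + (3-5)^2 = 6.0

# Hypothetical data generated exactly by y = 1 + 2x, so the objective is zero there.
xs: List[Tri] = [(1.0, 0.5, 0.5), (2.0, 0.5, 0.5)]
ys: List[Tri] = [scale_shift(1.0, 2.0, x) for x in xs]
loss_at_truth = flsr_objective(1.0, 2.0, xs, ys)
loss_elsewhere = flsr_objective(0.0, 2.0, xs, ys)
```

Any candidate pair (b0, b1) can be scored this way, which is enough to check a closed-form solution or to drive a generic numerical optimizer.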
In the case of crisp coefficients and fuzzy variables, the problem is the following:

$$\min_{b_0, b_1} \sum_{i=1}^{N} d[\tilde{y}_i^{*}, \tilde{y}_i]^2 \tag{10}$$

where

$$\tilde{y}_i^{*} = b_0 + b_1 \tilde{x}_i \tag{11}$$

therefore the optimization problem can be written as follows:

$$\min_{b_0, b_1} \sum_{i=1}^{N} d[b_0 + b_1 \tilde{x}_i, \tilde{y}_i]^2 \tag{12}$$

Using Diamond’s distance in this minimization problem, we can obtain the parameters. If the solutions exist, it is necessary to solve a system of six equations in the same number of unknowns; these equations arise from setting the derivatives equal to zero.

2.2 FLSR using compatibility measures

The second type of fuzzy least squares regression model is based on Celmins’s compatibility measures [3]. A compatibility measure can be defined by

$$\gamma(\tilde{A}, \tilde{B}) = \max_{x} \min\left(\mu_A(x), \mu_B(x)\right) \tag{13}$$

This index lies in the interval [0, 1], as shown in Fig. 2. A value of "0" means that the membership functions of the fuzzy numbers A and B are mutually exclusive, as shown in Fig. 3. A value of "1" means that the membership function of A coincides with that of B, as shown in Fig. 4.

Figure 2: Compatibility measure
Figure 3: Zero compatibility
Figure 4: Max compatibility

The basic idea is to maximize the overall compatibility between data and model. Thus, the objective may be reformulated as a minimization problem with the following objective function:

$$\min \sum_{i=1}^{N} \left[1 - \gamma_i\right]^2 \tag{14}$$

3 Fuzzy regression models with fuzzy random variables

Recent studies have reintroduced the concept of Fuzzy Random Variables (FRVs) [24], first introduced by Puri and Ralescu [23]. The need for FRVs arises when the data are affected not only by imprecision but also by randomness [11]. Several papers deal with this topic, which is called the fuzzy-probabilistic approach.
It consists in explicitly taking into account randomness when estimating the regression parameters and assessing their statistical properties [22, 7, 8]. The membership function of a fuzzy number can be expressed, in terms of spreads, as:

$$\mu_{\tilde{A}}(x) = \begin{cases} L\left(\dfrac{A^m - x}{A^l}\right) & \text{for } x \le A^m,\ A^l > 0 \\ 1 & \text{for } x \le A^m,\ A^l = 0 \\ R\left(\dfrac{x - A^m}{A^r}\right) & \text{for } x > A^m,\ A^r > 0 \\ 0 & \text{for } x > A^m,\ A^r = 0 \end{cases} \tag{15}$$

where the functions $L, R : \mathbb{R} \to [0, 1]$ are convex upper semi-continuous functions so that $L(0) = R(0) = 1$ and $L(z) = R(z) = 0$, for all $z \in$