Sample Paper - Manuscript Preparation
Journal of Mountain Area Research (J. mt. area res.), Vol. 2, 2017
http://journal.kiu.edu.pk/index.php/JMAR
Full length article
Ubaidullah et al., J. mt. area res. 02 (2017) 23-28

AN EFFICIENT AND COST-EFFECTIVE MATHEMATICAL MODEL TO ANALYZE BIG DATA

Ubaidullah*, W. Akram, I. A. Memon
Sukkur Institute of Business Administration, Sindh, Pakistan.

ABSTRACT

An efficient and cost-effective piecewise mathematical model is presented to represent large descriptive data mathematically. Function lines are used as decision boundaries to express an organization's big data in slope-intercept form, which may be very helpful for a better understanding of discrete data and for obtaining sustainable and accurate results. Based on the slab boundaries in the data collected from the Federal Board of Revenue, income tax is studied as a function of income. Finally, the reliability of the piecewise function for optimizing the role of strategic management in any organization is investigated. The results show that the slope (the tax rate in percent) measured within each income boundary is in good agreement with the rate stated by the organization in descriptive form.

KEYWORDS: Big data; Mathematical Model; Boundaries; Tools; Computational technique

* Corresponding author: (E-mail: ubaidullah@iba-suk.edu.pk)

1. INTRODUCTION

Software engineering and computer science characterize big data as data sets so large that they become hard to work with. Because of the size and complexity of big data, it is difficult to obtain the required results using on-hand database management tools or traditional data processing techniques. Scientists, business executives and technocrats hypothesize that the phenomenon of big data is difficult to explain because of its unequal growth rate and huge volume. The term "big" is relative: a hard drive of 30 MB was once considered big, but nowadays 2 TB drives are common.

One historically important comparison is that all the data collected from the dawn of civilization to 2003 amounts to 5 exabytes, whereas we now create 5 exabytes every two days. Today we are increasing not only the size (volume) of data but also its variety, and at a faster rate. These are the three Vs of big data: Volume, Variety and Velocity. Between the 1440s and the 1500s, printed material grew to 10 million texts, including 2 million books, while in the 6th and 7th centuries only 120 books were produced annually in Western Europe.

The International Data Corporation stated that big data is the economical discovery and analysis, at high velocity, of very large volumes of data by using new technologies and architecture designs. Viktor Mayer-Schönberger and Kenneth Cukier described big data as a way to extract new insights or create new forms of value in ways that change markets, organizations, etc. To collect, store and analyze data sets from unstructured data we need advanced techniques, software and systems. In advanced laboratories and engineering research centers, big data has provided significant advances. Whether in technology or in business, big data is fundamentally different from what came before. Martin Hilbert and Priscilla Lopez stated that in 1996 one percent of data was digital, and that by 2007 almost ninety-four percent of data was digital. Around 2000 everything became digital, while in the 20th century digital data consisted only of text and numbers. Andrew Whit stated that big data has the potential to open up stirring new opportunities in social research, although it is difficult to access. Hall [1] emphasized curiosity, flexibility and the motivation to learn by doing, along with varied job experience.
Lorenz [2] described data as an assembly of facts, but noted that facts are not necessarily always the truth. Weak interpretation leads to wrong conclusions, so we need to understand big data. Jeanne Harris argued that understanding mathematical reasoning and statistical models is needed not only by technical experts but also by managers in order to meet the challenges of big data. Harris [4] stated that sixty percent of respondents to a survey feel the need to develop new skills in their employees to translate big data into insight and business value. For reducing large raw data sets to small dimensions, topology is a flexible technique that applies to different systems. Big data has the potential not only to update research but also to transform education [8]. Hamann [9] described a data discretization technique for approximating a curve by line segments. Discretization is a first step in making data suitable for numerical assessment and execution on digital computers. McMaster in 1987 provided a scheme for data reduction of piecewise linear curves. Gar in 2011 stated that industry analysis companies face challenges not only in volume but also in velocity and variety.

Big data is data recorded from a data-generating source. One challenge is to collect the required data from these sources without losing the exact required information. Another challenge is to collect the right data from the data source automatically. We also need an information extraction method to take the associated data out of the data source and express it in an ordered form for analysis. Data analysis is a challenge as well, and to overcome it we need domain analysis scientists to create an effective database design. The piecewise-defined function is a well-defined mathematical technique to formulate and interpret big data.
The Federal Board of Revenue is the supreme federal agency of Pakistan for auditing, enforcing and collecting revenue for the government of Pakistan. The data on the collection of federal taxes is big data. In this paper we use a piecewise function to formulate and interpret the collected data. We express the income tax slabs for the salaried class in Pakistan for Financial Year 2014-15 in piecewise-defined form, that is, with a limit on each piece. Our mathematical model is cost-effective and time-efficient.

2. MATHEMATICAL AND GRAPHICAL REPRESENTATION OF DATA

2.1 Mathematical model

The general mathematical form of an n-dimensional continuous piecewise linear function $f: \mathbb{R}^n \to \mathbb{R}$ is as follows. There is a $\Pi \in \mathcal{P}(\mathcal{P}(\mathbb{R}^{n+1}))$, where $\mathcal{P}$ denotes the power set, such that

$$f(\vec{x}) = \min_{\Sigma \in \Pi} \; \max_{(\vec{a}, b) \in \Sigma} \; \vec{a} \cdot \vec{x} + b.$$

If the function is convex and continuous, then there is a $\Sigma \in \mathcal{P}(\mathbb{R}^{n+1})$ such that

$$f(\vec{x}) = \max_{(\vec{a}, b) \in \Sigma} \; \vec{a} \cdot \vec{x} + b.$$

Here $\vec{a} \cdot \vec{x} + b$ is a linear polynomial such that $a \neq 0$ and $a, b \in \mathbb{R}$. The piecewise linear function effectively reduces the problem size and enhances computational efficiency.

2.2 Data Collection and Processing

According to the Finance Act passed by the government of Pakistan, the income tax rates listed below apply to salaries in the year 2014-2015. Let $x$ represent the income and $T(x)$ the income tax.
The tax slabs are as follows:

| S# | Taxable Income | Rate of Tax |
|----|----------------|-------------|
| 1 | Where the taxable income does not exceed Rs. 400,000 | 0% |
| 2 | Where the taxable income exceeds Rs. 400,000 but does not exceed Rs. 750,000 | 5% of the amount exceeding Rs. 400,000 |
| 3 | Where the taxable income exceeds Rs. 750,000 but does not exceed Rs. 1,400,000 | Rs. 17,500 + 10% of the amount exceeding Rs. 750,000 |
| 4 | Where the taxable income exceeds Rs. 1,400,000 but does not exceed Rs. 1,500,000 | Rs. 82,500 + 12.5% of the amount exceeding Rs. 1,400,000 |
| 5 | Where the taxable income exceeds Rs. 1,500,000 but does not exceed Rs. 1,800,000 | Rs. 95,000 + 15% of the amount exceeding Rs. 1,500,000 |
| 6 | Where the taxable income exceeds Rs. 1,800,000 but does not exceed Rs. 2,500,000 | Rs. 140,000 + 17.5% of the amount exceeding Rs. 1,800,000 |
| 7 | Where the taxable income exceeds Rs. 2,500,000 but does not exceed Rs. 3,000,000 | Rs. 262,500 + 20% of the amount exceeding Rs. 2,500,000 |
| 8 | Where the taxable income exceeds Rs. 3,000,000 but does not exceed Rs. 3,500,000 | Rs. 362,500 + 22.5% of the amount exceeding Rs. 3,000,000 |
| 9 | Where the taxable income exceeds Rs. 3,500,000 but does not exceed Rs. 4,000,000 | Rs. 475,000 + 25% of the amount exceeding Rs. 3,500,000 |
| 10 | Where the taxable income exceeds Rs. 4,000,000 but does not exceed Rs. 7,000,000 | Rs. 600,000 + 27.5% of the amount exceeding Rs. 4,000,000 |
| 11 | Where the taxable income exceeds Rs. 7,000,000 | Rs. 1,425,000 + 30% of the amount exceeding Rs. 7,000,000 |

• The rate of income tax is 0% if the taxable salary income does not exceed Rs. 400,000, i.e. $T(x) = 0$.

• The rate of income tax is 5% if the taxable salary income exceeds Rs. 400,000 but does not exceed Rs. 750,000, i.e.
$$T(x) = 0.05(x - 400{,}000) = 0.05x - 20{,}000.$$

• The rate of income tax is 10% if the taxable salary income exceeds Rs. 750,000 but does not exceed Rs. 1,400,000, i.e.
$$T(750{,}000) = 0.05(750{,}000) - 20{,}000 = 17{,}500,$$
$$T(x) = 17{,}500 + 0.10(x - 750{,}000) = 0.10x - 57{,}500.$$

• The rate of income tax is 12.5% if the taxable salary income exceeds Rs. 1,400,000 but does not exceed Rs. 1,500,000, i.e.
$$T(x) = 82{,}500 + 0.125(x - 1{,}400{,}000) = 0.125x - 92{,}500.$$

• The rate of income tax is 15% if the taxable salary income exceeds Rs. 1,500,000 but does not exceed Rs. 1,800,000, i.e.
$$T(x) = 95{,}000 + 0.15(x - 1{,}500{,}000) = 0.15x - 130{,}000.$$

• The rate of income tax is 17.5% if the taxable salary income exceeds Rs. 1,800,000 but does not exceed Rs. 2,500,000, i.e.
$$T(x) = 140{,}000 + 0.175(x - 1{,}800{,}000) = 0.175x - 175{,}000.$$

• The rate of income tax is 20% if the taxable salary income exceeds Rs. 2,500,000 but does not exceed Rs. 3,000,000, i.e.
$$T(x) = 262{,}500 + 0.2(x - 2{,}500{,}000) = 0.2x - 237{,}500.$$

• The rate of income tax is 22.5% if the taxable salary income exceeds Rs. 3,000,000 but does not exceed Rs. 3,500,000, i.e.
$$T(x) = 362{,}500 + 0.225(x - 3{,}000{,}000) = 0.225x - 312{,}500.$$

• The rate of income tax is 25% if the taxable salary income exceeds Rs. 3,500,000 but does not exceed Rs. 4,000,000, i.e.
$$T(x) = 475{,}000 + 0.25(x - 3{,}500{,}000) = 0.25x - 400{,}000.$$

• The rate of income tax is 27.5% if the taxable salary income exceeds Rs. 4,000,000 but does not exceed Rs. 7,000,000, i.e.
$$T(x) = 600{,}000 + 0.275(x - 4{,}000{,}000) = 0.275x - 500{,}000.$$

• The rate of income tax is 30% if the taxable salary income exceeds Rs. 7,000,000, i.e.
$$T(x) = 1{,}425{,}000 + 0.3(x - 7{,}000{,}000) = 0.3x - 675{,}000.$$

Combining the slabs gives the piecewise model:

$$
T(x) =
\begin{cases}
0, & 0 \le x \le 400{,}000 \\
0.05x - 20{,}000, & 400{,}000 < x \le 750{,}000 \\
0.1x - 57{,}500, & 750{,}000 < x \le 1{,}400{,}000 \\
0.125x - 92{,}500, & 1{,}400{,}000 < x \le 1{,}500{,}000 \\
0.15x - 130{,}000, & 1{,}500{,}000 < x \le 1{,}800{,}000 \\
0.175x - 175{,}000, & 1{,}800{,}000 < x \le 2{,}500{,}000 \\
0.2x - 237{,}500, & 2{,}500{,}000 < x \le 3{,}000{,}000 \\
0.225x - 312{,}500, & 3{,}000{,}000 < x \le 3{,}500{,}000 \\
0.25x - 400{,}000, & 3{,}500{,}000 < x \le 4{,}000{,}000 \\
0.275x - 500{,}000, & 4{,}000{,}000 < x \le 7{,}000{,}000 \\
0.3x - 675{,}000, & x > 7{,}000{,}000
\end{cases}
$$

2.3 Graphical Representation

[Figure: "Income Tax of Federal Budget 2013-2014"; x-axis: Income; y-axis: Income Tax]

Figure 1. Income in millions on the x-axis and income tax in millions on the y-axis, in standard form.

Figure 2. Income in millions on the x-axis and income tax in millions on the y-axis, in scientific notation.

3. RESULTS AND DISCUSSION

From the derived mathematical model and the graphical representation we can conclude that the data we collected for fiscal year 2014-2015 of the federal budget of Pakistan is big in terms of volume. Without advanced tools, software and systems it is difficult to manage, share, analyze and visualize the data within a reasonable timeframe. The mathematical model summarizes the big data in a compact form, so that we can calculate quickly, easily and efficiently. The income tax depends on the income, so we plot income on the x-axis and income tax on the y-axis. On the x-axis income is in millions, and on the y-axis income tax is in millions. Figure 1 and Figure 2 show that as income increases the income tax also increases. If the income of a taxpayer is Rs. 750,000, the income tax is Rs. 17,500, which matches the base amount stated for the next slab in the tax table. This shows the reliability of the piecewise linear mathematical model. From the above model of $T(x)$, the slopes of the intervals are: 0, 0.05, 0.10, 0.125, 0.15, 0.175, 0.2, 0.225, 0.25, 0.275 and 0.3. These are mathematical indicators that make it efficient and cost-effective to analyze the big data and reduce it to a small form. These indicators show that as income increases the income tax also increases.

4. CONCLUSION

We presented a piecewise mathematical model that converts descriptive data into a single model based on the linear coefficients, the assigned variables and the tax slab percentages. We used the high-level language software MATLAB, which is able to detect the tax slab of the taxpayer reliably and sharply. This software also accurately calculates the exact tax amount of the individual taxpayer. The problem here is to find the slab percentages that appear in the acquired data. Finally, we optimized our collected data.

References

[1] D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabási, D. Brewer. "Computational Social Science". Science 323 (2009) 721-723.
[2] S. Shvetank, H. Andrew, C. Jaime. "Good Data Won't Guarantee Good Decisions". Harvard Business Review, HBR.org (2012). http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1
[3] V. Mayer-Schönberger & K. Cukier. "Big Data: A Revolution that Will Transform How We Live, Work, and Think". New York, Houghton Mifflin Harcourt Publishing Company (2013).
[4] J. Harris. "Data is useless without the skills to analyze it". HBR Blog Network (2012).
[5] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, & A. H. Byers (2011).
[6] "Big data: The next frontier for innovation, competition, and productivity". McKinsey Global Institute.
[7] D. Raywood. "Big data analyst shortage is a challenge for the UK". SC Magazine (2012).
[8] CCC. "Advancing Personalized Education". Computing Community Consortium, Spring 2011.
[9] B. Hamann, J. L. Chen. "Data point selection for piecewise linear curve approximation". Computer Aided Geometric Design 11 (1994).
[10] M. H. Lin, J. G. Carlsson, D. Ge, J. Shi and J. F.
Tsai. "A Review of Piecewise Linearization Methods". Mathematical Problems in Engineering (2013).
[11] K. Holmberg. "Solving the Staircase Cost Facility Location Problem with Decomposition and Piecewise Linearization". European Journal of Operational Research 75 (1994) 41-61.
[12] A. B. Keha, I. R. De Farias, G. L. Nemhauser. "Models for Representing Piecewise Linear Cost Functions". Operations Research Letters 32 (2004) 44-48.
[13] V. Ford and A. Siraj. "Clustering of Smart Meter Data for Disaggregation". In Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), Austin, TX (2013).
[14] www.fbr.gov.pk
[15] W. Huang, P. Eades, S. H. Hong, C. C. Lin. "Improving multiple aesthetics produces better graph drawings". J Vis Lang Comput 24 (2013) 262-272.
[16] M. J. Baker, S. G. Eick. "Space-filling Software Visualization". Journal of Visual Languages & Computing 6 (1995) 119-133.

This work is licensed under a Creative Commons Attribution 4.0 International License. http://creativecommons.org/licenses/by/4.0/
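The piecewise model $T(x)$ of Section 2.2 can be illustrated with a minimal Python sketch. The paper reports using MATLAB and does not list its code, so this is not the authors' implementation. The names `PIECES`, `income_tax` and `income_tax_max_form` are hypothetical, and the slopes and intercepts are transcribed from the slab formulas in Section 2.2. Because the slopes increase from slab to slab and the pieces meet at the slab limits, the model is continuous and convex, so the sketch also evaluates it in the maximum form of Section 2.1 as a cross-check. Exact rational arithmetic (`fractions.Fraction`) is an assumption made here to avoid floating-point rounding.

```python
from fractions import Fraction

# Each entry is (upper income limit in Rs., slope a, intercept b), meaning
# T(x) = a*x + b on that slab. Values are transcribed from the slab formulas
# in Section 2.2; None marks the open-ended top slab.
PIECES = [
    (400_000,   Fraction(0),         0),
    (750_000,   Fraction(5, 100),    -20_000),
    (1_400_000, Fraction(10, 100),   -57_500),
    (1_500_000, Fraction(125, 1000), -92_500),
    (1_800_000, Fraction(15, 100),   -130_000),
    (2_500_000, Fraction(175, 1000), -175_000),
    (3_000_000, Fraction(20, 100),   -237_500),
    (3_500_000, Fraction(225, 1000), -312_500),
    (4_000_000, Fraction(25, 100),   -400_000),
    (7_000_000, Fraction(275, 1000), -500_000),
    (None,      Fraction(30, 100),   -675_000),
]


def income_tax(x):
    """Return T(x) for income x >= 0 by finding the slab that contains x."""
    if x < 0:
        raise ValueError("income must be non-negative")
    for upper, a, b in PIECES:
        if upper is None or x <= upper:
            return a * x + b


def income_tax_max_form(x):
    """Return max over all pieces of a*x + b (convex form of Section 2.1)."""
    return max(a * x + b for _, a, b in PIECES)


if __name__ == "__main__":
    # Worked value quoted in Section 3: income Rs. 750,000 gives tax Rs. 17,500.
    print(income_tax(750_000))  # 17500
    # The slab lookup and the max form agree on a grid of incomes.
    print(all(income_tax(x) == income_tax_max_form(x)
              for x in range(0, 10_000_001, 25_000)))  # True
```

The slab lookup mirrors the descriptive tax table, while the max form needs no interval bounds at all, only the list of slopes and intercepts. That is one way to read the claim that the model reduces the descriptive data to a small set of coefficients.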