Microsoft Word - B_19_R.doc HUNGARIAN JOURNAL OF INDUSTRIAL CHEMISTRY VESZPRÉM Vol. 38(2). pp. 149-153 (2010) MODELS AND METHODS TO DETECT SIMILARITY OF MANUFACTURING MACHINES O. HORNYÁK1 , G. SÁFRÁNY2 1Department of Information Engineering at University of Miskolc, HUNGARY E-mail: hornyak@ ait.iit.uni-miskolc.hu 2evoPRO Ltd., HUNGARY This paper focuses on detecting similarity of Manufacturing machines. It discusses the application of the Group Technology (GT) concept in the case of producing controller software for manufacturing machines. An overview on the modular machines is given. The paper presents a new concept for the development of the controllers’ software. The advantages of GT are presented. Two grouping approaches are detailed: the bottom-up approach and the top-down approach. A Genetic Algorithm Clustering is presented for the automatic grouping. The second part of the paper deals with the similairity of PLC code. Smith and Waterman local allignment search algorithm is used to detect similar code paths in PLC programs Keywords: Group Technology, Genetic Algorithm Clustering, Code similarity, Smith Waterman algorithm Introduction The design, assembly, starting up, monitoring and maintenance of industrial manufacturing devices have become very complex tasks over the time. According to the predictions for the future the complexity will not decrease. Simultaneously, the demands for and expectations of the quality of the products and processes have been growing. It is a well known fact that the cost price is an important factor in today’s competitive manufacturing. Another important factor is the lead-in time spent on the aforementioned activities. Engineers need to revise the activities required for producing a manufacturing design to be able to meet today’s needs and to take a step towards future manufacturing devices. The manufacturing machine producers need to develop the code used by their controllers. In case of process control the Programmable Logic Controller (PLC) based solutions are very widespread as the controlling algorithms are stored in a program. At the moment the engineering tools used for creating such controller codes do not utilize the opportunities of Group Technology (GT), i.e. the reorganization and grouping of the similar hardware and software modules, and creating developer databases. So the engineering tools offer only a static help in the software development process. Industrial experience shows the fact, that highly qualified engineers are required for working confidently with such tools. Modern software engineering tools have the capability to generate code automatically, see for example the recent releases of Developer Studio of Microsoft or Eclipse. Some Graphical User Interface (GUI) design tools or some database design tools can also generate code snippets or scripts. Software engineering companies tend to have coding standards to enforce similar coding style among their staff. Descriptions of algorithms are widespread through the internet; the implementations of these can be very similar. The code copied from external sources can be either authorised or plagiarised. Overview on modular machines The design of manufacturing machines – both in the hardware and software aspects – is undergoing a sensible change. The solutions which offer customizations required by the merchant at the designing phase receive the attention of engineers. ● The use of modular building techniques – including mechanical parts, standardized communication, standard software modules, and generation of software modules – have been increased. ● It is expedient to use automated tools to develop such a customer-specific controller code which very often includes motion control modules as well. Automated controller creation tools have the advantage of: ● increasing the precision of repeated tasks, ● reducing the dependency on engineers experience, ● improving the quality of the process and product documentation, ● reducing the preparation time of tendering, ● improving the flexibility of the preparation of the engineering work. 150 Automated generation of project variants Besides the modular design of the manufacturing machines there is an increased need for implementing methods and tools by which the automated generation of the controller algorithms and project databases can be executed, especially without expert knowledge of the system. It is not necessary for the involved engineers to have an overall knowledge about the controller details. However, a generic change request or a call for a tender may require prompt action to develop the required machines, including the controller’s software. Those machines are made to order or engineered to order, which means a high level of custom needs. To give a high service level for such customers there should be methods and tools for non expert engineers, marketing and sales staff which enable them: ● to combine the requested product (i.e. the generation of the product descriptor database, the automated generation of the controller code, the feasibility checks and availability of the required stock); ● to generate some software modules based on the existing project code library; ● to use an easy to understand, non technical workflow description language; ● to support the developer engineers in executing their repeated tasks; ● to support open source development. Group technology Group Technology is a management theory based on the principle that similar things should be done similarly. Products requiring similar operations, having similar attributes or needing a common set of resources are grouped into product families [3]. A key factor is the coding, which is an activity to create an – usually – alphanumeric key that describes the product and its attributes. The code must be exact and unambiguous. The literature identifies three kinds of codes: ● monocode (hierarchical code), where each character is restricted by the previous character, ● polycode (attribute), where each character has its own piece of information, ● hybrid (mixed code). Feasibility of Group Technology for manufacturing machine controllers The development process of creating a new manufacturing machine is a complex task. It is expedient to use the experience from former machine development projects. It is not an acceptable approach which does require the rewriting of an existing functional block of the controller software. Engineers need to split complex projects into small modules which can operate alone. Complex systems need to be built from these bricks according to predefined rules. Fig. 1 depicts a development office, where engineers develop modules, hardware descriptor tables, and units made from modules. Fig. 2 refers to the process in which new manufacture machine controller software can be automatically generated. Component database - units - hardware list - softwaremoduls Development Figure 1: Development process of a new project Orders Component database Integrator Machine-project database - units - hardware description table - softwaremodules First commissioning Connect plan Project description table A F E C D B Figure 2: Simplified workflow of a project realization Following the order A: the customer specific production machine description is presented. B: the technician doing the integration checks the components repository C for the required components in an automated way. Then a software tool generates the project database and the connected plan E for the machine controller. These passed to the installer staff F. After the integration there can be some feedback D. Note, that the suggested development process implements the modules only once. These form a module repository so that the aforementioned advantages can be achieved. When a new module is required to be developed the first step is to analyze the functionality. Experiments show that projects including similar modules have the same level of similarity. If the similar functions are implemented separately then it leads to redundant work and more opportunity of errors. 151 Engineers would need a tool which finds similar existing modules which implement the required functionality. The use of Group Technology can accomplish this to replace the human expertise. As manufacturing machines have been created for many years now, Group Technology can be suitable for analyzing existing projects. Very likely a project analysis including module similarity check will exhibit a high level of redundancy. In the following chapter the computer tools and mathematical models supporting these analyses will be detailed. Automated clustering methods Automated clustering methods should be computer algorithms with the capability of clustering the controller code of the machines. There can be two different approaches. The top-down approach uses the project description table and extracts the module information. Then it codes the functional components and tries to find similarity in the module/unit repository. This approach can be used during the tendering phase of a project, when the actual machine and its controller do not exist. The other approach is the bottom-up approach, where the controller code should be investigated. PLC coding can be made in various ways; some uses icons like Relay Ladder Diagram (RLD). Some codes have textual format like the Structured Text (ST) form. This is suitable for GT clustering. Groups Existing projects Groups Code Code Requirements Figure 3: Top-down and bottom-up approaches PLC code The most widespread PLC programming languages are as follows: CP – Contact Plan: a PLC programming language with the support of net-and relay switch plans. FBD – Function Block Diagram: FBD is another graphical programming language. The main concept is the data flow that starts from inputs and passes in block(s) and generate the output. LD – Ladder Diagram: Ladder Diagrams are specialized schematics commonly used to document industrial control logic systems. They are called “ladder” diagrams because they resemble a ladder, with two vertical rails (supply power) and as many "rungs" (horizontal lines) as there are control circuits to represent. IL – Instruction List: Instruction List programming is defined as part of the IEC 61131 standard. It uses very simple instructions similar to the original mnemonic programming languages developed for PLCs. ST – Structured Text: Structural Text is a high level PLC programming language such as Pascal. SFC – Sequential Function Charts: Sequential Function Charts have long been established as a means of designing and implementing sequential control systems utilising programmable controllers. The Programming Standard IEC 61131-3 includes a graphical implementation of SFC’s in its suite of programming languages. Mathematical formulization of the top-down problem Let’s assume a binary polycode coding system. Each bit of the k bit long code represents a feature. 0 means that the controller does not implement the feature, 1 means that the feature is realized. Let’s investigate n projects. The problem can be represented by an n x k matrix M, see for example (1). ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ⎡ 1100000000 0011100000 0000010111 0001100000 1000011011 0110000000 0001100000 0000111110 0000101111 0010100000 10 9 8 7 6 5 4 3 2 1 (1) Let pi represent the ith row. We can define a distance function by which we can evaluate how similar is one project to the other. For the sake of simplicity we assigned the decimal representation of the pi vector. Let dec(pi) denote this assignment. Note, that this calculation is easy to execute, however it may give wrong result. For example p1 = 0111111111 is closer to p2 = 1000000000 than to p3 = 0111111001. So clustering the distance function dec(pi) will not give us a correct result, however, if we order the pi vectors according to their decimal representation then this can be a starting point for further clustering. This method, however, will find it easily if two vectors are exactly the same: pi= pj if dec(pi)= dec(pj). 152 Genetic algorithm clustering The rows in matrix M need to be rearranged so that there will be block of 1s near in the p distance of the diagonal item. [4] details two measures of the goodness of the grouping, called Grouping Efficiency and Grouping Efficacy. Grouping Efficiency is calculated as η = q·η1 + (1 – q)·η2 (2) where η1 is the proportion of 1s in the diagonal block, η2 is the proportion of 0s in the off-diagonal block, q is a weight coefficient. Grouping Efficacy is calculated as in out NN NN 01 11 + − =μ (3) where N1 is the total count of 1s in the matrix, N1 out is the count of 1s outside the diagonal blocks, N0 in is the count of 0s inside the diagonal blocks. A Genetic Algorithm Clustering (GAC) was implemented in the form of MATLAB code. Both the Grouping Efficiency and Grouping Efficacy were implemented as a fitness function. The n number of rows and k number of columns had been split to y and x number of block w1, w2, …, wy and u1, u2, …,ux so that kuandnw x j y i == ∑∑ 11 (4) If the GAC creates an instance which does not meet (4) then the fitness function will be 0. Otherwise the fitness function is calculated as described above. The pseudo code of the GAC is as follows Create the initial set of the population. Evaluate each of them While not to stop do Select two random elements Crossover random genes of those two Mutate the new specimen randomly Evaluate the new specimen The new population will be the top elite of the previous population plus the best ones of the new specimens End while After the block diagonalization process the (1) matrix should look as in (5). ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ⎤ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ⎡ 1100000000 0110000000 0001100000 0011100000 0010100000 0001100000 0000101111 0000111110 1000011011 0000010111 10 5 7 9 1 4 2 3 6 8 (5) You can see three groups identified by the algorithm here: Group 1 consists of (8, 6, 3, 4), Group 2 consists of (1, 9, 7), Group 3 consists of (5, 10). Computer algorithms for detection of similarity Let’s consider the bottom up approach, i.e. check the PLC code for similiarity. The Structured Text format discussed above is can undergo such an investigation because that is textual format of the code. In the past, many computer algorithms were introduced for detecting software similarity, mainly for detecting plagiarism. [2] defines seven layers of code modification in a plagiarism spectrum. Some recommended using software metrics to detect similarity. [4] was among the pioneers: its metric used the following quantities: n1 = number of unique or distinct operators. n2 = number of unique or distinct operands. N1 = total usage of all the operators. N2 = total usage of all the operands. Using these metrics we can calculate: V = (N1 + N2)log2(n1 + n2) (6) E = (n1N2(N1 + N2)log2(n1 + n2))/2n2) (7) where V refers to the volume of the program and E refers to the efforts to create the program. You may find the logarithmic function odd at first sight. In this metric they found a correlation between the number of bugs in the program, programming or debugging time of the program and the complexity of the program. 153 Local alignment detection using Smith-Waterman algorithm Smith and Waterman developed their algorithm to detect common molecular subsequences. They compared two molecular sequences: A=a1 a2 ... an and B=b1 b2 ... bm. The algorithm works as follows: set up a Hn+1 m+1 matrix whose first row and column have the 0 index and are zeroed. Hk0 = H0l = 0 for 0 ≤ k ≤ n and 0 ≤ l ≤ m Then Hij is the maximum similarity of two segments ending in ai and bj respectively. It is calculated as { } { } ⎪⎪⎭ ⎪⎪ ⎬ ⎫ ⎪ ⎪ ⎩ ⎪⎪ ⎨ ⎧ − − + = −≥ −≥ −− ljil kjkik jiji ij WH WH basH H 1,1 1 1,1 max ,max ),,( max (8), where s(ai,bj) is a score function for similarity, Wk is a weight (cost) function for a k-long deletion and Wl is a cost function for inserting l length of new sequence. The highest score in the matrix indicates the maximum local alignment of the two sequences. Once the matrix elements are calculated the maximum element has to be found. That refers to the end of the maximum alignment. The traceback algorithm will find the way back. Find the next highest score. – A diagonal jump implies there is an alignment (either a match or a mismatch). – A top-down jump implies there is a deletion. – A left-right jump implies there is an insertion. Figure 4: Trace back from the maximum element REFERENCES 1. O. HORNYAK, G. SAFRANY: Group technology for automated generation of machine controller code. 5th International Symposium on Applied Computational Intelligence and Infromatics. (2009), Timisoara, Romania, 17–21. 2. J. A. W. FAHIDI, S. K. ROBINSON: An empirical approach for detecting program similarity and plagiarism within a university programming environment, Compter Education, 11, 1987, 11–19. 3. A. PARKER, J. O. HAMBLEN: Computer Algorithms form Plagiarism Detection, IEEE Transactions on Education, 32 (2), 1989, 94–99. 4. M. H. HALSTEAD: Elements of software science, North Holland, New York, 1977. 5. T. F. SMITH, M. S. WATERMAN: Identification of common molecular subsequences, Journal of Molecular Biology, 147, 1981, 195–197. << /ASCII85EncodePages false /AllowTransparency false /AutoPositionEPSFiles true /AutoRotatePages /None /Binding /Left /CalGrayProfile (Dot Gain 20%) /CalRGBProfile (sRGB IEC61966-2.1) /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2) /sRGBProfile (sRGB IEC61966-2.1) /CannotEmbedFontPolicy /Error /CompatibilityLevel 1.4 /CompressObjects /Tags /CompressPages true /ConvertImagesToIndexed true /PassThroughJPEGImages true /CreateJobTicket false /DefaultRenderingIntent /Default /DetectBlends true /DetectCurves 0.0000 /ColorConversionStrategy /CMYK /DoThumbnails false /EmbedAllFonts true /EmbedOpenType false /ParseICCProfilesInComments true /EmbedJobOptions true /DSCReportingLevel 0 /EmitDSCWarnings false /EndPage -1 /ImageMemory 1048576 /LockDistillerParams false /MaxSubsetPct 100 /Optimize true /OPM 1 /ParseDSCComments true /ParseDSCCommentsForDocInfo true /PreserveCopyPage true /PreserveDICMYKValues true /PreserveEPSInfo true /PreserveFlatness true /PreserveHalftoneInfo false /PreserveOPIComments true /PreserveOverprintSettings true /StartPage 1 /SubsetFonts true /TransferFunctionInfo /Apply /UCRandBGInfo /Preserve /UsePrologue false /ColorSettingsFile () /AlwaysEmbed [ true ] /NeverEmbed [ true ] /AntiAliasColorImages false /CropColorImages true /ColorImageMinResolution 300 /ColorImageMinResolutionPolicy /OK /DownsampleColorImages true /ColorImageDownsampleType /Bicubic /ColorImageResolution 300 /ColorImageDepth -1 /ColorImageMinDownsampleDepth 1 /ColorImageDownsampleThreshold 1.50000 /EncodeColorImages true /ColorImageFilter /DCTEncode /AutoFilterColorImages true /ColorImageAutoFilterStrategy /JPEG /ColorACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /ColorImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /JPEG2000ColorACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /JPEG2000ColorImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 300 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages true /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /GrayImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /JPEG2000GrayACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /JPEG2000GrayImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 1200 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict << /K -1 >> /AllowPSXObjects false /CheckCompliance [ /None ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile () /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False /CreateJDFFile false /Description << /ARA /BGR /CHS /CHT /CZE /DAN /DEU /ESP /ETI /FRA /GRE /HEB /HRV (Za stvaranje Adobe PDF dokumenata najpogodnijih za visokokvalitetni ispis prije tiskanja koristite ove postavke. Stvoreni PDF dokumenti mogu se otvoriti Acrobat i Adobe Reader 5.0 i kasnijim verzijama.) /HUN /ITA /JPN /KOR /LTH /LVI /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor prepress-afdrukken van hoge kwaliteit. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.) /NOR /POL /PTB /RUM /RUS /SKY /SLV /SUO /SVE /TUR /UKR /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing. Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.) >> /Namespace [ (Adobe) (Common) (1.0) ] /OtherNamespaces [ << /AsReaderSpreads false /CropImagesToFrames true /ErrorControl /WarnAndContinue /FlattenerIgnoreSpreadOverrides false /IncludeGuidesGrids false /IncludeNonPrinting false /IncludeSlug false /Namespace [ (Adobe) (InDesign) (4.0) ] /OmitPlacedBitmaps false /OmitPlacedEPS false /OmitPlacedPDF false /SimulateOverprint /Legacy >> << /AddBleedMarks false /AddColorBars false /AddCropMarks false /AddPageInfo false /AddRegMarks false /ConvertColors /ConvertToCMYK /DestinationProfileName () /DestinationProfileSelector /DocumentCMYK /Downsample16BitImages true /FlattenerPreset << /PresetSelector /MediumResolution >> /FormElements false /GenerateStructure false /IncludeBookmarks false /IncludeHyperlinks false /IncludeInteractive false /IncludeLayers false /IncludeProfiles false /MultimediaHandling /UseObjectSettings /Namespace [ (Adobe) (CreativeSuite) (2.0) ] /PDFXOutputIntentProfileSelector /DocumentCMYK /PreserveEditing true /UntaggedCMYKHandling /LeaveUntagged /UntaggedRGBHandling /UseDocumentProfile /UseDocumentBleed false >> ] >> setdistillerparams << /HWResolution [2400 2400] /PageSize [612.000 792.000] >> setpagedevice