Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. V (2010), No. 1, pp. 42-51 Mining Authoritativeness of Collaborative Innovation Partners J. Engler, A. Kusiak Joseph Engler Adaptive Systems Rockwell Collins, Inc. Cedar Rapids, IA 52498 USA E-mail cjengler1@msn.com Andrew Kusiak Mechanical and Industrial Engineering 3131 Seamans Center University of Iowa Iowa City, IA 52242-1527 USA E-mail: andrew-kusiak@uiowa.edu Abstract: The global marketplace over the past decade has called for innovative products and cost reduction. This perplexing duality has led companies to seek external collaborations to effec- tively deliver innovative products to market. External collaboration often leads to innovation at reduced research and development expenditure. This is especially true of companies which find the most authoritative entity (usually a company or even a person) to work with. Author- itativeness accelerates development and research-to-product transformation due to the inher- ent knowledge of the authoritative entity. This paper offers a novel approach to automatically determine the authoritativeness of entities for collaboration. This approach automatically dis- covers an authoritative entity in a domain of interest. The methodology presented utilizes web mining, text mining, and generation of an authoritativeness metric. The concepts discussed in the paper are illustrated with a case study of mining the authoritativeness of collaboration partners for microelectromechanical systems (MEMS). Keywords: Innovation, web mining, text mining. 1 Introduction Innovation most often occurs in one of two forms, incremental or radical. Radical (discontinuous) innovation assumes focuses on a completely new concept that is radically different from the existing ideas. This type of innovation occurs rarely and is not easily predictable. Incremental, or continuous, innovation builds upon previous concepts and therefore it is easier to be quantified. Kusiak [1] defined innovation as an iterative process aimed at the creation of new products, processes, knowledge or services by the use of new or even existing technology. This definition summarizes the typology of incremental innovation. Innovation has further been quantified into five generational models. The first generational model is linear, where innovation is unidirectionally pushed from the research phase to the commercial application phase [2], [3]. The second model, the pull model, holds the consumer as the main focus of innovation as opposed to the designer [4]. Feedback forms the third model and utilizes the consumer’s responses to an initial product/service offering to perform incremental innovation on that product and/or service [5]. The fourth model is known as the strategic model in which innovation lines up directly with the company’s strategy [2]. The model pursued in this paper is the fifth model, also known as the networked model. In this model extra- enterprise and cross-discipline organizations form a network to innovate. The term Open Innovation [6] is often used when describing this model. Collaborative networking involves a detection of the optimal sources of col- laboration. This can often be viewed as a challenge and often results in the local optimum as the choice of the collaborative source rather than the unseen, and most often unknown, global optima (e.g., the most authoritative person/company). The success of collaborative innovation depends greatly on the quality of the collaboration sources. The probability of innovation success, as measured by the market, can be considered proportional to the quality of the collaborative sources. Therefore, it is incumbent upon a company, in pursuit of innovation, to seek out the optimal sources for collaboration. A wealth of information is available upon the World Wide Web (WWW) for identifying the optimal sources of collaboration (e.g., white-papers written by authorities of specific domains). Copyright c© 2006-2010 by CCC Publications Mining Authoritativeness of Collaborative Innovation Partners 43 Various researchers have begun to investigate possible means by which collaboration produces effective re- sults. Unfortunately, the literature is lacking solid systematic methodologies by which collaboration authoritative- ness may be determined apriori of collaboration inception. Chapman et al. [22] proposed a process model for collaborative data mining in an electronic manner but fail to address the need for authoritativeness when determin- ing the collaborative partner selection process. Lavraĉ et al. [23] surveyed various methodologies for collaboration but fail to report a systematic methodology of determining the most authoritative entity for collaboration. Gajda [24] investigated assessment measures for determining the success of a collaborative partnership on a project. Un- fortunately, these measures are considered upon the completion of the project rather than to determine the effective partners for collaboration prior to entering in to a collaborative agreement. Other researchers have posited criteria and methodologies for collaboration partner selection. Geringer [25] proposed task-related selection criteria for international joint ventures but failed to present a systematic method- ology for automatically determining the authoritativeness of a collaborative partner. Hitt et al. [26] investigated resource-based and organizational learning for collaborative partner selection. Again, the methodologies in the literature fall short of the goal of automated systematic determination of authoritativeness for collaboration. This paper presents machine learning algorithms to extract collaborative innovation relationship information from various sources including the WWW. Utilizing machine learning algorithms to discover valuable knowledge from disparate sources has been presented in the literature. Chen [27] posited utilizing natural computing tech- niques, such as swarm intelligence, to foster a collective intelligence in a virtual learning environment. Grebla et al. [28] presented a Bayesian belief network for mining data from various databases to assist in predicting arteriosclerosis and cardiovascular disease. This paper offers a novel methodology by which the optimal authoritative sources of collaborative partnership may be discovered. Through the use of web mining, text mining, and the creation of an authoritativeness matrix, users may determine the optimal authority with which to perform collaborative innovation. Of course, optimality may depend upon more than just the most authoritative partner on a given subject. Other factors such as the availability of the collaborator or the cultural background of the collaborator (e.g., defense systems collaboration) may be involved. Thus, this paper advocates the creation of an authoritativeness matrix as opposed to simply defining global optima for collaboration. The remainder of this paper proceeds as follows. Section 2 discusses the focused mining of the World Wide Web to discover authoritative sources for collaboration. The distillation of these sources to form an authoritative- ness matrix is discussed in Section 3. An authoritativeness metric is presented in Section 4. Section 5 discusses clustering of the mined sources of collaboration. Section 6 offers a case study for determining the leading authori- ties on microelectromechanical systems (MEMS). Finally, Section 7 offers concluding remarks. 2 Focused Mining of the Web The first major step in forming a collaborative innovation relationship is to seek out and choose partners for the collaboration process. The World Wide Web (WWW) presents a proven search space for multiple concepts. A natural inclination is to manually search the internet for such sources of collaboration. Some companies hire business development teams to perform this task. Manually searching the web is a time intense process that often yields sub-optimal results. Sometimes searches can even present misguided or influenced results due to the ability for parties to influence their rank among the various search engines [7]. Increasing the difficulty of the manual search, many search engines utilize payer based rankings which facili- tate assigning a higher position in the search results. Additionally, many websites have multiple internal links thus boosting certain search engine ratings. Such forms of ranking manipulation may provide false results and thus the most suitable collaborative candidates could be missed. To overcome the limitations of a manual search of the internet for collaboration resources, a focused web miner is presented. The focused web miner used for this process includes user inputs of a specified phrase which become the search criteria. The focused web miner then proceeds, following standard focused web crawling methodologies as presented by Liu [8], to traverse the WWW in search of white papers, articles, and journal entries related to the search criteria. The presented web miner can easily be extended to handle other sources, e.g., information about companies which have reached Phase II funding from Small Business Innovation Research (SBIR) programs. The presented version of the focused web miner does not attempt to mine the web blog data or the standard html files. Rather, this version of the focused web miner seeks content mimicking academic writings. Thus, the focused web miner spends a fair amount of time searching academic web sites, scientific communities, and trade journals. It is from these types of internet resources that the, often academic, writings are extracted. 44 J. Engler, A. Kusiak The discovery of web pages containing white papers is a significant task involving crawling the internet and classifying the web pages that are examined as a review or non-review page. The standard approach to performing the crawl is to utilize a focused web crawler. A focused web crawler targets a specific corpus of web pages. Standard crawling does not consider a specific topic of inquiry; rather its job is to index all pages available on the internet. Even with the assistance of algorithms such as PageRank developed by Google [17], successful standard crawling requires massive hardware and bandwidth. This drawback prevents most corporations from performing this type of crawl internally. Focused crawling requires far less hardware and bandwidth but does require some sophistication of algorithms to weed out the undesirable links as they relate to the given query. The algorithms useful to focused web crawling involve basic classification algorithms. There is a great variety of classification algorithms for determining web pages containing white papers. Shih et al. [18] suggested the use of web page content structure as parameters of classification. The authors in [18] indicated that content providers tend to choose URLs and page layouts that coherently structure their content. This html structure may be useful in determining the likelihood that a web page contains reviews. Kules et al. [19] extended this idea to limit the features used for classification to items such as web page titles, URLs and text snippets. Jin et al. [20] took a different approach from the previous two and utilized a data-mining algorithm called Hidden Nad’ve Bayes. Their methodology considered a large corpus of web pages and calculated the probability that each page fit into a particular category. Fortunately for the focused web miner proposed here, document type is often the best indicator of a possible fit. Given the query string provided by the user and a set of allowable document types (e.g., pdf files only) the need for classification is reduced greatly. The need for classification increases dramatically when white papers other than standard academic content is searched. In such case the use of a combination of Bayesian classification and the web page structure algorithm of [18] is suggested (see [21]). Once the focused web miner has discovered a document of the standard academic writing, it attempts to down- load this document to a central repository that is dedicated to the search criteria. This repository often becomes extremely large, in the range of a terabyte or more, but is central to the process for collaboration resource detec- tion. No partitioning of this repository takes place at the time of download; rather, all documents are placed into the same location. It is upon these documents that the process of determining the most authoritative collaboration partners is performed. While this form of repository may seem excessive, the reader is reminded of the low cost of data storage. Additionally, a great deal of information will be gleaned from this repository over time. 3 Authoritativeness Matrix An authoritativeness matrix is generated from the documents that were obtained from the focused web mining process. Utilizing standard text mining techniques the documents are deconstructed to gain the information neces- sary for the generation of the authoritativeness matrix. A text-mining algorithm extracts from each document the author’s names and the references, other authors, cited by that particular paper. To assist in discovery of the authors who wrote, and are cited in, the papers a list of first names is utilized. The list of first names, freely available on many internet sites, allows for the detection of document patterns within the corpus such that most often names of authors are placed within a given context within the document (e.g., author names at the beginning, authors who are cited at the end). The text mining algorithm utilizes these patterns to classify portions of the document which will have the information extracted from. Additional, text- mining algorithms may be utilized to detect the sense, positive or negative, in which a citation appears within the document. From the information mined by the text-mining algorithm the authoritativeness matrix may be constructed. The authoritativeness matrix is a two dimensional matrix, or table, made up of columns representing individuals who have been referenced by the papers and rows representing paper authors. The authoritativeness matrix forms a concise but sparsely populated representation of the given, or presented, authorities in the documents. Figure 1 presents an example authoritativeness matrix for three documents. The rows represent the authors of the documents while the columns represent authorities who are cited as references in these documents. A cell of the matrix is 1, if the author in the row containing the cell has referenced an authority in the column containing the cell. Otherwise the cell is 0. Each row represents a single author for a single paper, thus there may exist multiple rows for a single paper. From the example of Figure 1 it is easily determined that JR Koza is the most authoritative person with which to conduct collaborative innovation for the example domain. This is due to the fact that JR Koza is the most cited Mining Authoritativeness of Collaborative Innovation Partners 45 Figure 1: Example authoritativeness matrix for three documents author within the tiny corpus of documents for the example. It will be demonstrated later in this section that other factors contribute to this outcome. It should be noted that an author may reference his/her own writings as well as be referenced by others. Thus, it is possible for this matrix to hold entities that are in both the rows and columns of the matrix. In fact, it is possible, although highly improbable that the authoritativeness matrix holds exactly the same authors in its rows as it does references in its columns. As will be explained in Section 4, there some caution needs to be exercised when an author often references their own work while others do not. This caution is the motivation for the storage of the document year in the authoritativeness matrix as will be explained. The representation of the authorities of the documents discovered by the focused web miner by the author- itativeness matrix ensures ease of storage and traversal. The authoritativeness matrix is compact enough to be stored in main memory especially given the sparseness of the matrix. This allows for efficient processing when determining the true authorities for the collaboration process as presented next. 4 Authoritativeness Metric To determine which entities in the authoritativeness matrix represent the optimal, or most, authoritative entity, within the search criteria, an authoritativeness metric is used. This metric accounts for the number of documents which were written by the authority, the number of times the authority is referenced in the body of work, discovered by the focused web miner, as well as the average age of the documents written by the authority. Additionally, the sense, positive or negative, in which the author is portrayed in the document can be collected. Thus, the first step in calculating the authoritativeness metric is to scan the authoritativeness matrix and calcu- late a number of measures. These measures will be stored in various hash tables for efficient referencing. During the scan of the authoritativeness matrix a hash table representing the authors, or rows, of the matrix is created. Each time an author is encountered in the scan of the matrix that author is either added to the hash table as a key value pair of <AuthorName, 1> or the value of the index of the author in the hash table is incremented. The same action is performed for the columns, or referenced authorities, in the matrix. Additionally, a hash table is created for the purpose of obtaining the average age of the documents for each author, or row, in the matrix. The use of the three hash tables makes for an efficient scanning methodology for the authoritativeness matrix since the matrix is scanned only a single time. A forth table may be required to represent sense. The hash table which represents the number of documents written by the authors, or rows, of the authorita- tiveness matrix is used to obtain the number of out-links of each author. Out-links are documents written by the author. Similarly, the hash representing the number of times an author is referenced is used to obtain the number of in-links for each authority. In-links are documents written by other authors referencing the given author, thus implicitly conveying authority on, or detracting from in the case of a negative sense, the referenced author. The conveyance of authority on an author by referencing their work is held here in a context similar to that of the PageRank algorithm discussed in [9]. Conveyance of authority plays a vital role in the determination of authoritative collaborative sources. Thus, the authoritativeness metric proposed weights the in-link measure higher than the out-link measure. Additionally, should sense be included, the in-link could have the ability to decrease the author’s authoritativeness. The initial authoritativeness metric is defined in (1). In (1), λ is a user defined parameter which allows for 46 J. Engler, A. Kusiak changing the weight of the out-links based on the average age of the documents written by author ai. This parameter assists in controlling the conveyance of authority to an author who is a prolific writer but perhaps not often cited by other authors. Additionally, the λ parameter allows for decreasing the weight of older documents if only recent documents are desired. If sense is utilized, the in-link, ini, can be a negative number. It should be noted that determination of the final authoritativeness metric measure is an iterative process as will be explained. Ai = ln(λ t ′ (outi) + ini) (1) where: outi is the number of out-links; ini is the number of in-links; t ′ is the average age of the documents of author ai in years from the current year; λ is a user parameter in the range of [0, 1]. Thus, the initial authoritativeness of an author ai is given by Ai. Once the initial individual authoritativeness metrics of the entire authoritativeness matrix is calculated, the iterative process of boosting the authoritativeness is performed. Similar to the methodology used in the PageRank algorithm [10], it is desirable to instill more authority to an author who is referenced by another author of high authority. Thus, if one author is considered a leading authority, deemed so by the authoritativeness metric, and that author references a second author, the second author’s authoritativeness metric measure should be increased. The iterative process of authoritativeness boosting is performed using the average of the in-links pointing to the current author. In-links of high authority contribute to the boosting of the current author’s authoritativeness, while in-links of less authoritative authors are not detrimental to the current author’s authoritativeness. Thus the average of the in-links during the authoritativeness boosting process is calculated by including the authoritativeness of the author who referenced the current author as shown in (2). ini = N∑ i= eA N (2) where: Ai is the authoritativeness of author ai; N is the number of in-links to ai. At each iteration of the calculation of the final authoritativeness metric the average of the in-links is used to calculate the new authoritativeness metric measure Ai for each author ai. Equation (3) describes how the new in-link measure is calculated iteratively. in ′ i = N∑ j− { eA − ini if eA > ini if eA ≤ ini (3) Thus, at each iteration of the boosting process, ini of (1) is replaced with ini for the calculation of the author- itativeness of each author. The boosting process is continued for n iterations, as set by the user, or until the order of the authorities remains unchanged which is the preferred method. With the final authoritativeness metric at hand for each author it is easy to determine which author and/or entity is the most authoritative in the subject matter of the search criteria. It is useful to determine the top k authorities in the subject matter to ensure that a good collaborative resource can be found available and willing to collaborate on the innovation at hand. As such it is useful to set a user defined parameter λ which is a threshold below which authoritativeness is discounted. This threshold is utilized in determining the authorities of the clustered documents as explained next. 5 Document Clustering Often the size of the search space or the generality of the search criteria can result in a document set of varied type which is large in size. To ensure that the collaborative partner chosen by the authoritativeness metric is the one Mining Authoritativeness of Collaborative Innovation Partners 47 that is most appropriate for the specific collaboration it is helpful to cluster the documents into similar categories. Once the clustering has been performed, a cluster that is most similar to, or most represents, the specific innovation topic is chosen. From that cluster it is the possible to determine the best collaborative source for the innovation. Note, the collaborative authority of a specific cluster may not be the authority whose overall authoritativeness metric is the highest. Rather, the cluster authority, or authorities, will be those who are most advantageous for the specific innovation subject. Clustering of the documents mined via the focused crawler begins with the generation of a word frequency matrix for the documents. The word frequency matrix represents the counts of each word in the individual docu- ments. Each row of the matrix represents a single document; while each column of the matrix represents a single word. There exists columns for every document word, which is not a stop word, thus the matrix can be somewhat sparse. Many words, known as stop words, do not assist in properly classifying the documents. Stop words are most common in everyday language and thus not specific to the topic. Words such as "the", "in" and "here" are removed from the word frequency matrix prior to clustering. Further, it is often favorable to generate the root of words as opposed to the actual words for this frequency matrix. Thus, words such as "innovation", "innovate", and "innovativeness" would all be placed in the root word frequency cell for the word "innov". Figure 2 below represents a partial frequency word matrix. Figure 2: Word frequency matrix Once the word frequency matrix is obtained it is important to reduce the dimension of the matrix to ensure efficient clustering. Dimensionality reduction techniques, such as singular value decomposition, that are used for standard data mining are especially helpful here. The word frequency matrix before dimensionality reduction can easily include thousands of words or attributes. Rarely are all the attributes of value to the clustering. Thus, by performing a dimensionality technique such as singular value decomposition, the attribute set can be reduced down to a size that is more manageable, typically of size 100 or less [11]. Once the dimensionality reduction has been performed, the reduced word frequency matrix is clustered with simple k-means clustering algorithm described in [11]. Thus, a brief review of the cluster centroids will help to determine which cluster most resembles the subject matter of the specified innovation. The authoritative collaboration partner(s) can easily be determined from those entities that have contributed work to the cluster that most resembles the subject of the innovation. Section 4 presented a threshold measure by which authorities could be weeded out of the collaborative search process. Following the Apriori Property discussed in [11] and [12], those authorities that are not authoritative for the entire group should not be considered authoritative for a subsection of that group. Therefore, only authorities with the authoritativeness metric higher than the user defined threshold should be sought within the clusters. 6 MEMS Case Study This section presents a case study on the discovery of the most authoritative person to perform collaborative innovation with for the domain of microelectromechanical systems (MEMS). In this study 2403 papers were mined from the internet on the subject of MEMS Simon [13] describes MEMS as a monolithically integrated device used for microwave applications such as switches, distributed phase shifters and BPSK modulators. Other applications for MEMS have also surfaced. In fact, according to Maeda et al. [14] MEMS is expected to be one of the most promising areas of research and development contributing to future success of electronics businesses. 48 J. Engler, A. Kusiak After the 2403 papers were mined from the internet using the focused web miner, the author’s names and references were parsed from the documents as described in Section 3 above and the authoritativeness matrix was generated. The authoritativeness metric, described in Section 4, was applied with λ set to 0.80 to slightly discount the average age of the documents. Figure 3 illustrates the top ten authorities after this initial calculation of author- itativeness. Figure 3 illustrates the results of running the algorithms presented in this paper prior to the iterative boosting discussed in Section 4. Thus, the results in Figure 3 are more indicative of a rapid manual search. Figure 3: The top 10 authors in the non-boosted authoritativeness metric list for MEMS As seen in Figure 3, GM Rebeiz is indicated as the leading authority on MEMS. A quick search of the internet with the name GM Rebeiz justifies his rank as the top authority in this non-boosted list. GM Rebeiz is a professor at the University of Michigan in the College of Engineering and leads a team of 8 PhD students in a focus on RF-MEMS [15]. Utilizing the authoritativeness matrix, discovered in the mining of the 2403 documents which generated the non-boosted results of Figure 3, the iterative boosting of authoritativeness is applied. Upon the application of the boosting of the authoritativeness the list changes in order as can be seen in Figure 4. Boosting has the effect of attributing higher authority to those whose papers have been cited by authors of higher authority. Thus, this is the list that a person for collaborative practices in the field of MEMS should be sought from. Figure 4: Top 10 authors in the boosted authoritativeness metric list for MEMS From the boosted authoritativeness it is easy to see that CL Goldsmith is the authoritative figure one would wish to collaborate with. In fact, with a quick search of the internet it is found that CL Goldsmith is the president of a company called Memtronics and received his PhD from the University of Texas [16]. The list contains other potential candidates who may be sought after should CL Goldsmith not be available for collaboration. For this case study, the focused web miner ran for approximately 18 hours to gather the 2403 documents. The parsing of the author’s references and document age took less than 2 minutes. The initial authoritativeness was then calculated from the matrix in approximately 1.5 minutes. The boosting of the authoritativeness took 16 iterations Mining Authoritativeness of Collaborative Innovation Partners 49 before order no longer changed and took less than 10 minutes to achieve (see Figure 5). Thus, overall, the process of mining the leading authority in the field of MEMS, based upon these documents, took less than 18.5 hours and required very little of the user’s time to perform. It is easily seen that this is a marked improvement upon a manual search. Figure 5: Summary of the algorithm run The case study illustrates the effectiveness of the authoritativeness metric presented in this paper. Further, the case study highlights the differences between boosted and non-boosted authoritativeness. From the perspective of a company that is seeking a collaborative partner, the boosted authoritativeness offers a list of highly respected candidates. Deriving this list in the short unmanned time frame of 18.5 hours offers companies a great benefit in discovering the most authoritative person(s) to perform collaborative innovation with. The effectiveness of the authoritativeness metric is further explained in a recent scenario that was encountered by an electronics manufacturer who required expertise with legacy 16 bit PCMCIA PC cards. Due to confiden- tiality the details of this episode cannot be related although a summary of the scenario can be provided. The electronics manufacturer, a government contractor, was contracted to design a laptop integrated testing device for a piece of electronic equipment for a foreign concern. One of the requirements for this testing device was for it to integrate with the laptop through a legacy 16 bit PCMCIA PC card. The contractor lacked the domain knowledge to effectively and rapidly design the testing device with this form of legacy interface. Therefore, the author’s were asked to apply the authoritativeness metric to determine which entities to best collaborate with on this issue. The results of the running of the algorithms, described herein, a list of authoritative entities was generated. The second entity on this list was eventually utilized to solve the domain issue. The first entity on the list was not completely suited for the task due to security restriction. The above presented scenario lends additional support towards the effectiveness of the authoritativeness metric. It is shown that the authoritativeness metric is applicable not only to academic, research, and scientific activities but also to integration of various domain expertise in a corporate setting as well. Further, it is shown that the list of authoritative entities is crucial for selection of collaborative partners due to various external constraints (e.g., security, geospatial reasons) that cannot be accounted for within the authoritativeness metric. By producing a list of authoritative entities the end user is capable of filtering for these external constraints while still achieving the results of finding the optimal collaborative partner. 7 Conclusions and Future Works Open innovation is the means by which companies seek external entities with which to collaborate to form innovation. This paper illustrated that finding the best source of collaboration for a given innovation in a manual fashion is sub-optimal. This paper presented a novel methodology for the automation of collaboration partner detection for the purpose of collaborative innovation. Furthermore, a process by which the authoritativeness of the collaborative partner is ensured to be optimal was presented. Using data mining, clustering, and analysis of the documents related to the innovation domain increases competitiveness of companies. Novel to this paper is the use of boosted authoritativeness. The iterative process of increasing, or decreasing the authoritativeness of possible candidates for collaborative innovation extends the search process, and represents an automated methodology for determining the best candidate entity (company, person) for collaborative innovation. Future research should includes increasing the efficiency of document detection during the web mining process as 50 J. Engler, A. Kusiak well as increasing the rate at which document classification takes place. Bibliography [1] A. Kusiak, Innovation: A Data-Driven Approach, International Journal of Production Economics, Vol. 122, No. 1, pp. 440-448, 2009. [2] G. Berkhout, P. van der Duin, Mobile Data Innovation: Lucio and the Cyclic Innovation Model, Proc. of the 6th Intl. Conf. on Electorinc Commerce, Delft, Netherlands, pp. 603-608, 2004. [3] Y. Sawatani, F. Nakarmura, A. Sakakibara, M. Hoshi, S. Masuda, Innovation Patterns, Proc. of the 2007 IEEE Intl. Conf. on Services and Computing, Salt Lake City, UT, pp. 427-434, July 2007. [4] J.B. Zhang, Y. Tao,The Interaction Based Innovation Process of Architectural Design Service, Industrial En- gineering and Engineering Management 2007 IEEE Intl. Conf., pp.1719 - 1723, Dec. 2007. [5] A.W. Ulwick, Turn Customer Input Into Innovation. Harvard Business Review, Vol. 80, No. 1, pp. 91-97, 2002. [6] L. Collins, Opening up the Innovation Process, Engineering Management Journal, Vol. 16, No. 1, pp. 14-17, 2006. [7] A. Langville, C. Meyer, Deeper Inside PageRank, Internet Mathematics, Vol. 1, No. 3, pp. 335-380. [8] B. Liu, Web Data Mining, Springer, Heidelberg, 2007. [9] T. Haveliwala, Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search, IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, pp. 784-796, 2003. [10] S. Kamvar, T. Haveliwala, C. Manning, G. Golub, Extrapolation Methods for Accelerating PageRank Com- putations, Proc. Of the 12th Intl. Conf. on World Wide Web, Budapest, Hungary, pp. 261-270, 2003. [11] I. Whitten, E. Frank, Data Mining, Practical Machine Learning Tools and Techniques, Morgan Kauffman, New York, 2005. [12] J. Han, Y. Yin, G. Dong, Efficient Mining of Partial Periodic Patterns in Time Series Databases, Proc. of the 15th IEEE Intl. Conf. on Data Engineering, Sydney, Australia, pp. 106-115, March 1999. [13] S. Simon, Modeling and Design Aspects of the MEMS Switch, Proc. Of the 2003 IEEE International Semi- conductor Conference, Sinaia, Romania, September 28 - October 2, pp. 128-132, 2003. [14] R. Maeda, M. Takahashi, S. Sasaki, Commercialization of MEMS and Nano Manufacturing, Proc. Of the 6th IEEE Intl. Conf. On Polymers and Adhesives in Microelectronics and Photonics, Tokyo, Japan, pp. 20-23, January 2007. [15] G. M. Rebeiz, Homepage htt p : //www.eecs.umich.edu/rebeiz/rebeiz.html. [16] C. L. Goldsmith, Homepage htt p : //www.memtronics.com/page.aspx?pageid = . [17] Y. Zhai, B. Liu, Web Data Extraction Based on Partial Tree Alignment, Proc. of the 2005 International World Wide Web Conference, May 10-14. Chiba, Japan, pp. 76-85, 2005. [18] L. Shih, D. Karger, Using URLs and Table Layout for Web Classification Tasks, Proc. of WWW 2004, May 17-22, New York, pp. 193-202, 2004. [19] B. Kules, J. Kustanowitz, B. Shneiderman, Categorizing Web Search Results into Meaningful and Stable Categories Using Fast-Feature Techniques, Proc. of JCDL’06, June 11-15, pp. 210-219, 2006. [20] X. Jin, R. Li, X. Shen, R. Bie, Automatic Web Pages Categorization with ReliefF and Hidden Nad’ve Bayes, Proc. of SAC ’07, March 11-15, pp. 617-621, 2007. [21] J. Engler, A. Kusiak, A. Mining the Requirements for Innovation, Mechanical Engineering, Vol. 130, No. 11, pp. 38-40, 2008. [22] P. Chapman et al., Step-by-step Data Mining Guide, CRSIP-DM Consortium, CRISP-DM 1.0, 2000. Mining Authoritativeness of Collaborative Innovation Partners 51 [23] N. Lavraĉ, H. Motoda, T. Fawcett, R. Holte, P. Langley, P. Adriaans, Introduction: Lessons Learned from Data Mining Applications and Collaborative Problem Solving, Machine Learning, Vol. 57, No. 1-2 , pp.13-34, 2004. [24] R. Gajda, Utilizing Collaboration Theory to Evaluate Strategic Alliances, American Journal of Evaluation, Vol. 25, No. 1, pp. 65-77, 2004. [25] M. Geringer, Strategic Determinants of Partner Selection Criteria in International Joint Ventures, Journal of International Business Studies, Vol. 22, No. 1, pp.755-786, 1991. [26] M. Hitt, M. Dacin, E. Levitas, J. Arregle, A. Borza, Partner Selection in Emerging and Developed Market Contexts, Academy of Management Journal, Vol. 43, No. 3, pp. 440-467, 2000. [27] Z. Chen, Learning about Learners: System Learning in Virtual Learning Environment, International Journal of Computers, Communications and Control, 3(1):33-40, 2008. [28] H. Grebla, C. Cenan, C., Distributed Machine Learning in a Medical Domain, International Journal of Com- puters, Communications & Control, 1(S):245-250, 2006.