Support Vector Representation Machine for superalloy investment casting optimization

Machine learning techniques have been widely applied to production processes with the aim of improving product quality, supporting decision-making, or implementing process diagnostics. These techniques have proved particularly useful in the investment casting manufacturing industry, where a huge variety of heterogeneous data, related to different production processes, can be gathered and recorded, but where traditional models fail due to the complexity of the production process. In this study, we apply the Support Vector Representation Machine to production data from a manufacturing plant producing turbine blades through investment casting. We obtain an instance ranking that may be used to infer proper values of process parameter set-points.


Introduction
Turbine blades are critical components in the aeronautics and gas industries. Blades operate under extreme and complex environmental conditions; they are subjected to air dynamics and centrifugal forces causing tensile and bending stresses: for example, in aeroplane engines, turbines often reach 1050 °C for several thousand hours [1,2].
(Partially supported by PON03PE-00111-01 "MATEMI", financed by the Italian Ministry for University and Research.)
Turbine performance significantly depends on the shape and dimensions of blades as well as their material strength. To warrant resistance against intense mechanical loads, the manufacturing process must satisfy high quality requirements. Indeed, blades must be produced respecting stringent dimensional and geometrical tolerances as well as ensuring material integrity and mechanical properties (e.g. high temperature tensile strength, endurance strength and creep strength). To attain production specifications, blades are manufactured through the investment casting or "lost-wax" process often employed in the production of high quality and net-shaped complex parts [3,4].
In the investment casting technique, a pattern of the desired blade shape, usually made of wax, is formed by injecting molten wax into a metallic die (see Figure 1).
The wax pattern is subsequently invested with ceramic or refractory slurry, which then solidifies to build a ceramic shell around the wax pattern. Wax is then removed from the shell by melting or combustion, leaving a hollow void within the shell which exactly matches the blade shape. After the resulting refractory shell has been hardened by heating, the actual casting operation is performed by filling the shell with molten alloy. After the molten metal solidifies, the shell is broken to obtain perfectly shaped components, which are finally refined [5].
Investment casting is a high precision and very expensive production process. Major efforts have been made to reduce scraps in the shell building stage and enhance product quality (see for example [6]); less significant improvements have been achieved in the casting operation, even though this production stage is characterized by a significant scrap rate (varying from 20% to 30% according to product characteristics) and determines most of the final product's mechanical characteristics [7,8]. The lack of remarkable advancements aimed at enhancing product quality through improved casting is due to the complex metallurgical phenomena arising in alloy injection and solidification, which have not yet been fully understood. Moreover, the final blade characteristics depend on several different elements related to each production stage [9].
Scientific and industrial efforts to improve blade production have mainly been addressed to developing advanced casting simulation tools; these tools reproduce the molding stage and help to visualize several process variables, including flow of the molten metal in the cavity, heat transfer, solidification, grain formation, shrinkage, and stress evolution [10]. However, simulation tools are unsuitable for correlating process variables (e.g. casting temperature at different heights in the furnace, casting pressure, alloy composition) to blade attributes, both defects and desirable qualities; hence they are not helpful in predicting final product quality from process parameter values. To the best of our knowledge, only a few recent studies [11,12] have tried to relate the performance of blade materials to parameter changes in the production process of superalloys and ceramic coatings. In [11] the author looks for possible correlations between product failures (typically blade rift) and possible causes such as structural defects of materials or mechanical fatigue. Our study instead aims at relating product characteristics (such as material structures obtained during the alloy melting and solidification process) to production features. Reference [12] studies possible relations of a specific blade defect, the margin plate warpage deformation, to the solidification phase via numerical simulation analysis of deformation.
In an attempt to improve blade quality and reduce defects or scraps, we developed a systematic strategy to determine the relationship between final product quality and casting process parameters. We studied and analyzed production data from Europea Microfusioni Aerospaziali (EMA), an investment casting foundry located in Southern Italy producing turbine blades for jet-engines in the civil and defence aerospace, marine and energy industries (see [13]). To achieve that goal, we relied on a machine learning technique, namely the combination of a standard binary Support Vector Machine (SVM) and a one-class SVM called Support Vector Representation Machine (SVRM, [14]).
Machine learning techniques have been widely applied to production processes with the aim of improving product quality, supporting decision-making, or implementing process diagnostics (see [15,16] for an overview). SVM formulations (see for example [17] for the use of SVMs in data mining applications) have often been successfully applied [18], owing largely to their scalability properties [19]. We chose SVMs because they are a theoretically sound tool, known to deal well with non-linearly separable data while limiting overfitting [20]. Moreover, they are trainable in polynomial time by solving a convex quadratic programming problem. The choice of SVRM was motivated by the particular outcome of the SVM training, namely the upper triangular form of the confusion matrix. This outcome made clear that a subset of the data could be separated perfectly, hence a description of that subset could provide information about the meaningful features. The novelty of the paper is of methodological nature. Indeed, although the Support Vector Representation Machine is an existing tool, it is employed here in a novel way. In particular, rather than solving a classification/rejection problem [14], here the data are first classified using a standard SVM and then a subset is ranked.
The ranking criterion is the distance from an implicitly defined vector that acts as a kind of cluster center. The ranking is meaningful because the decision surface of the original classification problem leaves to one side points belonging to the same class, hence identifies a subset of the data that are "easy to classify". Our machine learning approach proved particularly useful in the investment casting manufacturing industry, where a huge variety of heterogeneous data, related to different production processes, can be gathered and recorded, but where traditional models (i.e. continuous or discrete time models) fail due to the complexity of the production process. Moreover, we compared the results of SVRM with an alternative method aiming at inferring rules from data, namely decision trees; the comparison of the two methodologies shows the benefits deriving from the application of SVRM. Finally, the developed methodology has been applied to EMA manufacturing data as part of a research project funded by the Italian Ministry of University and Research.
The paper is organized as follows. In Section 2 we give a brief description of the overall data set characteristics and detail the analyzed EMA plant data; then, we describe the methodology adopted and customized to the EMA production process. Results of the data analysis are reported in Section 3. In Section 4 we show how to apply the decision tree method to EMA data; we discuss the results obtained by applying this methodology and compare them with the outcomes of SVRM. Comments and concluding remarks can be found in Section 5.

Analysis of products: searching for relationships between process data and quality
Data to be measured in process plant analysis can be classified as follows:

• quality data, that is, a set of data comprising measurements carried out during the conformity check, related to attributes (quality or defect) of each monitored product. For each statistical unit, the quality data consist of information such as compliance with dimensional tolerances, the presence or absence of defects (e.g. cracks, undesirable alloy grain structures), and an assessment of the extent of non-compliance together with the maximum admissible threshold for the extent of the defect (if applicable);

• process data, that is, all parameters detected during the production process. This set is composed of strictly technological variables, either directly related to the materials used in the various stages (e.g. the chemical composition of the superalloy and the ceramic coating) or indirectly referring to the settings of the plants (for example, the time trend of the temperature reference in the furnace).

These two sets of data have been the source from which we identified the relationship between process parameters and product attributes. In the analysis we performed for the EMA plant, we focused on a specific product (referred to as Part Number #A, PN#A) and collected production data over approximately 5 months, obtaining 1224 samples.
A list of parameters related to the process has been provided by EMA (a pre-selection of the most significant ones has been made based on available process knowledge). These parameters, in the following referred to as "production features" or simply "features", are shown in Table 1, along with a brief general description. The features of Table 1 are the sole features under analysis. The samples having complete data (i.e. a value for each feature) number 1189; the analysis was carried out on these complete samples.
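For illustration, the selection of complete samples can be sketched as follows; the feature names and values below are hypothetical stand-ins for the actual EMA data, which cannot be disclosed.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the EMA production table:
# each row is a cast sample, each column a production feature.
df = pd.DataFrame({
    "T_furn": [1490.0, 1505.0, np.nan, 1498.0],
    "T_x2":   [1010.0, 1022.0, 1015.0, np.nan],
    "press":  [0.80, 0.90, 0.85, 0.95],
})

# Keep only samples with a value for every feature, mirroring the
# restriction to the 1189 complete samples out of 1224.
complete = df.dropna()
print(len(complete))  # number of complete samples in this toy frame
```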
EMA's objective is to relate some product attributes, here classified with codes A01, A02, A03, A04, A05, A06, to production features, that is, to identify those production features that are more likely responsible for the product attributes. For the sake of simplicity, from here on we will refer to the analyzed product attributes simply as "attributes". Different attributes have different codes. The attribute codes and their occurrence in the dataset are reported in Table 2. Some analyses are carried out per single attribute, others by aggregating all the different attributes. In detail, the occurrences of the A04-A06 attributes represent around 1-5 ‰ of the available samples, when each attribute is considered separately. Moreover, EMA process experts considered the first three attributes (namely A01, A02, and A03) more significant than the latter ones for the production process of the specific part number PN#A. Owing to the EMA experts' remark and the low occurrence rate of attributes A04-A06, it was decided to restrict the single-attribute analysis to the three most frequent, namely A01, A02, A03. In the aggregate case, all attributes have been considered. No description of the product attributes can be given here due to a nondisclosure agreement.

Methodology
The analysis is based on Statistical Learning tools [17,20]. More specifically, it consists of two basic steps:

1. design and training of Support Vector Machine (SVM) classifiers;
2. instance ranking based on the Support Vector Representation Machine.
The basic idea is that if a classifier can be trained to associate an attribute class to a feature vector with good performance, then:

• the features contain sufficient information for predicting the attributes;
• the structure of the classifier obtained can be exploited to characterize the feature vectors associated with the attribute.
In the following, we describe the steps 1 and 2 in more detail.

Designing and training of classifiers
By classifier we mean a function from the feature space to a finite set of classes.
The function (called the decision function) maps a feature vector to an integer (defining the class):

f : R^n → N,

where n is the number of features. We only consider binary attributes, thus the classifiers take the form:

f : R^n → {0, 1}.

The classes 0 and 1 respectively represent the two possible values of a given attribute.
In cases where the analysis is aggregated, we still use two classes: in such cases, class 0 indicates that all the attributes take on zero value and class 1 that at least one attribute is non-zero. For simplicity, in the following we will say that "the attribute is present" rather than "the attribute takes on positive value", and similarly we will say that "the attribute is absent" in place of "the attribute takes on zero value".
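The labelling convention above can be sketched as follows; the attribute matrix is invented for illustration, and the real attribute codes and occurrences are those of Table 2.

```python
import numpy as np

# Hypothetical attribute matrix: one row per sample, one column per
# attribute code (A01..A06); a positive entry means the attribute occurred.
attributes = np.array([
    [0, 0, 0, 0, 0, 0],   # all attributes absent  -> class 0
    [1, 0, 0, 0, 0, 0],   # A01 present            -> class 1
    [0, 2, 1, 0, 0, 0],   # A02 and A03 present    -> class 1
])

# Per-attribute label: presence/absence of a single column (here A01).
y_a01 = (attributes[:, 0] > 0).astype(int)

# Aggregate label: class 1 if at least one attribute is non-zero.
y_any = (attributes.sum(axis=1) > 0).astype(int)

print(y_a01.tolist(), y_any.tolist())
```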

Remark
We consider a supervised learning setting, where a training set (TS) is assigned namely a set of feature vectors x i , each corresponding to a label y i containing the attribute class. The training is the process by which, based on the training set, the decision function is chosen.

Classifier type and implementation
Among the possible classifiers, it was decided to employ the Support Vector Machine (SVM) [21], due to its well-known capability of dealing with non-linearly separable patterns while limiting overfitting. Moreover, SVMs can be trained (in deterministic polynomial time) by solving a convex quadratic program. The SVM classifiers implement a decision function of the form

f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b ),

where the coefficients α_i and b are found during training and K(x, y) is a kernel function acting as a generalized dot product. The kernel is a design parameter and its choice affects the shape of the resulting decision surface. In the case in which K(x, y) is precisely the scalar product ⟨x, y⟩ between x and y, the classifier generates a linear decision surface (i.e. it partitions the feature space into two half-spaces delimited by a hyperplane). For general K(x, y), the separation surface can "bend" and adapt to data that are not linearly separable. Preliminary experiments were conducted with a linear SVM, yielding poor performance in terms of accuracy (i.e. the ratio between correctly classified instances and the total evaluated instances). Thus, in the subsequent experiments a non-linear kernel has been used. Precisely, a Radial Basis Function (RBF) kernel has been used:

K(x, y) = exp(−γ ‖x − y‖²),    (1)

since it has proven effective in a large number of applications [21,22,23]. As is well known, using a kernel amounts to implicitly defining a map Φ between the feature space and a space (usually of larger dimension) in which a separating hyperplane is sought.
All trainings were performed in the Matlab environment using the SVM implementation provided by the external tool LIBSVM [24], which is a free and well-established implementation. However, any SVM implementation providing binary classifiers with RBF kernels could be used. As a preprocessing step, the features have been standardized, so that each feature exhibits zero mean and unit variance.
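The study's experiments were run in Matlab with LIBSVM; as a rough illustration of an equivalent setup (standardization followed by an RBF-kernel SVM), the following Python sketch uses scikit-learn on synthetic data. All data, hyperparameter values, and names are assumptions, not the EMA configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature vectors and binary labels;
# the boundary is non-linear, so a linear SVM would perform poorly.
X = rng.normal(size=(200, 5))
y = (np.linalg.norm(X, axis=1) > 2.2).astype(int)

# Standardization (zero mean, unit variance) followed by an RBF SVM,
# mirroring the preprocessing and classifier choice of the paper.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=0.2, C=10.0))
clf.fit(X, y)
acc = clf.score(X, y)  # training-set accuracy
```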

Selection of hyperparameters
The training of an SVM classifier consists in solving an optimization problem (precisely, a convex quadratic programming problem). In the case of the RBF kernel, such a problem, in addition to the training set, depends on two hyperparameters: the value γ appearing in (1), which defines the kernel, and a regularization parameter C, attributing a cost to instances of the training set that are incorrectly classified. In detail, for each SVM classifier the choice of hyperparameters was initially carried out by means of 5-fold cross-validation [25,26], via a grid search within a range of reasonable values. During this phase, the whole set of available data has been used, taking into account the data imbalance, as described in Sect. 2.5. The best hyperparameters were those corresponding to the minimum value of the 5-fold cross-validation error. As a second step, each SVM was trained again on the whole set of data, using the best hyperparameters of the previous phase (see [26], Sect. 7.10, for a detailed analysis of this training approach).
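As a hedged illustration of this two-step procedure (grid search with 5-fold cross-validation, then retraining on the whole data set with the best hyperparameters), the following Python sketch uses scikit-learn's GridSearchCV on synthetic data. The grid values and the asymmetric class weights (penalizing false negatives more heavily, in the spirit of Sect. 2.5) are arbitrary choices, not those of the study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # non-linearly separable toy labels

# Grid search over (C, gamma) with 5-fold cross-validation; the heavier
# weight on class 1 makes false negatives costlier than false positives.
grid = GridSearchCV(
    SVC(kernel="rbf", class_weight={0: 1.0, 1: 2.0}),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)

# Second step: refit on the whole data set with the best hyperparameters
# (GridSearchCV does this automatically when refit=True, the default).
best = grid.best_params_
print(best)
```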

Unbalanced training set
Whether the three attributes are considered individually or aggregated, the training set is unbalanced, i.e. it presents many more instances of one class (precisely, class 0) than of the other. Among the various techniques available to cope with this problem (which may introduce a bias toward the most represented class), we adopted the SVM-WEIGHT method for its high effectiveness (see [27] for an overview of different approaches to class imbalance and for an analysis of SVM-WEIGHT performance). In [27] the authors suggest the SVM-WEIGHT method as the first choice, especially when the cardinality of the available data set is about a few thousands. In particular, we decided to employ a higher penalty value for false negatives than for false positives. By using such an imbalance in the penalty values, we were looking for an SVM with the lowest number (possibly none) of false negatives.

Instance ranking based on Support Vector Representation Machine

The idea is to consider all those instances correctly classified and to select a subset of these lying away from the decision boundary (in other words, instances that are easy to classify correctly). These instances can then be analyzed by the production specialists. The Support Vector Representation Machine (SVRM) [14] allows us to perform this selection. First we state two properties of the RBF kernel:

(P1) K(x, x) = 1 for every x, hence ‖Φ(x)‖ = 1 for every feature vector x;
(P2) K(x, y) > 0 for every pair x, y, hence ⟨Φ(x), Φ(y)⟩ > 0.
The two properties guarantee that all the images of the input feature vectors under the map Φ lie on the surface of the unit sphere of the extended feature space, and precisely in the first orthant, as shown in Fig. 2. The SVRM is a particular type of domain representation that uses the RBF kernel. The basic idea is to locate the smallest arc (or, more generally, the smallest solid angle) that contains all of the images of a given set A of feature points (represented in orange in Fig. 3). This arc can be found as follows:

The direction of the axis of the solid angle can be written as w = Σ_i α_i Φ(x_i), where the vector α = [α_1, α_2, . . .] is the solution of a quadratic programming problem, namely:

max_α  min_j Σ_i α_i K(x_i, x_j)    s.t.  Σ_i α_i = 1,  α_i ≥ 0,

which is equivalent to the following:

min_α  Σ_i Σ_j α_i α_j K(x_i, x_j)    s.t.  Σ_i α_i = 1,  α_i ≥ 0.

The SVRM can be applied to the case at hand in the following manner:

1. train an SVM to discriminate between the products with or without a given attribute;
2. apply the SVRM to the correctly classified instances without the attribute (the true negatives);
3. rank those instances according to the proximity of their images to the axis found by the SVRM.

In Table 7 we report the first, tenth and fiftieth percentile of the instances ordered in the proposed way, using the "true negatives" obtained when discriminating between the presence or absence of any attribute (see the confusion matrix reported in Table 6).
The feature values for these instances provide guidance for determining the set point of the various production parameters.
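The quadratic program and the resulting ranking can be prototyped numerically. The following Python sketch solves a hard-margin one-class formulation (minimize α'Kα subject to Σα = 1, α ≥ 0, which under property P1 coincides with the smallest enclosing description) with SciPy, then ranks a synthetic set of instances by the centrality index Σ_i α_i K(x_i, x). The data, the γ value, and the solver choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(X, Y, gamma):
    # Pairwise RBF kernel K(x, y) = exp(-gamma * ||x - y||^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
A = rng.normal(size=(30, 3))   # stand-in for the "true negative" instances
gamma = 0.5
K = rbf(A, A, gamma)

# One-class dual in quadratic form: min a' K a  s.t.  sum(a) = 1, a >= 0.
n = len(A)
res = minimize(
    lambda a: a @ K @ a,
    x0=np.full(n, 1.0 / n),
    jac=lambda a: 2 * K @ a,
    bounds=[(0, None)] * n,
    constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0},
    method="SLSQP",
)
alpha = res.x

# Centrality index of each instance x: sum_i alpha_i K(x_i, x);
# larger values mean the image of x lies closer to the axis direction.
scores = K @ alpha
ranking = np.argsort(-scores)  # most central instances first
```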

Designing and training of SVM classifiers
As already mentioned, the attributes under analysis were A01, A02, A03. A total of 4 classifiers have been trained: • one to discriminate between the presence or absence of A01; • one to discriminate between the presence or absence of A02; • one to discriminate between the presence or absence of A03; • one to discriminate between the presence or absence of any attribute.
Each training was preceded by a selection of the γ and C hyperparameters based on 5-fold cross-validation. Tables 3-6 show the parameters and performances of the classifiers, evaluated on the training set (recall that the final aim of the work is to obtain a Support Vector Representation, that is, a domain description; therefore it is not necessary to evaluate the performance on a set different from the training set).

Instance ranking via Support Vector Representation Machine
The confusion matrices reported in Tables 3-6 (where rows correspond to actual classes and columns to predicted classes) are upper triangular: for each classifier, a subset of the instances without the attribute is separated without error from all the remaining instances. The situation is sketched in Fig. 6, in which the solid circles represent non-attribute instances, the empty ones represent instances with attribute, and the segment represents the separation surface. It is therefore reasonable to adopt the Support Vector Representation Machine to characterize the non-attribute instances that are correctly classified, that is, to characterize the set highlighted in Fig. 7. In particular, the SVRM method was used to derive the first percentile of the most "central" instances of this set (in the sense already explained in Section 2.6). Precisely, for each instance x without attribute and correctly classified as such, the following index was calculated:

d(x) = Σ_i α_i K(x_i, x),    (2)

and the instances were sorted according to it, in descending order. In this way it was possible to obtain the first percentile (more precisely, the instances in the top 1%).
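The descending sort and top-1% extraction can be sketched as follows; the scores are random stand-ins for the computed index values.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical centrality scores for 1189 correctly classified
# "no attribute" instances (stand-ins for the index of the ranking).
scores = rng.random(1189)

# Sort in descending order and keep the top 1% most central instances.
order = np.argsort(-scores)
top = order[: max(1, len(scores) // 100)]
print(len(top))  # size of the first percentile
```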
By way of example, a comparison is made between the average values of the features for the first percentile (relative to the attributes A01, A02, A03) and the average values for the instances with attribute. The comparison indicates differences that do not seem negligible for some features (boldface in Tables 10-13). The most appropriate way to derive operational indications from these results (which are, basically, an instance ranking) is not fully clear at the moment, because a thorough knowledge of the process is essential. Reasonably, the production parameters should be set as close as possible to the values they take for the top ranked instances. Bearing in mind that the parameter setting for each instance should be considered as a whole, some insights for individual parameters can be drawn. For instance, Table 11 suggests that a low Tx2 temperature is likely to reduce the occurrence of attribute A02, resulting in a "safety interval". A "safety region" could be constructed when more than a single parameter is considered: this may be done, for instance, by taking the convex hull of the parameter settings of the top ranked instances. However, further investigation is necessary, involving the production specialists, especially for dealing with possibly conflicting suggestions corresponding to different parameters/attributes.
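The "safety region" idea mentioned above (convex hull of the top ranked parameter settings, with membership checks for candidate set-points) could be prototyped as follows; the eleven two-dimensional parameter settings are synthetic stand-ins.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

rng = np.random.default_rng(4)
# Hypothetical parameter settings (e.g. two temperatures) of the
# top ranked "no attribute" instances.
top_params = rng.normal(size=(11, 2))

# "Safety region" as the convex hull of the top ranked settings; a
# candidate set-point can be tested for membership via a Delaunay
# triangulation of the hull vertices.
hull = ConvexHull(top_params)
tri = Delaunay(top_params[hull.vertices])

candidate = top_params.mean(axis=0)   # the centroid is always inside
inside = tri.find_simplex(candidate) >= 0
print(bool(inside))
```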

Significance to EMA of SVRM results
We explicitly notice that the EMA product attributes are undesirable characteristics (defects); thus the ultimate goal of our analysis was to suggest production parameters that may relate to them.
The discussion with EMA experts about SVRM results brought additional insights.
From Table 2 it is clear that attributes A01 and A02 have a greater weight (i.e. they are more frequent) than the others. Data in Tables 10 and 11 show that these attributes relate to the independent variables Tx2 and Tfurn, whose measures refer to furnace process parameters. EMA staff had previously tried to relate these attributes to furnace parameters using standard multivariate statistical techniques (such as those reported in [28]); however, they did not obtain any relevant correlation.
Moreover, SVRM results lead to some suggestions that the EMA staff could exploit for practical applications. According to Tables 10-11, the variable Tx2 is significant for the two attributes. Specifically, the best ranked negative instances have quite different mean temperatures. However, Table 11 shows that non-defective instances have a moderate value of Tx2, suggesting that controlling that temperature could provide some benefits. EMA process experts consider such a control feasible and implementable.
Its potential benefits are remarkable: a decrease in the occurrences of A01 and A02 would lead to a reduction of scrap costs. Referring to the same production period over which data were analyzed, scrap costs related to non-conforming products could be reduced by up to 79%, which translates into savings in production costs of 4% related to defect A01 and 7% for A02. EMA staff could either decide to carry out more specific analyses, as suggested in Section 3.2, or to start experiments on the production process to validate this study.

Comparison with alternative approaches
From a practical standpoint, the proposed methodology aims at inferring, from historical data, suitable values of production parameters in order to avoid some undesirable attributes. Thus, in principle, any technique aiming at inferring rules from data can be considered as an alternative. Decision trees are recognized as "simple" and "useful for interpretation" [17] and thus seem an adequate approach for the problem at hand; in that case, indeed, interpreting the structure of the classifier is instrumental to obtaining the set-point values of the production parameters. As a first comparison, in Table 8 we report the performance, in terms of AUC score, of the SVM classifiers and of decision tree classifiers. The latter have been trained to detect separately the presence or absence of the attributes A01, A02, A03, based on the same features as the SVMs.
We have used the porting of the CART algorithm [29] into the Matlab framework (namely, the fitctree function), adopting a 10-fold cross-validation based optimization of the decision tree parameters. Moreover, with the aim of taking into account the class imbalance, we set a proper misclassification cost matrix for the decision tree to be trained, employing a weighting approach similar to that applied in the SVM cases (misclassification cost inversely proportional to the number of elements belonging to a particular class). Notice, by comparing AUC scores, that the performance of the decision trees is similar to that of the SVMs, although slightly worse. We do not report the whole set of confusion matrices of the decision tree classifiers, but it is worth noting that, as for the SVMs, the confusion matrices are upper triangular, confirming that a subset of the instances not carrying the considered attribute is separated from all the remaining instances by the decision surface.
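For readers without Matlab, an analogous class-weighted comparison can be sketched with scikit-learn; this is not the paper's implementation (which used fitctree and LIBSVM), and the data and weighting below are invented for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2.0).astype(int)  # imbalanced toy labels

# Class-weighted SVM and decision tree (misclassification cost inversely
# proportional to class frequency), compared via AUC on the training set.
svm = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
tree = DecisionTreeClassifier(class_weight="balanced", random_state=0).fit(X, y)

auc_svm = roc_auc_score(y, svm.decision_function(X))
auc_tree = roc_auc_score(y, tree.predict_proba(X)[:, 1])

# An unconstrained tree fits the training data exactly, but may grow
# large and hard to read, as observed in the paper.
n_nodes = tree.tree_.node_count
```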

To discuss the possibility of exploiting the decision tree for inferring parameter values, we take, as an example, the attribute A02. Using the training procedure described above, we obtained a decision tree whose confusion matrix is reported in Table 9. Notice, by comparing to the confusion matrix of Table 4, that the performance of the decision tree is close to that of the SVM; moreover, the confusion matrix is upper triangular, as in the SVM case. In other words, the situation is again the one represented in Fig. 6. As for the interpretation of the obtained classifier, a graphical representation of the decision tree is provided in the supplementary material; the tree has 24 levels, 187 nodes and 94 leaves, thus suitable ranges of production parameters cannot be inferred by simply inspecting the decision tree (or at least this would require considerable time). Moreover, at the training step much more complex trees could be obtained. Thus, the decision tree alone cannot be considered a systematic approach for this kind of problem, because there is no guarantee that the resulting tree will be human-readable. As a consequence, general methods based on decision trees must encompass some further step of analysis of the decision tree obtained in the first place. A first possibility is to follow [30], i.e.
formulating an inverse classification problem where the features of an incomplete set are sought in such a way as to result in a desired classification outcome. In [30], the inverse classification problem is cast as the relaxation of a combinatorial problem and requires the discretization of the feature domain and the training of another decision tree. We do not go any further in this direction here, but we stress that the approach requires a considerable implementation effort. Another possibility could be performing an instance ranking to find the "most central" instances, i.e. the same approach we propose here by means of SVRM. In the case of decision trees, however, an appropriate ranking criterion is far from clear, since there is no notion of "distance from the decision surface". Intuition could suggest, as a ranking criterion, the number of decision nodes to be evaluated for classifying a given instance: the more the nodes, the more difficult the instance is to classify and therefore the closer it is to the decision boundary. However, this heuristic does not seem solid, since the number of nodes can vary, for a given instance, depending on some arbitrary and equivalent choices in the tree construction (different trees may lead to the same classifier). On the contrary, using the approach proposed in this paper, it is only necessary to compute, for each instance, the value in (2), which is an elementary computation based on the kernel and the support vectors obtained as output of the training phase.

Conclusion
We have presented an analysis, based on the Support Vector Representation Machine approach, which aims to characterize a set of instances belonging to a class of interest.
That characterization is expressed in terms of an inequality that the instances must satisfy. The instances are ordered based on their distance from failing to satisfy the inequality, and some statistics are then computed on the percentiles thus obtained (the first percentile being the one farthest from the boundary of the considered set). The proposed methodology can be applied when a classifier exhibiting zero false negatives can be trained on the given data, meaning that a subset of instances that do not possess a given attribute can be discriminated with no error. The underlying idea is that those instances that are easily classified as not possessing an attribute may suggest a production parameter setting likely not to produce that attribute. The study reported here is intended as preliminary with respect to an analysis involving experts in the production process, aimed at deriving operational indications (in terms of set-points of certain process parameters) to increase the presence of the attribute when it is desirable, or to decrease it in the opposite case. Such a further analysis is a matter of future work.

Table 11: Mean value of each feature when only the top 1% of "no attribute A02" data is considered, versus the mean value of the features over the whole set of no-A02 labelled data (i.e. the true negatives in Table 4).

Table 12: Mean value of each feature when only the top 1% of "no attribute A03" data is considered, versus the mean value of the features over the whole set of no-A03 labelled data (i.e. the true negatives in Table 5).

Table 13: Mean value of each feature when only the top 1% of all data corresponding to "no attribute" is considered, versus the mean value of the features over the whole set of production data correctly classified as "without any attribute" (i.e. the true negatives in Table 6).