CLUSTERING FOR ITEM DELIVERY USING RULE-K-MEANS

In this paper, we introduce an alternative approach to cluster analysis. The data are analyzed with the rule-k-means algorithm, which combines the k-means algorithm with rules. As an application, we use simulated item delivery data to classify items based on their destination addresses. The goal is to map each item to a type of delivery vehicle. The clustering can serve as a recommendation for item delivery service companies.


INTRODUCTION
At present, online business strongly supports the economy. The item delivery service industry occupies a central position in the economy of modern society and drives business over both long and short distances. This supports an increase in prosperity, especially in developed countries. In developing countries, the item delivery service industry is important for expanding the foundation of development and meeting the growing needs of the community.
The development of online business, supported by the availability of item delivery services, is well suited to Indonesia, a developing country that consists of a vast archipelago. Both the development of online business and the item delivery service industry have a positive impact on human life: they can reduce the unemployment rate and attract investors to Indonesia.
Many young people and adults are starting online businesses, and this is accompanied by the emergence of item delivery service companies that are easily accessible in various regions. For item delivery companies in particular, delivery must be carried out as effectively as possible so that the company can generate large profits. On this basis, the company can determine how many tools and which types of vehicles must be purchased, and whether public transportation must be used when needed, so that online business people get good service.
In item delivery, each item carries a lot of information. For statistical analysis, the criteria of an item can be expressed as variables. The more criteria an item has, the more complicated the statistical analysis becomes. Multivariate analysis is a statistical method suitable for summarizing data with many variables.
One multivariate method that can be used to understand and simplify data interpretation is cluster analysis. Cluster analysis aims to group objects based on their characteristics, so that the characteristics of each group can be identified. For item delivery services, items can be grouped by cluster analysis on features such as weight, volume, accessibility of the destination address, and others.
Cluster analysis, hereafter called clustering, is a method for finding and grouping data that have similar characteristics. Clustering is also one of the unsupervised data mining methods: it is applied without training or guidance and does not require an output target. Data mining is a method of data processing that finds hidden patterns in data, so that the results can be used to make decisions.
The k-means method has been widely applied in various fields, such as education (Trivedi et al. [1]), general election parties (Ralambondrainy [2]), credit approval and soybean disease (Huang [3]), heart disease and credit cards (Huang et al. [4]), color quantization (Celebi [5]; Dhanachandra [6]), DNA microarrays (Sahu et al. [7]), etc. It is also suitable for use in item delivery services. The cluster analysis used here for item delivery activities is a non-hierarchical method. In such methods, there are many ways to reallocate data to the clusters during the iterative clustering process. One of them is strict allocation, where each data item is explicitly a member of one cluster and not a member of any other cluster. This type of method is called k-means.
Reallocation of data to the clusters in the k-means algorithm is based on comparing the distance between each data point and the centroid of each cluster. Each data point is allocated explicitly to the cluster whose centroid is closest to it (Everitt et al. [8]; Oliveira and Pedrycz [9]; Sugiyama [10]; Anderberg [11]; MacQueen [12]). In this paper, we present an extension of the k-means algorithm. The data are analyzed with the rule-k-means algorithm, which combines k-means with rules to find new clusters.
Given this description of the item delivery industry and of cluster analysis, it is expected that item delivery can be made more effective. The goal is to map every item to be sent to a type of item delivery vehicle, so that company owners can compete with others.
The method is presented in section 2, after the introduction. The processing of the simulation data with rule-k-means is presented in section 3. The summary and future work of this paper are presented in section 4.

THE RULE-K-MEANS METHOD
The data used are item delivery simulation data that contain, for each item, information on the weight, volume, and type of road. The items must be delivered on that day. The data consist of 3 features, of which 2 are numerical and 1 is categorical. The categorical feature needs to be quantified; here we use the weighting approach. The data are then standardized.
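The preprocessing described above can be sketched as follows. This is a minimal illustration only: the road categories, their weights, and the toy records are assumptions introduced for the example and are not taken from the simulation data, and the population standard deviation is assumed for the Z score.

```python
from statistics import mean, pstdev

# Hypothetical weights for the categorical "type of road" feature.
# The category names and the weight values are illustrative assumptions.
ROAD_WEIGHTS = {"alley": 1.0, "street": 2.0, "highway": 3.0}


def quantify_road(road_types):
    """Replace each road category with its numerical weight."""
    return [ROAD_WEIGHTS[r] for r in road_types]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy records (weight, volume, type of road); the values are made up.
records = [
    (1.0, 2.0, "alley"),
    (2.0, 4.0, "street"),
    (3.0, 6.0, "highway"),
]

weights = z_score([r[0] for r in records])
volumes = z_score([r[1] for r in records])
roads = z_score(quantify_road([r[2] for r in records]))
standardized = list(zip(weights, volumes, roads))
```

After this step every feature has mean 0 and standard deviation 1, so no single feature dominates the distance computation because of its unit.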
Step 1. Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of $n$ objects, where each object $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,m})$ is characterized by a set of $m$ features. The k-means type algorithms (Anderberg [11]; MacQueen [12]; Bezdek [14]) search for a partition of $X$ into $k$ clusters that minimizes the objective function $J$ with unknown variables $U$ and $C$ as follows:

$$J(U, C) = \sum_{l=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{i,l} \, d(x_{i,j}, c_{l,j})$$

subject to

$$\sum_{l=1}^{k} u_{i,l} = 1, \qquad u_{i,l} \in \{0, 1\}, \qquad 1 \le i \le n,$$

where $U$ is an $n \times k$ partition matrix whose entries $u_{i,l}$ are 0 or 1, and $u_{i,l} = 1$ indicates that object $i$ is allocated to cluster $l$; $C = \{C_1, C_2, \ldots, C_k\}$ is a set of $k$ vectors representing the centroids of the $k$ clusters; and $d(x_{i,j}, c_{l,j})$ is the distance between object $i$ and the centroid of cluster $l$ on the $j$th feature. The distance is Euclidean. If the feature is numerical, then

$$d(x_{i,j}, c_{l,j}) = (x_{i,j} - c_{l,j})^2.$$

If the feature is categorical, then its category $y$ is first quantified by its weight $w$, and the distance is computed on the weighted value in the same way.
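The above optimization problem can be solved by iteratively solving the following two minimization problems:
1. Fix $C = \hat{C}$ and solve the reduced problem $J(U, \hat{C})$. This problem, $J_1$, is solved by

$$u_{i,l} = \begin{cases} 1, & \text{if } \sum_{j=1}^{m} d(x_{i,j}, \hat{c}_{l,j}) \le \sum_{j=1}^{m} d(x_{i,j}, \hat{c}_{t,j}) \text{ for } 1 \le t \le k, \\ 0, & \text{otherwise.} \end{cases}$$

2. Fix $U = \hat{U}$ and solve the reduced problem $J(\hat{U}, C)$. This problem, $J_2$, is solved by

$$c_{l,j} = \frac{\sum_{i=1}^{n} \hat{u}_{i,l} \, x_{i,j}}{\sum_{i=1}^{n} \hat{u}_{i,l}}, \qquad 1 \le l \le k, \quad 1 \le j \le m.$$

Step 2. After the clusters are obtained in Step 1, new clusters are formed based on the rules. Let $K = \{K_1, K_2, \ldots, K_k\}$ be the set of $k$ clusters from Step 1, where each cluster $K_i$ is a set of objects from $X$. The new cluster $G_i$, also a set of objects from $X$, is obtained by applying the rules in Table 1. The result of Step 2 is the final set of clusters.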
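The two alternating minimization problems of Step 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used for the experiments: the squared Euclidean distance is assumed, the initial centroids are passed in explicitly instead of being drawn at random, and an empty cluster simply keeps its previous centroid.

```python
def assign(X, C):
    """Fix the centroids C and allocate each object to its nearest centroid."""
    labels = []
    for x in X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in C]
        labels.append(dists.index(min(dists)))
    return labels


def update(X, labels, C):
    """Fix the allocation and move each centroid to the mean of its members."""
    new_C = []
    for l, old in enumerate(C):
        members = [x for x, lab in zip(X, labels) if lab == l]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_C.append(old)
            continue
        new_C.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_C


def k_means(X, C, max_iter=100):
    """Alternate the two steps until no object changes cluster."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(X, C)
        if new_labels == labels:
            break
        labels = new_labels
        C = update(X, labels, C)
    return labels, C


# Toy data: two well-separated groups of two objects each.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
labels, centroids = k_means(X, C=[(0.0, 0.0), (10.0, 10.0)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 1.0), (10.0, 11.0)]
```

The loop stops when an assignment step leaves every label unchanged, which matches the stopping condition of no object moving from one cluster to another.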

RESULT AND DISCUSSION
The simulation data contain information on items to be sent to specific destination addresses. The data consist of 102 records and are presented in Table 2. Each item is first clustered based on its features; the items are then grouped by type of item delivery vehicle based on the rules.
This data set contains 2 numerical features and 1 categorical feature. The features are standardized by Z score. The results of processing the data with rule-k-means are as follows: the features are clustered into 3 clusters; the initial centroids of the clusters are chosen randomly and are shown in Table 3; the clustering results for each variable are shown in Table 4 and Table 5; and the centroids of the clusters are shown in Table 6.
For the clusters in weight and volume, cluster 1 contains the heaviest items, cluster 2 contains items with a weight and volume smaller than those in cluster 1, and cluster 3 contains items with a weight and volume smaller than those in cluster 2. Of the 102 records, cluster 1 consists of 8 objects, cluster 2 consists of 14 objects, and cluster 3 consists of 80 objects.
For the clusters in type of road, cluster 2 contains the road types with the largest weight, while clusters 3 and 1 contain road types with a weight smaller than that of cluster 2. Of the 102 records, cluster 1 consists of 73 objects, cluster 2 consists of 7 objects, and cluster 3 consists of 22 objects.
In this case, the clustering takes 2 iterations for weight and volume and 2 iterations for type of road. After these iterations, the centroid of each cluster no longer changes and no data move from one cluster to another.
The labels for the types of vehicle are shown in Table 7. From the results of the clustering, the type of item delivery vehicle is decided with the rules shown in Table 8. In this case, we have 9 rules for mapping item delivery, one for each combination of the 3 clusters in weight and volume with the 3 clusters in type of road. For example, if an item is in the first cluster in weight and volume and in the second cluster in type of road, then it is delivered with a large box car.
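The rule step can be sketched as a lookup from a pair of cluster labels to a vehicle type. In the sketch below, only the entry for the first cluster in weight and volume combined with the second cluster in type of road is taken from the example in the text; the other eight entries are hypothetical placeholders and do not reproduce Table 8.

```python
# Vehicle for each (weight-and-volume cluster, type-of-road cluster) pair.
# Only (1, 2) -> "large box car" comes from the example in the text.
# All other entries are hypothetical placeholders, NOT the rules of Table 8.
RULES = {
    (1, 1): "large box car",   # placeholder
    (1, 2): "large box car",   # from the example in the text
    (1, 3): "large box car",   # placeholder
    (2, 1): "motorcycle",      # placeholder
    (2, 2): "small box car",   # placeholder
    (2, 3): "small box car",   # placeholder
    (3, 1): "motorcycle",      # placeholder
    (3, 2): "small box car",   # placeholder
    (3, 3): "small box car",   # placeholder
}


def vehicle_for(wv_cluster, road_cluster, rules=RULES):
    """Return the vehicle type for one item from its two cluster labels."""
    return rules[(wv_cluster, road_cluster)]


def assign_vehicles(items, rules=RULES):
    """Map (weight-and-volume cluster, road cluster) pairs to vehicle types."""
    return [vehicle_for(wv, road, rules) for wv, road in items]


print(vehicle_for(1, 2))                   # large box car
print(assign_vehicles([(1, 2), (3, 1)]))   # ['large box car', 'motorcycle']
```

Keeping the rules in a table separate from the clustering code means the mapping can be changed, for example when the vehicle fleet changes, without rerunning k-means.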
Based on these rules, the decisions are given in Table 9. In the final decision, the large box car delivers 8 objects, the small box car delivers 21 objects, and the motorcycle delivers 73 objects.
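As a quick consistency check, the reported counts can be verified to cover all 102 records in each grouping. The counts below are the ones reported in this section; the variable and function names are illustrative.

```python
TOTAL_RECORDS = 102

# Counts as reported in this section.
weight_volume_clusters = {1: 8, 2: 14, 3: 80}
road_clusters = {1: 73, 2: 7, 3: 22}
vehicles = {"large box car": 8, "small box car": 21, "motorcycle": 73}


def covers_all(counts, total=TOTAL_RECORDS):
    """True if the group sizes add up to the total number of records."""
    return sum(counts.values()) == total


for name, counts in [
    ("weight and volume", weight_volume_clusters),
    ("type of road", road_clusters),
    ("vehicle", vehicles),
]:
    print(name, covers_all(counts))  # each line prints True
```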

SUMMARY AND FUTURE WORK
Quantifying categorical data through weighting yields better cluster characteristics than using the rank approach or converting the data to binary. Combining k-means with rules (rule-k-means) to obtain new clusters makes it easier to group objects according to the desired characteristics or objectives. In practice, item delivery service companies can first cluster the items and then use the rule-k-means algorithm to determine the types of vehicles to be used. In future work, we will combine the rule-k-means algorithm with the traveling salesman problem to find effective routes.