Journal: International Journal of Database Management Systems
Print ISSN: 0975-5985
Online ISSN: 0975-5705
Year: 2011
Volume: 3
Issue: 1
DOI: 10.5121/ijdms.2011.3106
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Frequent itemset discovery algorithms have been used to solve various problems involving large volumes of data. As data mining techniques are increasingly applied to non-traditional itemsets, existing approaches for finding frequent itemsets have become outdated because they cannot satisfy the requirements of these domains. An alternative is to model the objects in such data sets as graphs, which allows arbitrary relations among entities to be represented. Within that model, the problem of finding frequent patterns becomes that of finding subgraphs that occur frequently over the entire set of graphs. In this paper, we present an efficient algorithm for ranking such frequent subgraphs. The proposed ranking method is applied to the FP-growth method for discovering frequent subgraphs. To rank the subgraphs, we present a new normalization technique, a modification of the normalization applied at each position for a chosen value of the Discounted Cumulative Gain (DCG) of a subgraph. In place of DCG, we introduce a modified measure called Modified Discounted Cumulative Gain (MDCG), which is calculated using "lift". MDCG alone cannot be used to compare performance from one query to the next in a search engine's algorithm. To obtain the new normalization, the ideal ordering of MDCG (IMDCG) at each position must therefore also be found, and the normalized values are then computed. Finally, the values for all rules can be averaged to obtain the average performance of a ranking algorithm. The ordering of the values obtained at each position gives the order of evaluation of the rules, which in turn yields an efficient ranking of the mined subgraphs.
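The abstract does not give the exact MDCG formula, so the following is only a minimal sketch of the general idea under stated assumptions: lift is used as the gain of each rule, the discount is the common log2(i + 1) form of standard DCG, and the normalized value at each position is the ratio of the actual cumulative gain to that of the ideal (descending) ordering. The function names and the sample supports are hypothetical and are not taken from the paper.

```python
import math


def lift(support_ab, support_a, support_b):
    """Lift of a rule A -> B, computed from relative supports in (0, 1]."""
    return support_ab / (support_a * support_b)


def cumulative_gain(gains):
    """Discounted cumulative gain at every position 1..n.

    Assumption: the standard log2(i + 1) discount stands in for the
    paper's modified discount, which the abstract does not specify.
    """
    total = 0.0
    values = []
    for position, gain in enumerate(gains, start=1):
        total += gain / math.log2(position + 1)
        values.append(total)
    return values


def normalized_gain(gains):
    """Ratio of actual to ideal cumulative gain at every position."""
    actual = cumulative_gain(gains)
    ideal = cumulative_gain(sorted(gains, reverse=True))
    return [a / b if b > 0 else 0.0 for a, b in zip(actual, ideal)]


def average_performance(rankings):
    """Mean of the final-position normalized gain over several rankings."""
    finals = [normalized_gain(gains)[-1] for gains in rankings]
    return sum(finals) / len(finals)


# Hypothetical rules, listed in the order a ranking algorithm produced them.
# Each tuple is (support of A and B together, support of A, support of B).
rules = [(0.20, 0.40, 0.50), (0.30, 0.50, 0.40), (0.10, 0.20, 0.80)]
gains = [lift(*rule) for rule in rules]
per_position = normalized_gain(gains)
```

Here `per_position` holds one normalized value per rank position. A value of 1.0 at a position means the ranking matches the ideal ordering up to that point, and `average_performance` averages the final-position values over several such rankings.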