In this chapter, read Sections 5.1, 5.2.1, and 5.2.2 to learn the ideas and procedures for mining valid association rules. This material matches what Professor Chen introduced to you in class.

Note that you do not need to pay much attention to the algorithms or code for this method. The ideas and the related examples matter more for understanding the method, and they are enough to help you complete the assignment.

Furthermore, to solve problem 2(c) in EXERCISE 3, you need to read Section 5.3.1. That section introduces the concept of multi-level (or generalized) association rules.

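Basic reading: Sections 5.1, 5.2.1, and 5.2.2 of the English material. This content matches what the instructor presented in class. There is no need to focus too much on the algorithms and code; it is more important to understand the meaning of the method, its process, and the related examples. Extended reading: to solve part (c) of assignment problem 2, you should also read Section 5.3.1.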

Mining Frequent Patterns, Associations, and Correlations
Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear in a data set frequently. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera, and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential pattern. A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently, it is called a (frequent) structured pattern. Finding such frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it helps in data classification, clustering, and other data mining tasks as well. Thus, frequent pattern mining has become an important data mining task and a focused…...
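To make the notion of a frequent itemset concrete, here is a minimal sketch in Python. It counts itemsets by brute-force enumeration rather than with Apriori or any algorithm from the chapter, and the transactions, the function name `frequent_itemsets`, and the 0.6 support threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; not taken from the source text.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with at most max_size items.

    Support is the fraction of transactions that contain the itemset.
    This is plain enumeration, so it only suits very small examples.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


result = frequent_itemsets(transactions, min_support=0.6)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

With this toy data, milk and bread appear together in three of the five transactions, so {milk, bread} meets the 0.6 threshold, while {bread, eggs} (two of five) does not.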

Data Mining

Predictive analytics is used to automatically analyze large amounts of data with many variables; it includes clustering, decision trees, market basket analysis, regression modeling, etc. There are three main benefits of predictive analytics: minimizing risk, identifying fraud, and pursuing new sources of revenue. Examples of these benefits include predicting the risks involved in loan and credit origination, detecting fraudulent insurance claims, and predicting responses to promotional offers and coupons. This type of analysis allows businesses to test all sorts of situations and scenarios that could take years to test in the real world. Investing in learning customer behavior gives businesses a competitive edge over rivals in their marketplace.

Quantitative Association Rule Mining Using Information-Theoretic Approach

Quantitative Association Rule (QAR) mining has been recognized as an influential research problem due to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike Boolean Association Rules (BARs), which only consider boolean attributes, QARs involve quantitative attributes, which contain much richer information than boolean attributes. Developing a data mining system for a huge database composed of numerical and categorical attributes requires a process for deciding a valid quantization of the numerical attributes. One of the main problems is obtaining interesting rules from continuous numeric attributes. In this paper, the Mutual Information between the attributes in a quantitative database is described, and a normalization of the Mutual Information is devised to make it applicable in the context of QAR mining. The approach deals with the problem of discretizing continuous data in order to discover a number of high-confidence association rules that cover a high percentage of the examples in the data set. Then a Mutual Information graph (MI graph) is constructed, whose edges are the attribute pairs with normalized Mutual Information no less than a predefined information threshold. The cliques in the MI graph represent a majority of the frequent itemsets.
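The MI graph idea can be sketched as follows, under stated assumptions. The excerpt does not specify how the Mutual Information is normalized, so this sketch assumes division by the larger of the two attribute entropies, which keeps the value in [0, 1]; the paper's normalization may differ. The attribute names, the already-discretized toy columns, and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(xs):
    """Shannon entropy (in bits) of a discrete column."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mi(xs, ys):
    """ASSUMPTION: normalize by max(H(X), H(Y)); not confirmed by the source."""
    h = max(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0


def mi_graph(columns, threshold):
    """Return edges (a, b) whose normalized MI is at least the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if normalized_mi(columns[a], columns[b]) >= threshold
    ]


# Hypothetical, already-discretized attributes.
columns = {
    "age": ["young", "young", "old", "old"],
    "income": ["low", "low", "high", "high"],  # fully determined by age
    "city": ["A", "B", "A", "B"],              # independent of age and income
}
edges = mi_graph(columns, threshold=0.5)
print(edges)
```

In this toy data, only the age and income pair shares enough information to become an edge. Finding the cliques of such a graph, which the abstract relates to frequent itemsets, is left out of the sketch.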

