1. Model Introduction
(1). Association rules were originally motivated by market basket analysis. That process discovers associations between the different products that customers put in their “shopping basket” and analyzes which products customers frequently buy together. In the same way, we use association rules to discover relationships between users based on their interactions. Using any one of three association-rule measures (support, confidence, or lift), a relationship coefficient is calculated for each pair of users, and the important relationship combinations are then screened out. The larger the coefficient, the more frequent the interaction and the closer the relationship; the smaller the coefficient, the less frequent the interaction and the more distant the relationship.
(2). Statistical indicators
a. Node Degree: The node degree refers to the number of edges associated with the node, also known as the degree of association.
b. Degree Centrality: the number of direct connections between a node and other nodes, normalized by the maximum possible degree in a simple graph (n − 1, where n is the number of nodes). In graphs with self-loops or multiple edges between the same pair of nodes, a degree can exceed n − 1, so this value may be greater than 1. In a directed graph, in-degree centrality and out-degree centrality are distinguished according to the direction of the connections. Degree centrality measures a node's individual (local) importance.
c. Closeness Centrality: the reciprocal of the sum of the shortest-path distances from the node to all other nodes, normalized by the sum of the minimum possible distances (n − 1). It reflects how close the node is to all other nodes: the greater the closeness centrality, the more quickly the node can reach the others. It measures the node's value with respect to the network as a whole.
d. Betweenness Centrality: the sum, over all pairs of other nodes, of the fraction of shortest paths between them that pass through the node, normalized by the largest possible value. It measures a node's ability to act as a broker or bridge between other nodes.
e. Co-occurrence: the number of times two nodes appear together.
f. Network Density: the ratio of the number of edges actually present in the network to the maximum possible number of edges; it describes how densely the nodes are interconnected. It is commonly used in social network analysis to measure the density of social relationships and how that density evolves over time.
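The graph indicators a to d and f above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's implementation: it assumes an undirected, unweighted simple graph built from a hypothetical edge list of user pairs, and it is intended to follow the normalizations described above (the same ones NetworkX uses by default). All function names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected simple graph as {node: set of neighbours}; self-loops are dropped."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree(adj):
    """a. Node degree: number of edges incident to each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def degree_centrality(adj):
    """b. Degree normalized by the maximum possible degree n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """c. (n - 1) / sum of distances, scaled down when some nodes are unreachable."""
    n = len(adj)
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        reachable = len(dist) - 1  # other nodes reachable from v
        if total > 0 and n > 1:
            result[v] = (reachable / total) * (reachable / (n - 1))
        else:
            result[v] = 0.0
    return result


def betweenness_centrality(adj):
    """d. Brandes' algorithm, normalized by the maximum possible value."""
    nodes = list(adj)
    n = len(nodes)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Phase 1: BFS counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if n > 2:
        # Each unordered pair was counted from both ends, so dividing by
        # (n - 1)(n - 2) yields the fraction of the maximum possible value.
        for v in bc:
            bc[v] /= (n - 1) * (n - 2)
    return bc


def density(adj):
    """f. Edges present divided by the maximum possible n(n - 1) / 2."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0
```

On the three-node path a-b-c, for example, node b has degree centrality, closeness and betweenness all equal to 1.0, the end nodes a and c have betweenness 0, and the network density is 2/3.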
(3). Word classification (community structure)
In the study of complex networks, a network is said to have community structure if its nodes can be easily grouped into (potentially overlapping) sets such that each set is densely connected internally. In the particular case of non-overlapping community detection, this implies that the network divides naturally into groups of nodes with dense internal connections and sparser connections between groups. Community detection here is based on the heuristic modularity-optimization method proposed by Vincent D. Blondel et al. in 2008 (the Louvain method). See details:
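The Louvain method repeatedly moves nodes (and then aggregated communities) between groups to increase the modularity Q. A full implementation is beyond a short example, so the sketch below only computes the quantity being optimized, Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. It assumes an undirected, unweighted edge list without duplicates or self-loops; the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, no duplicates or self-loops.
    communities: list of disjoint node sets covering every node in edges.
    """
    m = len(edges)
    label = {}
    for idx, group in enumerate(communities):
        for node in group:
            label[node] = idx
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in range(len(communities)))
```

For two triangles joined by a single edge, splitting the graph into its two triangles gives Q = 5/14 ≈ 0.357, while putting all six nodes in one community gives Q = 0, so the natural split scores higher; a modularity-optimizing heuristic such as Louvain searches for partitions with high Q.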
2. R&D Basis
(1). Wang Gonghui, Liu Weijiang. A keyword-based text information analysis method and its application: a case study of credit rating. Sciencepaper Online, 2010.
3. Algorithm Description
Association rules:
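(1) Support: the ratio of the number of relations that contain both character A and character B to the total number of relations in the set, i.e., support(A, B) = N(A, B) / N, where N(A, B) is the number of relations containing both A and B and N is the total number of relations.
(2) Confidence: the ratio of the number of relations that contain both character A and character B to the number of relations that contain character A, i.e., the conditional probability P(B | A) = N(A, B) / N(A).
(3) Lift: the confidence of A → B divided by the probability of B, i.e., lift(A, B) = P(A, B) / (P(A) P(B)). It indicates whether A and B co-occur in the relation set more often than they would if they appeared independently: a lift greater than 1 indicates a positive association, a lift equal to 1 indicates independence, and a lift below 1 indicates that they appear together less often than expected.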
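The three measures, together with the co-occurrence count from the statistical indicators, can be computed for every user pair in a single pass. This is a minimal sketch, assuming a hypothetical input format in which each relation record is a set of sortable user IDs (e.g., the users involved in one interaction); it is not necessarily how the tool stores or counts relations.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records):
    """Co-occurrence, support, confidence (both directions) and lift per user pair.

    records: iterable of relation records, each an iterable of sortable user IDs
    (hypothetical format: one record per interaction).
    """
    records = [set(r) for r in records]
    n = len(records)
    single = Counter()  # N(A): number of records containing user A
    pair = Counter()    # N(A, B): number of records containing both A and B
    for users in records:
        ordered = sorted(users)
        single.update(ordered)
        pair.update(combinations(ordered, 2))
    rules = {}
    for (a, b), n_ab in pair.items():
        p_a, p_b = single[a] / n, single[b] / n
        support = n_ab / n
        rules[(a, b)] = {
            "cooccurrence": n_ab,
            "support": support,
            "confidence_a_to_b": n_ab / single[a],  # P(B | A)
            "confidence_b_to_a": n_ab / single[b],  # P(A | B)
            "lift": support / (p_a * p_b),
        }
    return rules
```

With the four records {A, B}, {A, B, C}, {A, C} and {B}, for instance, the pair (A, C) has support 0.5 and lift 4/3 (positively associated), whereas (B, C) has lift 2/3. Because confidence is directional, both P(B | A) and P(A | B) are returned; support and lift are symmetric.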
4. Restrictions and Limitations
The relationship coefficient is calculated for every possible pair of users, and the important combinations are then filtered based on the association rules. Because the number of pairs grows quadratically with the number of user nodes (n(n − 1)/2 pairs for n users), more user nodes mean longer calculation times and correspondingly higher computing-resource requirements. The current upper limit for output display is 1000 user relationship combinations.
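The pair-count growth and the 1000-combination display cap can be illustrated with a short sketch. It assumes the relationship coefficients have already been computed into a dictionary keyed by user pair (a hypothetical structure) and keeps only the highest-scoring pairs with heapq.nlargest; this is one reasonable way to apply such a cap, not necessarily how the tool filters its output. The names MAX_OUTPUT_PAIRS, candidate_pair_count and top_relationships are hypothetical.

```python
import heapq
from math import comb

MAX_OUTPUT_PAIRS = 1000  # display limit stated in this section


def candidate_pair_count(n_users):
    """Number of unordered user pairs whose coefficient must be computed."""
    return comb(n_users, 2)


def top_relationships(scores, k=MAX_OUTPUT_PAIRS):
    """Return at most k (pair, coefficient) items, largest coefficient first.

    scores: hypothetical dict mapping (user_a, user_b) -> relationship coefficient.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

For example, 1000 users already yield 499,500 candidate pairs and 10,000 users yield 49,995,000, while heapq.nlargest keeps only k items in memory during selection, so the display cap stays cheap even when the candidate set is large.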