Skip to content

Instantly share code, notes, and snippets.

@monapasan
Created September 30, 2021 13:16
Show Gist options
  • Save monapasan/adb95a22a8286b22897092934e81683e to your computer and use it in GitHub Desktop.
Save monapasan/adb95a22a8286b22897092934e81683e to your computer and use it in GitHub Desktop.
Request example slow query Prima
{
"context": {
"name": "bigdaddymax",
"body": "Social Networks 28 (2006) 466–484\nA Graph-theoretic perspective on centrality\u0001\nStephen P. Borgatti a,∗, Martin G. Everett b,1\na Department of Organization Studies, Boston College, Chestnut Hill, MA 02467, USA b University of Westminster, 35 Marylebone Road London NW1 5LS, UK\nAbstract\nThe concept of centrality is often invoked in social network analysis, and diverse indices have been\nproposed to measure it. This paper develops a unified framework for the measurement of centrality. All\nmeasures of centrality assess a node’s involvement in the walk structure of a network. Measures vary along\nfour key dimensions: type of nodal involvement assessed, type of walk considered, property of walk assessed,\nand choice of summary measure. If we cross-classify measures by type of nodal involvement (radial versus\nmedial) and property of walk assessed (volume versus length), we obtain a four-fold polychotomization\nwith one cell empty which mirrors Freeman’s 1979 categorization. At a more substantive level, measures\nof centrality summarize a node’s involvement in or contribution to the cohesiveness of the network. Radial\nmeasures in particular are reductions of pair-wise proximities/cohesion to attributes of nodes or actors.\nThe usefulness and interpretability of radial measures depend on the fit of the cohesion matrix to the onedimensional model. In network terms, a network that is fit by a one-dimensional model has a core-periphery\nstructure in which all nodes revolve more or less closely around a single core. This in turn implies that the\nnetwork does not contain distinct cohesive subgroups. Thus, centrality is shown to be intimately connected\nwith the cohesive subgroup structure of a network.\n© 2005 Elsevier B.V. All rights reserved.\n1. Introduction\nCentrality is a fundamental concept in network analysis. Bavelas (1948, 1950) and Leavitt\n(1951) used centrality to explain differential performance of communication networks and network members on a host of variables including time to problem solution, number of errors,\nperception of leadership, efficiency, and job satisfaction. Their work led to a great deal of\n\u0001 Earlier versions of this paper were presented at the 1991 NSF Conference on Measurement Theory and Networks\n(Irvine, CA), 1992 annual meeting of the American Anthropological Association (San Francisco), and the 1993 Sunbelt\nXII International Social Network Conference (Tampa, FL). ∗ Corresponding author. Tel.: +1 617 552 0452; fax: +1 617 552 4230.\nE-mail addresses: borgatts@bc.edu (S.P. Borgatti), M.Everett@wmin.ac.uk (M.G. Everett). 1 Tel.: +44 2083 318 716.\n0378-8733/$ – see front matter © 2005 Elsevier B.V. All rights reserved.\ndoi:10.1016/j.socnet.2005.11.005\nS.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484 467\nexperimental, empirical, and theoretical research on the implications of network structure for\nsubstantive outcomes, particularly in the context of organizations. Centrality has been used to\ninvestigate influence in interorganizational networks (Laumann and Pappi, 1976; Marsden and\nLaumann, 1977; Galaskiewicz, 1979), power (Burt, 1982; Knoke and Burt, 1983), advantage\nin exchange networks (Cook et al., 1983; Marsden, 1982), competence in formal organizations\n(Blau, 1963), employment opportunities (Granovetter, 1974), adoption of innovation (Coleman\net al., 1966), corporate interlocks (Mariolis, 1975; Mintz and Schwartz, 1985; Mizruchi, 1982),\nstatus in monkey grooming networks (Sade, 1972, 1989), power in organizations (Brass, 1984)\nand differential growth rates among medieval cities (Pitts, 1979). In addition, many other studies\nuse well-known measures of centrality but do not identify them as such. For example, researchers\nworking with ego-networks use the term “network size” (Campbell et al., 1986; Deng and\nBonacich, 1991) to refer to a variable that in another context we would recognize as degree\ncentrality.\nWhile many measures of centrality have been proposed, the category itself is not well defined\nbeyond general descriptors such as node prominence or structural importance. In addition, people\npropose all kinds of interpretations of centrality measures, such as (potential for) autonomy,\ncontrol, risk, exposure, influence, belongingness, brokerage, independence, power and so on. The\none thing that all agree on is that centrality is a node-level construct. But what specifically defines\nthe category? What do all centrality measures have in common? Are there any structural properties\nof nodes that are not measures of centrality?\nSabidussi (1966) tried to provide a mathematical answer to these questions. He suggested a set\nof criteria that measures must meet in order to qualify as centrality measures. For example, he felt\nthat adding a tie to a node should always increase the centrality of the node, and that adding a tie\nanywhere in the network should never decrease the centrality of any node. These requirements are\nattractive: it is easy to see the value of separating measures that are “well-behaved” from measures\nthat behave less intuitively. However, there are problems with Sabidussi’s approach. For one thing,\nit turns out that his criteria eliminate most known measures of centrality, including betweenness\ncentrality. This is clearly unsatisfactory. Furthermore, while his criteria provide some desirable,\nprescriptive characteristics for a centrality measure, they do not actually attempt to explain what\ncentrality is.\nFreeman (1979) provided another approach to answering the ‘what is centrality’ question. He\nreviewed a number of published measures and reduced them to three basic concepts for which he\nprovided canonical formulations. These were degree, closeness and betweenness. He noted that\nall three attain their maximum values for the center of a star-shaped network. It can be argued\nthat this property serves as a defining characteristic of proper centrality measures.\nBorgatti (2005) has recently proposed a dynamic model-based view of centrality that focuses\non the outcomes for nodes in a network where something is flowing from node to node across the\nedges. He argues that the fundamental questions one wants to ask about individual nodes in the\ndynamic flow context are (a) how often does traffic flow through a node and (b) how long do things\ntake to get to a node. Once these questions are set, it becomes easier to construct graph-theoretic\nmeasures based on the structure of the network that predict the answers to these questions. Hence,\nin this approach, measures of centrality are cast as predictive models of specific properties of\nnetwork flows.\nIn this paper, we present an alternative perspective that eschews the dynamic element and is\nfundamentally structural in character. It is a graph-theoretic review of centrality measures that\nclassifies measures according to the features of their calculation. Whereas the model-based view is\ncentered on the outcomes of centrality, the graph-theoretic view is centered on the way centrality\n468 S.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484\nmeasures are calculated. In short, the present perspective is a means-based classification rather\nthan the ends-based classification presented by Borgatti (2005).\n2. Terminology\nFor simplicity (and in accordance with centrality convention), we will assume that all networks\non which we might compute centrality measures consist of undirected graphs G(V, E), in which\nV is a set of nodes (also called vertices, points or actors) and E is a set of edges (also called ties or\nlines) that connect them. Many centrality measures can be discussed in terms of directed graphs\nas well, but this topic is not treated here. It will be helpful to represent a graph in terms of its\nadjacency matrix A, in which aij = 1 if (i, j) is in E.\nNodes that are not adjacent may nevertheless be reachable from one to the other. A walk from\nnode u to node v is a sequence of adjacent nodes that begins with u and ends with v. A trail is a\nwalk in which no edge (i.e., pair of adjacent nodes) is repeated. A path is a trail in which no node\nis visited more than once.\nThe length of a walk is defined as the number of edges it contains, and the shortest path between\ntwo nodes is known as a geodesic. The length of a geodesic path between two nodes is known\nas the geodesic or graph-theoretic distance between them. We can represent the graph theoretic\ndistances between all pairs of nodes as a matrix D in which dij gives the length of the shortest\npath from node i to node j.\n3. Comparison of methods\nTo explain the graph-theoretic perspective, we begin by considering a sample of centrality\nmeasures and examining how they are computed. In a process similar to the anthropological\ntechnique of componential analysis, we extract dimensions along which measures vary. These are\nthen used to develop a three-way typology of measures. We organize the discussion around the\nthree best-known measures of centrality: degree, closeness and betweenness (Freeman, 1979).\n3.1. Degree-like measures\nWe begin by considering one of the simplest and best-known measures of centrality: degree\ncentrality. As defined by Freeman (1979), degree centrality is a count of the number of edges\nincident upon a given node. As shown in Eq. (1), it can be computed as the marginals of the\nadjacency matrix A:\ncDEG\ni = \u0001\nj\naij (1)\nWe can express this in matrix notation as CDEG = A1, where 1 is a column vector of ones.\nIt is useful to recognize that every edge is a walk of length 1. Consequently, we can think\nof degree centrality as counting the number of paths of length 1 that emanate from a node.2\nDegree centrality is therefore a special case of the measure proposed by Sade (1989) called k2 As noted earlier, we assume an undirected graph. Hence, we can equally well describe degree in terms of the number\nof paths of length 1 that terminate at a node.\nS.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484 469\npath centrality3 which counts all paths of length k or less that emanate from a node. When k = 1\n(its minimum value), the measure is identical to degree centrality. When k = n − 1 (its maximum\nvalue), the measure counts the total number of paths of any length that originate at a given\nnode.\nIn order to establish the commonality of structure across measures of centrality, it is useful to\nnote that k-path centrality may be computed as the marginals of a matrix W in which wij is the\nnumber of paths of length k or less from node i to node j. That is, CK − path = W1.\nOther variations on this theme may be obtained by choosing different restrictions on the kinds\nof paths counted. For example, if we are only interested in shortest paths, we can define geodesic\nk-path centrality as the number of geodesic paths up to length k emanating from a given node. We\ncan think of this as measuring the amount of direct involvement that a node has in the geodesic\nstructure of the network.\nAnother variation is to count only edge-disjoint paths. Edge-disjoint paths are paths which share\nno edges. Counting the number of edge-disjoint paths up to length k that originate or terminate at a\ngiven node yields a centrality measure we shall call edge-disjoint k-path centrality. Disjoint k-path\ncentrality measures can be thought of as inverse measures of vulnerability. This interpretation is\nbased on a theorem by Ford and Fulkerson (1956) which states that the number of edge-disjoint\npaths linking two nodes is equal to the minimum number of edges that must be deleted in order\nto disconnect the two nodes.4 In a network in which ties are subject to destruction (as in roads\nin a war zone), a disjoint k-path centrality measure assesses how difficult it would be to isolate a\ngiven node.\nA variant of disjoint k-path centrality counts the number of vertex-disjoint paths up to length\nk rather than edge-disjoint paths. Vertex-disjoint paths are those which share no vertices (except\nthe two end nodes). The set of such paths in a graph is a subset of the set of edge-disjoint paths.\nMenger (1927) showed that the number of vertex-disjoint paths linking two nodes is equal to the\nnumber of nodes that must be removed from a graph in order to isolate the two nodes from each\nother. The measure of social proximity developed by Alba and Kadushin (1976), and used as a\nbasis for detecting social circles, is a (normalized) count of all vertex-disjoint paths of length 2\nor less connecting any two nodes. We call a measure counting the number of vertex-independent\npaths that originate or terminate at a given node a vertex-disjoint k-path centrality measure. The\nGPI power measure of Markovsky et al. (1988) is a vertex-disjoint k-path centrality measure,\nwhich subtracts the number of even-length vertex-disjoint paths emanating from a node from\nthe number of odd-length vertex-disjoint paths emanating from the same node. All reachability\nmeasures (Higley et al., 1991), which count the number of nodes a given node can reach in a given\nnumber of links, are vertex-disjoint k-path centrality measures.\nThus far, we have only considered variations of degree centrality which count true graphtheoretic paths. However, a number of measures count all walks, including those that visit the\nsame nodes repeatedly. Katz’s (1953) measure of centrality is a weighted count of the number\nof walks originating (or terminating) at a given node. The walks are weighted inversely by their\nlength so that long, highly indirect walks count for little, while short, direct walks count for a\ngreat deal. The extent to which the weights attenuate with length is controlled by an arbitrary\n3 Sade actually used the term n-path centrality, but since n is usually reserved for the number of nodes in a network,\nwe have used k instead. 4 The number of edge-disjoint paths between two nodes is also equal to the maximum flow between them (Ford and\nFulkerson, 1962).\n470 S.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484\nparameter supplied by the researcher. Katz’s measure is defined as follows:\nwij = baij + b2(a2)ij +···+ bk(ak)ij +··· =\ninfinity\n\u0001\nk=1\nbk(ak)ij\nci = \u0001\nj\nwij\n(2)\nIn matrix notation, CKATZ = W1. Eq. (2) is based on the fact that the number of walks of length\nk between all pairs of nodes is given by the kth power of the adjacency matrix. The series is\nguaranteed to converge only if b is chosen to be smaller than the reciprocal of the largest eigenvalue\nofA. Hubbell (1965) proposed a measure very similar to Katz’s, but which allows for the possibility\nof taking a weighted row sum of W. The weights are potentially but not necessarily derived from\nthe network itself. If the weighting vector e is chosen to be all ones, Hubbell’s measure equals\nKatz’s minus 1. If e is chosen to be the degree of each node, as Hoede (1978) suggested, the result\nis that Katz’s and Hubbell’s measures are identical. Friedkin (1991) has developed a measure\ncalled total effects centrality which is equal to Katz’s divided by the constant 1 − b.\nBonacich (1987) writes a variant of Katz’s measure in slightly more general terms as follows:\nwij = δ(aij + b(a2)ij + b2(a3)ij +···) = δ\ninfinity\n\u0001\nk=1\nbk(ak+1)ij\nci = \u0001\nj\nwij\nC = W1\n(3)\nWhen b is positive, Bonacich’s and Katz’s measures are perfectly correlated. When b is negative,\nthe two measures are perfectly negatively correlated. A key contribution of Bonacich’s was to\nrealize that b could be negative and that this would have a substantive interpretation in exchange\nnetworks (Cook et al., 1983; Markovsky et al., 1988). Indeed, Bonacich’s measure predicts power\nuse in experimental exchange networks very nicely. This is interesting because a negative value\nfor b means that Eq. (3) effectively subtracts the number of even-length walks from the number\nof odd-length walks. This is exactly the same as the other well-known measure of power that\nemerges from the experimental exchange network literature, the GPI index of Markovsky et al.\nAs Markovsky et al., point out, having many alters one link away from a node enhances that\nnode’s bargaining power, but having many alters two links away enhances the power of the node’s\nfirst-order alters, and so on. More generally, a basic principle in exchange networks is that a node\nis powerful to the extent that it is connected to weak alters. In turn, a node is weak if it is connected\nto powerful alters. Interestingly, these descriptions resemble the hub and bridge distinctions of\nMintz and Schwartz (1981a,b) and Mizruchi et al. (1986). The principal difference between GPI\nand the Bonacich power measures is that the former counts only vertex-disjoint paths while the\nlatter counts all walks (weighted inversely by length).\nAnother way of interpreting the walk-based measures is in terms of an intuitive notion5 that\na person’s centrality should be a function of the centrality of the people he or she is associated\nwith. In other words, rather than measure the extent to which a given actor “knows everybody”,\n5 Probably originating with Alexander (1963), but clearly evident in Bonacich (1972) as well.\nS.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484 471\nwe should measure the extent to which the actor “knows everybody who is anybody”. Hubbell’s\nmeasure can be written as follows:\ncHUB = XcHUB + e (4)\nwhere X is a matrix derived from A, and e is the weighting vector of possibly exogenous contributions to status. Katz’s and Hoede’s measures are a special case of Hubbell’s in which e is\nequal to the row sums of X, and X = bA. Thus, both Katz and Hubbell can be seen as “implicit”\ncentrality measures in which the centrality of a node is given by the weighted row sums of an\nadjusted adjacency matrix, where the weights are the centralities of the columns.\nBonacich (1972) noted the similarity of Eq. (4) to the definition of an eigenvector (Eq. (5)) and\nrecommended that the principal eigenvector (associated with the largest eigenvalue) be used as a\ncentrality measure. He has shown (Bonacich, 1991) that the eigenvector of A is the limit of Katz’s\nmeasure as b approaches 1/λ from below. Thus, the eigenvector can be regarded as an elegant\nsummary of Katz’s, Hoede’s and Hubbell’s measures.\nv = λ−1Av (5)\nHaving defined the eigenvector v of adjacency matrix A, we can calculate the W matrix in Eq. (3)\nmore simply, as follows:\nwij = aijvj (6)\nColeman’s (1973) Power and Burt’s (1982) prestige are applications of the eigenvector measure\nto specific types of data. In Coleman’s case, the matrix A’ is restricted to the relation “depends\non”, while in Burt’s case A’ is a non-symmetric relation such as “likes”.6\nIt is apparent that the variations among the degree-based measures are due entirely to the kinds\nof restrictions placed on the kinds of walks counted. This defines one typological dimension that\nwe can use to classify measures. We refer to this dimension as Walk Type.\n3.2. Closeness-like measures\nIt is also apparent that all of the measures considered so far count the number or volume of\nwalks (of some kind) joining each node to all others. We shall refer to these as volume measures.\nAnother set of centrality measures assesses the lengths of the walks that a node is involved in.\nWe call these length measures. The distinction between volume measures and length measures\nforms another classificatory dimension, which we call Walk Property. It refers to what property\nof paths (their number or their length) is being measured.\nThe best-known distance measure is Freeman’s (1979) closeness centrality, which is defined\nas the total geodesic distance from a given node to all other nodes. As shown in Eq. (7), it is\ncomputed as the marginals of a geodesic distance matrix D:\ncCLO\ni = \u0001\nj\ndij (7)\nIn matrix notation CCLO = D1. This is clearly parallel to the degree-based measures discussed in\nthe previous section, with D playing the role of W. Since the number of nodes is fixed in a network,\n6 The descriptions are written in terms of A’ instead of A because Coleman and Burt take column sums rather than row\nsums as we do here.\n472 S.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484\nthe measure is equivalent to the mean distance of a node to other nodes. Closeness centrality is an\ninverse measure of centrality since larger values indicate less centrality. In this sense, it technically\nmeasures farness rather than closeness7.\nDirect measures of closeness (rather than farness) can be obtained by transforming the distance\nmatrix into a “nearness” matrix prior to computing row marginals. For example, Høivik and\nGleditsch (1975) recommend a linear transformation, as do Valente and Foreman (1998). The\nlatter approach is to take either the row sums (which they call radiality) or the column sums\n(integration) of the geodesic distance matrix subtracted from a constant.8 In contrast, Burt (1991)\nrecommends9 the following exponential transformation:\nsij = αdij (8)\nOther variants of closeness can be obtained by varying the way the initial distance matrix is\ndefined. In Freeman’s measure, closeness is based on geodesic distances. Each dij entry in the\ngeodesic distance matrix can be viewed as the minimum of the vector of lengths of all paths from\ni to j. However, if we do not believe that a given substantive phenomenon, such as diffusion of\ninformation, always makes use of the shortest paths, it makes sense to take into account all paths\nfrom i to j, perhaps by taking the median or mean length of all paths. The latter option is in fact\nthe approach taken by Friedkin (1991) in developing his immediate effects centrality, which is\ndefined as the reciprocal of the average distance from a given node to all others, where the distance\nbetween two nodes is defined (apart from scaling constants) as the average length of all paths\nbetween them. The problem with this, as we have discussed for other measures, is that many of\nthe paths we shall be averaging together will not be totally distinct from each other. The question\nis, should we give full weight to all paths, or should we try to take into account the fact that some\npaths are largely redundant?\nIf we think of each path from i to j as a vector, we can see that what we are looking for is the\nlength of a linear combination of the vectors in which some vectors are weighted more heavily\nthan others according to their distinctiveness. Thus, we seek a set of weights or coefficients that\nare optimal with respect to some well-specified criterion. The question is, what criterion? In a way,\nthe linear combination we want is the opposite of a factor or principal component. As Nunnally\n(1967) observes, the variance of a linear combination is high if the vectors are highly (positively)\ncorrelated, and low if they are not. Hence, we are looking for a linear combination that has as little\nvariance as possible, given some constraint on the weights. If we denote the kth path between\ntwo nodes as pk, the variance of the linear combination w1p1 + w2p2 +··· wkpk = Σkwkpk is\ngiven by\nVar \u0001\nk\nwkpk\n\n= \u0001\nk\n\u0001\nl\nwkwlσkl (9)\nwhere σkl refers to the covariance between the kth and lth paths. Thus, we seek a set of weights w\nthat minimize Eq. (9), subject to the constraint that Σwk = 1. After differentiating and rearranging\n7 A normalized version of closeness, in which the reciprocals of CCLO are multiplied by the number of nodes minus 1,\nsolves this terminological problem.\n8 It is sometimes claimed that this linear transformation enables us to measure closeness in disconnected graphs (i.e.,\nthose containing undefined distances) but this is not the case. In fact, if the constant is taken to be n, this approach gets\nthe same results (linearly rescaled) as simply replacing undefined distances with n, which is clearly unsatisfactory. 9 In somewhat different context.\nS.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484 473\nterms, we find that the optimal weights are the row marginals of the inverse of the covariance\nmatrix, divided by the sum of all entries. Using these weights, we can theoretically construct a\ncombined path from i to j whose length can be used as the distance between i and j. Computing\nthis distance for all pairs of nodes, a new measure of closeness centrality can be constructed by\ncomputing the row marginals of the distance matrix, or, as above, from the reciprocals of the\ndistance matrix.\nThe difficulty in all this, of course, is that we have not said how paths are to be represented as\nvectors. One possibility is a 0/1 indicator matrix X in which paths are columns, rows are edges,\nand values xij of the matrix indicate whether or not the ith edge occurs in the jth path. Note that\nthe length of a path is given by the column sums minus 1. We can then compute covariance in the\nusual way, solve for the optimal weights, and construct a minimum variance linear combination.\nThe length of the combined path is then given by the sum of its values.\nA different approach is taken by Stephenson and Zelen (1989), who propose that we simply\ndeclare the covariance of two paths to be the number of edges they have in common10. If we\npretend that Eq.(9)still holds, we can then solve for the linear combination of paths with minimum\n“variance”. Further, the variance of the best linear combination is then interpreted as its length,\nand therefore gives us the distance between i and j. The distances are converted to “nearnesses”\nby taking reciprocals, and a closeness measure is constructed by taking the harmonic mean of\neach row of the nearness matrix. Stephenson and Zelen invoke information theory to interpret the\nnearness matrix as “information”, and so they name their measure “information centrality”.\nA variation on closeness is what we shall call centroid centrality. The idea is that one first\nidentifies one or more nodes as the network centroid. Then, to calculate the centrality of any node,\none measures the distance from that node to the centroid. One obvious choice for the centroid is\nthe graph-theoretic center (Harary, 1969), which is the node (or pair of nodes if not unique) that\nhas the least eccentricity. A node’s eccentricity is the length of its longest geodesic path to another\nnode. We can then measure closeness as a node’s geodesic distance from the center. Of course,\nany criteria could be used to identify the central node(s), including other centrality measures.\nAnother approach is to embed the graph into a multidimensional metric space (Freeman, 1983),\nfind the location least distant from all nodes, and use the distance from that point to all others\nas centrality. This approach was used by Laumann and Pappi (1973). Here, the term “distance”\nrefers to Euclidean distance, or any other distance metric used to define the vector space.\n3.3. Betweenness-like measures\nAll of the measures considered so far—including both the volume and the length\nmeasures—assess walks that emanate from or terminate with a given node. We shall refer to these\nas radial measures. Another class of centrality measures exists which are based on the number\n10 Interestingly, when all paths are the same length, the covariance between paths (represented by the edge-path incidence\nmatrix described previously) really is related to the number of edges they have in common. Recall that covariance between\npaths k and l is defined as\n1\nn\n\u0001n\ni\nxikxil −\n\n1\nn\n\u0001n\ni\nxik 1\nn\n\u0001n\ni\nxil\nWhen all paths have the same length, the term to the right of the subtraction becomes a constant, so the covariance is a\nlinear transformation of Σixikxil, which is the number of edges shared by paths k and l.\n474 S.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484\nof walks that pass through a given node. We call these medial measures. The distinction between\nradial and medial measures forms the third classificatory dimension, which we call Walk Position.\nThe best known medial measure is Freeman’s betweenness centrality (Freeman, 1980). Loosely\ndescribed, the betweenness centrality of a node is the number of times that any actor needs a given\nactor to reach any other actor. A more precise definition is as follows: Let gikj denote the number\nof geodesic paths from node i to node j, and let gikj denote the number of geodesic paths from i\nto j that pass through intermediary k. Then the betweenness centrality is defined as follows:\nCBET\nk = \u0001\ni\n\u0001\nj\ngikj\ngij\n(10)\nThe measure is, in effect, k’s share of all shortest-path traffic from i to j, summed across all choices\nof i and j. If there is only one shortest path from any point to any other, the measure is equal to the\nnumber of geodesic paths that pass through a given node k. One can easily imagine that when the\nnetwork being studied consists of ties that are very costly to build, betweenness will indeed index\nan ability to extort benefits from flows through the network. For example, if the network represents\ntrade routes between medieval cities, the cities with high betweenness centrality have opportunities\nfor amassing wealth and exerting control that other cities would not have (Pitts, 1979).\nSeveral variations on betweenness centrality are possible. First of all, the reliance on geodesic\npaths alone may be undesirable. While inter-city trade might well take only shortest paths (in\norder to minimize costs), information might flow equally well across all possible paths. Hence we\nwould modify gikj to record the number of paths of any kind that link i and j via k. Borgatti (2002,\n2005) considers betweenness for all possible paths, as well as all possible trails, as well as walks\n(weighted inversely by length). However, he uses simulation to estimate the betweenness values\nrather than formulas. Newman (2005) provides closed-form equations for the case of random\ntraversal via walks.\nAnother set of variants is obtained by limiting the length of paths, on the idea that very long\npaths are seldom used and should not contribute to a node’s betweenness. Such measures might\nbe called k-betweenness centrality, where k gives the maximum length of paths counted. Friedkin\n(1991) proposes a measure that is essentially k-betweenness measures with k = 2. Similarly, Gould\nand Fernandez (1989) develop brokerage measures that are specific variants of 2-betweenness\nmeasures. A sophisticated variant would be a class of betweenness measures that count paths of\nall lengths, but weight them inversely in proportion to their length.\nOf course, if we count all paths or walks we are in danger of double-counting since many\npaths can share the same subset of edges. If this is a concern, we would want to count only edgedisjoint paths. This is exactly what Freeman et al. (1991) have done. Their measure is called flow\nbetweenness 11. It is called flow betweenness because of the well-known relationship between\nthe number of edge-independent paths between a pair of nodes, and the amount of material that\ncan flow from one node to another via all possible edges (Ford and Fulkerson, 1956). Since flow\nbetweenness assesses the proportion of edge-independent paths that involve a given node, it is\nin effect measuring the amount of flow in the network that would not occur if the node were not\npresent (or were choosing not to transmit). This is really the essence of any betweenness measure:\nthe potential for withholding flow, otherwise known as gatekeeping.\n11 Flow betweenness bears the same relationship to geodesic betweenness that information centrality bears to Freeman’s\ncloseness measure. Both information centrality and flow betweenness take into account all edge-disjoint paths, whereas\nFreeman’s closeness and geodesic betweenness consider only geodesic paths.\nS.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484 475\nAn interesting aspect of flow betweenness is how it is computed. Because the sets of edgeindependent paths between any two nodes are not unique, flow betweenness cannot be calculated\ndirectly by counting paths. Instead, programs like UCINET (Borgatti et al., 2002) essentially\nsimulate the gatekeeping process by calculating flows between all pairs of nodes in the network, then removing the node whose centrality is being measured, and then recalculating the\nflows.\nMore specifically, let us denote by W the matrix of maximum flows between nodes (i.e., the\nnumber of edge-independent paths between them) and denote by kW the matrix derived from W\nby deleting row and column k. In addition, let us denote by kW* the matrix obtained by deleting\nnode k from the original network, and recalculating the flow matrix. We can then define flow\ncentrality as follows:\nck = \u0001\ni,j\nkwi,j − kw∗\ni,j\nkwi,j\n(11)\nWe can reformulate all betweenness measures in accordance with Eq. (11), simply by changing\nthe W matrix. For example, to calculate Freeman’s betweenness, we take W to be the geodesic\ncount matrix in which wij gives the number of geodesic paths from i to j. Applying Eq. (11) gives\nscores exactly equal to twice Freeman’s values.\nThus, betweenness-type measures might be thought of as “proportion reduction in cohesion”\n(PRC) measures, analogous to proportion reduction in error (PRE) measures in statistics. Where\nsomething flows over the links of a network, PRC measures quantify the potential of a node to\ndisrupt flows throughout the network by ceasing its own transmissions.\nIf we define cohesion in terms of reachability, a PRC measure essentially indexes the network\nfragmentation that results from removing a node. Such a measure, called fragmentation (F),\nwas introduced by Borgatti (2003, forthcoming). The F measure is simply the proportion of\ndisconnected pairs of nodes that results when a given node is removed from a network. The bigger\nthe value, the more important the node in maintaining cohesion. As implemented in UCINET\n(Borgatti et al., 2002), the measure is accompanied by a normalization in which the level of\nfragmentation of the original network is subtracted from the fragmentation after removing a\ngiven node, divided by the fragmentation of the original network. Large positive values indicate\nnodes that contribute to the cohesion of the network, while large negative values indicate nodes\nthat reduce the cohesiveness of the network.\nDefining cohesion in terms of distance yields a different set of measures. As a specific example, let us define a cohesion matrix W as the reciprocals of the geodesic distance between all\npairs of nodes in a network (with the convention that the reciprocal of an undefined distance\nis 0). Now, we remove a given node (whose centrality we are measuring) from the W matrix,\nyielding kW, and also remove the node from the original network and re-compute the reciprocals of geodesic distances among all pairs of remaining nodes, yielding kW*. Then we apply Eq.\n(11), subtracting the values of kW* from the values of kW and dividing by kW. Summing all of\nthese adjusted values, we obtain the relative decrease in cohesion obtained by removing a given\nnode.\nA similar measure was proposed by Borgatti (2003, forthcoming). Called “distance-weighted\nfragmentation” (DF), this measure is defined as the average reciprocal distance among nodes after\nremoval of a given node. The measure is 1 when all nodes are distance 1 from each other (i.e., a\ncomplete graph), and 0 when all nodes are isolates. Intermediate values index the extent to which\nthe presence of a node tends to reduce distances in the network.\n476 S.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484\nTable 1\nCross-classification of centrality measures\nRadial Medial\nVolume Freeman degree, Sade k-path, Bonacich eigenvector, Katz status,\nHubbell status, Hoede status, Doreian iterated, Hubbell, Markovsky\net al. GPI, Friedkin TEC, Coleman power, Bonacich power, Burt\nprestige\nAnthonisse rush, Freeman\nbetweenness, Freeman et al.\nflow betweenness, Friedkin\nMEC, Newman RWB\nLength Freeman closeness, Stephenson-Zelen information, Friedkin IEC Borgatti DF\n4. A Typology of measures\nIt is apparent in this review of measures that all of the measures evaluate a node’s involvement in\nthe walk structure of a network. That is, they evaluate the volume or length of walks of some kind\nthat originate, terminate, or pass through a node. Furthermore, all are based on the marginals of an\nappropriately constructed node-by-node matrix, although the method of calculating marginals can\nvary from simple sums to averages and weighted averages to harmonic means, and so on. Thus\nfour basic dimensions distinguish between centrality measures: the types of walks considered\n(called Walk Type, such as geodesic or edge-disjoint), the properties of walks measured (called\nWalk Property, namely volume or length), the type of nodal involvement (called Walk Position,\nnamely radial or medial), and type of summarization (called Summary Type, such as sum or\naverage).12\nThe Walk Type dimension concerns the restrictions that some measures impose on the kind of\nwalks considered, such as only geodesics, only true paths, limited length walks, and so on. The\nWalk Property dimension distinguishes between measures that evaluate the number of walks a node\nis involved in and measures that evaluate the length of those walks. The Walk Position dimension\ndistinguishes between measures that evaluate walks emanating from a node and measures that\nevaluate walks passing through a node. The Summary Type dimension distinguishes measures\nusing different ways to summarize rows of the walk matrix13. For simplicity, Table 1 gives a list\nof centrality measures cross-classified by just two of the dimensions: Walk Property (volume vs.\nlength), and Walk Position (radial vs. medial).\nWhile each centrality measure uses a different W matrix, in all cases the W matrix is an indicator\nof social proximity/cohesion among nodes.14 Almost all of the W matrices we have seen have\nbeen identified specifically in the network literature as measures of cohesion. The i, jth cell of\n12 A fourth attribute, less important, is the choice of summary statistic. All centrality measures can be computed by\nsummarizing the rows of an actor-by-actor matrix W (or A). Typically, this statistic is a simple sum or average. However,\nother summary statistics, such as weighted means, medians, modes, minimums and maximums, are used as well. This\ndecision point is well known in cluster analysis, where the difference between some clustering methods is whether the\nminimum, maximum or median is used to compute the distance from a point to a set of points (Johnson, 1967; D’Andrade,\n1978). An important example of a weighted mean is the eigenvector, which is used by Bonacich (1972), Burt (1982), and\nDoreian (1986). 13 This decision point is well known in cluster analysis, where the difference between some clustering methods is\nwhether the minimum, maximum or median is used to compute the distance from a point to a set of points (Johnson, 1967;\nD’Andrade, 1978). An important example of a weighted mean is the eigenvector, which is used by Bonacich (1972), Burt\n(1982), and Doreian (1986). 14 We use the terms proximity and cohesion interchangeably to refer to social closeness or strength of connection between\npairs of nodes.\nS.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484 477\nany W matrix indicates the “connectedness” or “relatedness” of i and j, which is often assumed\nto correspond to a potential for transmission of attitudes, diseases, resources, etc. Of course, the\nvalues of W can be scaled such that large numbers indicated greater cohesion (as in a matrix of\nvalued strengths of ties) or lesser cohesion (as in a geodesic distance matrix).\nIf the W matrix reflects dyadic cohesion, it is no surprise that summarizing the values of the W\nmatrix gives the well-known measures of network cohesion such as density and characteristic path\nlength. Density is simply the average of the adjacency matrix across all cells, while characteristic\npath length is the average (or other summary measure) of the geodesic distance matrix.\nThe cohesion matrix is also used as the basis for all methods of detecting cliques and other\ncohesive subgroups (e.g. Alba and Kadushin, 1976; Burt, 1991). The myriad definitions and\ntechniques of cohesive subgroups can be viewed as identifying clusters within the cohesion\nmatrix. Interestingly, the Walk Property dimension that we have identified for centrality measures\nis also a key dimension in the classification of cohesive subgroups (Borgatti et al., 1990). As\nis well-known, the cohesive subgroup known as a clique is characterized by having maximum\ndensity (number of ties) and minimum distance (length of paths). Since its formalization by Luce\nand Perry (1949), subsequent researchers have sought to relax the definition of clique by relaxing\neither the distances among members of the subgroup, or the number of ties (i.e., walks of length\n1). Concepts such as n-cliques, n-clans, and n-clubs relax the minimal distance property of the\nclique, while concepts such as k-plexes, k-cores, ls-sets, and lambda-sets relax the maximum\ndensity property of cliques. In short: length versus volume.\nSimilarly, what centrality measures do is summarize the amount (and type) of dyadic cohesion\nthat each node is involved in. In effect, centrality measures are indices of the share of dyadic\ncohesion attributable to each node. For comparison, whole network measures of cohesion, such\nas density or characteristic path length, are exactly like centrality measures except that instead\nof breaking out the summary by node, the entire cohesion matrix is summarized, much like the\ngrand marginal in a contingency table.\nThus, a basic claim of the graph-theoretic typology presented here is that dyadic cohesion\nprovides a common basis for not only centrality measures but subgroups and network cohesion.\nIn this way, our analysis provides a sense of continuity with the other major areas of network\nanalysis.\nAnother benefit that the typology provides is a partial answer to the commonly-asked question\nof how to choose among centrality measures. The typology essentially divides measures into\ngroups that, to put it in marketing terms, are more competitive with each other than with other\nmeasures. Our claim is that measures within the same box in Table 1 are similar enough on key\nattributes that they can be thought of as competitive, i.e., as potentially substitutable alternatives\nfor each other. Among measures within each box, we can reasonably ask which is better. In\ncontrast, measures in different boxes differ in fundamental ways, and are perhaps best viewed as\ncomplementary.\n5. Radial measures and the core periphery assumption\nIt is apparent that all radial measures are constructed the same way. First one defines an actorby-actor matrix W that records the number or length of walks of some kind linking every pair of\nactors. Then one summarizes each row of W by taking some kind of mean or total. Thus, centrality\nprovides an overall summary of a node’s participation in the walk structure of the network. It is a\nmeasure of how much of the walk structure is due to a given node. It is quite literally the node’s\nshare of the total volume or length of walks in the network.\n478 S.P. Borgatti, M.G. Everett / Social Networks 28 (2006) 466–484\nThus, the essence of a radial centrality measure is this: radial centrality summarizes a node’s\nconnectedness with the rest of the network. To the extent that dyadic cohesion is seen as indexing\ninfluence (e.g., Katz, 1953; Hubbell, 1965; Friedkin, 1991), the centrality measure judges the\noverall influence of an actor. If the cohesion matrix is adjacency (as in degree centrality), and\nthe adjacency matrix represents the “friend of” relation, then centrality summarizes an actor’s\n“friendliness”.\nThis raises a question. Under what conditions does it make sense to summarize, with a single\nvalue, a node’s cohesion with all others? Consider the mean of any list of numbers. It can always\nbe computed, but only serves as a summary when the distribution of the numbers is unimodal.\nIndeed, if the list is known to be normally distributed, the mean and standard deviation alone\ncan generate the entire distribution. But if the shape of the distribution is bimodal, the mean is\na very poor summary. If the ideal serving temperature of tea is in the range of 35–50◦ for much\nof the population (because they like iced tea) and the ideal serving temperature is in the range of\n130–160◦ for the other half of the population (because they like hot tea), does the average of the\nideal temperatures provide a good assessment of the population’s tastes? That is, does a luke-warm\ntemperature of 97.5◦ provide a good picture of what the people’s tastes are? Probably not.\nA radial centrality measure is clearly interpretable in a network in which dyadic cohesion is\nunimodal, but not in one which is multimodal. That is to say, radial centrality makes sense in\nnetworks which have, at most, one center. This means that a cohesive subgroup analysis would\nfind only one subgroup (a core) to which all nodes belong to a greater or lesser extent. The network\nwould not be divided in two or more subgroups. In that case, radial centrality would effectively\nserve as a measure of “coreness” (Borgatti and Everett, 1999; Everett and Borgatti, 2004), which\nis to say, strength of membership in the one and only group. Bonacich (1972) has noted that\nif a network contains more than one component (i.e., a maximal set of nodes that are mutually\nreachable), eigenvector centrality will assign zeros to all nodes not in the largest component,\neven if they are highly central in their own component. Rather, those nodes load highly on the\nremaining eigenvectors. In other words, the eigenvectors of a cohesion matrix measure strength\nof involvement of each node to each major subgroup (component).\nBefore we can interpret a radial measure of centrality, we must determine whether the network\nsatisfies the one-group requirement. A network will exhibit a core-periphery pattern whenever its\ncohesion matrix can be modeled as a non-negative function of its marginals (Borgatti and Everett,\n1999). For example, if the cohesion matrix W is a valued adjacency matrix in which wij gives the\nnumber of interactions observed between i and j, and the model of independence holds (i.e., no\ninteraction), then the network has a core-periphery structure. This is because the independence\nmodel specifies that the extent of dyadic cohesion between nodes i and j is proportional to the\nproduct of their general proximity to anyone (i.e., their centrality). Hence, the only region of the\nnetwork with high densities of proximity will be populated by high centrality nodes, and there will\nonly be one such region. This pattern of distribution of proximities is precisely a core-periphery\nstructure."
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment