| Title: | Generates Incidence Matrices and Bipartite Graphs |
|---|---|
| Description: | Functions to generate incidence matrices and bipartite graphs that have (1) a fixed fill rate, (2) given marginal sums, (3) marginal sums that follow given distributions, or (4) represent bill sponsorships in the US Congress <doi:10.31219/osf.io/ectms>. It can also generate an incidence matrix from an adjacency matrix, or bipartite graph from a unipartite graph, via a social process mirroring team, group, or organization formation <doi:10.48550/arXiv.2204.13670>. |
| Authors: | Zachary Neal [aut, cre] |
| Maintainer: | Zachary Neal <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.2 |
| Built: | 2024-12-25 03:25:57 UTC |
| Source: | https://github.com/zpneal/incidentally |
add.blocks
shuffles an incidence matrix to have a block structure or planted partition while preserving the row and column sums
add.blocks(
  I,
  rowblock = sample(1:2, replace = T, nrow(I)),
  colblock = sample(1:2, replace = T, ncol(I)),
  density = 0.5,
  sorted = FALSE
)
| Argument | Description |
|---|---|
| I | An incidence matrix or igraph bipartite graph |
| rowblock | numeric: vector indicating each row node's block membership |
| colblock | numeric: vector indicating each column node's block membership |
| density | numeric: desired within-block density |
| sorted | boolean: if TRUE, return incidence matrix permuted by block |
Stochastic block and planted partition models generate graphs in which the probability that two nodes are connected depends on whether they are members of the same or different blocks/partitions. Functions such as sample_sbm can randomly sample from stochastic block models with given probabilities. In contrast, add.blocks adds a block structure to an existing incidence matrix while preserving the row and column sums. Row nodes' and column nodes' block memberships are supplied in separate vectors. If block membership vectors are not provided, then nodes are randomly assigned to two groups.
An incidence matrix or igraph bipartite graph with a block structure
Neal, Z. P., Domagalski, R., and Sagan, B. 2021. Comparing alternatives to the fixed degree sequence model for extracting the backbone of bipartite projections. Scientific Reports, 11, 23929. doi:10.1038/s41598-021-03238-3
Neal, Z. P. 2022. incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints doi:10.31219/osf.io/ectms
I <- incidence.from.probability(R = 100, C = 100, P = .1)
blocked <- add.blocks(I, density = .7)
all(rowSums(I)==rowSums(blocked))
all(colSums(I)==colSums(blocked))
B <- igraph::sample_bipartite(100, 100, p=.1)
blocked <- add.blocks(B, density = .7)
all(igraph::degree(B)==igraph::degree(blocked))
curveball
randomizes an incidence matrix or bipartite graph, preserving the row and column sums
curveball(M, trades = 5 * nrow(M), class = NULL)
| Argument | Description |
|---|---|
| M | a binary matrix of class |
| trades | integer: number of trades; the default is 5 * nrow(M) (approx. mixing time) |
| class | string: Return object as |
Strona et al. (2014) provided an initial implementation of the Curveball algorithm in R. curveball() is a modified R implementation that is slightly more efficient. For an even more efficient algorithm, see backbone::fastball().
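To make the idea of a "trade" concrete, the following is a minimal base R sketch of a single Curveball-style trade in the spirit of Strona et al. (2014). It is not the code used by curveball() or backbone::fastball(); the helper name curveball_trade is hypothetical and exists only for illustration.

```r
# Illustrative sketch only: one Curveball-style trade between two rows.
# `curveball_trade` is a hypothetical helper, not part of the package.
curveball_trade <- function(M) {
  rows <- sample(nrow(M), 2)            # pick two distinct rows
  a <- which(M[rows[1], ] == 1)         # columns where the first row has a 1
  b <- which(M[rows[2], ] == 1)         # columns where the second row has a 1
  only_a <- setdiff(a, b)               # 1s held by the first row only
  only_b <- setdiff(b, a)               # 1s held by the second row only
  if (length(only_a) == 0 || length(only_b) == 0) return(M)  # nothing to trade
  pool <- c(only_a, only_b)             # columns that can change hands
  # The first row receives as many pooled columns as it contributed
  new_a <- pool[sample.int(length(pool), length(only_a))]
  new_b <- setdiff(pool, new_a)         # the second row receives the rest
  M[rows, pool] <- 0                    # clear the traded cells in both rows
  M[rows[1], new_a] <- 1
  M[rows[2], new_b] <- 1
  M
}

set.seed(1)
M <- matrix(rbinom(30, 1, 0.5), nrow = 5)
Mrand <- M
for (i in seq_len(5 * nrow(M))) Mrand <- curveball_trade(Mrand)  # default number of trades
```

Each pooled column holds a 1 in exactly one of the two rows both before and after the trade, and each row gets back as many pooled columns as it gave up, which is why row and column sums are unchanged.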
An incidence matrix of class matrix or Matrix, or a bipartite graph of class igraph.
Strona, Giovanni, Domenico Nappo, Francesco Boccacci, Simone Fattorini, and Jesus San-Miguel-Ayanz. 2014. A Fast and Unbiased Procedure to Randomize Ecological Binary Matrices with Fixed Row and Column Totals. Nature Communications, 5, 4114. doi:10.1038/ncomms5114
Godard, Karl and Neal, Zachary P. 2022. fastball: A fast algorithm to sample bipartite graphs with fixed degree sequences. arXiv:2112.04017
Neal, Z. P. 2022. incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints doi:10.31219/osf.io/ectms
M <- incidence.from.probability(5,5,.5)  #A matrix
Mrand <- curveball(M)  #Random matrix with same row/col sums
all.equal(rowSums(M), rowSums(curveball(M)))
all.equal(colSums(M), colSums(curveball(M)))
incidence.from.adjacency
generates an incidence matrix from an adjacency matrix or network using a given generative model
incidence.from.adjacency(
  G,
  k = 1,
  p = 1,
  blau.param = c(2, 1, 10),
  maximal = TRUE,
  model = "team",
  class = NULL,
  narrative = TRUE
)
| Argument | Description |
|---|---|
| G | A symmetric, binary adjacency matrix of class |
| k | integer: Number of artifacts to generate |
| p | numeric: Tuning parameter for artifacts, 0 <= p <= 1 |
| blau.param | vector: Vector of parameters that control blau space in the organizations model (see details) |
| maximal | boolean: Should teams/clubs models be seeded with maximal cliques? |
| model | string: Generative model, one of c("team", "club", "org") (see details) |
| class | string: Return object as |
| narrative | boolean: TRUE if suggested text & citations should be displayed. |
Given a unipartite network composed of i agents (i.e. nodes) that can be represented by an i x i adjacency matrix, incidence.from.adjacency generates a random i x k incidence matrix that indicates whether agent i is associated with artifact k. Generative models differ in how they conceptualize artifacts and how they associate agents with these artifacts.
The Team Model (model == "team") mirrors a team formation process, where each artifact represents a new team formed from the incumbents of a prior team (with probability p) and newcomers (with probability 1-p).
The Club Model (model == "club") mirrors a social club formation process, where each artifact represents a social club. Club members attempt to recruit non-member friends, who join the club if it would have a density of at least p.
The Organizations Model (model == "org") mirrors an organization (the artifact) recruiting members from social space, where those within the organization's niche join with probability p, and those outside the niche join with probability 1-p. blau.param is a vector containing three values that control the characteristics of the blau space. The first value is the space's dimensionality. The second two values are shape parameters of a Beta distribution that describes niche sizes. The default is a two-dimensional blau space, with organization niche sizes that are strongly positively skewed (i.e., many specialist organizations, few generalists).
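The skew of the default niche sizes can be seen with base R alone. This is a minimal sketch, assuming only that the second and third values of the default blau.param (1 and 10) serve as the two Beta shape parameters; how the package turns such draws into niches in blau space is not described in this manual, so the variable names below are purely illustrative.

```r
# Illustrative sketch: draws from a Beta distribution with the default shape
# parameters in blau.param = c(2, 1, 10). `niche_sizes` is a hypothetical name.
set.seed(1)
blau.param <- c(2, 1, 10)
niche_sizes <- rbeta(1000, shape1 = blau.param[2], shape2 = blau.param[3])

mean(niche_sizes)    # theoretical mean is 1 / (1 + 10), about 0.09
median(niche_sizes)  # below the mean, as expected for a positively skewed distribution
hist(niche_sizes)    # most draws are small (specialists), few are large (generalists)
```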
An incidence matrix of class matrix or Matrix, or a bipartite graph of class igraph.
Neal, Z. P. 2023. The duality of networks and groups: Models to generate two-mode networks from one-mode networks. Network Science.
G <- igraph::erdos.renyi.game(10, .4)
I <- incidence.from.adjacency(G, k = 1000, p = .95, model = "team", narrative = TRUE)
incidence.from.congress()
uses data from https://www.congress.gov/ to construct an incidence matrix or bipartite graph recording legislators' bill (co-)sponsorships.
incidence.from.congress(
  session = NULL,
  types = NULL,
  areas = "all",
  nonvoting = FALSE,
  weighted = FALSE,
  format = "data",
  narrative = FALSE
)
| Argument | Description |
|---|---|
| session | numeric: the session of congress |
| types | vector: types of bills to include. May be any combination of c("s", "sres", "sjres", "sconres") OR any combination of c("hr", "hres", "hjres", "hconres"). |
| areas | string: policy areas of bills to include (see details) |
| nonvoting | boolean: should non-voting members be included |
| weighted | boolean: should sponsor-bill edges have a weight of 2, but cosponsor-bill edges have a weight of 1 |
| format | string: format of output, one of c("data", "igraph") |
| narrative | boolean: TRUE if suggested text & citations should be displayed. |
The incidence.from.congress() function uses data from https://www.congress.gov/ to construct an incidence matrix or bipartite graph recording legislators' bill (co-)sponsorships. In an incidence matrix I, entry I_ik = 1 if legislator i sponsored or co-sponsored bill k, and otherwise is 0. In a bipartite graph G, a legislator i is connected to a bill k if i sponsored or co-sponsored k.
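The encoding described above can be sketched on a toy example in base R. The legislators, bills, and roles below are made up for illustration, and the code is not how incidence.from.congress() retrieves or processes data; the weighted variant simply follows the description of the weighted argument (2 for a sponsor, 1 for a co-sponsor).

```r
# Illustrative sketch with hypothetical data: one row per (legislator, bill) record.
records <- data.frame(
  legislator = c("A", "B", "C", "A", "C"),
  bill       = c("s1", "s1", "s1", "s2", "s2"),
  role       = c("sponsor", "cosponsor", "cosponsor", "cosponsor", "sponsor"),
  stringsAsFactors = FALSE
)

legislators <- sort(unique(records$legislator))
bills <- sort(unique(records$bill))
idx <- cbind(records$legislator, records$bill)  # (row name, column name) pairs

# Binary incidence matrix: 1 if the legislator sponsored or co-sponsored the bill
I <- matrix(0, nrow = length(legislators), ncol = length(bills),
            dimnames = list(legislators, bills))
I[idx] <- 1

# Weighted variant: sponsors get 2, co-sponsors get 1
W <- I
W[idx] <- ifelse(records$role == "sponsor", 2, 1)
```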
In the US Congress, the law-making process begins when a sponsor legislator introduces a bill in their chamber (House of Representatives or Senate). Additional legislators in the same chamber can support the bill by joining as a co-sponsor. The bill is discussed, revised, and possibly voted on in the chamber. If it passes in one chamber, it is sent to the other chamber for further discussion, revision, and possibly a vote. If it passes both chambers, it is sent to the President. If the President signs the bill, it becomes law.
In the House of Representatives, legislators can introduce four types of bills: a House Bill (hr), a House Joint Resolution (hjres), a House Concurrent Resolution (hconres), and a House Simple Resolution (hres). Similarly, in the Senate, legislators can introduce four types of bills: a Senate Bill (s), a Senate Joint Resolution (sjres), a Senate Concurrent Resolution (sconres), and a Senate Simple Resolution (sres). In both chambers, concurrent and simple resolutions are used for minor procedural matters and do not have the force of law. Only bills and joint resolutions require the President's signature and have the force of law if signed.
Each bill is assigned a policy area by the Congressional Research Service. By default, bills from all policy areas are included; however, the areas parameter can be used to include only bills addressing certain policy areas. The areas parameter takes a vector of strings listing the desired policy areas (e.g., areas = c("Congress", "Animals")). A complete list of policy areas and brief descriptions is available at https://www.congress.gov/help/field-values/policy-area.
If format = "data", a list containing an incidence matrix, a dataframe of legislator characteristics, and a dataframe of bill characteristics.
If format = "igraph", a bipartite igraph object composed of legislator vertices and bill vertices, each with vertex attributes.
For both formats, legislator characteristics include: BioGuide ID, full name, last name, party affiliation, and state. Bill characteristics include: bill ID, introduction date, title, policy area, status, sponsor's party, and number of co-sponsors from each party.
Tutorial: Neal, Z. P. 2022. Constructing legislative networks in R using incidentally and backbone. Connections, 42. doi:10.2478/connections-2019.026
Package: Neal, Z. P. 2022. incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints doi:10.31219/osf.io/ectms
## Not run:
D <- incidence.from.congress(session = 116, types = "s", format = "data")
D <- incidence.from.congress(session = 116, types = "s", format = "data", areas = "animals")
G <- incidence.from.congress(session = 115, types = c("hr", "hres"), format = "igraph")
## End(Not run)
incidence.from.distribution
generates a random incidence matrix with row and column sums that approximately follow beta distributions with given parameters.
incidence.from.distribution(
  R,
  C,
  P,
  rowdist = c(1, 1),
  coldist = c(1, 1),
  class = "matrix",
  narrative = TRUE
)
| Argument | Description |
|---|---|
| R | integer: number of rows |
| C | integer: number of columns |
| P | numeric: probability that a cell contains a 1 |
| rowdist | vector length 2: Row marginals will approximately follow a Beta(a,b) distribution |
| coldist | vector length 2: Column marginals will approximately follow a Beta(a,b) distribution |
| class | string: the class of the returned object, one of c("matrix", "Matrix", "igraph"). |
| narrative | boolean: TRUE if suggested text & citations should be displayed. |
An incidence matrix of class matrix or Matrix, or a bipartite graph of class igraph.
Neal, Z. P., Domagalski, R., and Sagan, B. 2021. Comparing alternatives to the fixed degree sequence model for extracting the backbone of bipartite projections. Scientific Reports, 11, 23929. doi:10.1038/s41598-021-03238-3
Neal, Z. P. 2022. incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints doi:10.31219/osf.io/ectms
I <- incidence.from.distribution(R = 100, C = 100, P = 0.1,
  rowdist = c(10000,10000), coldist = c(10000,10000))  #Constant
I <- incidence.from.distribution(R = 100, C = 100, P = 0.1,
  rowdist = c(1,1), coldist = c(1,1))  #Uniform
I <- incidence.from.distribution(R = 100, C = 100, P = 0.1,
  rowdist = c(1,10), coldist = c(1,10))  #Right-tailed
I <- incidence.from.distribution(R = 100, C = 100, P = 0.1,
  rowdist = c(10,1), coldist = c(10,1))  #Left-tailed
I <- incidence.from.distribution(R = 100, C = 100, P = 0.1,
  rowdist = c(10,10), coldist = c(10,10), narrative = TRUE)  #Normal
incidence.from.probability
generates a random incidence matrix in which each cell is filled with a 1 with a given probability.
incidence.from.probability(
  R,
  C,
  P = 0,
  constrain = TRUE,
  class = "matrix",
  narrative = FALSE
)
| Argument | Description |
|---|---|
| R | integer: number of rows |
| C | integer: number of columns |
| P | numeric: probability that a cell contains a 1; if P = 0 a probability will be chosen randomly |
| constrain | boolean: ensure that no rows or columns sum to 0 (i.e., contain all 0s) or to 1 (i.e., contain all 1s) |
| class | string: the class of the returned object, one of c("matrix", "Matrix", "igraph"). |
| narrative | boolean: TRUE if suggested text & citations should be displayed. |
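As a rough illustration of what a fixed fill rate with the constrain option means, the sketch below fills each cell independently with probability P and redraws until no row or column is all 0s or all 1s. This is one simple way to satisfy the constraint, assumed here for illustration; the manual does not say how incidence.from.probability() enforces it, and fill_matrix and max_tries are hypothetical names.

```r
# Illustrative sketch only: independent Bernoulli(P) cells with rejection sampling.
# `fill_matrix` and `max_tries` are hypothetical, not part of the package.
fill_matrix <- function(R, C, P, constrain = TRUE, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    M <- matrix(rbinom(R * C, 1, P), nrow = R, ncol = C)
    # Valid when no row or column is entirely 0s or entirely 1s
    ok <- all(rowSums(M) > 0, rowSums(M) < C, colSums(M) > 0, colSums(M) < R)
    if (!constrain || ok) return(M)
  }
  stop("no valid matrix found; try a less extreme P")
}

set.seed(42)
M <- fill_matrix(R = 10, C = 10, P = 0.5)
mean(M)  # observed fill rate, close to P for large matrices
```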
An incidence matrix of class matrix or Matrix, or a bipartite graph of class igraph.
Neal, Z. P., Domagalski, R., and Sagan, B. 2021. Comparing alternatives to the fixed degree sequence model for extracting the backbone of bipartite projections. Scientific Reports, 11, 23929. doi:10.1038/s41598-021-03238-3
Neal, Z. P. 2022. incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints doi:10.31219/osf.io/ectms
I <- incidence.from.probability(R = 10, C = 10)
I <- incidence.from.probability(R = 10, C = 10, P = .5)
I <- incidence.from.probability(R = 10, C = 10, P = .5, class = "igraph", narrative = TRUE)
incidence.from.vector
generates a random incidence matrix with given row and column sums
incidence.from.vector(R, C, class = "matrix", narrative = FALSE)
| Argument | Description |
|---|---|
| R | numeric vector: row marginal sums |
| C | numeric vector: column marginal sums |
| class | string: the class of the returned object, one of c("matrix", "Matrix", "igraph"). |
| narrative | boolean: TRUE if suggested text & citations should be displayed. |
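Not every pair of marginal vectors can be realized by a binary matrix. The sketch below checks feasibility with the Gale-Ryser condition before any matrix is generated. Whether incidence.from.vector() performs such a check internally is not stated in this manual, and marginals_feasible is a hypothetical helper written only for illustration.

```r
# Illustrative sketch: Gale-Ryser test for whether a binary matrix with row
# sums R and column sums C exists. `marginals_feasible` is a hypothetical helper.
marginals_feasible <- function(R, C) {
  if (sum(R) != sum(C)) return(FALSE)   # both margins must count the same 1s
  r <- sort(R, decreasing = TRUE)
  # For each k, the k largest row sums must fit into the columns,
  # where each column can absorb at most min(C_j, k) of them
  all(vapply(seq_along(r),
             function(k) sum(r[1:k]) <= sum(pmin(C, k)),
             logical(1)))
}

marginals_feasible(c(1, 1, 2), c(1, 1, 2))  # TRUE: the marginals used in the examples
marginals_feasible(c(3, 3, 0), c(3, 3, 0))  # FALSE: a row of three 1s needs three non-empty columns
```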
An incidence matrix of class matrix or Matrix, or a bipartite graph of class igraph.
Neal, Z. P., Domagalski, R., and Sagan, B. 2021. Comparing alternatives to the fixed degree sequence model for extracting the backbone of bipartite projections. Scientific Reports, 11, 23929. doi:10.1038/s41598-021-03238-3
Neal, Z. P. 2022. incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints doi:10.31219/osf.io/ectms
I <- incidence.from.vector(R = c(1,1,2), C = c(1,1,2))
I <- incidence.from.vector(R = c(1,1,2), C = c(1,1,2), class = "igraph", narrative = TRUE)
Functions to generate incidence matrices and bipartite graphs that have (1) a fixed fill rate, (2) given marginal sums, (3) marginal sums that follow given distributions, or (4) represent bill sponsorships in the US Congress. It can also generate an incidence matrix from an adjacency matrix, or bipartite graph from a unipartite graph, via a social process mirroring team, group, or organization formation.
Incidence matrices can be generated:
...with a fixed fill rate: incidence.from.probability().
...with given marginals: incidence.from.vector().
...with marginals that follow given distributions: incidence.from.distribution().
...from a network, by a social process mirroring team, group, or organization formation: incidence.from.adjacency().
...with a block structure or planted partition: add.blocks().
...from US Congress bill sponsorships: incidence.from.congress().