An overview of BradleyTerryScalable

The Bradley-Terry model

The Bradley-Terry model can:

  • rank a set of items

  • based on a series of pairwise comparisons

  • where each comparison results in a win/loss or a draw

The model

The Bradley-Terry probability that item \(i\) beats item \(j\) is

\[p_{ij} = \frac{\pi_i}{\pi_i + \pi_j},\]


where \(\pi_k > 0\) is the strength parameter for item \(k\), \(1 \leq k \leq K\), subject to the normalisation \(\sum_{k=1}^K \pi_k = K\).
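
As a quick illustration of this formula in R (a made-up sketch, not package code; the strength values are invented):

bt_prob <- function(pi_i, pi_j) pi_i / (pi_i + pi_j)   # P(item i beats item j)

strengths <- c(A = 2, B = 1, C = 0.5)
bt_prob(strengths["A"], strengths["C"])   # 2 / (2 + 0.5) = 0.8
outer(strengths, strengths, bt_prob)      # all pairwise probabilities at once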

The comparison graph

The comparison graph has a vertex for each item and a directed edge from \(i\) to \(j\) whenever \(i\) has beaten \(j\) at least once.

The maximum likelihood estimate (MLE) exists and is finite whenever this graph is fully connected (strongly connected, in graph-theoretic terms).
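
In practice summary() on a "btdata" object reports whether this holds (see the Examples below); for intuition, here is one way to check it by hand with the igraph package, on a made-up set of results:

library(igraph)

# Made-up results: each row records that `winner` has beaten `loser` at least once
wins_df <- data.frame(winner = c("A", "B", "C"),
                      loser  = c("B", "C", "A"))

# Directed comparison graph: an edge i -> j whenever i has beaten j
g <- graph_from_data_frame(wins_df, directed = TRUE)

# The MLE is finite when this graph is fully connected
is_connected(g, mode = "strong")
[1] TRUE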

Fitting the model

MLE

\[\pi_i^{(n+1)} = \frac{W_i}{\sum_{j=1}^K \frac{n_{ij}}{\pi_i^{(n)} + \pi_j^{(n)}}}\]

MAP

\[\pi_i^{(n+1)} = \frac{a - 1 + W_i}{b + \sum_{j=1}^K \frac{n_{ij}}{\pi_i^{(n)} + \pi_j^{(n)}}} \]


where \(W_i = \sum_{j=1}^K w_{ij}\) is the number of wins for item \(i\), \(n_{ij} = w_{ij} + w_{ji}\) is the number of comparisons between items \(i\) and \(j\), and, for the MAP (maximum a posteriori) estimate, \(a\) and \(b\) are the shape and rate parameters of a gamma-distributed prior on \(\pi\): \(p(\pi) = \prod_{i=1}^K \mathcal{G}(\pi_i; a, b)\).
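
To make the updates concrete, here is a minimal sketch of the iteration in plain R, written directly from the formulas above. It assumes a dense K x K wins matrix (wins[i, j] = number of times item i beat item j); it is an illustration only, not the package's implementation.

bt_mm <- function(wins, a = 1, b = 0, maxit = 100, tol = 1e-8) {
  K <- nrow(wins)
  W <- rowSums(wins)        # W_i: total number of wins for item i
  n <- wins + t(wins)       # n_ij: number of comparisons between items i and j
  p <- rep(1, K)            # pi^(0): start from equal strengths
  for (iter in seq_len(maxit)) {
    denom <- vapply(seq_len(K),
                    function(i) sum(n[i, -i] / (p[i] + p[-i])),
                    numeric(1))
    p_new <- (a - 1 + W) / (b + denom)
    if (a == 1) p_new <- K * p_new / sum(p_new)   # MLE case: renormalise so sum(pi) = K
    if (max(abs(p_new - p)) < tol) { p <- p_new; break }
    p <- p_new
  }
  p
}

For the MLE (a = 1, b = 0) the estimate is only well-defined when the comparison graph is fully connected, which is why btfit() fits the MLE separately on each fully-connected component; the MAP estimate (a > 1, b > 0) is well-defined even when it is not.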

BradleyTerryScalable

Aims

  • Fit the Bradley-Terry model to large and sparse data sets

  • Be fast enough for data sets of that size

  • Deal with cases where the comparison graph is not fully connected

  • Be easy to use, both in interface and workflow

Workflow: data

  • btdata(x) to create object of class "btdata"

    • x can be a data frame, graph, matrix or contingency table

    • may need to call codes_to_counts() first

    • summary(btdata)

    • select_components(btdata, subset) (sketched below)
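
For example, a sketch of select_components() using the toy_btdata object built in the Examples below (its MLE fit reports fully-connected components named "2" and "3"; we assume here that subset can be given as a character vector of component names):

# Keep only the two larger fully-connected components before fitting
toy_btdata_sub <- select_components(toy_btdata, subset = c("2", "3"))
summary(toy_btdata_sub)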

Workflow: fit

  • btfit(btdata, a) to fit model and create object of class "btfit"

    • If a = 1, finds MLE on each fully connected component

    • If a > 1, finds the MAP estimate of \(\pi\)

    • Methods for btfit object:

      • summary, fitted, coef, vcov, simulate (sketched below)

    • btprob(object) for Bradley-Terry probabilities \(p_{ij}\)
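
For instance, a sketch of the accessor methods on the toy_fit_MLE object created in the Examples below (output omitted):

coef(toy_fit_MLE)     # item parameter estimates, per fully-connected component
fitted(toy_fit_MLE)   # fitted values for each pair of compared items
vcov(toy_fit_MLE)     # variance-covariance of the estimates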

Examples

library(BradleyTerryScalable)
citations
              citing
cited          Biometrika Comm Statist JASA JRSS-B
  Biometrika          714          730  498    221
  Comm Statist         33          425   68     17
  JASA                320          813 1072    142
  JRSS-B              284          276  325    188
citations_btdata <- btdata(citations)
summary(citations_btdata) 
Number of items: 4 
Density of wins matrix: 1 
Fully-connected: TRUE 

toy_data
   player1 player2 outcome
1      Cyd     Amy      W1
2      Amy     Ben       D
3      Ben     Eve      W2
4      Cyd     Dan      W2
5      Ben     Dan       D
6      Dan     Eve      W2
7      Fin     Eve      W2
8      Fin     Gal      W2
9      Fin     Han      W2
10     Eve     Gal      W1
11     Fin     Gal       D
12     Han     Gal      W1
13     Han     Gal      W2
14     Amy     Dan      W1
15     Cyd     Amy      W1
16     Ben     Dan       D
17     Dan     Amy      W2

toy_btdata <- toy_data |>
  codes_to_counts(c("W1", "W2", "D")) |>
  btdata()
summary(toy_btdata)
Number of items: 8 
Density of wins matrix: 0.25 
Fully-connected: FALSE 
Number of fully-connected components: 3 
Summary of fully-connected components: 
  Component size Freq
1              1    1
2              3    1
3              4    1

toy_fit_MAP <- btfit(toy_btdata, a = 1.1)
summary(toy_fit_MAP)  
$call
btfit(btdata = toy_btdata, a = 1.1)

$item_summary
# A tibble: 8 × 3
  component    item  estimate
  <chr>        <chr>    <dbl>
1 full_dataset Eve     1.90  
2 full_dataset Cyd     0.472 
3 full_dataset Han     0.245 
4 full_dataset Amy    -0.0766
5 full_dataset Gal    -0.102 
6 full_dataset Ben    -0.423 
7 full_dataset Dan    -0.536 
8 full_dataset Fin    -1.48  

$component_summary
# A tibble: 1 × 4
  component    num_items iters converged
  <chr>            <int> <int> <lgl>    
1 full_dataset         8   101 TRUE     

toy_fit_MLE <- btfit(toy_btdata, a = 1)
summary(toy_fit_MLE, SE = TRUE) 
$call
btfit(btdata = toy_btdata, a = 1)

$item_summary
# A tibble: 7 × 4
  component item  estimate    SE
  <chr>     <chr>    <dbl> <dbl>
1 2         Han     0.696  0.911
2 2         Gal     0.413  0.768
3 2         Fin    -1.11   1.05 
4 3         Cyd     0.592  0.991
5 3         Amy     0.0325 0.699
6 3         Ben    -0.243  0.944
7 3         Dan    -0.382  0.712

$component_summary
# A tibble: 2 × 4
  component num_items iters converged
  <chr>         <int> <int> <lgl>    
1 2                 3     6 TRUE     
2 3                 4    10 TRUE     

btprob(object)

Gives the Bradley-Terry probabilities \(p_{ij} = \frac{\pi_i}{\pi_i + \pi_j}\)

citations |>
  btdata() |>
  btfit(1) |>
  btprob() |>
  round(3)
4 x 4 sparse Matrix of class "dgCMatrix"
              citing
cited          JRSS-B Biometrika  JASA Comm Statist
  JRSS-B        .          0.567 0.679        0.962
  Biometrika    0.433      .     0.618        0.950
  JASA          0.321      0.382 .            0.922
  Comm Statist  0.038      0.050 0.078        .    

btprob(object, as_df = TRUE)

Returns the same Bradley-Terry probabilities as a data frame, one row per pair of items

citations |>
  btdata() |>
  btfit(1) |>
  btprob(as_df = TRUE)
# A tibble: 6 × 5
  component    cited      citing       prob1wins prob2wins
  <chr>        <chr>      <chr>            <dbl>     <dbl>
1 full_dataset JRSS-B     Biometrika       0.567    0.433 
2 full_dataset JRSS-B     JASA             0.679    0.321 
3 full_dataset Biometrika JASA             0.618    0.382 
4 full_dataset JRSS-B     Comm Statist     0.962    0.0384
5 full_dataset Biometrika Comm Statist     0.950    0.0498
6 full_dataset JASA       Comm Statist     0.922    0.0780

A real-world example

comp_cites
# A tibble: 415,496 × 3
   cited_pdpass citing_pdpass count
   <chr>        <chr>         <dbl>
 1 10032645     10030580          2
 2 10083166     10030580          4
 3 10170329     10030580          2
 4 10191810     10030580          2
 5 10223803     10030580          2
 6 10254226     10030580          1
 7 11490383     10030580          2
 8 11720482     10030580          2
 9 11811663     10030580          1
10 11862620     10030580          4
# … with 415,486 more rows

btdata: timing

system.time(comp_cites_data <- btdata(comp_cites))
   user  system elapsed 
  1.577   0.092   1.671 
summary(comp_cites_data)
Number of items: 27137 
Density of wins matrix: 0.0005642131 
Fully-connected: FALSE 
Number of fully-connected components: 11297 
Summary of fully-connected components: 
  Component size  Freq
1              1 11285
2              2    10
3              3     1
4          15829     1

btfit: timing

system.time(comp_cites_fit_mle <- btfit(comp_cites_data, 1))
   user  system elapsed 
  6.073   0.868   7.115 
system.time(comp_cites_fit_map <- btfit(comp_cites_data, 1.1)) 
   user  system elapsed 
 15.562   2.234  17.968 

btfit() timing simulations

Time in seconds to run btfit() given number of items and win matrix densities 0.5, 0.1, 0.01 and 0.001.

To improve

  • Allow additional algorithms (would involve rethinking arguments to btfit)

  • Leverage updates in the packages it depends on

  • Additional usability improvements

  • Write tests

  • Fix bugs

  • Return to CRAN

Thank you!

I look forward to your questions