An overview of BradleyTerryScalable

The Bradley-Terry model

The Bradley-Terry model can:

  • rank a set of items

  • based on a series of pairwise comparisons

  • where each comparison results in a win/loss or a draw

The model

The Bradley-Terry probability that item \(i\) beats item \(j\) is

\[p_{ij} = \frac{\pi_i}{\pi_i + \pi_j},\]


where \(\pi_k > 0\) is the strength parameter for item \(k\), \(1 \leq k \leq K\), subject to the normalisation \(\sum_{k=1}^K \pi_k = K\).
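
As a quick illustration of this formula in R (a made-up sketch, not package code; the strength values are invented):

bt_prob <- function(pi_i, pi_j) pi_i / (pi_i + pi_j)   # P(item i beats item j)

strengths <- c(A = 2, B = 1, C = 0.5)
bt_prob(strengths["A"], strengths["C"])   # 2 / (2 + 0.5) = 0.8
outer(strengths, strengths, bt_prob)      # all pairwise probabilities at once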

The comparison graph

The comparison graph has a vertex for each item and a directed edge from \(i\) to \(j\) whenever \(i\) has beaten \(j\) at least once.

The maximum likelihood estimate (MLE) exists and is finite whenever this graph is fully connected (strongly connected, in graph-theoretic terms).
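
In practice summary() on a "btdata" object reports whether this holds (see the Examples below); for intuition, here is one way to check it by hand with the igraph package, on a made-up set of results:

library(igraph)

# Made-up results: each row records that `winner` has beaten `loser` at least once
wins_df <- data.frame(winner = c("A", "B", "C"),
                      loser  = c("B", "C", "A"))

# Directed comparison graph: an edge i -> j whenever i has beaten j
g <- graph_from_data_frame(wins_df, directed = TRUE)

# The MLE is finite when this graph is fully connected
is_connected(g, mode = "strong")
[1] TRUE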

Fitting the model

MLE

\[\pi_i^{(n+1)} = \frac{W_i}{\sum_{j=1}^K \frac{n_{ij}}{\pi_i^{(n)} + \pi_j^{(n)}}}\]

MAP

\[\pi_i^{(n+1)} = \frac{a - 1 + W_i}{b + \sum_{j=1}^K \frac{n_{ij}}{\pi_i^{(n)} + \pi_j^{(n)}}} \]


where \(W_i = \sum_{j=1}^K w_{ij}\) is the number of wins for item \(i\), \(n_{ij} = w_{ij} + w_{ji}\) is the number of comparisons between items \(i\) and \(j\), and, for the MAP (maximum a posteriori) estimate, \(a\) and \(b\) are the shape and rate parameters of a gamma-distributed prior on \(\pi\): \(p(\pi) = \prod_{i=1}^K \mathcal{G}(\pi_i; a, b)\).
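
To make the updates concrete, here is a minimal sketch of the iteration in plain R, written directly from the formulas above. It assumes a dense K x K wins matrix (wins[i, j] = number of times item i beat item j); it is an illustration only, not the package's implementation.

bt_mm <- function(wins, a = 1, b = 0, maxit = 100, tol = 1e-8) {
  K <- nrow(wins)
  W <- rowSums(wins)        # W_i: total number of wins for item i
  n <- wins + t(wins)       # n_ij: number of comparisons between items i and j
  p <- rep(1, K)            # pi^(0): start from equal strengths
  for (iter in seq_len(maxit)) {
    denom <- vapply(seq_len(K),
                    function(i) sum(n[i, -i] / (p[i] + p[-i])),
                    numeric(1))
    p_new <- (a - 1 + W) / (b + denom)
    if (a == 1) p_new <- K * p_new / sum(p_new)   # MLE case: renormalise so sum(pi) = K
    if (max(abs(p_new - p)) < tol) { p <- p_new; break }
    p <- p_new
  }
  p
}

For the MLE (a = 1, b = 0) the estimate is only well-defined when the comparison graph is fully connected, which is why btfit() fits the MLE separately on each fully-connected component; the MAP estimate (a > 1, b > 0) is well-defined even when it is not.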

BradleyTerryScalable

Aims

  • Fit the Bradley-Terry model to large and sparse data sets

  • Be fast enough for data sets of that size

  • Deal with cases where the comparison graph is not fully connected

  • Be easy to use, both in interface and workflow

Workflow: data

  • btdata(x) to create object of class "btdata"

    • x can be a data frame, graph, matrix or contingency table

    • may need to call codes_to_counts() first

    • summary(btdata)

    • select_components(btdata, subset) (sketched below)
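
For example, a sketch of select_components() using the toy_btdata object built in the Examples below (its MLE fit reports fully-connected components named "2" and "3"; we assume here that subset can be given as a character vector of component names):

# Keep only the two larger fully-connected components before fitting
toy_btdata_sub <- select_components(toy_btdata, subset = c("2", "3"))
summary(toy_btdata_sub)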

Workflow: fit

  • btfit(btdata, a) to fit model and create object of class "btfit"

    • If a = 1, finds MLE on each fully connected component

    • If a > 1, finds the MAP estimate of \(\pi\)

    • Methods for btfit object:

      • summary, fitted, coef, vcov, simulate (sketched below)

    • btprob(object) for Bradley-Terry probabilities \(p_{ij}\)
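
For instance, a sketch of the accessor methods on the toy_fit_MLE object created in the Examples below (output omitted):

coef(toy_fit_MLE)     # item parameter estimates, per fully-connected component
fitted(toy_fit_MLE)   # fitted values for each pair of compared items
vcov(toy_fit_MLE)     # variance-covariance of the estimates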

Examples

library(BradleyTerryScalable)
citations
              citing
cited          Biometrika Comm Statist JASA JRSS-B
  Biometrika          714          730  498    221
  Comm Statist         33          425   68     17
  JASA                320          813 1072    142
  JRSS-B              284          276  325    188
citations_btdata <- btdata(citations)
summary(citations_btdata) 
Number of items: 4 
Density of wins matrix: 1 
Fully-connected: TRUE 

toy_data
   player1 player2 outcome
1      Cyd     Amy      W1
2      Amy     Ben       D
3      Ben     Eve      W2
4      Cyd     Dan      W2
5      Ben     Dan       D
6      Dan     Eve      W2
7      Fin     Eve      W2
8      Fin     Gal      W2
9      Fin     Han      W2
10     Eve     Gal      W1
11     Fin     Gal       D
12     Han     Gal      W1
13     Han     Gal      W2
14     Amy     Dan      W1
15     Cyd     Amy      W1
16     Ben     Dan       D
17     Dan     Amy      W2

toy_btdata <- toy_data |>
  codes_to_counts(c("W1", "W2", "D")) |>
  btdata()
summary(toy_btdata)
Number of items: 8 
Density of wins matrix: 0.25 
Fully-connected: FALSE 
Number of fully-connected components: 3 
Summary of fully-connected components: 
  Component size Freq
1              1    1
2              3    1
3              4    1

toy_fit_MAP <- btfit(toy_btdata, a = 1.1)
summary(toy_fit_MAP)  
$call
btfit(btdata = toy_btdata, a = 1.1)

$item_summary
# A tibble: 8 × 3
  component    item  estimate
  <chr>        <chr>    <dbl>
1 full_dataset Eve     1.90  
2 full_dataset Cyd     0.472 
3 full_dataset Han     0.245 
4 full_dataset Amy    -0.0766
5 full_dataset Gal    -0.102 
6 full_dataset Ben    -0.423 
7 full_dataset Dan    -0.536 
8 full_dataset Fin    -1.48  

$component_summary
# A tibble: 1 × 4
  component    num_items iters converged
  <chr>            <int> <int> <lgl>    
1 full_dataset         8   101 TRUE     

toy_fit_MLE <- btfit(toy_btdata, a = 1)
summary(toy_fit_MLE, SE = TRUE) 
$call
btfit(btdata = toy_btdata, a = 1)

$item_summary
# A tibble: 7 × 4
  component item  estimate    SE
  <chr>     <chr>    <dbl> <dbl>
1 2         Han     0.696  0.911
2 2         Gal     0.413  0.768
3 2         Fin    -1.11   1.05 
4 3         Cyd     0.592  0.991
5 3         Amy     0.0325 0.699
6 3         Ben    -0.243  0.944
7 3         Dan    -0.382  0.712

$component_summary
# A tibble: 2 × 4
  component num_items iters converged
  <chr>         <int> <int> <lgl>    
1 2                 3     6 TRUE     
2 3                 4    10 TRUE     

btprob(object)

Gives the Bradley-Terry probabilities \(p_{ij} = \frac{\pi_i}{\pi_i + \pi_j}\)

citations |>
  btdata() |>
  btfit(1) |>
  btprob() |>
  round(3)
4 x 4 sparse Matrix of class "dgCMatrix"
              citing
cited          JRSS-B Biometrika  JASA Comm Statist
  JRSS-B        .          0.567 0.679        0.962
  Biometrika    0.433      .     0.618        0.950
  JASA          0.321      0.382 .            0.922
  Comm Statist  0.038      0.050 0.078        .    

btprob(object, as_df = TRUE)

Returns the same Bradley-Terry probabilities as a data frame, one row per pair of items

citations |>
  btdata() |>
  btfit(1) |>
  btprob(as_df = TRUE)
# A tibble: 6 × 5
  component    cited      citing       prob1wins prob2wins
  <chr>        <chr>      <chr>            <dbl>     <dbl>
1 full_dataset JRSS-B     Biometrika       0.567    0.433 
2 full_dataset JRSS-B     JASA             0.679    0.321 
3 full_dataset Biometrika JASA             0.618    0.382 
4 full_dataset JRSS-B     Comm Statist     0.962    0.0384
5 full_dataset Biometrika Comm Statist     0.950    0.0498
6 full_dataset JASA       Comm Statist     0.922    0.0780

A real-world example

comp_cites
# A tibble: 415,496 × 3
   cited_pdpass citing_pdpass count
   <chr>        <chr>         <dbl>
 1 10032645     10030580          2
 2 10083166     10030580          4
 3 10170329     10030580          2
 4 10191810     10030580          2
 5 10223803     10030580          2
 6 10254226     10030580          1
 7 11490383     10030580          2
 8 11720482     10030580          2
 9 11811663     10030580          1
10 11862620     10030580          4
# … with 415,486 more rows

btdata: timing

system.time(comp_cites_data <- btdata(comp_cites))
   user  system elapsed 
  1.577   0.092   1.671 
summary(comp_cites_data)
Number of items: 27137 
Density of wins matrix: 0.0005642131 
Fully-connected: FALSE 
Number of fully-connected components: 11297 
Summary of fully-connected components: 
  Component size  Freq
1              1 11285
2              2    10
3              3     1
4          15829     1

btfit: timing

system.time(comp_cites_fit_mle <- btfit(comp_cites_data, 1))
   user  system elapsed 
  6.073   0.868   7.115 
system.time(comp_cites_fit_map <- btfit(comp_cites_data, 1.1)) 
   user  system elapsed 
 15.562   2.234  17.968 

btfit() timing simulations

Time in seconds to run btfit() given number of items and win matrix densities 0.5, 0.1, 0.01 and 0.001.

To improve

  • Allow additional algorithms (would involve rethinking arguments to btfit)

  • Leverage updates in the packages it depends on

  • Additional usability improvements

  • Write tests

  • Fix bugs

  • Return to CRAN

Thank you!

I look forward to your questions