Data Sources and Update Timing

Where Macfax data comes from and how the pipeline works

Macfax collects box-score data for every Division I men's basketball game from public NCAA and ESPN data sources. That raw data flows through a multi-stage computation pipeline — from possession estimates and four-factor aggregation through iterative opponent adjustment, resume metrics, and player evaluation — before ratings are published. This page explains the pipeline, what data is and is not available, and the current update cadence.

What It Measures

This page covers data provenance and process, not a specific metric. It explains what inputs power Macfax, how games are ingested and validated, what stages the computation pipeline runs through, and when and how often ratings are updated.

Why It Matters

Analytics are only as reliable as the data and processes behind them. Understanding where data comes from, what gets included and excluded, and how recently ratings were last computed helps users interpret Macfax numbers with appropriate confidence — and know when to treat a number as stable versus provisional.

How to Interpret

Each metric on Macfax reflects the most recently completed pipeline run. Ratings do not update in real time, so a game played very recently may not yet be incorporated. During tournament stretches, the pipeline is typically run more frequently. The computation pipeline runs in sequential stages, so all metrics on the site reflect the same data snapshot from the most recent update.

Technical Notes

  • Box score data is collected from public NCAA data as the primary source, with ESPN used as a secondary source when the primary source is unavailable for a given game.
  • Team name matching uses fuzzy matching to reconcile naming variations between data sources. Canonical team identifiers are maintained across all external sources.
  • The pipeline runs in sequential stages: game ingestion → raw four-factor aggregation → national averages → iterative opponent adjustment → adjusted four factors and FFI → resume metrics (NET, SOR, SOS, WAB) → player evaluation. Each stage depends on the output of prior stages.
  • Possessions are estimated from the box score with the standard formula: FGA − OREB + TOV + 0.475 × FTA. The 0.475 multiplier discounts free-throw attempts that do not end a possession, such as and-one plays and technical fouls.
  • Opponent adjustment runs iteratively — each team's ratings depend on their opponents' ratings, which depend on their opponents' ratings, and so on. The process runs until ratings converge.
  • AP Poll Week 6 rankings are loaded separately and incorporated as a reference signal in applicable metrics.
  • NCAA NET Rankings are fetched from the NCAA's published data and displayed for reference. Macfax does not compute or control the NET.
  • Tournament bracket information (seeds and regions) is loaded separately when the selection committee releases the bracket.
  • All ingestion and computation passes are idempotent — re-running the pipeline for a given day does not double-count games or corrupt prior results.
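
To make the possession formula and the idea of iterative opponent adjustment concrete, here is a minimal Python sketch. The possession estimate follows the formula quoted above. The adjustment loop is only an illustration: it assumes a simple multiplicative scheme (scale each game's efficiency by the national average divided by the opponent's current rating, then repeat until nothing moves), which is not confirmed to be the method Macfax uses. The function names, the tolerance, and the iteration cap are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

FTA_WEIGHT = 0.475  # free-throw multiplier quoted in the notes above


def estimate_possessions(fga, oreb, tov, fta):
    """Box-score possession estimate: FGA - OREB + TOV + 0.475 * FTA."""
    return fga - oreb + tov + FTA_WEIGHT * fta


def efficiency(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions


def adjust_for_opponents(games, tol=1e-6, max_iter=500):
    """Illustrative opponent adjustment (assumed scheme, not Macfax's).

    games: iterable of (team, opponent, team_eff, opp_eff), one row per game,
    where each *_eff is that side's points per 100 possessions in the game.
    Returns (adjusted_offense, adjusted_defense) dicts keyed by team.
    """
    scored = defaultdict(list)   # team -> [(opponent, efficiency scored)]
    allowed = defaultdict(list)  # team -> [(opponent, efficiency allowed)]
    for team, opp, team_eff, opp_eff in games:
        scored[team].append((opp, team_eff))
        scored[opp].append((team, opp_eff))
        allowed[team].append((opp, opp_eff))
        allowed[opp].append((team, team_eff))

    # National average efficiency across every team-game.
    avg = mean(eff for rows in scored.values() for _, eff in rows)
    adj_off = {team: avg for team in scored}
    adj_def = {team: avg for team in scored}

    for _ in range(max_iter):
        # Offense is re-rated against the opponents' current defensive ratings.
        new_off = {
            team: mean(eff * avg / adj_def[opp] for opp, eff in rows)
            for team, rows in scored.items()
        }
        # Defense is then re-rated against the freshly updated offenses.
        new_def = {
            team: mean(eff * avg / new_off[opp] for opp, eff in rows)
            for team, rows in allowed.items()
        }
        delta = max(
            max(abs(new_off[t] - adj_off[t]), abs(new_def[t] - adj_def[t]))
            for t in adj_off
        )
        adj_off, adj_def = new_off, new_def
        if delta < tol:  # ratings stopped moving: treat as converged
            break
    return adj_off, adj_def
```

For example, estimate_possessions(60, 10, 12, 20) gives 71.5 possessions. The iteration cap is a safety net so the loop always terminates even if a schedule is too sparse for the ratings to settle.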

Known Limitations

  • Macfax does not collect injury data, roster availability, or lineup changes. The pipeline only knows what happened on the court in completed games.
  • Play-by-play data beyond scoring sequences is not currently stored. Box-score-level efficiency calculations are estimates; possession counts from play-by-play would be more precise.
  • Data from exhibition games and scrimmages is excluded from ratings.
  • Forfeit and administrative results may be handled differently from normal game results.
  • Source data occasionally contains errors, such as incorrect scores, missing players, or misidentified teams. Once the source corrects its data, the correction is picked up on the next pipeline run.
  • Ratings do not update in real time. They reflect the most recently completed pipeline run. Very recently completed games may not be incorporated yet.
  • Non-Division I opponents are excluded from all rating computations. Games against non-D1 opponents do not count toward adjusted ratings, WAB, or SOR.
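
The two exclusion rules in this list (exhibitions and scrimmages, and games involving a non-Division I opponent) amount to a filter applied before any rating is computed. The sketch below shows one way such a filter could look. The Game record, its field names, and the rating_eligible helper are hypothetical and are not taken from the Macfax codebase. Forfeits and administrative results are left out because their handling is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """Hypothetical game record; field names are assumptions."""
    game_id: str
    home: str
    away: str
    is_exhibition: bool = False  # covers exhibitions and scrimmages


def rating_eligible(game, d1_teams):
    """Return True if the game should count toward rating computations.

    d1_teams: set of canonical Division I team identifiers.
    """
    if game.is_exhibition:
        return False
    # Both sides must be Division I for the game to count.
    return game.home in d1_teams and game.away in d1_teams


def filter_games(games, d1_teams):
    """Keep only the games that pass the eligibility check."""
    return [g for g in games if rating_eligible(g, d1_teams)]
```

Keeping the check in one function means every downstream stage (adjusted ratings, WAB, SOR) sees the same set of games.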

Last updated: 2025-11 · Version 2.1