Methodology

How we score accounts

Every OSIRIS rating is built from individually scored posts. This document explains what we score, how we classify posts, how scores are calculated, and what the grades mean. The methodology is published in full and updated whenever the approach changes.

The Content Integrity Score (CIS)

The CIS is the primary grade displayed on every scorecard. It is a letter grade from A to F, derived from a weighted average of post accuracy ratings across the reviewed sample. It measures overall content reliability, combining factual accuracy with source transparency and the presence of adequate context.

The CIS is separate from the Factuality Score. A post can have accurate factual claims while still lacking sourcing or context. The CIS captures accuracy together with sourcing and context. The Factuality Score captures only raw accuracy.

Grade | Score range | Meaning
A     | 90–100      | Very high reliability. Strong sourcing, high accuracy, appropriate context.
B     | 75–89       | Good reliability. Generally accurate with minor sourcing gaps or occasional contextual shortfalls.
C     | 55–74       | Mixed reliability. Noticeable accuracy issues, weak sourcing, or consistent framing problems.
D     | 35–54       | Poor reliability. Frequent inaccuracies or systematic sourcing failures.
F     | 0–34        | Very poor reliability. Documented disinformation, fabricated content, or consistent falsehoods.

Grades may carry + or – modifiers within each band.
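
As an illustration, the bands in the grade table can be expressed as a simple lookup. This is a minimal sketch, not the project's implementation: the function name `letter_grade` is hypothetical, and the treatment of fractional scores (for example, 89.5 falling in band B) is an assumption, since the table lists whole-number ranges only. The + and – modifiers are left out because their thresholds are not published here.

```python
# Band floors taken from the grade table; names here are illustrative only.
GRADE_BANDS = [(90, "A"), (75, "B"), (55, "C"), (35, "D"), (0, "F")]


def letter_grade(score: float) -> str:
    """Return the letter band for a score on the 0-100 scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Bands are listed from highest floor to lowest, so the first match wins.
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    raise AssertionError("unreachable: the F band has a floor of 0")
```

For example, `letter_grade(82)` returns `"B"` and `letter_grade(34)` returns `"F"`.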

Post accuracy ratings

Each reviewed post is given one of the following accuracy ratings:

  • Accurate — Factually correct and appropriately sourced.
  • Mostly Accurate — Correct in substance with minor gaps or caveats needed.
  • Mixed Content — Factual data combined with personal interpretation or editorial framing presented as established fact.
  • Inaccurate — Contains significant factual errors.
  • False — Demonstrably false or deliberate disinformation.
  • Unverifiable — Claims cannot currently be independently verified.
  • Pending — Awaiting further information before a rating can be assigned.

Pending and Unverifiable posts are excluded from the weighted average. They do not count for or against the account. The scorecard shows how many posts in the sample fall into each category so readers can judge how thoroughly an account has been assessed.
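
The exclusion rule can be sketched in code. The point values assigned to each rating below are hypothetical placeholders chosen only to make the example run; this document does not publish the actual values. The function name `summarise` and the `(rating, weight)` input shape are likewise assumptions.

```python
from collections import Counter

# HYPOTHETICAL point values per rating; the real values are not published here.
RATING_POINTS = {
    "Accurate": 100.0,
    "Mostly Accurate": 80.0,
    "Mixed Content": 50.0,
    "Inaccurate": 20.0,
    "False": 0.0,
}

# These two ratings are counted on the scorecard but never enter the average.
EXCLUDED = {"Unverifiable", "Pending"}


def summarise(posts):
    """Take (rating, weight) pairs; return (weighted score or None, counts per rating)."""
    counts = Counter(rating for rating, _ in posts)
    scored = [(RATING_POINTS[r], w) for r, w in posts if r not in EXCLUDED]
    total_weight = sum(w for _, w in scored)
    if total_weight == 0:
        # Nothing scoreable yet: report the counts but no score.
        return None, counts
    score = sum(points * w for points, w in scored) / total_weight
    return score, counts
```

With these placeholder values, `summarise([("Accurate", 1.0), ("False", 1.0), ("Pending", 1.0)])` returns a score of `50.0`: the Pending post appears in the counts but does not move the average.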

Post categories

Posts are categorised by type before scoring. This allows accuracy to be assessed in context — a battlefield update and an opinion post are not the same kind of claim.

  • Factual reporting — direct claims about events
  • Analysis — interpretations of events or data
  • Media posts — imagery or video with sourcing assessed separately
  • Mapping or data visualisation
  • Commentary or opinion — explicitly editorial posts
  • Repost or quote — accuracy of the original plus any framing applied
  • Breaking news — assessed with appropriate recency weighting

Source quality and media integrity

Where an account sources its claims, the quality of that sourcing is assessed. Primary sources, verifiable official records, and established news organisations carry more weight than anonymous sources or unattributed claims.

Media integrity is assessed separately from textual accuracy. Imagery and video are checked for the following:

  • AI-generated content — imagery or video produced or substantially altered by generative tools
  • Archival media — footage or images presented as contemporary when they are not
  • Misattributed media — genuine media from a different event or location
  • Verified media — confirmed original and contextually accurate
  • Unverified media — cannot confirm provenance at time of review

Position and stance tags

Editorial positioning is recorded separately from accuracy. Tags are applied to reflect an account's observable stance rather than its claimed neutrality. These tags describe editorial direction — they do not affect the CIS grade.

Common position tags include regional or national alignment (Pro-Ukraine, Pro-Russia, Pro-Israel, Pro-Palestine, Western-aligned, and so on) and broader editorial tags (Anti-NATO, Anti-West, Government-aligned).

Stance Signals distinguish between factual accounts, editorial accounts, and accounts that appear to advocate for a position:

  • Factual — Posts are primarily factual with minimal observable editorial positioning.
  • Editorial — Posts include regular interpretation or commentary presented alongside factual content.
  • Advocacy — The account clearly promotes a position or agenda.

Recency weighting

More recent posts are weighted more heavily than older ones. This reflects the fact that account behaviour can change over time — an account that was accurate two years ago may not be now, and vice versa. When an account is re-reviewed with a new post sample, the score updates to reflect the current weighting.
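
One common way to weight recent posts more heavily is exponential decay by post age. The sketch below assumes that approach purely for illustration; this document does not specify the weighting function, and the 180-day half-life and the name `recency_weight` are hypothetical.

```python
from datetime import date

# HYPOTHETICAL half-life: a post this many days old counts half as much as a new one.
HALF_LIFE_DAYS = 180.0


def recency_weight(posted: date, reviewed: date,
                   half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Return a weight in (0, 1] that halves every `half_life_days` of post age."""
    # Posts dated after the review date are treated as age zero.
    age_days = max((reviewed - posted).days, 0)
    return 0.5 ** (age_days / half_life_days)
```

Under this assumption a post published on the review date has weight 1.0, and the weight shrinks smoothly as posts age, so a re-review with a newer sample shifts the score toward current behaviour.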

Sample size and confidence

Every scorecard shows the number of posts the rating is based on, so readers can calibrate how much weight to give it. A score based on a small sample carries less confidence than one based on a large sample. Accounts are reviewed periodically and scores are updated as the sample grows.
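
To see why sample size matters, consider the standard error of a mean, which shrinks roughly with the square root of the number of scored posts. The sketch below is a general statistical illustration, not a description of how OSIRIS computes confidence; the function name and the per-post point values used in the example are hypothetical.

```python
import math
import statistics


def standard_error(points):
    """Standard error of the mean of per-post point values (unweighted)."""
    if len(points) < 2:
        raise ValueError("need at least two scored posts")
    # Sample standard deviation divided by the square root of the sample size.
    return statistics.stdev(points) / math.sqrt(len(points))


# Same mix of results, different sample sizes (HYPOTHETICAL point values).
small_sample = [100.0, 0.0, 100.0, 0.0]
large_sample = small_sample * 25  # 100 posts with the same 50/50 mix
```

Here `standard_error(large_sample)` is several times smaller than `standard_error(small_sample)`, even though both samples have the same average, which is the sense in which a larger sample supports a more confident rating.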

What OSIRIS does not do

OSIRIS does not offer accounts a right of reply before publication. Accounts are not notified when they are being rated. Advertisers cannot influence scores. No account can pay to be reviewed, removed, or re-reviewed.

The methodology is updated as the project matures. Changes are documented here. Transparency about those changes is a commitment of this project.