
Accuracy & Confidence

How SportsReflector Scores Athletic Technique

Most AI coaching apps publish marketing claims. We publish numbers. This page documents our pose estimation accuracy rates, form score confidence intervals, scoring weight breakdowns, and validation methodology — the same data we use internally to evaluate model performance.

Top-Line Accuracy Metrics

These figures represent performance on our held-out test set, not training data. All accuracy measurements use standard computer vision benchmarks (PCKh@0.5 for pose estimation). Repeatability and inter-rater agreement are measured independently.

Landmark Detection Accuracy (avg): 94.4%. PCKh@0.5 on held-out test set (n=12,400 frames).
Form Score Repeatability: ±3.0 pts. Same video analyzed 10× across 500 clips; SD of scores.
Inter-Rater Agreement (AI vs. expert coach): 87.3%. Cohen's κ = 0.81 across 1,200 scored movements.

Full System Performance Metrics

All metrics are measured on held-out data not seen during training. Validation datasets include proprietary sport-specific footage collected under controlled and real-world conditions.

| Metric | Value | Measurement Method |
|---|---|---|
| Landmark Detection Accuracy (avg) | 94.4% | PCKh@0.5 on held-out test set (n=12,400 frames) |
| Form Score Repeatability | ±3.0 pts | Same video analyzed 10× across 500 clips; SD of scores |
| Inter-Rater Agreement (AI vs. expert coach) | 87.3% | Cohen's κ = 0.81 across 1,200 scored movements |
| False Positive Rate (injury risk flags) | 6.2% | Validated against certified PT assessments (n=340) |
| False Negative Rate (injury risk flags) | 8.9% | Validated against certified PT assessments (n=340) |
| Latency (live analysis mode) | 41 ms avg | iPhone 14 Pro, 30fps, measured over 10,000 frames |
| Minimum Recommended Resolution | 720p (1280×720) | Below this threshold, wrist/ankle accuracy drops >8% |
| Minimum Recommended Frame Rate | 30fps (60fps for fast sports) | Boxing, tennis serve, golf swing require 60fps for full accuracy |
| Model Parameters | ~6.2M | Optimized MobileNet-based architecture for on-device inference |
| Training Dataset Size | ~480K annotated frames | Combination of public datasets (COCO, MPII) and proprietary sport-specific data |
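
The resolution and frame-rate minimums above could be checked before a clip is analyzed. The following is a minimal sketch, not SportsReflector's actual input validation: the function name `capture_warnings`, the orientation-agnostic resolution test, and the lowercase movement labels are all assumptions made for illustration. Only the thresholds (720p, 30fps, 60fps for boxing, tennis serve, and golf swing) come from the table.

```python
# Hypothetical pre-flight check built from the published minimums.
MIN_SHORT_SIDE = 720    # 720p, short side of the frame in pixels
MIN_LONG_SIDE = 1280    # 720p, long side of the frame in pixels
# Fast movements named in the frame-rate row of the table.
FAST_MOVEMENTS = {"boxing", "tennis serve", "golf swing"}


def capture_warnings(width, height, fps, movement):
    """Return a list of warnings for a clip that falls below the minimums."""
    warnings = []
    # Assumption: portrait and landscape clips are treated alike.
    if min(width, height) < MIN_SHORT_SIDE or max(width, height) < MIN_LONG_SIDE:
        warnings.append("resolution below 720p (1280x720)")
    required_fps = 60 if movement.lower() in FAST_MOVEMENTS else 30
    if fps < required_fps:
        warnings.append(f"frame rate below {required_fps}fps")
    return warnings


golf_at_30fps = capture_warnings(1920, 1080, 30, "Golf Swing")
squat_at_720p = capture_warnings(1280, 720, 30, "back squat")
```

Here `golf_at_30fps` contains a single frame-rate warning, while `squat_at_720p` is empty because the clip meets both minimums.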

Per-Exercise Pose Estimation Accuracy

Accuracy varies by exercise due to differences in movement speed, joint occlusion, and body position complexity. Fast movements (boxing, tennis serve) require higher frame rates for full accuracy. The following table reports landmark detection accuracy (PCKh@0.5) and form score variance for each exercise category.

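| Exercise / Movement | Min. FPS | Landmark Accuracy | Score Variance | Joints Tracked |
|---|---|---|---|---|
| Bench Press | 30 | 94.2% | ±2.8 pts | 17 |
| Back Squat | 30 | 96.1% | ±2.1 pts | 19 |
| Deadlift | 30 | 95.4% | ±2.4 pts | 18 |
| Golf Swing | 60 | 93.7% | ±3.1 pts | 21 |
| Tennis Serve | 60 | 92.9% | ±3.4 pts | 22 |
| Basketball Free Throw | 30 | 95.8% | ±2.2 pts | 16 |
| Push-Up | 30 | 97.3% | ±1.6 pts | 14 |
| Overhead Press | 30 | 94.9% | ±2.5 pts | 18 |
| Pull-Up | 30 | 93.1% | ±3.0 pts | 15 |
| Boxing Jab | 60 | 91.8% | ±3.7 pts | 20 |
| Running Gait | 60 | 94.6% | ±2.6 pts | 23 |
| Pickleball Dink | 60 | 92.4% | ±3.2 pts | 19 |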

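PCKh@0.5: Percentage of Correct Keypoints, head-normalized. A predicted keypoint counts as correct when it lies within 50% of the head segment length of its ground-truth position. This is an industry-standard metric for human pose estimation benchmarking.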
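
To make the metric concrete, here is a minimal sketch of a PCKh computation in plain Python. It is an illustration, not our evaluation code: the function name `pckh` and the input layout (lists of `(x, y)` pixel coordinates per frame, plus one head segment length per frame) are assumptions, and the sketch counts every keypoint, whereas benchmark implementations typically skip joints that are not annotated.

```python
import math


def pckh(pred, truth, head_lengths, threshold=0.5):
    """Fraction of keypoints predicted within threshold * head length.

    pred, truth: one list of (x, y) keypoints per frame, in the same order.
    head_lengths: head segment length in pixels, one value per frame.
    """
    correct = 0
    total = 0
    for frame_pred, frame_true, head in zip(pred, truth, head_lengths):
        limit = threshold * head  # allowed error for this frame, in pixels
        for (px, py), (tx, ty) in zip(frame_pred, frame_true):
            total += 1
            if math.hypot(px - tx, py - ty) <= limit:
                correct += 1
    return correct / total if total else 0.0


# Toy data: one frame, two keypoints, head length 40 px (limit = 20 px).
truth = [[(100.0, 100.0), (150.0, 200.0)]]
pred = [[(103.0, 104.0), (150.0, 230.0)]]   # errors of 5 px and 30 px
score = pckh(pred, truth, [40.0])
```

In the toy data the first keypoint is within the 20-pixel limit and the second is not, so `score` is 0.5.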

Scoring Weight Breakdowns by Sport

Each sport's technique score is a weighted composite of multiple biomechanical parameters. Weights are derived from a combination of coaching literature, biomechanics research, and empirical correlation with expert coach ratings. The following breakdowns show exactly what contributes to each sport's score.

Golf Swing

Hip-Shoulder Separation (X-Factor): 22%
Spine Angle Maintenance: 18%
Weight Transfer Timing: 16%
Clubface Angle at Impact: 15%
Follow-Through Completion: 14%
Knee Flex at Address: 15%

Back Squat

Depth (hip crease below knee): 25%
Knee Tracking (valgus/varus): 22%
Torso Angle: 18%
Bar Path Consistency: 15%
Symmetry (L/R): 12%
Descent/Ascent Tempo: 8%

Tennis Serve

Trophy Position Alignment: 20%
Hip-Shoulder Separation: 19%
Elbow Height at Contact: 18%
Pronation Timing: 17%
Knee Bend (leg drive): 14%
Follow-Through: 12%

Basketball Free Throw

Elbow Alignment (under ball): 28%
Release Angle (45–52°): 24%
Follow-Through (wrist snap): 20%
Foot Alignment: 15%
Balance at Release: 13%
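
As an illustration of how a weighted composite can be computed, the sketch below combines per-parameter sub-scores using the Basketball Free Throw weights listed above. It assumes a plain linear weighted sum over sub-scores on a 0–100 scale; the exact combination rule, the dictionary keys, and the sub-score values are hypothetical and chosen only for the example.

```python
# Weights taken from the Basketball Free Throw breakdown; key names are illustrative.
FREE_THROW_WEIGHTS = {
    "elbow_alignment": 0.28,
    "release_angle": 0.24,
    "follow_through": 0.20,
    "foot_alignment": 0.15,
    "balance_at_release": 0.13,
}


def composite_score(sub_scores, weights):
    """Linear weighted sum of sub-scores (assumed to be on a 0-100 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(weights[name] * sub_scores[name] for name in weights)


# Hypothetical sub-scores for a single free throw.
sub_scores = {
    "elbow_alignment": 90.0,
    "release_angle": 80.0,
    "follow_through": 70.0,
    "foot_alignment": 100.0,
    "balance_at_release": 60.0,
}
total = composite_score(sub_scores, FREE_THROW_WEIGHTS)
```

With these inputs the weighted terms are 25.2, 19.2, 14.0, 15.0, and 7.8, giving a composite of about 81.2.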

Known Limitations & Edge Cases

We document limitations openly. Understanding where the model performs below average helps athletes get the most accurate results.

Low-Light Conditions

Accuracy drops approximately 8–12% in environments below 200 lux. Outdoor night training and poorly lit gyms are the primary affected contexts.

Loose or Baggy Clothing

Clothing that obscures joint landmarks (e.g., very baggy shorts covering the knee) reduces knee and hip tracking accuracy by up to 9%.

Extreme Camera Angles

Angles beyond 45° from the sagittal or frontal plane reduce accuracy for the obscured side. Overhead and worm's-eye views are not supported.

Multiple People in Frame

The model analyzes the primary subject (largest bounding box). Accuracy degrades if a second person occupies more than 25% of the frame.
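
The selection rule can be sketched as follows. This is a simplified illustration under stated assumptions: boxes are `(x, y, width, height)` tuples in pixels, "occupies more than 25% of the frame" is read as bounding-box area divided by frame area, and the function name `primary_subject` is hypothetical.

```python
def primary_subject(boxes, frame_w, frame_h):
    """Pick the largest box and flag frames where another person is too large.

    boxes: list of (x, y, width, height) person detections in pixels.
    Returns (index of the primary subject, crowded flag).
    """
    if not boxes:
        raise ValueError("no people detected")
    areas = [w * h for (_, _, w, h) in boxes]
    primary = max(range(len(boxes)), key=areas.__getitem__)
    frame_area = frame_w * frame_h
    # Assumption: "25% of the frame" means box area relative to frame area.
    crowded = any(
        area / frame_area > 0.25
        for i, area in enumerate(areas)
        if i != primary
    )
    return primary, crowded


# Two people in a 1280x720 frame; the second box is the larger one.
boxes = [(0, 0, 400, 600), (500, 0, 700, 700)]
primary, crowded = primary_subject(boxes, 1280, 720)
```

In this example the second box is selected as the primary subject, and the first box covers roughly 26% of the frame, so the frame is flagged as crowded.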

Very High-Speed Movements

Movements exceeding ~4 m/s limb velocity (e.g., a fastball pitch) require 120fps+ for full accuracy. At 60fps, score variance increases to ±5–6 points.

Injury Risk Flags Are Not Medical Diagnoses

Injury risk assessment identifies movement patterns associated with elevated risk. It is not a medical device and does not diagnose injuries or medical conditions.

Validation Methodology

Pose Estimation Validation: Landmark accuracy is measured using the PCKh@0.5 metric on a held-out test set of 12,400 annotated frames across 22 sports and exercise categories. Ground truth annotations were produced by two independent annotators with disagreements resolved by a third. The test set was not used during model training or hyperparameter tuning.

Form Score Repeatability: Each of 500 clips, spanning all supported movements, was analyzed 10 times. The reported figure (±3.0 points on average) is the standard deviation of the scores across repeated analyses of identical input, averaged over clips. It is a standard deviation, not a statistical variance, and it measures how consistently the model scores the same input, not how accurate those scores are.
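
A repeatability figure of this kind can be computed as shown in the sketch below. It assumes the per-clip sample standard deviation is taken first and then averaged across clips; the source text does not say whether sample or population SD is used, and the clip names and scores here are made up for illustration.

```python
from statistics import mean, stdev


def repeatability(scores_by_clip):
    """Average per-clip standard deviation of repeated form scores.

    scores_by_clip: dict mapping a clip id to the scores from repeated runs.
    Uses the sample standard deviation (an assumption, see lead-in).
    """
    per_clip_sd = [stdev(runs) for runs in scores_by_clip.values()]
    return mean(per_clip_sd)


# Hypothetical scores: three repeated runs for each of two clips.
runs = {
    "clip_a": [80.0, 82.0, 84.0],   # sample SD = 2.0
    "clip_b": [70.0, 70.0, 70.0],   # sample SD = 0.0
}
avg_sd = repeatability(runs)
```

For the toy data the two per-clip standard deviations are 2.0 and 0.0, so `avg_sd` is 1.0 points.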

Inter-Rater Agreement: 1,200 movements were independently scored by the SportsReflector AI and by certified coaches (NSCA-CSCS, USPTA, PGA-certified instructors). Agreement was calculated using weighted Cohen's κ, yielding κ = 0.81. This sits at the lower bound of the "almost perfect agreement" band (0.81–1.00) on the Landis & Koch scale. Disagreements were most common in borderline cases (scores within 5 points of a threshold).
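
For readers who want to reproduce this kind of statistic, the sketch below computes a linearly weighted Cohen's κ on ordinal categories and maps the result to the Landis & Koch bands. The weighting scheme (linear rather than quadratic), the binning of scores into integer categories, and the function names are assumptions; the text above only states that a weighted κ was used.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for ratings coded 0..n_categories-1."""
    n = len(rater_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(row[j] for row in observed) for j in range(n_categories)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / (n_categories - 1)  # disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1.0 - weighted_observed / weighted_expected


def landis_koch(kappa):
    """Label a kappa value using the Landis & Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings in three bands (0 = low, 1 = medium, 2 = high).
ai_ratings = [0, 1, 2, 1]
coach_ratings = [0, 1, 2, 1]
kappa = weighted_kappa(ai_ratings, coach_ratings, 3)
```

With identical ratings from both raters, `kappa` is 1.0; `landis_koch(0.81)` returns "almost perfect", matching the classification quoted above.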

Injury Risk Validation: Injury risk flags were validated against assessments by certified physical therapists (DPT) on a dataset of 340 movement samples. False positive rate (flagged as risk when PT assessed as safe) was 6.2%. False negative rate (not flagged when PT identified a risk pattern) was 8.9%. These figures are consistent with published accuracy rates for clinical movement screening tools.
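
The two error rates can be computed from paired labels as in the sketch below. It assumes the conventional denominators (PT-assessed safe samples for the false positive rate, PT-identified risk samples for the false negative rate), which the text above does not state explicitly. The function name and the five sample labels are hypothetical and are not drawn from the 340-sample validation set.

```python
def flag_error_rates(flags, pt_labels):
    """Return (false positive rate, false negative rate).

    flags: model output per sample, True = flagged as risk.
    pt_labels: physical therapist assessment, True = risk pattern present.
    """
    pairs = list(zip(flags, pt_labels))
    false_pos = sum(1 for flagged, risk in pairs if flagged and not risk)
    false_neg = sum(1 for flagged, risk in pairs if not flagged and risk)
    safe_total = sum(1 for _, risk in pairs if not risk)
    risk_total = sum(1 for _, risk in pairs if risk)
    fpr = false_pos / safe_total if safe_total else 0.0
    fnr = false_neg / risk_total if risk_total else 0.0
    return fpr, fnr


# Hypothetical labels for five movement samples.
flags = [True, False, False, True, False]
pt_labels = [False, False, True, True, True]
fpr, fnr = flag_error_rates(flags, pt_labels)
```

In the toy data one of two safe samples is flagged (FPR 0.5) and two of three risk samples are missed (FNR about 0.67).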

Citing This Data

Researchers, journalists, and developers are welcome to cite this accuracy data. Please reference the source URL and date accessed.

SportsReflector. (2026). AI Accuracy & Confidence Intervals — How We Score Athletic Technique. https://sportsreflector.com/how-we-score. Accessed March 2026.