Within Confidence
Why '91 Percent Certain' Can Mislead UFO Readers
Careful wording can explain uncertainty better than a precise-looking confidence percentage.
On this page
- How calibration works in AI systems
- Why percentages create false certainty
- Safer wording for public UFO reports
Introduction
A UFO case report that says an AI system is “91% certain” often sounds more scientific than a report saying “the available evidence strongly suggests a balloon”. In practice, the opposite can be true. A precise-looking percentage can hide weak data, missing context, poor calibration, or unresolved contradictions inside the case file. In UFO and UAP investigation, where sightings are frequently based on short videos, uncertain timestamps, single witnesses, or incomplete sensor data, percentages can create an illusion of certainty that the evidence does not justify.
That matters because public readers tend to interpret percentages as hard probabilities rather than provisional estimates. A “91% aircraft likelihood” may be read as near-proof even when the system has never been properly tested against comparable UFO reports. NASA’s independent UAP study warned that analysis of UAP data is currently hampered by poor sensor calibration, a lack of multiple measurements, missing sensor metadata, and a lack of baseline data. [NASA Science] In that environment, careful wording is often more honest and more informative than an exact numerical score.
Why exact percentages sound more reliable than they are
Human beings instinctively treat numbers as objective. A sentence such as “AI confidence: 87%” appears precise, measurable, and technical. But in unresolved UFO cases, the underlying evidence is rarely precise enough to support that impression.
A typical public sighting report may include:
- A phone video with no reliable range estimate
- Uncertain direction of travel
- Missing EXIF metadata
- No corroborating radar
- No atmospheric measurements
- A witness recollection written hours later
- Compression artefacts or zoom distortion
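One way to keep those gaps visible is to record them before any model output is produced. The following is a minimal Python sketch under stated assumptions: `SightingReport` and `evidence_gaps` are hypothetical names, the fields simply mirror the list above, and the one-hour recall threshold is an arbitrary illustrative value, not an established standard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SightingReport:
    """Hypothetical intake record; each field mirrors one gap in the list above."""
    range_estimate_m: Optional[float] = None     # any reliable distance estimate
    direction_of_travel: Optional[str] = None    # e.g. "NE"; None if uncertain
    exif_metadata: Optional[dict] = None         # camera metadata, if preserved
    radar_corroboration: bool = False            # independent radar track available?
    atmospheric_data: Optional[dict] = None      # wind, temperature, visibility
    hours_until_written: Optional[float] = None  # delay before the account was written
    footage_uncompressed: bool = False           # original file, not a re-encoded clip


def evidence_gaps(report: SightingReport, max_recall_delay_h: float = 1.0) -> list[str]:
    """Return the names of fields that leave the case weakly constrained."""
    gaps = []
    for f in fields(report):
        value = getattr(report, f.name)
        if f.name == "hours_until_written":
            # Both an unknown delay and a long delay weaken the recollection.
            if value is None or value > max_recall_delay_h:
                gaps.append(f.name)
        elif value is None or value is False:
            gaps.append(f.name)
    return gaps


typical = SightingReport(direction_of_travel="NE", hours_until_written=5.0)
print(evidence_gaps(typical))
# ['range_estimate_m', 'exif_metadata', 'radar_corroboration',
#  'atmospheric_data', 'hours_until_written', 'footage_uncompressed']
```

A list like this can travel with the case file, so any later score is read next to the gaps that constrain it.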
Even if an AI system compares the report against thousands of earlier cases, the result is still constrained by those gaps. NASA’s UAP study repeatedly stressed that reliable interpretation depends on calibrated sensors, metadata quality, and multiple independent measurements. [NASA Science]
The danger is not merely mathematical. Exact percentages encourage readers to stop thinking critically. A report that states “92% likely to be a drone” subtly discourages questions such as:
- How reliable was the timestamp?
- Were local drone records checked?
- Was wind direction consistent?
- Did the object accelerate in ways inconsistent with drones?
- Was the AI tested on nighttime sightings specifically?
- Could the footage contain sensor artefacts?
The number compresses uncertainty into a neat-looking figure, even when uncertainty is the central feature of the case.
How calibration actually works in AI systems
In machine learning and probabilistic forecasting, “calibration” has a specific meaning. A calibrated system is not simply one that sounds confident. It is a system whose probabilities match real-world outcomes over time.
For example:
- If an AI assigns 80% probability to 100 comparable cases
- Roughly 80 of those cases should actually prove correct after verification
That is calibration.
Researchers commonly evaluate this using reliability diagrams and calibration metrics that compare predicted probabilities against observed outcomes. [PMC] [arXiv]
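The 80-out-of-100 example above can be checked mechanically. The sketch below shows one common approach, equal-width probability bins plus the expected calibration error (ECE), in plain Python; the function names and the choice of five bins are assumptions for illustration, and binned estimates of this kind are known to be sensitive to the number of bins chosen.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Bin predictions by stated probability and compare, per bin, the mean
    predicted probability with the fraction of cases that proved correct."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp p == 1.0 into the top bin instead of indexing past the end.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            hit_rate = sum(y for _, y in members) / len(members)
            rows.append((i, len(members), mean_p, hit_rate))
    return rows


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Weighted average of |mean predicted probability - observed hit rate|."""
    total = len(probs)
    return sum(
        count / total * abs(mean_p - hit_rate)
        for _, count, mean_p, hit_rate in reliability_table(probs, outcomes, n_bins)
    )


# 100 hypothetical cases, all given 0.8 by a model; 1 = verified correct.
calibrated = expected_calibration_error([0.8] * 100, [1] * 80 + [0] * 20)
overconfident = expected_calibration_error([0.8] * 100, [1] * 60 + [0] * 40)
print(round(calibrated, 3), round(overconfident, 3))  # 0.0 0.2
```

With 80 verified cases the gap is essentially zero; with only 60 it is about 0.2, which is the overconfidence pattern the rest of this section describes.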
The problem for UFO investigation is that calibration requires stable ground truth. That is difficult because many UFO reports never receive definitive resolution. Some remain unresolved permanently due to missing evidence rather than because they are extraordinary.
This creates a serious methodological problem:
- A system may appear accurate simply because it confidently labels ambiguous cases as mundane
- Or it may become overconfident because the training data itself contains uncertain classifications
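The first failure mode above is easy to reproduce. In this hypothetical Python sketch, a classifier labels every case "balloon", and accuracy is computed only on cases that ever received a resolution; the function name and the ten-case file are invented for illustration.

```python
from typing import Optional


def resolved_only_accuracy(predictions: list[str],
                           truths: list[Optional[str]]) -> tuple[float, float]:
    """Score predictions only where ground truth exists, and report how much
    of the case file that covers.

    truths[i] is None when case i never received a definitive resolution.
    Returns (accuracy_on_resolved, coverage).
    """
    scored = [(p, t) for p, t in zip(predictions, truths) if t is not None]
    coverage = len(scored) / len(truths) if truths else 0.0
    if not scored:
        return float("nan"), coverage
    accuracy = sum(p == t for p, t in scored) / len(scored)
    return accuracy, coverage


# Hypothetical file of ten cases: the model calls every one "balloon",
# but only four were ever resolved, and all four happened to be balloons.
preds = ["balloon"] * 10
truths = ["balloon"] * 4 + [None] * 6
acc, cov = resolved_only_accuracy(preds, truths)
print(f"accuracy on resolved cases: {acc:.0%}, coverage: {cov:.0%}")
# accuracy on resolved cases: 100%, coverage: 40%
```

The perfect score says nothing about the six unresolved cases, which is why coverage belongs next to any accuracy or calibration figure.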
Academic work on probabilistic forecasting shows that even measuring calibration is contested: one well-cited paper questions whether the miscalibration and overconfidence reported in empirical calibration studies are genuine. [ScienceDirect]
In UFO analysis, the problem is even harder because the category labels themselves are often probabilistic rather than proven.
UFO investigations rarely have clean ground truth
A weather forecast can later be checked against actual rainfall. A medical model can eventually be compared against diagnoses. UFO reports are different.
Many sightings fall into one of four messy categories:
| Case outcome | What it really means |
|---|---|
| Resolved | Evidence strongly supports an explanation |
| Plausibly explained | A likely explanation exists but gaps remain |
| Unresolved | Evidence insufficient for determination |
| Anomalous | Reported behaviour remains difficult to explain |
These are not equally certain categories. Yet percentage-driven systems often flatten them into numerical confidence outputs that appear cleaner than the underlying evidence.
AARO, the Pentagon’s UAP investigation office, has publicly resolved many reports as balloons, birds, drones, satellites, and aircraft. [Joint Base San Antonio] [MeriTalk] But AARO also continues to maintain unresolved and anomalous categories where evidence remains incomplete or contradictory. [MeriTalk] [New York Post]
That distinction matters. “Unresolved” does not mean extraterrestrial. But neither does it mean that a precise probability estimate is justified.
A responsible AI-assisted case file should therefore distinguish between:
- Confidence in the explanation
- Confidence in the evidence quality
- Confidence in the sensor integrity
- Confidence in the witness reconstruction
Those are separate questions.
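In software, keeping those questions separate can be as simple as refusing to merge them into one number. This is a sketch under assumptions: the `CaseAssessment` record, the three-level `Level` scale, and the idea of naming the weakest dimension are illustrative choices, not an established standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class CaseAssessment:
    """Hypothetical record that keeps four confidence questions apart."""
    explanation: Level             # how well the leading explanation fits
    evidence_quality: Level        # completeness and reliability of the data
    sensor_integrity: Level        # trust in the recording device itself
    witness_reconstruction: Level  # reliability of the witness account

    def by_dimension(self) -> dict[str, Level]:
        return {
            "explanation": self.explanation,
            "evidence quality": self.evidence_quality,
            "sensor integrity": self.sensor_integrity,
            "witness reconstruction": self.witness_reconstruction,
        }

    def weakest_link(self) -> str:
        """Name the dimension that most limits any overall conclusion
        (the first one listed wins a tie)."""
        scores = self.by_dimension()
        return min(scores, key=scores.get)

    def summary(self) -> str:
        # Report every dimension separately; nothing is averaged into one score.
        parts = [f"{name}: {level.name.lower()}" for name, level in self.by_dimension().items()]
        return "; ".join(parts) + f" (limited mainly by {self.weakest_link()})"


case = CaseAssessment(Level.HIGH, Level.LOW, Level.MODERATE, Level.MODERATE)
print(case.summary())
```

Here a strong explanation fit sits next to low evidence quality, and the summary says so instead of averaging the two into a middling score.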
Why percentages create false certainty in public UFO reporting
Percentages are especially risky in public-facing UFO discussions because readers tend to interpret them literally.
Consider the difference between these two statements:
- “The object is 93% likely to be a balloon.”
- “The object’s movement and wind alignment are strongly consistent with balloon behaviour, although the footage is too limited for a definitive identification.”
The second statement is longer and less dramatic, but it communicates more useful information.
It tells the reader:
- Which evidence mattered
- Which explanation currently fits best
- That uncertainty remains
- Why certainty is limited
The percentage version hides all of that.
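The elements listed above can be produced by a template instead of a score. This hypothetical Python helper (`explain_finding` is an invented name) assembles a sentence from the evidence that mattered, the leading explanation, and an optional limitation; the template wording is an assumption.

```python
def explain_finding(best_fit: str, evidence: list[str], limitation: str = "",
                    strength: str = "strongly consistent with") -> str:
    """Name the evidence, the leading explanation and (optionally) why
    certainty is limited, instead of reporting a percentage."""
    if not evidence:
        raise ValueError("name at least one piece of evidence")
    if len(evidence) == 1:
        listed, verb = evidence[0], "is"
    elif len(evidence) == 2:
        listed, verb = f"{evidence[0]} and {evidence[1]}", "are"
    else:
        listed, verb = ", ".join(evidence[:-1]) + ", and " + evidence[-1], "are"
    sentence = f"The object's {listed} {verb} {strength} {best_fit}"
    if limitation:
        sentence += f", although {limitation}"
    return sentence + "."


print(explain_finding("balloon behaviour", ["movement", "wind alignment"],
                      "the footage is too limited for a definitive identification"))
print(explain_finding("commercial aircraft",
                      ["speed", "navigation lighting pattern", "ADS-B correlation"],
                      strength="consistent with"))
```

Both calls reproduce example phrasings used in this article, showing that a template can carry the same information as a score without implying false precision.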
This problem becomes worse online, where numerical claims spread rapidly through screenshots, clips, and reposts stripped of methodological context. A “91% certainty” claim can quickly mutate into “AI proved it was a UFO” or “AI debunked the sighting completely”, depending on the audience.
In practice, many AI systems are poorly calibrated outside their training environment. Studies of probabilistic classifiers routinely show overconfidence problems, especially when systems encounter unfamiliar or low-quality inputs. [PMC] [arXiv]
UFO reports are almost entirely composed of unfamiliar and low-quality inputs.
The strongest UFO investigations already use cautious language
Interestingly, official UAP investigators often avoid rigid percentages when communicating publicly.
AARO sometimes uses formulations such as:
- “Almost certainly”
- “High confidence”
- “Consistent with”
- “Likely”
- “Under analysis”
rather than pretending every case can be reduced to an exact number. [AARO]
That style is not merely public-relations caution. It reflects the real structure of the evidence.
For example, AARO’s balloon resolutions frequently rely on multiple converging indicators:
- Morphological similarity
- Wind behaviour
- Motion profile
- Infrared appearance
- Comparison with previously resolved cases
The conclusion emerges from accumulated consistency, not from a magical percentage generator. [AARO]
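The idea of converging indicators can be sketched as a simple counting rule. This is not AARO's actual procedure: the indicator names come from the list above, while the thresholds and the rule that any inconsistency keeps a case "under analysis" are assumptions for illustration.

```python
from typing import Optional

# Indicator names follow the list above; everything else is illustrative.
BALLOON_INDICATORS = (
    "morphological similarity",
    "wind behaviour",
    "motion profile",
    "infrared appearance",
    "comparison with resolved cases",
)


def balloon_wording(checks: dict[str, Optional[bool]]) -> str:
    """Map per-indicator results to cautious wording.

    checks[name] is True (consistent with a balloon), False (inconsistent)
    or None / missing (could not be assessed from the available data).
    """
    consistent = sum(1 for name in BALLOON_INDICATORS if checks.get(name) is True)
    inconsistent = sum(1 for name in BALLOON_INDICATORS if checks.get(name) is False)
    # Any contradiction, or too little assessed evidence, keeps the case open.
    if inconsistent or consistent < 2:
        return "under analysis"
    if consistent >= 4:
        return "almost certainly a balloon"
    if consistent == 3:
        return "likely a balloon"
    return "consistent with a balloon"


checks = {name: True for name in BALLOON_INDICATORS}
checks["infrared appearance"] = None  # e.g. no thermal footage available
print(balloon_wording(checks))  # almost certainly a balloon
```

Because missing indicators reduce the count, thin evidence produces weaker wording, and no single strong-looking match can carry the conclusion on its own.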
NASA’s study team made a similar point indirectly by emphasising metadata quality, sensor calibration, and multi-sensor collection rather than automated certainty scores. [NASA Science]
Safer wording for AI-assisted UFO case files
A useful UFO investigation system should prioritise calibrated language over theatrical precision.
Better phrasing usually has four features:
It separates evidence quality from explanation quality
Instead of:
- “89% likely to be an aircraft”
Prefer:
- “The flight-path match is strong, but the visual evidence is limited.”
This distinction prevents readers from confusing a plausible explanation with a proven identification.
It explains why the model reached its conclusion
Instead of:
- “Confidence score: 82”
Prefer:
- “The object’s speed, navigation lighting pattern, and ADS-B correlation are consistent with commercial aircraft.”
This lets readers evaluate the reasoning themselves.
It preserves unresolved status when appropriate
Some sightings genuinely remain unresolved because the available evidence is inadequate.
That does not automatically imply anomaly. It simply reflects evidential limits.
Good wording includes formulations such as:
- “No explanation currently fits the available evidence cleanly.”
- “Insufficient data for definitive classification.”
- “The case remains unresolved after standard aviation and astronomy checks.”
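Preserving unresolved status in software amounts to letting a classifier abstain. The sketch below assumes hypothetical 0 to 1 fit scores produced by earlier aviation, astronomy, and similar checks (fit scores, not probabilities); the function name and the `min_fit` and `min_margin` thresholds are illustrative assumptions.

```python
def classify_or_abstain(fit_scores: dict[str, float], min_fit: float = 0.7,
                        min_margin: float = 0.2) -> str:
    """Report the best explanation only if it fits well and clearly beats the
    runner-up; otherwise keep the case explicitly unresolved."""
    if not fit_scores:
        return "unresolved: insufficient data for definitive classification"
    ranked = sorted(fit_scores.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    if best_score < min_fit:
        return "unresolved: no explanation currently fits the available evidence cleanly"
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return f"unresolved: {best} and {ranked[1][0]} both remain plausible"
    return f"best current explanation: {best}"


print(classify_or_abstain({"aircraft": 0.9, "satellite": 0.3}))
print(classify_or_abstain({"balloon": 0.8, "drone": 0.75}))
print(classify_or_abstain({"aircraft": 0.5, "balloon": 0.4}))
```

The second and third calls show the two distinct reasons a case can stay open: competing explanations that fit about equally well, and no explanation that fits well at all.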
It avoids pseudo-scientific precision
Words like:
- “possibly”
- “consistent with”
- “likely”
- “strongly consistent”
- “weakly supported”
- “not currently supported”
are often more truthful than decimal probabilities in messy real-world investigations.
A better model: explanation tiers instead of percentages
One practical alternative is a tiered explanation framework.
Rather than assigning a single percentage, an AI-assisted UFO report can classify explanations using calibrated language bands:
| Assessment tier | Meaning |
|---|---|
| Ruled out | Strong contradictory evidence |
| Weak fit | Some overlap but major inconsistencies |
| Plausible | Reasonable match with notable gaps |
| Strongly supported | Multiple lines of evidence align |
| Unresolved | Insufficient evidence for determination |
| Potentially anomalous | Available evidence resists standard explanations |
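A minimal Python sketch of these tiers, under explicit assumptions: each candidate explanation is summarised by hypothetical counts of supporting lines of evidence and major conflicts plus a flag for decisive contradictory evidence, and the cut-offs (for example, three supporting lines for "Strongly supported") are illustrative. The case-level rule encodes a distinction made earlier: a case is marked potentially anomalous only when the evidence is judged sufficient and no standard explanation reaches Plausible.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    RULED_OUT = "Ruled out"
    WEAK_FIT = "Weak fit"
    PLAUSIBLE = "Plausible"
    STRONGLY_SUPPORTED = "Strongly supported"
    UNRESOLVED = "Unresolved"
    POTENTIALLY_ANOMALOUS = "Potentially anomalous"


def tier_for_explanation(supporting: int, major_conflicts: int,
                         decisive_conflict: bool) -> Optional[Tier]:
    """Place one candidate explanation in a tier (illustrative cut-offs).
    Returns None when nothing has been assessed for this explanation."""
    if decisive_conflict:
        return Tier.RULED_OUT
    if major_conflicts > 0:
        return Tier.WEAK_FIT
    if supporting >= 3:
        return Tier.STRONGLY_SUPPORTED
    if supporting >= 1:
        return Tier.PLAUSIBLE
    return None


def case_tier(explanations: dict[str, Optional[Tier]],
              evidence_sufficient: bool) -> tuple[Tier, list[str]]:
    """Summarise a case: the best-supported explanations if any reach
    Plausible, otherwise Unresolved or Potentially anomalous."""
    for tier in (Tier.STRONGLY_SUPPORTED, Tier.PLAUSIBLE):
        leaders = sorted(name for name, t in explanations.items() if t is tier)
        if leaders:
            return tier, leaders
    # No standard explanation fits; only adequate evidence justifies "anomalous".
    if evidence_sufficient:
        return Tier.POTENTIALLY_ANOMALOUS, []
    return Tier.UNRESOLVED, []


candidates = {
    "balloon": tier_for_explanation(supporting=4, major_conflicts=0, decisive_conflict=False),
    "drone": tier_for_explanation(supporting=1, major_conflicts=1, decisive_conflict=False),
    "aircraft": tier_for_explanation(supporting=0, major_conflicts=0, decisive_conflict=True),
}
tier, leaders = case_tier(candidates, evidence_sufficient=False)
print(tier.value, leaders)  # Strongly supported ['balloon']
```

Every explanation keeps its own tier, so a reader can still see that the drone hypothesis was a weak fit and the aircraft hypothesis was ruled out, rather than only a single winning label.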
This approach has several advantages:
- It reflects investigative reality better
- It reduces false certainty
- It keeps uncertainty visible
- It encourages evidence review rather than score worship
- It scales better to mixed-quality reports
Most importantly, it helps readers understand that UFO investigation is usually a process of narrowing possibilities rather than producing absolute answers.
The real goal is trustworthy uncertainty
The strongest AI-assisted UFO systems are not the ones that sound most certain. They are the ones that communicate uncertainty honestly while still helping investigators organise evidence, test explanations, and eliminate mundane causes efficiently.
A poorly calibrated percentage can make a weak case look solved. Careful language can do the opposite: it can show exactly where the evidence is strong, where it is weak, and why a case remains open.
In UFO investigation, that distinction is not cosmetic. It is the difference between analysis and performance.
Endnotes
-
Source: science.nasa.gov
Title: UAP Independent Study Team Report
Link: https://science.nasa.gov/wp-content/uploads/2023/09/uap-independent-study-team-final-report.pdf
Published: September 13, 2023
-
Source: pmc.ncbi.nlm.nih.gov
Title: Stable reliability diagrams for probabilistic classifiers (T. Dimitriadis et al., 2021)
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7923594/
-
Source: arxiv.org
Title: Metrics of calibration for probabilistic predictions
Link: https://arxiv.org/abs/2205.09680
-
Source: sciencedirect.com
Title: Paper by P. E. Pfeifer (1994) on the apparent miscalibration and overconfidence found in empirical calibration studies
Link: https://www.sciencedirect.com/science/article/pii/S074959788471034X/pdf?md5=54e001206df2864d0515d0084369a737&pid=1-s2.0-S074959788471034X-main.pdf
-
Source: arxiv.org
Title: When are Bayesian model probabilities overconfident?
Link: https://arxiv.org/abs/2003.04026
-
Source: meritalk.com
Title: UAP Reports Soar: DoD Office Receives 757 New Sightings
Link: https://www.meritalk.com/articles/uap-reports-soar-dod-office-receives-757-new-sightings/
Published: November 15, 2024
-
Source: aaro.mil
Title: Official UAP Imagery
Link: https://www.aaro.mil/UAP-Cases/Official-UAP-Imagery/
-
Source: aaro.mil
Title: UAP Records/Information Papers
Link: https://www.aaro.mil/UAP-Records/
Published: February 13, 2026
-
Source: nasa.gov
Title: NASA home page
Link: https://www.nasa.gov/
-
Source: science.nasa.gov
Title: Unidentified Anomalous Phenomena (NASA Science UAP page)
Link: https://science.nasa.gov/uap/
Published: June 9, 2022
-
Source: nasa.gov
Title: UPDATE: NASA Shares UAP Independent Study Report, Names Director
Link: https://www.nasa.gov/news-release/update-nasa-shares-uap-independent-study-report-names-director/
Published: September 14, 2023
-
Source: sciencedirect.com
Title: How does training improve individual forecasts? (V. K. Motahhar, 2025)
Link: https://www.sciencedirect.com/science/article/abs/pii/S0169207024001298
-
Source: aaro.mil
Title: AARO home page
Link: https://www.aaro.mil/
-
Source: arxiv.org
Title: arXiv:2506.00125 (a system for the scientific study of aerial phenomena integrating multiple sensors)
Link: https://arxiv.org/html/2506.00125v1
Published: May 2025
-
Source: arxiv.org
Title: Tail calibration of probabilistic forecasts
Link: https://arxiv.org/html/2407.03167v1
-
Source: jbsa.mil
Title: DOD examining unidentified anomalous phenomena
Link: https://www.jbsa.mil/News/News/Article/3966080/dod-examining-unidentified-anomalous-phenomena/
Published: November 15, 2024
-
Source: nypost.com
Title: New York Post report on AARO identifying 21 reports as "true anomalies"
Link: https://nypost.com/2024/11/14/us-news/pentagon-says-nearly-two-dozen-ufo-sightings-cant-be-explained-true-anomalies/
Published: November 14, 2024
-
Source: Wikipedia
Title: NASA
Link: https://en.wikipedia.org/wiki/NASA
-
Source: ore.exeter.ac.uk
Title: Assessment of Forecast Calibration (H. Bashaykh, 2022)
Link: https://ore.exeter.ac.uk/ndownloader/files/56827487
Additional References
-
Source: war.gov
Title: Dr. Jon Kosloski, Director, AARO, Media Roundtable on the FY24 Consolidated Annual … (transcript)
Link: https://www.war.gov/News/Transcripts/Transcript/Article/3965734/dr-jon-kosloski-director-aaro-media-roundtable-on-the-fy24-consolidated-annual/
Published: November 14, 2024
-
Source: reddit.com
Title: AARO has resolved the "Go Fast" UAP (r/UFOs)
Link: https://www.reddit.com/r/UFOs/comments/1gv8xak/aaro_has_resolved_the_go_fast_uap/
-
Source: howtolearnmachinelearning.com
Title: Brier Score in Machine Learning: Definition and Use Cases
Link: https://howtolearnmachinelearning.com/articles/brier-score/
-
Source: medium.com
Title: Model Calibration, Explained: A Visual Guide with Code Examples for Beginners (TDS Archive)
Link: https://medium.com/data-science/model-calibration-explained-a-visual-guide-with-code-examples-for-beginners-55f368bafe72
-
Source: studocu.com
Title: NASA UAP Independent Study Team Final Report: Key Findings and Recommendations
Link: https://www.studocu.com/en-us/document/harvard-medical-school/estadistica/nasa-uap-independent-study-team-final-report-key-findings-and-recommendations/157385671
-
Source: reddit.com
Title: Summary of NASA Unidentified Anomalous Phenomena (r/UFOs)
Link: https://www.reddit.com/r/UFOs/comments/16ik6x5/summary_of_nasa_unidentified_anomalous_phenomena/
-
Source: neacc.meteoinfo.ru
Title: Lecture 4 on forecast downscaling
Link: https://neacc.meteoinfo.ru/training/103-lecture-4-on-forecast-downscaling
-
Source: facebook.com
Title: NewsHour post on the All-Domain Anomaly Resolution Office (AARO)
Link: https://www.facebook.com/newshour/posts/the-us-in-2022-launched-the-all-domain-anomaly-resolution-office-aaro-as-part-of/1149122250416353/
-
Source: eurointervention.pcronline.com
Title: A guide to interpreting and assessing the performance of prediction models (V. Farooq)
Link: https://eurointervention.pcronline.com/article/a-guide-to-interpreting-and-assessing-the-performance-of-prediction-models
-
Source: youtube.com
Title: Unidentified Anomalous Phenomena Independent Study Report (video)
Link: https://www.youtube.com/watch?v=TQcqOW39ksk