Why ‘91 Percent Certain’ Can Mislead UFO Readers

Introduction

A UFO case report that says an AI system is “91% certain” often sounds more scientific than a report saying “the available evidence strongly suggests a balloon”. In practice, the opposite can be true. A precise-looking percentage can hide weak data, missing context, poor calibration, or unresolved contradictions inside the case file. In UFO and UAP investigation, where sightings are frequently based on short videos, uncertain timestamps, single witnesses, or incomplete sensor data, percentages can create an illusion of certainty that the evidence does not justify.

That matters because public readers tend to interpret percentages as hard probabilities rather than provisional estimates. A “91% aircraft likelihood” may be read as near-proof even when the system has never been properly tested against comparable UFO reports. NASA’s independent UAP study warned that current analysis is limited by poor sensor calibration, incomplete metadata, and weak baseline data. [NASA Science] In that environment, careful wording is often more honest and more informative than an exact numerical score.

Why exact percentages sound more reliable than they are

Human beings instinctively treat numbers as objective. A sentence such as “AI confidence: 87%” appears precise, measurable, and technical. But in unresolved UFO cases, the underlying evidence is rarely precise enough to support that impression.

A typical public sighting report may include:

A phone video with no reliable range estimate
Uncertain direction of travel
Missing EXIF metadata
No corroborating radar
No atmospheric measurements
A witness recollection written hours later
Compression artefacts or zoom distortion

Even if an AI system compares the report against thousands of earlier cases, the result is still constrained by those gaps. NASA’s UAP study repeatedly stressed that reliable interpretation depends on calibrated sensors, metadata quality, and multiple independent measurements. [NASA Science]

The danger is not merely mathematical. Exact percentages encourage readers to stop thinking critically. A report that states “92% likely to be a drone” subtly discourages questions such as:

How reliable was the timestamp?
Were local drone records checked?
Was wind direction consistent?
Did the object accelerate in ways inconsistent with drones?
Was the AI tested on nighttime sightings specifically?
Could the footage contain sensor artefacts?

The number compresses uncertainty into a neat-looking figure, even when uncertainty is the central feature of the case.

How calibration actually works in AI systems

In machine learning and probabilistic forecasting, “calibration” has a specific meaning. A calibrated system is not simply one that sounds confident. It is a system whose probabilities match real-world outcomes over time.

For example:

If an AI assigns 80% probability to 100 comparable cases
Roughly 80 of those cases should actually prove correct after verification

That is calibration. [NASA]

Researchers commonly evaluate this using reliability diagrams and calibration metrics that compare predicted probabilities against observed outcomes. [PMC] [arXiv]
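The 80-out-of-100 check above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any UAP office or by the cited papers: the function names `reliability_table` and `expected_calibration_error` are hypothetical, the equal-width binning is one common choice among several, and the sample data are invented to show an overconfident system.

```python
from collections import defaultdict


def reliability_table(probabilities, outcomes, n_bins=10):
    """Group predictions into equal-width probability bins and compare the
    mean predicted probability with the observed hit rate in each bin."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have equal length")

    bins = defaultdict(list)
    for p, hit in zip(probabilities, outcomes):
        # Clamp the index so that p == 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, hit))

    rows = []
    for index in sorted(bins):
        members = bins[index]
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(1 for _, hit in members if hit) / len(members)
        rows.append((mean_predicted, observed_rate, len(members)))
    return rows


def expected_calibration_error(rows):
    """Size-weighted average gap between predicted and observed rates."""
    total = sum(count for _, _, count in rows)
    return sum(abs(pred - obs) * count for pred, obs, count in rows) / total


# Invented data: 100 cases scored at 0.8, of which only 62 were later verified.
probs = [0.8] * 100
verified = [True] * 62 + [False] * 38

table = reliability_table(probs, verified)
ece = expected_calibration_error(table)
print(table)
print(f"calibration gap: {ece:.2f}")
```

A well-calibrated system would show a gap near zero in every bin. Here the single occupied bin predicts about 0.80 but observes 0.62, which is the kind of mismatch a reliability diagram makes visible.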

The problem for UFO investigation is that calibration requires stable ground truth. That is difficult because many UFO reports never receive definitive resolution. Some remain unresolved permanently due to missing evidence rather than because they are extraordinary.

This creates a serious methodological problem:

A system may appear accurate simply because it confidently labels ambiguous cases as mundane
Or it may become overconfident because the training data itself contains uncertain classifications

Academic work on probabilistic forecasting has repeatedly shown that forecasting systems can appear well-tuned while still exhibiting systematic overconfidence. [ScienceDirect]

In UFO analysis, the risk is even greater because the category labels themselves are often probabilistic rather than proven.
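One narrow part of this problem can be made concrete: when only a handful of cases ever receive a definitive resolution, the observed hit rate behind a claimed percentage is itself very uncertain. The sketch below uses the standard Wilson score interval for a binomial proportion; the function name `wilson_interval` and the case counts are assumptions chosen for illustration, not figures from any real case file. It does not address the deeper issue of uncertain labels, only the effect of small verified samples.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return centre - half_width, centre + half_width


# Hypothetical: the system said "about 91%" and 10 of 11 verified cases agreed.
small_low, small_high = wilson_interval(10, 11)

# Same observed rate, but with 100 times as many verified cases.
large_low, large_high = wilson_interval(1000, 1100)

print(f"11 verified cases:   {small_low:.2f} to {small_high:.2f}")
print(f"1100 verified cases: {large_low:.2f} to {large_high:.2f}")
```

With only eleven verified cases the plausible range for the true hit rate is wide, so the data cannot distinguish a well-calibrated "91%" from a substantially overconfident one.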

UFO investigations rarely have clean ground truth

A weather forecast can later be checked against actual rainfall. A medical model can eventually be compared against diagnoses. UFO reports are different.

Many sightings fall into one of four messy categories:

| Case outcome | What it really means |
| --- | --- |
| Resolved | Evidence strongly supports an explanation |
| Plausibly explained | A likely explanation exists but gaps remain |
| Unresolved | Evidence insufficient for determination |
| Anomalous | Reported behaviour remains difficult to explain |

These are not equally certain categories. Yet percentage-driven systems often flatten them into numerical confidence outputs that appear cleaner than the underlying evidence.

AARO, the Pentagon’s UAP investigation office, has publicly resolved many reports as balloons, birds, drones, satellites, and aircraft. [Joint Base San Antonio] [MeriTalk] But AARO also continues to maintain unresolved and anomalous categories where evidence remains incomplete or contradictory. [arXiv] [New York Post]

That distinction matters. “Unresolved” does not mean extraterrestrial. But neither does it mean that a precise probability estimate is justified.

A responsible AI-assisted case file should therefore distinguish between:

Confidence in the explanation
Confidence in the evidence quality
Confidence in the sensor integrity
Confidence in the witness reconstruction

Those are separate questions.
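One way to keep those four questions separate in software is to store them as separate fields rather than folding them into one score. The following is a minimal sketch under assumed names (`Level`, `CaseConfidence`, and the three-step scale are all hypothetical); a real case-file schema would likely need more detail.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Coarse, hypothetical three-step scale used only for this sketch."""
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass(frozen=True)
class CaseConfidence:
    """Four independent judgements that a single percentage would merge."""
    explanation: Level
    evidence_quality: Level
    sensor_integrity: Level
    witness_reconstruction: Level

    def summary(self) -> str:
        return (
            f"Explanation fit: {self.explanation.value}; "
            f"evidence quality: {self.evidence_quality.value}; "
            f"sensor integrity: {self.sensor_integrity.value}; "
            f"witness reconstruction: {self.witness_reconstruction.value}."
        )


# A strong explanation resting on weak evidence stays visibly distinct.
example = CaseConfidence(
    explanation=Level.HIGH,
    evidence_quality=Level.LOW,
    sensor_integrity=Level.MODERATE,
    witness_reconstruction=Level.LOW,
)
print(example.summary())
```

Because no method combines the fields into one number, a reader always sees that a good explanatory fit can coexist with poor evidence.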

Why percentages create false certainty in public UFO reporting

Percentages are especially risky in public-facing UFO discussions because readers tend to interpret them literally.

Consider the difference between these two statements:

“The object is 93% likely to be a balloon.”
“The object’s movement and wind alignment are strongly consistent with balloon behaviour, although the footage is too limited for a definitive identification.”

The second statement is longer and less dramatic, but it communicates more useful information.

It tells the reader:

Which evidence mattered
Which explanation currently fits best
That uncertainty remains
Why certainty is limited

The percentage version hides all of that.

This problem becomes worse online, where numerical claims spread rapidly through screenshots, clips, and reposts stripped of methodological context. A “91% certainty” claim can quickly mutate into “AI proved it was a UFO” or “AI debunked the sighting completely”, depending on the audience.

In practice, many AI systems are poorly calibrated outside their training environment. Studies of probabilistic classifiers routinely show overconfidence problems, especially when systems encounter unfamiliar or low-quality inputs. [PMC] [arXiv]

UFO reports are almost entirely composed of unfamiliar and low-quality inputs.

The strongest UFO investigations already use cautious language

Interestingly, official UAP investigators often avoid rigid percentages when communicating publicly.

AARO sometimes uses formulations such as:

“Almost certainly”
“High confidence”
“Consistent with”
“Likely”
“Under analysis”

rather than pretending every case can be reduced to an exact number. [AARO]

That style is not merely public-relations caution. It reflects the real structure of the evidence.

For example, AARO’s balloon resolutions frequently rely on multiple converging indicators:

Morphological similarity
Wind behaviour
Motion profile
Infrared appearance
Comparison with previously resolved cases

The conclusion emerges from accumulated consistency, not from a magical percentage generator. [AARO]

NASA’s study team made a similar point indirectly by emphasising metadata quality, sensor calibration, and multi-sensor collection rather than automated certainty scores. [NASA Science]

Safer wording for AI-assisted UFO case files

A useful UFO investigation system should prioritise calibrated language over theatrical precision.

Better phrasing usually has four features:

It separates evidence quality from explanation quality

Instead of:

“89% likely to be an aircraft”

Prefer:

“The flight-path match is strong, but the visual evidence is limited.”

This distinction prevents readers from confusing a plausible explanation with a proven identification.

It explains why the model reached its conclusion

Instead of:

“Confidence score: 82”

Prefer:

“The object’s speed, navigation lighting pattern, and ADS-B correlation are consistent with commercial aircraft.”

This lets readers evaluate the reasoning themselves.
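A report generator can produce this kind of wording directly from the list of indicators that matched, instead of emitting a bare score. The sketch below is illustrative only: `join_items` and `describe_finding` are hypothetical helper names, and the limitation text is an invented example.

```python
def join_items(items):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def describe_finding(explanation, supporting, limitations=()):
    """Build a plain-language statement that names the evidence used."""
    supporting = list(supporting)
    limitations = list(limitations)
    if not supporting:
        return f"No listed evidence currently supports {explanation}."
    verb = "is" if len(supporting) == 1 else "are"
    text = f"The object's {join_items(supporting)} {verb} consistent with {explanation}"
    if limitations:
        # Keep the caveat in the same sentence so it travels with the claim.
        text += f", although {join_items(limitations)}"
    return text + "."


statement = describe_finding(
    "commercial aircraft",
    ["speed", "navigation lighting pattern", "ADS-B correlation"],
    ["the footage is too short to confirm range"],
)
print(statement)
```

Keeping the caveat inside the same sentence makes it harder for a screenshot or repost to separate the conclusion from its limits.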

It preserves unresolved status when appropriate

Some sightings genuinely remain unresolved because the available evidence is inadequate.

That does not automatically imply anomaly. It simply reflects evidential limits.

Good wording includes formulations such as:

“No explanation currently fits the available evidence cleanly.”
“Insufficient data for definitive classification.”
“The case remains unresolved after standard aviation and astronomy checks.”

It avoids pseudo-scientific precision

Words like:

“possibly”
“consistent with”
“likely”
“strongly consistent”
“weakly supported”
“not currently supported”

are often more truthful than decimal probabilities in messy real-world investigations.

A better model: explanation tiers instead of percentages

One practical alternative is a tiered explanation framework.

Rather than assigning a single percentage, an AI-assisted UFO report can classify explanations using calibrated language bands:

| Assessment tier | Meaning |
| --- | --- |
| Ruled out | Strong contradictory evidence |
| Weak fit | Some overlap but major inconsistencies |
| Plausible | Reasonable match with notable gaps |
| Strongly supported | Multiple lines of evidence align |
| Unresolved | Insufficient evidence for determination |
| Potentially anomalous | Available evidence resists standard explanations |

This approach has several advantages:

It reflects investigative reality better
It reduces false certainty
It keeps uncertainty visible
It encourages evidence review rather than score worship
It scales better to mixed-quality reports

Most importantly, it helps readers understand that UFO investigation is usually a process of narrowing possibilities rather than producing absolute answers.
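The tier framework in this section maps naturally onto an enumeration, so that a report can only ever carry one of the agreed labels. This is a minimal sketch: the class name `Tier`, the function `format_assessment`, and the sample case are assumptions for illustration, while the labels and meanings are taken from the tier table in this section.

```python
from enum import Enum


class Tier(Enum):
    """Assessment tiers with their meanings, as listed in the tier table."""
    RULED_OUT = ("Ruled out", "Strong contradictory evidence")
    WEAK_FIT = ("Weak fit", "Some overlap but major inconsistencies")
    PLAUSIBLE = ("Plausible", "Reasonable match with notable gaps")
    STRONGLY_SUPPORTED = ("Strongly supported", "Multiple lines of evidence align")
    UNRESOLVED = ("Unresolved", "Insufficient evidence for determination")
    POTENTIALLY_ANOMALOUS = (
        "Potentially anomalous",
        "Available evidence resists standard explanations",
    )

    def __init__(self, label, meaning):
        self.label = label
        self.meaning = meaning


def format_assessment(assessments):
    """Render one line per candidate explanation, with no numeric score."""
    return [
        f"{explanation}: {tier.label} ({tier.meaning.lower()})"
        for explanation, tier in assessments.items()
    ]


# Invented example case with three candidate explanations.
report = format_assessment({
    "Balloon": Tier.STRONGLY_SUPPORTED,
    "Aircraft": Tier.WEAK_FIT,
    "Satellite": Tier.RULED_OUT,
})
for line in report:
    print(line)
```

Listing every candidate explanation with its own tier shows the narrowing process itself, rather than only the winning label.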

The real goal is trustworthy uncertainty

The strongest AI-assisted UFO systems are not the ones that sound most certain. They are the ones that communicate uncertainty honestly while still helping investigators organise evidence, test explanations, and eliminate mundane causes efficiently.

A poorly calibrated percentage can make a weak case look solved. Careful language can do the opposite: it can show exactly where the evidence is strong, where it is weak, and why a case remains open.

In UFO investigation, that distinction is not cosmetic. It is the difference between analysis and performance.

Endnotes

Source: science.nasa.gov
Title: Independent Study Team Report (NASA Science)
Published: September 13, 2023
Link: https://science.nasa.gov/wp-content/uploads/2023/09/uap-independent-study-team-final-report.pdf

Source: pmc.ncbi.nlm.nih.gov
Title: Stable reliability diagrams for probabilistic classifiers
Author: T Dimitriadis (2021)
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7923594/

Source: arxiv.org
Title: Metrics of calibration for probabilistic predictions
Link: https://arxiv.org/abs/2205.09680

Source: sciencedirect.com
Author: PE Pfeifer (1994)
Link: https://www.sciencedirect.com/science/article/pii/S074959788471034X/pdf?md5=54e001206df2864d0515d0084369a737&pid=1-s2.0-S074959788471034X-main.pdf

Source: arxiv.org
Title: When are Bayesian model probabilities overconfident?
Link: https://arxiv.org/abs/2003.04026

Source: meritalk.com
Title: UAP Reports Soar: DoD Office Receives 757 New Sightings
Published: 15 Nov 2024
Link: https://www.meritalk.com/articles/uap-reports-soar-dod-office-receives-757-new-sightings/

Source: aaro.mil
Title: UAP Imagery
Link: https://www.aaro.mil/UAP-Cases/Official-UAP-Imagery/

Source: aaro.mil
Title: UAP Records
Link: https://www.aaro.mil/UAP-Records/

Source: nasa.gov
Title: NASA
Link: https://www.nasa.gov/

Source: science.nasa.gov
Title: UAP
Published: 9 Jun 2022
Link: https://science.nasa.gov/uap/

Source: nasa.gov
Title: UPDATE: NASA Shares UAP Independent Study Report
Published: 14 Sept 2023
Link: https://www.nasa.gov/news-release/update-nasa-shares-uap-independent-study-report-names-director/

Source: sciencedirect.com
Title: How does training improve individual forecasts?
Author: VK Motahhar (2025)
Link: https://www.sciencedirect.com/science/article/abs/pii/S0169207024001298

Source: aaro.mil
Title: AARO Home
Link: https://www.aaro.mil/

Source: arxiv.org
Published: 30 May 2025
Link: https://arxiv.org/html/2506.00125v1

Source: arxiv.org
Title: Tail calibration of probabilistic forecasts
Link: https://arxiv.org/html/2407.03167v1

Source: jbsa.mil
Title: DOD examining unidentified anomalous phenomena (Joint Base San Antonio)
Published: 15 Nov 2024
Link: https://www.jbsa.mil/News/News/Article/3966080/dod-examining-unidentified-anomalous-phenomena/

Source: nypost.com
Link: https://nypost.com/2024/11/14/us-news/pentagon-says-nearly-two-dozen-ufo-sightings-cant-be-explained-true-anomalies/

Source: Wikipedia
Title: NASA
Link: https://en.wikipedia.org/wiki/NASA

Source: ore.exeter.ac.uk
Title: Assessment of Forecast Calibration
Author: H Bashaykh (2022)
Link: https://ore.exeter.ac.uk/ndownloader/files/56827487

