How Certain Can a UFO Explanation Be?

Introduction

AI confidence scoring in a UFO sighting investigation should not be a machine-generated verdict. Its purpose is to show how well each candidate explanation fits the evidence, where the evidence is thin, and what would be needed to move the case from “unresolved” towards “plausibly explained” or “genuinely anomalous”. A useful score says, in plain language, “this looks like an aircraft because the timing and path match, but the visual record is too weak to be certain,” not “AI is 92% sure it was a plane.”

That caution matters because UAP investigation is often limited by incomplete observation: short video clips, missing camera metadata, uncertain viewing direction, absent range data, and single-witness accounts. NASA’s UAP independent study stressed that AI and machine learning are promising only when the underlying UAP data are collected to rigorous standards, with reliable calibration and metadata. AARO’s published case material likewise shows a range of outcomes: some reports are resolved as balloons or birds, while others remain unresolved because the footage is insufficient for a determination. [NASA Science] [AARO]

A confidence score is not a truth machine

In an AI-assisted UFO case file, “confidence” should mean confidence in a specific explanation under stated evidence conditions. It should not mean confidence that the object’s true nature is known in any absolute sense. A model may be good at ranking ordinary explanations, such as aircraft, balloons, satellites, drones, birds, meteors, lens flare, or atmospheric optics, but still be unable to decide between them when key fields are missing.

A better structure is to score each candidate explanation separately:

| Candidate explanation | Fit score should reflect | What can lower confidence |
| --- | --- | --- |
| Aircraft or helicopter | Track, timing, bearing, altitude, lighting pattern, sound, ADS-B or radar match | Non-broadcast aircraft, wrong bearing, poor timestamp, no range estimate |
| Drone | Low altitude, local movement, hovering, manoeuvres, nearby launch area | No operator data, weak distance estimate, ambiguous scale |
| Balloon | Wind direction, slow drift, shape, altitude, sunlight, repeated resolved cases | Unknown wind layer, no depth cue, object too brief or distant |
| Satellite or Starlink | Predicted pass, sky position, speed, line formation, time after sunset | Cloud cover, wrong azimuth, no stable location/time |
| Astronomical object | Moon, Venus, Jupiter, bright stars, meteor direction and timing | Witness reports of rapid manoeuvres, poor horizon/elevation estimate |
| Camera or sensor artefact | Zoom, focus, compression, infrared effects, reflection, rolling shutter | Multiple independent sensors, corroborating witnesses, stable external track |
| Unresolved or anomalous | Ordinary explanations do not fit available data | Sparse evidence may prevent both explanation and anomaly claims |

This approach mirrors the practical reality of public UAP work. AARO’s official imagery page includes resolved examples such as balloons and migratory birds, but also cases marked unresolved or undergoing analysis where available imagery is not enough to identify the object. The key lesson is not that every unresolved case is extraordinary; it is that some records do not contain enough information to close the case responsibly. [AARO]

Candidate explanations and fit scores

A good scoring system begins with competing explanations, not a single “UFO confidence” number. The phrase “UFO” simply means the object is not yet identified; it does not identify what the object is. Scoring should therefore compare how each ordinary and unusual explanation fits the case file.

For example, a phone video of a bright light moving slowly across the western sky at dusk might receive a strong satellite fit if the time, location, direction of travel, elevation, and predicted satellite pass align. It might receive a weaker aircraft fit if there is a nearby flight track but the angular motion or lighting does not match. It might remain unresolved if the timestamp is wrong, the witness location is approximate, or the camera view lacks landmarks.

The score should be built from explainable components rather than hidden model intuition. A practical UFO investigation score can combine:

Temporal fit: Did the candidate object exist at the reported date and time?
Geospatial fit: Was it in the right part of the sky from the observer’s location?
Motion fit: Did its apparent direction, speed, and behaviour match the report?
Appearance fit: Did brightness, colour, shape, flashing, formation, or infrared signature match?
Environmental fit: Did weather, cloud, wind, visibility, or atmospheric conditions support the explanation?
Evidence fit: Is the match based on primary data, a weak inference, or a vague witness description?
Contradiction weight: Which parts of the report actively argue against the explanation?

This matters because two explanations can both be plausible for different reasons. A balloon may fit the slow drift and lack of sound, while an aircraft may fit a radar or ADS-B track. A single case status should not erase that tension. Instead, the case file should preserve the competing fit scores and explain why one is stronger, weaker, ruled out, or still unresolved.
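As a rough illustration of how the components above could be combined, the sketch below keeps one score per candidate explanation instead of a single case-level number. The weights, the multiplicative treatment of contradictions, and all names are assumptions made for this example, not a published method; evidence quality is deliberately left out because it is graded separately.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "temporal": 0.25,
    "geospatial": 0.25,
    "motion": 0.20,
    "appearance": 0.15,
    "environmental": 0.15,
}


@dataclass
class FitComponents:
    temporal: float        # each component is an investigator judgement in [0, 1]
    geospatial: float
    motion: float
    appearance: float
    environmental: float
    contradiction: float   # 0 = nothing argues against, 1 = decisive conflict


def fit_score(c: FitComponents) -> float:
    """Weighted sum of the positive components, scaled down by contradictions."""
    positive = sum(WEIGHTS[name] * getattr(c, name) for name in WEIGHTS)
    return round(positive * (1.0 - c.contradiction), 3)


# Made-up component values for the dusk phone-video example.
satellite = FitComponents(0.9, 0.8, 0.9, 0.7, 0.6, contradiction=0.1)
aircraft = FitComponents(0.8, 0.6, 0.4, 0.3, 0.6, contradiction=0.4)

# Both scores are kept in the case file so the tension stays visible.
scores = {"satellite": fit_score(satellite), "aircraft": fit_score(aircraft)}
```

Storing the whole `scores` dictionary, not only the winner, is what lets a later reviewer see why one explanation was preferred.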

AI can help by keeping the comparison consistent across cases. It can retrieve aircraft tracks, satellite predictions, weather observations, astronomy data, and historical examples, then highlight which candidate explanation best matches the known facts. But the model should not be allowed to convert “best available match” into “confirmed identification” unless the evidence meets a clear threshold.

Evidence quality changes the meaning of every score

The same fit score means different things depending on the evidence behind it. A “high” aircraft score based on a confirmed time, exact location, visible landmarks, matching flight track, and consistent direction of travel is much stronger than a high score based on a witness saying “around 9 pm somewhere west of town”.

That is why confidence scoring should include a separate evidence-quality grade. NASA’s UAP report argued that future analysis depends on better data acquisition, sensor calibration, multiple measurements, and thorough metadata rather than relying on poorly characterised observations. The same principle applies to a public-facing AI workflow: a model cannot recover missing range, lens settings, bearing, or true object size from a vague clip simply by sounding analytical. [NASA Science]

A simple evidence-quality layer might use categories such as:

Strong evidence: original file preserved, accurate timestamp, precise location, camera metadata, landmarks, weather record, independent corroboration, and a matching external data source.
Moderate evidence: credible witness account and useful video or photo, but with some missing metadata or uncertain direction.
Weak evidence: short clip, social media repost, cropped file, approximate time, no location precision, no independent checks.
Insufficient evidence: report too vague to test, no original media, no reliable time or place, or contradictory basic details.

This separate grade prevents a common failure: giving a confident-looking explanation from weak inputs. A model might find that a known satellite pass occurred within 20 minutes of the claimed time, but if the witness time is approximate and the viewing direction is unknown, the system should say “possible satellite match, weak evidence” rather than “satellite confirmed”.

AARO’s public reporting illustrates the same distinction. In its 2024 reporting cycle, the office received hundreds of UAP reports, resolved some as prosaic objects, and left many under review or unresolved; public summaries emphasised that unresolved does not mean extraterrestrial, and that insufficient or non-actionable sensor data constrains resolution. [U.S. Department of War]
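The separation between fit and evidence grade described in this section could be sketched as follows. The intake field names, the grading rules, and the 0.4 and 0.7 thresholds are hypothetical simplifications of the four categories above, intended only to show the two values being reported side by side.

```python
from enum import IntEnum


class Evidence(IntEnum):
    INSUFFICIENT = 0
    WEAK = 1
    MODERATE = 2
    STRONG = 3


def grade_evidence(case: dict) -> Evidence:
    """Grade a case from boolean intake fields (field names are hypothetical)."""
    if not (case.get("has_time") and case.get("has_place")):
        return Evidence.INSUFFICIENT
    if not case.get("original_media") or not case.get("precise_location"):
        return Evidence.WEAK
    strong_fields = ("accurate_timestamp", "camera_metadata", "landmarks",
                     "independent_corroboration", "external_match")
    if all(case.get(name) for name in strong_fields):
        return Evidence.STRONG
    return Evidence.MODERATE


def describe(explanation: str, fit: float, evidence: Evidence) -> str:
    """Report the match and the evidence grade together, never as 'confirmed'."""
    strength = "strong" if fit >= 0.7 else "possible" if fit >= 0.4 else "weak"
    return f"{strength} {explanation} match, {evidence.name.lower()} evidence"


# A reposted clip with an approximate time and place.
case = {"has_time": True, "has_place": True,
        "original_media": False, "precise_location": False}
grade = grade_evidence(case)
summary = describe("satellite", 0.5, grade)
```

Because `describe` has no branch that returns "confirmed", a good fit on weak inputs cannot be promoted to an identification by wording alone.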

Avoiding confident language from weak inputs

The biggest risk in AI confidence scoring is false precision. A percentage can look scientific even when it is only a dressed-up guess. In UFO investigation, that is especially dangerous because the public may read “87% likely drone” as a factual determination, while a believer may read “13% anomalous” as evidence of something extraordinary. Both interpretations can be wrong if the underlying data are poor.

Machine-learning research treats calibration as the relationship between a system’s stated confidence and how often it is actually correct. A well-calibrated model that says “70% confidence” should be right about seven times out of ten across comparable cases. But calibration is hard, and modern AI systems, including large language models, can be overconfident or poorly aligned with real-world accuracy. NIST’s AI Risk Management Framework emphasises the need to manage AI risks in context, while AI confidence research highlights calibration as a central reliability problem rather than a cosmetic interface choice. [NIST] [IBM]
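To make the "seven times out of ten" idea concrete, here is a minimal sketch of a binned calibration check, a common way of comparing stated confidence with observed accuracy. The data are invented, and the function names and bin count are choices made for this example.

```python
def calibration_table(predictions, n_bins=5):
    """Group (stated_confidence, was_correct) pairs into equal-width bins.

    Returns (mean_confidence, observed_accuracy, count) for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, correct in predictions:
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[index].append((conf, correct))
    table = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            table.append((mean_conf, accuracy, len(members)))
    return table


def expected_calibration_error(predictions, n_bins=5):
    """Count-weighted average gap between stated confidence and accuracy."""
    total = len(predictions)
    return sum(count / total * abs(conf - acc)
               for conf, acc, count in calibration_table(predictions, n_bins))


# Invented history: ten cases scored at 0.7, seven later confirmed.
well_calibrated = [(0.7, True)] * 7 + [(0.7, False)] * 3
# Invented history: ten cases scored at 0.9, only five later confirmed.
overconfident = [(0.9, True)] * 5 + [(0.9, False)] * 5

gap_good = expected_calibration_error(well_calibrated)
gap_bad = expected_calibration_error(overconfident)
```

A gap near zero means the stated numbers can be read at face value for comparable cases; a large gap means they cannot, however precise they look.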

For UFO casework, this means the interface should prefer calibrated language over naked numbers. Instead of saying:

“AI confidence: 91% balloon.”

A more honest output would be:

“Balloon is the strongest current explanation. The object’s slow drift and apparent shape fit, and wind direction is broadly consistent. Confidence is moderate, not high, because the video is short, range is unknown, and no independent balloon track is available.”

That wording gives the reader the useful part of the score (which explanation fits best and why) while avoiding the illusion that the AI has measured certainty directly. Google’s People + AI guidance makes a similar design point: showing confidence can affect user decisions, so confidence indicators should help people calibrate trust rather than over-trust an automated output. [pair.withgoogle.com]

Status labels are safer than verdict labels

A practical UFO confidence system should use status labels that reflect evidence handling, not dramatic conclusions. The most useful labels are usually:

| Status | Meaning in a case file | What it should not imply |
| --- | --- | --- |
| Ruled out | A candidate explanation conflicts with strong evidence | That all ordinary explanations are ruled out |
| Plausible | The explanation fits several important facts but is not confirmed | That the case is solved |
| Weak | There is some fit, but the evidence or match is poor | That the explanation is impossible |
| Unresolved | Current evidence cannot support a firm determination | That the object is extraordinary |
| Anomalous | The observation remains unusual after strong ordinary checks | That it is extraterrestrial or technologically advanced |

“Anomalous” should be a demanding label. It should require not just a lack of identification, but a positive reason the event remains unusual after strong checks: reliable timestamp, known location, good viewing geometry, preserved original data, independent corroboration, and documented mismatch with ordinary explanations. A short social media clip with no metadata should usually be “insufficient” or “unresolved”, not anomalous.

This distinction is consistent with the most cautious institutional language. AARO describes its work as a data-driven effort to resolve UAP reports and publicly lists examples that are resolved, unresolved, closed as not anomalous, or still undergoing analysis. Its historical reporting has also argued that resolved cases to date have ordinary explanations, while many unsolved cases remain limited by evidence quality rather than by confirmed extraordinary features. [AARO]
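One way to keep “anomalous” demanding is to make it reachable only when every listed check passes. The sketch below assumes a case where no ordinary explanation has been confirmed; the check names are hypothetical labels for the requirements named in this section.

```python
from enum import Enum


class Status(Enum):
    RULED_OUT = "ruled out"
    PLAUSIBLE = "plausible"
    WEAK = "weak"
    UNRESOLVED = "unresolved"
    ANOMALOUS = "anomalous"


# Hypothetical field names for the checks this section requires.
ANOMALY_CHECKS = (
    "reliable_timestamp",
    "known_location",
    "good_viewing_geometry",
    "original_data_preserved",
    "independent_corroboration",
    "documented_mismatch_with_ordinary",
)


def overall_status(checks: dict) -> Status:
    """Return ANOMALOUS only if every check passes; otherwise UNRESOLVED."""
    if all(checks.get(name, False) for name in ANOMALY_CHECKS):
        return Status.ANOMALOUS
    return Status.UNRESOLVED


# A short social media clip: unusual-looking, but almost nothing is verified.
clip = {"documented_mismatch_with_ordinary": True}
clip_status = overall_status(clip)

# A well-documented case in which every check has been completed.
documented = {name: True for name in ANOMALY_CHECKS}
documented_status = overall_status(documented)
```

A missing key counts as a failed check, so an incomplete intake form can never produce the strongest label by omission.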

What the AI should show the reader

The best confidence display is not a single score at the top of the page. It is a short reasoning panel that makes the uncertainty legible. A public-facing case file should show the current leading explanation, the main alternatives, the evidence quality, and the missing data that would change the assessment.

A reader-friendly scoring panel might include:

Current leading explanation: “Possible aircraft.”
Confidence band: “Moderate.”
Why it fits: “A tracked aircraft passed near the reported bearing within two minutes of the stated time.”
Why it is not confirmed: “The witness location is approximate, and the video lacks stable landmarks.”
Competing explanations: “Drone weak; satellite weak; balloon unresolved pending wind-layer check.”
Evidence quality: “Moderate: original video available, but camera metadata incomplete.”
What would improve the case: “Exact observer location, original file metadata, direction of camera view, and independent witness or sensor record.”

This format gives AI a useful role: it organises reasoning. It does not ask the reader to trust the machine’s tone. It also helps investigators avoid premature closure. If the system says “aircraft plausible but not confirmed”, later evidence can still improve or overturn that assessment.
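The panel described in this section maps naturally onto a small record type. The following is a sketch only: the class name, field names, and rendering format are assumptions for illustration, and the example values are the ones quoted in the list above.

```python
from dataclasses import dataclass, field


@dataclass
class ScoringPanel:
    leading_explanation: str
    confidence_band: str   # a word such as "Low", "Moderate" or "High", not a percentage
    why_it_fits: str
    why_not_confirmed: str
    competing: dict = field(default_factory=dict)      # explanation -> status label
    evidence_quality: str = ""
    would_improve: list = field(default_factory=list)  # missing data worth requesting

    def render(self) -> str:
        """Format the panel as plain text, one labelled line per field."""
        competing = "; ".join(f"{name} {status}"
                              for name, status in self.competing.items())
        lines = [
            f"Current leading explanation: {self.leading_explanation}",
            f"Confidence band: {self.confidence_band}",
            f"Why it fits: {self.why_it_fits}",
            f"Why it is not confirmed: {self.why_not_confirmed}",
            f"Competing explanations: {competing}",
            f"Evidence quality: {self.evidence_quality}",
            f"What would improve the case: {', '.join(self.would_improve)}",
        ]
        return "\n".join(lines)


panel = ScoringPanel(
    leading_explanation="Possible aircraft",
    confidence_band="Moderate",
    why_it_fits="A tracked aircraft passed near the reported bearing "
                "within two minutes of the stated time.",
    why_not_confirmed="The witness location is approximate, "
                      "and the video lacks stable landmarks.",
    competing={"Drone": "weak", "Satellite": "weak", "Balloon": "unresolved"},
    evidence_quality="Moderate: original video available, camera metadata incomplete",
    would_improve=["exact observer location", "original file metadata",
                   "camera direction"],
)
text = panel.render()
```

Keeping the reasons and the missing data as required parts of the record makes it harder to publish a band without the explanation that justifies it.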

The confidence score should punish missing data

A scoring system that only rewards matching clues will overstate certainty. It also needs to penalise missing fields that are essential for identification. A bright object seen “somewhere over the hills” cannot be scored as strongly as an object filmed from a known address, with visible skyline reference points, at a verified timestamp, in a known camera direction.

For UFO investigation, the most important missing-data penalties are usually:

No exact time: weakens aircraft, satellite, astronomy, meteor, and launch checks.
No precise location: weakens bearing, elevation, and geospatial matching.
No viewing direction: makes sky-position matches fragile.
No original media: prevents metadata, compression, edit, and sensor checks.
No landmarks: makes angular motion and horizon position hard to estimate.
No duration: weakens comparison with satellites, aircraft, meteors, and drones.
No weather context: weakens balloon, atmospheric optics, cloud, and visibility analysis.
No independent corroboration: raises the risk of misperception, artefact, or hoax.
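A minimal sketch of such penalties is shown below. The mapping of each gap to the explanations it weakens follows the list above where the list names them; treating the remaining gaps as weakening every candidate, and the flat 15% penalty, are assumptions made for this example.

```python
ALL = None  # marker meaning "weakens every candidate explanation"

# Missing field -> explanations whose score it weakens.
MISSING_DATA_RULES = {
    "exact_time": {"aircraft", "satellite", "astronomy", "meteor", "launch"},
    "precise_location": ALL,
    "viewing_direction": ALL,
    "original_media": ALL,
    "landmarks": ALL,
    "duration": {"satellite", "aircraft", "meteor", "drone"},
    "weather_context": {"balloon", "atmospheric optics"},
    "independent_corroboration": ALL,
}

PENALTY = 0.15  # hypothetical: each relevant gap removes 15% of the remaining score


def apply_penalties(fit_scores: dict, available_fields: set) -> dict:
    """Reduce each explanation's score once for every relevant missing field."""
    adjusted = {}
    for explanation, score in fit_scores.items():
        for field_name, affected in MISSING_DATA_RULES.items():
            if field_name in available_fields:
                continue
            if affected is ALL or explanation in affected:
                score *= 1 - PENALTY
        adjusted[explanation] = round(score, 3)
    return adjusted


# Everything is known except the exact time of the sighting.
available = set(MISSING_DATA_RULES) - {"exact_time"}
adjusted = apply_penalties({"satellite": 0.8, "balloon": 0.8}, available)
```

In this example the satellite score drops while the balloon score is unchanged, because a balloon check depends less on a precise timestamp than a predicted satellite pass does.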

This is where AI can be particularly useful at intake. Instead of only producing a score after the fact, it can ask for the missing details that would most improve the case: “Was the object above the Moon or below it?”, “Which way were you facing?”, “Can you upload the original file rather than a screen recording?”, “Did it move against any fixed building, tree, or star?”
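The intake step could be as simple as a lookup from missing fields to follow-up questions. In this sketch the questions are the ones quoted above, while the field names and the priority order are hypothetical.

```python
# (field, question) pairs in a hypothetical priority order.
FOLLOW_UP_QUESTIONS = [
    ("viewing_direction", "Which way were you facing?"),
    ("original_media",
     "Can you upload the original file rather than a screen recording?"),
    ("landmarks", "Did it move against any fixed building, tree, or star?"),
    ("elevation", "Was the object above the Moon or below it?"),
]


def next_questions(report: dict, limit: int = 2) -> list:
    """Return up to `limit` questions for fields the report has not supplied."""
    missing = [question for field_name, question in FOLLOW_UP_QUESTIONS
               if not report.get(field_name)]
    return missing[:limit]


# A report that already includes the original file but little else.
report = {"original_media": True}
questions = next_questions(report)
```

Capping the number of questions keeps the intake form short, on the assumption that a witness is more likely to answer two targeted questions than a long checklist.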

Why self-reported AI confidence is not enough

A large language model can be asked, “How confident are you?” and it may produce a neat percentage. That number should not be treated as a validated probability. It may reflect the model’s wording habits, training examples, or prompt style rather than tested accuracy on UFO sighting data.

Technical work on confidence estimation separates raw model output from calibrated confidence. Calibration usually requires testing predictions against known outcomes, measuring where the model is overconfident or underconfident, and adjusting how scores are reported. Recent research on large language models continues to treat confidence calibration as an open problem, especially where systems generate fluent explanations that may sound more certain than the evidence allows. [arXiv] [Harvard Data Science Review]

For UFO casework, that means a responsible workflow should not rely on the model’s self-assessment alone. It should derive confidence from auditable signals: source quality, number of independent matches, precision of time and location, strength of contradiction, calibration against past resolved cases, and human review. The AI may draft the explanation, but the score should come from the case evidence and the system’s tested performance.
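One auditable alternative to a self-reported percentage is the system’s own track record on past resolved cases. The sketch below groups invented history by evidence grade and reports how often the leading explanation was later confirmed; the minimum of 20 cases is an arbitrary assumption, and real use would need a properly reviewed set of resolved cases.

```python
from collections import defaultdict


def empirical_confidence(history, min_cases=20):
    """Confirmation rate of the leading explanation, per evidence grade.

    `history` holds (evidence_grade, leading_explanation_confirmed) pairs
    from past resolved cases. Grades with fewer than `min_cases` examples
    map to None, meaning there is too little history to quote a rate.
    """
    counts = defaultdict(lambda: [0, 0])  # grade -> [confirmed, total]
    for grade, confirmed in history:
        counts[grade][0] += int(confirmed)
        counts[grade][1] += 1
    return {
        grade: (hits / total if total >= min_cases else None)
        for grade, (hits, total) in counts.items()
    }


# Invented history, for illustration only.
history = ([("moderate", True)] * 15 + [("moderate", False)] * 10
           + [("strong", True)] * 3)
rates = empirical_confidence(history)
```

Returning `None` for thinly populated grades keeps the system from quoting a rate it has not earned, which is the same caution the text asks of the language model.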

How scoring can support, rather than replace, judgement

The aim is not to make UFO investigation less human. It is to make the reasoning more explicit. Witnesses deserve to have their reports handled carefully; readers deserve to know which claims are confirmed, which are inferred, and which are unknown. Investigators deserve tools that reduce repetitive screening without creating a new layer of artificial certainty.

A good confidence system therefore has three jobs. First, it ranks candidate explanations so ordinary causes can be checked quickly. Second, it exposes the quality and gaps in the evidence. Third, it keeps the final case status modest: ruled out, plausible, weak, unresolved, or anomalous only when the evidence justifies that label.

The strongest version of AI-assisted UFO investigation is not a machine that announces “solved”. It is a case file that can say, clearly and defensibly: “This explanation fits these facts, fails on these points, depends on these assumptions, and should change if this missing evidence appears.”

Endnotes

Source: science.nasa.gov
Title: Science Independent Study Team Report
Link: https://science.nasa.gov/wp-content/uploads/2023/09/uap-independent-study-team-final-report.pdf
Published: September 13, 2023
Source: aaro.mil
Title: Official UAP Imagery
Link: https://www.aaro.mil/UAP-Cases/Official-UAP-Imagery/
Source: nasa.gov
Title: update nasa shares uap independent study report names director
Link: https://www.nasa.gov/news-release/update-nasa-shares-uap-independent-study-report-names-director/
Published: September 14, 2023
Source: war.gov
Title: department of defense releases the annual report on unidentified anomalous phen
Link: https://www.war.gov/News/Releases/Release/Article/3964824/department-of-defense-releases-the-annual-report-on-unidentified-anomalous-phen/
Published: November 14, 2024
Source: nist.gov
Title: ai risk management framework
Link: https://www.nist.gov/itl/ai-risk-management-framework
Source: ibm.com
Link: https://www.ibm.com/think/topics/uncertainty-quantification
Source: pair.withgoogle.com
Title: Explainability + Trust
Link: https://pair.withgoogle.com/chapter/explainability-trust/
Source: aaro.mil
Link: https://www.aaro.mil/
Source: arxiv.org
Link: https://arxiv.org/html/2404.04689v1
Source: aaro.mil
Title: Congressional Press Products
Link: https://www.aaro.mil/Congressional-Press-Products/
Source: aaro.mil
Title: UNCLASSIFIED FY23 Consolidated Annual Report on UAP Oct 25 2023 1236
Link: https://www.aaro.mil/Portals/136/PDFs/UNCLASSIFIED-FY23_Consolidated_Annual_Report_on_UAP-Oct_25_2023_1236.pdf
Source: arxiv.org
Link: https://arxiv.org/pdf/2403.15368
Source: arxiv.org
Link: https://arxiv.org/html/2402.07632v4
Source: science.nasa.gov
Link: https://science.nasa.gov/uap/
Source: science.nasa.gov
Link: https://science.nasa.gov/wp-content/uploads/2023/04/NASASMDAIWorkshop21spreads1.pdf
Source: pair.withgoogle.com
Title: People + AI Guidebook
Link: https://pair.withgoogle.com/guidebook/
Source: nvlpubs.nist.gov
Title: AI.600 1
Link: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
Source: war.gov
Title: dod examining unidentified anomalous phenomena
Link: https://www.war.gov/News/News-Stories/Article/Article/3965403/dod-examining-unidentified-anomalous-phenomena/
Source: codelabs.developers.google.com
Title: pair guidebook
Link: https://codelabs.developers.google.com/codelabs/pair-guidebook
Source: design.google
Link: https://design.google/library/people-ai-research
Source: media.defense.gov
Title: DOPSR 2024 0263 AARO HISTORICAL RECORD REPORT VOLUME 1 2024
Link: https://media.defense.gov/2024/Mar/08/2003409233/-1/-1/0/DOPSR-2024-0263-AARO-HISTORICAL-RECORD-REPORT-VOLUME-1-2024.PDF
Source: hdsr.mitpress.mit.edu
Link: https://hdsr.mitpress.mit.edu/pub/jaqt0vpb
