Xenia Methodology

Page URL: https://xeniadata.com/methodology
Page version: v1 skeleton (Day 1 draft, 2026-05-13)
Audience: Property owners, users, attorneys, partners, advisors
Required by: Xenia Legal Framework v1.2 Sections 11.4 (output memorization monitoring), 11.7 (model provenance), 12.2.4 (methodology disclosure), 6.4 (right-to-respond)
California AB 2013: Section 6 below satisfies the Generative AI Training Data Transparency Act, effective Jan. 1, 2026

1. Plain-English summary

Xenia helps travelers find independent and boutique hotels by combining three kinds of information:

Verified facts about each property (address, room count, amenities) that come from multiple independent sources and that we can confirm.
What the property says about itself — the hotel's own descriptions, claims, and characterizations, always shown as attributed claims, never as Xenia facts.
Xenia insights — patterns and inferences our analytical systems derive from many lawful inputs (review aggregates, structured public data, geographic context). These are clearly labeled as Xenia-derived and come with disclosure of how we built them.

This page explains how each kind of information is gathered, validated, and displayed. It also explains how a property owner can dispute any Xenia insight about their property and what to do if you believe a Xenia output is inaccurate.

2. The three-tier attribute classification

Every data point about a property on Xenia falls into exactly one of three tiers:

Tier 1 — Verified Fact

Objective, measurable, externally verifiable attributes (e.g., the property's name, street address, room count, official star rating from an authoritative body, presence of a pool).

Sourcing standard: Cross-validated across ≥3 independent sources OR sourced from a licensed authoritative feed.
Display standard: May be displayed as an unqualified statement of fact.
Examples: "The property has 36 guest rooms." "The property is located at 40128 Lakeview Drive."

Tier 2 — Self-Reported Attribute

Claims made by the property itself about itself.

Sourcing standard: Sourced from the hotel's own website, the hotel's authenticated submission via the Xenia onboarding portal, or other channels controlled by the property owner.
Display standard: MUST be attributed to the property. Language like "the property describes itself as..." or "the hotel states it offers..." is used. Never presented as an unqualified fact.
Examples: "The property describes itself as 'a boutique mountain retreat.'" "The hotel states that breakfast is included."

Tier 3 — Derived Attribute (Xenia inference)

Insights generated by Xenia's analytical systems from patterns in underlying lawful data.

Sourcing standard: Generated by Xenia's classification systems from a documented input set. Each derived attribute records its input sources, minimum signal threshold, contradicting-signal check, confidence score, model version, and CMI inheritance metadata.
Display standard: MUST be presented as a Xenia-derived classification, accompanied by signal count and confidence. Never presented as an unqualified fact.
Examples: "Xenia insight: Sound quality score 6.2/10. Derived from 247 review signals across 4 sources over 18 months. Confidence: 84%."
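The display standards above can be sketched as a small rendering function. This is a minimal illustration, not Xenia's production code: the names Tier, Attribute, and render are hypothetical, and the output wording is simplified from the examples in this section (the time window shown in the Tier 3 example is omitted).

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    VERIFIED_FACT = 1
    SELF_REPORTED = 2
    DERIVED = 3


@dataclass(frozen=True)
class Attribute:
    tier: Tier
    text: str                # the statement, quoted claim, or insight label
    signal_count: int = 0    # Tier 3 only
    source_count: int = 0    # Tier 3 only
    confidence: float = 0.0  # Tier 3 only, in the range 0..1


def render(attr: Attribute) -> str:
    """Apply the display standard that matches the attribute's tier."""
    if attr.tier is Tier.VERIFIED_FACT:
        # Tier 1 may be shown as an unqualified statement of fact.
        return attr.text
    if attr.tier is Tier.SELF_REPORTED:
        # Tier 2 must always be attributed to the property.
        return f'The property describes itself as "{attr.text}".'
    # Tier 3 must be labeled as Xenia-derived, with signal count and confidence.
    return (
        f"Xenia insight: {attr.text}. "
        f"Derived from {attr.signal_count} review signals "
        f"across {attr.source_count} sources. "
        f"Confidence: {attr.confidence:.0%}."
    )
```

Because the tier is a required field, an attribute cannot reach the display layer without a display standard being selected for it.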

3. Where our data comes from

Permitted source categories

Licensed APIs: Cloudbeds (for properties under management), Google Places API (where licensed), Walk Score API, NOAA Weather.
Owner-supplied: Property owner's authenticated submission via Xenia's onboarding portal, with documented license grant.
Public, logged-out web pages: The property's own website, public government records (ADA compliance, fire code, health department), public regulatory filings.
Open data: OpenStreetMap (ODbL), Wikidata (CC0), Schema.org structured data published by property websites.
Xenia's own first-party data: Reviews and signals submitted by booked guests with explicit license grants under Xenia's Terms of Service.

Prohibited sources

Anything behind a login wall (extranet, partner-only portal, members-only forum)
Anything behind a paywall
Pirate libraries and shadow archives
Sources that have issued cease-and-desist to Xenia
Sources that require defeating anti-bot controls to access
Personal data of identifiable individuals (guest names, employee personal data)

How we choose between sources for the same fact

Where multiple lawful sources offer the same fact, Xenia prefers them in this order: (1) the property's authenticated submission, (2) a licensed authoritative feed, (3) cross-validation of ≥3 independent public sources. Where sources disagree, the field is held for property attestation rather than published with an arbitrary winner.
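One way to read the preference order is sketched below. The function name resolve_fact and its parameters are hypothetical, and the sketch assumes that any disagreement among the available values holds the field, and that each entry in public_values comes from a different independent source.

```python
from typing import Optional


def resolve_fact(
    owner_value: Optional[str],
    licensed_value: Optional[str],
    public_values: list[str],
) -> Optional[str]:
    """Return the value to publish, or None to hold the field for attestation."""
    available = {
        v for v in (owner_value, licensed_value, *public_values) if v is not None
    }
    if len(available) > 1:
        # Sources disagree: hold rather than pick an arbitrary winner.
        return None
    if owner_value is not None:
        return owner_value       # (1) authenticated owner submission
    if licensed_value is not None:
        return licensed_value    # (2) licensed authoritative feed
    if len(public_values) >= 3:
        return public_values[0]  # (3) three or more agreeing public sources
    return None                  # not enough support to publish
```

For example, resolve_fact(None, None, ["36", "36"]) returns None because two public sources fall short of the three-source standard, while resolve_fact("36", "38", []) returns None because the owner submission and the licensed feed disagree.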

4. How we generate Xenia insights (Tier 3)

Each Xenia insight is the output of a documented inference methodology with the following characteristics:

Input data sources are recorded for every insight, including their lawful-acquisition status.
Signal aggregation is statistical, not text-reproductive. We do not store, display, or republish verbatim review text. We aggregate review signals into counts, percentages, scores, and named themes.
Cross-source diversity is required. Xenia insights are never derived from a single source's content; they are derived from multiple lawful inputs so that no single source's market is substituted (see Thomson Reuters v. Ross Intelligence).
Confidence scoring accompanies every insight. Confidence is computed from signal count, source diversity, signal consistency, and recency.
Contradicting-signal check must pass before an insight is published. Where signals contradict, the insight is withheld pending property attestation or additional signals.
Memorization testing (Xenia Legal Framework v1.2 Section 11.4) is run on every inference model before production use. We sample ≥1,000 random inputs, run the model, and verify no output reproduces ≥15 consecutive words from any single input. Results are logged.
Quarterly re-test of memorization on every production model.
Model provenance is documented for every model used: name, version, vendor or training source.

5. How Xenia displays negative-framed information

Where a Xenia insight could be perceived as unfavorable to a property (e.g., a complaint-pattern derivation, a hidden-fee surprise factor), Xenia uses one or more of five permitted framings:

Aggregate-only reporting. Statistics about the underlying dataset, not assertions about the property. "Sound concerns mentioned in 32% of reviews mentioning sound."
Statistic-before-adjective. The number leads; any descriptor is derivative. "Cleanliness score: 7.1/10 (Good)."
Source-of-source linking. Where applicable, Xenia links to the platform where the underlying signals originated rather than republishing them.
Methodology disclosure. Every Xenia insight is accompanied by a link to this methodology page. The claim becomes "Xenia's model produced inference X with this confidence" — a true statement about the inference, not a factual claim about the property.
Property right-to-respond. Every property has a mechanism to dispute any Xenia insight, see the methodology, submit counter-evidence, and have the insight flagged or revised. See Section 8 below.

The following framings are never used by Xenia regardless of disclaimer:

Discrete factual assertions about a named property based on review aggregation. We never say "this hotel has thin walls."
Comparative ranking statements that disparage one property by elevating another.
Verbatim quotation of negative review text.
Published claims about individual identifiable staff members.

6. AI training and inference disclosure (California AB 2013 compliance)

Effective January 1, 2026, California Assembly Bill 2013 requires developers of generative AI systems made available to Californians to publish a high-level summary of the datasets used to train those systems. The following is Xenia's disclosure:

6.1 Datasets used to train or develop Xenia's inference models

Dataset / source family	Owner	Purpose	Copyrighted material included?	License status	Time period	Approximate size
(To be completed before Scope C — Xenia's first-party guest reviews)	Xenia (under Xenia ToS)	Tier 3 inference training	No third-party copyrighted material	Xenia ToS license grant from booked guests	2026-05-13 onward	Building incrementally
(To be completed before Scope C — Hotel-owner attestations)	Xenia (under owner-submission license)	Tier 2 attribution display	No third-party copyrighted material	Owner license grant	2026-05-13 onward	Building incrementally
(To be completed before Scope C — Public-domain government records, OSM, Wikidata)	Various governmental and open-data providers	Tier 1 fact validation	Public-domain or open-license only (CC0, CC BY-SA, ODbL)	Public domain or open license	Varies by source	Varies
(Additional rows added as production sources are documented and licensed)

6.2 Update cadence

This disclosure is reviewed quarterly and republished annually. Material changes (a new training data source added, a model retrained on a new dataset, etc.) trigger an out-of-cycle update.

6.3 What Xenia does NOT use for training

Verbatim review text scraped from third-party platforms (Booking, TripAdvisor, Yelp, Expedia, etc.)
Any content from a pirate library or shadow archive
Content obtained by defeating any access control
Personal data of identifiable individuals

7. FTC Consumer Review Rule compliance (16 CFR Part 465)

Effective October 21, 2024, the FTC's Consumer Reviews and Testimonials Rule prohibits certain deceptive practices involving consumer reviews, including fake reviews and review suppression. Xenia complies as follows:

No fake reviews. Xenia does not generate, commission, or display reviews that misrepresent the reviewer or the reviewed experience.
No suppression of legitimate negative reviews. Xenia's property right-to-respond mechanism is a process for dispute resolution and attribution, not a mechanism for suppression. Properties may not require Xenia to remove a Xenia insight as a condition of doing business with Xenia. Any contractual provision purporting to require Xenia to suppress lawful negative content is unenforceable and will be ignored.
Disclosure of material connections. Where a property has a material connection to Xenia (e.g., owner submission, partnership), the connection is disclosed alongside any property-supplied content.

8. Property right-to-respond mechanism

Every property covered by a Xenia derived attribute (Tier 3) has access to a property dashboard at https://xeniadata.com/property/[property_id]/manage (live from Scope B onward). The dashboard:

Displays every Xenia insight published about the property
Shows the methodology and signal sources for each insight
Allows the property to submit a dispute or counter-evidence
Tracks the dispute through a documented workflow, with a response within 10 business days

Unresolved disputes cause the disputed insight to be flagged on the public listing pending review. See the Right-to-Respond protocol (linked below).
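The 10-business-day response window can be computed as in the sketch below. The function name response_deadline is hypothetical, and the sketch assumes that business days are Monday through Friday and that public holidays are not excluded; the Right-to-Respond protocol may define the window differently.

```python
from datetime import date, timedelta


def response_deadline(received: date, business_days: int = 10) -> date:
    """Return the date that falls `business_days` weekdays after `received`."""
    current = received
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current
```

Under these assumptions, a dispute received on Wednesday 2026-05-13 would be due a response by Wednesday 2026-05-27.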

For Scope A (internal MVP, Cedarwood properties only): no Tier 3 inferences are published; the right-to-respond mechanism is staged for Scope B onward.

9. Memorization testing and output filters

For every Tier 3 inference model in production:

Pre-production memorization test: 1,000 random inputs sampled; outputs scanned for verbatim or near-verbatim reproduction; threshold ≥15 consecutive words from any input triggers retraining or filter application.
Production output filter: N-gram match against input sources, edit-distance check, sentence-structure similarity check. Outputs that exceed thresholds are blocked from publication.
Quarterly re-test of memorization on production models.
Incident logging per Xenia Legal Framework v1.2 Section 7.1.3.
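The 15-consecutive-word threshold can be checked with a word n-gram overlap test, sketched below. The function names are hypothetical, and the tokenization (lowercasing and splitting on whitespace, with no punctuation normalization) is an assumption; the edit-distance and sentence-structure checks mentioned above are not shown.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return every run of n consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reproduces_input(output: str, inputs: list[str], n: int = 15) -> bool:
    """True if the output shares a run of n consecutive words with any single input."""
    output_grams = word_ngrams(output, n)
    if not output_grams:
        return False  # output is shorter than n words
    return any(output_grams & word_ngrams(source, n) for source in inputs)
```

An output that shares a run of 15 or more words with an input necessarily shares at least one 15-word n-gram with it, so comparing n-gram sets at n = 15 is sufficient to detect the threshold condition.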

10. Audit and governance

Source access logs: Every URL or API endpoint we access is logged with its timestamp, user-agent, and response code.
Data lineage logs: Every published data point traces to its sources, parsers, validation methods, and CMI metadata.
Compliance check logs: Every data action is preceded by a nine-question compliance check (Xenia Legal Framework v1.2 Section 8.1).
Cease-and-desist logs: All requests to cease access are honored and retained for 7 years.
Lawful acquisition logs: Documented basis for each source's lawful-acquisition status, retained for the lifetime of any derived data plus 7 years.

11. Independent review

Xenia's compliance framework was prepared using research-driven legal analysis covering the IP, media-defamation, insurance, and engineering domains. It has not yet been signed off by a California-licensed attorney. Counsel sign-off is scheduled for the pre-commercial-launch phase. The framework is published openly to allow scrutiny and improvement.

12. Plain-English contacts

Question	Email
I'm a property owner and want to dispute a Xenia insight	dispute@xeniadata.com
I'm a user and the information about a property is wrong	feedback@xeniadata.com
I'm a webmaster and want to control Xenia's crawl on my domain	crawler@xeniadata.com
I'm filing a DMCA copyright notice	dmca@xeniadata.com
I'm a journalist or attorney with a methodology question	press@xeniadata.com

13. Skeleton status

This is a Day 1 skeleton draft. Entries marked "(To be completed before Scope C...)" will be populated as production sources are documented, licenses confirmed, and the right-to-respond mechanism reaches Scope B operational status. The methodology page will be fully populated before any commercial public release.

Last updated

2026-05-13.