Xenia Methodology
Page URL: https://xeniadata.com/methodology
Page version: v1 skeleton (Day 1 draft, 2026-05-13)
Audience: Property owners, users, attorneys, partners, advisors
Required by: Xenia Legal Framework v1.2 Sections 11.4 (output memorization monitoring), 11.7 (model provenance), 12.2.4 (methodology disclosure), 6.4 (right-to-respond)
California AB 2013: Section 6 below satisfies the Generative AI Training Data Transparency Act, effective Jan. 1, 2026
1. Plain-English summary
Xenia helps travelers find independent and boutique hotels by combining three kinds of information:
- Verified facts about each property (address, room count, amenities) that come from multiple independent sources and that we can confirm.
- What the property says about itself — the hotel's own descriptions, claims, and characterizations, always shown as attributed claims, never as Xenia facts.
- Xenia insights — patterns and inferences our analytical systems derive from many lawful inputs (review aggregates, structured public data, geographic context). These are clearly labeled as Xenia-derived and come with disclosure of how we built them.
This page explains how each kind of information is gathered, validated, and displayed. It also explains how a property owner can dispute any Xenia insight about their property and what to do if you believe a Xenia output is inaccurate.
2. The three-tier attribute classification
Every data point about a property on Xenia falls into exactly one of three tiers:
Tier 1 — Verified Fact
Objective, measurable, externally verifiable attributes (e.g., the property's name, street address, room count, official star rating from an authoritative body, presence of a pool).
- Sourcing standard: Cross-validated across ≥3 independent sources OR sourced from a licensed authoritative feed.
- Display standard: May be displayed as an unqualified statement of fact.
- Examples: "The property has 36 guest rooms." "The property is located at 40128 Lakeview Drive."
Tier 2 — Self-Reported Attribute
Claims made by the property itself about itself.
- Sourcing standard: Sourced from the hotel's own website, the hotel's authenticated submission via the Xenia onboarding portal, or other channels controlled by the property owner.
- Display standard: MUST be attributed to the property. Language like "the property describes itself as..." or "the hotel states it offers..." is used. Never presented as an unqualified fact.
- Examples: "The property describes itself as 'a boutique mountain retreat.'" "The hotel states that breakfast is included."
Tier 3 — Derived Attribute (Xenia inference)
Insights generated by Xenia's analytical systems from patterns in underlying lawful data.
- Sourcing standard: Generated by Xenia's classification systems from a documented input set. Each derived attribute records its input sources, minimum signal threshold, contradicting-signal check, confidence score, model version, and CMI inheritance metadata.
- Display standard: MUST be presented as a Xenia-derived classification, accompanied by signal count and confidence. Never presented as an unqualified fact.
- Examples: "Xenia insight: Sound quality score 6.2/10. Derived from 247 review signals across 4 sources over 18 months. Confidence: 84%."
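The display standards above can be sketched as a single rendering function. This is a minimal illustration, not Xenia's production code: the names `Tier`, `Attribute`, and `render`, and the exact attribution wording for Tier 2, are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    VERIFIED_FACT = 1   # Tier 1
    SELF_REPORTED = 2   # Tier 2
    DERIVED = 3         # Tier 3


@dataclass(frozen=True)
class Attribute:
    # Hypothetical record shape; field names are illustrative only.
    tier: Tier
    text: str                 # the statement, claim, or insight
    signal_count: int = 0     # used by Tier 3 only
    source_count: int = 0     # used by Tier 3 only
    confidence: float = 0.0   # used by Tier 3 only, in [0, 1]


def render(attr: Attribute) -> str:
    """Apply the display standard that matches the attribute's tier."""
    if attr.tier is Tier.VERIFIED_FACT:
        # Tier 1 may be shown as an unqualified statement.
        return attr.text
    if attr.tier is Tier.SELF_REPORTED:
        # Tier 2 must always be attributed to the property.
        return f"The property states: {attr.text}"
    # Tier 3 must be labeled as Xenia-derived, with signal count and confidence.
    return (
        f"Xenia insight: {attr.text}. "
        f"Derived from {attr.signal_count} review signals across "
        f"{attr.source_count} sources. Confidence: {attr.confidence:.0%}."
    )
```

Because the tier is part of the record, a Tier 2 or Tier 3 attribute cannot reach the page without its attribution or its Xenia-insight label.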
3. Where our data comes from
Permitted source categories
- Licensed APIs: Cloudbeds (for properties under management), Google Places API (where licensed), Walk Score API, NOAA Weather.
- Owner-supplied: Property owner's authenticated submission via Xenia's onboarding portal, with documented license grant.
- Public, logged-out web pages: The property's own website, public government records (ADA compliance, fire code, health department), public regulatory filings.
- Open data: OpenStreetMap (ODbL), Wikidata (CC0), Schema.org structured data published by property websites.
- Xenia's own first-party data: Reviews and signals submitted by booked guests with explicit license grants under Xenia's Terms of Service.
Prohibited sources
- Anything behind a login wall (extranet, partner-only portal, members-only forum)
- Anything behind a paywall
- Pirate libraries and shadow archives
- Sources that have issued cease-and-desist to Xenia
- Sources that require defeating anti-bot controls to access
- Personal data of identifiable individuals (guest names, employee personal data)
How we choose between sources for the same fact
Where multiple lawful sources offer the same fact, Xenia prefers (in order): (1) the property's authenticated submission, (2) a licensed authoritative feed, (3) cross-validation of ≥3 independent public sources. Where sources disagree, the field is held for property attestation rather than published with an arbitrary winner.
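A minimal sketch of this selection rule follows. It rests on two assumptions that the text above does not spell out: any disagreement among the supplied values holds the field, and the preference order then decides which source is recorded as the basis. The names `Observation`, `Resolution`, and `resolve` are hypothetical, and value normalization (for example "36" versus "36 rooms") is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    kind: str     # "owner", "licensed_feed", or "public" (illustrative labels)
    source: str   # identifier of the source that supplied the value
    value: str


@dataclass(frozen=True)
class Resolution:
    value: str
    basis: str    # which rule in the preference order justified publication


def resolve(observations: list[Observation]) -> Optional[Resolution]:
    """Return the value to publish, or None to hold the field."""
    if not observations:
        return None
    values = {o.value for o in observations}
    if len(values) > 1:
        # Sources disagree: hold for property attestation, pick no winner.
        return None
    value = next(iter(values))
    # Preference order: authenticated submission, then licensed feed.
    for kind in ("owner", "licensed_feed"):
        if any(o.kind == kind for o in observations):
            return Resolution(value, basis=kind)
    # Otherwise require at least three distinct public sources.
    public_sources = {o.source for o in observations if o.kind == "public"}
    if len(public_sources) >= 3:
        return Resolution(value, basis="public_cross_validation")
    return None
```

Returning `None` for both disagreement and insufficient public corroboration keeps the default safe: nothing is published unless one of the three rules is positively met.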
4. How we generate Xenia insights (Tier 3)
Each Xenia insight is the output of a documented inference methodology with the following characteristics:
- Input data sources are recorded for every insight, including their lawful-acquisition status.
- Signal aggregation is statistical, not text-reproductive. We do not store, display, or republish verbatim review text. We aggregate review signals into counts, percentages, scores, and named themes.
- Cross-source diversity is required. Xenia insights are never derived from a single source's content; they are derived from multiple lawful inputs so that no single source's market is substituted (see Thomson Reuters v. Ross Intelligence).
- Confidence scoring accompanies every insight. Confidence is computed from signal count, source diversity, signal consistency, and recency.
- Contradicting-signal check must pass before an insight is published. Where signals contradict, the insight is withheld pending property attestation or additional signals.
- Memorization testing (Xenia Legal Framework v1.2 Section 11.4) is run on every inference model before production use. We sample ≥1,000 random inputs, run the model, and verify no output reproduces ≥15 consecutive words from any single input. Results are logged.
- Quarterly re-test of memorization on every production model.
- Model provenance is documented for every model used: name, version, vendor or training source.
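The publication checks in this list (minimum signal threshold, cross-source diversity, contradicting-signal check) can be pictured as a gate that runs before any insight is shown. The sketch below is illustrative only: the thresholds `MIN_SIGNALS` and `MAX_MINORITY_SHARE`, the binary `positive` flag, and the idea of measuring contradiction as the share of minority-polarity signals are all assumptions for this example, not Xenia's documented values or method.

```python
from dataclasses import dataclass

MIN_SIGNALS = 20            # hypothetical minimum signal threshold
MIN_SOURCES = 2             # insights are never derived from a single source
MAX_MINORITY_SHARE = 0.35   # hypothetical contradiction tolerance


@dataclass(frozen=True)
class Signal:
    source: str      # platform or feed the signal was aggregated from
    positive: bool   # simplified polarity of the signal


def gate(signals: list[Signal]) -> str:
    """Return "publish" or a "withhold: ..." reason for a candidate insight."""
    total = len(signals)
    if total < MIN_SIGNALS:
        return "withhold: too few signals"
    if len({s.source for s in signals}) < MIN_SOURCES:
        return "withhold: single source"
    positives = sum(s.positive for s in signals)
    # Share of signals that point the opposite way from the majority.
    minority_share = min(positives, total - positives) / total
    if minority_share > MAX_MINORITY_SHARE:
        return "withhold: contradicting signals"
    return "publish"
```

A withheld insight would then wait for property attestation or additional signals, as described above.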
5. How Xenia displays negative-framed information
Where a Xenia insight could be perceived as unfavorable to a property (e.g., a complaint-pattern derivation, a hidden-fee surprise factor), Xenia uses one or more of five permitted framings:
- Aggregate-only reporting. Statistics about the underlying dataset, not assertions about the property. "Sound concerns mentioned in 32% of reviews mentioning sound."
- Statistic-before-adjective. The number leads; any descriptor is derivative. "Cleanliness score: 7.1/10 (Good)."
- Source-of-source linking. Where applicable, Xenia links to the platform where the underlying signals originated rather than republishing them.
- Methodology disclosure. Every Xenia insight is accompanied by a link to this methodology page. The claim becomes "Xenia's model produced inference X with this confidence" — a true statement about the inference, not a factual claim about the property.
- Property right-to-respond. Every property has a mechanism to dispute any Xenia insight, see the methodology, submit counter-evidence, and have the insight flagged or revised. See Section 8 below.
The following framings are never used by Xenia regardless of disclaimer:
- Discrete factual assertions about a named property based on review aggregation. We never say "this hotel has thin walls."
- Comparative ranking statements that disparage one property by elevating another.
- Verbatim quotation of negative review text.
- Published claims about individual identifiable staff members.
6. AI training and inference disclosure (California AB 2013 compliance)
Effective January 1, 2026, California Assembly Bill 2013 requires developers of generative AI systems made available to Californians to publish a high-level summary of the datasets used to train those systems. The following is Xenia's disclosure:
6.1 Datasets used to train or develop Xenia's inference models
| Dataset / source family | Owner | Purpose | Copyrighted material included? | License status | Time period | Approximate size |
|---|---|---|---|---|---|---|
| (To be completed before Scope C — Xenia's first-party guest reviews) | Xenia (under Xenia ToS) | Tier 3 inference training | No third-party copyrighted material | Xenia ToS license grant from booked guests | 2026-05-13 onward | Building incrementally |
| (To be completed before Scope C — Hotel-owner attestations) | Xenia (under owner-submission license) | Tier 2 attribution display | No third-party copyrighted material | Owner license grant | 2026-05-13 onward | Building incrementally |
| (To be completed before Scope C — Public-domain government records, OSM, Wikidata) | Various governmental and open-data providers | Tier 1 fact validation | Public-domain or open-license only (CC0, CC BY-SA, ODbL) | Public domain or open license | Varies by source | Varies |
| (Additional rows added as production sources are documented and licensed) | | | | | | |
6.2 Update cadence
This disclosure is reviewed quarterly and republished annually. Material changes (a new training data source added, a model retrained on a new dataset, etc.) trigger an out-of-cycle update.
6.3 What Xenia does NOT use for training
- Verbatim review text scraped from third-party platforms (Booking, TripAdvisor, Yelp, Expedia, etc.)
- Any content from a pirate library or shadow archive
- Content obtained by defeating any access control
- Personal data of identifiable individuals
7. FTC Consumer Review Rule compliance (16 CFR Part 465)
Effective October 21, 2024, the FTC's Consumer Reviews and Testimonials Rule prohibits certain practices around consumer reviews. Xenia complies as follows:
- No fake reviews. Xenia does not generate, commission, or display reviews that misrepresent the reviewer or the reviewed experience.
- No suppression of legitimate negative reviews. Xenia's property right-to-respond mechanism is a process for dispute resolution and attribution, not a mechanism for suppression. Properties may not require Xenia to remove a Xenia insight as a condition of doing business with Xenia. Any contractual provision purporting to require Xenia to suppress lawful negative content is unenforceable and will be ignored.
- Disclosure of material connections. Where a property has a material connection to Xenia (e.g., owner submission, partnership), the connection is disclosed alongside any property-supplied content.
8. Property right-to-respond mechanism
Every property covered by a Xenia derived attribute (Tier 3) has access to a property dashboard at https://xeniadata.com/property/[property_id]/manage (live from Scope B onward). The dashboard:
- Displays every Xenia insight published about the property
- Shows the methodology and signal sources for each insight
- Allows the property to submit a dispute or counter-evidence
- Tracks the dispute through a documented workflow, with a response due within 10 business days
Unresolved disputes cause the disputed insight to be flagged on the public listing pending review. See the Right-to-Respond protocol (linked below).
For Scope A (internal MVP, Cedarwood properties only): no Tier 3 inferences are published; the right-to-respond mechanism is staged for Scope B onward.
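The 10-business-day response window can be computed as in the sketch below. Two assumptions are made for illustration: counting starts on the day after the dispute is received, and only weekends are skipped. Public holidays are not handled, and the function name `response_due` is hypothetical.

```python
from datetime import date, timedelta


def response_due(received: date, business_days: int = 10) -> date:
    """Return the date on which the response window closes."""
    current = received
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday-Friday, 5-6 for the weekend.
        if current.weekday() < 5:
            remaining -= 1
    return current
```

For a dispute received on Wednesday 2026-05-13, `response_due(date(2026, 5, 13))` returns `date(2026, 5, 27)` under these assumptions.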
9. Memorization testing and output filters
For every Tier 3 inference model in production:
- Pre-production memorization test: ≥1,000 random inputs are sampled and the outputs scanned for verbatim or near-verbatim reproduction. Any output that reproduces ≥15 consecutive words from a single input triggers retraining or filter application.
- Production output filter: N-gram match against input sources, edit-distance check, sentence-structure similarity check. Outputs that exceed thresholds are blocked from publication.
- Quarterly re-test of memorization on production models.
- Incident logging per Xenia Legal Framework v1.2 Section 7.1.3.
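The exact-match part of the test above (no output may reproduce ≥15 consecutive words from any single input) can be sketched with word n-grams. Any shared run of 15 or more words necessarily contains a shared 15-word window, so comparing 15-grams is sufficient. The tokenization (lowercased `\w+` runs) and the function names are assumptions for this example; the edit-distance and sentence-structure checks are not shown.

```python
import re

THRESHOLD = 15  # consecutive words, per the memorization test above


def _words(text: str) -> list[str]:
    # Illustrative tokenization: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def _ngrams(words: list[str], n: int) -> set[tuple[str, ...]]:
    # Empty set when the text has fewer than n words.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reproduces_input(output: str, inputs: list[str], n: int = THRESHOLD) -> bool:
    """True if the output shares any run of n consecutive words with one input."""
    out_grams = _ngrams(_words(output), n)
    if not out_grams:
        return False
    return any(out_grams & _ngrams(_words(text), n) for text in inputs)
```

In a pre-production run, this check would be applied to each of the sampled outputs against the inputs the model saw, with any `True` result logged as a failure.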
10. Audit and governance
- Source access logs: Every URL or API endpoint we access is logged with its timestamp, user-agent, and response code.
- Data lineage logs: Every published data point traces to its sources, parsers, validation methods, and CMI metadata.
- Compliance check logs: Every data action is preceded by a nine-question compliance check (Xenia Legal Framework v1.2 Section 8.1).
- Cease-and-desist logs: All requests to cease access are honored and retained for 7 years.
- Lawful acquisition logs: Documented basis for each source's lawful-acquisition status, retained for the lifetime of any derived data plus 7 years.
11. Independent review
Xenia's compliance framework was prepared using research-driven legal analysis across the IP, media-defamation, insurance, and engineering domains. It has not yet been signed off by a California-licensed attorney. Counsel sign-off is scheduled for the pre-commercial-launch phase. The framework is published openly to allow scrutiny and improvement.
12. Plain-English contacts
| Question | Contact |
|---|---|
| I'm a property owner and want to dispute a Xenia insight | dispute@xeniadata.com |
| I'm a user and the information about a property is wrong | feedback@xeniadata.com |
| I'm a webmaster and want to control Xenia's crawl on my domain | crawler@xeniadata.com |
| I'm filing a DMCA copyright notice | dmca@xeniadata.com |
| I'm a journalist or attorney with a methodology question | press@xeniadata.com |
13. Skeleton status
This is a Day 1 skeleton draft. Sections marked "(To be completed before Scope C...)" will be populated as production sources are documented, licenses are confirmed, and the right-to-respond mechanism reaches Scope B operational status. The methodology page will be fully populated before any commercial public launch.
Last updated
2026-05-13.