Xenia Crawler Information
Page URL: https://xeniadata.com/crawler
Page version: v1 (Day 1 draft, 2026-05-13)
Audience: Webmasters, site operators, ToS reviewers, opposing counsel
Required by: Xenia Legal Framework v1.2 Section 4.4.2
What Xenia is
Xenia is a hotel data aggregator operated by Xenia Data. We build a structured, fact-only index of public hotel information to help travelers find independent and boutique properties that match their needs. We are NOT a re-publisher of third-party reviews, descriptions, or photographs.
How Xenia's automated systems operate
Xenia's automated data-collection systems are designed as a Transient Observation Architecture (TOA). The legal anchor is the line of cases running through Authors Guild v. Google, Field v. Google, hiQ Labs v. LinkedIn, and Meta v. Bright Data. The technical and operational characteristics are:
- We do NOT bypass authentication. Our crawlers never attempt to log in, never use real user credentials, never accept Terms of Service through any clickwrap, and never set session cookies. If a page requires login, our crawlers stop.
- We do NOT defeat anti-bot controls. We never solve CAPTCHAs, rotate IPs to evade rate limits, spoof browser fingerprints, or misrepresent the nature of our requests.
- We do NOT store source HTML. Our crawlers fetch a public page, extract structured factual data into our schema, and discard the source response. We do not maintain a copy of the source material beyond the transient processing window.
- We obey robots.txt. Every fetch begins with a robots.txt check. We respect Disallow, Crawl-delay, noindex, and noarchive directives.
- We rate-limit politely. Default: 1 request per source domain every 2 seconds, with jitter, and a maximum of 1 concurrent connection per domain.
- We honor 429, 403, and cease-and-desist responses immediately. Backoff is automatic; we do not retry through alternative IPs or identities.
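As an illustration only, the robots.txt, rate-limit, and stop-on-refusal rules above could be sketched roughly as follows. This is not Xenia's production code. The function names, the 0.5-second jitter window, the sample robots.txt, and the hotel.example URLs are all assumptions made for the example. The sketch uses Python's standard urllib.robotparser, which handles Disallow and Crawl-delay only; noindex and noarchive normally arrive through robots meta tags or X-Robots-Tag headers and are not covered here.

```python
import random
from urllib.robotparser import RobotFileParser

USER_AGENT = "Xenia-Crawler/1.0 (+https://xeniadata.com/crawler)"
DEFAULT_DELAY_S = 2.0        # documented default: 1 request per domain every 2 seconds
STOP_STATUSES = {429, 403}   # responses after which the crawler stops

# Hypothetical robots.txt a site operator might publish.
SAMPLE_ROBOTS = """\
User-agent: Xenia-Crawler
Crawl-delay: 5
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(rules: RobotFileParser, url: str) -> bool:
    """Return True only if robots.txt allows this user-agent to fetch the URL."""
    return rules.can_fetch(USER_AGENT, url)


def delay_for(rules: RobotFileParser, jitter_s: float = 0.5) -> float:
    """Seconds to wait before the next request to the same domain.

    Uses the larger of the default delay and any Crawl-delay, plus random jitter.
    The jitter window is an assumed value, not a documented one.
    """
    crawl_delay = rules.crawl_delay(USER_AGENT)
    base = max(DEFAULT_DELAY_S, float(crawl_delay or 0))
    return base + random.uniform(0, jitter_s)


def should_stop(status: int) -> bool:
    """Return True if the HTTP status tells the crawler to stop."""
    return status in STOP_STATUSES


rules = load_rules(SAMPLE_ROBOTS)
print(may_fetch(rules, "https://hotel.example/rooms"))          # allowed path
print(may_fetch(rules, "https://hotel.example/private/rates"))  # disallowed path
```

A site operator can use the same parser to check how a robots.txt file would be read before publishing it.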
Our identifying user-agent
All Xenia automated fetches identify themselves with the following user-agent string:
Xenia-Crawler/1.0 (+https://xeniadata.com/crawler)
If you see traffic from this user-agent, it is us. If you see traffic claiming to be us but from a different IP range than those listed below, please report it to abuse@xeniadata.com — that traffic is impersonation, not Xenia.
IP ranges used by Xenia
Current IP ranges from which Xenia originates automated requests:
| CIDR range | Provider | Effective date |
|---|---|---|
| (To be populated upon Day 7 Cloudflare Workers deployment) | Cloudflare | 2026-05-19 (target) |
This table is updated whenever our infrastructure changes. The current canonical machine-readable form is at: https://xeniadata.com/crawler/ip-ranges.json (live at deployment).
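Site operators who want to check whether a request claiming the Xenia user-agent came from a published range might use something like the following sketch. The JSON layout (a "ranges" array with "cidr" fields) is an assumption, since the schema of ip-ranges.json is not specified on this page. The CIDR blocks below are reserved documentation ranges, not real Xenia addresses.

```python
import ipaddress
import json

# Hypothetical stand-in for the contents of ip-ranges.json (assumed schema).
PUBLISHED = json.loads("""
{"ranges": [{"cidr": "192.0.2.0/24"}, {"cidr": "2001:db8::/32"}]}
""")

UA_PREFIX = "Xenia-Crawler/"


def load_networks(doc: dict) -> list:
    """Turn the assumed JSON document into ip_network objects."""
    return [ipaddress.ip_network(entry["cidr"]) for entry in doc["ranges"]]


def in_published_range(ip: str, networks: list) -> bool:
    """Return True if the address falls inside any published range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def classify(user_agent: str, ip: str, networks: list) -> str:
    """Label a logged request as 'xenia', 'possible-impersonation', or 'other'."""
    if not user_agent.startswith(UA_PREFIX):
        return "other"
    if in_published_range(ip, networks):
        return "xenia"
    return "possible-impersonation"


networks = load_networks(PUBLISHED)
ua = "Xenia-Crawler/1.0 (+https://xeniadata.com/crawler)"
print(classify(ua, "192.0.2.17", networks))
print(classify(ua, "198.51.100.9", networks))
```

Requests labeled possible-impersonation are the ones worth reporting, since they claim the user-agent but originate outside the published ranges.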
Contact for opt-out, cease-and-desist, or rate limiting
If you wish to:
- Opt your domain out of Xenia's automated observation entirely,
- Request a custom crawl rate (e.g., once per week instead of once per day),
- Issue a cease-and-desist for one or more URLs or paths,
- Report suspected impersonation of the Xenia user-agent,
please email:
We commit to acknowledging your request within 2 business days and to honoring it within 7 business days. We log every opt-out request to a permanent record and we do not retry domains we have been asked to stop crawling.
For copyright takedown notices (DMCA), see our DMCA designated agent page instead.
What we collect
Xenia collects facts from public web pages. Examples:
- The hotel's name, address, geographic coordinates, phone number
- Star or class rating where assigned by an official body (AAA, Forbes, Michelin, government)
- Amenity presence (pool, gym, restaurant, parking, Wi-Fi)
- Check-in / check-out times where publicly stated
- Pet policy, smoking policy, family policy where publicly stated
We also use licensed APIs (Cloudbeds for properties under management, Google Places API where licensed) and we collect first-party data through our own onboarding flow with explicit license grants from property owners.
What we do NOT collect
- Verbatim review text from any source
- Verbatim hotel descriptions, marketing copy, or editorial content
- Professional photographs sourced from third parties without documented license
- Personal data of identifiable individuals (guest names, employee personal information beyond a publicly listed work contact)
- Content gated behind authentication, paywall, or any access control
- Any source material from a pirate library or shadow archive
Our compliance posture
Xenia's full compliance framework is published at https://xeniadata.com/methodology and includes:
- Three-tier attribute classification (Verified Fact / Self-Reported / Xenia-Derived Inference)
- Display language conventions for derived inferences
- Property right-to-respond mechanism for any Xenia-derived claim about a property
- Memorization testing for AI-derived outputs
- Audit logging of every data action with immutable retention
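To make the three-tier classification concrete, here is a minimal sketch of how a tiered attribute might be represented and displayed. The class names, field names, and display wording are all hypothetical; the actual conventions are those in the published methodology, not the ones shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    VERIFIED_FACT = "Verified Fact"
    SELF_REPORTED = "Self-Reported"
    DERIVED_INFERENCE = "Xenia-Derived Inference"


@dataclass(frozen=True)
class Attribute:
    name: str    # e.g. "Pool"
    value: str   # e.g. "yes"
    tier: Tier   # which of the three tiers the claim belongs to


def display(attr: Attribute) -> str:
    """Render an attribute with wording that reflects its tier.

    The phrasing is illustrative only, not Xenia's actual display language.
    """
    if attr.tier is Tier.DERIVED_INFERENCE:
        return f"{attr.name}: likely {attr.value} (Xenia-derived inference)"
    if attr.tier is Tier.SELF_REPORTED:
        return f"{attr.name}: {attr.value} (reported by the property)"
    return f"{attr.name}: {attr.value}"


print(display(Attribute("Pool", "yes", Tier.VERIFIED_FACT)))
print(display(Attribute("Good for families", "yes", Tier.DERIVED_INFERENCE)))
```

Keeping the tier on every attribute means display code cannot present an inference in the same wording as a verified fact without an explicit choice to do so.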
A copyright agent is designated with the U.S. Copyright Office per DMCA § 512. See https://xeniadata.com/dmca.
Last updated
2026-05-13. Material changes to this page will be announced at least 7 days in advance to any party that has subscribed to crawler-information updates.