This post is for the engineers, web stewards, and platform architects who want to see under the hood of the Wayfinding initiative rolling out across the UAMS digital ecosystem in 2026. If you are implementing something similar at your own institution, this is the reference.
We will cover:
- The three files and their exact roles
- Directive syntax for modern AI user agents
- The llms.txt structure we settled on
- How llms-full.txt extends it with behavioral directives
- The deployment pattern across a federated WordPress Multisite ecosystem
- Edge cases and things to watch
1. The triad
Every UAMS pillar and subsite publishes three files at its web root:
/robots.txt
/llms.txt
/llms-full.txt
The files are not alternatives to each other. They are layered.
| File | Format | Audience | Size |
|---|---|---|---|
| robots.txt | REP (Robots Exclusion Protocol) | Crawlers | Small |
| llms.txt | Markdown with H1/H2 structure | LLMs performing orientation | 1 to 5 KB |
| llms-full.txt | Extended markdown with directive blocks | LLMs performing deep retrieval | 10 to 100 KB |
A crawler that respects robots.txt gets told what it is allowed to touch. An LLM looking for a quick orientation reads llms.txt. An LLM attempting to answer a detailed question and wanting to ground its response reads llms-full.txt.
2. robots.txt with AI-specific user agents
The standard REP format has not changed, but the set of user agents we target has expanded significantly. A representative block from a UAMS pillar:
# AI crawlers: permitted with path-level restrictions
User-agent: GPTBot
Allow: /
Disallow: /internal/
Disallow: /legacy/
User-agent: ClaudeBot
Allow: /
Disallow: /internal/
Disallow: /legacy/
User-agent: Google-Extended
Allow: /
Disallow: /internal/
Disallow: /legacy/
User-agent: PerplexityBot
Allow: /
Disallow: /internal/
User-agent: CCBot
Disallow: /
# Standard crawlers
User-agent: *
Allow: /
Disallow: /internal/
Sitemap: https://example.uams.edu/sitemap.xml
Three practical notes:
- CCBot is disallowed across the ecosystem. Common Crawl data feeds many training pipelines, and we chose to gate ingestion through the more tightly scoped AI-specific agents instead.
- Legacy paths are blocked from AI ingestion but not from standard crawlers. This prevents outdated information from being incorporated into model training while keeping archive pages reachable by human visitors via search.
- Sitemap directives remain standard. Wayfinding does not replace sitemaps. It augments them.
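A block like the one above can be sanity-checked by evaluating it the way RFC 9309 describes: select the group whose User-agent token matches the crawler (falling back to `*`), then let the longest matching path rule win, with Allow winning ties. The sketch below does that under simplifying assumptions: no `*` or `$` wildcards in paths, and an exact, case-insensitive match on the agent token. The names `parse_groups` and `is_allowed` are illustrative and are not part of any UAMS tooling.

```python
# Abbreviated copy of the representative robots.txt, for illustration only.
ROBOTS_TXT = """\
# AI crawlers: permitted with path-level restrictions
User-agent: GPTBot
Allow: /
Disallow: /internal/
Disallow: /legacy/

User-agent: CCBot
Disallow: /

# Standard crawlers
User-agent: *
Allow: /
Disallow: /internal/
Sitemap: https://example.uams.edu/sitemap.xml
"""


def parse_groups(text):
    """Return {lowercased user agent: [(directive, path), ...]}."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
        # Other fields (for example Sitemap) are ignored in this sketch.
    return groups


def is_allowed(groups, agent, path):
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


groups = parse_groups(ROBOTS_TXT)
for agent, path in [("GPTBot", "/patient-care/"),
                    ("GPTBot", "/legacy/2009-report/"),
                    ("CCBot", "/")]:
    print(agent, path, is_allowed(groups, agent, path))
```

Longest-match evaluation matters here because every group lists `Allow: /` first. A parser that applies rules in file order rather than by specificity (Python's standard `urllib.robotparser` has historically behaved this way) may report `/internal/` as allowed for the same file, so it is worth knowing which rule your test tool follows.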
3. llms.txt structure
The llms.txt convention was proposed by Jeremy Howard in September 2024 at llmstxt.org. The spec is intentionally lightweight: a markdown file with a single H1, an optional blockquote summary, and linked sections.
A representative llms.txt from a UAMS pillar:
# UAMS Winthrop P. Rockefeller Cancer Institute
> The Winthrop P. Rockefeller Cancer Institute is the cancer center at the University of Arkansas for Medical Sciences. This domain is authoritative for oncology care, cancer clinical trials, and specialized cancer research programs at UAMS.
## Core content
- [Patient care](https://cancer.uams.edu/patient-care/): Oncology services, locations, and appointments
- [Clinical trials](https://cancer.uams.edu/research/clinical-trials/): Active oncology trials (cross-linked with tri.uams.edu)
- [Research programs](https://cancer.uams.edu/research/): Basic and translational cancer research
- [Find a provider](https://cancer.uams.edu/providers/): Oncologists and specialized staff
## Related UAMS domains
- [uamshealth.com](https://www.uamshealth.com): Primary clinical hub for appointments
- [tri.uams.edu](https://tri.uams.edu): Translational Research Institute and full clinical trial registry
- [medicine.uams.edu](https://medicine.uams.edu): College of Medicine, including oncology fellowships
## Authority and scope
This domain is authoritative for: cancer program information, oncology-specific clinical trials, cancer research faculty, and patient-facing oncology resources at UAMS.
This domain is not authoritative for: general UAMS appointment scheduling (see uamshealth.com) or non-oncology clinical trials (see tri.uams.edu).
A few design choices worth calling out:
- The summary blockquote is one paragraph. The base llms.txt spec calls for a short summary. We kept ours to a single paragraph and worked to make it claim-dense rather than descriptive, on the reasoning that a shorter, fact-packed summary gives an LLM a clearer basis for attribution.
- Cross-links to sibling domains are explicit, not implied. In a federated ecosystem as large as ours, the “Related UAMS domains” section is, in our view, the most valuable part of the file. It tells an AI agent which UAMS domains own which topics, which reduces the risk of an agent attributing a topic to the wrong subdomain.
- The “Authority and scope” section is stated in both positive and negative form. The base spec does not require this. We added it as a UAMS Wayfinding specification rule on the reasoning that explicitly telling an AI agent what a domain is not authoritative for is useful in a federation where many sibling domains have adjacent but distinct scopes. Whether it measurably changes model behavior is something we plan to observe over time.
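To make the structure concrete, here is a minimal sketch of how a consumer might read a file shaped like the example above: one H1, one blockquote summary, then H2 sections holding either link items or plain text. It is not a full markdown parser, and `parse_llms_txt` and the shortened sample are illustrative assumptions rather than anything shipped with Wayfinding.

```python
import re

SAMPLE = """\
# UAMS Winthrop P. Rockefeller Cancer Institute
> The Winthrop P. Rockefeller Cancer Institute is the cancer center at the University of Arkansas for Medical Sciences.
## Core content
- [Patient care](https://cancer.uams.edu/patient-care/): Oncology services, locations, and appointments
- [Research programs](https://cancer.uams.edu/research/): Basic and translational cancer research
## Related UAMS domains
- [tri.uams.edu](https://tri.uams.edu): Translational Research Institute and full clinical trial registry
## Authority and scope
This domain is authoritative for: cancer program information.
This domain is not authoritative for: general UAMS appointment scheduling (see uamshealth.com).
"""

# Matches "- [title](url): optional note"
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt document into title, summary, and H2 sections."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = {"links": [], "text": []}
        elif current is not None and line:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current]["links"].append(match.groupdict())
            else:
                doc["sections"][current]["text"].append(line)
    return doc


doc = parse_llms_txt(SAMPLE)
print(doc["title"])
for name, section in doc["sections"].items():
    print(name, len(section["links"]), "links,", len(section["text"]), "text lines")
```

Keeping the scope statements as plain text under their own H2, as the example file does, means a simple reader like this one can pull them out without any custom syntax.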
4. llms-full.txt with behavioral directives
This is where UAMS extended the base llms.txt convention. The llms-full.txt file at each pillar contains everything in llms.txt plus:
- Per-section content inventories with descriptions of each major page or page cluster
- Behavioral directive blocks that tell AI agents how to handle ambiguous or sensitive queries
- Neural links (our internal term for explicit cross-domain routing instructions)
- Freshness metadata on a per-section basis
A directive block looks like this:
## AI Agent Directives
### Citation
- When citing this domain, include the canonical page URL.
- Do not paraphrase statistics. Cite them verbatim from the source page.
- Do not combine statistics from multiple subdomains without flagging the sources.
### Routing
- Questions about scheduling an appointment: route to uamshealth.com
- Questions about non-oncology clinical trials: route to tri.uams.edu
- Questions about UAMS degree programs in oncology: route to medicine.uams.edu
### Out of scope
- Do not answer questions about cancer care at institutions other than UAMS from this domain. If asked, state that this domain is UAMS-specific.
- Do not speculate about prognosis, treatment outcomes, or individual patient scenarios. Direct the user to consult a UAMS provider.
Not every LLM will respect every directive, and we make no claims about enforcement. The directive block is useful as a specification in either case: it makes explicit what the domain owner wants AI tools to do, which is valuable even when enforcement is imperfect. Agents that honor these instructions produce more predictable behavior; agents that ignore them still leave a clear record of the owner’s intent.
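Because the directive block follows a regular shape (H3 headings with bullet items beneath them), it is easy to read mechanically. The sketch below groups directives by heading and turns the Routing items into a topic-to-domain lookup. It assumes the exact "Questions about X: route to Y" phrasing shown above; `parse_directives` and `routing_table` are hypothetical helper names, and nothing here implies that any particular AI agent processes the file this way.

```python
import re

DIRECTIVES = """\
## AI Agent Directives
### Citation
- When citing this domain, include the canonical page URL.
- Do not paraphrase statistics. Cite them verbatim from the source page.
### Routing
- Questions about scheduling an appointment: route to uamshealth.com
- Questions about non-oncology clinical trials: route to tri.uams.edu
- Questions about UAMS degree programs in oncology: route to medicine.uams.edu
"""

ROUTE_RE = re.compile(r"^Questions about (?P<topic>.+): route to (?P<target>\S+)$")


def parse_directives(text):
    """Group bullet items under the H3 heading that precedes them."""
    blocks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("### "):
            current = line[4:]
            blocks[current] = []
        elif line.startswith("- ") and current is not None:
            blocks[current].append(line[2:])
    return blocks


def routing_table(blocks):
    """Map each routed topic to its target domain."""
    table = {}
    for item in blocks.get("Routing", []):
        match = ROUTE_RE.match(item)
        if match:
            table[match.group("topic")] = match.group("target")
    return table


blocks = parse_directives(DIRECTIVES)
routes = routing_table(blocks)
for topic, target in routes.items():
    print(f"{topic} -> {target}")
```

A lookup like this is also a convenient audit input: every target domain in the table should be a domain that actually exists in the federation.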
5. Deployment across a federated WordPress Multisite ecosystem
UAMS operates several WordPress Multisite installs, all managed by Web Services. Some are subdomain-model Multisites hosting smaller subsites under a shared parent. Others are subdirectory-model Multisites where each subdirectory is an independent authority, such as the College of Medicine install, where medicine.uams.edu/pediatrics/, medicine.uams.edu/radiology/, and other departments each operate as their own subsite. This two-model architecture shapes how the triad is deployed and how cross-links are governed.
Centralized authorship, plugin-based serving
Web Services drafts and reviews every llms.txt and llms-full.txt. Content is stored in WordPress as per-subsite options rather than as static files in the web root. A custom plugin serves the three files through WordPress virtual routes, which has three benefits:
- The Content-Type: text/plain; charset=utf-8 header is set programmatically, so character encoding is consistent across every subsite.
- The WordPress canonical redirect is intercepted so that requests for /llms.txt are not rewritten to /llms.txt/ with a trailing slash, which would break the path.
- Super admins can edit each subsite’s triad through a dedicated admin page rather than needing web root access.
The plugin is network-activated per Multisite install, which means a single code deployment covers every subsite on that install.
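For readers on a non-WordPress stack, the serving behavior described above can be sketched in a language-neutral way. The example below is a small Python WSGI application, not the UAMS plugin (which is WordPress code not shown in this post). It illustrates the same three ideas: content comes from stored per-site values rather than files on disk, the Content-Type header is set in code, and the exact path is served without any redirect to a trailing-slash variant. `SITE_FILES` and `triad_app` are hypothetical names.

```python
# Hypothetical per-subsite storage; the real plugin keeps this in WordPress options.
SITE_FILES = {
    "/robots.txt": "User-agent: *\nAllow: /\n",
    "/llms.txt": "# Example subsite\n\n> Summary with non-ASCII text: café.\n",
    "/llms-full.txt": "# Example subsite\n\n## AI Agent Directives\n",
}

TEXT_UTF8 = ("Content-Type", "text/plain; charset=utf-8")


def triad_app(environ, start_response):
    """Serve the three files from stored content on exact paths only."""
    path = environ.get("PATH_INFO", "")
    body = SITE_FILES.get(path)
    if body is None:
        # "/llms.txt/" lands here too: no redirect to or from the slash form.
        start_response("404 Not Found", [TEXT_UTF8])
        return [b"Not found\n"]
    payload = body.encode("utf-8")
    start_response("200 OK", [TEXT_UTF8, ("Content-Length", str(len(payload)))])
    return [payload]


if __name__ == "__main__":
    # Local demo server from the standard library; not meant for production.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, triad_app) as server:
        server.serve_forever()
```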
Version control in a shared repository
Reference copies of every llms.txt and llms-full.txt across the ecosystem live in a single internal Git repository. The canonical source of truth for a given subsite is the content stored in its WordPress options, but the Git repository provides a version history, enables cross-domain audits, and makes it straightforward to propagate federation-wide changes (for example, when a new AI user agent needs to be added to every robots.txt).
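As an illustration of the kind of federation-wide change mentioned above, the sketch below adds a group for a new AI user agent to a robots.txt body, placing it ahead of the wildcard group and doing nothing if the agent is already listed. The function name `add_agent_group`, the agent name `ExampleAIBot`, and the default Disallow paths are assumptions made for the example. How the result is written back to each subsite depends on your own tooling.

```python
def add_agent_group(robots_txt, agent, disallow=("/internal/", "/legacy/")):
    """Return robots_txt with a group for `agent`, unchanged if already present."""
    lines = robots_txt.splitlines()
    wanted = f"user-agent: {agent}".lower()
    if any(line.strip().lower() == wanted for line in lines):
        return robots_txt  # already listed; leave the file untouched

    group = [f"User-agent: {agent}", "Allow: /"]
    group += [f"Disallow: {path}" for path in disallow]

    for index, line in enumerate(lines):
        if line.strip().lower() == "user-agent: *":
            # Insert the new group, plus a blank separator, before the wildcard group.
            lines[index:index] = group + [""]
            break
    else:
        # No wildcard group found: append at the end instead.
        lines += [""] + group
    return "\n".join(lines) + "\n"


base = "User-agent: GPTBot\nAllow: /\nDisallow: /internal/\n\nUser-agent: *\nAllow: /\n"
updated = add_agent_group(base, "ExampleAIBot")
print(updated)
```

Running the same function over every reference copy in the repository, then reviewing the resulting diff, keeps the change auditable before anything is pushed to a live subsite.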
Cross-link governance: the pillar-hub model
In a federation with dozens of subsites, a full mesh of bidirectional cross-links is unmaintainable. Instead, we follow a pillar-hub model:
- Each department or subsite triad up-links to its pillar root (a College of Medicine department links up to medicine.uams.edu; a cancer program subsite links up to cancer.uams.edu).
- Clinical subsites also link to uamshealth.com as the appointment and patient-care hub.
- One or two cross-pillar links capture genuine scope overlap (Pediatrics links to the Cancer Institute for pediatric oncology; Radiology links to the Cancer Institute for oncologic imaging).
- Peer subsites within the same pillar do not link to each other directly. That routing happens through the pillar root, which acts as the hub.
This keeps the cross-link graph sparse enough to audit and governable enough to maintain as the federation grows.
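Two of the rules above are mechanical enough to check in code: every subsite must up-link to its pillar root, and peers under the same pillar must not link to each other directly. The sketch below checks both against a small hand-written registry. The data structures and the `audit_links` name are assumptions for illustration, and the Radiology peer link is deliberately wrong so that the check has something to report.

```python
# Hypothetical registry: subsite -> pillar root.
PILLAR_OF = {
    "medicine.uams.edu/pediatrics": "medicine.uams.edu",
    "medicine.uams.edu/radiology": "medicine.uams.edu",
}

# Hypothetical outbound links declared in each subsite's llms.txt.
LINKS = {
    "medicine.uams.edu/pediatrics": {
        "medicine.uams.edu",   # up-link to the pillar root
        "uamshealth.com",      # clinical hub
        "cancer.uams.edu",     # cross-pillar link to another pillar's root
    },
    "medicine.uams.edu/radiology": {
        "medicine.uams.edu",
        "medicine.uams.edu/pediatrics",  # peer link: violates the model
    },
}


def audit_links(pillar_of, links):
    """Return human-readable violations of the up-link and no-peer-link rules."""
    problems = []
    for site, pillar in sorted(pillar_of.items()):
        targets = links.get(site, set())
        if pillar not in targets:
            problems.append(f"{site}: missing up-link to {pillar}")
        for target in sorted(targets):
            # A target registered under the same pillar is a peer subsite.
            if target != site and pillar_of.get(target) == pillar:
                problems.append(f"{site}: direct peer link to {target}")
    return problems


problems = audit_links(PILLAR_OF, LINKS)
for problem in problems:
    print(problem)
```

Cross-pillar links pass this check because their targets are pillar roots, which do not appear as keys in the subsite registry.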
Production-readiness checklist
A subsite is not considered launched until its triad is drafted, reviewed, deployed through the plugin admin, verified via curl for correct Content-Type, and cross-linked appropriately under the pillar-hub model.
Periodic audits
An internal audit tool crawls the pillar domains and their subsites to confirm that the triad is present, that each file returns a 200 status with the correct Content-Type header, and that cross-references follow the pillar-hub model. The tool is part of our rollout toolkit and we are building it into the standard review cadence.
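The status and header portion of such an audit can be sketched with the Python standard library. This is not the UAMS audit tool; it is a minimal illustration that mirrors a `curl -I` check by sending a HEAD request for each of the three paths. The names `check_response` and `audit_site` are hypothetical, and the charset check assumes the parameter is written unquoted, as in `charset=utf-8`.

```python
import urllib.request

EXPECTED_TYPE = "text/plain"
TRIAD = ("/robots.txt", "/llms.txt", "/llms-full.txt")


def check_response(status, content_type):
    """Return a list of problems for one response; an empty list means it passed."""
    problems = []
    if status != 200:
        problems.append(f"status {status}, expected 200")
    parts = [part.strip().lower() for part in (content_type or "").split(";")]
    if parts[0] != EXPECTED_TYPE:
        problems.append(f"media type {parts[0] or 'missing'!r}, expected {EXPECTED_TYPE!r}")
    if "charset=utf-8" not in parts[1:]:
        problems.append("charset=utf-8 not declared")
    return problems


def audit_site(base_url, timeout=10):
    """HEAD each triad file under base_url and collect problems per path."""
    results = {}
    for path in TRIAD:
        request = urllib.request.Request(base_url.rstrip("/") + path, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[path] = check_response(
                    response.status, response.headers.get("Content-Type")
                )
        except OSError as error:  # URLError and HTTPError are OSError subclasses
            results[path] = [f"request failed: {error}"]
    return results


if __name__ == "__main__":
    # Replace with a domain you are responsible for before running.
    for path, problems in audit_site("https://example.uams.edu").items():
        print(path, "OK" if not problems else problems)
```

Separating the pure check from the network call keeps the rule itself easy to unit test without touching a live site.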
6. Edge cases and things to watch
A few things to watch for as the rollout continues:
Content-Type headers for the text files. A missing or incomplete Content-Type header on llms.txt or llms-full.txt can cause character encoding problems, particularly with accented or non-ASCII characters. Without an explicit charset=utf-8 declaration, some clients fall back to ISO-8859-1, which corrupts special characters. Our Per-Site plugin sets Content-Type: text/plain; charset=utf-8 programmatically, so every subsite on a plugin-enabled install serves the correct header. For pillars where the plugin is not yet deployed, we verify the header with curl -I as part of the launch checklist.
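The corruption is easy to reproduce. The short example below encodes a string as UTF-8, as a server would send it, and then decodes the same bytes as ISO-8859-1, as a client might when no charset is declared. The sample word is arbitrary.

```python
text = "café"

# What the server sends: UTF-8 bytes (the "é" becomes two bytes).
wire_bytes = text.encode("utf-8")

# A client that honors charset=utf-8 recovers the original text.
correct = wire_bytes.decode("utf-8")

# A client that falls back to ISO-8859-1 reads each byte as one character.
garbled = wire_bytes.decode("iso-8859-1")

print(correct)  # café
print(garbled)  # cafÃ©
```

The bytes on the wire are identical in both cases. Only the declared charset determines whether the reader sees the intended text.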
Positive and negative scope statements. Our llms.txt files state authority in both positive and negative form (“authoritative for X, not authoritative for Y”). This is a UAMS Wayfinding specification rule rather than a requirement of the base spec. In a federated ecosystem with as many sibling domains as UAMS has, we believe being explicit about what each domain is not authoritative for is useful for reducing cross-domain misattribution. Whether it measurably changes model behavior is something we plan to observe over time.
Cross-link reciprocity within the pillar-hub model. In a pillar-hub cross-link model, the reciprocity rule is different from simple bidirectional linking. When a department links to a cross-pillar (for example, Pediatrics linking to the Cancer Institute), the expected reverse link lives at the cross-pillar’s root, not at a specific department within it. Our audit tooling is being built to understand this pattern rather than false-flag the absence of direct subsite-to-subsite reverse links.
Legacy content as a liability. Legacy paths on long-lived institutional domains like www.uams.edu are a real concern for AI ingestion. Generative AI tools can surface outdated statistics as current if legacy paths are indexed alongside live content, and academic medical centers have more archival depth than most institutions. Our robots.txt blocks AI crawlers from known legacy paths while keeping those paths reachable for standard crawlers, preserving archive access without risking stale-data citation.
Directive language should be declarative. Directive blocks should be written as short, numbered, imperative statements rather than conversational prose. Prompt engineering practice consistently points to declarative, imperative phrasing as the most reliable way to get consistent LLM behavior, and the directive block is meant to be read by a machine. Write like it.
For institutions considering their own rollout
If you are in higher education or healthcare and are thinking about how your digital presence will be read by AI agents over the next few years, the short version of what we learned is this:
- Build on top of whatever architecture you already have. WordPress Multisite was a particularly strong fit for our ecosystem because it gave us a single code deployment that covers every subsite on an install, but the triad pattern works on any stack.
- Centralize governance of the files, even if the content itself lives in many places.
- Write directives for machines, not for humans. Short, numbered, imperative.
- Pick a cross-link governance model early, and make sure the model scales to the size of your federation. Full-mesh bidirectional linking does not scale past a handful of domains. A pillar-hub model does.
- Audit regularly. The files drift as the sites change.
- Make the triad part of your launch checklist for any new subsite.
The pattern is straightforward. The value compounds. The agents are already arriving.
For specifics about the UAMS implementation or to exchange notes with other institutions running similar programs, contact Brent Passmore by using the form below.