Schema Design
How elements are named, scoped, and modeled. Naming conventions, domain namespacing, and the concept/property/vocabulary decision tree.
1. Naming conventions
Casing encodes element type:
| Element type | Convention | Examples |
|---|---|---|
| Concepts | PascalCase | Person, Enrollment, GroupMembership |
| Properties | snake_case | given_name, date_of_birth |
| Vocabulary value codes | snake_case | never_married, bank_transfer |
| Vocabulary identifiers | kebab-case | gender-type, enrollment-status |
These conventions are enforced by JSON Schema regex validators. Once a name is published at candidate maturity or above, it cannot be changed.
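As a rough illustration of the casing rules in the table above, the sketch below checks a name against one pattern per element type. The regular expressions and the `is_valid_name` helper are assumptions made for this example; the project's actual JSON Schema patterns are not shown on this page and may be stricter or looser.

```python
import re

# Assumed patterns, one per element type. They are illustrative only and
# are not copied from the project's JSON Schema validators.
NAME_PATTERNS = {
    "concept": re.compile(r"[A-Z][A-Za-z0-9]*"),                      # PascalCase
    "property": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),         # snake_case
    "vocabulary_value": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"), # snake_case
    "vocabulary_id": re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*"),    # kebab-case
}


def is_valid_name(element_type: str, name: str) -> bool:
    """Return True if the whole name matches the pattern for its element type."""
    return NAME_PATTERNS[element_type].fullmatch(name) is not None
```

For instance, `is_valid_name("concept", "GroupMembership")` passes, while `is_valid_name("vocabulary_id", "enrollment_status")` fails because vocabulary identifiers use hyphens rather than underscores.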
2. Domain-scoped URIs
Some concepts share a name across domains but carry different semantics ("Enrollment" in social protection vs. education). Domain-specific elements get a domain segment in their URI; universal elements live at the root:
publicschema.org/sp/Enrollment (social protection)
publicschema.org/Person (universal)
The test: an element is universal if the same definition carries the same meaning regardless of domain. If not, it belongs in a domain namespace.
The same test applies to properties and vocabularies in principle, but applying it is a judgment call. A controlled vocabulary whose values are tightly tied to a domain workflow (sp/grievance-type, sp/grievance-status, sp/enrollment-status) is clearly domain-scoped. A property that references such a vocabulary (grievance_type, grievance_status) may still sit at the root namespace when the property's primitive shape (a coded value, an ISO date, an identifier reference) is portable even though its value set is not. Several such "root property, domain vocabulary" pairs exist in the current schema. The split is deliberate: it keeps the property URI stable if the concept is later renamed or generalised across domains, while the vocabulary carries the domain-specific semantics.
Names are never prefixed with a domain abbreviation: it is Enrollment, not SPEnrollment. The URI structure handles disambiguation. ADR-014's named exception has been superseded by ADR-018, which renamed CRVSPerson to Person under the crvs domain.
The build pipeline keys concepts internally by {domain}/{id} (e.g., sp/Enrollment, crvs/Birth) for domain-scoped concepts and by bare id (e.g., Person, Event) for universal ones. Two concepts can therefore share a short name as long as their domains differ (root Person and crvs/Person coexist), and neither silently overwrites the other.
| Code | Domain | Status |
|---|---|---|
| sp | Social protection | Active |
| edu | Education | Future |
| health | Health | Future |
| crvs | Civil registration and vital statistics | Active |
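The keying scheme described above can be sketched as follows. This is a minimal illustration, not the build pipeline's code: `concept_key`, `concept_uri`, and `register` are hypothetical helpers, and the base `publicschema.org` is taken from the URI examples in this section.

```python
from typing import Any, Dict, Optional


def concept_key(concept_id: str, domain: Optional[str] = None) -> str:
    """Key a concept as '{domain}/{id}' when domain-scoped, else by bare id."""
    return f"{domain}/{concept_id}" if domain else concept_id


def concept_uri(concept_id: str, domain: Optional[str] = None,
                base: str = "publicschema.org") -> str:
    """Build the URI; the domain segment appears only for domain-scoped concepts."""
    return f"{base}/{concept_key(concept_id, domain)}"


def register(registry: Dict[str, Any], concept_id: str,
             domain: Optional[str] = None, definition: Any = None) -> str:
    """Add a concept to the registry, refusing to overwrite an existing key."""
    key = concept_key(concept_id, domain)
    if key in registry:
        raise ValueError(f"duplicate concept key: {key}")
    registry[key] = definition
    return key
```

With this shape, registering root `Person` and `crvs` `Person` yields two distinct keys (`Person` and `crvs/Person`), while registering the same key twice raises an error instead of overwriting.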
ServicePoint and its subtypes (HealthFacility, School, WaterPoint, RegistrationOffice) remain at root rather than under domain segments. They are classified by sector using the service-point type vocabulary, not by URI domain. This keeps service-point records usable across social-protection, education, health, and CRVS workflows without introducing domain-specific supertypes.
3. URI persistence
Every element gets a stable URI. Once published at candidate or above, a URI will not be removed. Deprecated terms continue to resolve with metadata indicating the replacement. See Versioning and Maturity for the full model.
4. Concept, property, or vocabulary
Use this decision tree to determine what kind of element to create.
Step 1: Does it have its own identity? Does this thing exist independently, get referenced from multiple places, and have its own lifecycle? If yes, it is a concept.
Example: GroupMembership is a concept, not a property on Person or Group. It carries its own data (role, dates), has its own lifecycle, and is referenced from both sides.
Step 2: Is it an attribute of a concept? A fact about a specific concept, with no independent identity? If yes, it is a property. Multiple values (e.g., phone numbers) still make it a property with cardinality many.
Step 3: Is the value drawn from a closed set? If the property accepts one answer from a defined list with stable meanings, the value set is a vocabulary.
Step 4: Reference or inline? If the value has its own identity and properties, reference a concept (concept: Location). If it is a simple scalar, use an inline primitive.
Example: latitude is an inline decimal on Location. It has no independent identity, no sub-properties. It is a number.
| Situation | Element type |
|---|---|
| Own lifecycle, referenced from multiple concepts | Concept |
| Attribute of a concept, no independent identity | Property |
| Value from a closed set of options | Vocabulary |
| Value has its own identity and sub-properties | Property referencing a concept |
| Simple scalar | Inline primitive type |
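The decision tree can be sketched as a small function. The `Candidate` fields and the returned labels are names invented for this example; they paraphrase the four steps and are not part of the schema.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Answers to the decision-tree questions for one proposed element."""
    has_own_identity: bool            # Step 1: independent existence and lifecycle
    closed_value_set: bool = False    # Step 3: one answer from a defined list
    value_has_identity: bool = False  # Step 4: value has identity and sub-properties


def classify(candidate: Candidate) -> str:
    """Walk the steps in order and return the kind of element to create."""
    if candidate.has_own_identity:
        return "concept"
    # From here on the element is a property (Step 2); Steps 3 and 4
    # decide what kind of value it holds.
    if candidate.closed_value_set:
        return "property with a vocabulary"
    if candidate.value_has_identity:
        return "property referencing a concept"
    return "property with an inline primitive"
```

Under these assumptions, GroupMembership (`has_own_identity=True`) classifies as a concept, and latitude (all answers false) classifies as a property with an inline primitive.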
Actor vs. receiver supertypes
Agent and Party are two abstract supertypes that carry different semantics.
- Party is the receiver side: the persons and organised groups of persons (Household, Family, Farm) that can be identified, enrolled in programs, and receive benefits or services. Beneficiary-side references (beneficiary, recipient, subject, redeemable_by, issued_to) range over Party.
- Agent is the actor side: the persons, organisations, and software that perform, publish, evaluate, decide, or execute. Actor-side references (performed_by, evaluator, publisher) range over Agent.
Person is the only concept that belongs to both hierarchies. A person can both receive services and perform them. Organization is an Agent only (it is not modelled as a receiver today). SoftwareAgent is an Agent only. See ADR-008.
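One way to picture the two hierarchies is as Python classes, with Person inheriting from both supertypes. This is only an analogy for the range constraints described above: the schema does not define these classes, and `can_receive` and `can_act` are hypothetical helpers.

```python
class Party:
    """Receiver side: can be identified, enrolled, and receive benefits."""


class Agent:
    """Actor side: can perform, publish, evaluate, decide, or execute."""


class Person(Party, Agent):
    """The only concept that belongs to both hierarchies."""


class Household(Party):
    """An organised group of persons; receiver side only."""


class Organization(Agent):
    """Actor side only; not modelled as a receiver today."""


class SoftwareAgent(Agent):
    """Actor side only."""


def can_receive(instance: object) -> bool:
    """True if the instance may fill a beneficiary-side reference."""
    return isinstance(instance, Party)


def can_act(instance: object) -> bool:
    """True if the instance may fill an actor-side reference."""
    return isinstance(instance, Agent)
```

In this analogy a `Person` satisfies both checks, an `Organization` satisfies only `can_act`, and a `Household` satisfies only `can_receive`.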
4a. Group-like concepts
Three concepts in PublicSchema describe collections of persons but have distinct semantics. Choosing the right one matters for data quality and interoperability.
Household is a co-residential economic unit. Members share a dwelling and typically share food and resources. The operational definition varies by country and program (combining co-residence, shared budget, shared cooking, and kinship criteria), but the anchor is always physical co-location and shared livelihood. Household is the right concept for registering beneficiary units in social protection programs.
Family is a kinship network. Members are connected by blood, marriage, or adoption, regardless of where they live. A family can span multiple households and geographic areas. Kinship links between members are modelled as Relationship records between Person instances; Family itself carries no dedicated kinship properties at this stage. Family is the right concept when the unit of interest is a relational network rather than a co-residential arrangement.
FamilyRegister is an administrative document, not a group. It is a civil-registration record that tracks a family unit over time as vital events (births, deaths, marriages) occur. It references a Family to expose current membership. FamilyRegister is the right concept for modelling koseki-style, hukou-style, or livret-de-famille-style administrative instruments.
When to use each
| You want to record... | Use |
|---|---|
| A beneficiary unit sharing a dwelling and resources | Household |
| A network of persons connected by blood, marriage, or adoption | Family |
| An administrative civil-registration document tracking a family | FamilyRegister |
Interoperability bridge
Many systems use "family" colloquially to mean the co-residential unit. When exchanging data with such systems, set group_type: family on the Household record. This signals to consumers that the household is being represented as a family for interoperability purposes without misrepresenting the PublicSchema semantics.
5. Temporal context
Almost everything in public service delivery is time-bounded. A status snapshot without a validity period is incomplete. When designing a concept or property, ask: will this value change over time? If yes, model the temporal context explicitly (start/end dates, validity periods).
Date property conventions
Lifecycle concepts use domain-specific named dates that describe the domain event. Relationship and membership concepts use generic start_date / end_date.
| Concept type | Date pattern | Examples |
|---|---|---|
| Lifecycle (Enrollment) | Domain-specific named dates | enrollment_date, exit_date |
| Lifecycle (Entitlement) | Domain-specific period | coverage_period_start, coverage_period_end |
| Lifecycle (Grievance) | Domain-specific event dates | submission_date, resolution_date |
| Single event (PaymentEvent) | Single event date | payment_date |
| Relationship (GroupMembership, Relationship) | Generic dates | start_date, end_date |
Do not mix both patterns on the same concept. A lifecycle concept should not carry both enrollment_date and start_date.
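A lint-style check for this rule might look like the sketch below. The suffix heuristic used to recognise domain-specific dates (`_date`, `_period_start`, `_period_end`) is an assumption based on the examples in the table, and `mixes_date_patterns` is a hypothetical helper rather than part of the project's validators.

```python
from typing import Iterable

# The generic pair used by relationship and membership concepts.
GENERIC_DATES = {"start_date", "end_date"}

# Assumed suffixes for domain-specific dates, inferred from the table's examples.
DOMAIN_DATE_SUFFIXES = ("_date", "_period_start", "_period_end")


def is_domain_specific_date(name: str) -> bool:
    """Heuristic: a date-like property name that is not one of the generic pair."""
    return name not in GENERIC_DATES and name.endswith(DOMAIN_DATE_SUFFIXES)


def mixes_date_patterns(property_names: Iterable[str]) -> bool:
    """True if a concept carries both generic and domain-specific dates."""
    names = set(property_names)
    has_generic = bool(names & GENERIC_DATES)
    has_domain_specific = any(is_domain_specific_date(n) for n in names)
    return has_generic and has_domain_specific
```

A concept listing `enrollment_date` and `start_date` would be flagged, while one listing only `enrollment_date` and `exit_date`, or only `start_date` and `end_date`, would pass.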
6. Property independence
A property like start_date is defined once and reused across concepts. When a shared property needs concept-specific value sets (e.g., status on Enrollment vs. Grievance), it specializes via different vocabulary references rather than pretending the differences don't exist.
Cross-concept property reuse
Property independence is not limited to repeated structural fields. Substantive observables can be reused across concepts too. water_source, sanitation_facility, and dwelling_type appear on both SocioEconomicProfile (baseline registration context) and DwellingDamageProfile in a sibling schema (post-shock assessment). In each case the property is declared once and listed in each concept's properties; the pattern also illustrates how PublicSchema properties can be reused by domain-specific profile subtypes vendored in a sibling schema.
The rules that keep this honest:
- One property file per named concept. water_source is a single YAML file referenced from both profiles.
- Contextual framing lives on the concept, not the property. The property definition names the observable ("the household's primary source of drinking water"). Each concept definition names how that observable is interpreted in that concept (baseline vs. post-shock).
- Reuse must be disclosed in both concepts' narrative definitions. A reader on either page must be able to see that the field also appears elsewhere and why.
- Reuse does not make records type-compatible. A SocioEconomicProfile record and a DwellingDamageProfile record are different things even when their property values overlap. Adopters should consult the concept page, not the property list, when serialising into a strongly typed shape.
- Split when wording diverges. If the property's own definition needs different text in each context, create two properties. location and location_of_assessment are split this way: location is the concept-agnostic geographic location of the record's subject (the household's site for a Household record, the organisation's primary site for an Organization record); location_of_assessment is where a post-shock damage assessment was physically carried out, which may differ after displacement.
triggering_hazard_event (on DwellingDamageProfile) and triggering_vital_event (on CivilStatusAnnotation) follow the same split. Both were originally unified as one triggering_event whose type was widened to concept:Event, but the expected subtype carries meaning for validators and practitioners, so each consumer now declares its own typed reference. See ADR-007 for the full argument.
7. Age applicability
Some Person-scoped properties are only meaningful for specific age groups. The Washington Group Short Set and Extended Set apply to adults; the Child Functioning Module applies to children ages 2-4 and 5-17. WHO growth standards apply to under-5s. Rather than encoding these rules in definition prose alone (which machines cannot parse), properties carry an optional age_applicability array of controlled tags.
| Tag | Numeric range | Source of the band |
|---|---|---|
| infant_0_1 | 0-23 months | General infancy (covers MICS infant modules, early WHO growth) |
| child_2_4 | 2-4 years (24-59 months) | CFM 2-4 variant; WHO Child Growth Standards |
| child_5_17 | 5-17 years | CFM 5-17 variant; also CRC definition of "child" |
| adolescent | 10-19 years | WHO definition (deliberately crosscutting with child_5_17 and adult) |
| adult | 18+ years | WG-SS / WG-ES |
Topical relevance, not eligibility
age_applicability answers "which age groups does this property concern?" It is not a filter primitive for eligibility. Age-based filtering is the consumer's job, computed from date_of_birth. Under this framing, overlap between tags is a feature, not a bug: a property about adolescent reproductive health carries both child_5_17 and adolescent because the topic genuinely concerns both the under-18 bracket and the WHO 10-19 bracket.
A consumer asking "is this field relevant for a 15-year-old?" evaluates the child's age against all of the property's bands and asks whether any match. A consumer asking "is this topic adolescence-specific?" checks for the adolescent tag specifically.
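The relevance check described above can be sketched as follows. Band boundaries are expressed in completed months and read the year ranges as inclusive of the final year (so "5-17 years" ends at the 18th birthday and "10-19 years" at the 20th); that reading, and the helper names, are assumptions made for this example.

```python
from typing import Iterable, List, Optional, Tuple

# (tag, minimum age in months inclusive, maximum age in months exclusive or None)
AGE_BANDS: List[Tuple[str, int, Optional[int]]] = [
    ("infant_0_1", 0, 24),
    ("child_2_4", 24, 60),
    ("child_5_17", 60, 216),
    ("adolescent", 120, 240),
    ("adult", 216, None),
]


def matching_tags(age_months: int) -> List[str]:
    """Return every tag whose band contains the given age; bands may overlap."""
    return [
        tag
        for tag, low, high in AGE_BANDS
        if age_months >= low and (high is None or age_months < high)
    ]


def is_relevant(age_applicability: Optional[Iterable[str]], age_months: int) -> bool:
    """True if any of the property's tags matches the person's age.

    A missing array (None) means the property applies broadly to any age.
    """
    if age_applicability is None:
        return True
    return bool(set(age_applicability) & set(matching_tags(age_months)))
```

For a 15-year-old (180 completed months), `matching_tags` returns both `child_5_17` and `adolescent`, so a property tagged with either is relevant while one tagged only `adult` is not. Checking whether a topic is adolescence-specific remains a plain membership test for the `adolescent` tag.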
Population rules
- Only populate on properties that attach to Person. Age-applicability is meaningless on concepts without an age.
- Not required. Absence means the property applies broadly to any age.
- Validator enforces bibliography-implied coverage: properties cited by washington-group-ss or washington-group-es must include adult; properties cited by washington-group-cfm must include at least one of the child bands (child_2_4 or child_5_17). Properties may narrow CFM coverage where the definition text explains which variant they map to.
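The coverage rule in the last bullet can be sketched as a small check. The property is represented here as a plain dict with a hypothetical `bibliography` key listing cited sources; the real property file layout is not shown on this page. The sketch also treats a missing `age_applicability` array as empty, which is an assumption about how the validator handles absence.

```python
from typing import Any, Dict, List

ADULT_SOURCES = {"washington-group-ss", "washington-group-es"}
CHILD_SOURCE = "washington-group-cfm"
CHILD_BANDS = {"child_2_4", "child_5_17"}


def coverage_errors(prop: Dict[str, Any]) -> List[str]:
    """Return messages for each bibliography-implied tag that is missing."""
    cited = set(prop.get("bibliography", []))      # hypothetical key name
    tags = set(prop.get("age_applicability", []))  # missing array treated as empty
    errors: List[str] = []
    if cited & ADULT_SOURCES and "adult" not in tags:
        errors.append("WG-SS/WG-ES citation requires the 'adult' tag")
    if CHILD_SOURCE in cited and not (tags & CHILD_BANDS):
        errors.append("CFM citation requires 'child_2_4' or 'child_5_17'")
    return errors
```

A property citing `washington-group-cfm` with only `child_5_17` passes, which matches the allowance for narrowing CFM coverage to one variant.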
8. External equivalents vs. serialisation bindings
The external_equivalents field on properties was originally intended for equivalents in other ontologies (SEMIC Core Vocabularies, DCI Core): a property like given_name maps exactly to http://www.w3.org/ns/person#firstName. The match is semantic: both describe the same concept in an alternate ontology.
The same field is also used for serialisation bindings such as FHIR R4 Observation with LOINC codes. These are not equivalents in the SEMIC/DCI sense; they are instructions for how to serialise this property into a specific interop format. The distinction matters when reading a property detail page: a SEMIC row says "this concept exists in another ontology"; a FHIR/LOINC row says "when you serialise this data into FHIR, use this code."
Convention:
- Per-item LOINC codes belong on the property (each WG item has its own LOINC code).
- Whole-vocabulary LOINC answer-list references belong on the vocabulary (standard.uri). Example: pregnancy-status carries one LOINC answer-list URI for the whole value set.
9. Sensitivity annotations
Some properties reveal sensitive circumstances regardless of whether they identify a specific person. program_ref reveals enrollment in a specific program (which may target HIV, disability, or poverty). grievance_type reveals that someone filed a complaint.
| Level | When to use | What it signals |
|---|---|---|
| standard | Default. No special handling beyond normal data protection. | Can be omitted (assumed if absent). |
| sensitive | Reveals circumstances (health, poverty, victimhood) in most contexts. | Requires justification to collect or disclose. |
| restricted | Should not appear in credentials at routine service points. | Requires a Data Protection Impact Assessment. |
This is a practitioner warning, not a compliance label. Whether a property constitutes personal data depends on the record it appears in, not the property itself. See Selective Disclosure for credential-level classification.
10. Display escape hatches
Some properties carry schema-level semantics that are intentionally decoupled from jurisdiction-specific display requirements. The certificate_label property on Parent is an example: the data model stores role-nature (biological, gestational, legal, adoptive) on parental_role, decoupled from gender, but some jurisdictions are required by law to print gendered or positional labels ("mother", "father", "parent 1", "parent 2") on the civil certificate. Rather than forcing the schema to carry gendered codes, certificate_label provides a free-text field for the label as it must appear on the printed document. The underlying role-nature remains machine-readable and gender-neutral; the display-layer requirement is satisfied without contaminating the data model.