Title: | Create Data Frames for Exchange and Reuse |
Version: | 0.4.0 |
Date: | 2025-08-26 |
Language: | en-GB |
Maintainer: | Daniel Antal <daniel.antal@dataobservatory.eu> |
Description: | The 'dataset' package helps create semantically rich, machine-readable, and interoperable datasets in R. It extends tidy data frames with metadata that preserves meaning, improves interoperability, and makes datasets easier to publish, exchange, and reuse in line with ISO and W3C standards. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
URL: | https://dataset.dataobservatory.eu/ |
BugReports: | https://github.com/dataobservatory-eu/dataset/issues/ |
LazyData: | true |
Imports: | assertthat, haven, ISOcodes, labelled, pillar, tibble, utils, vctrs |
RoxygenNote: | 7.3.2 |
Suggests: | dplyr, jsonld, knitr, rdflib, rmarkdown, spelling, tidyr, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Depends: | R (≥ 3.5) |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-08-25 22:11:39 UTC; DanielAntal |
Author: | Daniel Antal |
Repository: | CRAN |
Date/Publication: | 2025-08-26 09:30:02 UTC |
Coerce a defined vector to character
Description
as_character() is the recommended method to convert a defined() vector to a character type. It is metadata-aware and ensures that the underlying data is character before coercion.
Base R's as.character() method applied to defined vectors simply strips the class and returns the values as a plain character vector. This is equivalent to calling as_character() with preserve_attributes = FALSE.
Usage
as_character(x, ...)
## S3 method for class 'haven_labelled_defined'
as_character(x, preserve_attributes = FALSE, ...)
## S3 method for class 'haven_labelled_defined'
as.character(x, ...)
Arguments
x |
A vector created with |
... |
Reserved for future use. |
preserve_attributes |
Logical. If |
Details
If preserve_attributes = TRUE, the returned character vector retains semantic metadata such as unit, concept, and namespace, though the "defined" class itself is removed. If preserve_attributes = FALSE (default), a plain character vector is returned with all attributes stripped.
For numeric-based defined vectors, as_character() throws an informative error to avoid accidental coercion of non-character data.
Note: as.character() (base R) is supported but simply returns the raw values, and does not preserve or warn about metadata loss.
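The effect of preserve_attributes can be checked by inspecting the attributes of the result. The following is a minimal sketch, assuming the dataset package is installed; the object names (kept, plain, weights, msg) are illustrative, and the attribute name "unit" is taken from the description above.

```r
library(dataset)

fruits <- defined(c("apple", "avocado", "kiwi"), label = "Fruit", unit = "kg")

# Keep semantic metadata on the coerced vector
kept <- as_character(fruits, preserve_attributes = TRUE)
attr(kept, "unit")

# Default behaviour: a plain character vector without the unit
plain <- as_character(fruits)
attr(plain, "unit")

# Numeric-based defined vectors are documented to raise an error;
# tryCatch() captures the message instead of stopping the script
weights <- defined(c(1.2, 0.4), label = "Weight", unit = "kg")
msg <- tryCatch(as_character(weights), error = function(e) conditionMessage(e))
msg
```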
Value
A character vector.
Examples
# Recommended use
fruits <- defined(c("apple", "avocado", "kiwi"), label = "Fruit", unit = "kg")
as_character(fruits, preserve_attributes = TRUE)
# Strip metadata
as_character(fruits, preserve_attributes = FALSE)
# Equivalent base R fallback
as.character(fruits)
Create a Bibentry Object with DataCite Metadata Fields
Description
Constructs a bibliographic metadata record conforming to the
DataCite Metadata Schema. The resulting
object is stored as a modified utils::bibentry()
enriched with structured
Dublin Core and DataCite-compliant metadata.
Usage
as_datacite(x, type = "bibentry", ...)
datacite(
Title,
Creator,
Identifier = NULL,
Publisher = NULL,
PublicationYear = NULL,
Subject = subject_create(term = "data sets", subjectScheme =
"Library of Congress Subject Headings (LCSH)", schemeURI =
"https://id.loc.gov/authorities/subjects.html", valueURI =
"http://id.loc.gov/authorities/subjects/sh2018002256"),
Type = "Dataset",
Contributor = NULL,
Date = ":tba",
DateList = NULL,
Language = NULL,
AlternateIdentifier = ":unas",
RelatedIdentifier = ":unas",
Format = ":tba",
Version = "0.1.0",
Rights = ":tba",
Description = ":tba",
Geolocation = ":unas",
FundingReference = ":unas"
)
is.datacite(x)
## S3 method for class 'datacite'
is.datacite(x)
## S3 method for class 'datacite'
print(x, ...)
Arguments
x |
An object that is tested if it has a class "datacite". |
type |
DataCite 4.4 metadata can be returned as:
|
... |
Optional parameters to add to a |
Title |
The name(s) by which the resource is known. Similar to dct:title. |
Creator |
One or more |
Identifier |
A persistent identifier (e.g., DOI or URI). May refer to a specific version or all versions of the resource. |
Publisher |
The name of the organization that holds, publishes, or
distributes the resource. Required by DataCite. See |
PublicationYear |
The year of public availability (in |
Subject |
A topic, keyword, or classification term. See |
Type |
The resource type. Defaults to |
Contributor |
An individual or institution that contributed to the development, distribution, or curation of the resource. |
Date |
A date in |
DateList |
A list of multiple dates. Currently not supported. |
Language |
Language code as per IETF BCP 47 / ISO 639-1. See |
AlternateIdentifier |
Optional local or secondary identifier. Defaults
to |
RelatedIdentifier |
Related resources (e.g., prior versions, papers).
Defaults to |
Format |
A technical format (e.g., |
Version |
A free-text version string (e.g., |
Rights |
Licensing or usage restrictions for the resource. Defaults to
|
Description |
Free-text summary or additional information. Defaults to
|
Geolocation |
Geographic location covered or referenced by the resource.
See |
FundingReference |
Information about funding or financial support.
Defaults to |
Details
DataCite is a leading non-profit organization that provides persistent identifiers (DOIs) for research data and other research outputs. Members of the research community use DataCite to register datasets with globally resolvable metadata for citation and discovery.
This function sets "Dataset"
as the default resource type. The Size
attribute (e.g., bytes, pages, etc.) is automatically added if available.
Value
as_datacite(x, type) returns the DataCite bibliographical metadata of x either as a list, a bibentry object, an N-Triples text serialisation or a dataset_df object.
datacite() returns a utils::bibentry() object with DataCite-compliant fields. Use as_datacite() to extract the metadata as a list or bibentry object.
is.datacite(x) returns a logical value: TRUE if the object x is of class datacite, otherwise FALSE.
See Also
Learn more in the vignette:
bibrecord
Other bibrecord functions: as_dublincore(), bibrecord()
Examples
datacite(
Title = "Growth of Orange Trees",
Creator = c(
person(
given = "N.R.",
family = "Draper",
role = "cre",
comment = c(VIAF = "http://viaf.org/viaf/84585260")
),
person(
given = "H",
family = "Smith",
role = "cre"
)
),
Publisher = "Wiley",
Date = 1998,
Language = "en"
)
# Extract bibliographic metadata
as_datacite(orange_df)
# As a list
as_datacite(orange_df, "list")
Add or Retrieve Dublin Core Metadata
Description
Adds or retrieves metadata conforming to the Dublin Core Metadata Terms standard, enabling consistent and structured citation and retrieval of R dataset objects.
is.dublincore()
checks whether an object inherits from the "dublincore"
class.
Usage
as_dublincore(x, type = "bibentry", ...)
dublincore(
title,
creator,
contributor = NULL,
year = NULL,
publisher = NULL,
identifier = NULL,
subject = NULL,
type = "DCMITYPE:Dataset",
dataset_date = NULL,
language = NULL,
relation = NULL,
dataset_format = "application/r-rds",
rights = NULL,
datasource = NULL,
description = NULL,
coverage = NULL
)
is.dublincore(x)
## S3 method for class 'dublincore'
print(x, ...)
Arguments
x |
An object to test. |
type |
The resource type. For datasets, use |
... |
Additional metadata fields. |
title |
A name given to the resource. See |
creator |
One or more |
contributor |
Additional contributors ( |
year |
An explicit publication year. If omitted, inferred from
|
publisher |
A character or |
identifier |
A unique persistent identifier (e.g., DOI). See |
subject |
A keyword or controlled vocabulary term. See |
dataset_date |
A publication or release date ( |
language |
ISO 639-1 language code. See |
relation |
A related resource (e.g., version, paper, or parent dataset).
Currently only supports a URI, for example,
|
dataset_format |
The technical format of the dataset (e.g., MIME type).
See |
rights |
A string describing intellectual property or usage rights.
Use a URI like |
datasource |
A URL or label for the original source of the dataset. |
description |
A free-text summary of the dataset. See |
coverage |
Geographic or temporal extent (spatial/temporal coverage). |
Details
The Dublin Core Metadata Element Set (DCMES) is a standardized vocabulary for describing digital and physical resources. It includes 15 core fields and is formally standardized as ISO 15836, IETF RFC 5013, and ANSI/NISO Z39.85.
This function constructs a utils::bibentry()
object extended with DCMI
terms and is compatible with dataset_df()
objects. The resulting metadata
can be used for semantic documentation and machine-readable citation.
For compatibility with utils::bibentry()
, the dataset_date
parameter is
automatically used to derive both publication_date
and year
fields.
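The derivation of year from dataset_date can be inspected on a small record. This is a sketch only, assuming the dataset package is installed; the object name rec is illustrative, and field access with $ relies on the record being a utils::bibentry() as described above.

```r
library(dataset)

# Only title and creator are required; the date is given once
rec <- dublincore(
  title = "Growth of Orange Trees",
  creator = person(given = "N.R.", family = "Draper", role = "cre"),
  dataset_date = 1998
)

# The record is documented to inherit from "dublincore" and "bibentry"
is.dublincore(rec)
inherits(rec, "bibentry")

# The year field is documented to be derived from dataset_date
rec$year
```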
Value
dublincore() returns a bibentry object extended with class "bibrecord", storing structured Dublin Core metadata. Use as_dublincore() to extract the metadata in list, tabular, or RDF form.
is.dublincore() returns a logical value: TRUE if x is a Dublin Core metadata record (i.e., inherits from "dublincore"), otherwise FALSE.
See Also
Learn more in the vignette:
bibrecord
Other bibrecord functions: as_datacite(), bibrecord()
Examples
orange_bibentry <- dublincore(
title = "Growth of Orange Trees",
creator = c(
person(
given = "N.R.",
family = "Draper",
role = "cre",
comment = c(VIAF = "http://viaf.org/viaf/84585260")
),
person(given = "H", family = "Smith", role = "cre")
),
contributor = person(given = "Daniel", family = "Antal", role = "dtm"),
publisher = "Wiley",
datasource = "https://isbnsearch.org/isbn/9780471170822",
dataset_date = 1998,
identifier = "https://doi.org/10.5281/zenodo.14917851",
language = "en",
description = "The Orange data frame has 35 rows and 3 columns of records
of the growth of orange trees."
)
# To inspect structured metadata from a dataset_df object:
as_dublincore(orange_df, type = "list")
Coerce a defined vector to a factor
Description
Converts a defined()
vector with value labels into a
factor using haven::as_factor()
. This allows categorical defined
vectors to behave like standard factors in models and plotting.
Usage
as_factor(x, ...)
Arguments
x |
A vector created with |
... |
Reserved for future extensions; not used. |
Value
A factor vector with levels derived from the value labels.
Examples
sex <- defined(
c(0, 1, 1, 0),
label = "Sex",
labels = c("Female" = 0, "Male" = 1)
)
as_factor(sex)
Coerce a defined vector to numeric
Description
as_numeric() is the recommended method to convert a defined() vector to a numeric vector. It ensures the underlying data is numeric and can optionally preserve semantic metadata.
Base R's as.numeric() is not metadata-aware. The as.numeric() method provided for defined vectors drops all metadata and class information, returning a plain numeric vector. It is equivalent to as_numeric(x, preserve_attributes = FALSE).
Usage
as_numeric(x, ...)
## S3 method for class 'haven_labelled_defined'
as_numeric(x, preserve_attributes = FALSE, ...)
## S3 method for class 'haven_labelled_defined'
as.numeric(x, ...)
Arguments
x |
A vector created with |
... |
Reserved for future use. |
preserve_attributes |
Logical. Whether to keep metadata attributes.
Defaults to |
Details
If preserve_attributes = TRUE, the returned vector retains the unit, concept, and namespace attributes, but is no longer of class "defined". If FALSE (default), a base numeric vector is returned without metadata.
For character-based defined vectors, an error is thrown to avoid invalid coercion.
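In a pipeline it can be useful to carry the unit alongside the bare values and to guard against character-based input. The sketch below assumes the dataset package is installed; the fallback to NA_real_ is an illustrative choice, not package behaviour.

```r
library(dataset)

gdp <- defined(c(3897, 7365), label = "GDP", unit = "million dollars")

# Keep the unit while leaving the "defined" class behind
values <- as_numeric(gdp, preserve_attributes = TRUE)
gdp_unit <- attr(values, "unit")

# Character-based vectors are documented to raise an error;
# this guard substitutes NA instead of stopping the script
codes <- defined(c("AD", "LI"), label = "Country code")
coerced <- tryCatch(as_numeric(codes), error = function(e) NA_real_)
```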
Value
A numeric vector (either bare or with metadata, depending on the
preserve_attributes
argument).
See Also
as.character()
, strip_defined()
Examples
gdp <- defined(c(3897L, 7365L), label = "GDP", unit = "million dollars")
# Drop all metadata
as_numeric(gdp)
# Preserve unit and concept
as_numeric(gdp, preserve_attributes = TRUE)
# Equivalence to base coercion (without metadata)
as.numeric(gdp)
# Metadata-aware variant preferred in pipelines
attr(as_numeric(gdp, TRUE), "unit")
Create a Modern Metadata Object Compatible with bibentry
Description
Constructs a utils::bibentry()
object extended with Dublin Core and
DataCite-compatible fields. This unified structure supports use with
functions such as dublincore()
and datacite()
, and is the internal
format for storing rich metadata with datasets.
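For orientation, the sketch below uses base R only to show the structure that bibrecord() builds on: a plain utils::bibentry() accepts extra fields beyond the BibTeX ones and exposes them with $. It does not reproduce the implementation of bibrecord(); the field name identifier is used here purely as an example of an extra field.

```r
# Base R only: a bibentry carrying an extra, non-BibTeX field
entry <- utils::bibentry(
  bibtype = "Misc",
  title = "Gross domestic product, volumes",
  author = utils::person("Eurostat"),
  year = 2025,
  publisher = "Eurostat",
  identifier = "https://doi.org/10.2908/TEINA011"
)

# Fields are retrieved by name
entry$title
entry$identifier
class(entry)
```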
Usage
bibrecord(
title,
author,
contributor = NULL,
publisher = NULL,
year = NULL,
date = Sys.Date(),
identifier = NULL,
subject = NULL,
...
)
Arguments
title |
A character string specifying the dataset title. |
author |
A |
contributor |
Optional list or vector of |
publisher |
A character string or |
year |
Publication year. Automatically derived from |
date |
A Date object or character string in ISO format. |
identifier |
A persistent identifier (e.g., DOI or URL). |
subject |
Optional keyword, tag, or controlled vocabulary term. |
... |
Additional fields such as |
Value
An object of class "bibrecord"
and "bibentry"
, suitable for citation and
embedding in metadata-aware structures such as dataset_df()
.
See Also
Learn more in the vignette:
bibrecord
Other bibrecord functions: as_datacite(), as_dublincore()
Examples
bibrecord(
title = "Gross domestic product, volumes",
author = person("Eurostat"),
publisher = person("Eurostat"),
identifier = "https://doi.org/10.2908/TEINA011",
date = as.Date("2025-05-20")
)
Bind strictly defined rows
Description
Add rows of dataset y
to dataset x
, validating all
semantic metadata. Metadata (labels, units, concept definitions,
namespaces) must match exactly. Additional dataset-level metadata such as
title and creator can be overridden using ...
.
Usage
bind_defined_rows(x, y, ..., strict = FALSE)
Arguments
x |
A |
y |
A |
... |
Optional dataset-level attributes such as |
strict |
Logical. If |
Details
This function combines two semantically enriched datasets created with dataset_df(). All variable-level attributes, including labels, units, concept definitions, and namespaces, must match. If strict = TRUE, the row identifier namespace (used in the rowid column) must also match exactly.
If strict = FALSE (the default shown in Usage), row identifiers from y may differ and will be ignored; the output will inherit x's row identifier scheme.
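The variable-level check can be illustrated with two datasets that differ only in the unit. This is a sketch under the assumption that a mismatch is signalled as an R error, as the description of strict matching suggests; it requires the dataset package, and the object names are illustrative.

```r
library(dataset)

in_cm <- dataset_df(
  length = defined(c(10, 15),
    label = "Length",
    unit = "cm", namespace = "http://example.org"
  ),
  identifier = c(id = "http://example.org/dataset#")
)

in_m <- dataset_df(
  length = defined(c(0.2, 0.25),
    label = "Length",
    unit = "m", namespace = "http://example.org"
  ),
  identifier = c(id = "http://example.org/dataset#")
)

# Units differ ("cm" vs "m"), so the bind is expected to be refused;
# the message is captured rather than stopping the script
outcome <- tryCatch(
  bind_defined_rows(in_cm, in_m),
  error = function(e) conditionMessage(e)
)
outcome
```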
Value
A new dataset_df
object with rows from x
and y
, combined
semantically.
Examples
A <- dataset_df(
length = defined(c(10, 15),
label = "Length",
unit = "cm", namespace = "http://example.org"
),
identifier = c(id = "http://example.org/dataset#"),
dataset_bibentry = dublincore(
title = "Dataset A",
creator = person("Alice", "Smith")
)
)
B <- dataset_df(
length = defined(c(20, 25),
label = "Length",
unit = "cm", namespace = "http://example.org"
),
identifier = c(id = "http://example.org/dataset#")
)
bind_defined_rows(A, B) # succeeds
C <- dataset_df(
length = defined(c(30, 35),
label = "Length",
unit = "cm", namespace = "http://example.org"
),
identifier = c(id = "http://another.org/dataset#")
)
## Not run:
bind_defined_rows(A, C, strict = TRUE) # fails: mismatched rowid
## End(Not run)
bind_defined_rows(A, C, strict = FALSE) # succeeds: rowid inherited
Combine defined vectors with metadata checks
Description
The c()
method for defined
vectors ensures that all semantic metadata
(label, unit, concept, namespace, and value labels) match exactly. This
prevents accidental loss or mixing of incompatible definitions during
concatenation.
Usage
## S3 method for class 'haven_labelled_defined'
c(...)
Arguments
... |
One or more vectors created with |
Details
All input vectors must:
- Have identical label attributes
- Have identical unit, concept, and namespace
- Have identical value labels (or none)
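The requirements above can be exercised with one matching and one mismatching pair. A minimal sketch, assuming the dataset package is installed and that a mismatch is raised as an R error; the object names are illustrative.

```r
library(dataset)

a <- defined(1:3, label = "Length", unit = "meter")
b <- defined(4:6, label = "Length", unit = "meter")

# Identical metadata: concatenation succeeds
combined <- c(a, b)
length(combined)

# Different unit: concatenation is expected to be refused
wrong_unit <- defined(7:9, label = "Length", unit = "cm")
outcome <- tryCatch(c(a, wrong_unit), error = function(e) conditionMessage(e))
outcome
```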
Value
A single defined
vector with concatenated values and retained
metadata.
Examples
a <- defined(1:3, label = "Length", unit = "meter")
b <- defined(4:6, label = "Length", unit = "meter")
c(a, b)
Remove role suffixes from formatted person names
Description
Remove role suffixes from formatted person names
Usage
clean_person_name(p)
Arguments
p |
A |
Value
Character string without role annotations, e.g. "Jane Doe"
.
Get or set contributors
Description
contributor()
is a lightweight wrapper around creator()
that
works only with contributors. It retrieves or updates only the contributor
entries in the dataset's bibliographic metadata.
Usage
contributor(x)
contributor(x, overwrite = FALSE) <- value
Arguments
x |
A dataset object created with |
overwrite |
Logical. If |
value |
A |
Details
All people are stored in the author
slot of the underlying
utils::bibentry
. This helper preserves primary creators and filters or
updates only those entries that represent contributors.
A contributor is defined as:
- a person with role == "ctb", or
- a person with a comment[["contributorType"]].
Primary creators (authors) typically have role %in% c("aut", "cre").
Contributors can be further annotated with metadata in comment, for example:
comment = c(contributorType = "hostingInstitution", ORCID = "0000-0000-0000-0000")
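The contributor rule can be expressed as a small predicate over utils::person objects. The helper is_contributor() below is hypothetical and written in base R only; it mirrors the definition given here and is not the package's internal code.

```r
# Hypothetical helper, not part of the 'dataset' package:
# TRUE if the person has role "ctb" or a contributorType comment
is_contributor <- function(p) {
  has_ctb_role <- "ctb" %in% p$role
  has_contributor_type <- "contributorType" %in% names(p$comment)
  has_ctb_role || has_contributor_type
}

people <- c(
  utils::person("Jane", "Doe", role = "aut"),
  utils::person("GitHub",
    role = "ctb",
    comment = c(contributorType = "hostingInstitution")
  ),
  utils::person("Support", "Team",
    comment = c(contributorType = "hostingInstitution")
  )
)

# Index with [ so that each element stays a length-one person object
flags <- vapply(
  seq_along(people),
  function(i) is_contributor(people[i]),
  logical(1)
)
flags
```

The first person is a primary creator and is not flagged; the other two qualify through the role and through the comment, respectively.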
Value
-
contributor()
returns autils::person
or a list of such objects corresponding to contributors. -
contributor<-()
returns the updated dataset (invisibly).
See Also
Other bibliographic helper functions: creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
df <- dataset_df(data.frame(x = 1))
creator(df) <- person("Jane", "Doe", role = "aut")
# Add a contributor
contributor(df, overwrite = FALSE) <-
person("GitHub",
role = "ctb",
comment = c(contributorType = "hostingInstitution")
)
# Replace all contributors (overwrite must be TRUE; the default is FALSE)
contributor(df, overwrite = TRUE) <- person("Support", "Team", role = "ctb")
# Inspect only contributors
contributor(df)
Get/set the Creator of the object.
Description
Add the optional Creator
property as an attribute to a
dataset object.
Usage
creator(x)
creator(x, overwrite = TRUE) <- value
Arguments
x |
A semantically rich data frame object created by
|
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
Details
The Creator corresponds to dct:creator in Dublin Core and Creator in DataCite. DataCite defines creators as the main researchers involved in producing the data, or the authors of the publication, in priority order. This property will be used to formulate the citation, so consider the prominence of the role.
Value
The Creator attribute as a character of length one is added to
x
.
See Also
Other bibliographic helper functions: contributor(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
creator(orange_df)
# To change author:
creator(orange_df) <- person("Jane", "Doe")
# To add author:
creator(orange_df, overwrite = FALSE) <- person("John", "Doe")
Create a new dataset_df
object
Description
The dataset_df()
constructor creates semantically rich modern data frames.
These inherit from tibble::tibble
and carry structured metadata using
attributes.
Usage
dataset_df(
...,
identifier = c(obs = "http://example.com/dataset#obs"),
var_labels = NULL,
units = NULL,
concepts = NULL,
dataset_bibentry = NULL,
dataset_subject = NULL
)
as_dataset_df(
df,
identifier = c(obs = "http://example.com/dataset#obs"),
var_labels = NULL,
units = NULL,
concepts = NULL,
dataset_bibentry = NULL,
dataset_subject = NULL,
...
)
is.dataset_df(x)
## S3 method for class 'dataset_df'
print(x, ...)
is_dataset_df(x)
Arguments
... |
Vectors (columns) that should be included in the dataset. |
identifier |
A named vector of one or more URI prefixes for row IDs.
Defaults to |
var_labels |
A named list of human-readable labels for each variable. |
units |
A named list of measurement units for measured variables. |
concepts |
A named list of linked concepts (URIs) for variables or dimensions. |
dataset_bibentry |
A bibliographic metadata record for the dataset,
created using |
dataset_subject |
A subject descriptor created with |
df |
A |
x |
A |
Details
Use is.dataset_df()
to check class membership.
S3 methods for dataset_df
include:
- print() to display the dataset with metadata
- summary() to summarize both data and metadata
For full details, see vignette("dataset_df", package = "dataset")
.
Value
A dataset_df object: a tibble with attached metadata stored in attributes.
is.dataset_df returns a logical value: TRUE if the object is of class dataset_df, otherwise FALSE.
Note
A simple, serverless scaffolding for publishing dataset_df
objects
on the web (with HTML + RDF exports) is available at
https://github.com/dataobservatory-eu/dataset-template.
See Also
defined()
, dublincore()
, datacite()
, subject()
Examples
my_dataset <- dataset_df(
country_name = defined(
c("AD", "LI"),
concept = "http://data.europa.eu/bna/c_6c2bb82d",
namespace = "https://www.geonames.org/countries/$1/"
),
gdp = defined(
c(3897, 7365),
label = "Gross Domestic Product",
unit = "million dollars",
concept = "http://data.europa.eu/83i/aa/GDP"
),
identifier = c(
obs = "https://dataobservatory-eu.github.io/dataset-template#"
),
dataset_bibentry = dublincore(
title = "GDP of Andorra and Liechtenstein",
description = "A small but semantically rich dataset example.",
creator = person("Jane", "Doe", role = "cre"),
publisher = "Open Data Institute",
language = "en"
)
)
# Basic usage
print(my_dataset)
head(my_dataset)
summary(my_dataset)
# Metadata access
as_dublincore(my_dataset)
as_datacite(my_dataset)
# Export description as RDF triples
my_description <- describe(my_dataset, con = tempfile())
my_description
Get or set the technical format of a dataset
Description
Adds or retrieves the optional "format"
field of a dataset's bibentry.
This field is the dataset's technical/media type (e.g., a MIME type).
Usage
dataset_format(x)
dataset_format(x, overwrite = FALSE) <- value
Arguments
x |
A semantically rich data frame created with |
overwrite |
Logical. Replace an existing non‑default value? If |
value |
A length‑one character string specifying the format
(e.g., |
Details
The format field corresponds to
dct:format
in Dublin Core and to format
in
DataCite.
It is useful for indicating serialization such as "text/csv"
,
"application/parquet"
, or "application/r-rds"
.
If no format is set, this helper uses the package default
"application/r-rds"
.
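The get, set, and reset cycle can be tried on a copy of the bundled orange_df object. A short sketch, assuming the dataset package is installed; the copy df is illustrative and keeps the packaged object untouched.

```r
library(dataset)

# Work on a copy of the packaged example dataset
df <- orange_df

# Declare a CSV serialisation
dataset_format(df) <- "text/csv"
dataset_format(df)

# Assigning NULL is documented to reset the field to the package default
df_reset <- df
dataset_format(df_reset) <- NULL
dataset_format(df_reset)
```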
Value
The "format"
(technical format) as a character string (length 1).
When assigning, the updated object x
is returned invisibly.
See Also
Other bibliographic helper functions: contributor(), creator(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
dataset_format(orange_df) <- "text/csv"
dataset_format(orange_df)
# Reset to the package default
dataset_format(orange_df) <- NULL
Get or Set the Title of a Dataset
Description
Retrieve or assign the main title of a dataset, typically used as the primary label in metadata exports (e.g., DataCite or Dublin Core).
Usage
dataset_title(x)
dataset_title(x, overwrite = FALSE) <- value
Arguments
x |
A dataset object created by |
overwrite |
Logical. If |
value |
A character string representing the new title. If |
Details
According to the Dublin Core specification for title
,
the title represents the name by which the resource is formally known.
The DataCite metadata schema supports multiple titles (e.g., translated, alternative), but this function currently supports only a single main title.
Value
dataset_title() returns the current dataset title as a character string. dataset_title<-() returns the updated dataset object invisibly.
See Also
Other bibliographic helper functions: contributor(), creator(), dataset_format(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
dataset_title(orange_df)
# Set a new title with overwrite = TRUE
dataset_title(orange_df, overwrite = TRUE) <- "The Growth of Orange Trees"
dataset_title(orange_df)
Dataset to triples (three columns or N-Triples)
Description
Converts a dataset to RDF-style triples with subject, predicate, and object columns. Supports semantic expansion via variable metadata.
Usage
dataset_to_triples(x, idcol = NULL, expand_uri = TRUE, format = "data.frame")
Arguments
x |
A |
idcol |
Name or index of the subject column. If NULL, defaults to
|
expand_uri |
Logical; if TRUE, expands URIs using namespaces and definitions. |
format |
Output format: |
Details
For publishing examples, a minimal serverless scaffold is provided at https://github.com/dataobservatory-eu/dataset-template, which shows how to host CSV + RDF serialisations on GitHub Pages without any server setup.
Value
Either a data.frame
with columns s
, p
, and o
, or a character
vector of N-Triple lines.
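To make the relation between the two output shapes concrete, the base R sketch below turns a three-column s, p, o table into N-Triples lines. The helper to_ntriples() is hypothetical and is not the package's serialiser: it treats any object starting with http:// or https:// as an IRI, types everything else as xsd:string, and does not escape quotes inside literals.

```r
# Hypothetical helper, not part of the 'dataset' package
to_ntriples <- function(triples) {
  is_uri <- grepl("^https?://", triples$o)
  object <- ifelse(
    is_uri,
    sprintf("<%s>", triples$o),
    sprintf('"%s"^^<http://www.w3.org/2001/XMLSchema#string>', triples$o)
  )
  # One statement per row, terminated by " ."
  sprintf("<%s> <%s> %s .", triples$s, triples$p, object)
}

triples <- data.frame(
  s = c(
    "http://example.com/dataset#obs1",
    "http://example.com/dataset#obs1"
  ),
  p = c(
    "http://example.com/prop/geo",
    "http://www.w3.org/2000/01/rdf-schema#label"
  ),
  o = c(
    "https://dd.eionet.europa.eu/vocabulary/eurostat/geo/AD",
    "Andorra"
  ),
  stringsAsFactors = FALSE
)

nt <- to_ntriples(triples)
writeLines(nt)
```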
Note
A simple, serverless scaffolding for publishing dataset_df
objects
on the web (with HTML + RDF exports) is available at
https://github.com/dataobservatory-eu/dataset-template.
Examples
# A minimal example with just rowid and geo
data("gdp", package = "dataset")
small_geo <- dataset_df(
geo = defined(
gdp$geo[1:3],
label = "Geopolitical entity",
concept = "http://example.com/prop/geo",
namespace = "https://dd.eionet.europa.eu/vocabulary/eurostat/geo/$1"
)
)
# View as triple table
dataset_to_triples(small_geo)
# View as N-Triples
dataset_to_triples(small_geo, format = "nt")
Build default provenance bundle
Description
Construct a small PROV bundle (as N‑Triples) describing the dataset, the software agent, and an optional creation time.
Usage
default_provenance(
dataset_id = "http://example.com/dataset#",
author = NULL,
dtm = NULL,
generated_at_time = NULL
)
Arguments
dataset_id |
Base IRI for the dataset (used as the |
author |
Optional creator/author agent. |
dtm |
Optional data team/maintainer agent. |
generated_at_time |
Optional POSIXct time; defaults to |
Details
This helper is used internally to seed provenance metadata. It emits a set of
PROV statements including an Entity
for the dataset, an Activity
for
creation, and SoftwareAgent
entries for the package citation.
Value
A character vector of N‑Triples suitable for the "prov"
attribute.
Create a semantically well-defined, labelled vector
Description
defined()
constructs a vector enriched with semantic metadata such as a
label, unit of measurement, concept URI, and optional namespace.
These vectors behave like base R vectors but retain metadata during
subsetting, comparison, and printing.
Usage
defined(
x,
labels = NULL,
label = NULL,
unit = NULL,
concept = NULL,
namespace = NULL,
...
)
is.defined(x)
## S3 method for class 'haven_labelled_defined'
summary(object, ...)
Arguments
x |
A vector of type character, numeric, Date, factor, or a |
labels |
An optional named vector of value labels. Only a subset of values may be labelled. |
label |
A short human-readable label (string of length 1). |
unit |
Unit of measurement (e.g., "kg", "hours"). Must be a string of
length 1 or |
concept |
A URI or concept name representing the meaning of the variable. |
namespace |
Optional string or named character vector, used for value-level URI expansion. |
... |
Reserved for future use. |
object |
An R object to be summarised. |
Details
The resulting object inherits from haven::labelled()
and integrates with
tidyverse workflows, enabling downstream conversion to RDF and other
standards.
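Metadata retention under subsetting can be verified directly. A minimal sketch, assuming the dataset package is installed and that the unit is stored in the "unit" attribute, as it is for the output of as_numeric(); the object names are illustrative.

```r
library(dataset)

gdp_vector <- defined(
  c(3897, 7365, 6753),
  label = "Gross Domestic Product",
  unit = "million dollars",
  concept = "http://data.europa.eu/83i/aa/GDP"
)

# Take the first two observations
first_two <- gdp_vector[1:2]

# The subset is documented to keep its class and metadata
is.defined(first_two)
attr(first_two, "unit")
```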
Value
A vector of class "defined" (technically haven_labelled_defined), which behaves like a standard vector with additional semantic metadata and inherits from haven::labelled().
See Also
browseVignettes("dataset")
is.defined()
, as_numeric()
, as_character()
, as_factor()
,
strip_defined()
Examples
gdp_vector <- defined(
c(3897, 7365, 6753),
label = "Gross Domestic Product",
unit = "million dollars",
concept = "http://data.europa.eu/83i/aa/GDP"
)
# To check the S3 class of the vector:
is.defined(gdp_vector)
# To print the defined vector:
print(gdp_vector)
# To summarise the defined vector:
summary(gdp_vector)
# Subsetting works as expected:
gdp_vector[1:2]
Describe a dataset in N-Triples format
Description
Writes provenance and Dublin Core metadata of a dataset to a file or connection in N-Triples format.
Usage
describe(x, con)
Arguments
x |
A |
con |
A connection or a character string path (e.g. from |
Value
Writes N-Triples to con
and invisibly returns x
.
Examples
test_ds <- dataset_df(
rowid = defined(c("eg:1", "eg:2"),
namespace = "http://example.com/dataset#"
),
geo = defined(
gdp$geo[1:2],
label = "Country",
concept = "http://example.com/prop/geo",
namespace = "https://eionet.europa.eu/geo/$1"
),
dataset_bibentry = dublincore(
title = "Example Dataset",
creator = person("John", "Doe")
)
)
# returns invisibly the contents of the text file serialisation:
testdescription <- describe(test_ds, con = tempfile())
testdescription
Get or set the dataset Description
Description
Get or set the optional Description
property as an attribute
on a dataset object.
Usage
description(x)
description(x, overwrite = FALSE) <- value
Arguments
x |
A dataset object created with |
overwrite |
Logical. If |
value |
The new description, as a character string. |
Details
The Description
is recommended for discovery in DataCite. It
captures additional information that does not fit other metadata categories
— such as technical notes or dataset usage. It is a free-text field. See
dct:description.
Value
The Description
attribute as a character vector of length 1.
See Also
Other bibliographic helper functions: contributor(), creator(), dataset_format(), dataset_title(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
description(orange_df)
description(orange_df, overwrite = TRUE) <- "This dataset records orange tree growth."
description(orange_df)
Internal: Expand multi-valued DC fields to RDF triples
Description
Converts scalar or vector fields into RDF triples.
Usage
expand_triples(dataset_id, predicate_uri, values)
Arguments
dataset_id |
The subject URI |
predicate_uri |
The RDF predicate URI |
values |
A scalar, character vector, or list (e.g., person objects) |
Value
A character vector of RDF triples
Format contributors into a citation string
Description
Format a list of utils::person
objects into a compact string, merging roles
per person and normalizing names. Contributors without explicit roles are
assigned "ctb"
. If NULL
or ":unas"
is supplied, returns ":unas"
.
Usage
fix_contributor(contributors = NULL)
Arguments
contributors |
A vector or list of |
Value
A single character string, e.g. "{Jane Doe [dtm, ctb]} and {John Smith [ctb]}"
.
A Small GDP Dataset
Description
A compact sample of GDP and main aggregates from Eurostat's annual international cooperation dataset. This data subset contains illustrative records for select countries and time periods.
Usage
gdp
Format
A data frame with 10 rows and 5 variables:
- geo: Country name (character)
- year: Reference year (integer)
- gdp: Gross Domestic Product value (numeric)
- unit: Unit of measurement, e.g., "Million EUR" (character)
- freq: Observation frequency, e.g., "Annual" (character)
Details
This dataset is intended for examples, tests, and demonstration purposes. It reflects simplified GDP data as published by Eurostat. The actual Eurostat dataset includes more countries, breakdowns, and metadata.
Source
Eurostat (2021). GDP and main aggregates - international data cooperation (annual data). doi:10.2908/NAIDA_10_GDP
Examples
head(gdp)
Get or Set the Geolocation of a Dataset Object
Description
Access or assign the optional geolocation
attribute to a semantically rich
dataset object.
Usage
geolocation(x)
geolocation(x, overwrite = TRUE) <- value
Arguments
x |
A dataset object created by |
overwrite |
Logical. If |
value |
A character string specifying the |
Details
The geolocation field describes the spatial region or named place where
the data was collected or that the dataset is about. This field is
recommended for data discovery in DataCite Metadata Schema 4.4.
See: DataCite: Geolocation Guidance
Value
A character string of length 1, representing the geolocation attribute attached to x.
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), get_bibentry(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
# Work on a copy so the packaged dataset stays unchanged
orange_dataset <- orange_df
geolocation(orange_dataset) <- "US"
geolocation(orange_dataset)
geolocation(orange_dataset, overwrite = FALSE) <- "GB"
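The overwrite guard used by this and the other replacement functions can be sketched in base R. This is only an illustration: `geo_label<-` is a hypothetical name, and keeping the existing value with a message when `overwrite = FALSE` is an assumption about the behaviour, not a description of the package's code.

```r
# Hypothetical replacement function; the behaviour for overwrite = FALSE
# (keep the stored value and emit a message) is an assumption.
`geo_label<-` <- function(x, overwrite = TRUE, value) {
  current <- attr(x, "geolocation", exact = TRUE)
  if (!is.null(current) && !overwrite) {
    message("Keeping existing geolocation: ", current)
    return(x)
  }
  attr(x, "geolocation") <- value
  x
}

df <- data.frame(a = 1)
geo_label(df) <- "US"                     # no value yet, so it is stored
geo_label(df, overwrite = FALSE) <- "GB"  # existing value is protected
stored <- attr(df, "geolocation")
print(stored)
```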
Get or set the bibentry
Description
Retrieve or replace the bibliographic entry stored in a dataset's attributes.
The entry is a utils::bibentry used to hold citation metadata for
dataset_df() objects.
Usage
get_bibentry(dataset)
set_bibentry(dataset) <- value
Arguments
dataset |
A dataset created with |
value |
A |
Details
New datasets are initialized with reasonable defaults. To build a new
bibentry with sensible defaults and field names, use datacite() (DataCite)
or dublincore() (Dublin Core), then assign it with
set_bibentry(dataset) <- value.
See the vignette for more background:
vignette("bibentry", package = "dataset").
Value
- get_bibentry(dataset) returns the utils::bibentry stored in dataset's attributes.
- set_bibentry(dataset) <- value sets the attribute and returns the modified dataset invisibly.
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), language, publication_year(), publisher(), relation(), rights(), subject()
Examples
# Get the bibentry of a dataset_df object:
be <- get_bibentry(orange_df)
# Create a well-formed bibentry (DataCite-style):
be2 <- datacite(
Creator = person("Jane", "Doe"),
Title = "The Orange Trees Dataset",
Publisher = "MyOrg"
)
# Assign the new bibentry:
set_bibentry(orange_df) <- be2
# Inspect in different notations:
as_datacite(orange_df, type = "list")
as_dublincore(orange_df, type = "list")
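The idea of carrying a citation record as an attribute can be tried with base R alone. The sketch below uses only utils::bibentry(); the attribute name "dataset_bibentry" is an assumption made for illustration and may not match the package's internal storage.

```r
# Build a minimal citation record with base R only
be <- utils::bibentry(
  bibtype = "Misc",
  title = "The Orange Trees Dataset",
  author = utils::person("Jane", "Doe"),
  publisher = "MyOrg",
  year = "2024"
)

# Attach it to a plain data frame; the attribute name is an assumption
df <- data.frame(x = 1:3)
attr(df, "dataset_bibentry") <- be

# Read it back and access a single field
stored <- attr(df, "dataset_bibentry")
print(stored$title)
```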
Get concepts for all variables in a dataset_df
Description
Returns a named list of concept URIs (or NULLs) for all variables.
Usage
get_variable_concepts(x)
Arguments
x |
A |
Value
A named list of concept URIs for each variable.
Examples
get_variable_concepts(orange_df)
Add Identifier to First Column of a Dataset
Description
Adds a prefixed identifier (e.g., eg:) to the first column of a dataset,
useful for generating semantic row IDs (e.g., for RDF serialization).
Usage
id_to_column(x, prefix = "eg:", ids = NULL)
Arguments
x |
A dataset created with |
prefix |
A character string used as the prefix for row identifiers.
Defaults to |
ids |
Optional. A character vector of custom IDs to use instead of row names. |
Value
A dataset of the same class as x, with the first column updated to include
unique prefixed identifiers.
Examples
# Example with a dataset_df object:
id_to_column(orange_df)
# Example with a regular data.frame:
id_to_column(Orange, prefix = "orange:")
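For a plain data frame, prefixed row identifiers can be built in a few lines of base R. `add_row_ids()` is a hypothetical helper and the values are illustrative; prepending a new rowid column, rather than rewriting an existing first column, is a simplifying assumption.

```r
# Hypothetical helper; not the package's id_to_column() implementation.
add_row_ids <- function(df, prefix = "eg:", ids = NULL) {
  # Fall back to the row names when no custom IDs are supplied
  if (is.null(ids)) ids <- row.names(df)
  stopifnot(length(ids) == nrow(df), anyDuplicated(ids) == 0)
  cbind(
    data.frame(rowid = paste0(prefix, ids), stringsAsFactors = FALSE),
    df
  )
}

trees <- data.frame(age = c(118, 484, 664), circumference = c(30, 58, 87))
out <- add_row_ids(trees, prefix = "orange:")
print(out)
```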
Get or Set the Identifier of a Dataset or Metadata Record
Description
Retrieve or assign the identifier attribute of a dataset or
bibliographic metadata object.
Usage
identifier(x)
identifier(x, overwrite = TRUE) <- value
Arguments
x |
A |
overwrite |
Logical. If |
value |
A character string giving the identifier. Can be named (e.g.,
|
Details
An identifier provides an unambiguous reference to a resource. Recommended practice is to supply a persistent identifier string, such as a DOI, ISBN, or URN, that conforms to a recognized identification system.
Both Dublin Core and DataCite 4.4 define identifier as a core property.
If the identifier is a DOI, it will also be stored in the doi field of the
metadata record.
Although identifier is not part of the minimal Dublin Core term set, it is
always included in dataset metadata for compatibility with publishing and
indexing systems. You may omit it if working under a strict DC profile.
For best practice in choosing identifier schemes, see the IANA-registered URI schemes.
Value
For identifier(), the current identifier as a character string. For
identifier<-(), the updated object (invisible).
Examples
orange_copy <- orange_df
# Get the current identifier
identifier(orange_copy)
# Set a new identifier (e.g., a DOI)
identifier(orange_copy) <- "https://doi.org/10.9999/example.doi"
# Prevent accidental overwrite
identifier(orange_copy, overwrite = FALSE) <- "https://example.org/id"
# Use numeric and NULL values
identifier(orange_copy) <- 12345
identifier(orange_copy) <- NULL # Sets ":unas"
Set the Primary Language of a Dataset
Description
Assign the primary language of a semantically rich dataset object using an
ISO 639 language code or full language name. This sets the language attribute
in the dataset's metadata.
Usage
language(x)
language(x, iso_639_code = "639-3") <- value
Arguments
x |
A dataset object created by |
iso_639_code |
A character string indicating the desired return format:
either |
value |
A 2-letter or 3-letter language code (ISO 639-1 or ISO 639-2), or a full language name (case-insensitive). |
Details
This function supports recognition of:
- 2-letter codes (ISO 639-1, e.g., "en", "fr")
- 3-letter codes from both:
  - Alpha_3_B (bibliographic, e.g., "fre")
  - Alpha_3_T (terminologic, e.g., "fra")
- Full language names (e.g., "English", "French")
For compatibility with open science repositories and modern metadata
standards, this function returns the terminologic code (Alpha_3_T)
when available. If Alpha_3_T is missing for a language, the legacy
bibliographic code (Alpha_3_B) is used as a fallback.
Full language names (e.g., "English", "Spanish") are matched
case-insensitively against the ISO 639-2 Name field. Exact matches are
attempted first; if none are found, a prefix match is used. For example:
- "English" returns "eng"
- "English, Old" returns "ang"
This means that:
- Both "fra" (terminologic) and "fre" (bibliographic) will be accepted as valid input for French
- The resulting value stored and returned will be "fra"
This behaviour aligns with:
- Common repository practices (Zenodo, OSF, Figshare)
If value is NULL, the language is marked as ":unas" (unspecified).
In some cases, especially for historical or moribund languages, multiple
similar names may exist. In such cases, it is safer to use a specific
language code (e.g., "ang" instead of "English, Old" and "enm"
for "English, Middle (1100-1500)"). You can also
refer directly to the definitions in ISOcodes::ISO_639_2 for clarity.
Value
The dataset with an updated language attribute, typically an ISO
639-2/T code (Alpha_3_T) such as "fra", "eng", "spa", etc.
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), publication_year(), publisher(), relation(), rights(), subject()
Examples
df <- dataset_df(data.frame(x = 1:3))
language(df) <- "English" # Returns "eng"
language(df) <- "fre" # Legacy code; returns "fra"
language(df) <- "fra" # Returns "fra"
language(df, iso_639_code = "639-1") <- "fra" # Returns "fr"
language(df) <- NULL # Sets ":unas"
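The matching order described in Details (codes first, then exact name, then prefix, with Alpha_3_T preferred over Alpha_3_B) can be sketched against a tiny lookup table. The table rows are simplified and illustrative, not copied from ISOcodes::ISO_639_2, and `resolve_language()` is a hypothetical helper rather than the package's code.

```r
# Illustrative, simplified rows; the missing Alpha_3_T in the third row is
# made up to demonstrate the fallback to Alpha_3_B.
iso_tbl <- data.frame(
  Alpha_3_B = c("eng", "fre", "ang"),
  Alpha_3_T = c("eng", "fra", NA),
  Alpha_2   = c("en", "fr", NA),
  Name      = c("English", "French", "English, Old (ca. 450-1100)"),
  stringsAsFactors = FALSE
)

resolve_language <- function(value, tbl = iso_tbl) {
  if (is.null(value)) return(":unas")
  v <- tolower(value)
  # 1. Codes: accept 2-letter, bibliographic and terminologic forms
  hit <- which(tbl$Alpha_2 == v | tbl$Alpha_3_B == v | tbl$Alpha_3_T == v)
  # 2. Names: exact match first, then prefix match
  if (length(hit) == 0) hit <- which(tolower(tbl$Name) == v)
  if (length(hit) == 0) hit <- which(startsWith(tolower(tbl$Name), v))
  if (length(hit) == 0) stop("Unrecognised language: ", value)
  row <- tbl[hit[1], ]
  # Prefer the terminologic code; fall back to the bibliographic one
  if (!is.na(row$Alpha_3_T)) row$Alpha_3_T else row$Alpha_3_B
}

print(resolve_language("fre"))
print(resolve_language("English"))
print(resolve_language("English, Old"))
```

Because exact name matches are tried before prefix matches, "English" resolves to the first row and never reaches the "English, Old" entry.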
Map R person roles to schema.org-style roles
Description
Map R person roles to schema.org-style roles
Usage
map_role_to_schema(role)
Arguments
role |
A character vector of roles (e.g. |
Value
A character vector with schema.org-style roles.
Create an N-Triple
Description
Create a single N-Triple triple.
Usage
n_triple(s, p, o)
Arguments
s |
The subject of a triple. |
p |
The predicate of a triple. |
o |
The object of a triple. |
Details
N-Triples is an easy-to-parse, line-based subset of Turtle for serializing
RDF. An N-Triple triple is a sequence of RDF terms representing the subject,
predicate and object of an RDF triple. Use n_triples() to serialize
multiple statements.
Value
A character vector containing one N-Triple string.
Examples
s <- "http://example.org/show/218"
p <- "http://www.w3.org/2000/01/rdf-schema#label"
o <- "That Seventies Show"
n_triple(s, p, o)
Create N-Triples
Description
Create RDF triple statements to annotate your dataset with standard, interoperable metadata.
Usage
n_triples(triples)
Arguments
triples |
A character vector of concatenated N-Triples, created with
|
Details
N-Triples is a line-based serialization format for RDF. It is easy to parse and widely supported. For details, see the W3C RDF 1.2 N-Triples specification.
Value
A character vector of unique N-Triple strings.
Examples
triple_1 <- n_triple(
"http://example.org/show/218",
"http://www.w3.org/2000/01/rdf-schema#label",
"That Seventies Show"
)
triple_2 <- n_triple(
"http://example.org/show/218",
"http://example.org/show/localName",
'"Cette Série des Années Septante"@fr-be'
)
n_triples(c(triple_1, triple_2, triple_1))
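To make the line-based format concrete, here is a minimal base R sketch of the serialization and the deduplication step. `to_term()` and `make_triple()` are hypothetical helpers, and the output format is a simplification: datatype annotations, language tags, and escaping of quotes or backslashes in literals are all left out, so the strings may differ from what n_triple() returns.

```r
# Hypothetical helpers for illustration only.
to_term <- function(x) {
  # Strings that look like absolute IRIs become <...> resources; anything
  # else is written as a plain quoted literal (no escaping is attempted)
  if (grepl("^[A-Za-z][A-Za-z0-9+.-]*://", x)) {
    sprintf("<%s>", x)
  } else {
    sprintf("\"%s\"", x)
  }
}

make_triple <- function(s, p, o) {
  # One statement per line: subject, predicate, object, terminating dot
  paste(to_term(s), to_term(p), to_term(o), ".")
}

t1 <- make_triple(
  "http://example.org/show/218",
  "http://www.w3.org/2000/01/rdf-schema#label",
  "That Seventies Show"
)
t2 <- make_triple(
  "http://example.org/show/218",
  "http://example.org/show/localName",
  "That 70s Show"
)

# Repeated statements collapse to a single line
statements <- unique(c(t1, t2, t1))
writeLines(statements)
```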
Growth of Orange Trees
Description
A dataset recording the growth of orange trees, replicated from the classic
datasets::Orange dataset and implemented as a dataset_df S3 class with
enhanced semantic metadata.
Usage
orange_df
Format
A data frame with 35 rows and 4 variables:
- rowid: A unique identifier for each row (character)
- tree: Tree identifier (ordered factor)
- age: Age of the tree in days (numeric)
- circumference: Trunk circumference in mm (numeric)
Details
This is a semantically enriched version of the classic Orange dataset,
constructed using the dataset_df() and dublincore() constructors.
Each column includes semantic metadata such as units, labels, concepts,
or namespace identifiers. The dataset also embeds a machine-readable citation
for reproducibility and provenance tracking.
Constructor Example
orange_bibentry <- dublincore(
  title = "Growth of Orange Trees",
  creator = c(
    person(
      given = "N.R.",
      family = "Draper",
      role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(
      given = "H",
      family = "Smith",
      role = "cre"
    )
  ),
  contributor = person(
    given = "Antal",
    family = "Daniel",
    role = "dtm"
  ),
  publisher = "Wiley",
  datasource = "https://isbnsearch.org/isbn/9780471170822",
  dataset_date = 1998,
  identifier = "https://doi.org/10.5281/zenodo.14917851",
  language = "en",
  description = "The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees."
)
orange_df <- dataset_df(
  rowid = defined(paste0("orange:", row.names(Orange)),
    label = "ID in the Orange dataset",
    namespace = c("orange" = "datasets::Orange")
  ),
  tree = defined(Orange$Tree,
    label = "The number of the tree"
  ),
  age = defined(Orange$age,
    label = "The age of the tree",
    unit = "days since 1968/12/31"
  ),
  circumference = defined(Orange$circumference,
    label = "circumference at breast height",
    unit = "milimeter",
    concept = "https://www.wikidata.org/wiki/Property:P2043"
  ),
  dataset_bibentry = orange_bibentry
)
orange_df$rowid <- defined(orange_df$rowid,
  namespace = "https://doi.org/10.5281/zenodo.14917851"
)
References
Draper, N. R. & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley.
Pinheiro, J. C. & Bates, D. M. (2000). Mixed-effects Models in S and S-PLUS. Springer.
Becker, R. A., Chambers, J. M. & Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.
Examples
# Print with semantic citation and data preview
print(orange_df)
# Access semantic metadata associated with variables
print(orange_df$age)
# Retrieve the embedded bibliographic record
as_dublincore(orange_df)
Get or update provenance information
Description
Retrieve or append provenance statements (in N-Triples form) stored on a
dataset_df() object.
Usage
provenance(x)
provenance(x) <- value
Arguments
x |
A dataset created with |
value |
Character vector of N‑Triples created by |
Details
Provenance is stored in the "prov" attribute as N-Triples text. Use
n_triple() or n_triples() to construct valid statements that follow
PROV-O (e.g., prov:wasGeneratedBy, prov:wasInformedBy).
Value
- provenance(x) returns the contents of the "prov" attribute (character vector of N-Triples), or NULL if none is set.
- provenance(x) <- value appends value to the "prov" attribute and returns the modified dataset invisibly.
Examples
provenance(orange_df)
# Add a provenance statement:
provenance(orange_df) <- n_triple(
"https://doi.org/10.5281/zenodo.10396807",
"http://www.w3.org/ns/prov#wasInformedBy",
"http://example.com/source#1"
)
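The append semantics of the setter can be illustrated with a plain attribute. `prov_statements<-` is a hypothetical replacement function; only the attribute name "prov" comes from the text above, and whether the package also removes duplicate statements is not covered here.

```r
# Hypothetical replacement function that appends instead of overwriting.
`prov_statements<-` <- function(x, value) {
  stopifnot(is.character(value))
  attr(x, "prov") <- c(attr(x, "prov", exact = TRUE), value)
  x
}

df <- data.frame(x = 1:3)
prov_statements(df) <- paste(
  "<https://doi.org/10.5281/zenodo.10396807>",
  "<http://www.w3.org/ns/prov#wasInformedBy>",
  "<http://example.com/source#1> ."
)
prov_statements(df) <- paste(
  "<https://doi.org/10.5281/zenodo.10396807>",
  "<http://www.w3.org/ns/prov#wasGeneratedBy>",
  "<http://example.com/activity#1> ."
)

# Both statements are kept, in the order they were added
prov <- attr(df, "prov")
writeLines(prov)
```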
Get or Set the Publication Year of a Dataset Object
Description
Access or assign the optional publication_year attribute to a semantically
rich dataset object.
Usage
publication_year(x)
publication_year(x, overwrite = TRUE) <- value
Arguments
x |
A dataset object created by |
overwrite |
Logical. If |
value |
A character string specifying the publication year. |
Details
The publication_year represents the year when the dataset was or will be
made publicly available, in YYYY format. For additional context, see
DataCite: Publication Year-Additional Guidance.
Value
The publication_year attribute as a character string.
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publisher(), relation(), rights(), subject()
Examples
publication_year(orange_df)
publication_year(orange_df) <- "1998"
Get or Set the Publisher of a Dataset Object
Description
The publisher is the entity responsible for holding, archiving, releasing, or distributing the resource. It is typically included in dataset citation metadata.
For software, this might refer to a code repository (e.g., GitHub). If both
a hosting platform and a producing institution are involved, use the
publisher for the institution and contributor() with
contributorType = "hostingInstitution" for the platform.
Usage
publisher(x)
publisher(x, overwrite = TRUE) <- value
Arguments
x |
A dataset object created with |
overwrite |
Logical. Should existing publisher metadata be overwritten?
Defaults to |
value |
A character string specifying the publisher. |
Details
Adds or retrieves the optional "publisher" attribute for a dataset object.
This property aligns with dct:publisher (Dublin Core) and publisher (DataCite).
Value
A character string of length one containing the "publisher" attribute.
When assigning, the updated object x is returned invisibly.
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), relation(), rights(), subject()
Examples
publisher(orange_df) <- "Wiley"
publisher(orange_df)
Add or retrieve related items (DataCite/Dublin Core)
Description
Manage related resources for a dataset using a unified accessor.
- For DataCite 4.x, this maps to relatedIdentifier (+ type & relation).
- For Dublin Core, this maps to dct:relation (string).
Usage
relation(x)
relation(x) <- value
related_create(
relatedIdentifier,
relationType,
relatedIdentifierType,
resourceTypeGeneral = NULL
)
is.related(x)
related_item(x)
related_item(x) <- value
Arguments
x |
A dataset object created with |
value |
A |
relatedIdentifier |
A string with the identifier of the related resource. |
relationType |
A string naming the relation type (per DataCite vocabulary). |
relatedIdentifierType |
A string naming the identifier type ( |
resourceTypeGeneral |
Optional: a string naming the general type of the related resource. |
Details
To remain compatible with utils::bibentry(), the bibentry stores only the
string identifier (e.g., DOI/URL). The full structured object created by
related_create() is preserved in the "relation" attribute.
A "related" object is a small S3 list with the following elements:
- relatedIdentifier: the related resource identifier (DOI, URL, etc.)
- relationType: the DataCite relation type (e.g., "IsPartOf", "References")
- relatedIdentifierType: the type of identifier ("DOI", "URL", etc.)
- resourceTypeGeneral: optional, the general type of the related resource (e.g., "Text", "Dataset")
Value
- relation(x) returns:
  - a single structured "related" object (from related_create()) if only one relation is present,
  - a list of "related" objects if multiple relations are present,
  - otherwise it falls back to the bibentry field (relatedidentifier for DataCite or relation for Dublin Core).
- relation(x) <- value sets the "relation" attribute (structured object or list of objects) and the bibentry string fields (relatedidentifier and relation), and returns the dataset invisibly.
- related_create() constructs a structured "related" object.
- is.related(x) returns TRUE if x inherits from class "related".
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), rights(), subject()
Examples
df <- dataset_df(data.frame(x = 1))
relation(df) <- related_create(
relatedIdentifier = "10.1234/example",
relationType = "IsPartOf",
relatedIdentifierType = "DOI"
)
relation(df) # structured object
get_bibentry(df)$relation # "10.1234/example"
get_bibentry(df)$relatedidentifier # "10.1234/example"
# Character input is normalized to a DOI/URL with default types
relation(df) <- "https://doi.org/10.5678/xyz"
relation(df) # structured object (relationType/Type filled with defaults)
# Create related object directly
rel <- related_create("https://doi.org/10.5678/xyz", "References", "DOI")
is.related(rel) # TRUE
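The two-level storage described in Details (a flat string for the bibentry, a structured object for the attribute) can be mimicked with a small S3 list. `new_related()` is a hypothetical constructor that follows the element names listed above; it is a sketch, not the package's related_create().

```r
# Hypothetical constructor mirroring the documented element names.
new_related <- function(relatedIdentifier, relationType,
                        relatedIdentifierType, resourceTypeGeneral = NULL) {
  stopifnot(is.character(relatedIdentifier), length(relatedIdentifier) == 1)
  structure(
    list(
      relatedIdentifier = relatedIdentifier,
      relationType = relationType,
      relatedIdentifierType = relatedIdentifierType,
      resourceTypeGeneral = resourceTypeGeneral
    ),
    class = "related"
  )
}

rel <- new_related("10.1234/example", "IsPartOf", "DOI")

# The structured object keeps every subproperty ...
is_structured <- inherits(rel, "related")
# ... while only the identifier string fits into a flat citation field
flat <- rel$relatedIdentifier
print(flat)
```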
Get or Set the Rights of a Dataset Object
Description
Adds or retrieves the optional "rights" attribute of a dataset object.
This field contains information about intellectual property or usage rights.
Usage
rights(x)
rights(x, overwrite = FALSE) <- value
Arguments
x |
A semantically rich data frame created with |
overwrite |
Logical. Should the existing value be replaced? If |
value |
A character string specifying the rights (e.g., |
Details
The "rights" field corresponds to dct:rights from Dublin Core, and to rights in DataCite.
Rights information typically includes statements about legal ownership, licensing, or usage conditions. It helps ensure that users understand how a dataset may be reused, cited, or shared.
Value
The "rights" attribute of the dataset as a character string (length 1).
When assigning, the updated object x is returned invisibly.
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), subject()
Examples
rights(orange_df) <- "CC-BY-SA"
rights(orange_df)
Strip the class from a defined vector
Description
Converts a defined vector to a base R numeric or character,
retaining metadata as passive attributes.
Usage
strip_defined(x)
Arguments
x |
A |
Value
A base R vector with attributes (label, unit, etc.) intact.
Examples
gdp <- defined(c(3897L, 7365L), label = "GDP", unit = "million dollars")
strip_defined(gdp)
fruits <- defined(c("apple", "avocado", "kiwi"),
label = "Fruit", unit = "kg"
)
strip_defined(fruits)
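Dropping a class while keeping the metadata as passive attributes is something base R can demonstrate on its own. The class name "toy_defined" below is a stand-in invented for this sketch; it is not the package's class and carries no methods.

```r
# A toy classed vector with metadata attributes (stand-in for a defined vector)
gdp <- structure(
  c(3897L, 7365L),
  label = "GDP",
  unit = "million dollars",
  class = "toy_defined"
)

# unclass() removes only the class attribute; label and unit stay attached
stripped <- unclass(gdp)

print(inherits(stripped, "toy_defined"))
print(attributes(stripped))
```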
Create, add, or retrieve a subject
Description
Manage the subject metadata of a dataset. The subject can be stored as a
simple character term or as a structured object with subproperties created by
subject_create().
Usage
subject(x)
subject_create(
term,
schemeURI = NULL,
valueURI = NULL,
prefix = NULL,
subjectScheme = NULL,
classificationCode = NULL
)
subject(x) <- value
is.subject(x)
Arguments
x |
A dataset object created with |
term |
A subject term, for example |
schemeURI |
URI of the subject identifier scheme, for example
|
valueURI |
URI of the subject term, for example
|
prefix |
Abbreviated prefix for a scheme URI, for example |
subjectScheme |
Name of the subject scheme, classification code, or authority if one is used. This acts as a namespace. |
classificationCode |
Classification code for schemes that do not have
|
value |
A subject object created by |
Details
The subject property records what the dataset is about.
The DataCite subject property allows multiple subproperties, but these cannot
be stored directly in a standard utils::bibentry object.
Therefore:
- If you set a character string as the subject, it is stored in both the bibentry and the "subject" attribute.
- If you set a structured subject (via subject_create()), the $term value is stored in the bibentry, and the full object is stored in the "subject" attribute of the dataset_df object.
Value
- subject(x) returns:
  - a single "subject" object if only one is present,
  - a list of "subject" objects if multiple are present,
  - otherwise falls back to the plain string from the bibentry.
- subject(x) <- value accepts a character vector, a "subject" object, or a list of "subject" objects, and updates both the bibentry slot and the "subject" attribute. Returns the dataset invisibly.
- subject_create() returns a structured "subject" object, or a list of them if multiple terms are provided.
- is.subject(x) returns TRUE if x inherits from class "subject".
See Also
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(), get_bibentry(), language, publication_year(), publisher(), relation(), rights()
Examples
# Set a structured subject
subject(orange_df) <- subject_create(
term = "Oranges",
schemeURI = "http://id.loc.gov/authorities/subjects",
valueURI = "http://id.loc.gov/authorities/subjects/sh85095257",
subjectScheme = "LCCH",
prefix = "lcch:"
)
# Retrieve subject with subproperties
subject(orange_df)
Internal: Generate RDF triples for a single column
Description
Create subject-predicate-object triples from one column of a dataset
Usage
triples_column_generate(s_vec, col, colname)
Arguments
s_vec |
A character vector of subject URIs (length = number of rows) |
col |
The column vector (e.g., |
colname |
The name of the column (used as fallback for predicate) |
Value
A data.frame with columns s, p, o
Internal: Convert triple data.frame to N-Triples format
Description
Turns a data.frame with s, p, o columns into N-Triples strings.
Usage
triples_to_ntriples(df)
Arguments
df |
A data.frame with columns |
Value
A character vector of N-Triple lines.
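The two internal steps (one column to an s/p/o table, then the table to text lines) can be approximated in base R. `column_to_triples()` is a hypothetical stand-in, the URIs are made up, and every object is written as a plain quoted literal, which is a simplification of whatever typing the package applies.

```r
# Hypothetical stand-in for the column-to-triples step.
column_to_triples <- function(s_vec, col, predicate) {
  stopifnot(length(s_vec) == length(col))
  data.frame(
    s = s_vec,
    p = predicate,
    o = as.character(col),
    stringsAsFactors = FALSE
  )
}

subjects <- c("http://example.org/tree/1", "http://example.org/tree/2")
triples_df <- column_to_triples(subjects, c(118, 484), "http://example.org/prop/age")

# Vectorised formatting: one N-Triples-style line per row
nt_lines <- sprintf("<%s> <%s> \"%s\" .", triples_df$s, triples_df$p, triples_df$o)
writeLines(nt_lines)
```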
Get / set a concept definition for a vector or a dataset
Description
Assigns a concept URI to a vector created with defined(). This
method updates the concept attribute and validates that the input is a single
character string or NULL.
Usage
var_concept(x, ...)
var_concept(x) <- value
## Default S3 replacement method:
var_concept(x) <- value
Arguments
x |
A vector to which the concept URI will be assigned. |
... |
Further parameters for inheritance, not in use. |
value |
A character string with a concept URI or NULL to remove the concept. |
Details
get_variable_concepts()
is identical to var_concept()
.
Value
The getter returns the (linked) concept that describes the meaning of the data
contained in a vector constructed with defined().
The setter returns the modified vector with updated concept metadata.
Examples
small_country_dataset <- dataset_df(
country_name = defined(c("Andorra", "Liechtenstein"), label = "Country"),
gdp = defined(c(3897, 7365),
label = "Gross Domestic Product",
unit = "million dollars"
)
)
var_concept(small_country_dataset$country_name) <- "http://data.europa.eu/bna/c_6c2bb82d"
var_concept(small_country_dataset$country_name)
# To remove a concept definition of variable
var_concept(small_country_dataset$country_name) <- NULL
x <- defined(c(1, 2, 3), label = "Example Variable")
var_concept(x) <- "http://example.org/concept/XYZ"
var_concept(x)
Get or Set a Variable Label
Description
Adds or retrieves a human-readable label as a metadata attribute for a variable or vector. This label is useful for making variables easier to understand than their programmatic names (e.g., column names).
label_attribute() is a low-level helper that retrieves the "label" attribute
of an object without any fallback or printing logic. It is primarily used internally.
The var_label<- assignment method sets or removes the "label" attribute
of a vector or data frame column. This allows attaching human-readable
descriptions to variables for interpretability and downstream metadata use.
Usage
## S3 method for class 'defined'
var_label(x, ...)
label_attribute(x)
var_label(x) <- value
## S3 replacement method for class 'haven_labelled_defined'
var_label(x) <- value
## S3 method for class 'dataset_df'
var_label(
x,
unlist = FALSE,
null_action = c("keep", "fill", "skip", "na", "empty"),
recurse = FALSE,
...
)
Arguments
x |
A vector or data frame. |
... |
Further arguments passed to or used by methods. |
value |
A character string to assign as the label, or |
unlist |
For data frames, return a named vector instead of a list. |
null_action |
For data frames, controls how to handle columns without a variable label. Options are:
|
recurse |
If |
Details
This interface builds on labelled::var_label() and is compatible with
the defined() infrastructure for semantic metadata (labels, namespaces,
units, and variable identifiers).
See labelled::var_label() for low-level usage. For a comprehensive
guide to working with variable labels and semantic metadata, see:
vignette("defined", package = "dataset").
Value
- var_label(x) returns the "label" attribute of x as a character string.
- var_label(x) <- value sets, removes, or replaces the label attribute of x, returning the updated object invisibly.
A character string if the "label" attribute exists, or NULL if not present.
The modified object x, returned invisibly with the updated "label" attribute.
See Also
labelled::var_label(), var_labels(), defined()
Other defined metadata methods and functions:
var_labels(), var_namespace(), var_unit()
Examples
# Retrieve the label attribute
var_label(orange_df$circumference)
# Set or update the label attribute
var_label(orange_df$circumference) <- "circumference (breast height)"
# Example: Retrieve variable labels from a dataset_df
df <- dataset_df(
id = defined(1:3, label = "Observation ID"),
temp = defined(c(22.5, 23.0, 21.8), label = "Temperature (°C)"),
site = defined(c("A", "B", "A"))
)
# List form (default)
var_label(df)
# Character vector form
var_label(df, unlist = TRUE, null_action = "empty")
# Exclude variables without labels
var_label(df, null_action = "skip")
# Replace missing labels with column names
var_label(df, null_action = "fill")
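The effect of each null_action option is easiest to see on a plain data frame with one labelled and one unlabelled column. `collect_labels()` is a hypothetical base R helper written for this sketch; the option semantics follow the example comments above and the documented behaviour of labelled::var_label().

```r
# Hypothetical helper; not the package's var_label() method.
collect_labels <- function(df,
                           null_action = c("keep", "fill", "skip", "na", "empty")) {
  null_action <- match.arg(null_action)
  labels <- lapply(df, function(col) attr(col, "label", exact = TRUE))
  unlabelled <- vapply(labels, is.null, logical(1))
  switch(null_action,
    keep  = labels,                        # leave NULL entries as they are
    fill  = {                              # use the column name instead
      labels[unlabelled] <- as.list(names(df)[unlabelled])
      labels
    },
    skip  = labels[!unlabelled],           # drop unlabelled columns
    na    = {
      labels[unlabelled] <- list(NA_character_)
      labels
    },
    empty = {
      labels[unlabelled] <- list("")
      labels
    }
  )
}

df <- data.frame(id = 1:3, site = c("A", "B", "A"), stringsAsFactors = FALSE)
attr(df$id, "label") <- "Observation ID"

print(collect_labels(df, "fill"))
print(unlist(collect_labels(df, "empty")))
```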
Get or set all variable labels on a dataset
Description
Retrieve or assign labels for all variables (columns) in a dataset.
Usage
var_labels(
x,
unlist = FALSE,
null_action = c("keep", "fill", "skip", "na", "empty")
)
var_labels(x) <- value
Arguments
x |
A |
unlist |
Logical; if |
null_action |
How to handle columns without labels. One of:
|
value |
|
Details
This is the dataset-level equivalent of var_label().
It works with any data.frame-like object, including dataset_df(), and
returns/sets the "label" attribute of each column.
Labels are useful for storing human-readable descriptions of variables that may have short or cryptic column names.
For internal purposes, this function uses the "var_labels" dataset
attribute and delegates to var_label() and var_label<-() on individual columns.
Value
- Getter: a named list (or vector if unlist = TRUE) of variable labels.
- Setter: the modified x with updated labels, returned invisibly.
See Also
Other defined metadata methods and functions:
var_label(), var_namespace(), var_unit()
Examples
df <- dataset_df(
id = defined(1:3, label = "Observation ID"),
temp = defined(c(22.5, 23.0, 21.8), label = "Temperature (°C)"),
site = defined(c("A", "B", "A"))
)
# Get all variable labels
var_labels(df)
# Set multiple labels at once
var_labels(df) <- list(site = "Site code")
# Return as a named vector with empty string for unlabeled vars
var_labels(df, unlist = TRUE, null_action = "empty")
Get or Set the Namespace of a Variable
Description
Retrieve or assign the namespace part of a permanent, global variable identifier, independent of the current R session or instance.
Usage
var_namespace(x, ...)
var_namespace(x) <- value
get_variable_namespaces(x, ...)
namespace_attribute(x)
get_namespace_attribute(x)
set_namespace_attribute(x, value)
namespace_attribute(x) <- value
Arguments
x |
A vector. |
... |
Additional arguments for method compatibility with other classes. |
value |
A character string specifying the namespace, or |
Details
The namespace attribute is useful when working with remote, linked, or
open data sources. Variable identifiers in such datasets are often qualified
with a common namespace prefix. When combined, the prefix and namespace form
a persistent URI or IRI for the variable.
Retaining the namespace ensures the identifiers remain valid and resolvable during validation, merging, or future updates of the vector (such as when it is used as a column in a dataset).
get_variable_namespaces() is an alias for var_namespace().
namespace_attribute() and set_namespace_attribute() are internal helpers.
For full usage, see vignette("defined", package = "dataset"), which
demonstrates the integration of variable labels, namespaces, units of
measure, and machine-independent identifiers.
Value
A character string representing the namespace attribute of a vector
constructed with defined(). Returns the updated object (in setter forms).
See Also
Other defined metadata methods and functions:
var_label(), var_labels(), var_unit()
Examples
# Define a vector with a namespace
x <- defined("Q42", namespace = c(wd = "https://www.wikidata.org/wiki/"))
# Get the namespace
var_namespace(x)
get_variable_namespaces(x)
# Set the namespace
var_namespace(x) <- "https://example.org/ns/"
# Remove the namespace
var_namespace(x) <- NULL
# Use lower-level helpers (not typically used directly)
namespace_attribute(x)
namespace_attribute(x) <- "https://example.org/custom/"
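How a prefix and a namespace combine into a full URI can be sketched with a small base R helper. `expand_identifier()` is hypothetical, and treating the name of the namespace vector as the prefix (as in `c(wd = ...)`) is an assumption made for this illustration.

```r
# Hypothetical helper: replace a "prefix:" tag with its namespace URI.
expand_identifier <- function(id, namespace) {
  stopifnot(length(namespace) == 1, !is.null(names(namespace)))
  tag <- paste0(names(namespace), ":")
  has_prefix <- startsWith(id, tag)
  # Only identifiers that carry the prefix are expanded; others pass through
  id[has_prefix] <- paste0(namespace[[1]], substring(id[has_prefix], nchar(tag) + 1))
  id
}

ns <- c(wd = "https://www.wikidata.org/wiki/")
uris <- expand_identifier(c("wd:Q42", "wd:Q1", "local-id"), ns)
print(uris)
```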
Get or Set a Unit of Measure
Description
Adds or retrieves a unit of measure (UoM) attribute to a vector. Units provide semantic meaning for numeric or character data (such as currency, weight, or time), helping prevent incorrect operations like merging values measured in incompatible units.
The var_unit<- assignment method sets, updates, or removes the "unit"
attribute of a vector. This can be used with defined() vectors or base
vectors to ensure consistent semantic annotation.
unit_attribute() is a low-level helper to directly access the "unit"
attribute of a vector, without applying fallback logic. It is mainly used
internally.
get_unit_attribute() is an alias for unit_attribute(), included for naming
consistency in codebases that distinguish getter/setter patterns.
set_unit_attribute() is the low-level assignment function that sets or
removes the "unit" attribute of an object. Used internally by
unit_attribute<-.
Usage
var_unit(x, ...)
var_unit(x) <- value
## Default S3 replacement method:
var_unit(x) <- value
get_variable_units(x, ...)
unit_attribute(x)
get_unit_attribute(x)
set_unit_attribute(x, value)
unit_attribute(x) <- value
Arguments
x |
A vector. |
... |
Further arguments for method extensions. |
value |
A single character string or |
Details
The "unit" attribute stores a machine-readable representation of a
unit of measure (e.g., "kg", "USD", "days"). This is useful when
working with linked open data or when combining data from multiple sources
where silent mismatches in units could cause errors.
For full integration with semantic metadata (e.g., labels, concepts,
namespaces), use defined() vectors or dataset_df() objects.
get_variable_units() is an alias for var_unit().
See vignette("defined", package = "dataset") for end-to-end examples
involving semantic enrichment.
Value
- var_unit(x) returns the "unit" attribute as a character string.
- var_unit(x) <- value sets, updates, or removes the unit and returns the modified vector invisibly.
The modified object x, returned invisibly with the updated "unit" attribute.
The "unit" attribute of the object x, or NULL if not set.
The object x with updated "unit" attribute.
See Also
Other defined metadata methods and functions:
var_label(), var_labels(), var_namespace()
Examples
# Retrieve the unit of measure (if defined)
var_unit(orange_df$circumference)
# Regular data.frame columns have no unit by default
var_unit(mtcars$wt)
# Add a unit to a column
var_unit(mtcars$wt) <- "1000 lbs"
# Remove the unit
var_unit(mtcars$wt) <- NULL
Cast defined vector to base numeric (double)
Description
S3 method for vctrs::vec_cast()
that converts a
haven_labelled_defined
vector (created by defined()
) to a base
numeric
(double) vector, dropping all semantic metadata.
Usage
## S3 method for class 'haven_labelled_defined'
vec_cast.double(x, to, ...)
Arguments
x |
|
to |
Target type (must be |
... |
Ignored; reserved for future use. |
Value
A plain numeric (double) vector.
Examples
x <- defined(c(10, 20), unit = "kg")
vctrs::vec_cast(x, double())
as.numeric(x)
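Because the cast drops all semantic metadata, a unit that is still needed afterwards has to be captured before converting. The sketch below illustrates that pattern with base R only: it uses a plain numeric vector carrying a "unit" attribute as a stand-in for a defined() vector, which is an assumption made so the example runs without the package.

```r
# Stand-in for a defined() vector: a double with a "unit" attribute.
x <- structure(c(10, 20), unit = "kg")

# Capture the unit first, because the conversion will discard it.
x_unit <- attr(x, "unit", exact = TRUE)

# as.numeric() strips attributes and returns bare double values.
plain <- as.numeric(x)
attributes(plain)

# The unit can be re-attached later if the annotation is needed again.
restored <- plain
attr(restored, "unit") <- x_unit
attr(restored, "unit")
```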
From haven
Description
From haven
Usage
vec_cast_named(x, to, ...)
Convert to XML Schema Definition (XSD) Types
Description
Converts R vectors, data frames, and dataset_df
objects to
XML Schema Definition (XSD)
compatible string representations such as xsd:decimal
, xsd:boolean
,
xsd:date
, and xsd:dateTime
.
Usage
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'haven_labelled_defined'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'data.frame'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'dataset_df'
xsd_convert(x, idcol = "rowid", shortform = TRUE, ...)
## S3 method for class 'tbl_df'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'character'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'numeric'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'integer'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'logical'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'factor'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'POSIXct'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'Date'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
## S3 method for class 'difftime'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
Arguments
x |
An object (vector, data frame, tibble, or |
idcol |
Column name or position to use as row (observation) identifier.
If |
shortform |
Logical. If |
... |
Additional arguments passed to methods. |
Details
This is primarily used for generating RDF-compatible typed literals.
For vectors, returns a character vector of typed literals.
For data frames or tibbles, returns a data frame with the same structure but with all values converted to XSD strings.
For
dataset_df
objects, behaves like the data frame method but preserves dataset-level attributes.
Value
A character vector or data frame with values serialized as XSD-compatible RDF literals.
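To make the short and long forms concrete, the following base R sketch builds typed literals by hand in Turtle-style syntax. The helper typed_literal() is hypothetical and is not the package's implementation; the type mapping (in particular double to xsd:decimal) and the exact string layout are assumptions, so the output of xsd_convert() may differ in detail.

```r
# Namespace IRI of XML Schema datatypes.
xsd_ns <- "http://www.w3.org/2001/XMLSchema#"

# Hypothetical helper: serialise a base R vector as typed literals.
typed_literal <- function(x, shortform = TRUE) {
  # Pick an XSD datatype from the R class (assumed mapping).
  type <- if (inherits(x, "Date")) {
    "date"
  } else if (is.logical(x)) {
    "boolean"
  } else if (is.integer(x)) {
    "integer"
  } else if (is.numeric(x)) {
    "decimal"
  } else {
    "string"
  }
  # Lexical form: XSD booleans are lower case, dates are ISO 8601.
  lexical <- if (is.logical(x)) {
    tolower(as.character(x))
  } else if (inherits(x, "Date")) {
    format(x, "%Y-%m-%d")
  } else {
    as.character(x)
  }
  # Short form uses the xsd: prefix, long form the full IRI in brackets.
  datatype <- if (shortform) {
    paste0("xsd:", type)
  } else {
    paste0("<", xsd_ns, type, ">")
  }
  out <- paste0('"', lexical, '"^^', datatype)
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

short_int <- typed_literal(120L)
long_bool <- typed_literal(c(TRUE, NA), shortform = FALSE)
short_day <- typed_literal(as.Date("2020-01-01"))
short_int
long_bool
short_day
```

The short form is compact but only meaningful where the xsd: prefix is declared; the long form is self-contained at the cost of longer strings.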
Class-specific examples
xsd_convert(42L) # integer -> xsd:integer
xsd_convert(c(TRUE, FALSE, NA)) # logical -> xsd:boolean
xsd_convert(Sys.Date()) # Date -> xsd:date
xsd_convert(Sys.time()) # POSIXct -> xsd:dateTime
xsd_convert(factor("apple")) # factor -> xsd:string
xsd_convert(c("apple", "banana")) # character -> xsd:string
Examples
# Simple data frame with mixed types
df <- data.frame(
  id = 1:2,
  value = c(3.14, 2.71),
  active = c(TRUE, FALSE),
  date = as.Date(c("2020-01-01", "2020-12-31"))
)
# Short vs long-form URI:
xsd_convert(120L, shortform = TRUE)
xsd_convert(121L, shortform = FALSE)