knitr::opts_chunk$set(
collapse = TRUE, comment = "#>",
eval = identical(tolower(Sys.getenv("LLMR_RUN_VIGNETTES", "false")), "true") )
OpenAI-compatible (OpenAI, Groq, Together, x.ai, DeepSeek)

Chat Completions accept a response_format (e.g., {"type":"json_object"} or a JSON-Schema payload). Enforcement varies by provider, but the interface is OpenAI-shaped. See: OpenAI API overview; Groq API (OpenAI-compatible); Together: OpenAI compatibility; x.ai: OpenAI API schema; DeepSeek: OpenAI-compatible endpoint.
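As a sketch, the two common response_format payloads look like this when built as R lists (field names follow the OpenAI API; how strictly a given proxy enforces them varies):

```r
# OpenAI-shaped response_format payloads, expressed as R lists (sketch)
json_mode <- list(type = "json_object")

with_schema <- list(
  type = "json_schema",
  json_schema = list(
    name   = "answer",
    strict = TRUE,
    schema = list(
      type = "object",
      properties = list(ok = list(type = "boolean")),
      required = list("ok"),
      additionalProperties = FALSE
    )
  )
)
```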
Anthropic (Claude)

No global “JSON mode.” Instead, you define a tool with an input_schema (JSON Schema) and force it via tool_choice, so the model must return a JSON object that validates against the schema. See Anthropic Messages API: tools & input_schema.
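Under the hood the request carries a tool definition plus a forced tool_choice. A sketch of those two fields as R lists (field names follow the Anthropic Messages API; the tool name here is made up for illustration):

```r
# Anthropic: a tool whose input_schema is your JSON Schema (sketch)
tools <- list(list(
  name = "record_answer",            # hypothetical tool name
  description = "Return the structured answer.",
  input_schema = list(
    type = "object",
    properties = list(answer = list(type = "string")),
    required = list("answer")
  )
))

# Force the model to call exactly that tool
tool_choice <- list(type = "tool", name = "record_answer")
```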
Google Gemini (REST)

Set responseMimeType = "application/json" in generationConfig to request JSON. Some models also accept responseSchema for constrained JSON (model-dependent). See the Gemini documentation.
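For orientation, a sketch of the REST-level generationConfig as an R list (Gemini's responseSchema uses its own type enum, e.g. "OBJECT"/"STRING"; support is model-dependent):

```r
# Gemini generationConfig (sketch): request JSON, optionally constrain its shape
generationConfig <- list(
  responseMimeType = "application/json",
  responseSchema = list(
    type = "OBJECT",
    properties = list(
      name  = list(type = "STRING"),
      score = list(type = "NUMBER")
    )
  )
)
```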
llm_parse_structured() strips fences and extracts the largest balanced {...} or [...] before parsing. llm_parse_structured_col() hoists fields (supports dot/bracket paths and JSON Pointer) and keeps non-scalars as list-columns. llm_validate_structured_col() validates locally via jsonvalidate (AJV). enable_structured_output() flips the right provider switch (OpenAI-compat response_format; Anthropic tool + input_schema; Gemini responseMimeType/responseSchema).

All chunks use a tiny helper so your document knits even without API keys.
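That helper is not shown in this excerpt; a minimal sketch of what such a safe() wrapper might look like (the vignette's actual definition may differ):

```r
# Minimal safe() sketch: evaluate a block, print any error instead of stopping,
# so the document still knits when no API key is available.
safe <- function(expr) {
  tryCatch(expr, error = function(e) {
    cat("ERROR:", conditionMessage(e), "\n")
    invisible(NULL)
  })
}
```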
safe({
library(LLMR)
cfg <- llm_config(
provider = "openai", # try "groq" or "together" too
model = "gpt-4o-mini",
temperature = 0
)
# Flip JSON mode on (OpenAI-compat shape)
cfg_json <- enable_structured_output(cfg, schema = NULL)
res <- call_llm(cfg_json, 'Give me a JSON object {"ok": true, "n": 3}.')
parsed <- llm_parse_structured(res)
cat("Raw text:\n", as.character(res), "\n\n")
str(parsed)
})
#> Raw text:
#> {
#> "ok": true,
#> "n": 3
#> }
#>
#> List of 2
#> $ ok: logi TRUE
#> $ n : num 3
What could still fail? Proxies labeled “OpenAI-compatible” sometimes accept response_format but don’t strictly enforce it; LLMR’s parser recovers from fences or pre/post text.

Groq serves Qwen 2.5 Instruct models with OpenAI-compatible APIs. Their Structured Outputs feature enforces JSON Schema and (notably) expects all properties to be listed under required.
safe({
library(LLMR); library(dplyr)
# Schema: make every property required to satisfy Groq's stricter check
schema <- list(
type = "object",
additionalProperties = FALSE,
properties = list(
title = list(type = "string"),
year = list(type = "integer"),
tags = list(type = "array", items = list(type = "string"))
),
required = list("title","year","tags")
)
cfg <- llm_config(
provider = "groq",
model = "qwen-2.5-72b-instruct", # a Qwen Instruct model on Groq
temperature = 0
)
cfg_strict <- enable_structured_output(cfg, schema = schema, strict = TRUE)
df <- tibble(x = c("BERT paper", "Vision Transformers"))
out <- llm_fn_structured(
df,
prompt = "Return JSON about '{x}' with fields title, year, tags.",
.config = cfg_strict,
.schema = schema, # send schema to provider
.fields = c("title","year","tags"),
.validate_local = TRUE
)
out %>% select(structured_ok, structured_valid, title, year, tags) %>% print(n = Inf)
})
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> [2025-08-26 01:02:36.347837] LLMR Error: LLM API request failed.
#> HTTP status: 404
#> Reason: The model `qwen-2.5-72b-instruct` does not exist or you do not have access to it.
#> Tip: check model params for provider/API version.
#> [2025-08-26 01:02:36.346521] LLMR Error: LLM API request failed.
#> HTTP status: 404
#> Reason: The model `qwen-2.5-72b-instruct` does not exist or you do not have access to it.
#> Tip: check model params for provider/API version.
#> # A tibble: 2 × 5
#> structured_ok structured_valid title year tags
#> <lgl> <lgl> <chr> <chr> <chr>
#> 1 FALSE FALSE <NA> <NA> <NA>
#> 2 FALSE FALSE <NA> <NA> <NA>
If your key is set, you should see structured_ok = TRUE and structured_valid = TRUE, plus parsed columns. (Tip: if you see a 400 complaining about required, add all properties to required, as above.)
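One way to guarantee that is to derive required from the property names instead of maintaining the list by hand (a sketch against the schema list built above):

```r
# Sketch: list every property under `required`, as Groq's strict mode expects
schema$required <- as.list(names(schema$properties))
```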
Anthropic (forced tool + input_schema; requires max_tokens)

safe({
library(LLMR)
schema <- list(
type="object",
properties=list(answer=list(type="string"), confidence=list(type="number")),
required=list("answer","confidence"),
additionalProperties=FALSE
)
cfg <- llm_config("anthropic","claude-3-7", temperature = 0)
cfg <- enable_structured_output(cfg, schema = schema, name = "llmr_schema")
res <- call_llm(cfg, c(
system = "Return only the tool result that matches the schema.",
user = "Answer: capital of Japan; include confidence in [0,1]."
))
parsed <- llm_parse_structured(res)
str(parsed)
})
#> Warning in call_llm.anthropic(cfg, c(system = "Return only the tool result that
#> matches the schema.", : Anthropic requires max_tokens; setting it at 2048.
#> ERROR: LLM API request failed.
#> HTTP status: 404
#> Reason: model: claude-3-7
#> Tip: check model params for provider/API version.
#> NULL
Anthropic requires max_tokens; LLMR warns and defaults if you omit it.
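To avoid the warning, pass max_tokens yourself (a sketch; the value 1024 is an arbitrary choice, and the model string is reused from the chunk above):

```r
# Set max_tokens explicitly so LLMR does not have to pick a default
cfg <- llm_config("anthropic", "claude-3-7", temperature = 0, max_tokens = 1024)
```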
safe({
library(LLMR)
cfg <- llm_config(
"gemini", "gemini-2.0-flash",
response_mime_type = "application/json" # ask for JSON back
# Optionally: gemini_enable_response_schema = TRUE, response_schema = <your JSON Schema>
)
res <- call_llm(cfg, c(
system = "Reply as JSON only.",
user = "Produce fields name and score about 'MNIST'."
))
str(llm_parse_structured(res))
})
#> List of 1
#> $ :List of 2
#> ..$ name : chr "MNIST"
#> ..$ score: chr "99.6"
safe({
library(LLMR); library(tibble)
messy <- c(
'```json\n{"x": 1, "y": [1,2,3]}\n```',
'Sure! Here is JSON: {"x":"1","y":"oops"} trailing words',
'{"x":1, "y":[2,3,4]}'
)
tibble(response_text = messy) |>
llm_parse_structured_col(
fields = c(x = "x", y = "/y/0") # dot/bracket or JSON Pointer
) |>
print(n = Inf)
})
#> # A tibble: 3 × 5
#> response_text structured_ok structured_data x y
#> <chr> <lgl> <list> <dbl> <dbl>
#> 1 "```json\n{\"x\": 1, \"y\": [1,2,3]… TRUE <named list> 1 1
#> 2 "Sure! Here is JSON: {\"x\":\"1\",\… TRUE <named list> 1 NA
#> 3 "{\"x\":1, \"y\":[2,3,4]}" TRUE <named list> 1 2
Why this helps: it works when outputs arrive fenced, with pre/post text, or when arrays sneak in. Non-scalars become list-columns (set allow_list = FALSE to force scalars only).
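The local-validation step mentioned earlier can be chained the same way. A sketch (the exact signature of llm_validate_structured_col() is an assumption here; see the package docs):

```r
safe({
  library(LLMR); library(tibble)
  schema <- list(
    type = "object",
    properties = list(x = list(type = "number")),
    required = list("x")
  )
  tibble(response_text = c('{"x": 1}', 'no json here')) |>
    llm_parse_structured_col() |>
    llm_validate_structured_col(schema) |>  # adds a structured_valid column
    print(n = Inf)
})
```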
For any provider, flip the switch with enable_structured_output() and run llm_parse_structured() + local validation.

input_schema: https://docs.anthropic.com/en/api/messages#body-tool-choice