FMAT

😷 The Fill-Mask Association Test (掩码填空联系测验).

The Fill-Mask Association Test (FMAT) is an integrative and probability-based method using BERT Models to measure conceptual associations (e.g., attitudes, biases, stereotypes, social norms, cultural values) as propositions in natural language (Bao, 2024, JPSP).

⚠️ Please update this package to version ≥ 2025.4 for faster and more robust functionality.


Author

Bruce H. W. S. Bao 包寒吴霜

πŸ“¬ baohws@foxmail.com

πŸ“‹ psychbruce.github.io

Citation

(1) FMAT Package

(2) FMAT Research Articles - Methodology

(3) FMAT Research Articles - Applications

Installation

In addition to the R package FMAT, you also need a Python environment with three Python packages installed (transformers, huggingface-hub, and torch).

(1) R Package

## Method 1: Install from CRAN
install.packages("FMAT")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force=TRUE)

(2) Python Environment and Packages

Install Anaconda (an environment and package manager that installs Python, IDEs such as Spyder, and many common Python packages).

Set RStudio to “Run as Administrator” to enable the pip command in the RStudio Terminal.

RStudio (find “rstudio.exe” in its installation path)
→ File Properties → Compatibility → Settings
→ Tick “Run this program as an administrator”

Open RStudio and specify Anaconda’s Python interpreter (you can verify the selection as shown below).

RStudio → Tools → Global/Project Options
→ Python → Select → Conda Environments
→ Choose “…/Anaconda3/python.exe”
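
To confirm that reticulate (the R-Python bridge used by FMAT) has picked up the intended interpreter, you can run a quick check in the R console. This is a minimal sketch using standard reticulate functions:

## Check which Python interpreter reticulate will use
library(reticulate)
py_config()  # prints the selected Python executable, its version, and key packages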

Check which Python packages are installed and their versions
(in the Terminal pane of RStudio or the Windows Command Prompt):

pip list

Install Python packages “transformers”, “huggingface-hub”, and “torch”.

You may install either the latest versions (with better support for modern models) or specific versions (with download progress bars). A quick way to verify the installation from R is sketched after the commands below.

Option 1: Install Latest Versions (with Better Support for Modern Models)

For CPU users:

pip install transformers huggingface-hub torch

For GPU (CUDA) users:

pip install transformers huggingface-hub
pip install torch --index-url https://download.pytorch.org/whl/cu130

Option 2: Install Specific Versions (with Download Progress Bars)

For CPU users:

pip install transformers==4.40.2 huggingface-hub==0.20.3 torch==2.2.1

For GPU (CUDA) users:

pip install transformers==4.40.2 huggingface-hub==0.20.3
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121
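
After installation, you can optionally check from within R that the three Python packages can be imported. This is a minimal sanity-check sketch via reticulate (not an FMAT function):

## Sanity check: import the Python packages from R via reticulate
library(reticulate)
transformers = import("transformers")
torch        = import("torch")
hf_hub       = import("huggingface_hub")
cat("transformers:   ", transformers$`__version__`, "\n")
cat("torch:          ", torch$`__version__`, "\n")
cat("huggingface-hub:", hf_hub$`__version__`, "\n")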

Guidance for FMAT

Step 1: Download BERT Models

Use set_cache_folder() to change the default HuggingFace cache directory from “%USERPROFILE%/.cache/huggingface/hub” to another folder of your choice, so that all models will be downloaded and saved there. Keep in mind: this function takes effect only for the current R session, so run it each time BEFORE using other FMAT functions in an R session.

Use BERT_download() to download BERT models. A full list of BERT models is available at Hugging Face.

Use BERT_info() and BERT_vocab() to obtain detailed information about BERT models.
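
A minimal sketch of this workflow (the cache folder path here is only an illustration; choose your own):

library(FMAT)

## Run set_cache_folder() first in each R session (its effect is temporary)
set_cache_folder("D:/HuggingFace_Cache/")  # illustrative path only

## Download one or more BERT models from Hugging Face
models = c("bert-base-uncased", "bert-base-cased")
BERT_download(models)

## Inspect model information (file size, vocabulary size, mask token, etc.)
BERT_info(models)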

Step 2: Design FMAT Queries

Design queries that conceptually represent the constructs you want to measure (see Bao, 2024, JPSP, for guidance on designing queries).

Use FMAT_query() and/or FMAT_query_bind() to prepare a data.table of queries.
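
For example, a hypothetical query set for gender-occupation associations might look like the sketch below. The MASK argument and the .() helper follow the FMAT_query() documentation; see ?FMAT_query and Bao (2024, JPSP) for the full design options (e.g., TARGET and ATTRIB words), and adjust the arguments to your package version if needed.

library(FMAT)

## Hypothetical queries: is "doctor"/"nurse" associated more with "He" or "She"?
## (argument names follow ?FMAT_query; treat this as a sketch, not a template)
query = FMAT_query(
  c("[MASK] is a doctor.",
    "[MASK] is a nurse."),
  MASK = .(Male = "He", Female = "She")
)
query  # a data.table of queries, one row per query and MASK word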

Step 3: Run FMAT

Use FMAT_run() to get raw data (probability estimates) for further analysis.

Several steps of preprocessing have been included in the function for easier use (see FMAT_run() for details).
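
A minimal sketch of running the queries from the Step 2 example above on two models (the argument order follows my reading of ?FMAT_run; check the documentation for all options, including GPU use):

## Run FMAT: models first, then the query data.table (see ?FMAT_run)
data = FMAT_run(
  c("bert-base-uncased", "bert-base-cased"),
  query
)
head(data)  # raw probability estimates for each query, MASK word, and model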

Notes

Guidance for GPU Acceleration

By default, the FMAT package uses the CPU so that it works for all users. Advanced users who want to accelerate the pipeline can run FMAT_run() on a GPU device.

Test results (on the developer’s computer, depending on BERT model size):

Checklist:

  1. Ensure that you have an NVIDIA GPU device (e.g., GeForce RTX Series) and an NVIDIA GPU driver installed on your system.
  2. Install PyTorch (Python torch package) with CUDA support (you can verify the setup with the check below).
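
Before running FMAT on the GPU, you can verify that the installed torch build can actually see your CUDA device. A minimal check via reticulate (torch.cuda.is_available() and torch.version.cuda are standard PyTorch attributes):

## Check CUDA availability of the installed torch from R
library(reticulate)
torch = import("torch")
torch$cuda$is_available()  # should be TRUE for a working CUDA setup
torch$version$cuda         # CUDA version of the torch build (NULL for CPU-only builds)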

BERT Models

Classic 12 English Models

The reliability and validity of the following 12 English BERT models for the FMAT have been established in our earlier research.

(model name on Hugging Face - model file size)

  1. bert-base-uncased (420 MB)
  2. bert-base-cased (416 MB)
  3. bert-large-uncased (1283 MB)
  4. bert-large-cased (1277 MB)
  5. distilbert-base-uncased (256 MB)
  6. distilbert-base-cased (251 MB)
  7. albert-base-v1 (45 MB)
  8. albert-base-v2 (45 MB)
  9. roberta-base (476 MB)
  10. distilroberta-base (316 MB)
  11. vinai/bertweet-base (517 MB)
  12. vinai/bertweet-large (1356 MB)

For details about BERT, see:

library(FMAT)
models = c(
  "bert-base-uncased",
  "bert-base-cased",
  "bert-large-uncased",
  "bert-large-cased",
  "distilbert-base-uncased",
  "distilbert-base-cased",
  "albert-base-v1",
  "albert-base-v2",
  "roberta-base",
  "distilroberta-base",
  "vinai/bertweet-base",
  "vinai/bertweet-large"
)
BERT_download(models)
ℹ Device Info:

R Packages:
FMAT          2024.5
reticulate    1.36.1

Python Packages:
transformers  4.40.2
torch         2.2.1+cu121

NVIDIA GPU CUDA Support:
CUDA Enabled: TRUE
CUDA Version: 12.1
GPU (Device): NVIDIA GeForce RTX 2050


── Downloading model "bert-base-uncased" ──────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 570/570 [00:00<00:00, 114kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 48.0/48.0 [00:00<00:00, 23.9kB/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 466k/466k [00:00<00:00, 1.98MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 440M/440M [00:36<00:00, 12.1MB/s] 
βœ” Successfully downloaded model "bert-base-uncased"

── Downloading model "bert-base-cased" ────────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 570/570 [00:00<00:00, 63.3kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 49.0/49.0 [00:00<00:00, 8.66kB/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 436k/436k [00:00<00:00, 10.1MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 436M/436M [00:37<00:00, 11.6MB/s] 
βœ” Successfully downloaded model "bert-base-cased"

── Downloading model "bert-large-uncased" ─────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 571/571 [00:00<00:00, 268kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 48.0/48.0 [00:00<00:00, 12.0kB/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 466k/466k [00:00<00:00, 1.99MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.34G/1.34G [01:36<00:00, 14.0MB/s]
βœ” Successfully downloaded model "bert-large-uncased"

── Downloading model "bert-large-cased" ───────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 762/762 [00:00<00:00, 125kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 49.0/49.0 [00:00<00:00, 12.3kB/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 213k/213k [00:00<00:00, 1.41MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 436k/436k [00:00<00:00, 5.39MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.34G/1.34G [01:35<00:00, 14.0MB/s]
βœ” Successfully downloaded model "bert-large-cased"

── Downloading model "distilbert-base-uncased" ────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 483/483 [00:00<00:00, 161kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 48.0/48.0 [00:00<00:00, 9.46kB/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 232k/232k [00:00<00:00, 16.5MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 466k/466k [00:00<00:00, 14.8MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 268M/268M [00:19<00:00, 13.5MB/s] 
βœ” Successfully downloaded model "distilbert-base-uncased"

── Downloading model "distilbert-base-cased" ──────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 465/465 [00:00<00:00, 233kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 49.0/49.0 [00:00<00:00, 9.80kB/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 436k/436k [00:00<00:00, 8.70MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 263M/263M [00:24<00:00, 10.9MB/s] 
βœ” Successfully downloaded model "distilbert-base-cased"

── Downloading model "albert-base-v1" ─────────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 684/684 [00:00<00:00, 137kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 25.0/25.0 [00:00<00:00, 3.57kB/s]
spiece.model: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 760k/760k [00:00<00:00, 4.93MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.31M/1.31M [00:00<00:00, 13.4MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 47.4M/47.4M [00:03<00:00, 13.4MB/s]
βœ” Successfully downloaded model "albert-base-v1"

── Downloading model "albert-base-v2" ─────────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 684/684 [00:00<00:00, 137kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 25.0/25.0 [00:00<00:00, 4.17kB/s]
spiece.model: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 760k/760k [00:00<00:00, 5.10MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.31M/1.31M [00:00<00:00, 6.93MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 47.4M/47.4M [00:03<00:00, 13.8MB/s]
βœ” Successfully downloaded model "albert-base-v2"

── Downloading model "roberta-base" ───────────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 481/481 [00:00<00:00, 80.3kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 25.0/25.0 [00:00<00:00, 6.25kB/s]
vocab.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 899k/899k [00:00<00:00, 2.72MB/s]
merges.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 456k/456k [00:00<00:00, 8.22MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.36M/1.36M [00:00<00:00, 8.56MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 499M/499M [00:38<00:00, 12.9MB/s] 
βœ” Successfully downloaded model "roberta-base"

── Downloading model "distilroberta-base" ─────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 480/480 [00:00<00:00, 96.4kB/s]
β†’ (2) Downloading tokenizer...
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 25.0/25.0 [00:00<00:00, 12.0kB/s]
vocab.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 899k/899k [00:00<00:00, 6.59MB/s]
merges.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 456k/456k [00:00<00:00, 9.46MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.36M/1.36M [00:00<00:00, 11.5MB/s]
β†’ (3) Downloading model...
model.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 331M/331M [00:25<00:00, 13.0MB/s] 
βœ” Successfully downloaded model "distilroberta-base"

── Downloading model "vinai/bertweet-base" ────────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 558/558 [00:00<00:00, 187kB/s]
β†’ (2) Downloading tokenizer...
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 843k/843k [00:00<00:00, 7.44MB/s]
bpe.codes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.08M/1.08M [00:00<00:00, 7.01MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.91M/2.91M [00:00<00:00, 9.10MB/s]
β†’ (3) Downloading model...
pytorch_model.bin: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 543M/543M [00:48<00:00, 11.1MB/s] 
βœ” Successfully downloaded model "vinai/bertweet-base"

── Downloading model "vinai/bertweet-large" ───────────────────────────────────────
β†’ (1) Downloading configuration...
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 614/614 [00:00<00:00, 120kB/s]
β†’ (2) Downloading tokenizer...
vocab.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 899k/899k [00:00<00:00, 5.90MB/s]
merges.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 456k/456k [00:00<00:00, 7.30MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.36M/1.36M [00:00<00:00, 8.31MB/s]
β†’ (3) Downloading model...
pytorch_model.bin: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.42G/1.42G [02:29<00:00, 9.53MB/s]
βœ” Successfully downloaded model "vinai/bertweet-large"

── Downloaded models: ──

                           size
albert-base-v1            45 MB
albert-base-v2            45 MB
bert-base-cased          416 MB
bert-base-uncased        420 MB
bert-large-cased        1277 MB
bert-large-uncased      1283 MB
distilbert-base-cased    251 MB
distilbert-base-uncased  256 MB
distilroberta-base       316 MB
roberta-base             476 MB
vinai/bertweet-base      517 MB
vinai/bertweet-large    1356 MB

✔ Downloaded models saved at C:/Users/Bruce/.cache/huggingface/hub (6.52 GB)
BERT_info(models)
                      model   size vocab  dims   mask
                     <fctr> <char> <int> <int> <char>
 1:       bert-base-uncased  420MB 30522   768 [MASK]
 2:         bert-base-cased  416MB 28996   768 [MASK]
 3:      bert-large-uncased 1283MB 30522  1024 [MASK]
 4:        bert-large-cased 1277MB 28996  1024 [MASK]
 5: distilbert-base-uncased  256MB 30522   768 [MASK]
 6:   distilbert-base-cased  251MB 28996   768 [MASK]
 7:          albert-base-v1   45MB 30000   128 [MASK]
 8:          albert-base-v2   45MB 30000   128 [MASK]
 9:            roberta-base  476MB 50265   768 <mask>
10:      distilroberta-base  316MB 50265   768 <mask>
11:     vinai/bertweet-base  517MB 64001   768 <mask>
12:    vinai/bertweet-large 1356MB 50265  1024 <mask>

(Tested 2024-05-16 on the developer’s computer: HP ProBook 450 G10 Notebook PC)

General 30 English and 30 Chinese Models

We are using a more comprehensive list of 30 English BERT models and 30 Chinese BERT models in our ongoing and future projects.

library(FMAT)
set_cache_folder("G:/HuggingFace_Cache/")  # models saved in my portable SSD

## 30 English Models
models.en = c(
  # BERT (base/large/large-wwm, uncased/cased)
  "bert-base-uncased",
  "bert-base-cased",
  "bert-large-uncased",
  "bert-large-cased",
  "bert-large-uncased-whole-word-masking",
  "bert-large-cased-whole-word-masking",
  # ALBERT (base/large/xlarge, v1/v2)
  "albert-base-v1",
  "albert-base-v2",
  "albert-large-v1",
  "albert-large-v2",
  "albert-xlarge-v1",
  "albert-xlarge-v2",
  # DistilBERT (uncased/cased/distilroberta)
  "distilbert-base-uncased",
  "distilbert-base-cased",
  "distilroberta-base",
  # RoBERTa (roberta/muppet, base/large)
  "roberta-base",
  "roberta-large",
  "facebook/muppet-roberta-base",
  "facebook/muppet-roberta-large",
  # ELECTRA (base/large)
  "google/electra-base-generator",
  "google/electra-large-generator",
  # MobileBERT (uncased)
  "google/mobilebert-uncased",
  # ModernBERT (base/large)
  "answerdotai/ModernBERT-base",   # transformers >= 4.48.0
  "answerdotai/ModernBERT-large",  # transformers >= 4.48.0
  # [Tweets] (BERT/RoBERTa/BERTweet-base/BERTweet-large)
  "muhtasham/base-mlm-tweet",
  "cardiffnlp/twitter-roberta-base",
  "vinai/bertweet-base",
  "vinai/bertweet-large",
  # [PubMed Abstracts] (BiomedBERT, base/large)
  "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract",
  "microsoft/BiomedNLP-BiomedBERT-large-uncased-abstract"
)

## 30 Chinese Models
models.cn = c(
  # BERT [Google]
  "bert-base-chinese",
  # BERT [Alibaba-PAI] (base/ck-base/ck-large/ck-huge)
  "alibaba-pai/pai-bert-base-zh",
  "alibaba-pai/pai-ckbert-base-zh",
  "alibaba-pai/pai-ckbert-large-zh",
  "alibaba-pai/pai-ckbert-huge-zh",
  # BERT [HFL] (wwm, bert-wiki/bert-ext/roberta-ext)
  "hfl/chinese-bert-wwm",
  "hfl/chinese-bert-wwm-ext",
  "hfl/chinese-roberta-wwm-ext",
  # BERT [HFL] (lert/macbert/electra, base/large)
  "hfl/chinese-lert-base",
  "hfl/chinese-lert-large",
  "hfl/chinese-macbert-base",
  "hfl/chinese-macbert-large",
  "hfl/chinese-electra-180g-base-generator",
  "hfl/chinese-electra-180g-large-generator",
  # RoBERTa [UER] (H=512/768, L=6/8/10/12)
  "uer/chinese_roberta_L-6_H-512",
  "uer/chinese_roberta_L-8_H-512",
  "uer/chinese_roberta_L-10_H-512",
  "uer/chinese_roberta_L-12_H-512",
  "uer/chinese_roberta_L-6_H-768",
  "uer/chinese_roberta_L-8_H-768",
  "uer/chinese_roberta_L-10_H-768",
  "uer/chinese_roberta_L-12_H-768",
  # RoBERTa [UER] (wwm, base/large)
  "uer/roberta-base-wwm-chinese-cluecorpussmall",
  "uer/roberta-large-wwm-chinese-cluecorpussmall",
  # BERT [IDEA-CCNL] (MacBERT/TCBert-base/TCBert-large)
  "IDEA-CCNL/Erlangshen-MacBERT-325M-NLI-Chinese",
  "IDEA-CCNL/Erlangshen-TCBert-330M-Classification-Chinese",
  "IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese",
  # RoBERTa [IDEA-CCNL] (UniMC, base/large)
  "IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese",
  "IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese",
  # MegatronBERT [IDEA-CCNL] (huge)
  "IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese"
)

BERT_info(models.en)
BERT_info(models.cn)

Information of the 30 English Models

                                                    model       type     param vocab embed layer heads   mask
                                                   <fctr>     <fctr>     <int> <int> <int> <int> <int> <fctr>
 1:                                     bert-base-uncased       bert 109482240 30522   768    12    12 [MASK]
 2:                                       bert-base-cased       bert 108310272 28996   768    12    12 [MASK]
 3:                                    bert-large-uncased       bert 335141888 30522  1024    24    16 [MASK]
 4:                                      bert-large-cased       bert 333579264 28996  1024    24    16 [MASK]
 5:                 bert-large-uncased-whole-word-masking       bert 335141888 30522  1024    24    16 [MASK]
 6:                   bert-large-cased-whole-word-masking       bert 333579264 28996  1024    24    16 [MASK]
 7:                                        albert-base-v1     albert  11683584 30000   128    12    12 [MASK]
 8:                                        albert-base-v2     albert  11683584 30000   128    12    12 [MASK]
 9:                                       albert-large-v1     albert  17683968 30000   128    24    16 [MASK]
10:                                       albert-large-v2     albert  17683968 30000   128    24    16 [MASK]
11:                                      albert-xlarge-v1     albert  58724864 30000   128    24    16 [MASK]
12:                                      albert-xlarge-v2     albert  58724864 30000   128    24    16 [MASK]
13:                               distilbert-base-uncased distilbert  66362880 30522   768     6    12 [MASK]
14:                                 distilbert-base-cased distilbert  65190912 28996   768     6    12 [MASK]
15:                                    distilroberta-base    roberta  82118400 50265   768     6    12 <mask>
16:                                          roberta-base    roberta 124645632 50265   768    12    12 <mask>
17:                                         roberta-large    roberta 355359744 50265  1024    24    16 <mask>
18:                          facebook/muppet-roberta-base    roberta 124645632 50265   768    12    12 <mask>
19:                         facebook/muppet-roberta-large    roberta 355359744 50265  1024    24    16 <mask>
20:                         google/electra-base-generator    electra  33511168 30522   768    12     4 [MASK]
21:                        google/electra-large-generator    electra  50999552 30522  1024    24     4 [MASK]
22:                             google/mobilebert-uncased mobilebert  24581888 30522   128    24     4 [MASK]
23:                           answerdotai/ModernBERT-base modernbert 149014272 50368   768    22    12 [MASK]
24:                          answerdotai/ModernBERT-large modernbert 394781696 50368  1024    28    16 [MASK]
25:                              muhtasham/base-mlm-tweet       bert 109482240 30522   768    12    12 [MASK]
26:                       cardiffnlp/twitter-roberta-base    roberta 124645632 50265   768    12    12 <mask>
27:                                   vinai/bertweet-base    roberta 134899968 64001   768    12    12 <mask>
28:                                  vinai/bertweet-large    roberta 355359744 50265  1024    24    16 <mask>
29:  microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract       bert 109482240 30522   768    12    12 [MASK]
30: microsoft/BiomedNLP-BiomedBERT-large-uncased-abstract       bert 335141888 30522  1024    24    16 [MASK]
                                                    model       type     param vocab embed layer heads   mask

Missing values of year tokens (1800~2019):

Information of the 30 Chinese Models

                                                          model          type      param vocab embed layer heads   mask
                                                         <fctr>        <fctr>      <int> <int> <int> <int> <int> <fctr>
 1:                                           bert-base-chinese          bert  102267648 21128   768    12    12 [MASK]
 2:                                alibaba-pai/pai-bert-base-zh          bert  102267648 21128   768    12    12 [MASK]
 3:                              alibaba-pai/pai-ckbert-base-zh          bert  102269184 21130   768    12    12 [MASK]
 4:                             alibaba-pai/pai-ckbert-large-zh          bert  325524480 21130  1024    24    16 [MASK]
 5:                              alibaba-pai/pai-ckbert-huge-zh megatron-bert 1257367552 21248  2048    24     8 [MASK]
 6:                                        hfl/chinese-bert-wwm          bert  102267648 21128   768    12    12 [MASK]
 7:                                    hfl/chinese-bert-wwm-ext          bert  102267648 21128   768    12    12 [MASK]
 8:                                 hfl/chinese-roberta-wwm-ext          bert  102267648 21128   768    12    12 [MASK]
 9:                                       hfl/chinese-lert-base          bert  102267648 21128   768    12    12 [MASK]
10:                                      hfl/chinese-lert-large          bert  325522432 21128  1024    24    16 [MASK]
11:                                    hfl/chinese-macbert-base          bert  102267648 21128   768    12    12 [MASK]
12:                                   hfl/chinese-macbert-large          bert  325522432 21128  1024    24    16 [MASK]
13:                     hfl/chinese-electra-180g-base-generator       electra   22108608 21128   768    12     3 [MASK]
14:                    hfl/chinese-electra-180g-large-generator       electra   41380096 21128  1024    24     4 [MASK]
15:                               uer/chinese_roberta_L-6_H-512          bert   30258688 21128   512     6     8 [MASK]
16:                               uer/chinese_roberta_L-8_H-512          bert   36563456 21128   512     8     8 [MASK]
17:                              uer/chinese_roberta_L-10_H-512          bert   42868224 21128   512    10     8 [MASK]
18:                              uer/chinese_roberta_L-12_H-512          bert   49172992 21128   512    12     8 [MASK]
19:                               uer/chinese_roberta_L-6_H-768          bert   59740416 21128   768     6    12 [MASK]
20:                               uer/chinese_roberta_L-8_H-768          bert   73916160 21128   768     8    12 [MASK]
21:                              uer/chinese_roberta_L-10_H-768          bert   88091904 21128   768    10    12 [MASK]
22:                              uer/chinese_roberta_L-12_H-768          bert  102267648 21128   768    12    12 [MASK]
23:                uer/roberta-base-wwm-chinese-cluecorpussmall          bert  102267648 21128   768    12    12 [MASK]
24:               uer/roberta-large-wwm-chinese-cluecorpussmall          bert  325522432 21128  1024    24    16 [MASK]
25:               IDEA-CCNL/Erlangshen-MacBERT-325M-NLI-Chinese          bert  325625856 21229  1024    24    16 [MASK]
26:     IDEA-CCNL/Erlangshen-TCBert-330M-Classification-Chinese          bert  325522432 21128  1024    24    16 [MASK]
27: IDEA-CCNL/Erlangshen-TCBert-330M-Sentence-Embedding-Chinese          bert  325522432 21128  1024    24    16 [MASK]
28:             IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese          bert  102267648 21128   768    12    12 [MASK]
29:             IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese          bert  325522432 21128  1024    24    16 [MASK]
30:        IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese megatron-bert 1257367552 21248  2048    24     8 [MASK]
                                                          model          type      param vocab embed layer heads   mask

Missing values of year tokens (1800~2019):

Device Information

ℹ Device Info:

R Packages:
FMAT          2025.12
reticulate    1.44.1

Python Packages:
transformers  4.57.3
torch         2.9.1+cu130
huggingface-hub  0.36.0

NVIDIA GPU CUDA Support:
CUDA Enabled: TRUE
GPU (Device): NVIDIA GeForce RTX 5060 Laptop GPU

(Tested 2025-12-14 on the developer’s computer: HP ZBook X ZHAN99 G1i 16 inch - Intel Ultra9 285H - 64GB/2T - NVIDIA GeForce RTX 5060 Laptop GPU - Mobile Workstation PC)

While the FMAT is an innovative method for the computational intelligent analysis of psychology and society, you may also want an integrative toolbox for other text-analytic methods. Another R package I developed, PsychWordVec, is useful and user-friendly for word embedding analysis (e.g., the Word Embedding Association Test, WEAT). Please refer to its documentation and feel free to use it.