Type: Package
Title: Import Human Gene Nomenclature
Version: 0.3.0
Description: A set of routines to quickly download and import the HUGO Gene Nomenclature Committee (HGNC) data set on mapping of gene symbols to gene entries in other genomic databases or resources.
License: MIT + file LICENSE
URL: https://github.com/patterninstitute/hgnc, https://www.pattern.institute/hgnc/
BugReports: https://github.com/patterninstitute/hgnc/issues
Encoding: UTF-8
RoxygenNote: 7.3.2
Depends: R (>= 4.2.0)
Imports: cli, dplyr, httr2, memoise, prettyunits, purrr, readr, stringr, tibble
Suggests: lubridate, spelling, tidyr
Language: en-US
Config/Needs/website: patterninstitute/chic, rmarkdown
NeedsCompilation: no
Packaged: 2025-06-18 00:08:18 UTC; rmagno
Author: Ramiro Magno [aut, cre] (ORCID), Isabel Duarte [aut] (ORCID), Jacob Munro [aut] (ORCID), Ana-Teresa Maia [ctb] (ORCID), Pattern Institute [cph, fnd] (ROR)
Maintainer: Ramiro Magno <rmagno@pattern.institute>
Repository: CRAN
Date/Publication: 2025-06-18 02:10:02 UTC

hgnc: Import Human Gene Nomenclature

Description

A set of routines to quickly download and import the HUGO Gene Nomenclature Committee (HGNC) data set on mapping of gene symbols to gene entries in other genomic databases or resources.

Author(s)

Maintainer: Ramiro Magno rmagno@pattern.institute (ORCID)

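Authors:

Isabel Duarte (ORCID)
Jacob Munro (ORCID)

Other contributors:

Ana-Teresa Maia (ORCID) [contributor]
Pattern Institute (ROR) [copyright holder, funder]

See Also

Useful links:

https://github.com/patterninstitute/hgnc
https://www.pattern.institute/hgnc/
Report bugs at https://github.com/patterninstitute/hgnc/issues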


Convert an HGNC value to another

Description

crosswalk() will convert values found in one of the columns of an HGNC gene data set to values in another.

Usage

crosswalk(value, from, to = from, hgnc_dataset = import_hgnc_dataset())

Arguments

value

A character vector of values to be matched in the from column. These values must match once and only once in the from column values, otherwise NA is returned.

from

The name of the column in the HGNC gene data set (hgnc_dataset) where values passed in value are used as queries.

to

The name of the column whose values are to be returned, corresponding to matches in the from column.

hgnc_dataset

A data frame corresponding to an HGNC gene data set, typically obtained with import_hgnc_dataset(). For testing or offline use, you may instead use hgnc_dataset_example(), which provides a subset.

Examples

## Not run: 
# Map a gene symbol to its HGNC identifier.
crosswalk(value = "A1BG", from = "symbol", to = "hgnc_id")

# If `from` and `to` refer to the same column, `crosswalk()` will filter
# out unmatched values by converting them to `NA`.
crosswalk(value = c("A1BG", "Not a gene"), from = "symbol", to = "symbol")

# This is the default behavior, so you can simply call:
crosswalk(value = c("A1BG", "Not a gene"), from = "symbol")

## End(Not run)
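
The match-once rule described for value can be illustrated offline with base R. This is only a sketch of the documented behaviour, not the package's implementation: crosswalk_sketch() and the toy data frame are hypothetical, and the symbols and identifiers in it are made up.

```r
# Toy stand-in for an HGNC data set (values are illustrative only).
toy <- data.frame(
  symbol  = c("GENE1", "GENE2", "GENE2", "GENE3"),
  hgnc_id = c("ID:1", "ID:2", "ID:3", "ID:4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented rule: a value is converted
# only if it matches exactly once in the `from` column; otherwise NA.
crosswalk_sketch <- function(value, from, to = from, data) {
  # Number of occurrences of each query value in the `from` column.
  n_hits <- vapply(
    value,
    function(v) sum(data[[from]] == v, na.rm = TRUE),
    integer(1),
    USE.NAMES = FALSE
  )
  # Position of the first match (NA when there is none).
  idx <- match(value, data[[from]])
  # Discard queries that matched zero times or more than once.
  idx[n_hits != 1L] <- NA_integer_
  data[[to]][idx]
}

# "GENE1" matches once, "GENE2" matches twice, "Not a gene" never matches.
result <- crosswalk_sketch(
  c("GENE1", "GENE2", "Not a gene"),
  from = "symbol", to = "hgnc_id", data = toy
)
result
# "ID:1" NA NA

# With `to` left at its default, unmatched values are simply turned into NA.
same_col <- crosswalk_sketch(c("GENE3", "Not a gene"), from = "symbol", data = toy)
```

Note that the ambiguous query ("GENE2") yields NA rather than the first of its two matches, which is what "once and only once" implies.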


Download the human gene nomenclature dataset

Description

download_hgnc_dataset() gets the latest HGNC approved data set.

Usage

download_hgnc_dataset(
  url = latest_archive_url(),
  path = getwd(),
  filename = basename(url)
)

Arguments

url

A character string naming the URL of the HGNC dataset. It defaults to the latest available archive.

path

The directory path where the downloaded file is to be saved. By default, this is the current working directory, i.e. the value of getwd().

filename

A character string with the name of the saved file. By default, this is inferred from the last part of the URL.

Value

The path to the saved file.
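
To make the defaults concrete, the sketch below shows how a file name can be derived from a URL with basename(), as the default for filename does. The URL is hypothetical, and joining path and filename with file.path() is an assumption about how the destination is built, not something this page states.

```r
# Hypothetical URL, used only to illustrate the default arguments.
url <- "https://example.org/archive/monthly/tsv/hgnc_complete_set_2024-01-01.txt"

# Default `filename`: the last component of the URL.
filename <- basename(url)
filename
# "hgnc_complete_set_2024-01-01.txt"

# Assumed destination: directory and file name joined into one path.
# tempdir() is used here instead of getwd() to avoid touching the
# working directory.
destination <- file.path(tempdir(), filename)
```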


Filter genes by keyword

Description

Filter the HGNC data set by a keyword (or a regex) to be looked up in the columns containing gene names or symbols. By default, it will look up in symbol, name, alias_symbol, alias_name, prev_symbol and prev_name. Note that this function dives into list-columns for matching and returns a gene entry if at least one of the strings matches the keyword.

Usage

filter_by_keyword(
  tbl,
  keyword,
  cols = c("symbol", "name", "alias_symbol", "alias_name", "prev_symbol", "prev_name")
)

Arguments

tbl

A tibble containing the HGNC data set, typically obtained with import_hgnc_dataset().

keyword

A keyword or a regular expression to be used as search criterion.

cols

Columns to be looked up.

Value

A tibble of the HGNC data set filtered by observations matching the keyword.

Examples

## Not run: 
# Start by retrieving the HGNC data set
hgnc_tbl <- import_hgnc_dataset()

# Search for entries containing "TP53" in the HGNC data set
hgnc_tbl |>
  filter_by_keyword('TP53') |>
  dplyr::select(1:4)

# The same as above but restrict the search to the `symbol` column
hgnc_tbl |>
  filter_by_keyword('TP53', cols = 'symbol') |>
  dplyr::select(1:4)

# Match "TP53" exactly in the `symbol` column
hgnc_tbl |>
  filter_by_keyword('^TP53$', cols = 'symbol') |>
  dplyr::select(1:4)

# `filter_by_keyword()` is vectorised over `keyword`
hgnc_tbl |>
  filter_by_keyword(c('^TP53$', '^PIK3CA$'), cols = 'symbol') |>
  dplyr::select(1:4)

## End(Not run)
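
The list-column matching described above can be sketched with base R on a toy table. This is an illustration under stated assumptions, not the package's code: rows_matching() is a hypothetical helper, the gene symbols are made up, matching is assumed to be case-sensitive, and several keywords are assumed to combine as alternatives.

```r
# Toy table with one plain character column and one list-column.
toy <- data.frame(
  symbol = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)
# Each gene may have zero or more alias symbols.
toy$alias_symbol <- list(c("ABC", "XYZ1"), character(0), "XYZ2")

# Hypothetical helper: TRUE for rows where at least one string in any of
# `cols` matches at least one of the regular expressions in `keyword`.
rows_matching <- function(tbl, keyword, cols) {
  # Combine several keywords into a single alternation pattern.
  pattern <- paste0("(", keyword, ")", collapse = "|")
  hits_per_col <- lapply(cols, function(col) {
    # Works for both atomic columns and list-columns: each element `x`
    # is a character vector of length zero or more.
    vapply(
      tbl[[col]],
      function(x) any(grepl(pattern, x)),
      logical(1),
      USE.NAMES = FALSE
    )
  })
  # A row is kept if any of the inspected columns matched.
  Reduce(`|`, hits_per_col)
}

# "XYZ" is found only inside the alias list-column (rows 1 and 3).
hits <- toy[rows_matching(toy, "XYZ", cols = c("symbol", "alias_symbol")), ]

# Anchored patterns give exact matches; two keywords act as alternatives.
exact <- toy[rows_matching(toy, c("^GENE1$", "^GENE2$"), cols = "symbol"), ]
```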


Example HGNC data set

Description

hgnc_dataset_example() provides an example HGNC data set. This example contains only the first 10,000 gene entries of the HGNC data set dated 2024-07-26 03:11:20.

It is mostly used in example code, as it does not require an internet connection.

Usage

hgnc_dataset_example()

Value

A tibble whose structure is documented in import_hgnc_dataset().

Examples

hgnc_dataset_example()


Import HGNC data

Description

import_hgnc_dataset() imports HGNC data into R. To save the data to disk instead, see download_hgnc_dataset().

Usage

import_hgnc_dataset(file = latest_archive_url())

Arguments

file

A file or URL of the complete HGNC data set (in TSV format). Use list_archives() to list previous versions of these data. Pass one of the URLs (column url) to file to import that specific version. By default the value of file is the URL corresponding to the latest version, i.e. the returned value of latest_archive_url().

Value

A tibble of the HGNC data set consisting of 55 columns.

Examples

## Not run: import_hgnc_dataset()


Last update of HGNC data set

Description

This function returns the date of the most recent update of the HGNC data set.

Usage

last_update()

Value

A POSIXct date-time object.

Examples

try(last_update())

Latest HGNC archive URL

Description

Latest HGNC archive URL

Usage

latest_archive_url()

Value

A string with the latest HGNC archive URL.

Examples

try(latest_archive_url())


Latest HGNC monthly URL

Description

Latest HGNC monthly URL

Usage

latest_monthly_url()

Value

A string with the latest HGNC monthly archive URL.

Examples

try(latest_monthly_url())


Latest HGNC quarterly URL

Description

Latest HGNC quarterly URL

Usage

latest_quarterly_url()

Value

A string with the latest HGNC quarterly archive URL.

Examples

try(latest_quarterly_url())


List monthly or quarterly archives

Description

This function lists the monthly or quarterly archives, depending on release, currently available from HGNC's Google Cloud Storage.

Usage

list_archives(release = c("monthly", "quarterly"))

Arguments

release

Series type: "monthly" or "quarterly".

Value

A tibble of available archives for download. Among its columns is url, the URL of each archive, which can be passed to import_hgnc_dataset().

Examples

try(list_archives())


Check if an Element Matches Exactly Once

Description

This function checks, for each element of vector x, whether it appears exactly once in vector y.

Usage

matches_once(x, y)

Arguments

x

A vector containing the elements to match.

y

A vector in which the elements from x will be matched.

Value

A logical vector of the same length as x, where each element is TRUE if it matches exactly once in y, and FALSE otherwise.

Examples

## Not run: 
x <- c(1, 2, 3)
y <- c(1, 1, 2, 4)
matches_once(x, y)
## End(Not run)
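
For readers who want to see the expected result of the example above without the package, the documented behaviour can be reproduced in base R. The function matches_once_sketch() is a hypothetical stand-in written for illustration and is not the package's implementation.

```r
# Hypothetical base-R equivalent of the documented behaviour:
# TRUE where an element of `x` occurs exactly once in `y`.
matches_once_sketch <- function(x, y) {
  vapply(
    x,
    function(el) sum(y == el, na.rm = TRUE) == 1L,
    logical(1),
    USE.NAMES = FALSE
  )
}

x <- c(1, 2, 3)
y <- c(1, 1, 2, 4)

# 1 occurs twice in y, 2 occurs once, 3 does not occur.
once <- matches_once_sketch(x, y)
once
# FALSE TRUE FALSE
```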


Count the Number of Matches for Each Element in a Vector

Description

This function counts how many times each element of vector x matches any element in vector y.

Usage

n_match(x, y)

Arguments

x

A vector of elements to be matched.

y

A vector in which the elements from x will be matched.

Value

An integer vector of the same length as x, where each element indicates the number of times it matches in y.

Examples

## Not run: 
x <- c(1, 2, 3)
y <- c(1, 1, 2, 4)
n_match(x, y)
## End(Not run)
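
The counts returned for the example above can be checked with a short base-R sketch. n_match_sketch() is a hypothetical name, and the outer()-based approach is one possible way to obtain the documented result, not necessarily how the package does it.

```r
# Hypothetical base-R equivalent: compare every element of `y` against
# every element of `x`, then count the matches per element of `x`.
n_match_sketch <- function(x, y) {
  # Rows correspond to `y`, columns to `x`; TRUE marks an equal pair.
  hits <- outer(y, x, FUN = "==")
  as.integer(colSums(hits, na.rm = TRUE))
}

x <- c(1, 2, 3)
y <- c(1, 1, 2, 4)

counts <- n_match_sketch(x, y)
counts
# 2 1 0
```

The comparison matrix has length(y) * length(x) cells, so this form suits short vectors; for long inputs a per-element loop or a tabulation would use less memory.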