text_token {ttgsea}    R Documentation

Tokenizing text

Description

Text is tokenized into n-grams, i.e. contiguous sequences of n words. The function can also limit the total number of tokens kept, as shown in the Examples below.

Usage

text_token(text, ngram_min = 1, ngram_max = 1, num_tokens)

Arguments

text

text data

ngram_min

minimum size of an n-gram (default: 1)

ngram_max

maximum size of an n-gram (default: 1)

num_tokens

maximum number of tokens

Value

token

result of tokenizing the text

ngram_min

minimum size of an n-gram

ngram_max

maximum size of an n-gram
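
The components above are returned together, presumably as a named list; a minimal sketch of inspecting them, assuming tokens was created as in the Examples below:

str(tokens$token)   # the tokenization result
tokens$ngram_min    # minimum n-gram size used
tokens$ngram_max    # maximum n-gram size used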

Author(s)

Dongmin Jung

See Also

tm::removeWords, stopwords::stopwords, textstem::lemmatize_strings, text2vec::create_vocabulary, text2vec::prune_vocabulary

Examples

library(ttgsea)
library(fgsea)
data(examplePathways)
data(exampleRanks)
# drop the leading identifier from each pathway name and
# replace underscores with spaces
names(examplePathways) <- gsub("_", " ",
                               substr(names(examplePathways), 9, 1000))
set.seed(1)
fgseaRes <- fgsea(examplePathways, exampleRanks)
# tokenize the pathway names, keeping at most 1000 tokens
tokens <- text_token(data.frame(fgseaRes)[, "pathway"],
                     num_tokens = 1000)
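
A further example, not from the original documentation: a hedged sketch combining the See Also helpers with an n-gram range, assuming that stop-word removal and lemmatization are appropriate preprocessing for the pathway names.

# illustrative preprocessing before tokenization; these steps are
# assumptions, not requirements of text_token
library(tm)
library(textstem)
pathways <- data.frame(fgseaRes)[, "pathway"]
pathways <- removeWords(pathways, stopwords::stopwords("en"))
pathways <- lemmatize_strings(pathways)
# unigrams and bigrams, again capped at 1000 tokens
bigrams <- text_token(pathways, ngram_min = 1, ngram_max = 2,
                      num_tokens = 1000)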

[Package ttgsea version 1.0.0]