Type: Package
Title: Construct Consistent Time Series from Textual Data
Version: 0.1.3
Date: 2023-11-27
Description: A rolling version of the Latent Dirichlet Allocation, see Rieger et al. (2021) <doi:10.18653/v1/2021.findings-emnlp.201>. By a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of LDA models. After an initial modeling, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks.
URL: https://github.com/JonasRieger/rollinglda
BugReports: https://github.com/JonasRieger/rollinglda/issues
License: GPL (≥ 3)
Encoding: UTF-8
Depends: R (≥ 4.0.0), ldaPrototype (≥ 0.3.0)
Imports: checkmate (≥ 1.8.5), data.table (≥ 1.11.2), lubridate, stats, tosca (≥ 0.2-0), utils
Suggests: covr, testthat
RoxygenNote: 7.2.0
LazyData: true
NeedsCompilation: yes
Packaged: 2023-11-27 12:26:31 UTC; riege
Author: Jonas Rieger
Maintainer: Jonas Rieger <jonas.rieger@tu-dortmund.de>
Repository: CRAN
Date/Publication: 2023-11-28 07:00:05 UTC
rollinglda: Construct Consistent Time Series from Textual Data
Description
RollingLDA is a rolling version of the Latent Dirichlet
Allocation (LDA). By a sequential approach, it enables the construction of
LDA-based time series of topics that are consistent with previous states of
LDA models. After an initial modeling, updates can be computed efficiently,
allowing for real-time monitoring and detection of events or structural breaks.
For bug reports and feature requests please use the issue tracker:
https://github.com/JonasRieger/rollinglda/issues. Also have a look at
the (detailed) example at https://github.com/JonasRieger/rollinglda.
Data
economy
Example Dataset (576 articles from Wikinews) for testing.
Constructor
as.RollingLDA
RollingLDA objects used in this package.
Getter
getChunks
Getter for RollingLDA objects.
Modeling
RollingLDA
Performing the method from scratch.
updateRollingLDA
Performing updates on RollingLDA objects.
Author(s)
Maintainer: Jonas Rieger <jonas.rieger@tu-dortmund.de>
References
Rieger, Jonas, Carsten Jentsch and Jörg Rahnenführer (2021). "RollingLDA: An Update Algorithm of Latent Dirichlet Allocation to Construct Consistent Time Series from Textual Data". EMNLP Findings 2021. doi:10.18653/v1/2021.findings-emnlp.201.
See Also
Useful links:
Report bugs at https://github.com/JonasRieger/rollinglda/issues
RollingLDA
Description
Performs a rolling version of Latent Dirichlet Allocation.
Usage
RollingLDA(...)
## Default S3 method:
RollingLDA(
  texts,
  dates,
  chunks,
  memory,
  vocab.abs = 5L,
  vocab.rel = 0,
  vocab.fallback = 100L,
  doc.abs = 0L,
  memory.fallback = 0L,
  init,
  type = c("ldaprototype", "lda"),
  id,
  ...
)
Arguments
... |
additional arguments passed to |
texts |
[ |
dates |
[ |
chunks |
[ |
memory |
[ |
vocab.abs |
[ |
vocab.rel |
[0,1] |
vocab.fallback |
[ |
doc.abs |
[ |
memory.fallback |
[ |
init |
[ |
type |
[ |
id |
[ |
Details
The function first computes an initial LDA model (using LDARep or LDAPrototype).
Afterwards, it models temporal chunks of texts, using a specified memory to initialize each chunk's model.
The function returns a RollingLDA object. You can retrieve results and all other elements of this object with getter functions (see getChunks).
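The interplay of initial period, chunks and memory can be pictured with a few lines of base R. The following is only a sketch on made-up weekly dates, not the package's implementation: it assumes that the initial model covers all texts up to a date called init_end here, that quarterly chunks start the day after, and that each chunk's memory reaches back three quarters from its start. The names init_end, chunk_start, chunk_end, memory_start and plan are hypothetical.

```r
# Toy corpus dates: one document per week (made-up data, not the economy dataset).
dates <- seq(as.Date("2007-01-01"), as.Date("2008-12-29"), by = "week")

# Assumption: texts dated up to and including init_end form the initial model.
init_end <- as.Date("2007-07-03")

# Quarterly chunks, starting the day after the initial period.
chunk_start <- seq(init_end + 1, max(dates), by = "quarter")
chunk_end <- c(chunk_start[-1] - 1, max(dates))

# Memory window: the three quarters preceding each chunk start.
memory_start <- do.call(c, lapply(chunk_start, function(d) {
  seq(d, by = "-3 quarter", length.out = 2)[2]
}))

# Count the documents that fall into each chunk and into its memory window.
idx <- seq_along(chunk_start)
plan <- data.frame(
  chunk_start, chunk_end, memory_start,
  n_new = vapply(idx, function(i) {
    sum(dates >= chunk_start[i] & dates <= chunk_end[i])
  }, integer(1)),
  n_memory = vapply(idx, function(i) {
    sum(dates >= memory_start[i] & dates < chunk_start[i])
  }, integer(1))
)
print(plan)
```

Each row of plan describes one modeling step under these assumptions: n_new documents are newly modeled, while the n_memory documents preceding the chunk would be available for initialization.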
Value
[named list] with entries
id      [character(1)] See above.
lda     LDA object of the fitted RollingLDA.
docs    [named list] with modeled texts in a preprocessed format. See LDAprep.
dates   [named Date] with dates of the modeled texts.
vocab   [character] with the vocabularies considered for modeling.
chunks  [data.table] with specifications for each model chunk.
param   [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation.
See Also
Other RollingLDA functions: as.RollingLDA(), getChunks(), updateRollingLDA()
Examples
# Variant with type = "lda".
roll_lda = RollingLDA(texts = economy_texts,
                      dates = economy_dates,
                      chunks = "quarter",
                      memory = "3 quarter",
                      init = "2008-07-03",
                      K = 10,
                      type = "lda")

# Print the object, then extract the chunk table and the fitted LDA.
roll_lda
getChunks(roll_lda)
getLDA(roll_lda)

# Variant with the default type ("ldaprototype");
# K, n, pm.backend and ncpus are passed on via "...".
roll_proto = RollingLDA(texts = economy_texts,
                        dates = economy_dates,
                        chunks = "quarter",
                        memory = "3 quarter",
                        init = "2007-07-03",
                        K = 10,
                        n = 12,
                        pm.backend = "socket",
                        ncpus = 2)

roll_proto
getChunks(roll_proto)
getLDA(roll_proto)
RollingLDA Object
Description
Constructor for RollingLDA objects used in this package.
The function is useful for creating a RollingLDA object from a standard LDA object, so that it can serve as the initial model and be updated with updateRollingLDA.
Usage
as.RollingLDA(x, id, lda, docs, dates, vocab, chunks, param)
is.RollingLDA(obj, verbose = FALSE)
Arguments
x |
[ |
id |
[ |
lda |
[ |
docs |
[ |
dates |
[ |
vocab |
[ |
chunks |
[
If not passed, |
param |
[ |
obj |
[ |
verbose |
[ |
Details
If you call as.RollingLDA on an object x that already has the structure of a RollingLDA object (in particular, a RollingLDA object itself), the additional arguments (id, param, ...) may be used to override the corresponding elements.
Value
[named list] RollingLDA object.
See Also
Other RollingLDA functions: RollingLDA(), getChunks(), updateRollingLDA()
Examples
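# Fit a RollingLDA object to work with.
roll_lda = RollingLDA(texts = economy_texts,
                      dates = economy_dates,
                      chunks = "quarter",
                      memory = "3 quarter",
                      init = "2008-07-03",
                      K = 10,
                      type = "lda")

# Check that the object has the expected structure.
is.RollingLDA(roll_lda, verbose = TRUE)

# Override the id of an existing RollingLDA object.
getID(roll_lda)
roll_lda = as.RollingLDA(roll_lda, id = "newID")
getID(roll_lda)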
A Snippet of the Economy Dataset from toscaData
Description
Example dataset from Wikinews consisting of 576 articles. It can be used to become familiar with the functions offered by this package.
Usage
data(economy_texts)
data(economy_dates)
Format
economy_texts is a named list of tokenized texts of length 576.
economy_dates is an object of class Date of length 576.
Source
https://github.com/Docma-TU/toscaData
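To make the expected input shape concrete, the sketch below builds a tiny made-up stand-in with the same structure as described under Format: a named list of tokenized texts and a Date vector of the same length. The three documents, their names and the checks are illustrative assumptions; in particular, that the dates carry the same names as the texts and are sorted is assumed here rather than taken from the dataset.

```r
# Made-up stand-ins for economy_texts / economy_dates (three tiny documents).
texts <- list(
  doc1 = c("bank", "rate", "cut"),
  doc2 = c("market", "growth"),
  doc3 = c("oil", "price", "rise", "market")
)
dates <- as.Date(c("2008-01-15", "2008-02-03", "2008-02-20"))
names(dates) <- names(texts)

# Structural checks: named list of character vectors, aligned and sorted dates.
stopifnot(
  is.list(texts), !is.null(names(texts)),
  all(vapply(texts, is.character, logical(1))),
  inherits(dates, "Date"),
  identical(names(dates), names(texts)),
  all(diff(as.numeric(dates)) >= 0)
)

# Number of tokens per document.
token_counts <- lengths(texts)
print(token_counts)

# Subsetting texts by date, in the same way the package examples do.
early <- texts[dates < as.Date("2008-02-01")]
print(names(early))
```

The same checks can be applied to economy_texts and economy_dates after loading them with data().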
Getter for RollingLDA
Description
Returns the corresponding element of a RollingLDA object.
Usage
getChunks(x)
getNames(x)
getDates(x, names, inverse)
getDocs(x, names, inverse)
getVocab(x)
## S3 method for class 'RollingLDA'
getLDA(x, job, reduce, all)
## S3 method for class 'RollingLDA'
getID(x)
## S3 method for class 'RollingLDA'
getParam(x)
Arguments
x |
[ |
names |
[ |
inverse |
[ |
job |
not implemented for |
reduce |
not implemented for |
all |
not implemented for |
Value
The requested element of a RollingLDA object.
See Also
Other RollingLDA functions: RollingLDA(), as.RollingLDA(), updateRollingLDA()
Updating an existing RollingLDA object
Description
Performs an update of an existing object consisting of a rolling version of Latent Dirichlet Allocation.
Usage
updateRollingLDA(
  x,
  texts,
  dates,
  chunks,
  memory,
  param = getParam(x),
  compute.topics = TRUE,
  memory.fallback = 0L,
  ...
)
## S3 method for class 'RollingLDA'
RollingLDA(
  x,
  texts,
  dates,
  chunks,
  memory,
  param = getParam(x),
  compute.topics = TRUE,
  memory.fallback = 0L,
  ...
)
Arguments
x |
[ |
texts |
[ |
dates |
[ |
chunks |
[ |
memory |
[ |
param |
[
|
compute.topics |
[ |
memory.fallback |
[ |
... |
not implemented |
Details
The function takes an existing RollingLDA object and models new texts, using a specified memory to initialize the new LDA chunk.
The function returns a RollingLDA object. You can retrieve results and all other elements of this object with getter functions (see getChunks).
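For monitoring scenarios, new texts typically arrive in batches. The sketch below, written in base R on made-up dates, shows one way to separate an initial portion from later documents and to group the latter by calendar month, so that each group could be handed to one update call. The names cutoff, initial_ids and batches are hypothetical, and the grouping by month is an assumption for illustration, not something the package requires.

```r
# Made-up document dates; the names act as document identifiers.
dates <- as.Date("2008-03-01") + c(0, 9, 20, 33, 41, 47, 62, 75, 90, 104)
names(dates) <- paste0("doc", seq_along(dates))

# Documents before the cutoff would go into the initial model ...
cutoff <- as.Date("2008-05-01")
initial_ids <- names(dates)[dates < cutoff]

# ... and the remainder is grouped by calendar month, one batch per update.
later <- dates[dates >= cutoff]
batches <- split(names(later), format(later, "%Y-%m"))

# In practice, each batch (with its texts and dates) would be passed to
# updateRollingLDA(); here only the batch sizes are reported.
for (month in names(batches)) {
  cat(month, ":", length(batches[[month]]), "new documents\n")
}
```

Because the initial portion and the batches are disjoint, every document is modeled exactly once across the initial fit and the subsequent updates.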
Value
[named list] with entries
id      [character(1)] See above.
lda     LDA object of the fitted RollingLDA.
docs    [named list] with modeled texts in a preprocessed format. See LDAprep.
dates   [named Date] with dates of the modeled texts.
vocab   [character] with the vocabularies considered for modeling.
chunks  [data.table] with specifications for each model chunk.
param   [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation.
See Also
Other RollingLDA functions: RollingLDA(), as.RollingLDA(), getChunks()
Examples
# Fit an initial RollingLDA on the texts dated before 2008-05-01.
roll_lda = RollingLDA(texts = economy_texts[economy_dates < "2008-05-01"],
                      dates = economy_dates[economy_dates < "2008-05-01"],
                      chunks = "month",
                      memory = "month",
                      init = 100,
                      K = 10,
                      type = "lda")

# RollingLDA behaves like updateRollingLDA if its first argument is a RollingLDA object.
# Update the model with the remaining texts.
roll_update = RollingLDA(roll_lda,
                         texts = economy_texts[economy_dates >= "2008-05-01"],
                         dates = economy_dates[economy_dates >= "2008-05-01"],
                         chunks = "month",
                         memory = "month")

roll_update
getChunks(roll_update)