Numerous good packages, such as quanteda, tm, and topicmodels, exist for representing and analysing text as document-term matrices. However, the bag-of-words assumption makes it difficult to use the word-order and syntactic features required for, e.g., semantic network analysis and context-aware dictionaries. Moreover, dropping the word order makes it difficult to relate findings back to the original documents.
We present corpustools, an R package that overcomes these limitations by storing text as indexed lists of tokens (words) rather than as a document-term matrix. This representation allows for better search and network analysis, as well as for inspecting LDA results. Integration with tm and quanteda is provided.
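
To make the contrast between the two representations concrete, the following is a minimal sketch in Python. It is not the corpustools API: the function names (`to_token_list`, `to_dtm`, `build_index`, `proximity_search`, `keyword_in_context`) and the two toy documents are hypothetical and serve only to illustrate the idea. It assumes simple whitespace tokenisation and shows that a positional token list supports proximity queries and lookups in the original text, whereas a document-term matrix only records counts.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical example data).
docs = {
    "d1": "the bank raised the interest rate",
    "d2": "the rate of interest in the river bank survey",
}

def to_token_list(docs):
    """Return rows of (doc_id, position, token); word order is preserved."""
    rows = []
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            rows.append((doc_id, position, token))
    return rows

def to_dtm(docs):
    """Return {doc_id: Counter(token -> count)}; word order is discarded."""
    return {d: Counter(text.lower().split()) for d, text in docs.items()}

def build_index(rows):
    """Map each token to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, position, token in rows:
        index[token].append((doc_id, position))
    return index

def proximity_search(index, a, b, window):
    """Return sorted doc ids in which a and b occur within `window` tokens."""
    hits = set()
    for doc_a, pos_a in index.get(a, []):
        for doc_b, pos_b in index.get(b, []):
            if doc_a == doc_b and abs(pos_a - pos_b) <= window:
                hits.add(doc_a)
    return sorted(hits)

def keyword_in_context(rows, doc_id, position, width=2):
    """Return the tokens around a position, i.e. the original context."""
    return " ".join(
        tok for d, p, tok in rows
        if d == doc_id and position - width <= p <= position + width
    )

rows = to_token_list(docs)
index = build_index(rows)
dtm = to_dtm(docs)

# Both documents contain "interest" and "rate" once, so the count matrix
# cannot tell them apart; the positional index can.
print(dtm["d1"]["interest"], dtm["d2"]["interest"])
print(proximity_search(index, "interest", "rate", window=1))  # ['d1']
print(keyword_in_context(rows, "d2", 7))
```

In this sketch, the proximity query matches only the document where "interest" and "rate" are adjacent, and `keyword_in_context` recovers the surrounding words of a hit, which is the kind of link back to the original document that a bag-of-words representation cannot provide.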