Class textmodel.TextHash
This is the class for the TextHash document model.
It is used in the samples to provide information to the doubletree.DoubleTree so that it can create its doubletree.Trie data structure. TextHash is not a required component of DoubleTree; what matters is that the relevant doubletree.Trie objects get created. For example, instead of TextHash, a database query might provide the input to doubletree.Trie.
The TextHash maintains a hash of the data items whose keys are based on the distinguishing fields. The data parameters, except for baseField, should be the same as those used in the DoubleTree that will visualize the data.
"Items" are types, which are associated with token ids. Some functions expect a full item as an argument; others expect a key. itemToIndex converts a full item to a key.
Defined in: TextHash.js.
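The following is a minimal sketch of the kind of hash described above, not the code in TextHash.js. It assumes that a key is the values of the distinguishing fields joined by fldDelim, that only the baseField is case-folded when caseSensitive is false, and that a token's id is its position in the text; makeKey and buildHash are hypothetical names.

```javascript
// Hypothetical sketch, not TextHash.js: map each item's key to its token ids.
// Assumption: key = distinguishing field values joined by fldDelim.
function makeKey(item, fldNames, fldDelim, distinguishingFlds, baseField, caseSensitive) {
  const values = item.split(fldDelim);
  return distinguishingFlds
    .map((fld) => {
      const value = values[fldNames.indexOf(fld)];
      // Assumption: only the base field is case-folded.
      return fld === baseField && !caseSensitive ? value.toLowerCase() : value;
    })
    .join(fldDelim);
}

function buildHash(items, fldNames, fldDelim, distinguishingFlds, baseField, caseSensitive) {
  const hash = new Map(); // key -> array of token ids
  items.forEach((item, id) => {
    // Assumption: a token's id is its position in the text.
    const key = makeKey(item, fldNames, fldDelim, distinguishingFlds, baseField, caseSensitive);
    if (!hash.has(key)) hash.set(key, []);
    hash.get(key).push(id);
  });
  return hash;
}

// Fields token/POS delimited by "/"; identity is token + POS.
const items = ["The/DT", "dog/NN", "saw/VBD", "the/DT", "saw/NN"];
const hash = buildHash(items, ["token", "POS"], "/", ["token", "POS"], "token", false);
console.log(hash.get("the/DT")); // [ 0, 3 ]
console.log(hash.get("saw/VBD")); // [ 2 ]
console.log(hash.size); // 4
```

With these five tokens and four distinct keys, a model built this way would report 5 tokens and 4 types.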
| Constructor Attributes | Constructor Name and Description |
|---|---|
| | textmodel.TextHash(string, caseSensitive, fldNames, fldDelim, distinguishingFldsArray, baseField, useRecords) |
| Field Attributes | Field Name and Description |
|---|---|
| | The number of tokens (readOnly) |
| | The number of types (readOnly) |
| Method Attributes | Method Name and Description |
|---|---|
| | containsIndex(item): Is the item key in the model? |
| | containsItem(item): Is the item in the model? |
| | fromJSON(obj): Make this TextHash have the values of a (previously saved) TextHash JSON object. |
| | getItem(item, contextLen, includeOnly, itemIsRegex, contextFilters, maxRandomHits, puncToExclude): Get the information associated with an item: an array of preceding items, an array of matching items, and an array of following items, where the preceding and following contexts are of length contextLen. |
| | getItemContext(item, contextLen, id, itemIsRegex): Get a string of context around a single hit. |
| | getItems(regex, contextLen, includeOnly, contextFilters, maxRandomHits, puncToExclude): Convenience function for textmodel.TextHash#getItem with itemIsRegex = true. |
| | Get the unique item keys in the model. |
| | Get the unique item keys in the model, each followed by a tab and its token count. |
| | itemToIndex(item): Convert a full item to its key form. |
Class Detail

textmodel.TextHash(string, caseSensitive, fldNames, fldDelim, distinguishingFldsArray, baseField, useRecords)
- Parameters:
  - string: the input string, in which items are separated by whitespace
  - caseSensitive: whether comparison of the baseField is case sensitive
  - fldNames: the names of the fields in the data items
  - fldDelim: the field delimiter in the data items. Note: it cannot be whitespace (e.g. a tab), since whitespace is used to delimit items
  - distinguishingFldsArray: the fields that determine an item's identity
  - baseField: the primary field for comparison and display (typically the token or lemma, but possibly also the part of speech)
  - useRecords: if true, blank lines are treated as delimiters between units (records) in the text. Default is false
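To make the input format concrete, here is a sketch of how a string with these parameters could be split into items and records. parseInput is a hypothetical helper, and representing each item as an object keyed by fldNames is an assumption. Splitting on whitespace first is also why fldDelim cannot itself be whitespace.

```javascript
// Hypothetical sketch of the input format: items separated by whitespace,
// fields within an item separated by fldDelim, and (when useRecords is true)
// records separated by blank lines. Not the actual TextHash.js parser.
function parseInput(string, fldNames, fldDelim, useRecords = false) {
  const toItems = (chunk) =>
    chunk
      .split(/\s+/)
      .filter((s) => s.length > 0)
      .map((s) => {
        const values = s.split(fldDelim);
        const item = {};
        fldNames.forEach((name, i) => { item[name] = values[i]; });
        return item;
      });
  if (!useRecords) return [toItems(string)];
  // A blank line (possibly holding only spaces, tabs, or \r) ends a record.
  return string
    .split(/\n[ \t\r]*\n/)
    .map(toItems)
    .filter((record) => record.length > 0);
}

const text = "The/DT dog/NN barked/VBD ./SENT\n\nIt/PRP ran/VBD ./SENT";
const records = parseInput(text, ["token", "POS"], "/", true);
console.log(records.length); // 2
console.log(records[1][1]); // { token: 'ran', POS: 'VBD' }
```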
Method Detail

containsIndex(item)
- Parameters:
  - item: a key (not a full item)
- Returns:
  - true if the item key is in the model, false otherwise
containsItem(item)
- Parameters:
  - item: a full item (not a key)
- Returns:
  - true if the item is in the model, false otherwise
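The difference between the two checks can be sketched as follows. The key format (distinguishing fields joined by fldDelim, base field lower-cased when caseSensitive is false) is an assumption, and itemToKey is a hypothetical stand-in for itemToIndex.

```javascript
// Hypothetical sketch of keys vs. full items. Assumption: a key is the
// distinguishing field values joined by fldDelim, with the base field
// lower-cased because caseSensitive is false.
const fldNames = ["token", "POS", "lemma"];
const fldDelim = "/";
const distinguishingFlds = ["token", "POS"];
const baseField = "token";
const caseSensitive = false;

// Stand-in for itemToIndex: full item -> key.
function itemToKey(item) {
  const values = item.split(fldDelim);
  return distinguishingFlds
    .map((fld) => {
      const value = values[fldNames.indexOf(fld)];
      return fld === baseField && !caseSensitive ? value.toLowerCase() : value;
    })
    .join(fldDelim);
}

const keys = new Set(["dogs/NNS", "barked/VBD"]); // keys already in the model
const containsIndex = (key) => keys.has(key); // expects a key
const containsItem = (item) => containsIndex(itemToKey(item)); // expects a full item

console.log(containsItem("Dogs/NNS/dog")); // true: lemma ignored, token case-folded
console.log(containsIndex("Dogs/NNS/dog")); // false: a full item is not a key
console.log(containsIndex("dogs/NNS")); // true
```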
fromJSON(obj)
- Parameters:
  - obj: the TextHash JSON object
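fromJSON pairs with some way of saving a TextHash as JSON. The sketch below shows the general save/restore pattern on a hypothetical MiniHash class; TextHash's actual JSON layout is not documented here, so the `index` property and the toJSON method are assumptions.

```javascript
// Hypothetical sketch of the save/restore pattern behind fromJSON.
// MiniHash is an illustrative stand-in, not TextHash; its JSON layout is assumed.
class MiniHash {
  constructor() {
    this.index = new Map(); // key -> array of token ids
  }
  add(key, id) {
    if (!this.index.has(key)) this.index.set(key, []);
    this.index.get(key).push(id);
  }
  toJSON() {
    // A Map would serialize as {}, so store its entries explicitly.
    return { index: [...this.index] };
  }
  fromJSON(obj) {
    // Overwrite this instance's state with the saved values.
    this.index = new Map(obj.index);
    return this;
  }
}

const original = new MiniHash();
original.add("dog/NN", 1);
original.add("dog/NN", 4);
const saved = JSON.stringify(original); // JSON.stringify calls toJSON
const restored = new MiniHash().fromJSON(JSON.parse(saved));
console.log(restored.index.get("dog/NN")); // [ 1, 4 ]
```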
getItem(item, contextLen, includeOnly, itemIsRegex, contextFilters, maxRandomHits, puncToExclude)

Gets the information associated with an item: an array of preceding items, an array of matching items, and an array of following items, where the preceding and following contexts are of length contextLen.
- Parameters:
  - item: a key (not a full item)
  - contextLen: the length of the preceding and following context to be returned
  - includeOnly: an object whose keys are the ids of the items to be included (optional)
  - itemIsRegex: true if the item parameter should be treated as a regular expression rather than as a literal item
  - contextFilters: an object with either "include" or "exclude", each holding an object whose keys are fields and whose values are arrays of values of those fields. For example, {"include":{"POS":["NN","NNS"]}} includes in the context only those items whose POS is NN or NNS. Similarly, "leftEnd" and "rtEnd" (which may be used together) specify properties marking the left and right end points of the context; the end points themselves are excluded. So {"leftEnd":{"POS":["SENT"]}, "rtEnd":{"POS":["MD"]}} makes the left context extend up to, but not including, the first item with POS SENT, and the right context extend up to, but not including, the first item with POS MD.
  - maxRandomHits: how many random hits to return; -1 or null returns all hits
  - puncToExclude: a string of punctuation to exclude from the base field (this overrides any punctuation allowed via "include" in contextFilters). Default is null (i.e. include all punctuation)
- Returns:
  - an array of [array of prefixes, array of items, array of suffixes, array of ids]
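A simplified sketch of the return shape and of the leftEnd/rtEnd filters follows. getContexts and matchesAny are hypothetical; includeOnly, include/exclude, maxRandomHits, and puncToExclude are not modeled, and it is assumed that contextLen still caps the context when no end point is reached and that the returned ids are the hit ids.

```javascript
// Hypothetical sketch of the getItem result shape:
// [prefixes, items, suffixes, ids]. Tokens are objects with named fields;
// hitIds are the token ids of the matches. Only contextLen and the
// "leftEnd"/"rtEnd" filters are modeled.
function matchesAny(token, spec) {
  // spec looks like {"POS": ["SENT"]}: true if the token's value for any
  // listed field is among that field's listed values.
  return Object.entries(spec || {}).some(([fld, vals]) => vals.includes(token[fld]));
}

function getContexts(tokens, hitIds, contextLen, contextFilters = {}) {
  const prefixes = [], items = [], suffixes = [];
  for (const id of hitIds) {
    const left = [];
    for (let i = id - 1; i >= 0 && left.length < contextLen; i--) {
      if (matchesAny(tokens[i], contextFilters.leftEnd)) break; // end point excluded
      left.unshift(tokens[i]);
    }
    const right = [];
    for (let i = id + 1; i < tokens.length && right.length < contextLen; i++) {
      if (matchesAny(tokens[i], contextFilters.rtEnd)) break; // end point excluded
      right.push(tokens[i]);
    }
    prefixes.push(left);
    items.push(tokens[id]);
    suffixes.push(right);
  }
  return [prefixes, items, suffixes, hitIds.slice()];
}

const toks = "Hi/UH ./SENT the/DT dog/NN can/MD bark/VB"
  .split(" ")
  .map((s) => { const [token, POS] = s.split("/"); return { token, POS }; });
const [pre, hit, post, ids] = getContexts(toks, [3], 3, {
  leftEnd: { POS: ["SENT"] },
  rtEnd: { POS: ["MD"] },
});
console.log(pre[0].map((t) => t.token)); // [ 'the' ]
console.log(hit[0].token, post[0].length, ids); // dog 0 [ 3 ]
```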
getItemContext(item, contextLen, id, itemIsRegex)
- Parameters:
  - item: the item key (not a full item)
  - contextLen: the length of the preceding and following context to include
  - id: the id of the hit to return
  - itemIsRegex: true if the item parameter should be treated as a regular expression rather than as a literal item (this should match the value used in the original query)
- Returns:
  - a string of context around the item with the given id, including the item itself
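Roughly, the string that getItemContext returns could be thought of as in the following sketch. It assumes the context is built from the baseField values joined by single spaces; itemContextString is a hypothetical name, and unlike the real method it takes the token array directly instead of an item key.

```javascript
// Hypothetical sketch: context string around one hit (base field only),
// joined with spaces and including the hit itself.
function itemContextString(tokens, id, contextLen, baseField = "token") {
  const start = Math.max(0, id - contextLen);
  const end = Math.min(tokens.length, id + contextLen + 1);
  return tokens.slice(start, end).map((t) => t[baseField]).join(" ");
}

const toks = ["a", "big", "dog", "barked", "loudly", "today"].map((token) => ({ token }));
console.log(itemContextString(toks, 2, 2)); // a big dog barked loudly
console.log(itemContextString(toks, 5, 2)); // barked loudly today
```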
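getItems(regex, contextLen, includeOnly, contextFilters, maxRandomHits, puncToExclude)
- Parameters:
  - regex: the regular expression to match (used as the item parameter of getItem, with itemIsRegex = true)
  - contextLen, includeOnly, contextFilters, maxRandomHits, puncToExclude: as for getItem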
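The regex lookup behind getItems might resemble this sketch, which assumes the pattern is tested against item keys (unanchored, as with RegExp.prototype.test) and that the token ids of all matching keys are merged. idsMatchingRegex is a hypothetical helper.

```javascript
// Hypothetical sketch: with itemIsRegex = true, test the pattern against
// every key and merge the token ids of the matching keys.
function idsMatchingRegex(hash, pattern) {
  const re = new RegExp(pattern); // no "g" flag, so test() keeps no state
  const ids = [];
  for (const [key, tokenIds] of hash) {
    if (re.test(key)) ids.push(...tokenIds);
  }
  return ids.sort((a, b) => a - b);
}

const hash = new Map([["walk", [0, 7]], ["walked", [3]], ["talked", [5]], ["run", [9]]]);
console.log(idsMatchingRegex(hash, "^walk")); // [ 0, 3, 7 ]
console.log(idsMatchingRegex(hash, "ed$")); // [ 3, 5 ]
```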
Get the unique item keys in the model
- Returns:
  - a sorted (case-insensitive) array of item keys
Get the unique item keys in the model, each followed by a tab and its token count
- Returns:
  - a sorted (case-insensitive) array of item keys with their token counts
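A sketch of the two listings follows, assuming the hash maps each key to an array of token ids so that a key's token count is the length of its array; sortedKeys and sortedKeysWithCounts are hypothetical names.

```javascript
// Hypothetical sketch of the two listing methods: keys sorted
// case-insensitively, and the same keys each followed by a tab and its token count.
function sortedKeys(hash) {
  return [...hash.keys()].sort((a, b) => {
    const x = a.toLowerCase(), y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  });
}

function sortedKeysWithCounts(hash) {
  return sortedKeys(hash).map((key) => key + "\t" + hash.get(key).length);
}

const hash = new Map([["dog", [1, 4]], ["Cat", [0]], ["ant", [2, 3, 5]]]);
console.log(sortedKeys(hash)); // [ 'ant', 'Cat', 'dog' ]
console.log(sortedKeysWithCounts(hash)); // -> ["ant\t3", "Cat\t1", "dog\t2"]
```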
itemToIndex(item)
- Parameters:
  - item: the full item
- Returns:
  - the key form of the item