Class textmodel.TextHash

textmodel.TextHash is the class for the TextHash document model.

It is used in the samples as a way of providing information to doubletree.DoubleTree so that it can create its doubletree.Trie data structure. It is not a necessary component of DoubleTree: all that really matters is that the relevant doubletree.Trie objects get created. For example, instead of TextHash, a database query might provide the input to doubletree.Trie.

The TextHash maintains a hash of the data items where the keys are based on the distinguishing fields. The data parameters, except for baseField, should be the same as those used in the DoubleTree that will visualize the data.

"Items" are types, which are associated with token ids. Some functions expect a full item as an argument; others expect a key. #itemToIndex converts a full item to a key.
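
The difference between a full item and its key can be illustrated with a minimal sketch. It assumes items of the form token/POS/lemma with "/" as the field delimiter, and it assumes that a key is built by joining the values of the distinguishing fields. The function name sketchItemToKey is hypothetical; this is not the actual #itemToIndex implementation, and case handling is left out.

```javascript
// Hypothetical sketch: NOT the library's itemToIndex. It shows one way a key
// could be derived from a full item under the assumptions stated above.
function sketchItemToKey(item, fldNames, fldDelim, distinguishingFlds) {
  const values = item.split(fldDelim);
  // Keep only the values of the fields that determine identity.
  const keyParts = distinguishingFlds.map((name) => values[fldNames.indexOf(name)]);
  return keyParts.join(fldDelim);
}

const fldNames = ["token", "POS", "lemma"];
const distinguishing = ["token", "POS"];

// Two full items that differ only in a non-distinguishing field (lemma)
// reduce to the same key in this sketch.
const keyA = sketchItemToKey("dogs/NNS/dog", fldNames, "/", distinguishing);
const keyB = sketchItemToKey("dogs/NNS/doggo", fldNames, "/", distinguishing);
console.log(keyA, keyA === keyB);
```

Under these assumptions, functions documented as taking "a key (not a full item)" would receive the reduced form, while functions taking "a full item" would receive the original string.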

Defined in: TextHash.js.

Class Summary
Constructor Attributes	Constructor Name and Description
	textmodel.TextHash(string, caseSensitive, fldNames, fldDelim, distinguishingFldsArray, baseField, useRecords)

Field Summary
Field Attributes	Field Name and Description
	numTokens The number of tokens (readOnly)
	numTypes The number of types (readOnly)

Method Summary
Method Attributes	Method Name and Description
	containsIndex(item) is the item key in the model
	containsItem(item) is the item in the model
	fromJSON(obj) make this TextHash have the values of a (previously saved) TextHash JSON object
	getItem(item, contextLen, includeOnly, itemIsRegex, contextFilters, maxRandomHits, puncToExclude) get the information associated with an item. The information is an array of preceding items, an array of matching items, an array of following items, and an array of ids, where the preceding and following items are of length contextLen.
	getItemContext(item, contextLen, id, itemIsRegex) get a string of context around a single hit
	getItems(regex, contextLen, includeOnly, contextFilters, maxRandomHits, puncToExclude) Convenience function for textmodel.TextHash#getItem with itemIsRegex=true
	getUniqItems() get the unique item keys in the model
	getUniqItemsWithCounts() get the unique item keys in the model, each followed by tab and its token count
	itemToIndex(item) convert a full item to its key form

Class Detail

textmodel.TextHash(string, caseSensitive, fldNames, fldDelim, distinguishingFldsArray, baseField, useRecords)

Parameters:
string: the input string, where each item is separated by whitespace.
caseSensitive: whether the comparison of the baseField is case sensitive
fldNames: the names of the fields in the data items
fldDelim: the field delimiter in the data items. Note: it cannot be whitespace (e.g. tab), since whitespace is used to delimit items
distinguishingFldsArray: the fields that determine identity
baseField: the primary field for comparison and display (typically token or lemma, but also possibly part of speech)
useRecords: if true, blank lines are treated as delimiting units (records) in the text. Default is false
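
As a rough illustration of how the constructor arguments fit together, the sketch below prepares a small input and checks it before construction. The sample text, the "/" delimiter, the field names, and the assumption that fldNames and distinguishingFldsArray are arrays of strings are all illustrative choices, not taken from the library source. The constructor is only called if the textmodel global has been loaded.

```javascript
// Illustrative input: each whitespace-separated item is token/POS/lemma.
// The "/" delimiter and the field names are assumptions made for this sketch.
const input = "The/DT/the dog/NN/dog chased/VBD/chase the/DT/the cat/NN/cat";

const fldNames = ["token", "POS", "lemma"];
const fldDelim = "/"; // must not be whitespace, which separates items
const distinguishingFldsArray = ["token", "POS"];
const baseField = "token";
const caseSensitive = false;
const useRecords = false;

// Sanity check before construction: every item should have one value per field.
const malformed = input
  .split(/\s+/)
  .filter((item) => item.split(fldDelim).length !== fldNames.length);

// Construct the model only when the library is loaded and the input looks well formed.
const model =
  typeof textmodel !== "undefined" && malformed.length === 0
    ? new textmodel.TextHash(input, caseSensitive, fldNames, fldDelim,
        distinguishingFldsArray, baseField, useRecords)
    : null;

console.log("malformed items:", malformed.length);
```

Apart from baseField, the same values would then be passed to the DoubleTree that visualizes the data, as noted in the class description.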

Field Detail

{number} numTokens

The number of tokens (readOnly)

{number} numTypes

The number of types (readOnly)

Method Detail

containsIndex(item)

is the item key in the model

Parameters:
item: a key (not a full item)

Returns: true if the item key is in the model, false otherwise

containsItem(item)

is the item in the model

Parameters:
item: a full item (not a key)

Returns: true if the item is in the model, false otherwise

fromJSON(obj)

make this TextHash have the values of a (previously saved) TextHash JSON object

Parameters:
obj: the TextHash JSON object

getItem(item, contextLen, includeOnly, itemIsRegex, contextFilters, maxRandomHits, puncToExclude)

get the information associated with an item

The information is an array of preceding items, an array of matching items, an array of following items, and an array of ids, where the preceding and following items are of length contextLen.

Parameters:
item: a key (not a full item)
contextLen: the length of the preceding and following context to be returned
includeOnly: an object where the keys are the item ids to be included (optional)
itemIsRegex: true if the item parameter should be considered as a regular expression instead of as a true item
contextFilters: an object with an "include" OR an "exclude" property, each of which is an object whose keys are fields and whose values are arrays of values of those fields. For example, {"include":{"POS":["NN","NNS"]}} would include in the context only those items whose POS is NN or NNS. Similarly, "leftEnd" and "rtEnd" (both may be used together) give properties that determine the left and right end points of the context; the matching elements themselves are excluded. So {"leftEnd":{"POS":["SENT"]}, "rtEnd":{"POS":["MD"]}} would include in the left context elements up to but not including the first SENT POS, and in the right context elements up to but not including the first MD POS.
maxRandomHits: how many random hits to return. -1 or null to return all
puncToExclude: a string of punctuation to exclude from the base field (will override any punctuation allowed via "include" in contextFilters). Default is null (i.e. include all punctuation)

Returns: array of [array of prefixes, array of items, array of suffixes, array of ids]
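
The contextFilters semantics described above can be sketched for the right context as follows. The helper names matchesSpec and filterRightContext are hypothetical and are not part of textmodel.TextHash. The representation of items as plain objects, the order in which "rtEnd" and "include" are applied, and the treatment of multiple fields (a match on any listed field counts) are assumptions of this sketch.

```javascript
// Hypothetical helpers illustrating the documented contextFilters semantics;
// they are not part of textmodel.TextHash.

// True if the item has one of the listed values for any field in the spec,
// e.g. spec = {"POS": ["NN", "NNS"]}.
function matchesSpec(item, spec) {
  return Object.keys(spec).some((fld) => spec[fld].includes(item[fld]));
}

// Right context: keep items up to, but not including, the first rtEnd match,
// then apply "include" if it is present.
function filterRightContext(items, contextFilters) {
  let kept = items;
  if (contextFilters.rtEnd) {
    const end = kept.findIndex((item) => matchesSpec(item, contextFilters.rtEnd));
    if (end !== -1) kept = kept.slice(0, end);
  }
  if (contextFilters.include) {
    kept = kept.filter((item) => matchesSpec(item, contextFilters.include));
  }
  return kept;
}

// Made-up right context, already parsed into field/value objects.
const rightContext = [
  { token: "dog", POS: "NN" },
  { token: "that", POS: "WDT" },
  { token: "cats", POS: "NNS" },
  { token: "can", POS: "MD" },
  { token: "mouse", POS: "NN" },
];
const filters = { include: { POS: ["NN", "NNS"] }, rtEnd: { POS: ["MD"] } };

const result = filterRightContext(rightContext, filters).map((item) => item.token);
console.log(result.join(", ")); // prints "dog, cats"
```

Here the MD item ends the context, so "mouse" is dropped even though its POS is in the "include" list.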

getItemContext(item, contextLen, id, itemIsRegex)

get a string of context around a single hit

Parameters:
item: the item key (not a full form)
contextLen: the length of the preceding and following context to include
id: the id of the hit to return
itemIsRegex: true if the item parameter should be considered as a regular expression instead of as a true item (This should match the value in the original query)

Returns: string of context around the item with id, including the item itself

getItems(regex, contextLen, includeOnly, contextFilters, maxRandomHits, puncToExclude)

Convenience function for textmodel.TextHash#getItem with itemIsRegex=true

Parameters:
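regex: the regular expression to match (passed to textmodel.TextHash#getItem as its item parameter)
contextLen: as for textmodel.TextHash#getItem
includeOnly: as for textmodel.TextHash#getItem
contextFilters: as for textmodel.TextHash#getItem
maxRandomHits: as for textmodel.TextHash#getItem
puncToExclude: as for textmodel.TextHash#getItem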

getUniqItems()

get the unique item keys in the model

Returns: a sorted (case insensitive) array of item keys

getUniqItemsWithCounts()

get the unique item keys in the model, each followed by tab and its token count

Returns: a sorted (case insensitive) array of item keys with their token counts
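
Since each entry is described as a key followed by a tab and its token count, a caller will usually want to split the two apart. The sketch below does that for a made-up sample array standing in for the return value; parseItemsWithCounts is a hypothetical helper, not a library function.

```javascript
// Split each "key<TAB>count" entry into a [key, count] pair.
// The count is taken to be everything after the last tab.
function parseItemsWithCounts(entries) {
  return entries.map((entry) => {
    const tab = entry.lastIndexOf("\t");
    return [entry.slice(0, tab), Number(entry.slice(tab + 1))];
  });
}

// Stand-in for the array returned by getUniqItemsWithCounts(); values are made up.
const sample = ["cat/NN\t2", "dog/NN\t3", "the/DT\t5"];

// Re-sort by descending token count and keep the two most frequent keys.
const mostFrequent = parseItemsWithCounts(sample)
  .sort((a, b) => b[1] - a[1])
  .slice(0, 2);
console.log(mostFrequent);
```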

itemToIndex(item)

convert a full item to its key form

Parameters:
item: the full item

Returns: the key form of the item
