sciwing.numericalizer
numericalizer
-
class sciwing.numericalizers.numericalizer.Numericalizer(vocabulary: sciwing.vocab.vocab.Vocab = None)
Bases: sciwing.numericalizers.base_numericalizer.BaseNumericalizer
-
__init__(vocabulary: sciwing.vocab.vocab.Vocab = None)
Converts string tokens to integer indices using a vocabulary.
Parameters: vocabulary (Vocab) – A vocabulary object built from a set of tokenized strings
-
get_mask_for_batch_instances(instances: List[List[int]])
Return type: not rendered (the annotated type was mocked during the documentation build)
-
get_mask_for_instance(instance: List[int])
Return type: not rendered (the annotated type was mocked during the documentation build)
-
numericalize_batch_instances(instances: List[List[str]]) → List[List[int]]
Numericalizes a batch of instances
Parameters: instances (List[List[str]]) – A list of tokenized sentences
Returns: A list of numericalized instances
Return type: List[List[int]]
-
numericalize_instance(instance: List[str]) → List[int]
Numericalizes a single instance
Parameters: instance (List[str]) – A list of tokens
Returns: Numericalized instance
Return type: List[int]
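The idea behind the two numericalize methods can be sketched without the library. The snippet below is a minimal illustration, not SciWING's implementation: the `TOKEN2IDX` mapping, the special-token names, their indices, and the fallback to an unknown-token index are all assumptions made for this example, and a plain dictionary stands in for the `Vocab` object.

```python
from typing import Dict, List

# Hypothetical stand-in for a vocabulary: a plain token-to-index mapping.
# The special tokens and their indices are assumptions made for this sketch.
TOKEN2IDX: Dict[str, int] = {
    "<PAD>": 0,
    "<UNK>": 1,
    "<SOS>": 2,
    "<EOS>": 3,
    "neural": 4,
    "networks": 5,
}


def numericalize_instance(instance: List[str], token2idx: Dict[str, int]) -> List[int]:
    """Map each token to its index; unseen tokens get the unknown-token index."""
    unk_idx = token2idx["<UNK>"]
    return [token2idx.get(token, unk_idx) for token in instance]


def numericalize_batch_instances(
    instances: List[List[str]], token2idx: Dict[str, int]
) -> List[List[int]]:
    """Numericalize every instance in the batch independently."""
    return [numericalize_instance(instance, token2idx) for instance in instances]


single = numericalize_instance(["neural", "networks", "rock"], TOKEN2IDX)
batch = numericalize_batch_instances([["neural"], ["networks", "neural"]], TOKEN2IDX)
print(single)  # [4, 5, 1]
print(batch)   # [[4], [5, 4]]
```

Note that the instances in a numericalized batch can still differ in length; padding is a separate step.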
-
pad_batch_instances(instances: List[List[int]], max_length: int, add_start_end_token: bool = True) → List[List[int]]
Pads a batch of instances according to the vocab object
Parameters: - instances (List[List[int]]) – A batch of numericalized instances
- max_length (int) – The maximum length to pad to
- add_start_end_token (bool) – If true, start and end tokens will be added to each instance
Returns: Padded instances
Return type: List[List[int]]
-
pad_instance(numericalized_text: List[int], max_length: int, add_start_end_token: bool = True) → List[int]
Pads the instance according to the vocab object
Parameters: - numericalized_text (List[int]) – The numericalized instance to pad
- max_length (int) – The maximum length to pad to
- add_start_end_token (bool) – If true, start and end tokens will be added to the instance
Returns: Padded instance
Return type: List[int]
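A rough sketch of what padding with optional start and end tokens might look like is given below. It is not the library's code. Several details are assumptions of this example rather than documented behavior: the special-token indices, the choice that `max_length` counts the start and end tokens, truncation of overlong instances, and padding on the right.

```python
from typing import List

# Assumed special-token indices; a real vocabulary object would supply these.
PAD_IDX, START_IDX, END_IDX = 0, 2, 3


def pad_instance(
    numericalized_text: List[int], max_length: int, add_start_end_token: bool = True
) -> List[int]:
    """Truncate or right-pad an instance to exactly max_length entries.

    Assumption of this sketch: max_length includes the start and end tokens.
    """
    if add_start_end_token:
        if max_length < 2:
            raise ValueError("max_length must be at least 2 to fit start and end tokens")
        # Reserve two slots for the start and end tokens.
        body = numericalized_text[: max_length - 2]
        padded = [START_IDX] + body + [END_IDX]
    else:
        padded = numericalized_text[:max_length]
    # Fill the remaining positions with the padding index.
    return padded + [PAD_IDX] * (max_length - len(padded))


def pad_batch_instances(
    instances: List[List[int]], max_length: int, add_start_end_token: bool = True
) -> List[List[int]]:
    """Pad every instance in the batch to the same length."""
    return [pad_instance(i, max_length, add_start_end_token) for i in instances]


print(pad_instance([4, 5], 6))                          # [2, 4, 5, 3, 0, 0]
print(pad_instance([4, 5, 6, 7, 8], 4))                 # [2, 4, 5, 3]
print(pad_instance([4, 5], 4, False))                   # [4, 5, 0, 0]
print(pad_batch_instances([[4], [4, 5, 6]], 5))         # [[2, 4, 3, 0, 0], [2, 4, 5, 6, 3]]
```

Because every padded instance has the same length, the batch can be stacked into a rectangular array or tensor afterwards.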
-
vocabulary
-
transformer_numericalizer
-
class sciwing.numericalizers.transformer_numericalizer.NumericalizerForTransformer(vocab: sciwing.vocab.vocab.Vocab = None, tokenizer: sciwing.tokenizers.bert_tokenizer.TokenizerForBert = None)
Bases: sciwing.numericalizers.base_numericalizer.BaseNumericalizer
-
get_mask_for_batch_instances(instances: List[List[int]])
-
get_mask_for_instance(instance: List[int])
-
numericalize_batch_instances(instances: List[List[str]]) → List[int]
-
numericalize_instance(instance: Union[List[str], List[sciwing.data.token.Token]]) → List[int]
-
pad_batch_instances(instances: List[List[int]], max_length: int, add_start_end_token: bool = True)
Pads a batch of instances according to the vocab object
Parameters: - instances (List[List[int]]) – A batch of numericalized instances
- max_length (int) – The maximum length to pad to
- add_start_end_token (bool) – If true, start and end tokens will be added to each instance
Returns: Padded instances
Return type: List[List[int]]
-
pad_instance(numericalized_text: List[int], max_length: int, add_start_end_token: bool = True) → List[int]
Pads the instance according to the vocab object
Parameters: - numericalized_text (List[int]) – The numericalized instance to pad
- max_length (int) – The maximum length to pad to
- add_start_end_token (bool) – If true, start and end tokens will be added to the instance
Returns: Padded instance
Return type: List[int]
-