sciwing.datasets.classification

base_text_classification
class sciwing.datasets.classification.base_text_classification.BaseTextClassification(filename: str, tokenizers: Dict[str, sciwing.tokenizers.BaseTokenizer.BaseTokenizer])

    Bases: object

    __init__(filename: str, tokenizers: Dict[str, sciwing.tokenizers.BaseTokenizer.BaseTokenizer])

        Base text classification dataset, to be inherited by all text classification datasets.

        Parameters:
            - filename (str) – Full path of the file where the classification dataset is stored
            - tokenizers (Dict[str, BaseTokenizer]) – A mapping between a namespace and a tokenizer
    get_lines_labels() -> (List[Line], List[Label])

        Returns a list of lines from the file and a list of corresponding labels.

        This method is to be implemented by every new dataset; the implementation logic is left to the subclass, since datasets come in all shapes and sizes.

        Returns: A list of text examples and the corresponding labels
        Return type: (List[Line], List[Label])
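The contract above can be sketched in plain Python. Note this is a hypothetical, self-contained analogue for illustration only: the classes and the tab-separated file format below are assumptions, not the actual sciwing implementation (the real return values are sciwing Line and Label objects).

```python
from typing import Dict, List, Tuple


class BaseTextClassificationSketch:
    """Structural analogue of BaseTextClassification (illustration only)."""

    def __init__(self, filename: str, tokenizers: Dict[str, object]):
        self.filename = filename
        self.tokenizers = tokenizers

    def get_lines_labels(self) -> Tuple[List[str], List[str]]:
        # Every concrete dataset decides how to read its own file format
        raise NotImplementedError


class TabSeparatedDataset(BaseTextClassificationSketch):
    """Hypothetical dataset where every row is `text<TAB>label`."""

    def get_lines_labels(self) -> Tuple[List[str], List[str]]:
        lines, labels = [], []
        with open(self.filename, encoding="utf-8") as fp:
            for row in fp:
                # rpartition splits on the last tab, so the text may contain tabs
                text, _, label = row.rstrip("\n").rpartition("\t")
                lines.append(text)
                labels.append(label)
        return lines, labels
```

A new dataset therefore only needs to override get_lines_labels; the base class never prescribes a file format.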
text_classification_dataset

class sciwing.datasets.classification.text_classification_dataset.TextClassificationDataset(filename: str, tokenizers: Dict[str, sciwing.tokenizers.BaseTokenizer.BaseTokenizer] = <sciwing.tokenizers.word_tokenizer.WordTokenizer object>)

    Bases: sciwing.datasets.classification.base_text_classification.BaseTextClassification

    This represents a dataset of the form

        line1###label1
        line2###label2
        line3###label3
        ...
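The ###-separated format above can be parsed in a few lines of plain Python. This is a sketch of the splitting logic only, not the library's actual implementation (which wraps the results in sciwing Line and Label objects):

```python
from typing import List, Tuple


def parse_classification_file(filename: str) -> Tuple[List[str], List[str]]:
    """Split each `text###label` row into its text and label parts."""
    lines: List[str] = []
    labels: List[str] = []
    with open(filename, encoding="utf-8") as fp:
        for row in fp:
            row = row.rstrip("\n")
            if not row:
                continue  # skip blank lines
            # rpartition splits on the LAST ###, so the text itself
            # may safely contain the separator
            text, _, label = row.rpartition("###")
            lines.append(text)
            labels.append(label)
    return lines, labels
```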
    get_lines_labels() -> (List[Line], List[Label])

        Returns a list of lines from the file and a list of corresponding labels.

        This method is to be implemented by every new dataset; the implementation logic is left to the subclass, since datasets come in all shapes and sizes.

        Returns: A list of text examples and the corresponding labels
        Return type: (List[Line], List[Label])
    labels

    lines
class sciwing.datasets.classification.text_classification_dataset.TextClassificationDatasetManager(train_filename: str, dev_filename: str, test_filename: str, tokenizers: Dict[str, sciwing.tokenizers.BaseTokenizer.BaseTokenizer] = None, namespace_vocab_options: Dict[str, Dict[str, Any]] = None, namespace_numericalizer_map: Dict[str, sciwing.numericalizers.base_numericalizer.BaseNumericalizer] = None, batch_size: int = 10)

    Bases: sciwing.data.datasets_manager.DatasetsManager, sciwing.utils.class_nursery.ClassNursery
    __init__(train_filename: str, dev_filename: str, test_filename: str, tokenizers: Dict[str, sciwing.tokenizers.BaseTokenizer.BaseTokenizer] = None, namespace_vocab_options: Dict[str, Dict[str, Any]] = None, namespace_numericalizer_map: Dict[str, sciwing.numericalizers.base_numericalizer.BaseNumericalizer] = None, batch_size: int = 10)

        Parameters:
            - train_filename (str) – The path where the train file is stored
            - dev_filename (str) – The path where the dev file is stored
            - test_filename (str) – The path where the test file is stored
            - tokenizers (Dict[str, BaseTokenizer]) – A mapping from a namespace to its tokenizer
            - namespace_vocab_options (Dict[str, Dict[str, Any]]) – A mapping from a namespace to its vocabulary options
            - namespace_numericalizer_map (Dict[str, BaseNumericalizer]) – Every namespace can have a different numericalizer specified
            - batch_size (int) – The batch size of the data returned
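The batch_size parameter controls how many examples each retrieved batch contains. The grouping itself can be sketched in plain Python; this is a simplified, hypothetical stand-in for illustration, not the DatasetsManager implementation:

```python
from typing import Iterator, List, Tuple


def batch_examples(
    examples: List[Tuple[str, str]], batch_size: int = 10
) -> Iterator[List[Tuple[str, str]]]:
    """Yield (text, label) pairs in groups of at most `batch_size`.

    The final batch may be smaller when the dataset size is not an
    exact multiple of `batch_size`.
    """
    for start in range(0, len(examples), batch_size):
        yield examples[start : start + batch_size]
```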