sciwing.modules.embedders¶
bert_embedder¶

class sciwing.modules.embedders.bert_embedder.BertEmbedder(datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, dropout_value: float = 0.0, aggregation_type: str = 'sum', bert_type: str = 'bert-base-uncased', word_tokens_namespace='tokens', device: Union[torch.device, str] = torch.device('cpu'))¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
__init__(datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, dropout_value: float = 0.0, aggregation_type: str = 'sum', bert_type: str = 'bert-base-uncased', word_tokens_namespace='tokens', device: Union[torch.device, str] = torch.device('cpu'))¶

BERT embedder that embeds the given instances to BERT embeddings.
Parameters:
- dropout_value (float) – The amount of dropout to be added after the embedding
- aggregation_type (str) – The strategy for aggregating representations across layers. BERT produces representations from its different layers; this specifies how to combine them. One of
  - sum – Sum the representations from all the layers
  - average – Average the representations from all the layers
- bert_type (str) – The kind of BERT embedding to be used. One of
  - bert-base-uncased – 12-layer transformer trained on a lowercased vocab
  - bert-large-uncased – 24-layer transformer trained on a lowercased vocab
  - bert-base-cased – 12-layer transformer trained on a cased vocab
  - bert-large-cased – 24-layer transformer trained on a cased vocab
  - scibert-base-cased – 12-layer transformer trained on scientific documents with a cased normal vocab
  - scibert-sci-cased – 12-layer transformer trained on scientific documents with a cased scientific vocab
  - scibert-base-uncased – 12-layer transformer trained on scientific documents with an uncased normal vocab
  - scibert-sci-uncased – 12-layer transformer trained on scientific documents with an uncased scientific vocab
- word_tokens_namespace (str) – The namespace in the lines where the tokens are stored
- device (Union[torch.device, str]) – The device on which the model is run
forward(lines: List[sciwing.data.line.Line]) → torch.Tensor¶

Parameters: lines (List[Line]) – A list of lines
Returns: The BERT embeddings for all the words in the instances. The size of the returned embedding is [batch_size, max_len_word_tokens, emb_dim]
Return type: torch.Tensor
get_embedding_dimension() → int¶
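Example (a minimal sketch, not from the original docs; it assumes that Line(text=...) tokenizes into the default "tokens" namespace)::

    from sciwing.data.line import Line
    from sciwing.modules.embedders.bert_embedder import BertEmbedder

    # Average the layer representations of SciBERT; "cpu" keeps the
    # example device-agnostic.
    embedder = BertEmbedder(
        aggregation_type="average",
        bert_type="scibert-base-cased",
        device="cpu",
    )

    lines = [
        Line(text="Word embeddings are static."),
        Line(text="BERT embeddings are contextual."),
    ]

    embedding = embedder(lines)  # calls forward()
    # expected shape: [batch_size, max_len_word_tokens, emb_dim]
    print(embedding.shape, embedder.get_embedding_dimension())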
bow_elmo_embedder¶

class sciwing.modules.embedders.bow_elmo_embedder.BowElmoEmbedder(datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, layer_aggregation: str = 'sum', device: Union[str, torch.device] = torch.device('cpu'), word_tokens_namespace='tokens')¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
__init__(datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, layer_aggregation: str = 'sum', device: Union[str, torch.device] = torch.device('cpu'), word_tokens_namespace='tokens')¶

Bag-of-words ELMo embedder which aggregates the ELMo embedding for every token.
Parameters:
- layer_aggregation (str) – One of [sum, average, last, first], which decides how to aggregate the different layers of ELMo. ELMo produces three layers of representations.
  - sum – Representations from the different layers are summed
  - average – Representations from the different layers are averaged
  - last – The representation from the last layer is used
  - first – The representation from the first layer is used
- device (Union[str, torch.device]) – The device on which the model is run
- word_tokens_namespace (str) – The namespace where all the word tokens are stored
forward(lines: List[sciwing.data.line.Line]) → torch.Tensor¶

Parameters: lines (List[Line]) – A list of lines
Returns: The representation for every token in the instance, of size [batch_size, max_num_words, emb_dim]. In the case of ELMo, emb_dim is 1024.
Return type: torch.Tensor
get_embedding_dimension() → int¶
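Example (a minimal sketch; assumes Line(text=...) with the default word tokenizer)::

    from sciwing.data.line import Line
    from sciwing.modules.embedders.bow_elmo_embedder import BowElmoEmbedder

    embedder = BowElmoEmbedder(layer_aggregation="average", device="cpu")
    lines = [Line(text="deep contextual word representations")]

    embedding = embedder(lines)
    # expected: [1, num_words, 1024], since ELMo representations are 1024-dim
    print(embedding.shape, embedder.get_embedding_dimension())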
char_embedder¶

class sciwing.modules.embedders.char_embedder.CharEmbedder(char_embedding_dimension: int, hidden_dimension: int, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace: str = 'tokens', char_tokens_namespace: str = 'char_tokens', device: Union[str, torch.device] = torch.device('cpu'))¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
__init__(char_embedding_dimension: int, hidden_dimension: int, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace: str = 'tokens', char_tokens_namespace: str = 'char_tokens', device: Union[str, torch.device] = torch.device('cpu'))¶

This is a character embedder that takes in lines and collates the character embeddings for all the tokens in the lines.
Parameters:
- char_embedding_dimension (int) – The dimension of the character embedding
- word_tokens_namespace (str) – The namespace where the words are saved
- char_tokens_namespace (str) – The namespace where the character tokens are saved
- datasets_manager (DatasetsManager) – The datasets manager that handles all the datasets
- hidden_dimension (int) – The hidden dimension of the LSTM which will be used to get character embeddings
forward(lines: List[sciwing.data.line.Line])¶

get_embedding_dimension() → int¶
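Example (a sketch only; `data_manager` stands in for a DatasetsManager built over your dataset, which CharEmbedder needs for the character vocabulary, and the dimensions are illustrative)::

    from sciwing.modules.embedders.char_embedder import CharEmbedder

    char_embedder = CharEmbedder(
        char_embedding_dimension=25,    # illustrative, not a default
        hidden_dimension=50,
        datasets_manager=data_manager,  # hypothetical DatasetsManager
    )
    embedding = char_embedder(lines)  # one character-level vector per word token
    print(char_embedder.get_embedding_dimension())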
concat_embedders¶

class sciwing.modules.embedders.concat_embedders.ConcatEmbedders(embedders: List[torch.nn.Module], datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None)¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
__init__(embedders: List[torch.nn.Module], datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None)¶

Concatenates a set of embedders into a single embedder.

Parameters: embedders (List[nn.Module]) – A list of embedders to be concatenated
forward(lines: List[sciwing.data.line.Line])¶

Parameters: lines (List[Line]) – A list of lines
Returns: The concatenated embedding, of size [batch_size, time_steps, embedding_dimension], where embedding_dimension is the dimension after concatenation
Return type: torch.FloatTensor
get_embedding_dimension()¶
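Example (a minimal sketch; the embedding_type name and the resulting dimensions are assumptions)::

    from sciwing.data.line import Line
    from sciwing.modules.embedders.word_embedder import WordEmbedder
    from sciwing.modules.embedders.bow_elmo_embedder import BowElmoEmbedder
    from sciwing.modules.embedders.concat_embedders import ConcatEmbedders

    word_embedder = WordEmbedder(embedding_type="glove_6B_100")  # assumed type name
    elmo_embedder = BowElmoEmbedder(layer_aggregation="last")
    embedder = ConcatEmbedders([word_embedder, elmo_embedder])

    lines = [Line(text="concatenate static and contextual vectors")]
    embedding = embedder(lines)
    # the final dimension is the sum of the component embedders' dimensions
    print(embedder.get_embedding_dimension())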
elmo_embedder¶

class sciwing.modules.embedders.elmo_embedder.ElmoEmbedder(dropout_value: float = 0.5, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace: str = 'tokens', device: torch.device = torch.device('cpu'), fine_tune: bool = False)¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
forward(lines: List[sciwing.data.line.Line])¶

get_embedding_dimension()¶
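Example (a minimal sketch; parameter values are illustrative)::

    from sciwing.data.line import Line
    from sciwing.modules.embedders.elmo_embedder import ElmoEmbedder

    # fine_tune=False keeps the ELMo weights frozen during training
    embedder = ElmoEmbedder(dropout_value=0.5, fine_tune=False)
    lines = [Line(text="elmo embeddings are contextual")]
    embedding = embedder(lines)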
flair_embedder¶

class sciwing.modules.embedders.flair_embedder.FlairEmbedder(embedding_type: str, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, device: Union[str, torch.device] = 'cpu', word_tokens_namespace: str = 'tokens')¶

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery, sciwing.modules.embedders.base_embedders.BaseEmbedder
__init__(embedding_type: str, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, device: Union[str, torch.device] = 'cpu', word_tokens_namespace: str = 'tokens')¶

Flair embeddings, used mainly for named entity recognition. Note: this only works if your tokens are produced by splitting on whitespace.

Parameters:
- embedding_type (str) – The type of Flair embedding to be used
- datasets_manager (DatasetsManager) – The datasets manager which is running your experiments
- device (Union[str, torch.device]) – The device on which this embedder is run
- word_tokens_namespace (str) – The namespace where the word tokens are stored
forward(lines: List[sciwing.data.line.Line])¶

get_embedding_dimension()¶
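Example (a sketch; "news" is an assumed embedding_type, and the tokens must come from whitespace splitting)::

    from sciwing.data.line import Line
    from sciwing.modules.embedders.flair_embedder import FlairEmbedder

    embedder = FlairEmbedder(embedding_type="news", device="cpu")  # assumed type
    lines = [Line(text="John lives in New York")]
    embedding = embedder(lines)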
trainable_word_embedder¶

class sciwing.modules.embedders.trainable_word_embedder.TrainableWordEmbedder(embedding_type: str, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace: str = 'tokens', device: torch.device = torch.device('cpu'))¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
__init__(embedding_type: str, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace: str = 'tokens', device: torch.device = torch.device('cpu'))¶

This represents trainable word embeddings which are trained along with the parameters of the network. The embeddings in the class WordEmbedder are not trainable; they are static.

Parameters:
- embedding_type (str) – The type of embedding that you would want
- datasets_manager (DatasetsManager) – The datasets manager which is running your experiments
- word_tokens_namespace (str) – The namespace where the word tokens are stored in your data
- device (Union[torch.device, str]) – The device on which this embedder is run
forward(lines: List[sciwing.data.line.Line]) → torch.Tensor¶

get_embedding_dimension() → int¶
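Example (a sketch; the embedding_type name is an assumption, and a DatasetsManager is needed so the embedder can build the word vocabulary)::

    from sciwing.modules.embedders.trainable_word_embedder import TrainableWordEmbedder

    embedder = TrainableWordEmbedder(
        embedding_type="glove_6B_50",   # assumed available type
        datasets_manager=data_manager,  # hypothetical DatasetsManager
    )
    embedding = embedder(lines)
    # unlike WordEmbedder, gradients update these embedding weights during training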
word_embedder¶

class sciwing.modules.embedders.word_embedder.WordEmbedder(embedding_type: str, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace='tokens', device: Union[torch.device, str] = torch.device('cpu'))¶

Bases: torch.nn.Module, sciwing.modules.embedders.base_embedders.BaseEmbedder, sciwing.utils.class_nursery.ClassNursery
__init__(embedding_type: str, datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, word_tokens_namespace='tokens', device: Union[torch.device, str] = torch.device('cpu'))¶

Word embedder that embeds the tokens using the desired embeddings. These are static embeddings.

Parameters:
- embedding_type (str) – The type of embedding that you would want
- datasets_manager (DatasetsManager) – The datasets manager which is running your experiments
- word_tokens_namespace (str) – The namespace where the word tokens are stored in your data
- device (Union[torch.device, str]) – The device on which this embedder is run
forward(lines: List[sciwing.data.line.Line]) → torch.FloatTensor¶

This will only consider the "tokens" present in the line. The namespace for the tokens is set at class instantiation.

Parameters: lines (List[Line]) – A list of lines
Returns: The embedding, of size [batch_size, max_num_timesteps, embedding_dimension]
Return type: torch.FloatTensor
get_embedding_dimension() → int¶
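Example (a minimal sketch; the embedding_type name is an assumption)::

    from sciwing.data.line import Line
    from sciwing.modules.embedders.word_embedder import WordEmbedder

    embedder = WordEmbedder(embedding_type="glove_6B_100")  # assumed type name
    lines = [Line(text="static word vectors")]
    embedding = embedder(lines)
    # expected shape: [batch_size, max_num_timesteps, embedding_dimension]
    print(embedding.shape, embedder.get_embedding_dimension())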