sciwing.modules

bow_encoder

class sciwing.modules.bow_encoder.BOW_Encoder(embedder=None, dropout_value: float = 0, aggregation_type='sum', device: Union[torch.device, str] = torch.device('cpu'))

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(embedder=None, dropout_value: float = 0, aggregation_type='sum', device: Union[torch.device, str] = torch.device('cpu'))

Bag of Words Encoder

Parameters:
  • embedder (nn.Module) – Any embedder that you would want to use
  • dropout_value (float) – The dropout value applied to the input embeddings
  • aggregation_type (str) –
    The strategy for aggregating word embeddings
    sum
    Aggregate word embeddings by summing them
    average
    Aggregate word embeddings by averaging them
  • device (Union[torch.device, str]) – The device where the embeddings are stored
forward(lines: List[sciwing.data.line.Line]) → torch.FloatTensor
Parameters:lines (List[Line]) – A list of lines to be encoded
Returns:The bag-of-words encoded embedding, either averaged or summed. The size is [batch_size, embedding_dimension]
Return type:torch.FloatTensor
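
Example – a minimal usage sketch. The WordEmbedder import, its embedding_type argument, and the Line construction below are assumptions for illustration; any embedder (an nn.Module) that embeds a list of Line objects can be passed instead.

    from sciwing.data.line import Line
    from sciwing.modules.bow_encoder import BOW_Encoder
    from sciwing.modules.embedders.word_embedder import WordEmbedder  # assumed embedder

    # Build an embedder and wrap it with the bag-of-words encoder
    embedder = WordEmbedder(embedding_type="glove_6B_50")  # assumed embedding type name
    encoder = BOW_Encoder(embedder=embedder, dropout_value=0.1, aggregation_type="average")

    lines = [Line(text="Bag of words encoders are simple"),
             Line(text="They sum or average word embeddings")]
    encoding = encoder(lines)  # shape: [batch_size, embedding_dimension], here [2, 50]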

charlstm_encoder

class sciwing.modules.charlstm_encoder.CharLSTMEncoder(char_embedder: torch.nn.Module, char_emb_dim: int, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', device: torch.device = torch.device('cpu'))

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(char_embedder: torch.nn.Module, char_emb_dim: int, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', device: torch.device = torch.device('cpu'))

Encodes character tokens using LSTMs

Parameters:
  • char_embedder (nn.Module) – An embedder that embeds character tokens
  • char_emb_dim (int) – The embedding dimension of the characters
  • hidden_dim (int) – Hidden dimension of the LSTM
  • bidirectional (bool) – Should the LSTM be bi-directional
  • combine_strategy (str) – Strategy for combining the LSTM hidden states
  • device (torch.device) – The device on which the LSTM will run
forward(iter_dict: Dict[str, Any])
Parameters:iter_dict (Dict[str, Any]) – Expects char_tokens to be present in the iter_dict returned by a dataset
Returns:A tensor of size [batch_size, num_time_steps, hidden_dim]. If the LSTM is bidirectional and the combine strategy is concat, then the last dimension will be 2 * hidden_dim
Return type:torch.Tensor
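
Example – a construction sketch. The char_embedder below is a hypothetical placeholder; any nn.Module that embeds character tokens into vectors of size char_emb_dim can be used, and the iter_dict must come from a dataset that provides char_tokens.

    from sciwing.modules.charlstm_encoder import CharLSTMEncoder

    # `char_embedder` is assumed to be any nn.Module that embeds character
    # tokens into vectors of size char_emb_dim (hypothetical instance here)
    char_encoder = CharLSTMEncoder(
        char_embedder=char_embedder,
        char_emb_dim=25,
        hidden_dim=100,
        bidirectional=True,
        combine_strategy="concat",
    )

    # encoding = char_encoder(iter_dict)  # iter_dict must contain "char_tokens"
    # encoding.shape -> [batch_size, num_time_steps, 2 * 100]  (bidirectional + concat)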

lstm2seqencoder

class sciwing.modules.lstm2seqencoder.Lstm2SeqEncoder(embedder: torch.nn.Module, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, num_layers: int = 1, combine_strategy: str = 'concat', rnn_bias: bool = False, device: torch.device = torch.device('cpu'), add_projection_layer: bool = True, projection_activation: str = 'Tanh')

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(embedder: torch.nn.Module, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, num_layers: int = 1, combine_strategy: str = 'concat', rnn_bias: bool = False, device: torch.device = torch.device('cpu'), add_projection_layer: bool = True, projection_activation: str = 'Tanh')

Encodes a set of tokens to a set of hidden states.

Parameters:
  • embedder (nn.Module) – Any embedder can be used for this purpose
  • dropout_value (float) – The dropout value for the embedding
  • hidden_dim (int) – The hidden dimensions for the LSTM
  • bidirectional (bool) – Whether the LSTM is bidirectional
  • num_layers (int) – The number of layers of the LSTM
  • combine_strategy (str) –

    The strategy to combine the different layers of the LSTM. This can be one of

    sum
    Sum the different layers of the embedding
    concat
    Concatenate the layers of the embedding
  • rnn_bias (bool) – Set this to false only for debugging purposes
  • device (torch.device) – The device on which the module runs
  • add_projection_layer (bool) – Adds a projection layer over the hidden activations after the LSTM
  • projection_activation (str) – The activation used for the projection layer. Refer to torch.nn activations; any activation class name can be used here
forward(lines: List[sciwing.data.line.Line], c0: torch.FloatTensor = None, h0: torch.FloatTensor = None) → torch.Tensor
Parameters:
  • lines (List[Line]) – A list of lines
  • c0 (torch.FloatTensor) – The initial state vector for the LSTM
  • h0 (torch.FloatTensor) – The initial hidden state for the LSTM
Returns:

Returns the vector encoding of the set of instances: [batch_size, seq_len, hidden_dim] if unidirectional, [batch_size, seq_len, 2*hidden_dim] if bidirectional

Return type:

torch.Tensor

get_initial_hidden(batch_size: int)
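
Example – a minimal sketch for sequence-labelling-style encoding, where every token gets a hidden state. The WordEmbedder import and its arguments are assumptions; any embedder can be substituted.

    from sciwing.data.line import Line
    from sciwing.modules.lstm2seqencoder import Lstm2SeqEncoder
    from sciwing.modules.embedders.word_embedder import WordEmbedder  # assumed embedder

    embedder = WordEmbedder(embedding_type="glove_6B_50")  # assumed embedding type name
    seq_encoder = Lstm2SeqEncoder(
        embedder=embedder,
        hidden_dim=256,
        bidirectional=True,
        combine_strategy="concat",
        add_projection_layer=False,
    )

    lines = [Line(text="Sequence encoders produce one hidden state per token")]
    encoding = seq_encoder(lines)  # [batch_size, seq_len, 2 * 256] for bidirectional + concat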

lstm2vecencoder

class sciwing.modules.lstm2vecencoder.LSTM2VecEncoder(embedder, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', rnn_bias: bool = True, device: Union[str, torch.device] = torch.device('cpu'))

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(embedder, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', rnn_bias: bool = True, device: Union[str, torch.device] = torch.device('cpu'))

LSTM2Vec encoder that encodes a series of tokens to a single vector representation

Parameters:
  • embedder (nn.Module) – Any embedder can be passed
  • dropout_value (float) – The dropout value for input embeddings
  • hidden_dim (int) – The hidden dimension for the LSTM
  • bidirectional (bool) – Whether the LSTM is bidirectional or not
  • combine_strategy (str) – Strategy to combine the vectors from two different directions
  • rnn_bias (bool) – Whether to use the bias in the RNN. Should be set to False only for debugging purposes
  • device (Union[str, torch.device]) – The device on which the model is run
forward(lines: List[sciwing.data.line.Line], c0: torch.FloatTensor = None, h0: torch.FloatTensor = None) → torch.Tensor
Parameters:
  • lines (List[Line]) – A list of lines to be encoded
  • c0 (torch.FloatTensor) – The initial state vector for the LSTM
  • h0 (torch.FloatTensor) – The initial hidden state for the LSTM
Returns:

Returns the vector encoding of the set of instances: [batch_size, hidden_dim] if unidirectional, [batch_size, 2*hidden_dim] if bidirectional

Return type:

torch.Tensor

get_initial_hidden(batch_size: int)

Gets the initial hidden states of the LSTM2Vec encoder

Parameters:batch_size (int) – The batch size of the current forward pass
Returns:The initial hidden state and cell state for the LSTM
Return type:torch.Tensor, torch.Tensor
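
Example – a minimal sketch for classification-style encoding, where each line is collapsed into a single vector. The WordEmbedder import and its arguments are assumptions; any embedder can be substituted.

    from sciwing.data.line import Line
    from sciwing.modules.lstm2vecencoder import LSTM2VecEncoder
    from sciwing.modules.embedders.word_embedder import WordEmbedder  # assumed embedder

    embedder = WordEmbedder(embedding_type="glove_6B_50")  # assumed embedding type name
    vec_encoder = LSTM2VecEncoder(
        embedder=embedder,
        hidden_dim=128,
        bidirectional=True,
        combine_strategy="concat",
    )

    lines = [Line(text="This encoder returns one vector per line")]
    encoding = vec_encoder(lines)  # [batch_size, 2 * 128] for bidirectional + concat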