sciwing.modules

bow_encoder

class sciwing.modules.bow_encoder.BOW_Encoder(embedder=None, dropout_value: float = 0, aggregation_type='sum', device: Union[torch.device, str] = torch.device('cpu'))

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(embedder=None, dropout_value: float = 0, aggregation_type='sum', device: Union[torch.device, str] = torch.device('cpu'))

Bag of Words Encoder

Parameters:
  • embedder (nn.Module) – Any embedder that you would want to use
  • dropout_value (float) – The dropout value applied to the input embeddings
  • aggregation_type (str) –
    The strategy for aggregating word embeddings
    sum
    Aggregate word embeddings by summing them
    average
    Aggregate word embeddings by averaging them
  • device (Union[torch.device, str]) – The device where the embeddings are stored
forward(lines: List[sciwing.data.line.Line]) → torch.FloatTensor
Parameters:lines (List[Line]) – A list of lines to be encoded
Returns:The bag-of-words encoded embedding, either averaged or summed. The size is [batch_size, embedding_dimension]
Return type:torch.FloatTensor
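
Example – a minimal usage sketch. The WordEmbedder import, its embedding_type argument, and the Line construction below are assumptions for illustration; any embedder (an nn.Module) that embeds a list of Line objects can be passed instead.

    from sciwing.data.line import Line
    from sciwing.modules.bow_encoder import BOW_Encoder
    from sciwing.modules.embedders.word_embedder import WordEmbedder  # assumed embedder

    # Build an embedder and wrap it with the bag-of-words encoder
    embedder = WordEmbedder(embedding_type="glove_6B_50")  # assumed embedding type name
    encoder = BOW_Encoder(embedder=embedder, dropout_value=0.1, aggregation_type="average")

    lines = [Line(text="Bag of words encoders are simple"),
             Line(text="They sum or average word embeddings")]
    encoding = encoder(lines)  # shape: [batch_size, embedding_dimension], here [2, 50]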

charlstm_encoder

class sciwing.modules.charlstm_encoder.CharLSTMEncoder(char_embedder: torch.nn.Module, char_emb_dim: int, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', device: torch.device = torch.device('cpu'))

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(char_embedder: torch.nn.Module, char_emb_dim: int, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', device: torch.device = torch.device('cpu'))

Encodes character tokens using LSTMs

Parameters:
  • char_embedder (nn.Module) – An embedder that embeds character tokens
  • char_emb_dim (int) – The embedding dimension of the characters
  • hidden_dim (int) – Hidden dimension of the LSTM
  • bidirectional (bool) – Should the LSTM be bi-directional
  • combine_strategy (str) – Strategy for combining the LSTM hidden states
  • device (torch.device) – The device on which the LSTM will run
forward(iter_dict: Dict[str, Any])
Parameters:iter_dict (Dict[str, Any]) – Expects char_tokens to be present in the iter_dict returned by a dataset
Returns:A tensor of size [batch_size, num_time_steps, hidden_dim]. If the LSTM is bidirectional and the combine strategy is concat, then the last dimension will be 2 * hidden_dim
Return type:torch.Tensor
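
Example – a construction sketch. The char_embedder below is a hypothetical placeholder; any nn.Module that embeds character tokens into vectors of size char_emb_dim can be used, and the iter_dict must come from a dataset that provides char_tokens.

    from sciwing.modules.charlstm_encoder import CharLSTMEncoder

    # `char_embedder` is assumed to be any nn.Module that embeds character
    # tokens into vectors of size char_emb_dim (hypothetical instance here)
    char_encoder = CharLSTMEncoder(
        char_embedder=char_embedder,
        char_emb_dim=25,
        hidden_dim=100,
        bidirectional=True,
        combine_strategy="concat",
    )

    # encoding = char_encoder(iter_dict)  # iter_dict must contain "char_tokens"
    # encoding.shape -> [batch_size, num_time_steps, 2 * 100]  (bidirectional + concat)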

lstm2seqencoder

class sciwing.modules.lstm2seqencoder.Lstm2SeqEncoder(embedder: torch.nn.Module, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, num_layers: int = 1, combine_strategy: str = 'concat', rnn_bias: bool = False, device: torch.device = torch.device('cpu'), add_projection_layer: bool = True, projection_activation: str = 'Tanh')

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(embedder: torch.nn.Module, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, num_layers: int = 1, combine_strategy: str = 'concat', rnn_bias: bool = False, device: torch.device = torch.device('cpu'), add_projection_layer: bool = True, projection_activation: str = 'Tanh')

Encodes a set of tokens to a set of hidden states.

Parameters:
  • embedder (nn.Module) – Any embedder can be used for this purpose
  • dropout_value (float) – The dropout value for the embedding
  • hidden_dim (int) – The hidden dimensions for the LSTM
  • bidirectional (bool) – Whether the LSTM is bidirectional
  • num_layers (int) – The number of layers of the LSTM
  • combine_strategy (str) –

    The strategy to combine the different layers of the LSTM. This can be one of

    sum
    Sum the different layers of the embedding
    concat
    Concatenate the layers of the embedding
  • rnn_bias (bool) – Set this to false only for debugging purposes
  • device (torch.device) – The device on which the module runs
  • add_projection_layer (bool) – Adds a projection layer over the hidden activations after the LSTM
  • projection_activation (str) – The activation used for the projection layer. Refer to torch.nn activations; any activation class name can be used here
forward(lines: List[sciwing.data.line.Line], c0: torch.FloatTensor = None, h0: torch.FloatTensor = None) → torch.Tensor
Parameters:
  • lines (List[Line]) – A list of lines
  • c0 (torch.FloatTensor) – The initial state vector for the LSTM
  • h0 (torch.FloatTensor) – The initial hidden state for the LSTM
Returns:

Returns the vector encoding of the set of instances: [batch_size, seq_len, hidden_dim] if unidirectional, [batch_size, seq_len, 2*hidden_dim] if bidirectional

Return type:

torch.Tensor

get_initial_hidden(batch_size: int)
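
Example – a minimal sketch for sequence-labelling-style encoding, where every token gets a hidden state. The WordEmbedder import and its arguments are assumptions; any embedder can be substituted.

    from sciwing.data.line import Line
    from sciwing.modules.lstm2seqencoder import Lstm2SeqEncoder
    from sciwing.modules.embedders.word_embedder import WordEmbedder  # assumed embedder

    embedder = WordEmbedder(embedding_type="glove_6B_50")  # assumed embedding type name
    seq_encoder = Lstm2SeqEncoder(
        embedder=embedder,
        hidden_dim=256,
        bidirectional=True,
        combine_strategy="concat",
        add_projection_layer=False,
    )

    lines = [Line(text="Sequence encoders produce one hidden state per token")]
    encoding = seq_encoder(lines)  # [batch_size, seq_len, 2 * 256] for bidirectional + concat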

lstm2vecencoder

class sciwing.modules.lstm2vecencoder.LSTM2VecEncoder(embedder, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', rnn_bias: bool = True, device: Union[str, torch.device] = torch.device('cpu'))

Bases: torch.nn.Module, sciwing.utils.class_nursery.ClassNursery

__init__(embedder, dropout_value: float = 0.0, hidden_dim: int = 1024, bidirectional: bool = False, combine_strategy: str = 'concat', rnn_bias: bool = True, device: Union[str, torch.device] = torch.device('cpu'))

LSTM2Vec encoder that encodes a series of tokens to a single vector representation

Parameters:
  • embedder (nn.Module) – Any embedder can be passed
  • dropout_value (float) – The dropout value for input embeddings
  • hidden_dim (int) – The hidden dimension for the LSTM
  • bidirectional (bool) – Whether the LSTM is bidirectional or not
  • combine_strategy (str) – Strategy to combine the vectors from two different directions
  • rnn_bias (bool) – Whether to use the bias in the RNN. Should be set to False only for debugging purposes
  • device (Union[str, torch.device]) – The device on which the model is run
forward(lines: List[sciwing.data.line.Line], c0: torch.FloatTensor = None, h0: torch.FloatTensor = None) → torch.Tensor
Parameters:
  • lines (List[Line]) – A list of lines to be encoded
  • c0 (torch.FloatTensor) – The initial state vector for the LSTM
  • h0 (torch.FloatTensor) – The initial hidden state for the LSTM
Returns:

Returns the vector encoding of the set of instances: [batch_size, hidden_dim] if unidirectional, [batch_size, 2*hidden_dim] if bidirectional

Return type:

torch.Tensor

get_initial_hidden(batch_size: int)

Gets the initial hidden states of the LSTM2Vec encoder

Parameters:batch_size (int) – The batch size of the current forward pass
Returns:The initial hidden state and cell state for the LSTM
Return type:torch.Tensor, torch.Tensor
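
Example – a minimal sketch for classification-style encoding, where each line is collapsed into a single vector. The WordEmbedder import and its arguments are assumptions; any embedder can be substituted.

    from sciwing.data.line import Line
    from sciwing.modules.lstm2vecencoder import LSTM2VecEncoder
    from sciwing.modules.embedders.word_embedder import WordEmbedder  # assumed embedder

    embedder = WordEmbedder(embedding_type="glove_6B_50")  # assumed embedding type name
    vec_encoder = LSTM2VecEncoder(
        embedder=embedder,
        hidden_dim=128,
        bidirectional=True,
        combine_strategy="concat",
    )

    lines = [Line(text="This encoder returns one vector per line")]
    encoding = vec_encoder(lines)  # [batch_size, 2 * 128] for bidirectional + concat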