sciwing.models
Simple Classifier
-
class
sciwing.models.simpleclassifier.
SimpleClassifier
(encoder: torch.nn.Module, encoding_dim: int, num_classes: int, classification_layer_bias: bool = True, label_namespace: str = 'label', datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, device: Union[torch.device, str] = <sphinx.ext.autodoc.importer._MockObject object>)¶ Bases:
torch.nn.Module
,sciwing.utils.class_nursery.ClassNursery
-
__init__
(encoder: torch.nn.Module, encoding_dim: int, num_classes: int, classification_layer_bias: bool = True, label_namespace: str = 'label', datasets_manager: sciwing.data.datasets_manager.DatasetsManager = None, device: Union[torch.device, str] = <sphinx.ext.autodoc.importer._MockObject object>)¶ SimpleClassifier is a linear classifier head on top of any encoder.
Parameters: - encoder (nn.Module) – Any encoder that takes in lines and produces a single vector for every line.
- encoding_dim (int) – The encoding dimension
- num_classes (int) – The number of classes
- classification_layer_bias (bool) – Whether to add a bias term to the classification layer. This is set to False only for debugging purposes.
- label_namespace (str) – The namespace used for labels in the dataset
- datasets_manager (DatasetsManager) – The datasets manager for the model
- device (torch.device) – The device on which the model is run
-
forward
(lines: List[sciwing.data.line.Line], labels: List[sciwing.data.label.Label] = None, is_training: bool = False, is_validation: bool = False, is_test: bool = False) → Dict[str, Any]¶ Parameters: - lines (List[Line]) – The lines from any dataset that will be passed on to the encoder
- labels (List[Label]) – A list of labels, one for every instance
- is_training (bool) – running forward on training dataset?
- is_validation (bool) – running forward on validation dataset?
- is_test (bool) – running forward on test dataset?
Returns: - logits: torch.FloatTensor
Un-normalized probabilities over all the classes of the shape
[batch_size, num_classes]
- normalized_probs: torch.FloatTensor
Normalized probabilities over all the classes of the shape
[batch_size, num_classes]
- loss: float
Loss value if this is a training forward pass or validation loss. There will be no loss if this is the test dataset
Return type: Dict[str, Any]
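The forward pass described above can be sketched without PyTorch. The following is a minimal illustration, not SciWING's implementation: it applies one linear layer to already-encoded vectors, normalizes with a softmax, and reports a loss only when labels are supplied. The function name linear_head_forward, the plain-list inputs, and the choice of mean cross-entropy as the loss are all assumptions made for the example.

```python
import math
from typing import Any, Dict, List, Optional


def linear_head_forward(
    encodings: List[List[float]],
    weights: List[List[float]],
    bias: Optional[List[float]] = None,
    label_indices: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Illustrative linear classification head (hypothetical helper).

    encodings:     [batch_size, encoding_dim], the encoder output
    weights:       [num_classes, encoding_dim]
    bias:          [num_classes], or None to mimic classification_layer_bias=False
    label_indices: gold class index per instance, or None (e.g. at test time)
    """
    # Linear layer: one score per class for every encoded instance.
    logits = []
    for vector in encodings:
        row = [sum(w * x for w, x in zip(class_weights, vector)) for class_weights in weights]
        if bias is not None:
            row = [score + b for score, b in zip(row, bias)]
        logits.append(row)

    # Softmax over the class dimension.
    normalized_probs = []
    for row in logits:
        shift = max(row)  # subtract the max for numerical stability
        exps = [math.exp(score - shift) for score in row]
        total = sum(exps)
        normalized_probs.append([e / total for e in exps])

    output: Dict[str, Any] = {"logits": logits, "normalized_probs": normalized_probs}
    if label_indices is not None:
        # Assumption: mean cross-entropy (negative log-likelihood of the gold class).
        nll = [-math.log(probs[gold]) for probs, gold in zip(normalized_probs, label_indices)]
        output["loss"] = sum(nll) / len(nll)
    return output
```

As in the documented return value, the dictionary has no loss entry when no labels are passed, which mirrors the test-time case.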
-
Simple Tagger
-
class
sciwing.models.simple_tagger.
SimpleTagger
(rnn2seqencoder: sciwing.modules.lstm2seqencoder.Lstm2SeqEncoder, encoding_dim: int, datasets_manager: sciwing.data.datasets_manager.DatasetsManager, device: torch.device = <sphinx.ext.autodoc.importer._MockObject object>, label_namespace: str = 'seq_label')¶ Bases:
torch.nn.Module
,sciwing.utils.class_nursery.ClassNursery
PyTorch module for Neural Parscit
-
__init__
(rnn2seqencoder: sciwing.modules.lstm2seqencoder.Lstm2SeqEncoder, encoding_dim: int, datasets_manager: sciwing.data.datasets_manager.DatasetsManager, device: torch.device = <sphinx.ext.autodoc.importer._MockObject object>, label_namespace: str = 'seq_label')¶ Parameters: - rnn2seqencoder (Lstm2SeqEncoder) – Lstm2SeqEncoder that encodes a set of instances to a sequence of hidden states
- encoding_dim (int) – Hidden dimension of the lstm2seq encoder
-
forward
(lines: List[sciwing.data.line.Line], labels: List[sciwing.data.seq_label.SeqLabel] = None, is_training: bool = False, is_validation: bool = False, is_test: bool = False)¶ Parameters: - lines (List[Line]) – A list of lines
- labels (List[SeqLabel]) – A list of sequence labels
- is_training (bool) – running forward on training dataset?
- is_validation (bool) – running forward on validation dataset?
- is_test (bool) – running forward on test dataset?
Returns: - logits: torch.FloatTensor
Un-normalized probabilities over all the classes of the shape
[batch_size, num_classes]
- predicted_tags: List[List[int]]
Set of predicted tags for the batch
- loss: float
Loss value if this is a training forward pass or validation loss. There will be no loss if this is the test dataset
Return type: Dict[str, Any]
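One simple way to turn per-token scores into the predicted_tags structure above is greedy (argmax) decoding. The sketch below assumes the scores carry an extra token dimension, that is [batch_size, num_tokens, num_classes], and that padding has already been removed. The helper name greedy_decode is hypothetical, and SciWING may decode differently.

```python
from typing import List


def greedy_decode(token_scores: List[List[List[float]]]) -> List[List[int]]:
    """Pick the highest-scoring class index for every token (hypothetical helper).

    token_scores: [batch_size, num_tokens, num_classes]; sentences may differ in length.
    Returns one list of tag indices per sentence.
    """
    predicted_tags = []
    for sentence in token_scores:
        # Argmax over the class dimension; ties resolve to the lowest index.
        tags = [max(range(len(scores)), key=scores.__getitem__) for scores in sentence]
        predicted_tags.append(tags)
    return predicted_tags
```

Greedy decoding treats every token independently, so it cannot enforce constraints between neighbouring tags.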
-
Neural Parscit
-
class
sciwing.models.neural_parscit.
NeuralParscit
(device: Optional[Tuple[torch.device, int]] = -1)¶ Bases:
sphinx.ext.autodoc.importer._MockObject
It defines a neural parscit model. The model is used for citation string parsing. It lets you use a pre-trained model whose architecture is fixed and which was trained by SciWING. You can also fine-tune the model on your own dataset.
For practitioners, we provide ways to obtain results quickly from a set of citations stored in a file or from a string. If you want to see the demo, head over to our demo site.
-
interact
()¶ Interact with the pretrained model. You can also interact from the command line using sciwing interact neural-parscit
-
predict_for_file
(filename: str) → List[str]¶ Parse the references in a file where every line is a reference
Parameters: filename (str) – The filename where the references are stored
Returns: A list of parsed tags
Return type: List[str]
-
predict_for_text
(text: str, show=True) → str¶ Parse the citation string for the given text
Parameters: - text (str) – reference string to parse
- show (bool) – If True, print the stylized string, which uses a different color for each tag. If False, do not print it.
Returns: The parsed citation string
Return type: str
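To work with the result of predict_for_text programmatically, it helps to line the tags up with the tokens. The sketch below assumes that the returned string holds one whitespace-separated tag per whitespace-separated token of the input. That format is an assumption, not something this page states. The helper pair_tokens_with_tags is hypothetical, and the tag names in the docstring are placeholders.

```python
from typing import List, Tuple


def pair_tokens_with_tags(citation: str, tag_string: str) -> List[Tuple[str, str]]:
    """Align tokens with tags, e.g. ("Smith", "author") (hypothetical helper).

    Assumes one whitespace-separated tag per whitespace-separated token.
    """
    tokens = citation.split()
    tags = tag_string.split()
    if len(tokens) != len(tags):
        raise ValueError(
            f"expected one tag per token, got {len(tokens)} tokens and {len(tags)} tags"
        )
    return list(zip(tokens, tags))


# Possible usage with the documented class (needs SciWING and its pretrained model):
#
#   from sciwing.models.neural_parscit import NeuralParscit
#   model = NeuralParscit()
#   tags = model.predict_for_text(citation, show=False)
#   pairs = pair_tokens_with_tags(citation, tags)
```

Raising on a length mismatch keeps a wrong format assumption from silently producing misaligned pairs.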
-
Citation Intent Classification
-
class
sciwing.models.citation_intent_clf.
CitationIntentClassification
¶ Bases:
sphinx.ext.autodoc.importer._MockObject
-
interact
()¶ Interact with the pretrained model
-
predict_for_file
(filename: str) → List[str]¶ Predict the intents for all the citations in the file. The file should contain one citation per line.
Parameters: filename (str) – The filename where the citations are stored
Returns: The intents for each line of citation
Return type: List[str]
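Because predict_for_file returns one label per line, a frequency summary is a natural next step. The following sketch counts labels with the standard library. The helper name summarize_intents and the label strings in the docstring are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def summarize_intents(intents: List[str]) -> List[Tuple[str, int]]:
    """Return (label, count) pairs, most frequent first (hypothetical helper).

    Example: ["Method", "Background", "Method"] -> [("Method", 2), ("Background", 1)]
    """
    return Counter(intents).most_common()


# Possible usage with the documented class (needs SciWING and its pretrained model):
#
#   from sciwing.models.citation_intent_clf import CitationIntentClassification
#   model = CitationIntentClassification()
#   summary = summarize_intents(model.predict_for_file("citations.txt"))  # placeholder path
```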
-
predict_for_text
(text: str) → str¶ Predict the intent for citation
Parameters: text (str) – The citation string
Returns: The predicted label for the citation
Return type: str
-
Generic Section Header Classification
-
class
sciwing.models.generic_sect.
GenericSect
¶ Bases:
object
-
interact
()¶ Interact with the pretrained model
-
predict_for_file
(filename: str) → List[str]¶ Make predictions for every line in the file
Parameters: filename (str) – The filename where section headers are stored one per line
Returns: A list of predictions
Return type: List[str]
-
predict_for_text
(text: str, show=True) → str¶ Predicts the generic section headers of the text
Parameters: - text (str) – The section header string to be normalized
- show (bool) – If True then we print the prediction.
Returns: The prediction for the section header
Return type: str
-
I2B2 NER
-
class
sciwing.models.i2b2.
I2B2NER
¶ Bases:
sphinx.ext.autodoc.importer._MockObject
It defines an I2B2 clinical NER model trained using SciWING
For practitioners, we provide ways to obtain results quickly from a set of citations stored in a file or from a string. If you want to see the demo head over to our demo site.
-
interact
()¶
-
predict_for_file
(filename: str) → List[str]¶
-
predict_for_text
(text: str)¶
-
SectLabel
-
class
sciwing.models.sectlabel.
SectLabel
(log_file: str = None, device: str = 'cpu')¶ Bases:
object
-
dehyphenate
(lines: List[str]) → List[str]¶ Dehyphenates a list of strings
Parameters: lines (List[str]) – A list of hyphenated strings
Returns: A list of dehyphenated strings
Return type: List[str]
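This page does not spell out the dehyphenation rule, so the following is only one plausible reading: a line that ends in a hyphen is joined with the line that follows it. The function dehyphenate_sketch is hypothetical and is not SciWING's implementation. In particular, it merges lines, so the output can be shorter than the input.

```python
from typing import List


def dehyphenate_sketch(lines: List[str]) -> List[str]:
    """Join lines that end in a hyphen with the following line (hypothetical helper)."""
    result: List[str] = []
    carry = ""  # text of a line waiting for its continuation
    for line in lines:
        text = carry + line.lstrip() if carry else line
        carry = ""
        stripped = text.rstrip()
        if stripped.endswith("-") and len(stripped) > 1:
            # Drop the hyphen and hold the text until the next line arrives.
            carry = stripped[:-1]
        else:
            result.append(text)
    if carry:
        # A trailing hyphen with nothing after it is restored unchanged.
        result.append(carry + "-")
    return result
```

A rule this simple also joins genuine compounds that happen to break at a hyphen (for example "state-of-the-" followed by "art"), which is why real dehyphenators usually consult a vocabulary.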
-
extract_abstract_for_file
(pdf_filename: pathlib.Path, dehyphenate: bool = True) → str¶ Extracts the abstract from a pdf using SectLabel. This is the programmatic Python counterpart of the APIs in sciwing/api; see that package for more information.
Parameters: - pdf_filename (pathlib.Path) – The path where the pdf is stored
- dehyphenate (bool) – Scientific documents are sometimes typeset in two columns, which introduces a lot of hyphenation. If True, the hyphens are removed from the extracted text.
Returns: The abstract of the pdf
Return type: str
-
extract_abstract_for_folder
(foldername: pathlib.Path, dehyphenate=True)¶ Extracts the abstracts for all the pdf files stored in a folder
Parameters: - foldername (pathlib.Path) – The path of the folder containing pdf files
- dehyphenate (bool) – If True, try to dehyphenate the lines. Useful if the pdfs are two-column research papers.
Returns: Writes the abstracts to files
Return type: None
-
extract_all_info
(pdf_filename: pathlib.Path)¶ Extracts information from the pdf file.
Parameters: pdf_filename (pathlib.Path) – The path of the pdf file
Returns: A dictionary containing information parsed from the pdf file
Return type: Dict[str, Any]
-
interact
()¶ Interact with the pre-trained model
-
predict_for_file
(filename: str) → List[str]¶ Predicts the logical sections for all the sentences in a file, with one sentence per line
Parameters: filename (str) – The path of the file
Returns: The predictions for each line.
Return type: List[str]
-
predict_for_pdf
(pdf_filename: pathlib.Path) -> (typing.List[str], typing.List[str])¶ Predicts lines and labels given a pdf filename
Parameters: pdf_filename (pathlib.Path) – The location where pdf files are stored
Returns: The lines and labels inferred on the file
Return type: List[str], List[str]
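Since predict_for_pdf returns two parallel lists, grouping the lines by their predicted label is a common follow-up. The sketch below shows one way to do it. The helper group_lines_by_label is hypothetical, and the label names in the docstring are placeholders rather than SectLabel's actual label set.

```python
from collections import defaultdict
from typing import Dict, List


def group_lines_by_label(lines: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Collect lines under their label, keeping document order (hypothetical helper).

    Example: (["A Title", "We study"], ["title", "abstract"])
             -> {"title": ["A Title"], "abstract": ["We study"]}
    """
    if len(lines) != len(labels):
        raise ValueError("lines and labels must have the same length")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line, label in zip(lines, labels):
        grouped[label].append(line)
    return dict(grouped)


# Possible usage with the documented class (needs SciWING and its pretrained model):
#
#   import pathlib
#   from sciwing.models.sectlabel import SectLabel
#   lines, labels = SectLabel().predict_for_pdf(pathlib.Path("paper.pdf"))  # placeholder path
#   sections = group_lines_by_label(lines, labels)
```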
-
predict_for_text
(text: str) → str¶ Predicts the logical section that the line belongs to
Parameters: text (str) – A single line of text
Returns: The logical section of the text.
Return type: str
-
predict_for_text_batch
(texts: List[str]) → List[str]¶ Predicts the logical section for a batch of text.
Parameters: texts (List[str]) – A batch of text
Returns: A batch of predictions
Return type: List[str]
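For long inputs it can be convenient to feed predict_for_text_batch in fixed-size chunks. The sketch below takes any callable with the same shape (a list of strings in, a list of strings out), so it runs without SciWING installed. The helper names chunked and predict_in_batches, and the default batch size of 32, are assumptions made for the example.

```python
from typing import Callable, Iterator, List


def chunked(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def predict_in_batches(
    predict_batch: Callable[[List[str]], List[str]],
    texts: List[str],
    batch_size: int = 32,
) -> List[str]:
    """Run predict_batch chunk by chunk and concatenate the results in input order."""
    predictions: List[str] = []
    for batch in chunked(texts, batch_size):
        predictions.extend(predict_batch(batch))
    return predictions


# Possible usage with the documented class (needs SciWING and its pretrained model):
#
#   from sciwing.models.sectlabel import SectLabel
#   sectlabel = SectLabel()
#   labels = predict_in_batches(sectlabel.predict_for_text_batch, lines, batch_size=64)
```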
-