EduNLP.ModelZoo¶
base_model¶
- class EduNLP.ModelZoo.base_model.BaseModel[source]¶
- base_model_prefix = ''¶
- forward(*input)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
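The note above can be illustrated with a minimal sketch. The `Doubler` module and the `calls` list are hypothetical names introduced only for this illustration; they are not part of EduNLP. The sketch registers a forward hook and compares calling the instance with calling `forward` directly.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical toy module used only to demonstrate hook behaviour."""

    def forward(self, x):
        return x * 2


calls = []
model = Doubler()
# The hook records each invocation; it returns None, so the output is unchanged.
handle = model.register_forward_hook(
    lambda module, inputs, output: calls.append("hook")
)

x = torch.ones(2)
y_call = model(x)             # goes through __call__, so the hook runs
y_forward = model.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()
```

Both calls compute the same tensor, but only the first one triggers the registered hook, which is why the instance call is the recommended entry point.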
- training: bool¶
rnn¶
- class EduNLP.ModelZoo.rnn.ElmoLM(vocab_size: int, embedding_dim: int, hidden_size: int, num_layers: int = 2, dropout_rate: float = 0.5, use_pack_pad=False, **kwargs)[source]¶
- base_model_prefix = 'elmo'¶
- forward(seq_idx=None, seq_len=None) ModelOutput[source]¶
- Parameters
seq_idx (Tensor, of shape (batch_size, sequence_length)) – a list of indices
seq_len (Tensor, of shape (batch_size)) – valid length of each sequence
- Returns
pred_forward: of shape (batch_size, sequence_length)
pred_backward: of shape (batch_size, sequence_length)
forward_output: of shape (batch_size, sequence_length, hidden_size)
backward_output: of shape (batch_size, sequence_length, hidden_size)
- Return type
ElmoLMOutput
- training: bool¶
- class EduNLP.ModelZoo.rnn.ElmoLMForKnowledgePrediction(vocab_size: int, embedding_dim: int, hidden_size: int, num_classes_list: List[int], num_total_classes: int, dropout_rate: float = 0.5, batch_first=True, head_dropout: Optional[float] = 0.5, flat_cls_weight: Optional[float] = 0.5, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, **kwargs)[source]¶
- base_model_prefix = 'elmo'¶
- training: bool¶
- forward(seq_idx=None, seq_len=None, labels=None) ModelOutput[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class EduNLP.ModelZoo.rnn.ElmoLMForPreTraining(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, use_pack_pad=False, **kwargs)[source]¶
- base_model_prefix = 'elmo'¶
- forward(seq_idx=None, seq_len=None) ModelOutput[source]¶
- Parameters
seq_idx (Tensor, of shape (batch_size, sequence_length)) – a list of indices
seq_len (Tensor, of shape (batch_size)) – valid length of each sequence
pred_mask (Tensor, of shape(batch_size, sequence_length)) –
idx_mask (Tensor, of shape (batch_size, sequence_length)) –
- Returns
loss
pred_forward: of shape (batch_size, sequence_length)
pred_backward: of shape (batch_size, sequence_length)
forward_output: of shape (batch_size, sequence_length, hidden_size)
backward_output: of shape (batch_size, sequence_length, hidden_size)
- Return type
ElmoLMForPreTrainingOutput
- training: bool¶
- class EduNLP.ModelZoo.rnn.ElmoLMForPropertyPrediction(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, head_dropout=0.5, **kwargs)[source]¶
- base_model_prefix = 'elmo'¶
- forward(seq_idx=None, seq_len=None, labels=None) ModelOutput[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class EduNLP.ModelZoo.rnn.LM(rnn_type: str, vocab_size: int, embedding_dim: int, hidden_size: int, num_layers=1, bidirectional=False, embedding=None, model_params=None, use_pack_pad=True, **kwargs)[source]¶
- Parameters
rnn_type (str) – supported types are RNN, LSTM, GRU and BiLSTM
vocab_size (int) –
embedding_dim (int) –
hidden_size (int) –
num_layers –
bidirectional –
embedding –
model_params –
kwargs –
Examples
>>> import torch
>>> seq_idx = torch.LongTensor([[1, 2, 3], [1, 2, 0], [3, 0, 0]])
>>> seq_len = torch.LongTensor([3, 2, 1])
>>> lm = LM("RNN", 4, 3, 2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([1, 3, 2])
>>> lm = LM("RNN", 4, 3, 2, num_layers=2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([2, 3, 2])
- forward(seq_idx, seq_len)[source]¶
- Parameters
seq_idx (Tensor) – a list of indices
seq_len (Tensor) – length
- Returns
a PackedSequence object
- Return type
sequence
- training: bool¶
disenqnet¶
- class EduNLP.ModelZoo.disenqnet.DisenQNet(vocab_size: int, hidden_size: int, dropout_rate: float, wv=None, **kwargs)[source]¶
- base_model_prefix = 'disenq'¶
DisenQNet question representation model
- Parameters
vocab_size (int) – size of vocabulary
hidden_size (int) – size of word and question embedding
dropout_rate (float) – dropout rate
wv (torch.Tensor) – Tensor of (vocab_size, hidden_size) or None, initial word embedding, default = None
- forward(seq_idx=None, seq_len=None, get_vk=True, get_vi=True) ModelOutput[source]¶
- Parameters
seq_idx (Tensor of (batch_size, seq_len)) – word index
seq_len (Tensor of (batch_size)) – valid sequence length of each batch
get_vk (bool) – whether to return vk
get_vi (bool) – whether to return vi
- Returns
embed: Tensor of (batch_size, seq_len, hidden_size), word embedding
k_hidden: Tensor of (batch_size, hidden_size) or None, concept representation of question
i_hidden: Tensor of (batch_size, hidden_size) or None, individual representation of question
- Return type
DisenQNetOutput
- training: bool¶
- class EduNLP.ModelZoo.disenqnet.DisenQNetForPreTraining(vocab_size, concept_size, hidden_size, dropout_rate, pos_weight, w_cp, w_mi, w_dis, warmup, n_adversarial, wv=None, **kwargs)[source]¶
- base_model_prefix = 'disenq'¶
- training: bool¶
- forward(seq_idx=None, seq_len=None, concept=None) ModelOutput[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
quesnet¶
- class EduNLP.ModelZoo.quesnet.QuesNet(_stoi=None, meta='know_name', pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, lambda_input=None, feat_size=256, emb_size=256, rnn_type='LSTM', layers=4, **kwargs)[source]¶
- base_model_prefix = 'quesnet'¶
- forward(inputs: SeqBatch)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class EduNLP.ModelZoo.quesnet.QuesNetForPreTraining(_stoi=None, pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, meta='know_name', emb_size=256, feat_size=512, rnn_type='LSTM', lambda_input=None, lambda_loss=None, layers=4, **kwargs)[source]¶
- base_model_prefix = 'quesnet'¶
Sequence-to-sequence feature extractor based on RNN. Supports different input forms and different RNN types (LSTM/GRU).
- training: bool¶
- forward(batch)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class EduNLP.ModelZoo.quesnet.AE[source]¶
- factor = 1¶
- forward(item)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
utils¶
- class EduNLP.ModelZoo.utils.PadSequence(length, pad_val=0, clip=True)[source]¶
Pad the sequence.
Pad the sequence to the given length by inserting pad_val. If clip is set, a sequence longer than length is clipped to length.
- Parameters
length (int) – The maximum length to pad/clip the sequence
pad_val (number) – The pad value. Default 0
clip (bool) – whether to clip a sequence that is longer than length
- Returns
ret – the padded (and, if clip is set, clipped) sequence
- Return type
list of number
- EduNLP.ModelZoo.utils.pad_sequence(sequence: list, max_length=None, pad_val=0, clip=True)[source]¶
- Parameters
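sequence – list of sequences to pad
max_length – target length; in the examples below, the length of the longest sequence is used when it is not given
pad_val – the pad value
clip – whether to clip sequences longer than max_length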
- Returns
Modified list – the sequences padded to the same length.
- Return type
list
Examples
>>> seq = [[4, 3, 3], [2], [3, 3, 2]]
>>> pad_sequence(seq)
[[4, 3, 3], [2, 0, 0], [3, 3, 2]]
>>> pad_sequence(seq, pad_val=1)
[[4, 3, 3], [2, 1, 1], [3, 3, 2]]
>>> pad_sequence(seq, max_length=2)
[[4, 3], [2, 0], [3, 3]]
>>> pad_sequence(seq, max_length=2, clip=False)
[[4, 3, 3], [2, 0], [3, 3, 2]]
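The padding and clipping behaviour shown in the examples can be sketched in pure Python. The function `pad_batch` below is a hypothetical re-implementation written for illustration, not the EduNLP source; it assumes padding is appended at the end and that `max_length` defaults to the longest sequence, which is what the documented outputs suggest.

```python
def pad_batch(sequences, max_length=None, pad_val=0, clip=True):
    """Pad (and optionally clip) every sequence to a common length."""
    if max_length is None:
        # Assumption: default to the longest sequence in the batch.
        max_length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        if len(seq) > max_length:
            # Too long: cut it down only when clipping is enabled.
            padded.append(list(seq[:max_length]) if clip else list(seq))
        else:
            # Too short (or exact): append pad_val up to max_length.
            padded.append(list(seq) + [pad_val] * (max_length - len(seq)))
    return padded


seq = [[4, 3, 3], [2], [3, 3, 2]]
default_padded = pad_batch(seq)
clipped = pad_batch(seq, max_length=2)
unclipped = pad_batch(seq, max_length=2, clip=False)
```

With `clip=False`, sequences longer than `max_length` keep their original length, so the result is no longer rectangular.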
- class EduNLP.ModelZoo.utils.Masker(mask: (<class 'int'>, <class 'str'>, Ellipsis) = 0, per=0.2, seed=None)[source]¶
- Parameters
mask (int, str) – the value that replaces a masked item
per – the proportion of items to mask in each sequence
seed – random seed
Examples
>>> masker = Masker(per=0.5, seed=10)
>>> items = [[1, 1, 3, 4, 6], [2], [5, 9, 1, 4]]
>>> masked_seq, mask_label = masker(items)
>>> masked_seq
[[1, 1, 0, 0, 6], [2], [0, 9, 0, 4]]
>>> mask_label
[[0, 0, 1, 1, 0], [0], [1, 0, 1, 0]]
>>> items = [[1, 2, 3], [1, 1, 0], [2, 0, 0]]
>>> masked_seq, mask_label = masker(items, [3, 2, 1])
>>> masked_seq
[[1, 0, 3], [0, 1, 0], [2, 0, 0]]
>>> mask_label
[[0, 1, 0], [1, 0, 0], [0, 0, 0]]
>>> masker = Masker(mask="[MASK]", per=0.5, seed=10)
>>> items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
>>> masked_seq, mask_label = masker(items, length=[3, 1, 2])
>>> masked_seq
[['a', '[MASK]', 'c'], ['d', '[PAD]', '[PAD]'], ['hello', '[MASK]', '[PAD]']]
>>> mask_label
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]
- Returns
masked_seq and mask_label – the masked sequences and, for each position, a label that is 1 if the item was masked and 0 otherwise
- Return type
list
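The masking scheme illustrated above can be approximated with a short pure-Python sketch. `mask_items` is a hypothetical function written for this page, not the EduNLP implementation. It assumes that `int(valid_length * per)` positions are masked per sequence (consistent with the counts in the examples) and that positions beyond the valid length are never masked. Because it uses Python's `random` module, it will not reproduce the exact positions shown in the doctest.

```python
import random


def mask_items(items, mask=0, per=0.2, lengths=None, seed=None):
    """Randomly replace a share of the valid items in each sequence with `mask`."""
    rng = random.Random(seed)
    masked_seq, mask_label = [], []
    for i, item in enumerate(items):
        # Only the first `valid` positions are candidates (the rest is padding).
        valid = len(item) if lengths is None else lengths[i]
        # Assumption: the number of masked positions is int(valid * per).
        chosen = set(rng.sample(range(valid), int(valid * per)))
        masked_seq.append(
            [mask if j in chosen else tok for j, tok in enumerate(item)]
        )
        mask_label.append([1 if j in chosen else 0 for j in range(len(item))])
    return masked_seq, mask_label


items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
seqs, labels = mask_items(items, mask="[MASK]", per=0.5, lengths=[3, 1, 2], seed=10)
```

Here `labels` marks exactly the positions where `seqs` contains `"[MASK]"`, and padded positions are left untouched.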
- class EduNLP.ModelZoo.utils.MLP(in_dim, n_classes, hidden_dim, dropout, n_layers=2, act=<function leaky_relu>)[source]¶
- forward(input)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class EduNLP.ModelZoo.utils.TextCNN(embed_dim, hidden_dim)[source]¶
- forward(embed)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- EduNLP.ModelZoo.utils.gather_nd(params, indices)[source]¶
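Gather slices from params at the positions given by indices.
- Parameters
params (Tensor) – the tensor to gather slices from
indices (Tensor) – the index tensor; each entry along its last dimension selects a slice of params
- Returns
Tensor – the gathered slices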
Examples
>>> gather_nd(
...     params=torch.tensor([[1, 2, 3],
...                          [4, 5, 6]]),
...     indices=torch.tensor([[1],
...                           [0]]))
tensor([[4, 5, 6],
        [1, 2, 3]])
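The indexing rule behind this example can be sketched on nested Python lists. `gather_nd_lists` is a hypothetical helper, not part of EduNLP. It assumes semantics like those of `tf.gather_nd`, where each index tuple is applied to the leading dimensions of `params`; only the row-selection case is confirmed by the example above.

```python
def gather_nd_lists(params, indices):
    """For each index tuple, walk into `params` one dimension per index."""
    gathered = []
    for index in indices:
        item = params
        for i in index:
            item = item[i]  # descend one dimension
        gathered.append(item)
    return gathered


params = [[1, 2, 3], [4, 5, 6]]
rows = gather_nd_lists(params, [[1], [0]])           # one index: whole rows
elements = gather_nd_lists(params, [[0, 2], [1, 0]])  # two indices: single elements
```

An index tuple shorter than the rank of `params` returns a slice, and a full-length tuple returns a single element.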
- EduNLP.ModelZoo.utils.sequence_mask(lengths, max_len=None)[source]¶
Same as tf.sequence_mask. Returns a mask tensor representing the first N positions of each cell.
- Parameters
lengths (Tensor) – integer tensor, all its values <= max_len.
max_len (int, optional) – size of the last dimension of the returned tensor. Default is the maximum value in lengths.
- Returns
Tensor – a mask tensor of shape lengths.shape + (max_len,)
Examples
>>> sequence_mask(torch.tensor([1, 3, 2]), 5)
tensor([[ True, False, False, False, False],
        [ True,  True,  True, False, False],
        [ True,  True, False, False, False]])
>>> sequence_mask(torch.tensor([[1, 3],[2,0]]))
tensor([[[ True, False, False],
         [ True,  True,  True]],
<BLANKLINE>
        [[ True,  True, False],
         [False, False, False]]])
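For one-dimensional lengths, the masking rule reduces to comparing each position with the sequence length. The pure-Python sketch below uses a hypothetical helper, `sequence_mask_lists`, which works on plain lists rather than tensors and is meant only to make the rule explicit.

```python
def sequence_mask_lists(lengths, max_len=None):
    """Return a boolean row per length: True for the first `n` positions."""
    if max_len is None:
        # Default to the largest length, as described above.
        max_len = max(lengths)
    return [[pos < n for pos in range(max_len)] for n in lengths]


mask = sequence_mask_lists([1, 3, 2], 5)
default_mask = sequence_mask_lists([1, 3, 2])
```

The first call mirrors the first doctest above; the second shows that omitting `max_len` yields three columns, the maximum of the lengths.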