EduNLP.ModelZoo

base_model

class EduNLP.ModelZoo.base_model.BaseModel[source]
base_model_prefix = ''
forward(*input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
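The note above can be checked with plain PyTorch. The following is a minimal sketch that assumes only torch.nn.Module; Doubler is a hypothetical toy module, not part of EduNLP. A forward hook fires when the instance is called, but not when forward is invoked directly.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical toy module used only for this illustration."""

    def forward(self, x):
        return 2 * x


model = Doubler()
hook_calls = []

# The hook records the output shape each time it fires.
handle = model.register_forward_hook(
    lambda module, inputs, output: hook_calls.append(tuple(output.shape))
)

x = torch.ones(2, 3)
via_call = model(x)             # goes through __call__, so the hook runs
via_forward = model.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()
```

Both calls return the same tensor, but hook_calls holds a single entry, which is why model(x) is the recommended form.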

save_pretrained(output_dir)[source]
classmethod from_pretrained(pretrained_model_path, *args, **kwargs)[source]
save_config(config_dir)[source]
classmethod from_config(config_path, *args, **kwargs)[source]
training: bool

rnn

class EduNLP.ModelZoo.rnn.ElmoLM(vocab_size: int, embedding_dim: int, hidden_size: int, num_layers: int = 2, dropout_rate: float = 0.5, use_pack_pad=False, **kwargs)[source]
base_model_prefix = 'elmo'
forward(seq_idx=None, seq_len=None) ModelOutput[source]
Parameters
  • seq_idx (Tensor, of shape (batch_size, sequence_length)) – token indices of the padded sequences

  • seq_len (Tensor, of shape (batch_size)) – valid length of each sequence

Returns

  • pred_forward: of shape (batch_size, sequence_length)

  • pred_backward: of shape (batch_size, sequence_length)

  • forward_output: of shape (batch_size, sequence_length, hidden_size)

  • backward_output: of shape (batch_size, sequence_length, hidden_size)

Return type

ElmoLMOutput

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.rnn.ElmoLMForKnowledgePrediction(vocab_size: int, embedding_dim: int, hidden_size: int, num_classes_list: List[int], num_total_classes: int, dropout_rate: float = 0.5, batch_first=True, head_dropout: Optional[float] = 0.5, flat_cls_weight: Optional[float] = 0.5, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, **kwargs)[source]
base_model_prefix = 'elmo'
training: bool
forward(seq_idx=None, seq_len=None, labels=None) ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
class EduNLP.ModelZoo.rnn.ElmoLMForPreTraining(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, use_pack_pad=False, **kwargs)[source]
base_model_prefix = 'elmo'
forward(seq_idx=None, seq_len=None) ModelOutput[source]
Parameters
  • seq_idx (Tensor, of shape (batch_size, sequence_length)) – token indices of the padded sequences

  • seq_len (Tensor, of shape (batch_size)) – valid length of each sequence

  • pred_mask (Tensor, of shape(batch_size, sequence_length)) –

  • idx_mask (Tensor, of shape (batch_size, sequence_length)) –

Returns

  • loss

  • pred_forward: of shape (batch_size, sequence_length)

  • pred_backward: of shape (batch_size, sequence_length)

  • forward_output: of shape (batch_size, sequence_length, hidden_size)

  • backward_output: of shape (batch_size, sequence_length, hidden_size)

Return type

ElmoLMForPreTrainingOutput

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.rnn.ElmoLMForPropertyPrediction(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, head_dropout=0.5, **kwargs)[source]
base_model_prefix = 'elmo'
forward(seq_idx=None, seq_len=None, labels=None) ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.rnn.LM(rnn_type: str, vocab_size: int, embedding_dim: int, hidden_size: int, num_layers=1, bidirectional=False, embedding=None, model_params=None, use_pack_pad=True, **kwargs)[source]
Parameters
  • rnn_type (str) – legal types include RNN, LSTM, GRU, BiLSTM

  • vocab_size (int) –

  • embedding_dim (int) –

  • hidden_size (int) –

  • num_layers

  • bidirectional

  • embedding

  • model_params

  • kwargs

Examples

>>> import torch
>>> seq_idx = torch.LongTensor([[1, 2, 3], [1, 2, 0], [3, 0, 0]])
>>> seq_len = torch.LongTensor([3, 2, 1])
>>> lm = LM("RNN", 4, 3, 2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([1, 3, 2])
>>> lm = LM("RNN", 4, 3, 2, num_layers=2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([2, 3, 2])
forward(seq_idx, seq_len)[source]
Parameters
  • seq_idx (Tensor) – token indices of the padded sequences

  • seq_len (Tensor) – valid length of each sequence

Returns

output and hn, as unpacked in the examples above: the padded RNN output and the final hidden state

Return type

tuple
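The use_pack_pad flag in the constructor suggests that padded batches are packed before they reach the RNN. How LM does this internally is not shown here; the following is a sketch in plain PyTorch, with embedding and rnn as stand-in layers sized like the LM("RNN", 4, 3, 2) example.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq_idx = torch.LongTensor([[1, 2, 3], [1, 2, 0], [3, 0, 0]])
seq_len = torch.LongTensor([3, 2, 1])

# Stand-in layers (assumptions): vocab_size=4, embedding_dim=3, hidden_size=2.
embedding = nn.Embedding(4, 3)
rnn = nn.RNN(3, 2, batch_first=True)

# Pack so the RNN skips padded positions; lengths must live on the CPU.
packed = pack_padded_sequence(
    embedding(seq_idx), seq_len, batch_first=True, enforce_sorted=False
)
packed_out, hn = rnn(packed)

# Unpack back to a padded (batch_size, sequence_length, hidden_size) tensor.
output, out_len = pad_packed_sequence(packed_out, batch_first=True)
```

After unpacking, positions beyond each sequence's valid length are filled with zeros, and the shapes match those printed in the examples above.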

training: bool

disenqnet

class EduNLP.ModelZoo.disenqnet.DisenQNet(vocab_size: int, hidden_size: int, dropout_rate: float, wv=None, **kwargs)[source]
base_model_prefix = 'disenq'

DisenQNet question representation model

Parameters
  • vocab_size (int) – size of vocabulary

  • hidden_size (int) – size of word and question embedding

  • dropout_rate (float) – dropout rate

  • wv (torch.Tensor) – Tensor of (vocab_size, hidden_size) or None, initial word embedding, default = None

forward(seq_idx=None, seq_len=None, get_vk=True, get_vi=True) ModelOutput[source]
Parameters
  • seq_idx (Tensor of (batch_size, seq_len)) – word index

  • seq_len (Tensor of (batch_size)) – valid length of each sequence in the batch

  • get_vk (bool) – whether to return vk

  • get_vi (bool) – whether to return vi

Returns

  • embed: Tensor of (batch_size, seq_len, hidden_size), word embedding

  • k_hidden: Tensor of (batch_size, hidden_size) or None, concept representation of question

  • i_hidden: Tensor of (batch_size, hidden_size) or None, individual representation of question

Return type

DisenQNetOutput

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.disenqnet.DisenQNetForPreTraining(vocab_size, concept_size, hidden_size, dropout_rate, pos_weight, w_cp, w_mi, w_dis, warmup, n_adversarial, wv=None, **kwargs)[source]
base_model_prefix = 'disenq'
training: bool
forward(seq_idx=None, seq_len=None, concept=None) ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]

quesnet

class EduNLP.ModelZoo.quesnet.QuesNet(_stoi=None, meta='know_name', pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, lambda_input=None, feat_size=256, emb_size=256, rnn_type='LSTM', layers=4, **kwargs)[source]
base_model_prefix = 'quesnet'
init_h(batch_size)[source]
load_emb(emb)[source]
load_img(img_layer: Module)[source]
load_meta(meta_layer: Module)[source]
make_batch(data, device, pretrain=False)[source]

Returns embeddings

forward(inputs: SeqBatch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.quesnet.QuesNetForPreTraining(_stoi=None, pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, meta='know_name', emb_size=256, feat_size=512, rnn_type='LSTM', lambda_input=None, lambda_loss=None, layers=4, **kwargs)[source]
base_model_prefix = 'quesnet'

Sequence-to-sequence feature extractor based on RNN. Supports different input forms and different RNN types (LSTM/GRU).

training: bool
forward(batch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
class EduNLP.ModelZoo.quesnet.AE[source]
factor = 1
enc(item, *args, **kwargs)[source]
dec(item, *args, **kwargs)[source]
loss(item, emb=None)[source]
forward(item)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class EduNLP.ModelZoo.quesnet.ImageAE(emb_size)[source]
encoder(item, detach_tensor=False)[source]
decoder(emb, detach_tensor=False)[source]
training: bool
class EduNLP.ModelZoo.quesnet.MetaAE(meta_size, emb_size)[source]
training: bool

utils

class EduNLP.ModelZoo.utils.PadSequence(length, pad_val=0, clip=True)[source]

Pad the sequence.

Pad the sequence to the given length by appending pad_val. If clip is set, a sequence longer than length is clipped to length.

Parameters
  • length (int) – The maximum length to pad/clip the sequence

  • pad_val (number) – The pad value. Default 0

  • clip (bool) – whether to clip sequences longer than length

Returns

ret – list of number

Return type

list
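The pad/clip rule described above can be sketched in plain Python. pad_or_clip is a hypothetical helper written for illustration, not the PadSequence implementation; it assumes padding is appended at the end, which is consistent with the pad_sequence examples in this section.

```python
def pad_or_clip(sequence, length, pad_val=0, clip=True):
    """Hypothetical helper mirroring the documented PadSequence behaviour."""
    if len(sequence) >= length:
        # Long sequences are cut only when clip is set.
        return list(sequence[:length]) if clip else list(sequence)
    # Short sequences are extended with pad_val up to the target length.
    return list(sequence) + [pad_val] * (length - len(sequence))


padded = pad_or_clip([2], 3)
clipped = pad_or_clip([4, 3, 3], 2)
unclipped = pad_or_clip([4, 3, 3], 2, clip=False)
```

With clip=False the long sequence is returned unchanged, so the result may not be rectangular.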

EduNLP.ModelZoo.utils.pad_sequence(sequence: list, max_length=None, pad_val=0, clip=True)[source]
Parameters
  • sequence

  • max_length

  • pad_val

  • clip

Returns

Modified list – every sequence padded (and, if clip is set, clipped) to the same length.

Return type

list

Examples

>>> seq = [[4, 3, 3], [2], [3, 3, 2]]
>>> pad_sequence(seq)
[[4, 3, 3], [2, 0, 0], [3, 3, 2]]
>>> pad_sequence(seq, pad_val=1)
[[4, 3, 3], [2, 1, 1], [3, 3, 2]]
>>> pad_sequence(seq, max_length=2)
[[4, 3], [2, 0], [3, 3]]
>>> pad_sequence(seq, max_length=2, clip=False)
[[4, 3, 3], [2, 0], [3, 3, 2]]
EduNLP.ModelZoo.utils.set_device(_net, ctx, *args, **kwargs)[source]

code from longling v1.3.26

class EduNLP.ModelZoo.utils.Masker(mask: (int, str, ...) = 0, per=0.2, seed=None)[source]
Parameters
  • mask (int, str) –

  • per

  • seed

Examples

>>> masker = Masker(per=0.5, seed=10)
>>> items = [[1, 1, 3, 4, 6], [2], [5, 9, 1, 4]]
>>> masked_seq, mask_label = masker(items)
>>> masked_seq
[[1, 1, 0, 0, 6], [2], [0, 9, 0, 4]]
>>> mask_label
[[0, 0, 1, 1, 0], [0], [1, 0, 1, 0]]
>>> items = [[1, 2, 3], [1, 1, 0], [2, 0, 0]]
>>> masked_seq, mask_label = masker(items, [3, 2, 1])
>>> masked_seq
[[1, 0, 3], [0, 1, 0], [2, 0, 0]]
>>> mask_label
[[0, 1, 0], [1, 0, 0], [0, 0, 0]]
>>> masker = Masker(mask="[MASK]", per=0.5, seed=10)
>>> items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
>>> masked_seq, mask_label = masker(items, length=[3, 1, 2])
>>> masked_seq
[['a', '[MASK]', 'c'], ['d', '[PAD]', '[PAD]'], ['hello', '[MASK]', '[PAD]']]
>>> mask_label
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]
Returns

list of masked_seq and list of mask_label

Return type

list
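The masking behaviour in the examples above can be sketched in plain Python. mask_batch is a hypothetical stand-in, not the Masker implementation: it assumes that int(valid_length * per) positions are masked per sequence (which matches the counts in the examples) and uses Python's random module, so it will not reproduce the exact positions shown above.

```python
import random


def mask_batch(items, mask=0, per=0.2, length=None, seed=None):
    """Hypothetical stand-in for the documented masking behaviour."""
    rng = random.Random(seed)
    lengths = length if length is not None else [len(seq) for seq in items]
    masked_seq, mask_label = [], []
    for seq, valid_len in zip(items, lengths):
        # Assumption: int(valid_len * per) positions are masked per sequence,
        # drawn only from the valid (non-padding) prefix.
        n_mask = int(valid_len * per)
        chosen = set(rng.sample(range(valid_len), n_mask))
        masked_seq.append(
            [mask if i in chosen else tok for i, tok in enumerate(seq)]
        )
        mask_label.append([1 if i in chosen else 0 for i in range(len(seq))])
    return masked_seq, mask_label


items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
masked_seq, mask_label = mask_batch(
    items, mask="[MASK]", per=0.5, length=[3, 1, 2], seed=10
)
```

Passing length keeps padding positions out of the candidate set, so "[PAD]" tokens are never replaced and their labels stay 0.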

EduNLP.ModelZoo.utils.load_items(data_path)[source]
class EduNLP.ModelZoo.utils.MLP(in_dim, n_classes, hidden_dim, dropout, n_layers=2, act=<function leaky_relu>)[source]
forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class EduNLP.ModelZoo.utils.TextCNN(embed_dim, hidden_dim)[source]
forward(embed)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
EduNLP.ModelZoo.utils.gather_nd(params, indices)[source]

Gathers slices from params at the positions given by indices.

Parameters
  • params (Tensor) – the tensor to gather from

  • indices (Tensor) – index tensor; each entry along its last dimension selects a slice of params

Returns

the gathered slices

Return type

Tensor

Examples

>>> gather_nd(
...     params=torch.tensor([[1, 2, 3],
...                          [4, 5, 6]]),
...     indices=torch.tensor([[1],
...                           [0]]))
tensor([[4, 5, 6],
        [1, 2, 3]])
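The gather_nd example selects whole rows with one-column indices. A minimal sketch of one way to get that behaviour with PyTorch advanced indexing follows; gather_nd_sketch is a hypothetical re-implementation that assumes tf.gather_nd-style semantics, not the EduNLP source.

```python
import torch


def gather_nd_sketch(params, indices):
    """Hypothetical re-implementation; assumes tf.gather_nd-style semantics."""
    # Split the last dimension of indices into one index tensor per leading
    # dimension of params, then let advanced indexing do the gathering.
    return params[tuple(indices.unbind(dim=-1))]


params = torch.tensor([[1, 2, 3], [4, 5, 6]])
rows = gather_nd_sketch(params, torch.tensor([[1], [0]]))         # whole rows
cells = gather_nd_sketch(params, torch.tensor([[0, 1], [1, 2]]))  # single cells
```

Under this assumption, indices whose last dimension is shorter than params.dim() return slices, while full-length indices return individual elements.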

EduNLP.ModelZoo.utils.sequence_mask(lengths, max_len=None)[source]

Same as tf.sequence_mask. Returns a mask tensor representing the first N positions of each cell.

Parameters
  • lengths (Tensor) – integer tensor, all its values <= max_len.

  • max_len (int, optional) – scalar integer, size of the last dimension of the returned tensor. Default is the maximum value in lengths.

Returns

A mask tensor of shape lengths.shape + (max_len,)

Return type

Tensor

Examples

>>> sequence_mask(torch.tensor([1, 3, 2]), 5)
tensor([[ True, False, False, False, False],
        [ True,  True,  True, False, False],
        [ True,  True, False, False, False]])
>>> sequence_mask(torch.tensor([[1, 3],[2,0]]))
tensor([[[ True, False, False],
         [ True,  True,  True]],
<BLANKLINE>
        [[ True,  True, False],
         [False, False, False]]])
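A mask of this kind can be built by comparing a range of positions against the lengths via broadcasting. The sketch below is a hypothetical re-implementation for illustration, not the EduNLP source; it reproduces the values in the sequence_mask examples.

```python
import torch


def sequence_mask_sketch(lengths, max_len=None):
    """Hypothetical re-implementation built on broadcasting."""
    if max_len is None:
        max_len = int(lengths.max())
    positions = torch.arange(max_len, device=lengths.device)
    # Position p is valid when p < length; unsqueeze adds the new last axis.
    return positions < lengths.unsqueeze(-1)


mask_1d = sequence_mask_sketch(torch.tensor([1, 3, 2]), 5)
mask_2d = sequence_mask_sketch(torch.tensor([[1, 3], [2, 0]]))
```

Because the comparison broadcasts over every leading dimension, the result has shape lengths.shape + (max_len,) for inputs of any rank.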