EduNLP.ModelZoo

base_model

class EduNLP.ModelZoo.base_model.BaseModel[source]
base_model_prefix = ''
forward(*input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
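The difference between calling the instance and calling forward() directly can be illustrated with a torch-free toy class. This is only a sketch of the dispatch pattern described in the note: TinyModule, register_hook and Doubler are hypothetical names and are not part of EduNLP or PyTorch.

```python
class TinyModule:
    """Toy stand-in for a module: __call__ wraps forward() with hooks."""

    def __init__(self):
        self._hooks = []

    def register_hook(self, fn):
        # Hypothetical helper: store a callback fn(module, inputs, output).
        self._hooks.append(fn)

    def forward(self, *inputs):
        raise NotImplementedError("subclasses must override forward()")

    def __call__(self, *inputs):
        # Calling the instance runs forward() and then every registered hook.
        output = self.forward(*inputs)
        for hook in self._hooks:
            hook(self, inputs, output)
        return output


class Doubler(TinyModule):
    def forward(self, x):
        return 2 * x


calls = []
model = Doubler()
model.register_hook(lambda module, inputs, output: calls.append(output))

via_call = model(3)             # runs forward() and then the hook
via_forward = model.forward(3)  # same result, but the hook is skipped
```

Both calls return the same value, but only the first one is recorded by the hook.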

save_pretrained(output_dir)[source]
classmethod from_pretrained(pretrained_model_path, *args, **kwargs)[source]
save_config(config_dir)[source]
classmethod from_config(config_path, *args, **kwargs)[source]
training: bool

rnn

class EduNLP.ModelZoo.rnn.ElmoLM(vocab_size: int, embedding_dim: int, hidden_size: int, num_layers: int = 2, dropout_rate: float = 0.5, use_pack_pad=False, **kwargs)[source]
base_model_prefix = 'elmo'
forward(seq_idx=None, seq_len=None) ModelOutput[source]
Parameters:
  • seq_idx (Tensor, of shape (batch_size, sequence_length)) – indices of the input tokens

  • seq_len (Tensor, of shape (batch_size)) – valid length of each sequence

Returns:

  • pred_forward: of shape (batch_size, sequence_length)

  • pred_backward: of shape (batch_size, sequence_length)

  • forward_output: of shape (batch_size, sequence_length, hidden_size)

  • backward_output: of shape (batch_size, sequence_length, hidden_size)

Return type:

ElmoLMOutput

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.rnn.ElmoLMForKnowledgePrediction(vocab_size: int, embedding_dim: int, hidden_size: int, num_classes_list: List[int], num_total_classes: int, dropout_rate: float = 0.5, batch_first=True, head_dropout: Optional[float] = 0.5, flat_cls_weight: Optional[float] = 0.5, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, **kwargs)[source]
base_model_prefix = 'elmo'
training: bool
forward(seq_idx=None, seq_len=None, labels=None) ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
class EduNLP.ModelZoo.rnn.ElmoLMForPreTraining(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, use_pack_pad=False, **kwargs)[source]
base_model_prefix = 'elmo'
forward(seq_idx=None, seq_len=None) ModelOutput[source]
Parameters:
  • seq_idx (Tensor, of shape (batch_size, sequence_length)) – indices of the input tokens

  • seq_len (Tensor, of shape (batch_size)) – valid length of each sequence

  • pred_mask (Tensor, of shape (batch_size, sequence_length)) –

  • idx_mask (Tensor, of shape (batch_size, sequence_length)) –

Returns:

  • loss

  • pred_forward: of shape (batch_size, sequence_length)

  • pred_backward: of shape (batch_size, sequence_length)

  • forward_output: of shape (batch_size, sequence_length, hidden_size)

  • backward_output: of shape (batch_size, sequence_length, hidden_size)

Return type:

ElmoLMForPreTrainingOutput

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.rnn.ElmoLMForPropertyPrediction(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, head_dropout=0.5, **kwargs)[source]
base_model_prefix = 'elmo'
forward(seq_idx=None, seq_len=None, labels=None) ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.rnn.HAM(num_classes_list: List[int], num_total_classes: int, sequence_model_hidden_size: int, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, dropout_rate=None)[source]
forward(sequential_embeddings)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class EduNLP.ModelZoo.rnn.LM(rnn_type: str, vocab_size: int, embedding_dim: int, hidden_size: int, num_layers=1, bidirectional=False, embedding=None, model_params=None, use_pack_pad=True, **kwargs)[source]
Parameters:
  • rnn_type (str) – the RNN type; supported values are RNN, LSTM, GRU, and BiLSTM

  • vocab_size (int) –

  • embedding_dim (int) –

  • hidden_size (int) –

  • num_layers

  • bidirectional

  • embedding

  • model_params

  • kwargs

Examples

>>> import torch
>>> seq_idx = torch.LongTensor([[1, 2, 3], [1, 2, 0], [3, 0, 0]])
>>> seq_len = torch.LongTensor([3, 2, 1])
>>> lm = LM("RNN", 4, 3, 2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([1, 3, 2])
>>> lm = LM("RNN", 4, 3, 2, num_layers=2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([2, 3, 2])
forward(seq_idx, seq_len)[source]
Parameters:
  • seq_idx (Tensor) – indices of the input tokens

  • seq_len (Tensor) – valid length of each sequence

Returns:

a PackedSequence object

Return type:

sequence

training: bool

disenqnet

class EduNLP.ModelZoo.disenqnet.DisenQNet(vocab_size: int, hidden_size: int, dropout_rate: float, wv=None, **kwargs)[source]
base_model_prefix = 'disenq'

DisenQNet question representation model

Parameters:
  • vocab_size (int) – size of vocabulary

  • hidden_size (int) – size of word and question embedding

  • dropout_rate (float) – dropout rate

  • wv (torch.Tensor) – Tensor of (vocab_size, hidden_size) or None, initial word embedding, default = None

forward(seq_idx=None, seq_len=None, get_vk=True, get_vi=True) ModelOutput[source]
Parameters:
  • seq_idx (Tensor of (batch_size, seq_len)) – word index

  • seq_len (Tensor of (batch_size)) – valid length of each sequence in the batch

  • get_vk (bool) – whether to return vk

  • get_vi (bool) – whether to return vi

Returns:

  • embed: Tensor of (batch_size, seq_len, hidden_size), word embedding

  • k_hidden: Tensor of (batch_size, hidden_size) or None, concept representation of question

  • i_hidden: Tensor of (batch_size, hidden_size) or None, individual representation of question

Return type:

DisenQNetOutput

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.disenqnet.DisenQNetForPreTraining(vocab_size, concept_size, hidden_size, dropout_rate, pos_weight, w_cp, w_mi, w_dis, warmup, n_adversarial, wv=None, **kwargs)[source]
base_model_prefix = 'disenq'
forward(seq_idx=None, seq_len=None, concept=None) ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.disenqnet.DisenQNetForPropertyPrediction(vocab_size: int, hidden_size: int, dropout_rate: float, wv=None, head_dropout=0.5, **kwargs)[source]
base_model_prefix = 'disenq'
forward(seq_idx=None, seq_len=None, labels=None, vector_type='i') ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.disenqnet.DisenQNetForKnowledgePrediction(vocab_size: int, hidden_size: int, dropout_rate: float, num_classes_list: List[int], num_total_classes: int, wv=None, head_dropout: Optional[float] = 0.5, flat_cls_weight: Optional[float] = 0.5, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, **kwargs)[source]
base_model_prefix = 'disenq'
training: bool
forward(seq_idx=None, seq_len=None, labels=None, vector_type='i') ModelOutput[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]

quesnet

class EduNLP.ModelZoo.quesnet.QuesNet(_stoi=None, meta='know_name', pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, lambda_input=None, feat_size=256, emb_size=256, rnn_type='LSTM', layers=4, **kwargs)[source]
base_model_prefix = 'quesnet'
init_h(batch_size)[source]
load_emb(emb)[source]
load_img(img_layer: Module)[source]
load_meta(meta_layer: Module)[source]
make_batch(data, device, pretrain=False)[source]

Returns embeddings

forward(inputs: SeqBatch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
training: bool
class EduNLP.ModelZoo.quesnet.QuesNetForPreTraining(_stoi=None, pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, meta='know_name', emb_size=256, feat_size=512, rnn_type='LSTM', lambda_input=None, lambda_loss=None, layers=4, **kwargs)[source]
base_model_prefix = 'quesnet'

Sequence-to-sequence feature extractor based on RNN. Supports different input forms and different RNN types (LSTM/GRU).

training: bool
forward(batch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config_path, **kwargs)[source]
class EduNLP.ModelZoo.quesnet.AE[source]
factor = 1
enc(item, *args, **kwargs)[source]
dec(item, *args, **kwargs)[source]
loss(item, emb=None)[source]
forward(item)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class EduNLP.ModelZoo.quesnet.ImageAE(emb_size)[source]
encoder(item, detach_tensor=False)[source]
decoder(emb, detach_tensor=False)[source]
training: bool
class EduNLP.ModelZoo.quesnet.MetaAE(meta_size, emb_size)[source]
training: bool

utils

class EduNLP.ModelZoo.utils.PadSequence(length, pad_val=0, clip=True)[source]

Pad the sequence.

Pad the sequence to the given length by inserting pad_val. If clip is set, any sequence longer than length is clipped to length.

Parameters:
  • length (int) – The maximum length to pad/clip the sequence

  • pad_val (number) – The pad value. Default 0

  • clip (bool) – whether to clip sequences that are longer than length

Returns:

the padded (and, if clip is set, clipped) sequence

Return type:

list of number
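The pad/clip rule described above can be sketched in plain Python. This is an illustrative re-implementation, not the EduNLP source: PadSequenceSketch is a hypothetical name, and the sketch assumes that the object is called on a single sequence and that padding is appended at the end.

```python
class PadSequenceSketch:
    """Illustrative re-implementation; not the EduNLP source."""

    def __init__(self, length, pad_val=0, clip=True):
        self.length = length
        self.pad_val = pad_val
        self.clip = clip

    def __call__(self, sequence):
        sequence = list(sequence)
        if len(sequence) > self.length:
            # Too long: truncate only when clipping is enabled.
            return sequence[:self.length] if self.clip else sequence
        # Too short (or exact): append pad_val up to the target length.
        return sequence + [self.pad_val] * (self.length - len(sequence))


pad = PadSequenceSketch(4)
short = pad([1, 2])                  # padded to [1, 2, 0, 0]
long_ = pad([1, 2, 3, 4, 5])         # clipped to [1, 2, 3, 4]
kept = PadSequenceSketch(4, clip=False)([1, 2, 3, 4, 5])  # left unchanged
```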

EduNLP.ModelZoo.utils.pad_sequence(sequence: list, max_length=None, pad_val=0, clip=True)[source]
Parameters:
  • sequence

  • max_length

  • pad_val

  • clip

Returns:

The modified list, with every sequence padded to the same size.

Return type:

list

Examples

>>> seq = [[4, 3, 3], [2], [3, 3, 2]]
>>> pad_sequence(seq)
[[4, 3, 3], [2, 0, 0], [3, 3, 2]]
>>> pad_sequence(seq, pad_val=1)
[[4, 3, 3], [2, 1, 1], [3, 3, 2]]
>>> pad_sequence(seq, max_length=2)
[[4, 3], [2, 0], [3, 3]]
>>> pad_sequence(seq, max_length=2, clip=False)
[[4, 3, 3], [2, 0], [3, 3, 2]]
EduNLP.ModelZoo.utils.set_device(_net, ctx, *args, **kwargs)[source]

code from longling v1.3.26

class EduNLP.ModelZoo.utils.Masker(mask: (int, str, ...) = 0, per=0.2, seed=None)[source]
Parameters:
  • mask (int, str) –

  • per

  • seed

Examples

>>> masker = Masker(per=0.5, seed=10)
>>> items = [[1, 1, 3, 4, 6], [2], [5, 9, 1, 4]]
>>> masked_seq, mask_label = masker(items)
>>> masked_seq
[[1, 1, 0, 0, 6], [2], [0, 9, 0, 4]]
>>> mask_label
[[0, 0, 1, 1, 0], [0], [1, 0, 1, 0]]
>>> items = [[1, 2, 3], [1, 1, 0], [2, 0, 0]]
>>> masked_seq, mask_label = masker(items, [3, 2, 1])
>>> masked_seq
[[1, 0, 3], [0, 1, 0], [2, 0, 0]]
>>> mask_label
[[0, 1, 0], [1, 0, 0], [0, 0, 0]]
>>> masker = Masker(mask="[MASK]", per=0.5, seed=10)
>>> items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
>>> masked_seq, mask_label = masker(items, length=[3, 1, 2])
>>> masked_seq
[['a', '[MASK]', 'c'], ['d', '[PAD]', '[PAD]'], ['hello', '[MASK]', '[PAD]']]
>>> mask_label
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]
Returns:

the masked sequences (masked_seq) and the corresponding mask labels (mask_label)

Return type:

list
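The masking behaviour shown in the examples can be approximated with the standard library. This is a sketch under stated assumptions, not the EduNLP implementation: mask_sequences is a hypothetical function, the number of masked positions per sequence is taken as int(valid_length * per) (which agrees with the counts in the examples above), and the positions it picks for a given seed will generally differ from those chosen by Masker.

```python
import random


def mask_sequences(items, lengths=None, mask=0, per=0.2, seed=None):
    """Illustrative masking sketch; not the EduNLP implementation."""
    rng = random.Random(seed)
    if lengths is None:
        lengths = [len(seq) for seq in items]
    masked_items, labels = [], []
    for seq, valid_len in zip(items, lengths):
        # Choose positions only inside the valid (non-padding) prefix.
        n_mask = int(valid_len * per)
        chosen = set(rng.sample(range(valid_len), n_mask))
        masked_items.append(
            [mask if i in chosen else tok for i, tok in enumerate(seq)]
        )
        labels.append([1 if i in chosen else 0 for i in range(len(seq))])
    return masked_items, labels


items = [["a", "b", "c", "d"], ["e", "[PAD]", "[PAD]", "[PAD]"]]
masked_seq, mask_label = mask_sequences(
    items, lengths=[4, 1], mask="[MASK]", per=0.5, seed=0
)
```

Because positions are drawn from range(valid_len), padding tokens beyond the valid length are never masked.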

EduNLP.ModelZoo.utils.load_items(data_path)[source]
class EduNLP.ModelZoo.utils.MLP(in_dim, n_classes, hidden_dim, dropout, n_layers=2, act=<function leaky_relu>)[source]
forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class EduNLP.ModelZoo.utils.TextCNN(embed_dim, hidden_dim)[source]
forward(embed)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class EduNLP.ModelZoo.utils.KnowledgePredictionOutput[source]
loss: FloatTensor = None
logits: FloatTensor = None
class EduNLP.ModelZoo.utils.ModelOutput[source]

Base class for all model outputs as dataclass. Has a __getitem__ that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) that will ignore the None attributes. Otherwise behaves like a regular python dictionary.

Warning

You can’t unpack a ModelOutput directly. Use the to_tuple() method to convert it to a tuple first.

setdefault(*args, **kwargs)[source]

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

pop(k[, d])[source]

Remove the specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.

update([E, ]**F)[source]

Update D from dict/iterable E and F. Returns None.

  • If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

  • If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

  • In either case, this is followed by: for k in F: D[k] = F[k]

to_tuple() Tuple[Any][source]

Convert self to a tuple containing all the attributes/keys that are not None.
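The indexing and to_tuple() behaviour described for ModelOutput can be mimicked with a small dataclass. This is a toy sketch only: OutputSketch is a hypothetical class and leaves out the dictionary methods (setdefault, pop, update) that the real ModelOutput also provides.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class OutputSketch:
    """Toy output container; the real ModelOutput has more behaviour."""
    loss: Optional[float] = None
    logits: Optional[list] = None

    def to_tuple(self) -> Tuple[Any, ...]:
        # Keep only the attributes that are not None, in field order.
        return tuple(
            getattr(self, f.name) for f in fields(self)
            if getattr(self, f.name) is not None
        )

    def __getitem__(self, key):
        if isinstance(key, str):
            return getattr(self, key)   # dictionary-style access
        return self.to_tuple()[key]     # tuple-style access (int or slice)


out = OutputSketch(logits=[0.1, 0.9])   # loss stays None
first = out[0]                          # first non-None attribute: logits
by_name = out["logits"]
as_tuple = out.to_tuple()
```

Since loss is None, integer index 0 refers to logits rather than loss.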

class EduNLP.ModelZoo.utils.PropertyPredictionOutput[source]
loss: FloatTensor = None
logits: FloatTensor = None
EduNLP.ModelZoo.utils.gather_nd(params, indices)[source]

Gathers slices from params at the positions given by indices.

Parameters:
  • params (Tensor) – the tensor to gather values from

  • indices (Tensor) – the index tensor; in the example below, each entry selects one row of params

Returns:

the gathered slices

Return type:

Tensor

Examples

>>> gather_nd(
...     params=torch.tensor([[1, 2, 3],
...                          [4, 5, 6]]),
...     indices=torch.tensor([[1],
...                           [0]]))
tensor([[4, 5, 6],
        [1, 2, 3]])
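The indexing rule behind the example can be sketched on nested Python lists. This is a sketch that assumes gather_nd follows the tf.gather_nd convention, in which the last dimension of indices holds coordinates into the leading dimensions of params. gather_nd_lists is a hypothetical helper, not the EduNLP function.

```python
def gather_nd_lists(params, indices):
    """Nested-list sketch of gather_nd semantics; not the EduNLP source."""
    def is_index_tuple(node):
        # The innermost lists of `indices` hold integer coordinates.
        return all(isinstance(v, int) for v in node)

    if is_index_tuple(indices):
        result = params
        for i in indices:
            # Follow one coordinate per leading dimension of params.
            result = result[i]
        return result
    return [gather_nd_lists(params, sub) for sub in indices]


params = [[1, 2, 3], [4, 5, 6]]
rows = gather_nd_lists(params, [[1], [0]])          # whole rows 1 and 0
cells = gather_nd_lists(params, [[0, 2], [1, 0]])   # single elements
```

With one coordinate per entry the result is a list of rows; with two coordinates per entry it is a list of individual elements.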

EduNLP.ModelZoo.utils.sequence_mask(lengths, max_len=None)[source]

Same as tf.sequence_mask: returns a mask tensor marking the first N positions of each cell, where N is the corresponding value in lengths.

Parameters:
  • lengths (Tensor) – integer tensor, all its values <= max_len.

  • max_len (int, optional) – scalar integer, size of the last dimension of the returned tensor. Default is the maximum value in lengths.

Returns:

A mask tensor of shape lengths.shape + (max_len,)

Return type:

Tensor

Examples

>>> sequence_mask(torch.tensor([1, 3, 2]), 5)
tensor([[ True, False, False, False, False],
        [ True,  True,  True, False, False],
        [ True,  True, False, False, False]])
>>> sequence_mask(torch.tensor([[1, 3],[2,0]]))
tensor([[[ True, False, False],
         [ True,  True,  True]],
<BLANKLINE>
        [[ True,  True, False],
         [False, False, False]]])
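The mask construction can be reproduced on nested Python lists, which makes the rule explicit: position j is marked True exactly when j is smaller than the length. sequence_mask_lists is a hypothetical helper written for illustration; it is not the tensor-based EduNLP function.

```python
def sequence_mask_lists(lengths, max_len=None):
    """Nested-list sketch of sequence_mask; not the EduNLP source."""
    def flatten(node):
        if isinstance(node, int):
            return [node]
        return [v for sub in node for v in flatten(sub)]

    if max_len is None:
        # Default: the largest length found anywhere in the input.
        max_len = max(flatten(lengths))

    def build(node):
        if isinstance(node, int):
            # Position j is valid exactly when j < length.
            return [j < node for j in range(max_len)]
        return [build(sub) for sub in node]

    return build(lengths)


mask_1d = sequence_mask_lists([1, 3, 2], 5)
mask_2d = sequence_mask_lists([[1, 3], [2, 0]])
```

The two calls mirror the two documented examples, so the nested lists have the same shape and values as the tensors shown there.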