EduNLP.ModelZoo¶

base_model¶

class EduNLP.ModelZoo.base_model.BaseModel[source]¶

base_model_prefix = ''¶

forward(*input)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
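
The note above can be illustrated with a toy stand-in. This is a minimal sketch, assuming a simplified hook mechanism: TinyModule and its methods are hypothetical and are not the torch.nn.Module implementation. It only shows why calling the instance differs from calling forward directly.

```python
class TinyModule:
    """Toy stand-in (hypothetical): __call__ wraps forward() with hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        # The "recipe" for the forward pass lives here.
        return x * 2

    def __call__(self, x):
        # Calling the instance runs forward() and then every registered hook.
        output = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, output)
        return output


calls = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: calls.append((inp, out)))

module(3)          # goes through __call__, so the hook fires
module.forward(3)  # bypasses __call__, so the hook is silently skipped
```

After both calls, calls holds a single entry, recorded by the instance call only.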

save_pretrained(output_dir)[source]¶

classmethod from_pretrained(pretrained_model_path, *args, **kwargs)[source]¶

save_config(config_dir)[source]¶

classmethod from_config(config_path, *args, **kwargs)[source]¶

training: bool¶

rnn¶

class EduNLP.ModelZoo.rnn.ElmoLM(vocab_size: int, embedding_dim: int, hidden_size: int, num_layers: int = 2, dropout_rate: float = 0.5, use_pack_pad=False, **kwargs)[source]¶

base_model_prefix = 'elmo'¶

forward(seq_idx=None, seq_len=None) → ModelOutput[source]¶

Parameters:

seq_idx (Tensor, of shape (batch_size, sequence_length)) – token indices of the input sequences
seq_len (Tensor, of shape (batch_size)) – valid length of each sequence

Returns:

pred_forward: of shape (batch_size, sequence_length)
pred_backward: of shape (batch_size, sequence_length)
forward_output: of shape (batch_size, sequence_length, hidden_size)
backward_output: of shape (batch_size, sequence_length, hidden_size)

Return type:

ElmoLMOutput

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.rnn.ElmoLMForKnowledgePrediction(vocab_size: int, embedding_dim: int, hidden_size: int, num_classes_list: List[int], num_total_classes: int, dropout_rate: float = 0.5, batch_first=True, head_dropout: Optional[float] = 0.5, flat_cls_weight: Optional[float] = 0.5, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, **kwargs)[source]¶

base_model_prefix = 'elmo'¶

training: bool¶

forward(seq_idx=None, seq_len=None, labels=None) → ModelOutput[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

class EduNLP.ModelZoo.rnn.ElmoLMForPreTraining(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, use_pack_pad=False, **kwargs)[source]¶

base_model_prefix = 'elmo'¶

forward(seq_idx=None, seq_len=None) → ModelOutput[source]¶

Parameters:

seq_idx (Tensor, of shape (batch_size, sequence_length)) – token indices of the input sequences
seq_len (Tensor, of shape (batch_size)) – valid length of each sequence
pred_mask (Tensor, of shape (batch_size, sequence_length)) –
idx_mask (Tensor, of shape (batch_size, sequence_length)) –

Returns:

loss
pred_forward: of shape (batch_size, sequence_length)
pred_backward: of shape (batch_size, sequence_length)
forward_output: of shape (batch_size, sequence_length, hidden_size)
backward_output: of shape (batch_size, sequence_length, hidden_size)

Return type:

ElmoLMForPreTrainingOutput

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.rnn.ElmoLMForPropertyPrediction(vocab_size: int, embedding_dim: int, hidden_size: int, dropout_rate: float = 0.5, batch_first=True, head_dropout=0.5, **kwargs)[source]¶

base_model_prefix = 'elmo'¶

forward(seq_idx=None, seq_len=None, labels=None) → ModelOutput[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.rnn.HAM(num_classes_list: List[int], num_total_classes: int, sequence_model_hidden_size: int, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, dropout_rate=None)[source]¶

forward(sequential_embeddings)[source]¶

training: bool¶

class EduNLP.ModelZoo.rnn.LM(rnn_type: str, vocab_size: int, embedding_dim: int, hidden_size: int, num_layers=1, bidirectional=False, embedding=None, model_params=None, use_pack_pad=True, **kwargs)[source]¶

Parameters:

rnn_type (str) – legal types include RNN, LSTM, GRU, BiLSTM
vocab_size (int) –
embedding_dim (int) –
hidden_size (int) –
num_layers –
bidirectional –
embedding –
model_params –
kwargs –

Examples

>>> import torch
>>> seq_idx = torch.LongTensor([[1, 2, 3], [1, 2, 0], [3, 0, 0]])
>>> seq_len = torch.LongTensor([3, 2, 1])
>>> lm = LM("RNN", 4, 3, 2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([1, 3, 2])
>>> lm = LM("RNN", 4, 3, 2, num_layers=2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([2, 3, 2])

forward(seq_idx, seq_len)[source]¶

Parameters:

seq_idx (Tensor) – padded token indices, of shape (batch_size, sequence_length)
seq_len (Tensor) – valid length of each sequence, of shape (batch_size)

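Returns:

output, hn – as unpacked in the example above: output of shape (batch_size, sequence_length, hidden_size) and the final hidden state hn of shape (num_layers, batch_size, hidden_size) for a unidirectional RNN

Return type:

tuple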

training: bool¶

disenqnet¶

class EduNLP.ModelZoo.disenqnet.DisenQNet(vocab_size: int, hidden_size: int, dropout_rate: float, wv=None, **kwargs)[source]¶

base_model_prefix = 'disenq'¶

DisenQNet question representation model

Parameters:

vocab_size (int) – size of vocabulary
hidden_size (int) – size of word and question embedding
dropout_rate (float) – dropout rate
wv (torch.Tensor) – Tensor of (vocab_size, hidden_size) or None, initial word embedding, default = None

forward(seq_idx=None, seq_len=None, get_vk=True, get_vi=True) → ModelOutput[source]¶

Parameters:

seq_idx (Tensor of (batch_size, seq_len)) – word index
seq_len (Tensor of (batch_size)) – valid length of each sequence in the batch
get_vk (bool) – whether to return vk
get_vi (bool) – whether to return vi

Returns:

embed: Tensor of (batch_size, seq_len, hidden_size), word embedding
k_hidden: Tensor of (batch_size, hidden_size) or None, concept representation of question
i_hidden: Tensor of (batch_size, hidden_size) or None, individual representation of question

Return type:

DisenQNetOutput

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.disenqnet.DisenQNetForPreTraining(vocab_size, concept_size, hidden_size, dropout_rate, pos_weight, w_cp, w_mi, w_dis, warmup, n_adversarial, wv=None, **kwargs)[source]¶

base_model_prefix = 'disenq'¶

forward(seq_idx=None, seq_len=None, concept=None) → ModelOutput[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.disenqnet.DisenQNetForPropertyPrediction(vocab_size: int, hidden_size: int, dropout_rate: float, wv=None, head_dropout=0.5, **kwargs)[source]¶

base_model_prefix = 'disenq'¶

forward(seq_idx=None, seq_len=None, labels=None, vector_type='i') → ModelOutput[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.disenqnet.DisenQNetForKnowledgePrediction(vocab_size: int, hidden_size: int, dropout_rate: float, num_classes_list: List[int], num_total_classes: int, wv=None, head_dropout: Optional[float] = 0.5, flat_cls_weight: Optional[float] = 0.5, attention_unit_size: Optional[int] = 256, fc_hidden_size: Optional[int] = 512, beta: Optional[float] = 0.5, **kwargs)[source]¶

base_model_prefix = 'disenq'¶

training: bool¶

forward(seq_idx=None, seq_len=None, labels=None, vector_type='i') → ModelOutput[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

quesnet¶

class EduNLP.ModelZoo.quesnet.QuesNet(_stoi=None, meta='know_name', pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, lambda_input=None, feat_size=256, emb_size=256, rnn_type='LSTM', layers=4, **kwargs)[source]¶

base_model_prefix = 'quesnet'¶

init_h(batch_size)[source]¶

load_emb(emb)[source]¶

load_img(img_layer: Module)[source]¶

load_meta(meta_layer: Module)[source]¶

make_batch(data, device, pretrain=False)[source]¶

Returns embeddings.

forward(inputs: SeqBatch)[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

training: bool¶

class EduNLP.ModelZoo.quesnet.QuesNetForPreTraining(_stoi=None, pretrained_embs: Optional[ndarray] = None, pretrained_image: Optional[Module] = None, pretrained_meta: Optional[Module] = None, meta='know_name', emb_size=256, feat_size=512, rnn_type='LSTM', lambda_input=None, lambda_loss=None, layers=4, **kwargs)[source]¶

base_model_prefix = 'quesnet'¶

Sequence-to-sequence feature extractor based on RNN. Supports different input forms and different RNN types (LSTM/GRU).

training: bool¶

forward(batch)[source]¶

classmethod from_config(config_path, **kwargs)[source]¶

class EduNLP.ModelZoo.quesnet.AE[source]¶

factor = 1¶

enc(item, *args, **kwargs)[source]¶

dec(item, *args, **kwargs)[source]¶

loss(item, emb=None)[source]¶

forward(item)[source]¶

training: bool¶

class EduNLP.ModelZoo.quesnet.ImageAE(emb_size)[source]¶

encoder(item, detach_tensor=False)[source]¶

decoder(emb, detach_tensor=False)[source]¶

training: bool¶

class EduNLP.ModelZoo.quesnet.MetaAE(meta_size, emb_size)[source]¶

training: bool¶

utils¶

class EduNLP.ModelZoo.utils.PadSequence(length, pad_val=0, clip=True)[source]¶

Pad the sequence.

Pad the sequence to the given length by inserting pad_val. If clip is set, a sequence longer than length will be clipped.

Parameters:

length (int) – The maximum length to pad/clip the sequence
pad_val (number) – The pad value. Default 0
clip (bool) – whether to clip a sequence longer than length. Default True

Returns:

ret – the padded (or clipped) sequence

Return type:

list of number
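
The padding and clipping rule described above can be sketched in plain Python. This is a minimal sketch, not the library's code: PadSequenceSketch is a hypothetical name, and it assumes the object is called on a single list and that pad values are appended at the end.

```python
class PadSequenceSketch:
    """Hypothetical stand-in that pads or clips one sequence to a fixed length."""

    def __init__(self, length, pad_val=0, clip=True):
        self.length = length
        self.pad_val = pad_val
        self.clip = clip

    def __call__(self, sample):
        if len(sample) > self.length:
            # Too long: clip only when requested, otherwise return unchanged.
            return sample[:self.length] if self.clip else sample
        # Too short (or exact): append pad_val until the target length is reached.
        return sample + [self.pad_val] * (self.length - len(sample))


pad = PadSequenceSketch(4)
short = pad([1, 2])             # [1, 2, 0, 0]
long = pad([1, 2, 3, 4, 5])     # [1, 2, 3, 4]
```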

EduNLP.ModelZoo.utils.pad_sequence(sequence: list, max_length=None, pad_val=0, clip=True)[source]¶

Parameters:

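sequence – list of sequences to pad
max_length – target length; defaults to the length of the longest sequence
pad_val – the pad value. Default 0
clip – whether to clip sequences longer than max_length. Default True

Returns:

Modified list – the sequences padded to the same size.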

Return type:

list

Examples

>>> seq = [[4, 3, 3], [2], [3, 3, 2]]
>>> pad_sequence(seq)
[[4, 3, 3], [2, 0, 0], [3, 3, 2]]
>>> pad_sequence(seq, pad_val=1)
[[4, 3, 3], [2, 1, 1], [3, 3, 2]]
>>> pad_sequence(seq, max_length=2)
[[4, 3], [2, 0], [3, 3]]
>>> pad_sequence(seq, max_length=2, clip=False)
[[4, 3, 3], [2, 0], [3, 3, 2]]
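
The behaviour shown in the examples above can be reproduced with a short pure-Python function. This is a sketch inferred from those examples only: pad_sequence_sketch is a hypothetical name and is not the library's implementation.

```python
def pad_sequence_sketch(sequence, max_length=None, pad_val=0, clip=True):
    """Pad every inner list to a common length (hypothetical re-implementation)."""
    if max_length is None:
        # Default target: the longest sequence in the batch.
        max_length = max(len(seq) for seq in sequence)
    padded = []
    for seq in sequence:
        seq = list(seq)
        if len(seq) < max_length:
            seq = seq + [pad_val] * (max_length - len(seq))
        elif clip:
            seq = seq[:max_length]
        padded.append(seq)
    return padded


seq = [[4, 3, 3], [2], [3, 3, 2]]
default_padded = pad_sequence_sketch(seq)
clipped = pad_sequence_sketch(seq, max_length=2)
unclipped = pad_sequence_sketch(seq, max_length=2, clip=False)
```

With clip=False, sequences longer than max_length are returned unchanged, so the result may be ragged.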

EduNLP.ModelZoo.utils.set_device(_net, ctx, *args, **kwargs)[source]¶

Code from longling v1.3.26.

class EduNLP.ModelZoo.utils.Masker(mask: (int, str, ...) = 0, per=0.2, seed=None)[source]¶

Parameters:

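mask (int, str) – value written in place of each masked item
per – proportion of items in each sequence to mask
seed – random seed, for reproducible masking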

Examples

>>> masker = Masker(per=0.5, seed=10)
>>> items = [[1, 1, 3, 4, 6], [2], [5, 9, 1, 4]]
>>> masked_seq, mask_label = masker(items)
>>> masked_seq
[[1, 1, 0, 0, 6], [2], [0, 9, 0, 4]]
>>> mask_label
[[0, 0, 1, 1, 0], [0], [1, 0, 1, 0]]
>>> items = [[1, 2, 3], [1, 1, 0], [2, 0, 0]]
>>> masked_seq, mask_label = masker(items, [3, 2, 1])
>>> masked_seq
[[1, 0, 3], [0, 1, 0], [2, 0, 0]]
>>> mask_label
[[0, 1, 0], [1, 0, 0], [0, 0, 0]]
>>> masker = Masker(mask="[MASK]", per=0.5, seed=10)
>>> items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
>>> masked_seq, mask_label = masker(items, length=[3, 1, 2])
>>> masked_seq
[['a', '[MASK]', 'c'], ['d', '[PAD]', '[PAD]'], ['hello', '[MASK]', '[PAD]']]
>>> mask_label
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]

Returns:

list of masked_seq and list of mask_label

Return type:

list
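
The masking scheme in the examples above can be approximated as follows. This is a rough sketch under stated assumptions: mask_items is a hypothetical function, the number of masked positions is taken to be int(length * per) (which matches the counts in the examples), and Python's random module is used, so the chosen positions will not match the library's output for the same seed.

```python
import random


def mask_items(items, mask=0, per=0.2, length=None, seed=None):
    """Randomly replace a share of each sequence with `mask` (hypothetical sketch)."""
    rng = random.Random(seed)
    if length is None:
        # Without explicit lengths, every position is eligible for masking.
        length = [len(item) for item in items]
    masked_seq, mask_label = [], []
    for item, n in zip(items, length):
        # Only the first n (non-padding) positions can be chosen.
        chosen = set(rng.sample(range(n), int(n * per)))
        masked_seq.append([mask if i in chosen else tok for i, tok in enumerate(item)])
        mask_label.append([1 if i in chosen else 0 for i in range(len(item))])
    return masked_seq, mask_label


items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
masked_seq, mask_label = mask_items(items, mask="[MASK]", per=0.5, length=[3, 1, 2], seed=10)
```

Positions beyond each sequence's valid length are never selected, so padding tokens stay untouched.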

EduNLP.ModelZoo.utils.load_items(data_path)[source]¶

class EduNLP.ModelZoo.utils.MLP(in_dim, n_classes, hidden_dim, dropout, n_layers=2, act=<function leaky_relu>)[source]¶

forward(input)[source]¶

training: bool¶

class EduNLP.ModelZoo.utils.TextCNN(embed_dim, hidden_dim)[source]¶

forward(embed)[source]¶

training: bool¶

class EduNLP.ModelZoo.utils.KnowledgePredictionOutput[source]¶

loss: FloatTensor = None¶

logits: FloatTensor = None¶

class EduNLP.ModelZoo.utils.ModelOutput[source]¶

Base class for all model outputs as dataclass. Has a __getitem__ that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) that will ignore the None attributes. Otherwise behaves like a regular python dictionary.

You can’t unpack a ModelOutput directly. Use the to_tuple() method to convert it to a tuple first.

setdefault(*args, **kwargs)[source]¶

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

pop(k[, d]) → v[source]¶

Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

update([E, ]**F) → None[source]¶

Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

to_tuple() → Tuple[Any][source]¶

Convert self to a tuple containing all the attributes/keys that are not None.
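
The access patterns described for ModelOutput can be sketched with a small dataclass. This is a simplified stand-in, not the library class: OutputSketch and its fields are hypothetical, and only string indexing, integer indexing and to_tuple() are modelled.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class OutputSketch:
    """Hypothetical output container with dict-like and tuple-like access."""
    loss: Optional[float] = None
    logits: Optional[list] = None

    def to_tuple(self) -> Tuple[Any, ...]:
        # Keep only the attributes that are not None, in declaration order.
        values = (getattr(self, f.name) for f in fields(self))
        return tuple(v for v in values if v is not None)

    def __getitem__(self, key):
        if isinstance(key, str):
            return getattr(self, key)   # dictionary-style access
        return self.to_tuple()[key]     # tuple-style access, None fields skipped


out = OutputSketch(logits=[0.1, 0.9])
as_tuple = out.to_tuple()   # loss is None, so only logits remains
```

Because loss is None here, integer index 0 refers to logits rather than loss.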

class EduNLP.ModelZoo.utils.PropertyPredictionOutput[source]¶

loss: FloatTensor = None¶

logits: FloatTensor = None¶

EduNLP.ModelZoo.utils.gather_nd(params, indices)[source]¶

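Gathers slices from params at the positions given by indices, as illustrated by the example below.

Parameters:

params (Tensor) – source tensor to gather from
indices (Tensor) – integer tensor whose last dimension indexes into params

Returns:

Tensor – the gathered slices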

Examples

>>> gather_nd(
...     params=torch.tensor([[1, 2, 3],
...                          [4, 5, 6]]),
...     indices=torch.tensor([[1],
...                           [0]]))
tensor([[4, 5, 6],
        [1, 2, 3]])
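
The gathering in the example above can be mimicked on nested Python lists. This is a sketch only: gather_nd_sketch is a hypothetical function, and it assumes the semantics of tf.gather_nd, where each innermost list in indices is an index path into params. That reading is inferred from the single example, not confirmed by the source.

```python
def gather_nd_sketch(params, indices):
    """Gather from nested lists; each innermost list of indices is one lookup path."""

    def lookup(path):
        value = params
        for i in path:
            value = value[i]
        return value

    def walk(node):
        # A list of ints is an index path; anything else is a batch of paths.
        if node and isinstance(node[0], int):
            return lookup(node)
        return [walk(child) for child in node]

    return walk(indices)


params = [[1, 2, 3], [4, 5, 6]]
rows = gather_nd_sketch(params, [[1], [0]])         # whole rows, reordered
cells = gather_nd_sketch(params, [[0, 2], [1, 0]])  # individual elements
```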

EduNLP.ModelZoo.utils.sequence_mask(lengths, max_len=None)[source]¶

Same as tf.sequence_mask. Returns a mask tensor representing the first N positions of each cell.

Parameters:

lengths (Tensor) – integer tensor, all its values <= max_len.
max_len (int, optional) – scalar integer, size of last dimension of returned tensor. Default is the maximum value in lengths.

Returns:

Tensor – A boolean mask tensor of shape lengths.shape + (max_len,)

Examples

>>> sequence_mask(torch.tensor([1, 3, 2]), 5)
tensor([[ True, False, False, False, False],
        [ True,  True,  True, False, False],
        [ True,  True, False, False, False]])
>>> sequence_mask(torch.tensor([[1, 3],[2,0]]))
tensor([[[ True, False, False],
         [ True,  True,  True]],
<BLANKLINE>
        [[ True,  True, False],
         [False, False, False]]])
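
For one-dimensional lengths, the mask described above reduces to a position-versus-length comparison. This is a list-based sketch of that idea: sequence_mask_sketch is a hypothetical name, it works on plain Python lists rather than tensors, and it does not handle the multi-dimensional case shown in the second example.

```python
def sequence_mask_sketch(lengths, max_len=None):
    """Return rows of booleans that are True for the first n positions of each row."""
    if max_len is None:
        # Default width: the largest length in the batch.
        max_len = max(lengths)
    return [[pos < n for pos in range(max_len)] for n in lengths]


mask = sequence_mask_sketch([1, 3, 2], 5)
default_width = sequence_mask_sketch([2, 0])
```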