EduNLP.I2V

class EduNLP.I2V.i2v.I2V(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This is only a base interface, so you should not use it directly. To obtain a vector from an item, use one of the concrete models such as D2V or W2V.

Parameters:
  • tokenizer (str) – the name of the tokenizer, e.g. bert, pure_text, …

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • model_dir (str) – local directory for saving pretrained models downloaded online; only used when pretrained_t2v=True

  • kwargs – the parameters passed to t2v

Examples

>>> # Sample item: a Chinese probability question about a figure built from three semicircles.
>>> item = {r"如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$,"
...         r"直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,"
...         r"此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"}
>>> model_dir = "examples/test_model/d2v"
>>> url, model_name, *args = get_pretrained_model_info('d2v_test_256')
>>> (); path = get_data(url, model_dir); () 
(...)
>>> path = path_append(path, os.path.basename(path) + '.bin', to_str=True)
>>> i2v = D2V("pure_text", "d2v", filepath=path, pretrained_t2v=False)
>>> i2v(item)
([array([ ...dtype=float32)], None)
Returns:

i2v model

Return type:

I2V
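
The parameters above describe a two-stage pipeline: a tokenizer turns each item into tokens, and a t2v model turns tokens into vectors. The sketch below illustrates that division of labour without EduNLP installed. All names in it (`ToyI2V`, `whitespace_tokenizer`, `length_t2v`) are hypothetical stand-ins, and the returned `(item_vectors, token_vectors)` pair only mirrors the shape of the output shown in the example above.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def whitespace_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer: split the text on whitespace."""
    return text.split()


def length_t2v(tokens: List[str]) -> Vector:
    """Hypothetical token2vector model: [token count, mean token length].

    A real t2v model would be a trained embedding model.
    """
    if not tokens:
        return [0.0, 0.0]
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    return [float(len(tokens)), mean_len]


class ToyI2V:
    """Illustrative item-to-vector wrapper (not EduNLP's I2V class)."""

    def __init__(self, tokenizer: Callable[[str], List[str]],
                 t2v: Callable[[List[str]], Vector]):
        self.tokenizer = tokenizer
        self.t2v = t2v

    def tokenize(self, items, key=lambda x: x) -> List[List[str]]:
        # `key` extracts the text of each item before tokenization.
        return [self.tokenizer(key(item)) for item in items]

    def infer_vector(self, items, key=lambda x: x) -> Tuple[List[Vector], Optional[list]]:
        tokens = self.tokenize(items, key=key)
        item_vectors = [self.t2v(t) for t in tokens]
        # This toy model produces no per-token vectors, hence None.
        return item_vectors, None

    def __call__(self, items, **kwargs):
        return self.infer_vector(items, **kwargs)


toy = ToyI2V(whitespace_tokenizer, length_t2v)
item_vectors, token_vectors = toy(["a bb ccc", "dddd"])
```

Calling the wrapper delegates to `infer_vector`, so swapping in a different tokenizer or t2v function changes the result without touching the wrapper itself.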

tokenize(items, *args, key=<function I2V.<lambda>>, **kwargs) list[source]
infer_vector(items, key=<function I2V.<lambda>>, **kwargs) tuple[source]
infer_item_vector(tokens, *args, **kwargs) ...[source]
infer_token_vector(tokens, *args, **kwargs) ...[source]
save(config_path)[source]
classmethod load(config_path, *args, **kwargs)[source]
classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
property vector_size
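
The `save(config_path)` and `load(config_path)` pair suggests a configuration round trip. The sketch below shows one way such a pair could work, assuming the constructor arguments are serialised as JSON. That storage format is an assumption made for illustration, and `ToyConfigurable` with its fields is hypothetical rather than taken from EduNLP.

```python
import json
import os
import tempfile


class ToyConfigurable:
    """Hypothetical class that can write and restore its constructor arguments."""

    def __init__(self, tokenizer, t2v, pretrained_t2v=False, **kwargs):
        # Keep everything needed to rebuild the object later.
        self.params = {
            "tokenizer": tokenizer,
            "t2v": t2v,
            "pretrained_t2v": pretrained_t2v,
            "kwargs": kwargs,
        }

    def save(self, config_path):
        with open(config_path, "w", encoding="utf-8") as f:
            json.dump(self.params, f, ensure_ascii=False, indent=2)

    @classmethod
    def load(cls, config_path):
        with open(config_path, encoding="utf-8") as f:
            params = json.load(f)
        return cls(
            params["tokenizer"],
            params["t2v"],
            pretrained_t2v=params["pretrained_t2v"],
            **params["kwargs"],
        )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "i2v_config.json")
    original = ToyConfigurable("pure_text", "d2v", filepath="model.bin")
    original.save(path)
    restored = ToyConfigurable.load(path)
```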
class EduNLP.I2V.i2v.D2V(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This model transforms an item directly into a vector.

Bases: I2V

Parameters:
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Examples

>>> # Sample item: a Chinese probability question about a figure built from three semicircles.
>>> item = {r"如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$,"
...         r"直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,"
...         r"此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"}
>>> model_dir = "examples/test_model/d2v"
>>> url, model_name, *args = get_pretrained_model_info('d2v_test_256')
>>> (); path = get_data(url, model_dir); () 
(...)
>>> path = path_append(path, os.path.basename(path) + '.bin', to_str=True)
>>> i2v = D2V("pure_text", "d2v", filepath=path, pretrained_t2v=False)
>>> i2v(item)
([array([ ...dtype=float32)], None)
Returns:

i2v model

Return type:

I2V

infer_vector(items, tokenize=True, key=<function D2V.<lambda>>, *args, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters:
  • items (str) – the text of question

  • tokenize (bool) – True: tokenize the item

  • key (function) – determine how to get the text of each item

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns:

vector

Return type:

list
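
The `key` parameter determines how the text of each item is obtained. The sketch below illustrates the idea with a hypothetical helper, `apply_key`, and hypothetical item fields (`stem`, `options`). It assumes the default `key` is the identity function, which this page does not state for D2V.

```python
def apply_key(items, key=lambda x: x):
    """Return the text that would be tokenized for each item."""
    return [key(item) for item in items]


# Items stored as dicts need a key that picks out the text.
items = [
    {"stem": "1 + 1 = ?", "options": ["1", "2"]},
    {"stem": "2 * 3 = ?", "options": ["5", "6"]},
]

# Use only the question stem.
stems = apply_key(items, key=lambda item: item["stem"])

# Or concatenate the stem with its options.
stems_with_options = apply_key(
    items,
    key=lambda item: item["stem"] + " " + " ".join(item["options"]),
)

# Plain strings work with the default identity key.
plain = apply_key(["1 + 1 = ?"])
```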

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
class EduNLP.I2V.i2v.W2V(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This model transforms tokens into vectors.

Bases: I2V

Parameters:
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Examples

>>> (); i2v = get_pretrained_i2v("w2v_test_256", "examples/test_model/w2v"); () 
(...)
>>> item_vector, token_vector = i2v(["有学者认为:‘学习’,必须适应实际"]) 
>>> item_vector 
[array([...], dtype=float32)]
Returns:

i2v model

Return type:

W2V

infer_vector(items, tokenize=True, key=<function W2V.<lambda>>, *args, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters:
  • items (str) – the text of question

  • tokenize (bool) – True: tokenize the item

  • key (function) – determine how to get the text of each item

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns:

vector

Return type:

list
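
The W2V example above unpacks the result into `item_vector` and `token_vector`. One common way to derive an item vector from word vectors is to average the token vectors, as sketched below in plain Python. Whether W2V aggregates this way is an assumption. The embedding table and both helper functions are hypothetical.

```python
from typing import Dict, List

DIM = 2

# Hypothetical embedding table; a trained word2vec model would supply this.
toy_embeddings: Dict[str, List[float]] = {
    "solve": [1.0, 0.0],
    "x": [0.0, 1.0],
    "now": [2.0, 2.0],
}


def infer_token_vectors(tokens: List[str]) -> List[List[float]]:
    """Look up each token; unknown tokens map to the zero vector."""
    return [toy_embeddings.get(t, [0.0] * DIM) for t in tokens]


def infer_item_vector(tokens: List[str]) -> List[float]:
    """Mean-pool the token vectors into a single item vector."""
    vectors = infer_token_vectors(tokens)
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = ["solve", "x", "now"]
item_vec = infer_item_vector(tokens)
tok_vecs = infer_token_vectors(tokens)
```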

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
class EduNLP.I2V.i2v.Elmo(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This model transforms items and tokens into vectors with ELMo.

Bases: I2V

Parameters:
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Returns:

i2v model

Return type:

Elmo

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function Elmo.<lambda>>, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters:
  • items (str or dict or list) – the item of question, or question list

  • return_tensors (str) – tensor type used in tokenizer

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns:

vector

Return type:

list
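
The `items` argument is documented as a single question (str or dict) or a list of questions. A minimal sketch of how such input could be normalised into a batch follows. `normalize_items` is a hypothetical helper, and whether EduNLP normalises its input this way is an assumption.

```python
def normalize_items(items):
    """Wrap a single item (str or dict) in a list; copy lists and tuples."""
    if isinstance(items, (str, dict)):
        return [items]
    return list(items)


single_text = normalize_items("What is 2 + 3?")
single_dict = normalize_items({"stem": "What is 2 + 3?"})
batch = normalize_items(["What is 2 + 3?", "Solve x + 1 = 2"])
```

Checking for `str` and `dict` first matters because both are iterable: calling `list` on them directly would split a string into characters or a dict into its keys.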

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', device='cpu', *args, **kwargs)[source]
class EduNLP.I2V.i2v.Bert(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This model transforms items and tokens into vectors with BERT.

Bases: I2V

Parameters:
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Returns:

i2v model

Return type:

Bert

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function Bert.<lambda>>, return_tensors='pt', **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters:
  • items (str or dict or list) – the item of question, or question list

  • return_tensors (str) – tensor type used in tokenizer

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns:

vector

Return type:

list

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', device='cpu', *args, **kwargs)[source]
class EduNLP.I2V.i2v.DisenQ(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This model transforms items and tokens into vectors with DisenQ.

Bases: I2V

Parameters:
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Returns:

i2v model

Return type:

DisenQ

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function DisenQ.<lambda>>, vector_type=None, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters:
  • items (str or dict or list) – the item of question, or question list

  • key (function) – determine how to get the text of each item

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns:

vector

Return type:

list

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]
class EduNLP.I2V.i2v.QuesNet(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', device='cpu', **kwargs)[source]

This model transforms items and tokens into vectors with QuesNet.

Bases: I2V

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function QuesNet.<lambda>>, meta=['know_name'], **kwargs)[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters:
  • items (str or dict or list) – the item of question, or question list

  • tokenize (bool, optional) – True: tokenize the item

  • key (function, optional) – determine how to get the text of each item, by default lambda x: x

  • meta (list, optional) – meta information, by default ['know_name']

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns:

  • token embeddings

  • question embedding

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', device='cpu', *args, **kwargs)[source]
EduNLP.I2V.i2v.get_pretrained_i2v(name, model_dir='/home/docs/.EduNLP/model', device='cpu')[source]

A convenient way to convert items to vectors easily using a pretrained model.

Parameters:
  • name (str) – the name of the item2vector model, e.g. d2v_math_300, w2v_math_300, elmo_math_2048, bert_math_768, bert_taledu_768, disenq_math_256, quesnet_math_512

  • model_dir (str) – the path of model, default: MODEL_DIR = ‘~/.EduNLP/model’

Returns:

i2v model

Return type:

I2V

Examples

>>> # Sample item: a Chinese probability question about a figure built from three semicircles.
>>> item = {r"如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$,"
...         r"直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,"
...         r"此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"}
>>> (); i2v = get_pretrained_i2v("d2v_test_256", "examples/test_model/d2v"); () 
(...)
>>> print(i2v(item)) 
([array([ ...dtype=float32)], None)
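
The model names listed for `name` appear to follow a `<model>_<corpus>_<dimension>` pattern. The helper below, `parse_model_name`, is hypothetical and not part of EduNLP. It splits a name on that assumed convention, which can be handy for checking the expected vector size before loading a model.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    model: str   # e.g. "d2v", "bert"
    corpus: str  # e.g. "math", "taledu"
    dim: int     # trailing number, assumed to be the vector size


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'd2v_math_300' into its three assumed parts."""
    parts = name.split("_")
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected model name format: {name!r}")
    return ModelName(parts[0], parts[1], int(parts[2]))


names = [
    "d2v_math_300",
    "w2v_math_300",
    "elmo_math_2048",
    "bert_math_768",
    "bert_taledu_768",
    "disenq_math_256",
    "quesnet_math_512",
]
parsed = [parse_model_name(n) for n in names]
```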