EduNLP.Pipeline

Pipeline

class EduNLP.Pipeline.base.PreProcessingPipeline(pipe_names: Optional[Union[str, List[str]]] = None)

A pipeline for tokenization pre-processing. Create it by calling pipeline('pre-process') rather than instantiating this class directly.

Parameters: pipe_names (str or List[str], optional) – Names of the components used to quickly initialize the pipeline. For the available pipes, check TOKENIZE_PIPES in components. To add components more flexibly, with specific arguments or a custom name, use add_pipe.

Examples

>>> tkn = PreProcessingPipeline(['is_sif', 'to_sif', 'is_sif', 'seg_describe'])
>>> tkn.add_pipe(name='seg', symbol='fm', before='seg_describe')
>>> tkn.component_names
['is_sif', 'to_sif', 'is_sif', 'seg', 'seg_describe']
>>> item = "如图所示，则三角形ABC的面积是_。"
>>> tkn(item)
False
True
{'t': 3, 'f': 1, 'g': 0, 'm': 1}
['如图所示，则三角形', '[FORMULA]', '的面积是', '[MARK]', '。']
>>> tkn.rename_pipe(0, 'is_sif_lol')
>>> tkn.add_pipe('to_sif', component=lambda x:x, first=True) # Not added: a pipe named 'to_sif' already exists
>>> tkn.add_pipe('identify', component=lambda x:x, before=1)
>>> tkn.component_names
['is_sif_lol', 'identify', 'to_sif', 'is_sif_lol', 'seg', 'seg_describe']

add_pipe(name: str, component: Optional[Callable] = None, before: Optional[Union[int, str]] = None, after: Optional[Union[int, str]] = None, first: Optional[bool] = None, last: Optional[bool] = None, *args, **kwargs)

Add a component to the tokenization pipeline. A valid component must be callable and must fit its adjacent components. Only one of before, after, first, and last can be set; the default is last.

Note: avoid adding the same pipe more than once, otherwise you can only modify the duplicates by index, i.e. before and after work reliably only when the pipe name is unique.

The *args and **kwargs parameters are passed to the component constructor in PREPROCESSING_PIPES. This applies only when you do not supply a callable component.

Parameters

name (str, required) – the name of the pipe.
component (Callable, optional) – the custom pipe component; make sure its input and output are compatible with the adjacent components.
before (str or int, optional) – name or index of the component directly before which the new component is inserted. Indices start at 0.
after (str or int, optional) – name or index of the component directly after which the new component is inserted. Indices start at 0.
first (bool, optional) – if true, insert the component first in the pipeline.
last (bool, optional) – if true, insert the component last in the pipeline.
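
The placement rules above can be illustrated with a small standalone sketch. The helper `resolve_insert_index` is hypothetical and is not part of EduNLP; it only assumes the documented behavior (one placement option at a time, default placement last, zero-based indices) and shows why name-based placement becomes ambiguous when a name occurs more than once.

```python
from typing import List, Optional, Union


def resolve_insert_index(
    names: List[str],
    before: Optional[Union[int, str]] = None,
    after: Optional[Union[int, str]] = None,
    first: Optional[bool] = None,
    last: Optional[bool] = None,
) -> int:
    """Return the list index where a new component would be inserted.

    Hypothetical helper for illustration only; not the EduNLP implementation.
    """
    # At most one placement option may be set.
    chosen = sum([before is not None, after is not None, bool(first), bool(last)])
    if chosen > 1:
        raise ValueError("Set only one of before/after/first/last")

    def index_of(ref: Union[int, str]) -> int:
        if isinstance(ref, int):
            return ref
        # list.index returns the first match only, so a repeated name
        # can never address its later occurrences.
        return names.index(ref)

    if before is not None:
        return index_of(before)
    if after is not None:
        return index_of(after) + 1
    if first:
        return 0
    return len(names)  # default: append at the end


names = ['is_sif', 'to_sif', 'is_sif', 'seg_describe']
names.insert(resolve_insert_index(names, before='seg_describe'), 'seg')
print(names)  # ['is_sif', 'to_sif', 'is_sif', 'seg', 'seg_describe']

# 'is_sif' appears twice; a name lookup only ever finds the first one.
print(resolve_insert_index(names, after='is_sif'))  # 1
```

In this sketch the second 'is_sif' can only be addressed by its index (2), which mirrors the advice to keep pipe names unique.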

remove_pipe(pipe: Union[str, int]): Remove a component from the pre-processing pipeline.

rename_pipe(old_pipe: Union[str, int], new_name: str)

Rename a component in the pre-processing pipeline.

Parameters

old_pipe (str or int, required) – the old component name (str) or the component's index in the pipeline (int)
new_name (str, required) – new name for the component

property component_names: Get the names of the pipeline components.

property pipeline: Get the processing pipeline, consisting of (name, component) tuples.
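
As a rough illustration of why adjacent components must agree on input and output, the sketch below runs a list of (name, component) tuples in order, feeding each component's return value to the next. This chaining is an assumption inferred from the description above, and `run_components` and `toy_pipeline` are hypothetical names, not EduNLP objects.

```python
from typing import Any, Callable, List, Tuple


def run_components(pipeline: List[Tuple[str, Callable[[Any], Any]]], item: Any) -> Any:
    """Apply each component in order; the output of one is the input of the next."""
    for _name, component in pipeline:
        item = component(item)
    return item


# Toy components: each one must accept what the previous one returns.
toy_pipeline = [
    ("strip", str.strip),   # str -> str
    ("split", str.split),   # str -> list of str
    ("count", len),         # list -> int
]

result = run_components(toy_pipeline, "  a b c  ")
print(result)  # 3
```

Swapping "split" and "count" in this toy list would raise a TypeError, because `str.split` cannot take the integer that `len` returns.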

class EduNLP.Pipeline.base.Pipeline(task: Optional[str] = None, model: Optional[BaseModel] = None, tokenizer: Optional[PretrainedEduTokenizer] = None, preproc_pipe_names: Optional[List] = None, **kwargs)

Pipeline is the base class from which all pipelines inherit. The pipeline workflow is defined as the following sequence of operations:

Input -> PreProcessingPipeline -> Tokenization -> Model Inference
-> Post-Processing (downstream task dependent) -> Output

This class is not meant to be used directly; refer to the pipeline function in Pipeline.__init__ instead.
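
The workflow above can be sketched with a toy base class. `ToyPipeline` and `ToyLengthPipeline` are hypothetical stand-ins written for this illustration; they do not reproduce the EduNLP classes, models, or tokenizers, and only mirror the documented order of stages with an abstract postprocess step left to each task.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List


class ToyPipeline(ABC):
    """Hypothetical stand-in mirroring the documented stage order."""

    def __init__(
        self,
        preprocess: List[Callable[[Any], Any]],
        tokenizer: Callable[[Any], Any],
        model: Callable[[Any], Any],
    ):
        self.preprocess = preprocess
        self.tokenizer = tokenizer
        self.model = model

    def __call__(self, item: Any, **postprocess_parameters: Any) -> Any:
        for step in self.preprocess:        # Input -> pre-processing
            item = step(item)
        tokens = self.tokenizer(item)       # -> tokenization
        outputs = self.model(tokens)        # -> model inference
        return self.postprocess(outputs, **postprocess_parameters)  # -> output

    @abstractmethod
    def postprocess(self, model_outputs: Any, **postprocess_parameters: Any) -> Any:
        """Reformat raw model outputs for a specific downstream task."""


class ToyLengthPipeline(ToyPipeline):
    def postprocess(self, model_outputs: Any, **postprocess_parameters: Any) -> Any:
        # Wrap the raw number in a friendlier, task-specific structure.
        return {"score": round(model_outputs, 2)}


pipe = ToyLengthPipeline(
    preprocess=[str.strip, str.lower],
    tokenizer=str.split,
    model=lambda tokens: len(tokens) / 10,  # placeholder for a real model
)
result = pipe("  Find the area of triangle ABC  ")
print(result)  # {'score': 0.6}
```

Only postprocess differs between the toy subclasses here, which is the reason the base class in this sketch declares it abstract.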

abstract postprocess(model_outputs: ModelOutput, **postprocess_parameters: Dict) → Any: Receives the outputs of the _forward method and reformats them into a more user-friendly, task-specific form.

add_pipe(*args, **kwargs): refer to PreProcessingPipeline.add_pipe

remove_pipe(*args, **kwargs): refer to PreProcessingPipeline.remove_pipe

rename_pipe(*args, **kwargs): refer to PreProcessingPipeline.rename_pipe

property component_names: Get the names of the pipeline components.

property pipeline: Get the processing pipeline, consisting of (name, component) tuples.

run_single(inputs, tokenize_params, model_params, postprocess_params)

Components

class EduNLP.Pipeline.components.BasePipe(*args, **kwargs)

class EduNLP.Pipeline.components.IsSifPipe(*args, **kwargs)

class EduNLP.Pipeline.components.ToSifPipe(*args, **kwargs)

class EduNLP.Pipeline.components.Dict2Str4SifPipe(*args, **kwargs)

class EduNLP.Pipeline.components.Sif4SciPipe(*args, **kwargs)

class EduNLP.Pipeline.components.SegPipe(*args, **kwargs)

class EduNLP.Pipeline.components.SegDescribePipe(*args, **kwargs)

class EduNLP.Pipeline.components.SegFilterPipe(*args, **kwargs)

class EduNLP.Pipeline.components.TokenizePipe(*args, **kwargs)

Property prediction

class EduNLP.Pipeline.property_prediction.PropertyPredictionPipeline(**kwargs)

postprocess(model_outputs, **postprocess_params): Receives the outputs of the _forward method and reformats them into a more user-friendly, task-specific form.

Knowledge prediction

class EduNLP.Pipeline.knowledge_prediction.KnowledgePredictionPipeline(**kwargs)

postprocess(model_outputs, **postprocess_params): Receives the outputs of the _forward method and reformats them into a more user-friendly, task-specific form.