EduNLP.Pipeline
Pipeline¶
- class EduNLP.Pipeline.base.PreProcessingPipeline(pipe_names: Optional[Union[str, List[str]]] = None)
A pipeline that pre-processes items for tokenization. Use it by calling pipeline('pre-process') rather than instantiating it directly.
- Parameters
pipe_names (str or List[str], optional) – names of the pipeline components to initialize quickly. For the available pipes, check TOKENIZE_PIPES in components. To add components more flexibly, with specific arguments or a custom name, use add_pipe.
Examples
>>> tkn = PreProcessingPipeline(['is_sif', 'to_sif', 'is_sif', 'seg_describe'])
>>> tkn.add_pipe(name='seg', symbol='fm', before='seg_describe')
>>> tkn.component_names
['is_sif', 'to_sif', 'is_sif', 'seg', 'seg_describe']
>>> item = "如图所示,则三角形ABC的面积是_。"
>>> tkn(item)
False
True
{'t': 3, 'f': 1, 'g': 0, 'm': 1}
['如图所示,则三角形', '[FORMULA]', '的面积是', '[MARK]', '。']
>>> tkn.rename_pipe(0, 'is_sif_lol')
>>> tkn.add_pipe('to_sif', component=lambda x: x, first=True)  # Not added: a pipe named 'to_sif' already exists
>>> tkn.add_pipe('identify', component=lambda x: x, before=1)
>>> tkn.component_names
['is_sif_lol', 'identify', 'to_sif', 'is_sif_lol', 'seg', 'seg_describe']
- add_pipe(name: str, component: Optional[Callable] = None, before: Optional[Union[int, str]] = None, after: Optional[Union[int, str]] = None, first: Optional[bool] = None, last: Optional[bool] = None, *args, **kwargs)
Add a component to the pre-processing pipeline. A valid component must be callable, and its output must be a valid input for the next component. At most one of before/after/first/last can be set; by default, the component is added last. Notes:
1. Avoid adding the same pipe more than once; otherwise the duplicates can only be addressed by index, i.e. before and after work reliably only when the referenced pipe name is unique.
2. The *args and **kwargs parameters are passed to the component constructor registered in PREPROCESSING_PIPES, so they only take effect when you do not pass a callable component.
- Parameters
name (str, required) – the name of the pipe
component (Callable, optional) – a custom pipe component; make sure its input and output are compatible with the adjacent components.
before (str or int, optional) – name or index of the component to insert the new component directly before. Indices start from 0.
after (str or int, optional) – name or index of the component to insert the new component directly after. Indices start from 0.
first (bool, optional) – if True, insert the component first in the pipeline.
last (bool, optional) – if True, insert the component last in the pipeline.
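The placement rules of add_pipe (name or index references, at most one position option, append by default, no duplicate names) can be modelled with a plain list of (name, component) pairs. This is a minimal sketch, not EduNLP's implementation: the helper names resolve_index and add_pipe-as-a-function are hypothetical, and returning False for a duplicate name is an assumption standing in for whatever the real class does when it refuses the insert.

```python
from typing import Callable, List, Optional, Tuple, Union

Pipe = Tuple[str, Callable]  # (name, component), as in the `pipeline` property


def resolve_index(pipes: List[Pipe], ref: Union[int, str]) -> int:
    """Map a pipe name or a 0-based index to a position in `pipes`."""
    if isinstance(ref, int):
        if not 0 <= ref < len(pipes):
            raise IndexError(f"no pipe at index {ref}")
        return ref
    matches = [i for i, (name, _) in enumerate(pipes) if name == ref]
    if len(matches) != 1:
        # Missing or duplicated names are ambiguous; an index must be used instead.
        raise ValueError(f"{ref!r} matches {len(matches)} pipes, expected exactly 1")
    return matches[0]


def add_pipe(pipes: List[Pipe], name: str, component: Callable,
             before: Optional[Union[int, str]] = None,
             after: Optional[Union[int, str]] = None,
             first: Optional[bool] = None,
             last: Optional[bool] = None) -> bool:
    """Hypothetical model: insert (name, component); return False if the name is taken."""
    chosen = [p for p in (before, after) if p is not None] + [f for f in (first, last) if f]
    if len(chosen) > 1:
        raise ValueError("set at most one of before/after/first/last")
    if any(existing == name for existing, _ in pipes):
        return False  # assumption: a duplicate name is skipped, not overwritten
    if first:
        pos = 0
    elif before is not None:
        pos = resolve_index(pipes, before)
    elif after is not None:
        pos = resolve_index(pipes, after) + 1
    else:
        pos = len(pipes)  # default (and last=True): append at the end
    pipes.insert(pos, (name, component))
    return True


pipes: List[Pipe] = [("is_sif", str.strip), ("seg_describe", str.split)]
add_pipe(pipes, "seg", str.lower, before="seg_describe")
print([n for n, _ in pipes])  # ['is_sif', 'seg', 'seg_describe']
```

Checking `before`/`after` with `is not None` matters here: an index of 0 is falsy in Python, so a plain truthiness test would silently ignore `before=0`.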
- rename_pipe(old_pipe: Union[str, int], new_name: str)
Rename a component in the pre-processing pipeline.
- Parameters
old_pipe (str or int, required) – the current component name (str), or the component's index in the pipeline (int)
new_name (str, required) – new name for the component
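Addressing a component either by name or by index can be sketched on the same (name, component) list. This hypothetical rename_pipe function renames exactly one entry and refuses an ambiguous name; it does not try to reproduce how the real class treats duplicated names.

```python
from typing import Callable, List, Tuple, Union

Pipe = Tuple[str, Callable]  # (name, component)


def rename_pipe(pipes: List[Pipe], old_pipe: Union[str, int], new_name: str) -> None:
    """Hypothetical model: rename one entry, addressed by name (str) or 0-based index (int)."""
    if isinstance(old_pipe, int):
        if not 0 <= old_pipe < len(pipes):
            raise IndexError(f"no pipe at index {old_pipe}")
        idx = old_pipe
    else:
        matches = [i for i, (name, _) in enumerate(pipes) if name == old_pipe]
        if len(matches) != 1:
            raise ValueError(f"{old_pipe!r} must match exactly one pipe; use an index")
        idx = matches[0]
    _, component = pipes[idx]
    pipes[idx] = (new_name, component)  # the component itself is unchanged


pipes: List[Pipe] = [("is_sif", str.strip), ("to_sif", str.lower)]
rename_pipe(pipes, 0, "is_sif_lol")
rename_pipe(pipes, "to_sif", "convert")
print([name for name, _ in pipes])  # ['is_sif_lol', 'convert']
```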
- property component_names
Get the names of pipeline components
- property pipeline
Get the processing pipeline consisting of (name, component) tuples.
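Both properties can be read as views over one ordered list of (name, component) pairs, and calling the pipeline feeds each component's output into the next. The class below is a minimal, hypothetical stand-in (MiniPreProcessing is not an EduNLP name), assuming plain sequential chaining.

```python
from typing import Any, Callable, List, Tuple

Pipe = Tuple[str, Callable[[Any], Any]]


class MiniPreProcessing:
    """Hypothetical stand-in: an ordered list of (name, component) pairs."""

    def __init__(self, pipes: List[Pipe]):
        self._pipes = list(pipes)

    @property
    def component_names(self) -> List[str]:
        return [name for name, _ in self._pipes]

    @property
    def pipeline(self) -> List[Pipe]:
        return list(self._pipes)  # a copy, so callers cannot reorder the internals

    def __call__(self, item: Any) -> Any:
        # Each component receives the previous component's output.
        for _name, component in self._pipes:
            item = component(item)
        return item


tkn = MiniPreProcessing([("strip", str.strip), ("split", str.split)])
print(tkn.component_names)  # ['strip', 'split']
print(tkn("  a b c  "))     # ['a', 'b', 'c']
```

Swapping the two components would raise a TypeError, because str.strip cannot accept the list that str.split returns; this is the kind of neighbour mismatch the add_pipe notes warn about.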
- class EduNLP.Pipeline.base.Pipeline(task: Optional[str] = None, model: Optional[BaseModel] = None, tokenizer: Optional[PretrainedEduTokenizer] = None, preproc_pipe_names: Optional[List] = None, **kwargs)
Pipeline is the base class from which all pipelines inherit. The pipeline workflow is defined as the following sequence of operations:
- Input -> PreProcessingPipeline -> Tokenization -> Model Inference -> Post-Processing (downstream-task dependent) -> Output
This class is not meant to be used directly; use the pipeline function (Pipeline.__init__.pipeline) instead.
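The workflow above can be sketched as an abstract base class whose __call__ runs the stages in order and leaves only postprocess to subclasses. Everything here is hypothetical (SketchPipeline, LengthPipeline, and the toy tokenizer and model are not EduNLP APIs); the model call merely stands in for the _forward step mentioned under postprocess.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Tuple


class SketchPipeline(ABC):
    """Hypothetical skeleton of the documented workflow:
    Input -> pre-processing -> tokenization -> model inference -> post-processing -> Output.
    """

    def __init__(self, preproc: List[Tuple[str, Callable]], tokenizer: Callable, model: Callable):
        self.preproc = preproc
        self.tokenizer = tokenizer
        self.model = model

    def __call__(self, item: Any, **postprocess_parameters: Any) -> Any:
        for _name, component in self.preproc:  # each component feeds the next
            item = component(item)
        encoded = self.tokenizer(item)
        model_outputs = self.model(encoded)  # stands in for _forward
        return self.postprocess(model_outputs, **postprocess_parameters)

    @abstractmethod
    def postprocess(self, model_outputs: Any, **postprocess_parameters: Any) -> Any:
        """Task-specific formatting, left to subclasses."""


class LengthPipeline(SketchPipeline):
    """Toy task: report how many tokens the 'model' saw."""

    def postprocess(self, model_outputs: Any, **postprocess_parameters: Any) -> Any:
        return {"num_tokens": model_outputs}


pipe = LengthPipeline(
    preproc=[("strip", str.strip), ("lower", str.lower)],
    tokenizer=str.split,
    model=len,  # toy "model": counts tokens
)
print(pipe("  Area of Triangle ABC  "))  # {'num_tokens': 4}
```

Declaring postprocess with abc.abstractmethod makes the base class itself impossible to instantiate, which matches the note that it is not meant to be used directly.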
- abstract postprocess(model_outputs: ModelOutput, **postprocess_parameters: Dict) -> Any
Receives the outputs of the _forward method and reformats them into a more user-friendly, task-specific form.
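As an example of task-specific reformatting, a classification-style pipeline might turn raw class scores into ranked labels with probabilities. The function below is a hypothetical sketch (postprocess_topk, the label names, and the top_k parameter are illustrative assumptions, not EduNLP's output format); it applies a numerically stable softmax and keeps the k most probable labels.

```python
import math
from typing import Any, Dict, List, Sequence


def postprocess_topk(logits: Sequence[float], labels: Sequence[str],
                     top_k: int = 2) -> List[Dict[str, Any]]:
    """Hypothetical post-processing: raw class scores -> top-k labels with probabilities."""
    if len(logits) != len(labels):
        raise ValueError("expected one label per logit")
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(zip(labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return [{"label": label, "score": round(prob, 4)} for label, prob in ranked[:top_k]]


result = postprocess_topk([2.0, 0.5, 1.0], ["geometry", "algebra", "probability"])
print([r["label"] for r in result])  # ['geometry', 'probability']
```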
- property component_names
Get the names of pipeline components
- property pipeline
Get the processing pipeline consisting of (name, component) tuples.