Usage
Via Command Line Interface (CLI)
!ekorpkit
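If the shell reports that `ekorpkit` is not found, or Python raises `ModuleNotFoundError`, the package is most likely not installed in the active environment. The following is a minimal sketch for checking this before running the examples on this page. It uses only the standard library; `check_ekorpkit` is a hypothetical helper name, not part of ekorpkit.

```python
import importlib.util
import shutil


def check_ekorpkit():
    """Report whether the ekorpkit CLI and Python package are visible."""
    return {
        # Path of the `ekorpkit` executable on PATH, or None if the shell cannot find it
        "cli_path": shutil.which("ekorpkit"),
        # True if `import ekorpkit` would find the package in this interpreter
        "module_found": importlib.util.find_spec("ekorpkit") is not None,
    }


status = check_ekorpkit()
print(status)
```

If either check fails, installing the package into the same environment that runs the notebook (for example with `pip install ekorpkit`, assuming it is published under that name) should resolve it.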
CLI example to build a corpus
ekorpkit --config-dir /workspace/projects/ekorpkit-book/config \
project=esgml \
dir.workspace=/workspace \
verbose=false \
print_config=false \
num_workers=1 \
cmd=fetch_builtin_corpus \
+corpus/builtin=_dummy_fomc_minutes \
corpus.builtin.io.force.summarize=true \
corpus.builtin.io.force.preprocess=true \
corpus.builtin.io.force.build=false \
corpus.builtin.io.force.download=false
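The command above is a list of `key=value` overrides, and it can be convenient to assemble it programmatically. The sketch below builds the same kind of argument list from a dictionary. `build_command` is a hypothetical helper, and reading the leading `+` as "add a config group" assumes the CLI follows Hydra-style override syntax.

```python
import shlex


def build_command(config_dir, overrides, program="ekorpkit"):
    """Assemble an argument list from key=value overrides."""
    args = [program, "--config-dir", config_dir]
    for key, value in overrides.items():
        # Booleans are written in lowercase, as in the command above
        if isinstance(value, bool):
            value = str(value).lower()
        args.append(f"{key}={value}")
    return args


overrides = {
    "project": "esgml",
    "dir.workspace": "/workspace",
    "verbose": False,
    "num_workers": 1,
    "cmd": "fetch_builtin_corpus",
    # Leading "+" adds a config group (Hydra-style syntax, assumed here)
    "+corpus/builtin": "_dummy_fomc_minutes",
    "corpus.builtin.io.force.summarize": True,
}
command = build_command("/workspace/projects/ekorpkit-book/config", overrides)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the `ekorpkit` executable is available on the PATH.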
CLI Help
To see the available CLI configuration options, run the following commands:
!ekorpkit --help
!ekorpkit --info defaults
Via Python
Compose an ekorpkit config
from ekorpkit import eKonf
cfg = eKonf.compose()
print('Config type:', type(cfg))
eKonf.pprint(cfg)
Instantiating objects with an ekorpkit config
Compose a config for the nltk class
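from ekorpkit import eKonf

# Select the nltk option of the preprocessor/tokenizer config group
config_group = 'preprocessor/tokenizer=nltk'
cfg = eKonf.compose(config_group=config_group)
eKonf.pprint(cfg)
# Build the tokenizer object described by the composed config
nltk = eKonf.instantiate(cfg)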
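The compose-then-instantiate pattern can be illustrated without ekorpkit. The sketch below assumes the config follows Hydra's `_target_` convention, in which the config names a callable by its dotted import path and the remaining keys become keyword arguments. Whether `eKonf.instantiate` works exactly this way is an assumption; the `instantiate` function here is a simplified stand-in, and `re.compile` stands in for a tokenizer class.

```python
import importlib


def instantiate(cfg):
    """Build an object from a dict whose `_target_` names a class or function."""
    kwargs = dict(cfg)
    # Split "package.module.attr" into the module path and the attribute name
    module_name, _, attr = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    # Remaining keys are passed to the target as keyword arguments
    return target(**kwargs)


# Stand-in config: a compiled regex plays the role of a tokenizer
cfg = {"_target_": "re.compile", "pattern": r"\w+"}
tokenizer = instantiate(cfg)
tokens = tokenizer.findall("legislative proposals before Congress")
print(tokens)
```

The point of the design is that swapping `nltk` for `mecab` changes only the config, not the calling code.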
text = "I shall reemphasize some of those thoughts today in the context of legislative proposals that are now before the current Congress."
nltk.tokenize(text)
nltk.nouns(text)
Compose a config for the mecab class
config_group='preprocessor/tokenizer=mecab'
cfg = eKonf.compose(config_group=config_group)
eKonf.pprint(cfg)
Instantiate a mecab config and tokenize a text
mecab = eKonf.instantiate(cfg)
text = 'IMF가 推定한 우리나라의 GDP갭률은 今年에도 소폭의 마이너스(−)를 持續하고 있다.'
mecab.tokenize(text)
Compose and instantiate a formal_ko config for the normalizer class
config_group='preprocessor/normalizer=formal_ko'
cfg_norm = eKonf.compose(config_group=config_group)
norm = eKonf.instantiate(cfg_norm)
norm(text)
Instantiate a mecab config with the above normalizer config
# Compose the tokenizer config, then attach the normalizer config composed above
config_group = 'preprocessor/tokenizer=mecab'
cfg = eKonf.compose(config_group=config_group)
cfg.normalize = cfg_norm
mecab = eKonf.instantiate(cfg)
mecab.tokenize(text)
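Attaching a normalizer config to a tokenizer config amounts to composing two objects: the tokenizer calls the normalizer on the text before splitting it. The sketch below shows that composition with stand-in classes. `NFKCNormalizer` and `WhitespaceTokenizer` are hypothetical names, and Unicode NFKC normalization is used only as an illustration; it is not what `formal_ko` or mecab actually do.

```python
import unicodedata


class NFKCNormalizer:
    """Stand-in normalizer: applies Unicode NFKC normalization."""

    def __call__(self, text):
        return unicodedata.normalize("NFKC", text)


class WhitespaceTokenizer:
    """Stand-in tokenizer that optionally normalizes text before splitting."""

    def __init__(self, normalize=None):
        self.normalize = normalize

    def tokenize(self, text):
        # Apply the normalizer first, if one was attached
        if self.normalize is not None:
            text = self.normalize(text)
        return text.split()


plain = WhitespaceTokenizer()
with_norm = WhitespaceTokenizer(normalize=NFKCNormalizer())

sample = "ＧＤＰ gap ①"
print(plain.tokenize(sample))      # full-width characters are kept as-is
print(with_norm.tokenize(sample))  # full-width characters are folded to ASCII
```

Because the normalizer is passed in rather than hard-coded, the same tokenizer can be reused with a different normalizer by changing one argument, which mirrors assigning `cfg.normalize` in the config.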