{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "PfSMqenAwpKI", "pycharm": { "name": "#%% md\n" } }, "source": [ "# Improving classification datasets" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide-cell" ] }, "source": [ "## Install or upgrade ekorpkit\n", "\n", "```{note}\n", "Install the ekorpkit package first.\n", "\n", "Set the logging level to WARNING if you don't want to see verbose log output.\n", "\n", "If you run this notebook in Colab, set the hardware accelerator to GPU.\n", "```\n", "```{toggle}\n", "!pip install -U --pre ekorpkit\n", "\n", "exit()\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "!pip install -U --pre ekorpkit\n", "!pip install -U matplotlib simpletransformers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the environment" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "gN8kk98Rwuzs", "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "version: 0.1.38+40.geb12c43.dirty\n", "is notebook? True\n", "is colab? 
False\n", "environment variables:\n", "{'CUDA_DEVICE_ORDER': None,\n", " 'CUDA_VISIBLE_DEVICES': None,\n", " 'EKORPKIT_CONFIG_DIR': '/workspace/projects/ekorpkit-book/config',\n", " 'EKORPKIT_DATA_DIR': None,\n", " 'EKORPKIT_LOG_LEVEL': 'WARNING',\n", " 'EKORPKIT_PROJECT': 'ekorpkit-book',\n", " 'EKORPKIT_WORKSPACE_ROOT': '/workspace',\n", " 'KMP_DUPLICATE_LIB_OK': 'TRUE',\n", " 'NUM_WORKERS': 230}\n" ] } ], "source": [ "%config InlineBackend.figure_format='retina'\n", "from ekorpkit import eKonf\n", "\n", "eKonf.setLogger(\"WARNING\")\n", "print(\"version:\", eKonf.__version__)\n", "print(\"is notebook?\", eKonf.is_notebook())\n", "print(\"is colab?\", eKonf.is_colab())\n", "print(\"environment variables:\")\n", "eKonf.print(eKonf.env().dict())" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "data_dir = \"../data/cointax\"" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Building `cointax_polarity` dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 714 entries, 0 to 713\n", "Data columns (total 2 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 긍정 478 non-null object\n", " 1 부정 236 non-null object\n", "dtypes: object(2)\n", "memory usage: 11.3+ KB\n" ] } ], "source": [ "raw_data = eKonf.load_data(\"cointax.csv\", data_dir)\n", "raw_data.info()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
index | text | labels
0 | 가상자산을 안해서 | 긍정
1 | 기타 자산과 동일하게 적용 | 긍정
2 | 그냥 저냥 | 긍정
3 | 부당이익에 대한 해소 | 긍정
4 | 땀흘려 일하지 않고 쉽게 돈 벌려는 사회적인 풍토를 근절해야 된다고 생각해서 | 긍정
... | ... | ...
709 | 소득과세가 너누 높다 | 부정
710 | 주식 거래세 수수료보다도 너무 높다고 생각하기에 | 부정
711 | 너무 심하다고 생각 | 부정
712 | ...... | 부정
713 | 수익의 20%는 너무 많다고 생각한다 | 부정
\n", "

714 rows × 2 columns

\n", "
" ], "text/plain": [ " text labels\n", "0 가상자산을 안해서 긍정\n", "1 기타 자산과 동일하게 적용 긍정\n", "2 그냥 저냥 긍정\n", "3 부당이익에 대한 해소 긍정\n", "4 땀흘려 일하지 않고 쉽게 돈 벌려는 사회적인 풍토를 근절해야 된다고 생각해서 긍정\n", ".. ... ...\n", "709 소득과세가 너누 높다 부정 \n", "710 주식 거래세 수수료보다도 너무 높다고 생각하기에 부정 \n", "711 너무 심하다고 생각 부정 \n", "712 ...... 부정 \n", "713 수익의 20%는 너무 많다고 생각한다 부정 \n", "\n", "[714 rows x 2 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "# Flatten the two polarity columns into (text, labels) records, skipping missing cells.\n", "# pd.notna() replaces the identity check against np.NaN, an alias removed in NumPy 2.0.\n", "data = []\n", "for polarity, rows in raw_data.to_dict().items():\n", " data += [dict(text=text, labels=polarity) for text in rows.values() if pd.notna(text)]\n", "data = eKonf.records_to_dataframe(data)\n", "eKonf.save_data(data, \"cointax_rawdata.parquet\", data_dir)\n", "data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:ekorpkit.base:Loaded .env from /workspace/projects/ekorpkit-book/config/.env\n", "INFO:ekorpkit.base:setting environment variable CACHED_PATH_CACHE_ROOT to /workspace/.cache/cached_path\n", "INFO:ekorpkit.base:setting environment variable KMP_DUPLICATE_LIB_OK to TRUE\n", "INFO:ekorpkit.base:Applying pipe: functools.partial()\n", "INFO:ekorpkit.base:Applying pipe: functools.partial()\n", "INFO:ekorpkit.base:Applying pipe: functools.partial()\n", "INFO:ekorpkit.base:Using batcher with minibatch size: 3\n" ] }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.01369476318359375, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "apply len_bytes to num_bytes", "rate": null, "total": 171, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "499fe8183a2a402680d882b49846e10d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "apply len_bytes to num_bytes: 0%| | 0/171 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
index | id | text | labels | split
713 | 379 | 소득이 있으면 과세하여야된다 | 긍정 | dev
\n", "" ], "text/plain": [ " id text labels split\n", "713 379 소득이 있으면 과세하여야된다 긍정 dev" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.data.tail(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross validation of `cointax_polarity_kr` dataset" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:ekorpkit.base:Calling eval\n" ] }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.033258676528930664, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 143, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "f02bb003b6984bd4967fe1fe11e21174", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/143 [00:00" ] }, "metadata": { "image/png": { "height": 691, "width": 770 }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "overrides=[\n", " '+model/transformer=classification',\n", " '+model/transformer/pretrained=ekonelectra-base',\n", "]\n", "model_cfg = eKonf.compose('model/transformer=classification', overrides)\n", "model_cfg.name = \"cointax_polarity_kr_improved\"\n", "model_cfg.dataset = ds_cfg\n", "model_cfg.verbose = False\n", "model_cfg.config.num_train_epochs = 2\n", "model_cfg.config.max_seq_length = 256\n", "model_cfg.config.train_batch_size = 32\n", "model_cfg.config.eval_batch_size = 32\n", "model_cfg._method_ = ['train', 'eval']\n", "model_cfg.model.eval.visualize.plot.figure.figsize = (12,10)\n", "model = eKonf.instantiate(model_cfg)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of the model checkpoint at entelecheia/ekonelectra-base-discriminator 
were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense.weight', 'discriminator_predictions.dense.bias']\n", "- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", "- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", "Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at entelecheia/ekonelectra-base-discriminator and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013764619827270508, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 450, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "3b6e6c69fe0e43e59e352778c1e7fcb8", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/450 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." 
], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced iconic-fire-2: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/12dicm7k
Synced 5 W&B file(s), 1 media file(s), 1 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105602-12dicm7k/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:12dicm7k). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105714-290stj9y" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run robust-snowball-3 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013561248779296875, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "Running Epoch 0 of 2", "rate": null, "total": 15, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "ce03fe20b3f94337ad9164d7b662b0cb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Running Epoch 0 of 2: 0%| | 0/15 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:
acc          ▁█
eval_loss    █▁
fn           ▁▁
fp           █▁
global_step  ▁█
mcc          ▁█
tn           ▁█
tp           ▁▁
train_loss   ▁█

Run summary:
acc          0.72566
eval_loss    0.48114
fn           0
fp           31
global_step  30
mcc          0.31468
tn           5
tp           77
train_loss   0.50216

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced robust-snowball-3: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/290stj9y
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105714-290stj9y/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:290stj9y). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105735-h58skvyd" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run driven-universe-4 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013956546783447266, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 141, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "6716ae2bf63d4d3a8a4c69ca04136f4e", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/141 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced driven-universe-4: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/h58skvyd
Synced 5 W&B file(s), 1 media file(s), 1 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105735-h58skvyd/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:h58skvyd). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105800-3qlwbqjn" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run devoted-wood-5 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013576745986938477, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "Running Epoch 0 of 2", "rate": null, "total": 15, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "5f394912307e4fe9a75e6191cd81ac65", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Running Epoch 0 of 2: 0%| | 0/15 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:
acc          ▁█
eval_loss    █▁
fn           ▁▁
fp           █▁
global_step  ▁█
mcc          ▁█
tn           ▁█
tp           ▁▁
train_loss   █▁

Run summary:
acc          0.68142
eval_loss    0.55352
fn           0
fp           36
global_step  30
mcc          0.25879
tn           4
tp           73
train_loss   0.2729

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced devoted-wood-5: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/3qlwbqjn
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105800-3qlwbqjn/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:3qlwbqjn). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105819-4ejvp2n6" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run desert-sunset-6 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.012962102890014648, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 141, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "216c36ce57ce429190dbaac985077eae", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/141 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced desert-sunset-6: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/4ejvp2n6
Synced 5 W&B file(s), 1 media file(s), 1 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105819-4ejvp2n6/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:4ejvp2n6). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105841-35oyb7gm" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run royal-planet-7 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013506889343261719, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "Running Epoch 0 of 2", "rate": null, "total": 15, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "4866814254284a47b91a696d7a182ed6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Running Epoch 0 of 2: 0%| | 0/15 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:
acc          ▁▁
eval_loss    █▁
fn           ▁▁
fp           ▁▁
global_step  ▁█
mcc          ▁▁
tn           ▁▁
tp           ▁▁
train_loss   █▁

Run summary:
acc          0.65487
eval_loss    0.57384
fn           0
fp           39
global_step  30
mcc          0.0
tn           0
tp           74
train_loss   0.36175

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced royal-planet-7: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/35oyb7gm
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105841-35oyb7gm/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:35oyb7gm). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105902-2j3r3dbt" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run laced-wind-8 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013118505477905273, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 141, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "1688a735a8e040c19ac1621cb5226ae4", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/141 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.002 MB of 0.002 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced laced-wind-8: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/2j3r3dbt
Synced 5 W&B file(s), 1 media file(s), 1 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105902-2j3r3dbt/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:2j3r3dbt). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105923-39gu4tqy" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run divine-dawn-9 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013357162475585938, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "Running Epoch 0 of 2", "rate": null, "total": 15, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "ce47c6ce66544d21998a50d7cb8dfb60", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Running Epoch 0 of 2: 0%| | 0/15 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:
acc          ▁█
eval_loss    █▁
fn           ▁▁
fp           █▁
global_step  ▁█
mcc          ▁█
tn           ▁█
tp           ▁▁
train_loss   ▁█

Run summary:
acc          0.79646
eval_loss    0.47833
fn           0
fp           23
global_step  30
mcc          0.48514
tn           10
tp           80
train_loss   0.82224

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced divine-dawn-9: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/39gu4tqy
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105923-39gu4tqy/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:39gu4tqy). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105942-2ztdpcuc" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run lunar-voice-10 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.012902021408081055, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 141, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "f5d424f8aa9d4afab17a65b788a10ed8", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/141 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.001 MB of 0.002 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=0.433193…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced lunar-voice-10: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/2ztdpcuc
Synced 5 W&B file(s), 1 media file(s), 1 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_105942-2ztdpcuc/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:2ztdpcuc). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_110004-1of77ter" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run golden-smoke-11 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013396739959716797, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "Running Epoch 0 of 2", "rate": null, "total": 15, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "4e3bc526bcb3459c983c4267e121d64e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Running Epoch 0 of 2: 0%| | 0/15 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:


acc: ▁█
eval_loss: █▁
fn: █▁
fp: ▁▁
global_step: ▁█
mcc: ▁█
tn: ▁▁
tp: ▁█
train_loss: ▁█

Run summary:


acc: 0.71681
eval_loss: 0.54457
fn: 32
fp: 0
global_step: 30
mcc: 0.33268
tn: 75
tp: 6
train_loss: 0.48069

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced golden-smoke-11: https://wandb.ai/entelecheia/ekorpkit-book-cointax_polarity_kr_improved/runs/1of77ter
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_110004-1of77ter/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Successfully finished last run (ID:1of77ter). Initializing new run:
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.13.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /workspace/projects/ekorpkit-book/outputs/cointax_polarity_kr_improved/ekonelectra-base/wandb/run-20220906_110025-2cjg947h" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run super-universe-12 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.013166189193725586, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 140, "unit": "it", "unit_divisor": 1000, "unit_scale": false }, "application/vnd.jupyter.widget-view+json": { "model_id": "7ab7503ecb224f20a6b57ac22764b60c", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/140 [00:00