diff --git a/docs/篇章4-使用Transformers解决NLP任务/4.6-生成任务-机器翻译.md b/docs/篇章4-使用Transformers解决NLP任务/4.6-生成任务-机器翻译.md
index 5a62afe..fcfc887 100644
--- a/docs/篇章4-使用Transformers解决NLP任务/4.6-生成任务-机器翻译.md
+++ b/docs/篇章4-使用Transformers解决NLP任务/4.6-生成任务-机器翻译.md
@@ -5,9 +5,51 @@
 
 ```python
-! pip install datasets transformers sacrebleu sentencepiece
+! pip install datasets transformers "sacrebleu>=1.4.12,<2.0.0" sentencepiece
 ```
 
+    Requirement already satisfied: datasets in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (1.6.2)
+    Requirement already satisfied: transformers in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (4.4.2)
+    Collecting sacrebleu<2.0.0,>=1.4.12
+      Downloading sacrebleu-1.5.1-py3-none-any.whl (54 kB)
+         |████████████████████████████████| 54 kB 235 kB/s
+    Collecting sentencepiece
+      Downloading sentencepiece-0.1.96-cp38-cp38-macosx_10_6_x86_64.whl (1.1 MB)
+         |████████████████████████████████| 1.1 MB 438 kB/s
+    Requirement already satisfied: numpy>=1.17 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (1.21.1)
+    Requirement already satisfied: multiprocess in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (0.70.12.2)
+    Requirement already satisfied: fsspec in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (2021.7.0)
+    Requirement already satisfied: huggingface-hub<0.1.0 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (0.0.15)
+    Requirement already satisfied: pandas in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (1.3.1)
+    Requirement already satisfied: dill in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (0.3.4)
+    Requirement already satisfied: tqdm<4.50.0,>=4.27 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (4.49.0)
+    Requirement already satisfied: pyarrow>=1.0.0<4.0.0 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (5.0.0)
+    Requirement already satisfied: xxhash in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (2.0.2)
+    Requirement already satisfied: requests>=2.19.0 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (2.26.0)
+    Requirement already satisfied: packaging in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from datasets) (20.9)
+    Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from transformers) (0.10.3)
+    Requirement already satisfied: regex!=2019.12.17 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from transformers) (2021.8.3)
+    Requirement already satisfied: sacremoses in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from transformers) (0.0.45)
+    Requirement already satisfied: filelock in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from transformers) (3.0.12)
+    Collecting portalocker==2.0.0
+      Downloading portalocker-2.0.0-py2.py3-none-any.whl (11 kB)
+    Requirement already satisfied: typing-extensions in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from huggingface-hub<0.1.0->datasets) (3.10.0.0)
+    Requirement already satisfied: pyparsing>=2.0.2 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from packaging->datasets) (2.4.7)
+    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (1.26.6)
+    Requirement already satisfied: certifi>=2017.4.17 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2021.5.30)
+    Requirement already satisfied: charset-normalizer~=2.0.0 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2.0.4)
+    Requirement already satisfied: idna<4,>=2.5 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (3.2)
+    Requirement already satisfied: python-dateutil>=2.7.3 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from pandas->datasets) (2.8.2)
+    Requirement already satisfied: pytz>=2017.3 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from pandas->datasets) (2021.1)
+    Requirement already satisfied: six>=1.5 in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.16.0)
+    Requirement already satisfied: click in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from sacremoses->transformers) (8.0.1)
+    Requirement already satisfied: joblib in /Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/lib/python3.8/site-packages (from sacremoses->transformers) (1.0.1)
+    Installing collected packages: portalocker, sentencepiece, sacrebleu
+    Successfully installed portalocker-2.0.0 sacrebleu-1.5.1 sentencepiece-0.1.96
+    WARNING: You are using pip version 21.2.3; however, version 21.2.4 is available.
+    You should consider upgrading via the '/Users/niepig/Desktop/zhihu/learn-nlp-with-transformers/venv/bin/python3 -m pip install --upgrade pip' command.
+
+
 If you are opening this notebook locally, make sure you have carefully read the README of transformer-quick-start-zh and installed all the dependencies it lists. You can also find a multi-GPU distributed training version of this notebook [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).
 
 # Fine-tuning a transformer model for a translation task
@@ -44,6 +86,25 @@ raw_datasets = load_dataset("wmt16", "ro-en")
 metric = load_metric("sacrebleu")
 ```
 
+    Downloading: 2.81kB [00:00, 523kB/s]
+    Downloading: 3.19kB [00:00, 758kB/s]
+    Downloading: 41.0kB [00:00, 11.0MB/s]
+
+
+    Downloading and preparing dataset wmt16/ro-en (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /Users/niepig/.cache/huggingface/datasets/wmt16/ro-en/1.0.0/0d9fb3e814712c785176ad8cdb9f465fbe6479000ee6546725db30ad8a8b5f8a...
+
+
+    Downloading: 100%|██████████| 225M/225M [00:18<00:00, 12.2MB/s]
+    Downloading: 100%|██████████| 23.5M/23.5M [00:16<00:00, 1.44MB/s]
+    Downloading: 100%|██████████| 38.7M/38.7M [00:03<00:00, 9.82MB/s]
+
+
+    Dataset wmt16 downloaded and prepared to /Users/niepig/.cache/huggingface/datasets/wmt16/ro-en/1.0.0/0d9fb3e814712c785176ad8cdb9f465fbe6479000ee6546725db30ad8a8b5f8a. Subsequent calls will reuse this data.
+
+
+    Downloading: 5.40kB [00:00, 2.08MB/s]
+
+
 The datasets object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict). To get the training, validation, or test split, index it with the corresponding key (`train`, `validation`, `test`).
 
@@ -128,23 +189,23 @@ show_random_elements(raw_datasets["train"])