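diff --git a/Task1 气象数据分析常用工具.ipynb b/Task1 气象数据分析常用工具.ipynb new file mode 100644 index 0000000..9c9c3a9 --- /dev/null +++ b/Task1 气象数据分析常用工具.ipynb @@ -0,0 +1,8401 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", + "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" + }, + "source": [ + "# Datawhale Meteorological and Ocean Prediction - Task1: Common Tools for Meteorological Data Analysis\n", + "Data in meteorological science usually span several dimensions; the data provided for this competition, for example, have four dimensions: year, month, longitude, and latitude. To make such data easy to read and manipulate, meteorological data are usually stored in netCDF files, which use the .nc extension.\n", + "\n", + "For meteorological data stored in netCDF files, two data analysis libraries are in common use: NetCDF4 and Xarray. In this task we learn the basic objects and operations of both libraries and the basic methods for reading and processing meteorological data with them." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Learning Objectives\n", + "1. Understand the basic objects and operations of NetCDF4 and Xarray, and learn the basic methods for reading and processing meteorological data with these two libraries." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Contents\n", + "1. NetCDF4\n", + " - Creating, opening, and closing netCDF files\n", + " - Groups\n", + " - Dimensions\n", + " - Variables\n", + " - Attributes\n", + " - Writing or reading variable data\n", + " - Application\n", + "2. 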
Xarray\n", + " - Creating a DataArray\n", + " - Indexing\n", + " - Attributes\n", + " - Computation\n", + " - GroupBy\n", + " - Plotting\n", + " - Converting to and from Pandas objects\n", + " - Dataset\n", + " - Reading/writing netCDF files\n", + " - Application" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## NetCDF4\n", + "[Official documentation](http://unidata.github.io/netcdf4-python/#introduction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NetCDF4 is the Python module for the NetCDF C library. It supports object types such as Groups, Dimensions, Variables, and Attributes, along with their associated operations." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Install NetCDF4" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Defaulting to user installation because normal site-packages is not writeable\n", + "Looking in indexes: https://mirrors.aliyun.com/pypi/simple\n", + "Collecting netCDF4\n", + " Downloading https://mirrors.aliyun.com/pypi/packages/11/e1/8f10f857f75dd8250b56fb9e78d6da166f1cf919724c602e016a9d20df63/netCDF4-1.5.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.7 MB)\n", + "\u001b[K |████████████████████████████████| 4.7 MB 224 kB/s eta 0:00:01\n", + "\u001b[?25hRequirement already satisfied: numpy>=1.9 in /opt/conda/lib/python3.6/site-packages (from netCDF4) (1.19.4)\n", + "Collecting cftime\n", + " Downloading https://mirrors.aliyun.com/pypi/packages/98/92/a680c68f6685a3f299ecf2e360546b56e994730bebe3325cafd36c5bf2cd/cftime-1.5.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (216 kB)\n", + "\u001b[K |████████████████████████████████| 216 kB 211 kB/s eta 0:00:01\n", + "\u001b[?25hInstalling collected packages: cftime, netCDF4\n", + "\u001b[33m WARNING: The scripts nc3tonc4, nc4tonc3 and ncinfo are installed in '/home/admin/.local/bin' which is not on PATH.\n", + " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\n", + "Successfully installed 
cftime-1.5.0 netCDF4-1.5.7\n", + "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 21.1.2 is available.\n", + "You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" + ] + } + ], + "source": [ + "!pip install netCDF4" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import netCDF4 as nc" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Creating, opening, and closing netCDF files" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NetCDF4 creates a netCDF file or opens an existing one by calling Dataset, and the file's format can be checked through its data_model attribute. Note that a file that has been created or opened must be closed before Dataset can be called on it again." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Creating a netCDF file" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from netCDF4 import Dataset\n", + "\n", + "# Dataset is given three arguments here: the file name, the mode ('w', 'r+', and 'a' are the writable modes), and the file format\n", + "# format must be passed by keyword, because the third positional parameter of Dataset is clobber\n", + "test = Dataset('test.nc', 'w', format='NETCDF4')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Opening an existing netCDF file" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# Open the SODA data from the training samples\n", + "soda = Dataset('SODA_train.nc')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Checking the file format" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "NETCDF4\n" + ] + } + ], + "source": [ + "print(soda.data_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Closing a netCDF file" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "soda.close()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Groups" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
"NetCDF4支持按层级的组(Groups)来组织数据,类似于文件系统中的目录,Groups中可以包含Variables、Dimenions、Attributes对象以及其他Groups对象,Dataset会创建一个特殊的Groups,称为根组(Root Group),类似于根目录,使用Dataset.createGroup方法创建的组都包含在根组中。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 创建Groups" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# 接受一个字符串参数作为Group名称\n", + "group1 = test.createGroup('group1')\n", + "group2 = test.createGroup('group2')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 查看文件中的所有Groups" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "OrderedDict([('group1', \n", + " group /group1:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: ), ('group2', \n", + " group /group2:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: )])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 返回一个Group字典\n", + "test.groups" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Groups嵌套" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# 在group1和group2下分别再创建一个Group\n", + "group1_1 = test.createGroup('group1/group11')\n", + "group2_1 = test.createGroup('group2/group21')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "OrderedDict([('group1', \n", + " group /group1:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: group11), ('group2', \n", + " group /group2:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: group21)])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test.groups" + ] + }, + { + "cell_type": "code", + 
"execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "odict_values([\n", + "group /group1:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: group11, \n", + "group /group2:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: group21])" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test.groups.values()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 遍历查看所有Groups" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "# 定义一个生成器函数用来遍历所有目录树\n", + "def walktree(top):\n", + " values = top.groups.values()\n", + " yield values\n", + " for value in top.groups.values():\n", + " for children in walktree(value):\n", + " yield children" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "group /group1:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: group11\n", + "\n", + "group /group2:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: group21\n", + "\n", + "group /group1/group11:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: \n", + "\n", + "group /group2/group21:\n", + " dimensions(sizes): \n", + " variables(dimensions): \n", + " groups: \n" + ] + } + ], + "source": [ + "for groups in walktree(test):\n", + " for group in groups:\n", + " print(group)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dimensions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NetCDF4用维度来定义各个变量的大小,例如本赛题中训练样本的第二维度month就是一个维度对象,每个样本包含36个月的数据,因此month维度内的变量的大小就是36。变量是包含在维度中的,因此在创建每个变量时要先创建其所在的维度。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 创建Dimensions\n", + "\n", + 
"Dataset.createDimension方法接受两个参数:维度名称,维度大小。维度大小设置为None或0时表示无穷维度。" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# 创建无穷维度\n", + "level = test.createDimension('level', None)\n", + "time = test.createDimension('time', None)\n", + "# 创建有限维度\n", + "lat = test.createDimension('lat', 180)\n", + "lon = test.createDimension('lon', 360)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 查看Dimensions" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "OrderedDict([('level',\n", + " (unlimited): name = 'level', size = 0),\n", + " ('time',\n", + " (unlimited): name = 'time', size = 0),\n", + " ('lat',\n", + " : name = 'lat', size = 180),\n", + " ('lon',\n", + " : name = 'lon', size = 360)])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test.dimensions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 查看维度大小" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "360\n" + ] + } + ], + "source": [ + "# 查看维度大小\n", + "print(len(lon))" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " (unlimited): name = 'level', size = 0\n" + ] + } + ], + "source": [ + "# Dimension对象存储在字典中\n", + "print(level)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n", + "False\n" + ] + } + ], + "source": [ + "# 判断维度是否是无穷\n", + "print(time.isunlimited())\n", + "print(lat.isunlimited())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Variables" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "NetCDF4的Variables对象类似于Numpy中的多维数组,不同的是,NetCDF4的Variables变量可以存储在无穷维度中。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 创建Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dataset.createVariable方法接受的参数为:变量名,变量的数据类型,变量所在的维度。\n", + "\n", + "变量的有效数据类型包括:'f4'(32位浮点数)、'f8'(64位浮点数)、'i1'(8位有符号整型)、'i2'(16位有符号整型)、'i4'(32位有符号整型)、'i8'(64位有符号整型)、'u1'(8位无符号整型)、'u2'(16位无符号整型)、'u4'(32位无符号整型)、'u8'(64位无符号整型)、's1'(单个字符)。" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# 创建单个维度上的变量\n", + "times = test.createVariable('time', 'f8', ('time',))\n", + "levels = test.createVariable('level', 'i4', ('level',))\n", + "lats = test.createVariable('lat', 'f4', ('lat',))\n", + "lons = test.createVariable('lon', 'f4', ('lon',))\n", + "\n", + "# 创建多个维度上的变量\n", + "temp = test.createVariable('temp', 'f4', ('time', 'level', 'lat', 'lon'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 查看Variables" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "float32 temp(time, level, lat, lon)\n", + "unlimited dimensions: time, level\n", + "current shape = (0, 0, 180, 360)\n", + "filling on, default _FillValue of 9.969209968386869e+36 used\n" + ] + } + ], + "source": [ + "print(temp)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# 通过路径的方式在Group中创建变量\n", + "ftemp = test.createVariable('/group1/group11/ftemp', 'f8', ('time', 'level', 'lat', 'lon'))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "float64 ftemp(time, level, lat, lon)\n", + "path = /group1/group11\n", + "unlimited dimensions: time, level\n", + "current shape = 
(0, 0, 180, 360)\n", + "filling on, default _FillValue of 9.969209968386869e+36 used\n" + ] + } + ], + "source": [ + "# Variables can be accessed by path\n", + "print(test['/group1/group11/ftemp'])" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "group /group1/group11:\n", + " dimensions(sizes): \n", + " variables(dimensions): float64 ftemp(time, level, lat, lon)\n", + " groups: \n" + ] + } + ], + "source": [ + "print(test['/group1/group11'])" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OrderedDict([('time', \n", + "float64 time(time)\n", + "unlimited dimensions: time\n", + "current shape = (0,)\n", + "filling on, default _FillValue of 9.969209968386869e+36 used), ('level', \n", + "int32 level(level)\n", + "unlimited dimensions: level\n", + "current shape = (0,)\n", + "filling on, default _FillValue of -2147483647 used), ('lat', \n", + "float32 lat(lat)\n", + "unlimited dimensions: \n", + "current shape = (180,)\n", + "filling on, default _FillValue of 9.969209968386869e+36 used), ('lon', \n", + "float32 lon(lon)\n", + "unlimited dimensions: \n", + "current shape = (360,)\n", + "filling on, default _FillValue of 9.969209968386869e+36 used), ('temp', \n", + "float32 temp(time, level, lat, lon)\n", + "unlimited dimensions: time, level\n", + "current shape = (0, 0, 180, 360)\n", + "filling on, default _FillValue of 9.969209968386869e+36 used)])\n" + ] + } + ], + "source": [ + "# List all variables in the file\n", + "print(test.variables)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attributes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Attributes objects store descriptive information about a file or its variables. A netCDF file has two kinds of attributes: global attributes and variable attributes. Global attributes describe a Group or the file as a whole, while variable attributes describe a Variables object. Attribute names can be chosen freely; description, history, and the others in the example below are all user-defined names." + ] + }, + { + "cell_type": 
"code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "\n", + "# 设置对文件的描述\n", + "test.description = 'bogus example script'\n", + "# 设置文件的历史信息\n", + "test.history = 'Created' + time.ctime(time.time())\n", + "# 设置文件的来源信息\n", + "test.source = 'netCDF4 python module tutorial'\n", + "# 设置变量属性\n", + "lats.units = 'degrees north'\n", + "lons.units = 'degrees east'\n", + "levels.units = 'hPa'\n", + "temp.units = 'K'\n", + "times.units = 'hours since 0001-01-01 00:00:00.0'\n", + "times.calendar = 'gregorian'" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['description', 'history', 'source']\n", + "['units']\n", + "['units', 'calendar']\n" + ] + } + ], + "source": [ + "# 查看文件属性名称\n", + "print(test.ncattrs())\n", + "# 查看变量属性名称\n", + "print(test['lat'].ncattrs())\n", + "print(test['time'].ncattrs())" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Global attr description = bogus example script\n", + "Global attr history = CreatedWed Jun 23 16:10:27 2021\n", + "Global attr source = netCDF4 python module tutorial\n" + ] + } + ], + "source": [ + "# 查看文件属性\n", + "for name in test.ncattrs():\n", + " print('Global attr {} = {}'.format(name, getattr(test, name)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 写入或读取变量数据" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "类似于数组,可以通过切片的方式向变量中写入或读取数据。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 向变量中写入数据" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "temp shape before adding data = (0, 0, 180, 360)\n" + ] + } + ], + "source": [ + "from numpy.random import uniform\n", 
+ "\n", + "nlats = len(test.dimensions['lat'])\n", + "nlons = len(test.dimensions['lon'])\n", + "print('temp shape before adding data = {}'.format(temp.shape))" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "temp shape after adding data = (5, 10, 180, 360)\n" + ] + } + ], + "source": [ + "# Unlimited dimensions grow automatically to fit the data written\n", + "temp[0:5, 0:10, :, :] = uniform(size=(5, 10, nlats, nlons))\n", + "print('temp shape after adding data = {}'.format(temp.shape))" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "levels shape after adding pressure data = (10,)\n" + ] + } + ], + "source": [ + "print('levels shape after adding pressure data = {}'.format(levels.shape))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Reading data from a variable" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.46714965\n" + ] + } + ], + "source": [ + "print(temp[1, 5, 100, 300])" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(10, 10)\n" + ] + } + ], + "source": [ + "print(temp[1, 5, 10:20, 100:110].shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0.30582663 0.84210724 0.1397456 0.9156191 0.73041755]\n" + ] + } + ], + "source": [ + "# Slices can take the form start:stop:step\n", + "print(temp[1, 5, 10, 100:110:2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Application" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's use NetCDF4 to work with the SODA data from the training samples." + ] + }, + { + "cell_type": "code", + 
"execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SODA文件格式: NETCDF4\n", + "\n", + "root group (NETCDF4 data model, file format HDF5):\n", + " dimensions(sizes): year(100), month(36), lat(24), lon(72)\n", + " variables(dimensions): float32 sst(year, month, lat, lon), float32 t300(year, month, lat, lon), float64 ua(year, month, lat, lon), float64 va(year, month, lat, lon), int32 year(year), int32 month(month), float64 lat(lat), float64 lon(lon)\n", + " groups: \n" + ] + } + ], + "source": [ + "# 打开SODA文件\n", + "soda = Dataset('SODA_train.nc')\n", + "# 查看文件格式\n", + "print('SODA文件格式:', soda.data_model)\n", + "# 查看文件中包含的对象\n", + "print(soda)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OrderedDict([('year', : name = 'year', size = 100), ('month', : name = 'month', size = 36), ('lat', : name = 'lat', size = 24), ('lon', : name = 'lon', size = 72)])\n", + "OrderedDict([('sst', \n", + "float32 sst(year, month, lat, lon)\n", + " _FillValue: nan\n", + "unlimited dimensions: \n", + "current shape = (100, 36, 24, 72)\n", + "filling on), ('t300', \n", + "float32 t300(year, month, lat, lon)\n", + " _FillValue: nan\n", + "unlimited dimensions: \n", + "current shape = (100, 36, 24, 72)\n", + "filling on), ('ua', \n", + "float64 ua(year, month, lat, lon)\n", + " _FillValue: nan\n", + "unlimited dimensions: \n", + "current shape = (100, 36, 24, 72)\n", + "filling on), ('va', \n", + "float64 va(year, month, lat, lon)\n", + " _FillValue: nan\n", + "unlimited dimensions: \n", + "current shape = (100, 36, 24, 72)\n", + "filling on), ('year', \n", + "int32 year(year)\n", + "unlimited dimensions: \n", + "current shape = (100,)\n", + "filling on, default _FillValue of -2147483647 used), ('month', \n", + "int32 month(month)\n", + "unlimited dimensions: \n", + "current shape = (36,)\n", + "filling on, 
default _FillValue of -2147483647 used), ('lat', \n", + "float64 lat(lat)\n", + " _FillValue: nan\n", + "unlimited dimensions: \n", + "current shape = (24,)\n", + "filling on), ('lon', \n", + "float64 lon(lon)\n", + " _FillValue: nan\n", + "unlimited dimensions: \n", + "current shape = (72,)\n", + "filling on)])\n" + ] + } + ], + "source": [ + "# Inspect the dimensions and variables\n", + "print(soda.dimensions)\n", + "print(soda.variables)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As shown above, the SODA file has four dimensions, year, month, lat, and lon, with sizes 100, 36, 24, and 72, and four data variables, sst, t300, ua, and va, each defined on the (year, month, lat, lon) dimensions." + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.5491563\n", + "[ 0.35030824 -0.27190587 -0.39402887 0.5343738 0.37811455 0.3713674\n", + " 0.08229554 0.7542514 0.68257743 0.14785604 0.22067821 0.57408816]\n", + "[[ 1.22284102 1.08418655]\n", + " [-0.10607338 -0.28691578]\n", + " [-0.98331785 -0.89280176]\n", + " [-1.15751183 -1.04381001]\n", + " [ 1.44365835 1.27503896]\n", + " [ 2.17918181 1.77685726]]\n", + "[[ 0.875687 0.6403966 1.3469224 0.53298873 0.98529816 1.02812028\n", + " 0.85326892 0.74691284 0.28933924 -0.40189767 -0.83211648 -0.43214691]\n", + " [ 0.04050779 0.15766144 -0.73416436 -0.70684886 -0.56758839 0.10421887\n", + " 0.58899581 0.22496569 -0.25270063 -0.51971638 -1.15229702 -1.31563485]\n", + " [-1.74257064 -2.09364986 -3.08066273 -2.86321235 -1.13531363 0.05363135\n", + " 0.5130071 1.13993824 1.03027582 1.01840234 0.88233793 2.16193867]\n", + " [ 1.87613273 1.29819739 0.91255933 0.07229906 -0.54798424 0.95893037\n", + " 1.20532691 0.95680737 0.99374217 0.7587797 0.69023252 0.91067171]\n", + " [ 0.56461787 -0.04788923 0.53796363 0.34152603 -0.1429361 -0.16038476\n", + " 0.36168021 0.31549513 0.51516014 0.51351386 0.06654173 0.42326063]]\n" + ] + } + ], + "source": [ + "# Read the data in each variable\n", + "soda_sst = soda['sst'][:]\n", + "print(soda_sst[1, 
1, 1, 1])\n", + "\n", + "soda_t300 = soda['t300'][:]\n", + "print(soda_t300[1, 2, 12:24, 36])\n", + "\n", + "soda_ua = soda['ua'][:]\n", + "print(soda_ua[1, 2, 12:24:2, 36:38])\n", + "\n", + "soda_va = soda['va'][:]\n", + "print(soda_va[5:10, 0:12, 12, 36])\n", + "\n", + "# Close the file\n", + "soda.close()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Xarray\n", + "[Official documentation](http://xarray.pydata.org/en/stable/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Xarray is an open-source Python library that adds labels in the form of dimensions, coordinates, and attributes on top of Numpy-like multidimensional arrays and lets you operate on the data directly by label name. It can read and write netCDF files and supports further data analysis and visualization.\n", + "\n", + "Xarray has two basic data structures, DataArray and Dataset, both built on multidimensional arrays: DataArray implements the labeled array, and Dataset is a dict-like container of DataArray objects." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Installing Xarray requires the following dependencies:\n", + "\n", + "- Python (3.7+)\n", + "- setuptools (40.4+)\n", + "- Numpy (1.17+)\n", + "- Pandas (1.0+)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Defaulting to user installation because normal site-packages is not writeable\n", + "Looking in indexes: https://mirrors.aliyun.com/pypi/simple\n", + "Collecting xarray\n", + " Downloading https://mirrors.aliyun.com/pypi/packages/10/6f/9aa15b1f9001593d51a0e417a8ad2127ef384d08129a0720b3599133c1ed/xarray-0.16.2-py3-none-any.whl (736 kB)\n", + "\u001b[K |████████████████████████████████| 736 kB 198 kB/s eta 0:00:01\n", + "\u001b[?25hRequirement already satisfied: setuptools>=38.4 in /opt/conda/lib/python3.6/site-packages (from xarray) (51.1.1)\n", + "Requirement already satisfied: numpy>=1.15 in /opt/conda/lib/python3.6/site-packages (from xarray) (1.19.4)\n", + "Requirement already satisfied: pandas>=0.25 in /opt/conda/lib/python3.6/site-packages (from xarray) (1.1.5)\n", + "Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas>=0.25->xarray) (2020.5)\n", + 
"Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.6/site-packages (from pandas>=0.25->xarray) (2.8.1)\n", + "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas>=0.25->xarray) (1.15.0)\n", + "Installing collected packages: xarray\n", + "Successfully installed xarray-0.16.2\n", + "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 21.1.2 is available.\n", + "You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" + ] + } + ], + "source": [ + "!pip install xarray" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import xarray as xr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 创建DataArray" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "xr.DataArray接受三个输入参数:数组,维度,坐标。其中维度为数组的维度名称,坐标以字典的形式给维度赋予坐标标签。" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 2, y: 3)>\n",
+       "array([[-1.89477837, -0.58997363, -1.77758946],\n",
+       "       [-0.21793173,  0.77616912,  0.45868184]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y
" + ], + "text/plain": [ + "\n", + "array([[-1.89477837, -0.58997363, -1.77758946],\n", + " [-0.21793173, 0.77616912, 0.45868184]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 创建一个2x3的数组,将维度命名为'x'和'y',并赋予'x'维度10和20两个坐标标签\n", + "data = xr.DataArray(np.random.randn(2, 3), dims=('x', 'y'), coords={'x': [10, 20]})\n", + "data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "也可以用Pandas的Series或DataFrame数据创建DataArray。" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray 'foo' (dim_0: 3)>\n",
+       "array([0, 1, 2])\n",
+       "Coordinates:\n",
+       "  * dim_0    (dim_0) object 'a' 'b' 'c'
" + ], + "text/plain": [ + "\n", + "array([0, 1, 2])\n", + "Coordinates:\n", + " * dim_0 (dim_0) object 'a' 'b' 'c'" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# index名称会自动转换成坐标标签\n", + "xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[-1.89477837 -0.58997363 -1.77758946]\n", + " [-0.21793173 0.77616912 0.45868184]]\n", + "('x', 'y')\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "{}\n" + ] + } + ], + "source": [ + "# 查看数据\n", + "print(data.values)\n", + "\n", + "# 查看维度\n", + "print(data.dims)\n", + "\n", + "# 查看坐标\n", + "print(data.coords)\n", + "\n", + "# 可以用data.attrs字典来存储任意元数据\n", + "print(data.attrs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 索引" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Xarray支持四种索引方式。" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-1.89477837, -0.58997363, -1.77758946])\n", + "Coordinates:\n", + " x int64 10\n", + "Dimensions without coordinates: y \n", + "\n", + "\n", + "array([-1.89477837, -0.58997363, -1.77758946])\n", + "Coordinates:\n", + " x int64 10\n", + "Dimensions without coordinates: y \n", + "\n", + "\n", + "array([-1.89477837, -0.58997363, -1.77758946])\n", + "Coordinates:\n", + " x int64 10\n", + "Dimensions without coordinates: y \n", + "\n", + "\n", + "array([-1.89477837, -0.58997363, -1.77758946])\n", + "Coordinates:\n", + " x int64 10\n", + "Dimensions without coordinates: y \n", + "\n" + ] + } + ], + "source": [ + "# 通过位置索引,类似于numpy\n", + "print(data[0, :], '\\n')\n", + "\n", + "# 通过坐标标签索引\n", + "print(data.loc[10], '\\n')\n", + "\n", + "# 通过维度名称和位置索引,isel表示\"integer 
select\"\n", + "print(data.isel(x=0), '\\n')\n", + "\n", + "# Index by dimension name and coordinate label; sel stands for \"select\"\n", + "print(data.sel(x=10), '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attributes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Like NetCDF4, Xarray supports user-defined attributes on a DataArray or on its labels (coordinates)." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'long_name': 'random velocity', 'units': 'metres/sec', 'description': 'A random variable created as an example', 'ramdom_attribute': 123}\n" + ] + } + ], + "source": [ + "# Set attributes on the DataArray\n", + "data.attrs['long_name'] = 'random velocity'\n", + "data.attrs['units'] = 'metres/sec'\n", + "data.attrs['description'] = 'A random variable created as an example'\n", + "data.attrs['ramdom_attribute'] = 123\n", + "# View the attributes\n", + "print(data.attrs)" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Attributes of x dimension: {'units': 'x units'} \n", + "\n" + ] + } + ], + "source": [ + "# Set attributes on a dimension's coordinate\n", + "data.x.attrs['units'] = 'x units'\n", + "print('Attributes of x dimension:', data.x.attrs, '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Computation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Computation on a DataArray works much like computation on a numpy ndarray." + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 2, y: 3)>\n",
+       "array([[ 8.10522163,  9.41002637,  8.22241054],\n",
+       "       [ 9.78206827, 10.77616912, 10.45868184]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y
" + ], + "text/plain": [ + "\n", + "array([[ 8.10522163, 9.41002637, 8.22241054],\n", + " [ 9.78206827, 10.77616912, 10.45868184]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data + 10" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 2, y: 3)>\n",
+       "array([[-0.94797528, -0.55633911, -0.97869439],\n",
+       "       [-0.21621074,  0.70055084,  0.44276658]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y\n",
+       "Attributes:\n",
+       "    long_name:         random velocity\n",
+       "    units:             metres/sec\n",
+       "    description:       A random variable created as an example\n",
+       "    ramdom_attribute:  123
" + ], + "text/plain": [ + "<xarray.DataArray (x: 2, y: 3)>\n", + "array([[-0.94797528, -0.55633911, -0.97869439],\n", + " [-0.21621074, 0.70055084, 0.44276658]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y\n", + "Attributes:\n", + " long_name: random velocity\n", + " units: metres/sec\n", + " description: A random variable created as an example\n", + " ramdom_attribute: 123" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.sin(data)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (y: 3, x: 2)>\n",
+       "array([[-1.89477837, -0.21793173],\n",
+       "       [-0.58997363,  0.77616912],\n",
+       "       [-1.77758946,  0.45868184]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y\n",
+       "Attributes:\n",
+       "    long_name:         random velocity\n",
+       "    units:             metres/sec\n",
+       "    description:       A random variable created as an example\n",
+       "    ramdom_attribute:  123
" + ], + "text/plain": [ + "<xarray.DataArray (y: 3, x: 2)>\n", + "array([[-1.89477837, -0.21793173],\n", + " [-0.58997363, 0.77616912],\n", + " [-1.77758946, 0.45868184]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y\n", + "Attributes:\n", + " long_name: random velocity\n", + " units: metres/sec\n", + " description: A random variable created as an example\n", + " ramdom_attribute: 123" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.T" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray ()>\n",
+       "array(-3.24542223)
" + ], + "text/plain": [ + "<xarray.DataArray ()>\n", + "array(-3.24542223)" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.sum()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Aggregations can be applied along a dimension by referring to it by name." + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (y: 3)>\n",
+       "array([-1.05635505,  0.09309774, -0.65945381])\n",
+       "Dimensions without coordinates: y
" + ], + "text/plain": [ + "<xarray.DataArray (y: 3)>\n", + "array([-1.05635505, 0.09309774, -0.65945381])\n", + "Dimensions without coordinates: y" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.mean(dim='x')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Arithmetic between DataArrays broadcasts by dimension name rather than by axis position." + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<xarray.DataArray (y: 3)>\n", + "array([0.04405523, 0.36823828, 0.38351121])\n", + "Coordinates:\n", + " * y (y) int64 0 1 2 \n", + "\n", + "<xarray.DataArray (z: 4)>\n", + "array([ 0.62771044, -0.41870179, -1.38038185, -0.19742089])\n", + "Dimensions without coordinates: z \n", + "\n", + "<xarray.DataArray (y: 3, z: 4)>\n", + "array([[ 0.67176567, -0.37464656, -1.33632661, -0.15336566],\n", + " [ 0.99594872, -0.05046351, -1.01214356, 0.17081739],\n", + " [ 1.01122165, -0.03519058, -0.99687063, 0.18609032]])\n", + "Coordinates:\n", + " * y (y) int64 0 1 2\n", + "Dimensions without coordinates: z \n", + "\n" + ] + } + ], + "source": [ + "a = xr.DataArray(np.random.randn(3), [data.coords['y']])\n", + "b = xr.DataArray(np.random.randn(4), dims='z')\n", + "print(a, '\\n')\n", + "print(b, '\\n')\n", + "print(a+b, '\\n')" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 2, y: 3)>\n",
+       "array([[0., 0., 0.],\n",
+       "       [0., 0., 0.]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y
" + ], + "text/plain": [ + "<xarray.DataArray (x: 2, y: 3)>\n", + "array([[0., 0., 0.],\n", + " [0., 0., 0.]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data - data.T" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 1, y: 3)>\n",
+       "array([[0., 0., 0.]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10\n",
+       "Dimensions without coordinates: y
" + ], + "text/plain": [ + "<xarray.DataArray (x: 1, y: 3)>\n", + "array([[0., 0., 0.]])\n", + "Coordinates:\n", + " * x (x) int64 10\n", + "Dimensions without coordinates: y" + ] + }, + "execution_count": 73, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data[:-1] - data[:1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### GroupBy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Xarray supports grouped operations through a Pandas-like API." + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray 'labels' (y: 3)>\n",
+       "array(['E', 'F', 'E'], dtype='<U1')\n",
+       "Coordinates:\n",
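+       "  * y        (y) int64 0 1 2
" + ], + "text/plain": [ + "<xarray.DataArray 'labels' (y: 3)>\n", + "array(['E', 'F', 'E'], dtype='<U1')\n", + "Coordinates:\n", + " * y (y) int64 0 1 2" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Definition reconstructed from the output above: one group label per y coordinate\n", + "labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')\n", + "labels" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "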
<xarray.DataArray (x: 2, labels: 2)>\n",
+       "array([[-1.83618391, -0.58997363],\n",
+       "       [ 0.12037506,  0.77616912]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "  * labels   (labels) object 'E' 'F'
" + ], + "text/plain": [ + "<xarray.DataArray (x: 2, labels: 2)>\n", + "array([[-1.83618391, -0.58997363],\n", + " [ 0.12037506, 0.77616912]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + " * labels (labels) object 'E' 'F'" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Align labels with data along y, group by label value, and average within each group\n", + "data.groupby(labels).mean('y')" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 2, y: 3)>\n",
+       "array([[0.        , 0.        , 0.1171889 ],\n",
+       "       [1.67684664, 1.36614276, 2.35346021]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y
" + ], + "text/plain": [ + "<xarray.DataArray (x: 2, y: 3)>\n", + "array([[0. , 0. , 0.1171889 ],\n", + " [1.67684664, 1.36614276, 2.35346021]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Group data along y by labels, then subtract each group's minimum\n", + "data.groupby(labels).map(lambda x: x - x.min())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plotting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Xarray makes basic visualization straightforward. Only a brief introduction is given here; interested readers can explore the other plotting methods on their own." + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "(base64-encoded PNG of the rendered plot omitted)", + "text/plain": [ + "" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "%matplotlib inline\n", + "data.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Converting to and from Pandas objects" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Xarray objects convert easily to a Pandas Series or DataFrame, and Pandas objects convert back to Xarray just as easily." + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "x y\n", + "10 0 -1.894778\n", + " 1 -0.589974\n", + " 2 -1.777589\n", + "20 0 -0.217932\n", + " 1 0.776169\n", + " 2 0.458682\n", + "dtype: float64" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Convert to a Pandas Series\n", + "series = data.to_series()\n", + "series" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray (x: 2, y: 3)>\n",
+       "array([[-1.89477837, -0.58997363, -1.77758946],\n",
+       "       [-0.21793173,  0.77616912,  0.45868184]])\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "  * y        (y) int64 0 1 2
" + ], + "text/plain": [ + "<xarray.DataArray (x: 2, y: 3)>\n", + "array([[-1.89477837, -0.58997363, -1.77758946],\n", + " [-0.21793173, 0.77616912, 0.45868184]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + " * y (y) int64 0 1 2" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Convert the Series back to an Xarray DataArray\n", + "series.to_xarray()" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<table>\n",
+       "<thead>\n",
+       "<tr><th></th><th></th><th>colname</th></tr>\n",
+       "<tr><th>x</th><th>y</th><th></th></tr>\n",
+       "</thead>\n",
+       "<tbody>\n",
+       "<tr><th>10</th><th>0</th><td>-1.894778</td></tr>\n",
+       "<tr><th></th><th>1</th><td>-0.589974</td></tr>\n",
+       "<tr><th></th><th>2</th><td>-1.777589</td></tr>\n",
+       "<tr><th>20</th><th>0</th><td>-0.217932</td></tr>\n",
+       "<tr><th></th><th>1</th><td>0.776169</td></tr>\n",
+       "<tr><th></th><th>2</th><td>0.458682</td></tr>\n",
+       "</tbody>\n",
+       "</table>
" + ], + "text/plain": [ + " colname\n", + "x y \n", + "10 0 -1.894778\n", + " 1 -0.589974\n", + " 2 -1.777589\n", + "20 0 -0.217932\n", + " 1 0.776169\n", + " 2 0.458682" + ] + }, + "execution_count": 88, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Convert to a Pandas DataFrame\n", + "df = data.to_dataframe(name='colname')\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.Dataset>\n",
+       "Dimensions:  (x: 2, y: 3)\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "  * y        (y) int64 0 1 2\n",
+       "Data variables:\n",
+       "    colname  (x, y) float64 -1.895 -0.59 -1.778 -0.2179 0.7762 0.4587
" + ], + "text/plain": [ + "<xarray.Dataset>\n", + "Dimensions: (x: 2, y: 3)\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + " * y (y) int64 0 1 2\n", + "Data variables:\n", + " colname (x, y) float64 -1.895 -0.59 -1.778 -0.2179 0.7762 0.4587" + ] + }, + "execution_count": 89, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Convert the DataFrame to an Xarray Dataset\n", + "xr.Dataset.from_dataframe(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A Dataset is a dictionary-like container of DataArrays and can be thought of as a DataFrame with a multidimensional structure. Compared with the Dataset in the NetCDF4 library, the two play a similar role: both are containers that hold other objects." + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.Dataset>\n",
+       "Dimensions:  (x: 2, y: 3)\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y\n",
+       "Data variables:\n",
+       "    foo      (x, y) float64 -1.895 -0.59 -1.778 -0.2179 0.7762 0.4587\n",
+       "    bar      (x) int64 1 2\n",
+       "    baz      float64 3.142
" + ], + "text/plain": [ + "<xarray.Dataset>\n", + "Dimensions: (x: 2, y: 3)\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y\n", + "Data variables:\n", + " foo (x, y) float64 -1.895 -0.59 -1.778 -0.2179 0.7762 0.4587\n", + " bar (x) int64 1 2\n", + " baz float64 3.142" + ] + }, + "execution_count": 90, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create a Dataset containing three DataArrays\n", + "ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A DataArray inside a Dataset can be accessed with dictionary-style or attribute-style (dot) indexing, but assignment only works with the dictionary style." + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<xarray.DataArray 'foo' (x: 2, y: 3)>\n", + "array([[-1.89477837, -0.58997363, -1.77758946],\n", + " [-0.21793173, 0.77616912, 0.45868184]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y\n", + "Attributes:\n", + " long_name: random velocity\n", + " units: metres/sec\n", + " description: A random variable created as an example\n", + " ramdom_attribute: 123 \n", + "\n", + "<xarray.DataArray 'foo' (x: 2, y: 3)>\n", + "array([[-1.89477837, -0.58997363, -1.77758946],\n", + " [-0.21793173, 0.77616912, 0.45868184]])\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y\n", + "Attributes:\n", + " long_name: random velocity\n", + " units: metres/sec\n", + " description: A random variable created as an example\n", + " ramdom_attribute: 123\n" + ] + } + ], + "source": [ + "# Access the DataArray dictionary-style\n", + "print(ds['foo'], '\\n')\n", + "\n", + "# Access the DataArray attribute-style (dot indexing)\n", + "print(ds.foo)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Indexing by coordinate label works here as well." + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.DataArray 'bar' ()>\n",
+       "array(1)\n",
+       "Coordinates:\n",
+       "    x        int64 10
" + ], + "text/plain": [ + "<xarray.DataArray 'bar' ()>\n", + "array(1)\n", + "Coordinates:\n", + " x int64 10" + ] + }, + "execution_count": 92, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds.bar.sel(x=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reading and writing netCDF files" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.Dataset>\n",
+       "Dimensions:  (x: 2, y: 3)\n",
+       "Coordinates:\n",
+       "  * x        (x) int64 10 20\n",
+       "Dimensions without coordinates: y\n",
+       "Data variables:\n",
+       "    foo      (x, y) float64 -1.895 -0.59 -1.778 -0.2179 0.7762 0.4587\n",
+       "    bar      (x) int64 1 2\n",
+       "    baz      float64 3.142
" + ], + "text/plain": [ + "<xarray.Dataset>\n", + "Dimensions: (x: 2, y: 3)\n", + "Coordinates:\n", + " * x (x) int64 10 20\n", + "Dimensions without coordinates: y\n", + "Data variables:\n", + " foo (x, y) float64 ...\n", + " bar (x) int64 ...\n", + " baz float64 ..." + ] + }, + "execution_count": 93, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Write to a netCDF file\n", + "ds.to_netcdf('xarray_test.nc')\n", + "\n", + "# Read an existing netCDF file\n", + "xr.open_dataset('xarray_test.nc')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Application" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try using Xarray to work with the SODA data from the training samples." + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{}\n", + "<xarray.Dataset>\n", + "Dimensions: (lat: 24, lon: 72, month: 36, year: 100)\n", + "Coordinates:\n", + " * year (year) int32 1 2 3 4 5 6 7 8 9 10 ... 92 93 94 95 96 97 98 99 100\n", + " * month (month) int32 1 2 3 4 5 6 7 8 9 10 ... 28 29 30 31 32 33 34 35 36\n", + " * lat (lat) float64 -55.0 -50.0 -45.0 -40.0 -35.0 ... 45.0 50.0 55.0 60.0\n", + " * lon (lon) float64 0.0 5.0 10.0 15.0 20.0 ... 340.0 345.0 350.0 355.0\n", + "Data variables:\n", + " sst (year, month, lat, lon) float32 ...\n", + " t300 (year, month, lat, lon) float32 ...\n", + " ua (year, month, lat, lon) float64 ...\n", + " va (year, month, lat, lon) float64 ...\n" + ] + } + ], + "source": [ + "# Open the SODA file\n", + "soda = xr.open_dataset('SODA_train.nc')\n", + "# Inspect the file attributes\n", + "print(soda.attrs)\n", + "# Inspect the objects contained in the file\n", + "print(soda)" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Frozen(SortedKeysDict({'year': 100, 'month': 36, 'lat': 24, 'lon': 72}))\n", + "Coordinates:\n", + " * year (year) int32 1 2 3 4 5 6 7 8 9 10 ... 
92 93 94 95 96 97 98 99 100\n", + " * month (month) int32 1 2 3 4 5 6 7 8 9 10 ... 28 29 30 31 32 33 34 35 36\n", + " * lat (lat) float64 -55.0 -50.0 -45.0 -40.0 -35.0 ... 45.0 50.0 55.0 60.0\n", + " * lon (lon) float64 0.0 5.0 10.0 15.0 20.0 ... 340.0 345.0 350.0 355.0\n" + ] + } + ], + "source": [ + "# Inspect the dimensions and coordinates\n", + "print(soda.dims)\n", + "print(soda.coords)" + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<xarray.DataArray 'sst' ()>\n", + "array(0.549156, dtype=float32)\n", + "Coordinates:\n", + " year int32 2\n", + " month int32 2\n", + " lat float64 -50.0\n", + " lon float64 5.0 \n", + "\n", + "<xarray.DataArray 't300' (lat: 12)>\n", + "array([ 0.350308, -0.271906, -0.394029, 0.534374, 0.378115, 0.371367,\n", + " 0.082296, 0.754251, 0.682577, 0.147856, 0.220678, 0.574088],\n", + " dtype=float32)\n", + "Coordinates:\n", + " year int32 2\n", + " month int32 3\n", + " * lat (lat) float64 5.0 10.0 15.0 20.0 25.0 ... 40.0 45.0 50.0 55.0 60.0\n", + " lon float64 180.0 \n", + "\n", + "<xarray.DataArray 'ua' (lat: 6, lon: 2)>\n", + "array([[ 1.222841, 1.084187],\n", + " [-0.106073, -0.286916],\n", + " [-0.983318, -0.892802],\n", + " [-1.157512, -1.04381 ],\n", + " [ 1.443658, 1.275039],\n", + " [ 2.179182, 1.776857]])\n", + "Coordinates:\n", + " year int32 2\n", + " month int32 3\n", + " * lat (lat) float64 5.0 15.0 25.0 35.0 45.0 55.0\n", + " * lon (lon) float64 180.0 185.0 \n", + "\n", + "<xarray.DataArray 'va' (year: 5, month: 12)>\n", + "array([[ 0.875687, 0.640397, 1.346922, 0.532989, 0.985298, 1.02812 ,\n", + " 0.853269, 0.746913, 0.289339, -0.401898, -0.832116, -0.432147],\n", + " [ 0.040508, 0.157661, -0.734164, -0.706849, -0.567588, 0.104219,\n", + " 0.588996, 0.224966, -0.252701, -0.519716, -1.152297, -1.315635],\n", + " [-1.742571, -2.09365 , -3.080663, -2.863212, -1.135314, 0.053631,\n", + " 0.513007, 1.139938, 1.030276, 1.018402, 0.882338, 2.161939],\n", + " [ 1.876133, 1.298197, 0.912559, 0.072299, -0.547984, 0.95893 ,\n", + " 1.205327, 0.956807, 0.993742, 0.75878 , 0.690233, 
0.910672],\n", + " [ 0.564618, -0.047889, 0.537964, 0.341526, -0.142936, -0.160385,\n", + " 0.36168 , 0.315495, 0.51516 , 0.513514, 0.066542, 0.423261]])\n", + "Coordinates:\n", + " * year (year) int32 6 7 8 9 10\n", + " * month (month) int32 1 2 3 4 5 6 7 8 9 10 11 12\n", + " lat float64 5.0\n", + " lon float64 180.0\n" + ] + } + ], + "source": [ + "# Read the data\n", + "soda_sst = soda['sst']\n", + "print(soda_sst[1, 1, 1, 1], '\\n')\n", + "\n", + "soda_t300 = soda['t300']\n", + "print(soda_t300[1, 2, 12:24, 36], '\\n')\n", + "\n", + "soda_ua = soda['ua']\n", + "print(soda_ua[1, 2, 12:24:2, 36:38], '\\n')\n", + "\n", + "soda_va = soda['va']\n", + "print(soda_va[5:10, 0:12, 12, 36])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Homework\n", + "Basic:\n", + "\n", + "1. Use NetCDF4 and Xarray to work with the competition data and get a basic understanding of it.\n", + "\n", + "Advanced:\n", + "\n", + "2. Use Xarray for data exploration and data visualization on the training data." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}
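
As a possible starting point for the advanced exercise, the sketch below applies the operations covered in this notebook (label-based `sel`, positional `isel`, aggregation over named dimensions) to a small synthetic Dataset. It is an illustration under stated assumptions, not the competition workflow: the values are random, only 4 years are generated instead of 100, and the names `demo`, `box`, and `box_mean` are made up for this example. The dimension names and coordinate grids follow the `print(soda)` output shown earlier.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for SODA_train.nc (assumption: random values, 4 years
# instead of 100). Dimension names and grids mirror the printed Dataset.
rng = np.random.default_rng(0)
years = np.arange(1, 5)
months = np.arange(1, 37)
lats = np.arange(-55.0, 65.0, 5.0)   # 24 latitudes, -55 to 60
lons = np.arange(0.0, 360.0, 5.0)    # 72 longitudes, 0 to 355

sst = xr.DataArray(
    rng.standard_normal((years.size, months.size, lats.size, lons.size)),
    coords={"year": years, "month": months, "lat": lats, "lon": lons},
    dims=("year", "month", "lat", "lon"),
    name="sst",
)
demo = xr.Dataset({"sst": sst})  # hypothetical name for this example

# Label-based selection: one year, a latitude band, and a longitude range.
# Label slices include both end points.
box = demo["sst"].sel(year=2, lat=slice(-5, 5), lon=slice(180, 200))

# Aggregate over named dimensions: one area-mean value per month.
box_mean = box.mean(dim=["lat", "lon"])

# Positional selection with isel picks the same element as integer indexing.
same = bool(demo["sst"].isel(year=1, month=1, lat=1, lon=1) == demo["sst"][1, 1, 1, 1])

print(box.shape)
print(box_mean.dims)
print(same)
```

With the real file, `demo` would presumably be replaced by the Dataset returned from `xr.open_dataset('SODA_train.nc')`, and calling `box_mean.plot()` should give a quick line plot of the monthly series.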