Data Normalization

Sometimes the data's distribution is highly uneven, which can keep training from reaching the desired result. This is why the data needs to be normalized.

Common normalization methods: Constant Factor Normalization, Min-Max, Z-score, Softmax.

Constant Factor Normalization

This is the simplest normalization method: just divide all the data by a constant.

Suppose our data consists of five samples x1, …, x5, each with two features, stored as a 2×5 array with one row per feature:

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
data *= 0.01  # scale every value by the constant factor 0.01
print(data)
```

```
[[0.23 1.4  0.11 3.4  0.12]
 [0.12 2.22 3.53 1.32 0.23]]
```
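One property worth noting (a minimal sketch reusing the sample array from above): dividing by a constant preserves the ratios between values, unlike the Min-Max and Z-score methods below.

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
scaled = data * 0.01  # constant-factor scaling

# Ratios between any two values are unchanged by constant-factor scaling.
print(np.allclose(scaled[0, 1] / scaled[0, 0], data[0, 1] / data[0, 0]))
```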

Min-Max

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings

warnings.filterwarnings("ignore")

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)

# Rescale each row to [0, 1]: subtract the row minimum, divide by the row range.
max_ = np.max(data, axis=1, keepdims=True)
min_ = np.min(data, axis=1, keepdims=True)
data = (data - min_) / (max_ - min_)
# Note: distplot is deprecated in recent seaborn versions (histplot/displot replace it).
sns.distplot(data.ravel(), fit=stats.norm)
plt.xlabel("Changed data")
plt.show()
```

After this transformation, the whole dataset lies in [0, 1]: each row's maximum becomes 1, its minimum becomes 0, and larger values map closer to 1 while smaller values map closer to 0.

Note that this does not make the data normally distributed.
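These claims are easy to verify directly (a minimal sketch reusing the sample array from above):

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
min_ = data.min(axis=1, keepdims=True)
max_ = data.max(axis=1, keepdims=True)
scaled = (data - min_) / (max_ - min_)

# Every value lies in [0, 1]; each row's minimum maps to 0 and its maximum to 1.
print(scaled.min(axis=1))  # per-row minimums
print(scaled.max(axis=1))  # per-row maximums
```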

Z-score

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
# Standardize each row: subtract the row mean, divide by the row standard deviation.
mean_ = np.mean(data, axis=1, keepdims=True)
std_ = np.std(data, axis=1, keepdims=True)
data = (data - mean_) / std_
print(data)
```

```
[[-0.64718872  0.27399231 -0.74166883  1.84866073 -0.73379549]
 [-1.06590348  0.57515027  1.59885522 -0.12815848 -0.97994352]]
```

In the transformed data, values above the mean become positive and values below the mean become negative.
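Beyond the sign property, Z-score normalization gives each row zero mean and unit standard deviation, which can be checked directly (a minimal sketch reusing the sample array from above):

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

# Each row now has (approximately) zero mean and unit standard deviation,
# and each entry's sign says whether the original value was above or below the row mean.
print(np.allclose(z.mean(axis=1), 0.0))
print(np.allclose(z.std(axis=1), 1.0))
```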

Softmax

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
# Exponentiate each value, then divide by the row sum so each row sums to 1.
# Subtracting the row max first avoids overflow; softmax is shift-invariant,
# so the result is unchanged.
data = np.exp(data - np.max(data, axis=1, keepdims=True))
data /= np.sum(data, axis=1, keepdims=True)
print(data)
```

```
[[2.13132283e-138 1.38389653e-087 1.30953001e-143 1.00000000e+000
  3.55967162e-143]
 [8.04603043e-149 1.28062764e-057 1.00000000e+000 1.04934790e-096
  4.81749166e-144]]
```

In deep learning, the final output layer (and the output in Logistic Regression) is typically passed through softmax. The goal is for the output to be a probability distribution: all outputs sum to 1, and every value lies in [0, 1].
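Those probability properties can be checked directly (a minimal sketch; the softmax is recomputed here so the snippet stands on its own):

```python
import numpy as np

data = np.array([[23, 140, 11, 340, 12], [12, 222, 353, 132, 23]], dtype=float)
# Recompute the softmax, shifting by the row max for numerical stability.
e = np.exp(data - data.max(axis=1, keepdims=True))
p = e / e.sum(axis=1, keepdims=True)

# Each row sums to 1 and every entry lies in [0, 1].
print(np.allclose(p.sum(axis=1), 1.0))
print(bool(((p >= 0) & (p <= 1)).all()))
```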
