Intro

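Analyzes the status of photovoltaic (PV) strings and provides fault early warning, based on the maximum-current method.

For instructions on training and running the operators, see the files in doc.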

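The dataset used by the tests in test_one_unit.py (ycz6502.csv) has the following format: all strings belong to the same correlation-computation unit (a combiner box in a centralized PV system, or an inverter in a distributed PV system), so only three columns are needed: time, string ID, and current value.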
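A minimal sketch of how data in this three-column layout could be reshaped for correlation analysis. The column names (`time`, `string_id`, `current`) and the sample values are assumptions made for illustration; the actual header of ycz6502.csv is not shown here, and this is not the repository's own test code.

```python
import pandas as pd

# Hypothetical sample in the three-column layout; real column names may differ.
records = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-06-01 10:00", "2019-06-01 10:00", "2019-06-01 10:00",
        "2019-06-01 10:05", "2019-06-01 10:05", "2019-06-01 10:05",
        "2019-06-01 10:10", "2019-06-01 10:10", "2019-06-01 10:10",
    ]),
    "string_id": ["s1", "s2", "s3"] * 3,
    "current": [5.0, 5.1, 4.9, 6.0, 6.2, 5.8, 7.0, 7.1, 2.0],
})


def to_wide(df):
    """Pivot long records into a time x string matrix of current values."""
    return df.pivot_table(index="time", columns="string_id", values="current")


wide = to_wide(records)

# Pairwise Pearson correlation between the strings of one unit.
corr = wide.corr()
print(corr.round(3))
```

In this made-up sample, s3 drops at the last timestamp while s1 and s2 keep rising, so its correlation with the other two strings is low. That is the kind of deviation a per-unit correlation check is meant to surface.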

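The test dataset for test_stations.py (the data file hbpv10days.csv is too large to be included in the repository) has the following format: it contains timestamp, station name, room number, combiner box number, string number, measurement point ID, and current value. Station name + room number + combiner box number form the group identifier. Each group holds the current values of all strings in one combiner box over the entire time range; the data is then split by day and the correlation is computed separately for each day.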
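The grouping described above can be sketched as follows. All column names (`timestamp`, `station`, `room`, `box`, `string_no`, `current`), the `_` separator in the group identifier, and the sample data are assumptions for illustration; the measurement point ID column is left out because it is not needed for the grouping.

```python
import pandas as pd


def daily_correlations(df):
    """Return {(group_id, day): correlation matrix} per combiner box and day."""
    df = df.copy()
    # Group identifier: station name + room number + combiner box number.
    df["group_id"] = (
        df["station"] + "_" + df["room"].astype(str) + "_" + df["box"].astype(str)
    )
    df["day"] = df["timestamp"].dt.date

    result = {}
    for (group_id, day), part in df.groupby(["group_id", "day"]):
        # One column per string, one row per timestamp.
        wide = part.pivot_table(
            index="timestamp", columns="string_no", values="current"
        )
        result[(group_id, day)] = wide.corr()
    return result


# Hypothetical sample: 1 station, 1 room, 2 combiner boxes, 2 days, 2 strings.
rows = []
for box in (1, 2):
    for day in ("2019-06-01", "2019-06-02"):
        for minute, base in enumerate((5.0, 6.0, 7.0)):
            for string_no in (1, 2):
                rows.append({
                    "timestamp": pd.Timestamp(f"{day} 10:{minute:02d}"),
                    "station": "A",
                    "room": 1,
                    "box": box,
                    "string_no": string_no,
                    "current": base + 0.1 * string_no,
                })

sample = pd.DataFrame(rows)
result = daily_correlations(sample)
print(sorted(result.keys()))
```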

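Computation approaches

Single-threaded Python program

Pros: can use the full Pandas and scikit-learn libraries; already supported by the GICS platform; low system resource usage; computation speed is still acceptable when the record count is in the tens of millions of rows.

Cons: no parallelism; slows down as the computation load keeps growing.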

PySpark + Pandas

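Uses pandas_udf from PySpark 2.3 to combine Spark parallelism with Pandas processing on each node.

Pros: good parallelism; combines the strengths of Spark and Pandas.

Cons: works only with PySpark; there is currently no way to integrate it with Scala Spark.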
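A minimal sketch of the grouped-map pattern that pandas_udf provides since PySpark 2.3: Spark splits the data by group, and each group reaches a plain Pandas function as a DataFrame. The per-group computation here (peak current per string) is a placeholder, not the repository's operator, and the column names `group_id`, `string_no`, and `current` are assumptions.

```python
import pandas as pd


def max_current_per_string(pdf):
    """Per-group Pandas logic: peak current of each string in one group.

    Placeholder computation for illustration only.
    """
    out = pdf.groupby(["group_id", "string_no"], as_index=False)["current"].max()
    return out.rename(columns={"current": "max_current"})


def run_on_spark(spark_df):
    """Apply the Pandas function to every group in parallel on Spark."""
    # Imported lazily so the Pandas part can be tried without Spark installed.
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    # The output schema must list the returned columns in order.
    schema = "group_id string, string_no long, max_current double"
    grouped_udf = pandas_udf(
        max_current_per_string, schema, PandasUDFType.GROUPED_MAP
    )
    return spark_df.groupby("group_id").apply(grouped_udf)


# Local check of the Pandas part on one hypothetical group.
one_group = pd.DataFrame({
    "group_id": ["A_1_1"] * 4,
    "string_no": [1, 1, 2, 2],
    "current": [5.0, 7.0, 6.0, 4.0],
})
local_result = max_current_per_string(one_group)
print(local_result)
```

Keeping the per-group logic in an ordinary function means the same code can be tested on a single Pandas DataFrame before it is submitted to a cluster. On Spark 3.0 and later, `groupby(...).applyInPandas(func, schema)` is the preferred way to express the same pattern.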

Scala Spark + Pandas

Scala Spark + jep + Pandas

Pros: integrates with Scala Spark.

Cons: feasibility has not yet been verified.

Scala Spark + PyArrow + Pandas

Not yet verified.

Scala Spark

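Pros: the fastest option on the Spark framework; easy to integrate with the existing platform.

Cons: the current Python implementation of the algorithm cannot be fully ported to Spark.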

Python parallel computing frameworks

Dask

Celery + Pandas

Python multiprocessing library

fabric
