erykml f7f2288088
Added data validation reports
1 year ago

README.md


Credit Card Fraud Classification

In this project, we attempt to identify fraudulent credit card transactions. The dataset is highly imbalanced: only 0.17% of the observations belong to the positive (fraud) class.

As the baseline, we train a Random Forest classifier and evaluate its performance using recall, precision and the F1 Score.
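A baseline of this kind can be sketched as follows. This is a minimal illustration, not the project's actual training script: the data is a synthetic stand-in generated with scikit-learn (assumed here to have roughly 1% positives so that the small test set still contains fraud cases), and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data (assumption: ~1% positives,
# not the 0.17% of the real dataset).
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    weights=[0.99, 0.01],
    flip_y=0,
    random_state=42,
)

# Stratify so that train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline Random Forest (hyperparameters chosen only for illustration).
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics that focus on the positive class; accuracy would be
# misleading on data this imbalanced.
scores = {
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(scores)
```

Recall, precision and F1 are computed for the positive class only, which is why they are more informative here than plain accuracy: a model that predicts "not fraud" for every transaction would be over 99% accurate yet have zero recall.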

In order to account for the class imbalance and to improve the model's performance, we use the following resampling approaches:

  • random undersampling
  • random oversampling
  • SMOTE
  • ADASYN
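To make the first two approaches concrete, here is a hand-rolled NumPy sketch of random undersampling and random oversampling on toy data. The function names and the toy arrays are hypothetical and are not taken from the project's code; in practice a library such as imbalanced-learn provides these samplers, along with SMOTE and ADASYN, which create synthetic minority samples rather than duplicating existing ones.

```python
import numpy as np


def _split_classes(y):
    """Return (minority_idx, majority_idx) for a binary 0/1 label array."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(X, y, rng):
    """Drop majority-class rows at random until both classes have equal size."""
    minority, majority = _split_classes(y)
    kept = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, kept])
    return X[idx], y[idx]


def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until both classes have equal size."""
    minority, majority = _split_classes(y)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]


# Toy data: 1000 rows, 10 of them positive (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:10] = 1

X_under, y_under = random_undersample(X, y, rng)
X_over, y_over = random_oversample(X, y, rng)

print(np.bincount(y_under))  # class counts after undersampling
print(np.bincount(y_over))   # class counts after oversampling
```

Resampling should be applied to the training split only, so that the test set keeps the original class ratio and the evaluation reflects real-world conditions.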

One issue with data resampling is that it can distort the relationships among the features, as well as between the features and the target. That is why we use deepchecks to investigate how resampling affects the distribution of the features in the training data. Additionally, we set up a GitHub Action that runs every time the data or the data-generating scripts are modified.
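The kind of distortion in question can be illustrated with a simple two-sample Kolmogorov-Smirnov test. Note that this is a stand-in using SciPy, not the deepchecks API and not the checks the project actually runs; the single toy feature and its class-conditional distributions are assumptions made for the example.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# One toy feature whose distribution differs between the classes
# (assumption: majority ~ N(0, 1), minority ~ N(3, 1), 1% minority).
majority = rng.normal(0.0, 1.0, size=9900)
minority = rng.normal(3.0, 1.0, size=100)
original = np.concatenate([majority, minority])

# Random oversampling: duplicate minority values until the classes are balanced.
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
resampled = np.concatenate([original, extra])

# Compare the feature's marginal distribution before and after resampling.
result = ks_2samp(original, resampled)
statistic, p_value = result.statistic, result.pvalue
print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.3g}")
```

A large KS statistic with a small p-value indicates that the feature's overall distribution in the resampled training data no longer matches the original one, which is the effect a data validation report is meant to surface.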

For more details, please refer to the accompanying article.

If you would like to contribute to the project (for example, by exploring additional resampling approaches), please create a PR :)


About

A repository containing the code for an article on approaching an imbalanced classification problem
