This project implements a machine learning pipeline for detecting and classifying network security threats. The pipeline is built with reproducibility, versioning, and experiment tracking in mind, leveraging modern MLOps tools.
Clone the repository:
git clone https://github.com/austinLorenzMccoy/networkSecurity_project.git
cd networkSecurity_project
Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
pip install -e .
Set up environment variables:
# Create a .env file with your MongoDB connection string and DAGsHub credentials
cp .env.template .env
# Edit the .env file with your credentials
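For reference, here is a minimal sketch of how these variables could be consumed from the `.env` file. It assumes the code reads them with `python-dotenv` and connects to MongoDB via `pymongo`; the exact loading mechanism used inside the project may differ.

```python
# Illustrative sketch only -- assumes settings are loaded with python-dotenv.
import os

from dotenv import load_dotenv
from pymongo import MongoClient

# Reads MONGODB_URI, MLFLOW_TRACKING_USERNAME, MLFLOW_TRACKING_PASSWORD from .env
load_dotenv()

mongo_uri = os.getenv("MONGODB_URI")
client = MongoClient(mongo_uri)
client.admin.command("ping")  # quick connectivity check
print("Connected to MongoDB")
```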
Initialize DVC:
dvc init
Connect to DAGsHub (optional):
# Set up DAGsHub as a remote
dvc remote add origin https://dagshub.com/austinLorenzMccoy/networkSecurity_project.dvc
The project uses DVC to define and run the ML pipeline stages:
# Run the entire pipeline
dvc repro
# Run a specific stage
dvc repro -s data_ingestion
dvc repro -s data_validation
dvc repro -s data_transformation
dvc repro -s model_training
# Run the direct training pipeline (using cyber threat intelligence data)
dvc repro -s direct_training
# View pipeline visualization
dvc dag
MLflow is used to track experiments, including parameters, metrics, and artifacts:
# Start the MLflow UI locally
mlflow ui
# Or view experiments on DAGsHub
# Visit: https://dagshub.com/austinLorenzMccoy/networkSecurity_project.mlflow
To enable MLflow tracking with DAGsHub:
Set your DAGsHub credentials in the .env file:
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
Run the training pipeline with MLflow tracking:
dvc repro direct_training
View your experiments on DAGsHub's MLflow interface
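As a point of reference, a training step could log to the DAGsHub-hosted tracking server along these lines. This is a minimal sketch only; the pipeline stages handle logging internally, and the experiment, parameter, metric, and artifact names below are hypothetical.

```python
# Minimal MLflow logging sketch; names below are hypothetical placeholders.
import mlflow

# Credentials are picked up from MLFLOW_TRACKING_USERNAME / MLFLOW_TRACKING_PASSWORD
mlflow.set_tracking_uri(
    "https://dagshub.com/austinLorenzMccoy/networkSecurity_project.mlflow"
)
mlflow.set_experiment("network-security")  # hypothetical experiment name

with mlflow.start_run(run_name="direct_training"):
    mlflow.log_param("model_type", "random_forest")  # hypothetical parameter
    mlflow.log_metric("f1_score", 0.92)              # hypothetical metric
    mlflow.log_artifact("reports/metrics.json")      # hypothetical artifact path
```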
The project includes unit tests using pytest:
# Run all tests
pytest
# Run tests with coverage report
pytest --cov=networksecurity
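A minimal test sketch is shown below, assuming `app.py` exposes a FastAPI instance named `app` with the `/health` route used in the API section; the project's actual tests may be structured differently.

```python
# tests/test_health.py -- sketch only; assumes app.py exposes a FastAPI `app`.
from fastapi.testclient import TestClient

from app import app

client = TestClient(app)


def test_health_endpoint():
    response = client.get("/health")
    assert response.status_code == 200
```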
Build and run the project using Docker:
# Build the Docker image
docker build -t network-security-project .
# Run the container
docker run -p 8000:8000 -e MONGODB_URI=your_mongodb_connection_string network-security-project
The project includes a FastAPI application for serving predictions:
# Run the FastAPI application
python app.py
# Or use the convenience script
bash run_api.sh
# Check health status
curl -X GET "http://localhost:8000/health"
# Get model information
curl -X GET "http://localhost:8000/model-info"
# Make a prediction with text
curl -X POST "http://localhost:8000/predict/text" \
-H "Content-Type: application/json" \
-d '{"text": "A new ransomware attack has been detected that encrypts files."}'
.
├── .dvc/                       # DVC configuration
├── .dagshub/                   # DAGsHub configuration
├── artifact/                   # Generated artifacts from pipeline
│   └── direct_training/        # Artifacts from direct training approach
├── data_schema/                # Data schema definitions
├── logs/                       # Application logs
├── Network_Data/               # Raw data (tracked by DVC)
├── networksecurity/            # Main package
│   ├── components/             # Pipeline components
│   ├── constants/              # Constants and configurations
│   ├── entity/                 # Data entities and models
│   ├── exception/              # Custom exceptions
│   ├── logging/                # Logging utilities
│   ├── pipeline/               # Pipeline orchestration
│   └── utils/                  # Utility functions
├── notebooks/                  # Jupyter notebooks for exploration
├── reports/                    # Generated reports and metrics
├── tests/                      # Test cases
├── .env                        # Environment variables
├── .env.template               # Template for environment variables
├── .gitignore                  # Git ignore file
├── app.py                      # FastAPI application
├── custom_model_trainer.py     # Custom model trainer implementation
├── dvc.yaml                    # DVC pipeline definition
├── Dockerfile                  # Docker configuration
├── main.py                     # Main entry point
├── pytest.ini                  # Pytest configuration
├── README.md                   # Project documentation
├── requirements.txt            # Python dependencies
├── run_api.sh                  # Script to run the FastAPI application
├── setup.py                    # Package setup file
└── train_with_components.py    # Direct training script using components
The project is set up with GitHub Actions for CI/CD.
This project is licensed under the MIT License - see the LICENSE file for details.