Vaishnav Uppalapati 5a7755a50b
Update README.md
3 weeks ago

README.md


🎬 BoxOfficePrediction

Develop an advanced predictive model to forecast a film's box office revenue with precision and confidence. Utilizing a myriad of parameters, including budget, cast, genre, and past performance, our task is to leverage the power of machine learning to unravel the intricacies of box office dynamics and provide actionable insights for studios and filmmakers.

🚀 Motivation

Numerous recommendation systems have been built on the TMDB_5000 dataset from Kaggle, yet much of the dataset's potential remains untapped. This project uses it to predict a film's expected revenue from a wide range of parameters and engineered features, helping stakeholders in the entertainment industry make more informed decisions.

📄 Documentation

This section contains detailed information about the approach, experimentation results, and inferences derived from the project. I have created a blog explaining the approach and execution. Please visit my blog:

Blog Image

🛠️ Technology Stack

  • Frontend: HTML5, CSS3, JavaScript
  • Backend: Flask
  • ML Library: Scikit-Learn
  • MLOps Tools: MLflow, DVC
  • Deployment: Docker, GitHub Actions, Heroku
  • Version Control: GitHub

📊 Implementation Overview

Data:

  • TMDB 5000 Movie Dataset => Kaggle
  • Average Ticket Prices => (Made by me) : Download

🔧 Preprocessing:

  • Flattened the dataset's complex nested structure into simple, trainable columns.
  • Assigned scores to high-cardinality categorical features such as crew, hero, and heroine, based on the cumulative popularity and weighted rating of their previous work, to quantify their impact on revenue/footfall.
  • Used one-hot encoding for ordinary categorical features with fewer unique values.
  • Used a log transformation to handle skewed data and outliers.
  • Standardized data with StandardScaler.
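
The last three steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual pipeline: the column names and toy values are assumptions, and `np.log1p` is assumed as the log transformation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Toy rows; the column names are illustrative, not the project's real schema.
movies = pd.DataFrame({
    "budget": [1_000_000, 25_000_000, 150_000_000, 5_000_000],
    "popularity": [3.2, 18.5, 120.0, 7.9],
    "release_month": ["Jan", "Jul", "Dec", "Jul"],
})

numeric_cols = ["budget", "popularity"]
categorical_cols = ["release_month"]

# Skewed numeric features: log1p compresses the long right tail, then
# StandardScaler centres each column at 0 with unit variance.
numeric_steps = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("scale", StandardScaler()),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        # Low-cardinality categoricals become one indicator column per value.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

features = preprocessor.fit_transform(movies)
print(features.shape)  # 2 scaled numeric columns + 3 one-hot month columns
```

Wrapping the steps in a `ColumnTransformer` means the same fitted transformations can be reused at prediction time instead of being recomputed by hand.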

🎯 Target Metric: Footfall Prediction

To predict expected revenue, we introduced a novel approach by considering footfall (number of tickets sold) as a target metric. While revenue is subject to various external factors such as ticket prices and distribution deals, footfall provides a more consistent and direct measure of a movie's popularity and audience engagement.

expected revenue = predicted footfall * current avg_ticket_price
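
As a rough sketch, the formula above translates directly into code. The helper that derives footfall from historical revenue is an assumption about how the training target could be built (revenue divided by the average ticket price at release); the function names and numbers are hypothetical.

```python
def footfall_from_revenue(revenue: float, avg_ticket_price: float) -> float:
    """Estimate tickets sold from revenue (assumed way to build the target)."""
    if avg_ticket_price <= 0:
        raise ValueError("avg_ticket_price must be positive")
    return revenue / avg_ticket_price


def expected_revenue(predicted_footfall: float, current_avg_ticket_price: float) -> float:
    """expected revenue = predicted footfall * current avg_ticket_price"""
    return predicted_footfall * current_avg_ticket_price


# Hypothetical film: 50M revenue when the average ticket cost 5.0.
footfall = footfall_from_revenue(50_000_000, 5.0)
# The same audience at a current average ticket price of 10.0.
revenue_today = expected_revenue(footfall, 10.0)
print(revenue_today)  # 100000000.0
```

Predicting footfall first keeps ticket-price inflation out of the model, and the price is applied only at the final step.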

🤖 Model Selection

Models trained:

  • RandomForestRegressor
  • DecisionTreeRegressor
  • GradientBoostingRegressor
  • LinearRegression
  • XGBRegressor (best model)
  • CatBoostRegressor
  • AdaBoostRegressor

📈 Best Model Metrics

Metric Value
RMSE 0.012
neg_mean_squared_error -0.00024
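
For readers comparing the two rows: scikit-learn's `neg_mean_squared_error` scorer reports the mean squared error with its sign flipped so that higher is better, and the matching RMSE is the square root of its negation. The small helper below is only an illustration of that relationship (the function name is hypothetical). The README does not say which data split each value was measured on, so the two rows need not agree.

```python
import math


def rmse_from_neg_mse(neg_mse: float) -> float:
    """Convert a scikit-learn style negated MSE score into an RMSE."""
    if neg_mse > 0:
        raise ValueError("a negated MSE score cannot be positive")
    return math.sqrt(-neg_mse)


print(round(rmse_from_neg_mse(-0.00024), 4))  # 0.0155
```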

⚙️ Best Model Parameters

Parameter Value
colsample_bytree 0.3
learning_rate 0.11
max_depth 4
n_estimators 444

🔍 Hyperparameter Tuning

  • Method: RandomizedSearchCV
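
A minimal sketch of a randomized search is shown below. It is not the project's actual search: `GradientBoostingRegressor` stands in for `XGBRegressor` so the example needs only scikit-learn and SciPy, the data is synthetic, the search ranges are assumptions, and `n_iter` is kept tiny so it runs quickly.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data in place of the processed movie features.
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Distributions to sample from; the ranges are illustrative assumptions.
param_distributions = {
    "n_estimators": randint(50, 150),     # integers in [50, 150)
    "max_depth": randint(2, 6),           # integers in [2, 6)
    "learning_rate": uniform(0.01, 0.2),  # floats in [0.01, 0.21]
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                          # number of sampled configurations
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Unlike a grid search, the cost here is fixed by `n_iter` rather than by the size of the parameter grid, which is why continuous ranges such as the learning rate can be searched directly.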

📑 MLflow Experiment Logs

All the experiment results and models are logged in MLflow for a clearer understanding and detailed inference: View here

📸 Screenshots

[Screenshots: Home Page, Form Page, Result]

🖥️ Run Locally

Clone the project

  git clone https://github.com/uvaishnav/BoxOfficePrediction.git

Create a conda environment after opening the repository

  conda create -n boxoffice python=3.9 -y
  conda activate boxoffice

Install requirements

  pip install -r requirements.txt

Start the server

  python app.py

Then open localhost in your browser on the port the app is configured to use.

🔧 For Usage/Modification

1. Clone the project

  git clone https://github.com/uvaishnav/BoxOfficePrediction.git

2. Create a conda environment after opening the repository

  conda create -n boxoffice python=3.9 -y
  conda activate boxoffice

3. Install requirements

  pip install -r requirements.txt

4. Create a Kaggle account, download your kaggle.json API token, and store it in the .kaggle folder in your home directory (needed for the data_ingestion pipeline)
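
Before running the data_ingestion pipeline, it can help to confirm the token is where the Kaggle client looks for it. The check below is a hedged sketch: `load_kaggle_credentials` is a hypothetical helper, not part of this repository, and it only verifies that the file exists and contains the `username` and `key` fields.

```python
import json
from pathlib import Path

DEFAULT_TOKEN_PATH = Path.home() / ".kaggle" / "kaggle.json"


def load_kaggle_credentials(path: Path = DEFAULT_TOKEN_PATH) -> dict:
    """Read kaggle.json and check it has the fields the Kaggle API expects."""
    if not path.is_file():
        raise FileNotFoundError(f"Kaggle token not found at {path}")
    creds = json.loads(path.read_text())
    missing = {"username", "key"} - creds.keys()
    if missing:
        raise KeyError(f"kaggle.json is missing: {sorted(missing)}")
    return creds
```

Calling `load_kaggle_credentials()` with no arguments checks the default location and raises a descriptive error if the token is absent or incomplete.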

5. Add Environment Variables

For model evaluation pipeline,

  • Connect the repository to DagsHub
  • Get the MLflow URI and credentials
  • Update the config.yaml file with your MLflow URI
  • Then add these variables (credentials from DagsHub) to your environment, with no spaces around the = sign

  export MLFLOW_TRACKING_URI=<your mlflow uri>
  export MLFLOW_TRACKING_USERNAME=<your username>
  export MLFLOW_TRACKING_PASSWORD=<your password>
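
A quick way to confirm the three variables are visible to Python before running the model evaluation pipeline is sketched below. `missing_mlflow_vars` is a hypothetical helper written for this illustration; it is not part of the repository or of MLflow.

```python
import os

REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_vars(env=os.environ) -> list:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_mlflow_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("MLflow tracking variables are set.")
```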

6. Run all the pipelines using DVC

dvc init
dvc repro

🎥 Demo

Watch the demo video

🚀 Deployment

To Deploy this Project on Heroku

1. Dockerize the Project

Update the Dockerfile as needed and build the Docker image. You need to install Docker Desktop first.

docker build -t boxoffice .

2. Update Secret Variables in GitHub to Deploy Using GitHub Actions

  1. Create a Heroku account and create an app.
  2. In your GitHub repository, navigate to Settings -> Secrets and variables -> Actions and add the secrets referenced in your workflow's main.yaml file:
  • HEROKU_API_KEY
  • HEROKU_APP_NAME
  • HEROKU_EMAIL

The build runs and a new version of your project is deployed every time you push changes to GitHub.

📈 Scope of Improvement

Our current model predicts expected revenue based on factors like budget, cast, release month, and genres.

Optimizing Cast Selection and Release Timing

We can enhance its utility by optimizing cast selection and release timing. By analyzing historical data, we can identify optimal combinations of actors and crew members that synergize well, thereby maximizing revenue potential. Additionally, refining our model to recommend the best release windows can help avoid high competition periods and leverage seasonal trends, further boosting a film’s success.

🙏 Acknowledgements

  • TMDB_5000 dataset from Kaggle
  • 247wallst.com, for the data used to prepare the ticket prices dataset

📜 License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.


About

Advanced predictive model for box office revenue. With precision forecasting and confidence-building insights, our solution empowers production houses to optimize resources and maximize profitability.
