Topic Modeling project

In this project, we train a BERTopic model to identify topics in e-commerce clothing reviews. We use the E-commerce Clothing Reviews dataset, available on Kaggle, which contains real commercial data from an e-commerce website: customer reviews together with fields such as clothing ID, customer age, title, and review text.

Detailed description of the project

The project is explained in detail in the article Topic Modeling for E-commerce Reviews using BERTopic.

Tools used in the project

Project Structure

  • data/: contains all the data
    • raw_data/: contains the original data
    • processed_data/: contains the processed data
  • model/: contains the saved BERTopic model artifact
  • output/: contains the plots generated with the BERTopic model
  • src/: contains the following scripts
    • train.py: Python script to train the BERTopic model and save the artifact
    • mlflow_log.py: Python script to track the experiments of the ML model
    • topic_model.py: Python script to create the BERTopic model
    • process_data.py: Python script to clean and filter the data
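
As a rough illustration of the kind of cleaning and filtering a script like process_data.py might perform, here is a minimal sketch that uses only the Python standard library. It is not the project's actual code: the function name `clean_reviews`, the `Review Text` column name, the `min_words` threshold, and the sample rows are all assumptions made up for this example.

```python
import csv
import io


def clean_reviews(csv_text, text_column="Review Text", min_words=3):
    """Return cleaned review strings, skipping empty or very short reviews.

    The column name and the word threshold are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        # Missing cells come back as "" or None; normalise both to "".
        text = (row.get(text_column) or "").strip()
        # Collapse runs of whitespace into single spaces.
        text = " ".join(text.split())
        # Keep only reviews with enough words to be useful for topic modeling.
        if len(text.split()) >= min_words:
            docs.append(text)
    return docs


# Made-up sample rows, only to show the expected input shape.
sample = (
    "Clothing ID,Age,Title,Review Text\n"
    "1,34,Nice,Love this   dress so much\n"
    "2,51,,\n"
    "3,27,Meh,Too small\n"
)
docs = clean_reviews(sample)
print(docs)
```

The resulting list of strings is the shape of input that BERTopic expects as documents. A real script would more likely read the raw file from data/raw_data/ and write the result to data/processed_data/.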

Visualize Topics

topic_model.visualize_topics()

topic_model.visualize_barchart(top_n_topics=10)

topic_model.visualize_documents(docs)

topic_model.visualize_hierarchy()

topic_model.visualize_heatmap(n_clusters=10, width=1000, height=1000)
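
The calls above each return an interactive Plotly figure. One possible way to save them to the output/ folder is sketched below. It assumes `topic_model` is an already fitted BERTopic model and `docs` is the list of reviews it was trained on. The helper name `save_topic_plots` and the HTML file names are hypothetical and not taken from the project.

```python
from pathlib import Path


def save_topic_plots(topic_model, docs, output_dir="output"):
    """Write each BERTopic visualization to an HTML file and return the paths.

    Assumes topic_model is already fitted; file names are illustrative only.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Same calls and arguments as listed in this section.
    figures = {
        "topics": topic_model.visualize_topics(),
        "barchart": topic_model.visualize_barchart(top_n_topics=10),
        "documents": topic_model.visualize_documents(docs),
        "hierarchy": topic_model.visualize_hierarchy(),
        "heatmap": topic_model.visualize_heatmap(
            n_clusters=10, width=1000, height=1000
        ),
    }

    paths = []
    for name, fig in figures.items():
        path = out / f"{name}.html"
        # Plotly figures can be exported as standalone HTML pages.
        fig.write_html(str(path))
        paths.append(path)
    return paths
```

Calling `save_topic_plots(topic_model, docs)` after training would then produce one HTML file per plot, which can be opened in any browser.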
