.ipynb_checkpoints: ba7b595422, naive bayes, 3 years ago
NLTK Exploration.ipynb: 97169c7f04, initial commit, 3 years ago
archive.md: 97169c7f04, initial commit, 3 years ago
project.ipynb: ba7b595422, naive bayes, 3 years ago
readme.md: e7b42b97d4, removed holdout, 3 years ago
serial_test.ipynb: 97169c7f04, initial commit, 3 years ago

Author Classification Project

Collect a thousand texts from Project Gutenberg (plus 7 novels) covering at least 10 authors, then build a project that classifies texts by author. Use NLP together with supervised learning (including deep learning) and unsupervised learning, with the emphasis on unsupervised techniques for this project.

The project should follow these guidelines:

1. Pre-process the data using spaCy and other methods.
2. Perform data exploration.
3. Using Bag of Words, apply supervised models such as Naive Bayes, Logistic Regression, Decision Tree, Random Forest, KNN, SVM and Gradient Boosting, including GridSearchCV.
4. Same as 3, but using TF-IDF.
5. Same as 3, but using word2vec.
6. Apply an RNN to do the classification.
7. Using an unsupervised technique, plot bar graphs for the clusters containing the 10 authors' documents. Adjust the clustering by silhouette score.
8. Using LSA, LDA and NNMF, print the top ten words (those with the highest loadings) for each topic model. Analyze and compare the three methods.
9. Write up analysis and conclusions.
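
To make the Bag of Words plus Naive Bayes step concrete, here is a minimal standard-library sketch of a multinomial Naive Bayes classifier over word counts. It is not the code from project.ipynb: the whitespace tokenizer stands in for real spaCy preprocessing, the function names are hypothetical, and the four toy sentences and their author labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for spaCy preprocessing)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect bag-of-words counts per author for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)  # author -> word -> count
    doc_counts = Counter(labels)        # author -> number of training documents
    vocab = set()
    for text, author in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[author].update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the author with the highest log-posterior for the given text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for author, n_docs in doc_counts.items():
        total_words = sum(word_counts[author].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen during training
            # Laplace (add-one) smoothing keeps unseen author/word pairs non-zero
            likelihood = (word_counts[author][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[author] = score
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration (not from the Gutenberg corpus)
docs = [
    "the whale and the sea and the ship",
    "the captain hunted the white whale",
    "a ball at the manor and a proposal of marriage",
    "her sister danced at the ball",
]
labels = ["melville", "melville", "austen", "austen"]

model = train_naive_bayes(docs, labels)
print(predict(model, "the ship chased the whale"))
print(predict(model, "a marriage proposal at the ball"))
```

In practice a library implementation combined with GridSearchCV, as the guideline suggests, would replace this hand-rolled version; the sketch only shows what the model computes.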

serial_test.ipynb

I hit a lot of snags during the data cleaning process. I had too many documents and kept running into insufficient-memory errors.

I created and saved serial_test.ipynb to show some of my decision making. It should help answer questions such as why I chose the number of documents that I did.

It's named serial_test because the process ended with deciding which serialization method to use for saving the documents.
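
One way to compare serialization methods is to measure how many bytes each produces for the same documents. The sketch below is an illustration only: this readme does not say which methods serial_test.ipynb actually tested, so the three candidates (pickle, JSON, and gzip-compressed pickle), the `serialize_candidates` helper, and the toy corpus are all assumptions.

```python
import gzip
import json
import pickle

# Toy stand-in for the cleaned corpus (hypothetical structure and content)
documents = [
    {"author": f"author_{i % 10}", "text": "lorem ipsum dolor sit amet " * 200}
    for i in range(50)
]


def serialize_candidates(obj):
    """Serialize the same object with each candidate method and return the raw bytes."""
    pickled = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return {
        "pickle": pickled,
        "json": json.dumps(obj).encode("utf-8"),
        "gzip+pickle": gzip.compress(pickled),
    }


blobs = serialize_candidates(documents)
sizes = {name: len(blob) for name, blob in blobs.items()}

# Report the candidates from smallest to largest
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes")

# Round-trip check: the compressed pickle must restore the original documents
restored = pickle.loads(gzip.decompress(blobs["gzip+pickle"]))
print(restored == documents)
```

Size is only one criterion. Load time and peak memory matter as well when the corpus is large enough to cause the memory errors described above.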

KalikaKay / Author_Classification mirror of https://github.com/KalikaKay/Author-Classification-Project.git
