prepare_data.py 992 B

import re

import pandas as pd


def remove_special_characters(text):
    """Replace each run of non-alphanumeric characters with a single space."""
    return re.sub('[^A-Za-z0-9]+', ' ', text)


def prepare_data(path_data):
    """Load the raw CSV and clean its "text" column."""
    data = pd.read_csv(path_data)
    data["text"] = data["text"].apply(remove_special_characters)
    return data


def encode_sentiments_values(df):
    """Add a "label" column holding an integer code for each sentiment."""
    possible_sentiments = df.airline_sentiment.unique()
    # Codes follow the order in which sentiments first appear in the data
    sentiment_dict = {}
    for index, possible_sentiment in enumerate(possible_sentiments):
        sentiment_dict[possible_sentiment] = index
    # Encode all the sentiment values
    df['label'] = df.airline_sentiment.replace(sentiment_dict)
    return df


if __name__ == '__main__':
    path_to_data = "./data/raw/airline_sentiment_data.csv"
    processed_data = prepare_data(path_to_data)
    # Encode the labels
    processed_data = encode_sentiments_values(processed_data)
    # Save the preprocessed data
    processed_data.to_csv('./data/preprocessed/airline_sentiment_preprocessed_data.csv')
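As a quick check of what the script does to each row, the sketch below runs the same cleaning and label-encoding steps on a small in-memory DataFrame. The three sample rows are made up for illustration and are not taken from the airline dataset. The encoding here uses `Series.map` rather than the script's `replace`; for labels that all appear in the dictionary, the two should give the same codes.

```python
import re

import pandas as pd


def remove_special_characters(text):
    # Same regex as prepare_data.py: runs of non-alphanumerics become one space
    return re.sub('[^A-Za-z0-9]+', ' ', text)


# Hypothetical sample rows (not from the real dataset)
sample = pd.DataFrame({
    "text": ["@VirginAmerica thanks!!", "Flight delayed... again :(", "Great crew"],
    "airline_sentiment": ["positive", "negative", "positive"],
})

# Clean the text column
sample["text"] = sample["text"].apply(remove_special_characters)

# Build the sentiment -> integer mapping in order of first appearance
sentiment_dict = {
    sentiment: index
    for index, sentiment in enumerate(sample["airline_sentiment"].unique())
}

# Assumption: Series.map is used here in place of the script's replace
sample["label"] = sample["airline_sentiment"].map(sentiment_dict)

print(sample)
```

Note that the cleaning step leaves leading and trailing spaces where punctuation stood at the start or end of a string, and that the integer codes depend on the order of rows in the input file.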