>>> import praw
>>> import pandas as pd
>>> from sklearn.cluster import KMeans
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> import random
>>> import numpy as np
>>> from transformers import RobertaTokenizer
>>> roberta_tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
>>> reddit = praw.Reddit(client_id='client id',
...                      client_secret='client secret',
...                      user_agent='user agent')
Version 7.1.0 of praw is outdated. Version 7.2.0 was released 5 days ago.
>>> def replies_of(top_level_comment, comment_list):
...     if len(top_level_comment.replies) == 0:
...         return
...     else:
...         for num, comment in enumerate(top_level_comment.replies):
...             try:
...                 comment_list.append(str(comment.body))
...             except:
...                 continue
...             replies_of(comment, comment_list)
>>> list_of_subreddit = ['showerthoughts', 'wallstreetbets', 'askreddit', 'jokes', 'worldnews']
>>> total_comment_list = []
>>> comment_list = []
>>> for j in list_of_subreddit:
...     top_posts = reddit.subreddit(j).top('day', limit=1)
...     for submission in top_posts:
...         # The argument is truncated in the source; id=submission.id is assumed
...         submission_comm = reddit.submission(id=submission.id)
...         for count, top_level_comment in enumerate(submission_comm.comments):
...             try:
...                 replies_of(top_level_comment, comment_list)
...             except:
...                 continue
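>>> for i in range(20): #1
...     # Loop body is missing in the source; sampling a random comment is assumed
...     total_comment_list.append(random.choice(comment_list))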
>>> # Clustering using TfidfVectorizer method
>>> tf_idf_vectorizor = TfidfVectorizer(stop_words = 'english',max_features = 20000)
>>> tf_idf = tf_idf_vectorizor.fit_transform(total_comment_list)
>>> tf_idf_array = tf_idf.toarray()
>>> kmeans = KMeans(n_clusters=3, algorithm = 'auto').fit(tf_idf_array) #2
>>> cluster_assignment = kmeans.labels_
>>> clustered_sentences = [[] for i in range(3)]
>>> for sentence_id, cluster_id in enumerate(cluster_assignment):
...     clustered_sentences[cluster_id].append(total_comment_list[sentence_id])
>>> for i, cluster in enumerate(clustered_sentences):
...     print("Cluster ", i+1)
...     print(cluster)
...     print("")
Cluster 1
['My god yes. iPad and folding. Or iPad and washing dishes. Like the best few minutes of every day.', 'Nihilistic', '', 'For the Damaged Coda\nSong by Blonde Redhead', 'Thank you for sharing this.', 'Very well written. I started reading and before I knew it I was standing in the middle of the kitchen on my phone for 10 minutes.', 'Me whenever I finish a jar of Nutella.', 'Oily penis']
Cluster 2
['Yeh I get that, that\'s why it screams peaked in high school a bit, not conclusively.\n\nTheres many kids who feel like a lot of it is a drag at the time, and don\'t _appreciate_ it at the time.\n\nIf you don\'t appreciate, and don\'t feel like it\'s good, they won\'t be the best times of your life.\n\nIn the same way, you can have times that are less good, but you feel much happier in the moment, and will back on it fondly as a result.\n\nPart of it being "good" is your mindset at the time.', "You always have tomorrow to kill yourself\n\nYou don't get a do-over, the option to take it back, to make different choices, or to seek redress/forgiveness for those poor decisions made.\n\nYou don't get to spend time with the things/places/people that make life a little less shitty. You give up on the hope that things could get better.\n\nYou make a hole in the hearts of those that love and care for you. A hurt that never heals or goes away, and can leave more scars on them - a cycle of trauma and depression that may not end.\n\nAnd because you're depressed, there's probably a list of tasks that you just don't have the energy to get to, like dishes/laundry/dusting. What's another task on that list?\n\nIf you put suicide off until tomorrow, you always have a tomorrow. And maybe, just maybe, tomorrow will be different from today", 'I’m a lady and the r/askmen subreddit is so much better than r/askwomen. That one is way too heavily moderated. Posts get removed and tons of comments are deleted in almost every thread that gets traction. Anyway, I’m 39 and I’m not very surprised by these answers, but I would have been 20 years ago. Men are people and are stereotyped just like women are. I’m glad to see there’s a spectrum.', '1. Depends on how and how much you do it. E.g., you could be gripping your dick to hard, making it less sensitive. 
On a positive note, masturbation and self-exploration in general are great tools to find out for yourself how you like to be stimulated, which you can then easier communicate to you partner.\n2. Everyone is different. Some people do, some don\'t. 6" is above average. What\'s most important for you: don\'t obsess about your size. It\'s neither a guarantee for good sex nor an absolute deal-breaker. Learn to love what you\'ve got.\nEdit: something that every girl will like is a clean dick. Wash it. If you\'re uncut, pull the foreskin back and carefully clean it. Don\'t use aggressive soap.\n3. Again, everyone is different. Depends on what you and/or your partner likes. You want to finish fast? Great go ahead. Want to make your partner cum first, maybe not even come yourself at all, so that your orgasm in an hour or two when you\'re having a second round is even better? You do you. Spread your sex session over an entire evening with several breaks in between? If you\'ve got the stamina for it, enjoy it! \nThere\'s no right or wrong when it comes to how sex works. The only important thing is that all parties involved are doing so voluntarily, are enjoying themselves and things stop or change immediately if someone feels uncomfortable.\n4. No. Just trimming my pubes.', "You are correct that you enjoy it now but what im saying is you dont see how trully enjoyable it is to be young, once you grow up you find out that all the things you've been doing are infinitely harder to do and enjoy", 'Oh my goodness! I’m so sorry you experienced that. My 7th dose was bumped up 10mg and I did think I was dying. I remember crying a lot and being worried about my kids. Straight up felt like a near death experience but I did feel much better in the following days. I try to just remind myself during the sessions that what I experience during the session isn’t what I’m there for. I try to focus on the repairing that it’s doing to the neurons in my brain. 
I’m going in for my 8th visit tomorrow. It’s been 4 weeks since my last and I could tell last week that it was time. It’s the only thing that’s worked for me.', 'If you ever want to impress a woman just make her a scarf if you get to that level or like a shirt or even something really simple', "Yep science is tough, really tough. You really have to love it, I mean I do but there are/were periods where it's really tough. Actually working on your own projects helps a lot though, I feel like that's when I can really get passionate about it. Even the most annoying problems can be extremely interesting when you get to the heart of it, I love it.", "This is so wild to me cos I dont cuddle people unless they're my partner. I just don't really like physical touch all that much in general. People are so diverse", "I so badly want to leave my gross, fragile body. Not that mine is any more disgusting or weak than average. Just the entire concept of our bodies and how vulnerable they are is so fucked up. And you can't get a new one if yours sucks or something breaks. You just have to suffer for the rest of your life."]
Cluster 3
['You’d be surprised how many guys like it.\n\nWas infantry in the Army and we blasted Miley Cyrus, Taylor swift and Katy Perry in our truck on patrols sometimes.\n\nIt’s just catchy music.', "I am pretty sure it's the other way around. Most guys wipe.\n\nCan't believe I had say it."]
>>> total_array = np.array([roberta_tokenizer.encode_plus(total_comment_list[0],add_special_tokens=True,max_length=30,pad_to_max_length=True,return_attention_mask=True)['input_ids']])
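>>> for i in total_comment_list[1:]:
...     new_array = np.array([roberta_tokenizer.encode_plus(i,add_special_tokens=True,max_length=30,pad_to_max_length=True,return_attention_mask=True)['input_ids']])
...     total_array = np.append(total_array,new_array,axis=0)
>>> # Re-fit KMeans on the RoBERTa token ids (this step is missing in the source and is assumed)
>>> kmeans = KMeans(n_clusters=3, algorithm = 'auto').fit(total_array)
>>> cluster_assignment = kmeans.labels_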
>>> clustered_sentences = [[] for i in range(3)]
>>> for sentence_id, cluster_id in enumerate(cluster_assignment):
...     clustered_sentences[cluster_id].append(total_comment_list[sentence_id])
>>> for i, cluster in enumerate(clustered_sentences):
...     print("Cluster ", i+1)
...     print(cluster)
...     print("")
Cluster 1
['', 'You’d be surprised how many guys like it.\n\nWas infantry in the Army and we blasted Miley Cyrus, Taylor swift and Katy Perry in our truck on patrols sometimes.\n\nIt’s just catchy music.', "I am pretty sure it's the other way around. Most guys wipe.\n\nCan't believe I had say it."]
Cluster 2
['Yeh I get that, that\'s why it screams peaked in high school a bit, not conclusively.\n\nTheres many kids who feel like a lot of it is a drag at the time, and don\'t _appreciate_ it at the time.\n\nIf you don\'t appreciate, and don\'t feel like it\'s good, they won\'t be the best times of your life.\n\nIn the same way, you can have times that are less good, but you feel much happier in the moment, and will back on it fondly as a result.\n\nPart of it being "good" is your mindset at the time.', '1. Depends on how and how much you do it. E.g., you could be gripping your dick to hard, making it less sensitive. On a positive note, masturbation and self-exploration in general are great tools to find out for yourself how you like to be stimulated, which you can then easier communicate to you partner.\n2. Everyone is different. Some people do, some don\'t. 6" is above average. What\'s most important for you: don\'t obsess about your size. It\'s neither a guarantee for good sex nor an absolute deal-breaker. Learn to love what you\'ve got.\nEdit: something that every girl will like is a clean dick. Wash it. If you\'re uncut, pull the foreskin back and carefully clean it. Don\'t use aggressive soap.\n3. Again, everyone is different. Depends on what you and/or your partner likes. You want to finish fast? Great go ahead. Want to make your partner cum first, maybe not even come yourself at all, so that your orgasm in an hour or two when you\'re having a second round is even better? You do you. Spread your sex session over an entire evening with several breaks in between? If you\'ve got the stamina for it, enjoy it! \nThere\'s no right or wrong when it comes to how sex works. The only important thing is that all parties involved are doing so voluntarily, are enjoying themselves and things stop or change immediately if someone feels uncomfortable.\n4. No. Just trimming my pubes.']
Cluster 3
['My god yes. iPad and folding. Or iPad and washing dishes. Like the best few minutes of every day.', 'Nihilistic', "You always have tomorrow to kill yourself\n\nYou don't get a do-over, the option to take it back, to make different choices, or to seek redress/forgiveness for those poor decisions made.\n\nYou don't get to spend time with the things/places/people that make life a little less shitty. You give up on the hope that things could get better.\n\nYou make a hole in the hearts of those that love and care for you. A hurt that never heals or goes away, and can leave more scars on them - a cycle of trauma and depression that may not end.\n\nAnd because you're depressed, there's probably a list of tasks that you just don't have the energy to get to, like dishes/laundry/dusting. What's another task on that list?\n\nIf you put suicide off until tomorrow, you always have a tomorrow. And maybe, just maybe, tomorrow will be different from today", 'I’m a lady and the r/askmen subreddit is so much better than r/askwomen. That one is way too heavily moderated. Posts get removed and tons of comments are deleted in almost every thread that gets traction. Anyway, I’m 39 and I’m not very surprised by these answers, but I would have been 20 years ago. Men are people and are stereotyped just like women are. I’m glad to see there’s a spectrum.', 'For the Damaged Coda\nSong by Blonde Redhead', "You are correct that you enjoy it now but what im saying is you dont see how trully enjoyable it is to be young, once you grow up you find out that all the things you've been doing are infinitely harder to do and enjoy", 'Oh my goodness! I’m so sorry you experienced that. My 7th dose was bumped up 10mg and I did think I was dying. I remember crying a lot and being worried about my kids. Straight up felt like a near death experience but I did feel much better in the following days. 
I try to just remind myself during the sessions that what I experience during the session isn’t what I’m there for. I try to focus on the repairing that it’s doing to the neurons in my brain. I’m going in for my 8th visit tomorrow. It’s been 4 weeks since my last and I could tell last week that it was time. It’s the only thing that’s worked for me.', 'If you ever want to impress a woman just make her a scarf if you get to that level or like a shirt or even something really simple', 'Thank you for sharing this.', "Yep science is tough, really tough. You really have to love it, I mean I do but there are/were periods where it's really tough. Actually working on your own projects helps a lot though, I feel like that's when I can really get passionate about it. Even the most annoying problems can be extremely interesting when you get to the heart of it, I love it.", 'Very well written. I started reading and before I knew it I was standing in the middle of the kitchen on my phone for 10 minutes.', "This is so wild to me cos I dont cuddle people unless they're my partner. I just don't really like physical touch all that much in general. People are so diverse", 'Me whenever I finish a jar of Nutella.', 'Oily penis', "I so badly want to leave my gross, fragile body. Not that mine is any more disgusting or weak than average. Just the entire concept of our bodies and how vulnerable they are is so fucked up. And you can't get a new one if yours sucks or something breaks. You just have to suffer for the rest of your life."]
