Aspect | Bag of Words (BoW) | Word Embedding |
---|---|---|
Definition | A model that represents text as an unordered collection of words, based on word frequency. | A representation of words as low-dimensional vectors that reflect each word's semantic meaning. |
Representation | Each word is represented by its frequency, or by whether it occurs in the document (for example, as 0 or 1). | Each word is represented by a numeric vector in a low-dimensional space that captures the word's semantic context. |
Context and Order | Ignores word order and considers only whether words occur in the document. | Considers the context of a word within the ...
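To make the "ignores word order" row concrete, here is a minimal bag-of-words sketch using only the Python standard library. The `bag_of_words` helper, the whitespace tokenization, and the sample sentences are assumptions chosen for illustration; none of them come from the table itself.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for `text`, ignoring word order (hypothetical helper)."""
    # Naive tokenization: lowercase, then split on whitespace (an assumption for brevity).
    return Counter(text.lower().split())


a = bag_of_words("kopi manis enak")
b = bag_of_words("enak manis kopi")

# The same words in a different order produce identical representations.
print(a == b)
# Frequencies are kept, positions are not.
print(bag_of_words("kopi kopi manis")["kopi"])
```

A word embedding, by contrast, would map each word to a dense vector, which this sketch does not attempt to show.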
def convertDate(text):
    import re
    pattern = r'\b(\d{1,2})\s+(jan|january|januari|feb|februari|february|mar|march|maret|apr|april|may|mei|jun|juni|jul|juli|july|aug|august|agustus|sep|september|sept|okt|oct|oktober|october|nov|november|dec|des|desember|december)\s+(\d{4})\b'
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    if len(matches) > 0:
        day = matches[0][0]
        month = matches[0][1].lower()  # normalize to lowercase
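The snippet above is cut off after the month is normalized, so the rest of the function is unknown. As one possible way to finish the idea, the sketch below maps the matched month name to a number and returns an ISO-style `YYYY-MM-DD` string. The `MONTHS` table, the `convert_date` name, and the output format are all assumptions, not the original author's code. It does not check that the day is valid for the month.

```python
import re

# Hypothetical lookup covering the English and Indonesian spellings used in the pattern above.
MONTHS = {
    "jan": 1, "january": 1, "januari": 1,
    "feb": 2, "february": 2, "februari": 2,
    "mar": 3, "march": 3, "maret": 3,
    "apr": 4, "april": 4,
    "may": 5, "mei": 5,
    "jun": 6, "juni": 6,
    "jul": 7, "july": 7, "juli": 7,
    "aug": 8, "august": 8, "agustus": 8,
    "sep": 9, "sept": 9, "september": 9,
    "oct": 10, "okt": 10, "october": 10, "oktober": 10,
    "nov": 11, "november": 11,
    "dec": 12, "des": 12, "december": 12, "desember": 12,
}

# Build the alternation from the table, longest names first.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+("
    + "|".join(sorted(MONTHS, key=len, reverse=True))
    + r")\s+(\d{4})\b",
    flags=re.IGNORECASE,
)


def convert_date(text):
    """Return the first 'day month year' date in `text` as 'YYYY-MM-DD', or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    return f"{int(year):04d}-{MONTHS[month.lower()]:02d}-{int(day):02d}"


print(convert_date("Rapat 5 Agustus 2023 di Jakarta"))
print(convert_date("tidak ada tanggal"))
```

Deriving the regex from the lookup table keeps the accepted month names and their numeric values in one place.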
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "kopi manis",
            "fields": [
              "deskripsi^3", // Boost of 3 for the deskripsi field
{
  "bool": {
    "must": [
      { "match": { "judul": "kopi" } }
    ],
    "should": [
      { "match": { "deskripsi": "manis" } },
      { "match": { "warna": "coklat" } }
    ],
    "must_not": [
import numpy as np

# Summary statistics of 'total' per month ('bulan')
grouped_df2 = grouped_df.groupby('bulan')['total'].agg(
    Mean='mean',                    # mean
    Q1=lambda x: x.quantile(0.25),  # first quartile (Q1)
    Q3=lambda x: x.quantile(0.75),  # third quartile (Q3)
    Min='min',                      # minimum value
    Max='max'                       # maximum value
).reset_index()
# Round every statistic down to the nearest integer
grouped_df2[['Mean', 'Q1', 'Q3', 'Min', 'Max']] = np.floor(grouped_df2[['Mean', 'Q1', 'Q3', 'Min', 'Max']])
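For readers who want to check the numbers by hand, the same per-month summary can be sketched with the standard library instead of pandas. This is a swapped-in technique, not the snippet's method. The `rows` data is invented for illustration, and `statistics.quantiles(..., method="inclusive")` is used on the assumption that it corresponds to the linear interpolation pandas applies by default. On most Python versions it needs at least two values per group.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical (bulan, total) rows; the real grouped_df is not shown in the snippet.
rows = [(1, 10), (1, 20), (1, 30), (1, 40), (2, 5), (2, 8)]

# Group the totals by month.
totals_by_month = defaultdict(list)
for bulan, total in rows:
    totals_by_month[bulan].append(total)

summary = {}
for bulan, totals in sorted(totals_by_month.items()):
    # n=4 yields the three quartile cut points; "inclusive" interpolates
    # linearly between the observed minimum and maximum.
    q1, _median, q3 = statistics.quantiles(totals, n=4, method="inclusive")
    summary[bulan] = {
        "Mean": math.floor(statistics.mean(totals)),
        "Q1": math.floor(q1),
        "Q3": math.floor(q3),
        "Min": min(totals),
        "Max": max(totals),
    }

for bulan, stats in summary.items():
    print(bulan, stats)
```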
def updateData(df):
    # Column order must match the placeholders: value, updated_at, relid
    data = df.loc[:, ['profesi_master', 'updated_at', 'userid']].values.tolist()
    sql_update = """
        UPDATE tblcustomfieldsvalues
        SET value = %s, updated_at = %s
        WHERE relid = %s
    """
    mycursor1 = connWarehouse.cursor()
    mycursor1.executemany(sql_update, data)
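The batch-update pattern above can be tried end to end with an in-memory SQLite database. This is only a sketch under stated assumptions: `sqlite3` stands in for the unseen `connWarehouse` driver, it uses `?` placeholders where the original uses `%s`, and the table layout and sample rows are invented. The original snippet is truncated, so whether it commits afterwards is unknown; the explicit `commit()` here is an addition.

```python
import sqlite3

# In-memory stand-in for the warehouse connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblcustomfieldsvalues (relid INTEGER, value TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO tblcustomfieldsvalues (relid, value, updated_at) VALUES (?, ?, ?)",
    [(1, "old", "2024-01-01"), (2, "old", "2024-01-01")],
)

# Each row follows the placeholder order: value, updated_at, relid.
data = [("dokter", "2024-02-01", 1), ("guru", "2024-02-02", 2)]
sql_update = """
    UPDATE tblcustomfieldsvalues
    SET value = ?, updated_at = ?
    WHERE relid = ?
"""
cursor = conn.cursor()
cursor.executemany(sql_update, data)
conn.commit()  # make the batch update durable

result = conn.execute(
    "SELECT relid, value, updated_at FROM tblcustomfieldsvalues ORDER BY relid"
).fetchall()
print(result)
```

Passing the values as parameters, rather than formatting them into the SQL string, leaves quoting and escaping to the driver.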
.replace(/"/g,'\\"').replace(/'/g,"\\'") |
import pymongo
import json
from datetime import datetime, timedelta, date, timezone
from pymongo import MongoClient
import datetime as dd

# 1. Create Connection
client = MongoClient(
    host='123.456.789',
    port=27017, #
# Import the clean function
from cleantext import clean

# Sample string containing emojis
text = "This sample text contains laughing emojis 😀 😃 😄 😁 😆 😅 😂 🤣"

# Print the text with the emojis removed
print(clean(text, no_emoji=True))
# Equivalent of SQL CONCAT_WS: join each group's text values with commas
df.groupby(['name', 'month'])['text'].apply(','.join).reset_index()