Takuya Kitazawa takuti

@xerial
xerial / td-spark-usage.md
Last active Jan 29, 2019
td-spark usage notes

td-spark usage notes

What You Can Do With td-spark

  • Reading and writing TD tables through Spark DataFrames.
  • Running Spark SQL queries against DataFrames.
  • Submitting Presto SQL queries to TD and reading the query results as a DataFrame.
  • If you use PySpark, you can use both Spark DataFrames and Pandas DataFrames interchangeably.
@zyfnhct
zyfnhct / draft.md
Created Sep 27, 2018 — forked from kumarbhrgv/draft.md
Diversity in Recommendation Systems

Improvising diversity of personalized recommendation systems

Recent Research papers:

  • Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques:

    107 Citations : IEEE Transactions on Knowledge and Data Engineering
    We introduce and explore a number of item ranking techniques that can generate substantially more diverse recommendations across all users while maintaining comparable levels of recommendation accuracy. Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several real-world rating data sets and different rating prediction algorithms.

  • Recommendation Diversification Using Explanations: (Data Engineering, 2009. ICDE '09. IEEE 25th International Conference)

Traditionally, the problem is addressed through attribute-based diversification grouping items in the result set that share many common attributes (e.g., genre for movies) and selecting only a limited number of items from each group. It is, however,

@kumarbhrgv
kumarbhrgv / draft.md
Last active Feb 1, 2022
Diversity in Recommendation Systems

Improvising diversity of personalized recommendation systems

Recent Research papers:

  • Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques:

    107 Citations : IEEE Transactions on Knowledge and Data Engineering
    We introduce and explore a number of item ranking techniques that can generate substantially more diverse recommendations across all users while maintaining comparable levels of recommendation accuracy. Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several real-world rating data sets and different rating prediction algorithms.

  • Recommendation Diversification Using Explanations: (Data Engineering, 2009. ICDE '09. IEEE 25th International Conference)

Traditionally, the problem is addressed through attribute-based diversification grouping items in the result set that share many common attributes (e.g., genre for movies) and selecting only a limited number of items from each group. It is, however,

View auc.py
def auc(num_positives, num_negatives, predicted):
    l_sorted = sorted(range(len(predicted)), key=lambda i: predicted[i],
                      reverse=True)
    fp_cur = 0.0
    tp_cur = 0.0
    fp_prev = 0.0
    tp_prev = 0.0
    fp_sum = 0.0
    auc_tmp = 0.0
    last_score = float("nan")
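
The preview above is cut off before the main loop, so the author's actual logic is not visible here. As a rough, self-contained sketch of how an AUC computation of this kind can work, the following sorts examples by descending score and accumulates trapezoids under the ROC curve, closing one only when the score changes so that ties share a single step. The function name `auc_sketch` and its `(labels, scores)` input format are assumptions for illustration, not the gist's signature.

```python
def auc_sketch(labels, scores):
    """Area under the ROC curve via the trapezoidal rule.

    labels: iterable of 0/1 ground-truth values (assumed input format)
    scores: iterable of predicted scores, higher means "more positive"
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    num_pos = sum(1 for _, y in pairs if y == 1)
    num_neg = len(pairs) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    tp = fp = 0.0            # running counts at the current threshold
    tp_prev = fp_prev = 0.0  # counts when the last trapezoid was closed
    area = 0.0
    last_score = float("nan")  # NaN never equals a real score

    for score, label in pairs:
        # Close a trapezoid only when the score changes, so tied
        # examples contribute one diagonal ROC segment.
        if score != last_score:
            area += (fp - fp_prev) * (tp + tp_prev) / 2.0
            tp_prev, fp_prev = tp, fp
            last_score = score
        if label == 1:
            tp += 1.0
        else:
            fp += 1.0

    # Final trapezoid up to the top-right corner of the ROC curve.
    area += (fp - fp_prev) * (tp + tp_prev) / 2.0
    return area / (num_pos * num_neg)
```

Dividing by `num_pos * num_neg` at the end normalizes the raw count-based area to the [0, 1] range, which avoids converting every step to rates inside the loop.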
@faulker
faulker / update_json.sql
Created Oct 27, 2016
Example of how to update a PostgreSQL JSON field using jsonb_set
update tests_summary_data set data = (jsonb_set(to_jsonb(data), '{misc,gap,pa}', '-1', false))::json where data->'misc'->'gap'->>'pa' = '0';
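
The statement above passes `false` as the fourth argument of `jsonb_set` (`create_missing`), so the value at the path `misc -> gap -> pa` is replaced only where that key already exists. As an illustration of that behavior only, here is a small Python sketch that mimics it on plain dictionaries. `jsonb_set_sketch` is a hypothetical helper, not a PostgreSQL or library function, and it handles object keys only, not array indexes.

```python
import copy


def jsonb_set_sketch(target, path, new_value, create_missing=True):
    """Return a copy of `target` with the value at `path` replaced.

    Mirrors the idea behind jsonb_set for nested dicts: if an intermediate
    key is missing, or the final key is missing and create_missing is
    False, the document comes back unchanged.
    """
    result = copy.deepcopy(target)  # never mutate the caller's document
    node = result
    for key in path[:-1]:
        if not isinstance(node, dict) or key not in node:
            return result  # an intermediate path element is absent
        node = node[key]
    last = path[-1]
    if isinstance(node, dict) and (last in node or create_missing):
        node[last] = new_value
    return result


row = {"misc": {"gap": {"pa": 0}}}
updated = jsonb_set_sketch(row, ["misc", "gap", "pa"], -1, create_missing=False)
untouched = jsonb_set_sketch({"misc": {}}, ["misc", "gap", "pa"], -1,
                             create_missing=False)
```

Here `updated` carries `-1` at the target path while `row` itself is left as it was, and `untouched` is returned without the path being created.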
@veselosky
veselosky / s3gzip.py
Last active Apr 19, 2022
How to store and retrieve gzip-compressed objects in AWS S3
# vim: set fileencoding=utf-8 :
#
# How to store and retrieve gzip-compressed objects in AWS S3
###########################################################################
#
# Copyright 2015 Vince Veselosky and contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@ganmacs
ganmacs / gist:a1c81c37fa023516d23e
Last active Aug 29, 2015
cloud create lxc an git repository
#!/bin/bash
# const
USAGE="[USAGE]
$ ./setup <username> <reponame> <ip>
[Requirement]
exec by root
[Example]
View movielens_20m.md

First of all, make sure that your Treasure Data cluster is HDP2, not CDH4. Matrix Factorization is supported only on the up-to-date HDP2 cluster. HDP2 is allocated to users who signed up for Treasure Data after Feb 2015; CDH4 is allocated to the others.

NOTE: If you get an error, please ask our customer support to move you to HDP2.

Data preparation

Download ml-20m.zip and unzip it.

@martindemello
martindemello / chain-of-responsibility.rb
Created Feb 20, 2015
chain of responsibility example in ruby
class PurchaseApprover
  # Implements the chain of responsibility pattern. Does not know anything
  # about the approval process, merely whether the current handler can approve
  # the request, or must pass it to a successor.
  attr_reader :successor

  def initialize successor
    @successor = successor
  end
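
The Ruby preview stops before any concrete handlers appear. To illustrate the pattern its comment describes (each handler either approves the request or passes it to its successor), here is a minimal Python sketch. The `Approver` class, the handler names, and the spending limits are all hypothetical and are not taken from the gist.

```python
class Approver:
    """One link in the chain: approves what it can, otherwise delegates."""

    def __init__(self, name, limit, successor=None):
        self.name = name
        self.limit = limit          # hypothetical spending limit for this handler
        self.successor = successor  # next handler in the chain, or None

    def approve(self, amount):
        # Handle the request locally when it is within this handler's limit.
        if amount <= self.limit:
            return f"{self.name} approved {amount}"
        # Otherwise pass it along; fail loudly if the chain is exhausted.
        if self.successor is None:
            raise ValueError(f"no handler can approve {amount}")
        return self.successor.approve(amount)


# Build the chain from the most senior handler down.
director = Approver("director", 10_000)
manager = Approver("manager", 1_000, successor=director)
```

Callers only ever talk to the first link (`manager.approve(...)`), which is the point of the pattern: the sender does not need to know which handler ends up approving the request.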
@sonots
sonots / fluentd_hacking_guide.md
Last active Aug 30, 2021
Fluentd Source Code Complete Guide (for v0.10)

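Fluentd Source Code Complete Guide

English title: Fluentd Hacking Guide

Table of Contents

Since we only have 30 minutes, the struck-through parts are skipped this time.

  • The Fluentd startup sequence and plugin loading
  • Parsing the Fluentd configuration file
  • How data flows from an Input Plugin to an Output Plugin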