- Reading and writing Treasure Data (TD) tables as Spark DataFrames.
- Running Spark SQL queries against those DataFrames.
- Submitting Presto SQL queries to TD and reading the query results back as DataFrames.
- With PySpark, you can use Spark DataFrames and pandas DataFrames interchangeably.
-
107 Citations : IEEE Transactions on Knowledge and Data Engineering
We introduce and explore a number of item ranking techniques that can generate substantially more diverse recommendations across all users while maintaining comparable levels of recommendation accuracy. Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several real-world rating data sets and different rating prediction algorithms.
Traditionally, the problem is addressed through attribute-based diversification: grouping items in the result set that share many common attributes (e.g., genre for movies) and selecting only a limited number of items from each group. It is, however,
-
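The attribute-based diversification described above can be sketched as follows; this is a minimal illustration, assuming items carry a genre attribute and a predicted-rating score (all names are hypothetical, not from the paper):

```python
from itertools import groupby

def diversify(items, max_per_group):
    """Attribute-based diversification sketch.

    items: list of (item_id, genre, predicted_rating) tuples.
    Keeps at most max_per_group items from each genre group so that no
    single genre dominates the recommendation list.
    """
    # Group by the shared attribute (genre).
    by_genre = sorted(items, key=lambda x: x[1])
    kept = []
    for _, group in groupby(by_genre, key=lambda x: x[1]):
        # Within each group, keep only the top-scored items.
        ranked = sorted(group, key=lambda x: x[2], reverse=True)
        kept.extend(ranked[:max_per_group])
    # Final ranking of the diversified set by predicted rating.
    return sorted(kept, key=lambda x: x[2], reverse=True)

result = diversify(
    [("a", "drama", 4.5), ("b", "drama", 4.4), ("c", "comedy", 3.9)],
    max_per_group=1)
```

With `max_per_group=1`, only the best drama survives, so the lower-rated comedy enters the list: diversity rises at a small cost in predicted accuracy, which is the trade-off the abstract discusses.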
def auc(num_positives, num_negatives, predicted):
    # Trapezoidal ROC AUC. Everything after the original initialization is
    # a sketched completion, assuming `predicted` holds the positive
    # examples' scores first (indices < num_positives), then the negatives'.
    l_sorted = sorted(range(len(predicted)), key=lambda i: predicted[i],
                      reverse=True)
    fp_cur, tp_cur = 0.0, 0.0
    fp_prev, tp_prev = 0.0, 0.0
    fp_sum = float(num_positives) * num_negatives
    auc_tmp = 0.0
    last_score = float("nan")
    for i in l_sorted:
        if predicted[i] != last_score:  # close the trapezoid on score change
            auc_tmp += (fp_cur - fp_prev) * (tp_cur + tp_prev) / 2.0
            fp_prev, tp_prev = fp_cur, tp_cur
            last_score = predicted[i]
        if i < num_positives:
            tp_cur += 1.0
        else:
            fp_cur += 1.0
    auc_tmp += (fp_cur - fp_prev) * (tp_cur + tp_prev) / 2.0
    return auc_tmp / fp_sum
update tests_summary_data
set data = (jsonb_set(to_jsonb(data), '{misc,gap,pa}', '-1', false))::json
where data->'misc'->'gap'->>'pa' = '0';
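The `jsonb_set` call above rewrites one nested field of the JSON document; its effect on a single row's `data` value can be sketched in Python (a minimal illustration of the semantics, not the database code):

```python
import copy

def set_nested(doc, path, value):
    """Mimic jsonb_set(doc, path, value): return a copy of the JSON
    document with the field at `path` replaced (path keys must exist)."""
    out = copy.deepcopy(doc)
    node = out
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value
    return out

row = {"misc": {"gap": {"pa": "0"}}}
# Matches the WHERE clause: only rows whose misc.gap.pa is '0' are updated.
if row["misc"]["gap"]["pa"] == "0":
    row = set_nested(row, ["misc", "gap", "pa"], -1)
```

The fourth argument `false` in the SQL means missing keys are not created, which is why the sketch assumes the path already exists.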
# vim: set fileencoding=utf-8 :
#
# How to store and retrieve gzip-compressed objects in AWS S3
###########################################################################
#
# Copyright 2015 Vince Veselosky and contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
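A minimal sketch of the round trip the header describes. The bucket and key names are hypothetical, the boto3 calls are shown only in comments, and just the gzip round trip itself is exercised here:

```python
import gzip
import io

def compress_bytes(data):
    # gzip-compress a bytes payload into an in-memory buffer.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(data)
    return buf.getvalue()

def decompress_bytes(blob):
    # Inverse of compress_bytes.
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        return gz.read()

# Hypothetical S3 usage (requires boto3 and AWS credentials):
#   s3 = boto3.client("s3")
#   s3.put_object(Bucket="my-bucket", Key="doc.txt.gz",
#                 Body=compress_bytes(b"hello"),
#                 ContentEncoding="gzip", ContentType="text/plain")
#   obj = s3.get_object(Bucket="my-bucket", Key="doc.txt.gz")
#   text = decompress_bytes(obj["Body"].read())
```

Setting `ContentEncoding="gzip"` on upload lets HTTP clients that fetch the object directly decompress it transparently.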
#!/bin/bash
# const
USAGE="[USAGE]
$ ./setup <username> <reponame> <ip>
[Requirement]
exec by root
[Example]
First of all, make sure that your Treasure Data cluster is HDP2, not CDH4. Matrix Factorization is only supported on the up-to-date HDP2 cluster. HDP2 is allocated to users who signed up for Treasure Data after Feb 2015; CDH4 is allocated to the others.
NOTE: Please ask our customer support to enable HDP2 if you get an error.
Download ml-20m.zip and unzip it.
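The download-and-unzip step can be sketched in Python; the MovieLens URL below is an assumption inferred from the `ml-20m.zip` file name and should be checked before use:

```python
import io
import urllib.request
import zipfile

# Assumed URL for the MovieLens 20M dataset; verify before relying on it.
ML_20M_URL = "https://files.grouplens.org/datasets/movielens/ml-20m.zip"

def unzip_bytes(blob, dest="."):
    # Extract every member of an in-memory zip archive into dest.
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        zf.extractall(dest)
        return zf.namelist()

def download_and_unzip(url=ML_20M_URL, dest="."):
    # Fetch the archive and extract it into dest.
    with urllib.request.urlopen(url) as resp:
        return unzip_bytes(resp.read(), dest)
```

The archive is around 200 MB, so extracting from a streamed download (rather than loading it fully into memory) may be preferable for constrained machines.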
class PurchaseApprover
  # Implements the chain of responsibility pattern. Does not know anything
  # about the approval process, merely whether the current handler can approve
  # the request, or must pass it to a successor.
  attr_reader :successor

  def initialize(successor)
    @successor = successor
  end

  # Hypothetical completion: forward along the chain until a handler
  # (a subclass defining approve_request) approves the request.
  def process_request(request)
    return if approve_request(request)
    successor.process_request(request) if successor
  end
end