GitHub Gists of Divy Patel (divy9881)
divy9881 / Paper summary: SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics.md
Last active December 7, 2023 05:31
Paper summary: SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

Challenges

Training Data from Multiple Sources

  • Large-scale machine learning applications, such as federated learning, ingest huge amounts of training data from multiple sources. These sources cannot all be trusted, and running sanity checks on the data can be expensive, which opens up an opportunity for an adversary to corrupt it
  • This is especially true for more insidious attacks such as backdoor attacks, where the adversary embeds a backdoor in a prediction model so that its predictions are altered on specific test inputs that have been modified to activate the model in a certain manner. Because the model does not behave differently on clean samples, it can be deployed without anyone knowing about the backdoor
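The poisoning step described above can be sketched in a few lines. This is purely illustrative, assuming a pixel-patch trigger in the image corner and a fixed target label; the trigger pattern, patch size, and function name are assumptions, not details from the paper:

```python
def poison(image, label, target_label, trigger_value=1.0):
    """Stamp a 2x2 trigger patch (an assumed trigger, for illustration)
    in the bottom-right corner and relabel to the attacker's target class."""
    poisoned = [row[:] for row in image]  # copy so the clean image is untouched
    for r in (-2, -1):
        for c in (-2, -1):
            poisoned[r][c] = trigger_value
    return poisoned, target_label
```

An attacker would apply this to only a small fraction of the training set; because the trigger is a consistent, easily learnable feature, that fraction can be small.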

Small Data requirements to carry out Backdoor Attacks

  • Most research in this area has found that only a small amount of corrupted data is enough to plant a backdoor in a neural network
  • An experim
divy9881 / Paper Summary: Spectral Signatures in Backdoor Attacks.md
Last active December 3, 2023 23:30
Paper Summary: Spectral Signatures in Backdoor Attacks

Challenges

Sensitive Applications of Neural Networks

  • Neural networks are used widely today in areas such as computer vision, text analysis, and speech recognition, and their widespread use extends to very sensitive applications. Hence, the security of neural networks is becoming more important than ever before
  • Most prior work on securing neural networks has involved adversarial settings, in which the adversary perturbs test samples by an imperceptible amount in order to affect their classification under a deep neural network classifier

High Data Volume Requirements of Deep Neural Networks

  • Deep neural networks cannot achieve high classification accuracy without being fed huge amounts of data, which makes it difficult to examine the data before it is actually fed to the network for training
  • Also, this huge volume of data often comes from multiple sources, which increases the exposu
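The defense the title refers to can be sketched from the general spectral-signature idea: score every training sample by its squared projection onto the top singular direction of the centered feature matrix, then discard the top scorers. The sketch below is illustrative, not the paper's code (power iteration stands in for a real SVD routine, and the function name is mine):

```python
def spectral_scores(features):
    """Outlier score per sample: squared projection onto the top
    singular direction of the centered feature matrix."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    # Power iteration on X^T X to approximate the top singular direction.
    v = [1.0] * d
    for _ in range(100):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]  # X v
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) ** 2 for c in centered]
```

Poisoned samples sharing a common trigger tend to form a cluster whose direction dominates the spectrum, so their scores stand out.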
divy9881 / Lec-19-Offloading Distributed Applications onto SmartNICs using iPipe.md
Last active November 14, 2023 02:56
Paper Review: Offloading Distributed Applications onto SmartNICs using iPipe

Main Challenges addressed by the Paper

Increasing Network Bandwidth but Stagnating Compute Power

  • Over the last couple of years, network bandwidth has been increasing continuously while compute power has seen very little improvement. To mitigate this gap, multicore SoC (system-on-chip) SmartNICs are now being leveraged to offload some of the compute-heavy, network-related tasks from traditional CPUs

Domain-Specific Acceleration Approach

  • There have been recent research efforts that leverage FPGA-based SmartNICs for offloading network functions, but they take the conventional route of consolidating the application logic on FPGA programmable logic blocks
  • This approach applies only to a narrow class of workloads that possess parallelism, predictable program logic, and regular data structures that can be synthesized efficiently on FPGAs
  • The paper's focus is on leveraging such hardware-based acceleration for complex dat
divy9881 / Lec-18-The End of Myth: Distributed Transactions Can Scale.md
Last active November 13, 2023 05:24
Paper Review: The End of Myth: Distributed Transactions Can Scale

Main Challenges addressed by the Paper

Opaque Solutions to Avoid Distributed Transactions

  • The common perception about distributed transactions is that they do not scale. To work around this, some solutions avoid distributed transactions by employing locality-aware partitioning, speculative execution, or relaxed durability guarantees
  • The problem with these approaches is that they are not transparent to developers, who must not only understand the implications of these mechanisms but also design their applications to be able to leverage their benefits

Increase in Contention and CPU overhead due to TCP/IP

  • One of the reasons distributed transactions are thought unable to scale is increased contention on shared resources, such as the shared data structures that maintain the global state of transactions and support recovery
  • Contention can be thought of as a side-effect. But the ma
divy9881 / Lec-17-Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS.md
Last active November 2, 2023 00:23
Paper Review: Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS

Main Challenges addressed by the Paper

Limited GPU memory capacity

  • Wider adoption of GPUs for hardware acceleration is held back by their limited memory capacity. One approach to mitigate this is to leverage both the CPU and the GPU through heterogeneous query processing
  • But such a design comes at the cost of higher complexity: data placement and heterogeneous query execution across devices must be designed carefully to achieve optimal performance

Bottleneck due to high traffic in PCIe transport

  • One potential solution to the GPU's limited memory capacity is to transfer data to the GPU on demand over the PCIe link
  • This can work well, but under heavy load it adds significant data traffic on the PCIe link, which can become a performance bottleneck and degrade even highly optimized databases
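The on-demand transfer scheme can be modeled as a fixed-capacity device-side cache in which every miss stands in for a PCIe transfer. This is a toy sketch assuming a plain LRU eviction policy; the paper's actual placement is semantic-aware, and the class and names here are illustrative:

```python
from collections import OrderedDict

class DeviceCache:
    """Toy model: GPU memory holds at most `capacity` data blocks;
    each miss stands in for one PCIe transfer from host memory."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # block_id -> True, in LRU order
        self.transfers = 0

    def access(self, block_id):
        if block_id in self.resident:
            self.resident.move_to_end(block_id)  # hit: refresh LRU position
            return
        self.transfers += 1  # miss: simulated PCIe transfer
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict least recently used block
        self.resident[block_id] = True
```

Counting `transfers` under a query's block-access trace makes the bottleneck concrete: a working set larger than device memory keeps paying for PCIe traffic on every pass.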

Key Contributions of the Paper

Fine-grained Semantic-Aware Data Placement

divy9881 / Lec-16-Cloud-Native Transactions and Analytics in SingleStore.md
Last active November 1, 2023 19:32
Paper Review: Cloud-Native Transactions and Analytics in SingleStore

Main Challenges addressed by the Paper

Serving OLAP and OLTP on the Same Data

  • Modern workloads require low-latency, high-throughput point writes and reads, and at the same time efficient batch loading and range lookups over column values for complex analytical queries, all on the same data
  • One alternative is to employ different domain-specific databases over multiple copies of the data, but this incurs substantial costs: training developers to manage those databases, moving and transforming data, and storing multiple copies of the data

Cost-Performance Requirements Hard to Meet with Specialized DBs

  • Supporting low-latency writes and complex analytical queries on the same data, with end-to-end latencies of seconds to sub-seconds from ingesting new data to generating real-time insights, is very difficult to achieve with domain-specific databases
  • This led the authors to build a database that harnesses columnar stor
divy9881 / Lec-14-SkyPilot: An Intercloud Broker for Sky Computing.md
Last active October 30, 2023 17:08
Paper Review: SkyPilot: An Intercloud Broker for Sky Computing

Main Challenges addressed by the Paper

Incompatibility among Cloud Providers

  • The most fundamental problem in the cloud ecosystem is the high degree of incompatibility among cloud providers and the services they offer
  • The cloud's origins lie in the motivation to shift away from on-premise infrastructure rather than in building an interconnected communication infrastructure. As a result, cloud providers emphasize their differences from the services offered by other providers in order to demonstrate a competitive advantage

Data Gravity: High costs for data leaving from Cloud

  • Cloud providers charge customers more for data leaving than for data entering, which makes data sharing among different cloud providers costly and hence makes managing and deploying truly multi-cloud applications expensive
  • They also offer proprietary service interfaces, which leads to significant customer lock-in: it is hard for companies who have established their computational workloads
divy9881 / Lec-15-Self Driving Database Management Systems.md
Last active October 29, 2023 23:47
Paper Review: Self Driving Database Management Systems

Main Challenges addressed by the Paper

Reliance on Human Intervention

  • Previous work on auto-tuning has mostly required human intervention to make the final decisions about changes to the database. These tools are also reactionary in nature and cannot forecast the nature and intensity of future workloads
  • This makes tuning databases to meet cost-performance requirements slow and manual, which affects both the cost of managing databases and their performance while they are being tuned

Niche Auto-tuning

  • Existing auto-tuning solutions focus on specific aspects of the database; for example, some are concerned with its physical or logical design, such as indexes, partitioning schemes, data organization, or materialized views
  • Modern application requirements demand integrated components that support adaptive architectures

Key Contributions of the Paper

Self Driving Architecture

divy9881 / Lec-12-PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers.md
Last active October 21, 2023 17:54
Paper Review: PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers

Main Challenges addressed by the Paper

Shortcomings of Monolithic Architecture

  • In a monolithic architecture, where resources like CPU, memory, and storage are tightly coupled, platform owners have to make tough decisions when allocating resources to the various database instances
  • It is very difficult to attain high utilization of all resources in a cluster with this architecture. It also becomes difficult for customers to scale resources up or down based on load
  • A monolithic architecture also suffers from fate sharing, where the failure of one resource causes the failure of other resources and prevents them from recovering independently

Bundled Memory with Compute

  • Bundling memory and CPU in one machine makes it hard to scale memory resources flexibly. Disaggregating memory allows it to be shared among the primary and its read replicas, which can offload the read-only load from primary instances

Key Contributions of the Paper

Share

divy9881 / Lec-11-Starling: A Scalable Query Engine on Cloud Functions.md
Last active October 17, 2023 23:07
Paper Review: Starling: A Scalable Query Engine on Cloud Functions

Main Challenges addressed by the Paper

Complex Provisioning and Uneconomical Payback model

  • Cloud-based data analytics services, like Amazon Redshift and Azure SQL Data Warehouse, are elastic in terms of scaling but involve making complex provisioning decisions. Analytics workloads are largely unpredictable, and provisioning well can be difficult, which leads to over-provisioning in many cases
  • This can mean paying an excessive amount of money for an under-utilized analytics cluster. Cloud services that allow dynamic scaling of compute nodes take a long time to scale, which makes a pay-per-query model impractical for them

Expensive and Proprietary storage

  • Several cloud data management services require data to be stored on local disks in a proprietary format to perform well. This storage can be expensive in comparison with other cloud-based storage services like Amazon S3

Hurdles in current Serverless services

  • Most cloud function services have limited memory, execution