category | avg(price) | avg(rating) |
---|---|---|
beauty | 12.45 | 4.2 |
smartphones | 489.99 | 4.5 |
laptops | 999.99 | 4.4 |
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
from pyspark.sql import SparkSession
import requests
import pandas as pd
# ------------------------------
# Function that runs the ETL in Spark
# ------------------------------
from pyspark.sql import SparkSession
import requests
import pandas as pd
# 1. Create the Spark session
spark = SparkSession.builder \
    .appName("ETL_Produtos_DummyJSON") \
    .getOrCreate()
# =======================
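The preview cuts off right after the Spark session is created. As a rough illustration of the aggregation behind the `category | avg(price) | avg(rating)` table, the sketch below computes per-category averages in plain Python so that it runs without a Spark installation. The `average_by_category` helper and the sample records are hypothetical and are not taken from the gist. In PySpark, a comparable step would typically be `df.groupBy("category").avg("price", "rating")`, which also produces the `avg(price)` and `avg(rating)` column names.

```python
from collections import defaultdict
from statistics import mean


def average_by_category(products):
    """Return {category: {"avg(price)": ..., "avg(rating)": ...}}."""
    grouped = defaultdict(list)
    for product in products:
        grouped[product["category"]].append(product)
    return {
        category: {
            "avg(price)": round(mean(p["price"] for p in items), 2),
            "avg(rating)": round(mean(p["rating"] for p in items), 2),
        }
        for category, items in grouped.items()
    }


# Hypothetical sample records with the fields used by the aggregation.
sample_products = [
    {"category": "beauty", "price": 10.0, "rating": 4.0},
    {"category": "beauty", "price": 20.0, "rating": 5.0},
    {"category": "laptops", "price": 1000.0, "rating": 4.5},
]

averages = average_by_category(sample_products)
print(averages["beauty"])
```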
# ============================================================
# DAG: pipeline_produtos (ETL with DummyJSON → SQLite)
# Goal: demonstrate, in a simple way, the orchestration
# of an ETL pipeline in Airflow (Extract → Transform → Load)
# ============================================================
# Import the DAG class and the Python function operator from Airflow
from airflow import DAG
from airflow.operators.python import PythonOperator
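The preview ends at the imports, so the task functions themselves are not visible. The sketch below shows one way the Extract → Transform → Load steps named in the header comment could look as plain Python functions writing to SQLite. Everything here is an assumption for illustration: the function names, the `produtos` table and its columns, and the inline records that stand in for the HTTP call. The Airflow wiring is left out so the example runs on the standard library alone.

```python
import sqlite3


def extract():
    """Stand-in for the HTTP call; returns hypothetical product records."""
    return [
        {"id": 1, "title": "Sample A", "category": "beauty", "price": 10.0},
        {"id": 2, "title": "Sample B", "category": "laptops", "price": 1000.0},
    ]


def transform(records):
    """Keep only the columns the target table needs, as tuples."""
    return [(r["id"], r["title"], r["category"], r["price"]) for r in records]


def load(rows, connection):
    """Create the table if it does not exist, then insert the rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS produtos "
        "(id INTEGER PRIMARY KEY, title TEXT, category TEXT, price REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO produtos VALUES (?, ?, ?, ?)", rows
    )
    connection.commit()


# Run the three steps in the order a DAG would schedule them.
connection = sqlite3.connect(":memory:")  # in-memory database for the sketch
load(transform(extract()), connection)
row_count = connection.execute("SELECT COUNT(*) FROM produtos").fetchone()[0]
print(row_count)
```

In Airflow, each function would typically be wrapped in its own `PythonOperator` and the tasks chained with `>>`; data passed between tasks would then need to go through XCom or an intermediate file rather than direct function calls.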
Features | R | Python |
---|---|---|
Best for Statistics | ✅ | |
Machine Learning | | ✅ |
Ease of Use | | ✅ |
Data Visualization | ✅ | |
Big Data Performance | | ✅ |
import requests
from bs4 import BeautifulSoup
# URL we are going to access
url = "http://books.toscrape.com/"
# HTTP request
response = requests.get(url)
# Check that everything went well
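The preview stops at the status check. A minimal sketch of the parsing step that would usually follow is shown below. It uses the standard library's `html.parser` in place of BeautifulSoup so that it can run offline, and it parses a hypothetical inline snippet (`SAMPLE_HTML`) rather than the live page. The `TitleCollector` class and the markup are assumptions for illustration, and the real page's structure may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page's structure may differ.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="book-1.html" title="First Sample Book">First Sample...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="book-2.html" title="Second Sample Book">Second Sample...</a></h3>
</article>
"""


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <a> nested inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.inside_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.inside_h3 = True
        elif tag == "a" and self.inside_h3:
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.inside_h3 = False


collector = TitleCollector()
collector.feed(SAMPLE_HTML)
titles = collector.titles
print(titles)
```

With a live response, one option is to call `response.raise_for_status()` first and then pass `response.text` to `collector.feed`.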
import pandas as pd
import requests
# Start of extraction
# Extract the data
response = requests.get("https://jsonplaceholder.typicode.com/posts")
data = response.json()
# Use pandas to process a larger volume of data
df = pd.DataFrame(data)
Region/City | Average Monthly Salary (R$) | Source |
---|---|---|
Brazil (overall) | 13.392,93 | Salario.com.br |
São Paulo (SP) | 16.453,60 | Salario.com.br |
Espírito Santo | 12.836,35 | Quero Bolsa |
Rio de Janeiro (RJ) | 9.768 | Salario.com.br |
Minas Gerais (MG) | 12.119,48 | [Salario.com.br](https://www.salario.com. |
Feature | R | Python |
---|---|---|
Best for Statistics | ✅ | |
Machine Learning | | ✅ |
Ease of Use | | ✅ |
Data Visualization | ✅ | |
Big Data Performance | | ✅ |