Hanief haniefhan

@haniefhan
haniefhan / medium_8547b733ae6a_01_tabula_test.py
Last active August 27, 2021 23:15
Read PDF Document using tabula, page 8
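import tabula

pdf_path = "PMDN 72 TH 2019+lampiran.pdf"

# Read the table(s) on page 8; output_format="json" returns a list of
# table dicts (with cell coordinates) instead of pandas DataFrames
output = tabula.read_pdf(
    input_path=pdf_path,
    pages="8",
    output_format="json",
)
print(output)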
haniefhan / medium_8547b733ae6a_02_tabula_test_result.json
Last active August 27, 2021 23:08
Result of read PDF Document using tabula, page 8
D:\python\venv\indonesia_region\Scripts\python.exe D:/python/project/indonesia_region/main.py
[{'extraction_method': 'lattice', 'top': 96.0, 'left': 37.350002, 'width': 879.91845703125, 'height': 470.70013427734375, 'right': 917.26843, 'bottom': 566.70013, 'data': [[{'top': 96.0, 'left': 37.350002, 'width': 52.64999771118164, 'height': 48.0, 'text': 'K O D E'}, {'top': 96.0, 'left': 90.0, 'width': 144.0, 'height': 48.0, 'text': 'NAMA PROVINSI /\rKABUPATEN / KOTA'}, {'top': 96.0, 'left': 234.0, 'width': 60.0, 'height': 24.0, 'text': 'JUMLAH'}, {'top': 96.0, 'left': 294.0, 'width': 306.0, 'height': 24.0, 'text': 'N A M A/J U M L A H'}, {'top': 96.0, 'left': 600.0, 'width': 54.0, 'height': 48.0, 'text': 'LUAS\rWILAYAH\r(Km2)'}, {'top': 96.0, 'left': 654.0, 'width': 53.45452880859375, 'height': 48.0, 'text': 'JUMLAH\rPENDUDUK\r(Jiwa)'}, {'top': 96.0, 'left': 707.4545, 'width': 209.81390380859375, 'height': 48.0, 'text': 'K E T E R A N G A N'}, {'top': 0.0, 'left': 0.0, 'width': 0.0, 'height': 0.0, 'text': ''}, {'
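With output_format="json", each table in the output above is a dict whose data key holds a list of rows, and each row is a list of cell dicts with a text key; line breaks inside a cell show up as \r. A minimal sketch for flattening that into plain strings, assuming every table follows the same shape (the sample is trimmed from the output above, and table_rows is a hypothetical helper name):

```python
def table_rows(tables):
    """Yield each row of every table as a list of cleaned cell strings."""
    for table in tables:
        for row in table.get("data", []):
            # collapse "\r" and repeated whitespace inside a cell into single spaces
            yield [" ".join(cell.get("text", "").split()) for cell in row]


# Trimmed from the page 8 output above
sample = [{
    "extraction_method": "lattice",
    "data": [[
        {"text": "K O D E"},
        {"text": "NAMA PROVINSI /\rKABUPATEN / KOTA"},
        {"text": "JUMLAH"},
        {"text": ""},  # the output above also contains zero-size cells with empty text
    ]],
}]

for row in table_rows(sample):
    print(row)
# ['K O D E', 'NAMA PROVINSI / KABUPATEN / KOTA', 'JUMLAH', '']
```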
haniefhan / medium_8547b733ae6a_03_tabula_test.py
Created August 27, 2021 23:13
Read PDF Document using tabula into csv, page 8
import tabula
pdf_path = "PMDN 72 TH 2019+lampiran.pdf"
tabula.convert_into(
input_path=pdf_path,
output_path="aceh_page_8.csv",
output_format="csv",
pages="8"
)
haniefhan / medium_8547b733ae6a_04_tabula_test_result.csv
Created August 28, 2021 03:37
The contents of file aceh_page_8.csv
K O D E,"NAMA PROVINSI /
KABUPATEN / KOTA",JUMLAH,N A M A/J U M L A H,"LUAS
WILAYAH
(Km2)","JUMLAH
PENDUDUK
(Jiwa)",K E T E R A N G A N,,,
"",KAB,KOTA,KECAMATAN,KELURAHAN,D E S A,,,,
11,ACEH,,,,,,,,"UU. No. 11 Tahun 2006
Perbaikan nama sesuai Surat Pemkab Aceh Selatan
No.140/819/2016 tgl 14 okt 2016 dan Rekomedasi Ditjen Bina
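Several header cells in this CSV span multiple lines inside double quotes. Python's csv module keeps such quoted line breaks inside a single field, so the file can be read without preprocessing. A minimal sketch, using the two complete header rows shown above as an inline sample and collapsing each header cell onto one line (the cleanup step is an assumption about how the headers might be used, not part of the original workflow):

```python
import csv
import io

# The two complete header rows of aceh_page_8.csv as shown above
raw = (
    'K O D E,"NAMA PROVINSI /\nKABUPATEN / KOTA",JUMLAH,N A M A/J U M L A H,"LUAS\n'
    'WILAYAH\n(Km2)","JUMLAH\nPENDUDUK\n(Jiwa)",K E T E R A N G A N,,,\n'
    '"",KAB,KOTA,KECAMATAN,KELURAHAN,D E S A,,,,\n'
)

# newline="" disables newline translation, as the csv docs recommend;
# quoted line breaks then stay inside their field
rows = list(csv.reader(io.StringIO(raw, newline="")))

# join each multi-line header cell into a single line
header = [" ".join(cell.split()) for cell in rows[0]]
print(header)
```

For the real file, the same applies when opening it with open("aceh_page_8.csv", newline="", encoding="utf-8").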
haniefhan / medium_8547b733ae6a_05_get_aceh_province_region_data.py
Created August 28, 2021 03:52
Read PDF Document using tabula, page 8 - 274
import tabula
pdf_path = "PMDN 72 TH 2019+lampiran.pdf"
tabula.convert_into(
input_path=pdf_path,
output_path="aceh.csv",
output_format="csv",
pages="8-274"
)
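The range "8-274" covers 267 pages. To sanity-check a page specification before a long extraction run, a small hypothetical helper (not part of tabula-py) can expand it into page numbers. This sketch only handles single numbers, ascending ranges and comma-separated combinations of both; "all" is left out because it depends on the document's page count:

```python
def expand_pages(pages):
    """Expand a page spec such as "8", "8-274" or "1,3-5" into a list of ints."""
    result = []
    for part in pages.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            result.extend(range(start, end + 1))  # range end is exclusive, so +1
        else:
            result.append(int(part))
    return result


pages = expand_pages("8-274")
print(len(pages), pages[0], pages[-1])  # 267 8 274
```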
haniefhan / medium_8547b733ae6a_06_aceh.csv
Created August 28, 2021 03:56
The contents of file aceh.csv
K O D E,"NAMA PROVINSI /
KABUPATEN / KOTA",JUMLAH,N A M A/J U M L A H,"LUAS
WILAYAH
(Km2)","JUMLAH
PENDUDUK
(Jiwa)",K E T E R A N G A N,,,
"",KAB,KOTA,KECAMATAN,KELURAHAN,D E S A,,,,
11,ACEH,,,,,,,,"UU. No. 11 Tahun 2006
Perbaikan nama sesuai Surat Pemkab Aceh Selatan
No.140/819/2016 tgl 14 okt 2016 dan Rekomedasi Ditjen Bina
haniefhan / medium_8547b733ae6a_07_get_all_province_region_data.py
Last active August 28, 2021 11:01
Read All Region PDF Document using tabula
import tabula
pdf_path = "PMDN 72 TH 2019+lampiran.pdf"
output_folder = "csv/"
region_list = [
# sumatera
{"file": output_folder + "11-aceh.csv", "pages": "8-274"},
{"file": output_folder + "12-sumut.csv", "pages": "287-582"},
{"file": output_folder + "13-sumbar.csv", "pages": "601-653"},
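Only the first three entries of region_list are visible above, and the loop that consumes it is cut off. Assuming each entry is passed to tabula.convert_into the same way as in the single-province script, the loop might look like this sketch. export_regions and its convert parameter are hypothetical names, introduced so the loop can be exercised without tabula-py or a Java runtime:

```python
import os


def export_regions(pdf_path, region_list, convert=None):
    """Export each region's page range from the PDF into its own CSV file.

    `convert` defaults to tabula.convert_into (requires tabula-py and Java);
    any callable accepting the same keyword arguments can be passed instead.
    """
    if convert is None:
        import tabula  # imported lazily so a stand-in works without tabula-py
        convert = tabula.convert_into
    for region in region_list:
        # create the output folder (e.g. "csv/") if it does not exist yet
        folder = os.path.dirname(region["file"])
        if folder:
            os.makedirs(folder, exist_ok=True)
        convert(
            input_path=pdf_path,
            output_path=region["file"],
            output_format="csv",
            pages=region["pages"],
        )
```

Calling export_regions(pdf_path, region_list) runs the real conversion; passing convert=lambda **kw: print(kw) prints each call instead (the output folders are still created).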
import csv
import json
province = [] # provinsi
regency_or_city = [] # kabupaten or kota
district = [] # kecamatan
administrative_village_or_village = [] # kelurahan or desa
# check if string has a number
def hasANumber(inputString):
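The body of hasANumber is cut off above. One plausible version follows, together with a hypothetical code_level helper that decides which of the four lists a code belongs to by counting its dot-separated parts. The "11" and "11.01" patterns come from the parsed output; treating three parts as a kecamatan and four as a kelurahan/desa follows the usual Kemendagri code layout and is an assumption here:

```python
def hasANumber(inputString):
    """Return True if the string contains at least one digit."""
    return any(char.isdigit() for char in inputString)


# number of dot-separated parts -> target list (hypothetical mapping)
LEVELS = {
    1: "province",                           # e.g. "11"
    2: "regency_or_city",                    # e.g. "11.01"
    3: "district",                           # kecamatan, assumed "11.01.01"
    4: "administrative_village_or_village",  # kelurahan/desa, assumed "11.01.01.2001"
}


def code_level(code):
    """Classify a region code, or return None for non-code cells like "KAB"."""
    code = code.strip()
    if not hasANumber(code):
        return None
    parts = code.split(".")
    # reject cells such as "UU. No. 11 Tahun 2006" that merely contain digits
    if not all(part.isdigit() for part in parts):
        return None
    return LEVELS.get(len(parts))


print(code_level("11"), code_level("11.01"), code_level("KAB"))
# province regency_or_city None
```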
haniefhan / medium_8547b733ae6a_09_parse_aceh_result.json
Last active August 30, 2021 22:41
Result of parse Aceh CSV test 1
{
"province": [
{
"prov_code": "11",
"prov_name": "ACEH"
}
],
"regency_or_city": [
{
"kab_code": "11.01",
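Because each kab_code starts with its province code ("11.01" under "11"), the parsed result can be checked for regencies whose province is missing. A minimal sketch of such a consistency check; orphan_regencies is a hypothetical helper, and "99.01" is a made-up bad entry added only to show a failure:

```python
def orphan_regencies(result):
    """Return kab_code values whose province prefix is not in result["province"]."""
    prov_codes = {p["prov_code"] for p in result.get("province", [])}
    return [
        r["kab_code"]
        for r in result.get("regency_or_city", [])
        if r["kab_code"].split(".")[0] not in prov_codes
    ]


# "11" and "11.01" come from the parsed output above; "99.01" is made up
result = {
    "province": [{"prov_code": "11", "prov_name": "ACEH"}],
    "regency_or_city": [{"kab_code": "11.01"}, {"kab_code": "99.01"}],
}
print(orphan_regencies(result))  # ['99.01']
```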
haniefhan / medium_8547b733ae6a_10_parse_aceh_result_province.json
Created August 31, 2021 01:15
Result of parse Aceh CSV test 1 - Province
[
{
"prov_code": "11",
"prov_name": "ACEH"
}
]
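The file above holds only the province list. Splitting the parsed result into one file per level could be sketched as follows; the write_level name, the output path and the two-space indent are assumptions, not taken from the original script:

```python
import json


def write_level(result, level, path, indent=2):
    """Write one level of the parsed result (e.g. "province") to its own JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII characters readable in the file
        json.dump(result[level], f, indent=indent, ensure_ascii=False)


result = {"province": [{"prov_code": "11", "prov_name": "ACEH"}]}
print(json.dumps(result["province"], indent=2))
```

With the full parse, calling write_level(result, "province", "province.json") once per key would produce one file per level.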