Performance Comparison of Bagging and Boosting Algorithms for Predicting PM10 Concentration in North Jakarta

Elita Rizkiani Putri(1*), Dede Brahma Arianto(2)
(1) Environmental Health Study Program, Faculty of Public Health, Universitas Indonesia
(2) Master of Informatics, Faculty of Industrial Technology, Universitas Islam Indonesia
(*) Corresponding Author



Abstract


North Jakarta is one of the areas of DKI Jakarta where the number of days with air quality in the unhealthy category increased, from 21 days in 2017 to 117 days in 2018, before falling to 45 days in 2019. The unhealthy category is driven by air pollution, and one of the pollutants present in the air is PM10. Air quality can now be predicted with machine learning algorithms. Two well-known approaches are Bagging and Boosting, both of which are ensemble methods. Random Forest is an example of a Bagging algorithm, while CatBoost and XGBoost are examples of Boosting algorithms. This study compares the performance of a Bagging algorithm (Random Forest) and two Boosting algorithms (CatBoost and XGBoost) in predicting PM10 concentration in North Jakarta. The data are daily records from 2017 to 2019 of meteorological factors and other pollutants in the area. Meteorological factors are included because they can influence the concentration and formation of pollutants, and pollutant factors are included because several previous studies used them to predict PM10 concentration. The study consisted of a literature review, data acquisition, data preprocessing, and data modeling. Several evaluation metrics were used to assess the models. Based on the modeling results, Random Forest achieved a higher coefficient of determination on the testing data (R2 = 0.6424) than XGBoost (R2 = 0.6340) and CatBoost (R2 = 0.6294).
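The difference between the two ensemble families, and the R2 metric used to compare them, can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses synthetic single-feature data and one-split regression stumps instead of Random Forest, CatBoost, or XGBoost, and every function name, the data-generating rule, and the settings (25 bagged stumps, 50 boosted stumps, learning rate 0.3) are assumptions chosen only for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for thr in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # all x values identical: predict the mean everywhere
        mean_y = sum(ys) / len(ys)
        return (xs[0], mean_y, mean_y)
    return best[1:]


def predict_stump(stump, x):
    thr, left_mean, right_mean = stump
    return left_mean if x <= thr else right_mean


def fit_bagging(xs, ys, n_models, rng):
    """Bagging: train each stump independently on a bootstrap resample."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Average the predictions of all bagged stumps."""
    return sum(predict_stump(m, x) for m in models) / len(models)


def fit_boosting(xs, ys, n_models, lr):
    """Boosting: train each stump on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_models):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps


def predict_boosting(model, x, lr):
    """Add the scaled corrections of all boosted stumps to the base value."""
    base, stumps = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)


# Synthetic data (an assumption, not the study's PM10 dataset).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0, 1) for x in xs]
x_train, y_train = xs[:150], ys[:150]
x_test, y_test = xs[150:], ys[150:]

bag = fit_bagging(x_train, y_train, n_models=25, rng=rng)
boost = fit_boosting(x_train, y_train, n_models=50, lr=0.3)

r2_bag = r2_score(y_test, [predict_bagging(bag, x) for x in x_test])
r2_boost = r2_score(y_test, [predict_boosting(boost, x, lr=0.3) for x in x_test])
print(f"Bagging  R2 on test data: {r2_bag:.4f}")
print(f"Boosting R2 on test data: {r2_boost:.4f}")
```

The ranking produced by this toy setup says nothing about the ranking reported in the abstract; it only shows that bagging averages independently trained models while boosting fits each new model to the remaining error, and that both can be scored on held-out data with the same R2 formula.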

Keywords


PM10, Meteorological factors, Random Forest, CatBoost, XGBoost





Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 

Editorial Address:
Department of Information Systems, Faculty of Information Technology
Universitas Andalas
Kampus Limau Manis, Padang 25163, West Sumatra

email: teknosi@fti.unand.ac.id

This work by JSI-Unand is licensed under a CC BY-SA 4.0 International License.