National Health Research Institutes (NHRI): Item 3990099045/16730

Please use this persistent URL to cite or link to this item: http://ir.nhri.org.tw/handle/3990099045/16730

Title:	Effects of feature selection methods in estimating SO2 concentration variations using machine learning and stacking ensemble approach
Authors:	Wong, PY; Zeng, YT; Su, HJ; Lung, SCC; Chen, YC; Chen, PC; Hsiao, TC; Adamkiewicz, G; Wu, CD
Contributor:	National Institute of Environmental Health Sciences
Abstract:	Statistical feature selection methods have been used for dimension reduction, but only a few studies have explored how the selected features affect machine learning models. This study investigates the effects of statistical and machine learning-based feature selection methods on spatial prediction models for estimating variations in SO2 concentrations. We collected daily SO2 observations from 1994 to 2018 together with predictor variables such as land-use/land-cover allocations, roads, landmarks, meteorological factors, and satellite images, yielding a total of 428 geographic predictors. Important features were identified using statistical feature selection methods (SelectKBest, stepwise feature selection, and elastic net) and a machine learning-based method (random forest). The features selected by each of the four methods were then used to fit machine learning algorithms, including gradient boosting, CatBoost, XGBoost, and a stacking ensemble, to build prediction models for SO2 concentrations. SHapley Additive exPlanations (SHAP) was applied to explain the contribution of each selected feature to the model's predictions. The results showed that the stacking ensemble model outperformed the three single machine learning algorithms. Among the four feature selection methods, random forest yielded the highest prediction accuracy in the training model of the stacking ensemble (R² = 0.80), followed by stepwise selection (R² = 0.75), SelectKBest (R² = 0.75), and elastic net (R² = 0.72). These results were robust across several validation tests. Our findings suggest that random forest feature selection is more suitable for developing machine learning models for air pollution estimation. The identified features also provide important information for urban air pollution management.
Date:	2025-02
Published in:	Environmental Technology and Innovation. 2025 Feb;37:Article number 103996.
Link to:	http://dx.doi.org/10.1016/j.eti.2024.103996
JIF/Ranking 2023:	http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=NHRI&SrcApp=NHRI_IR&KeyISSN=2352-1864&DestApp=IC2JCR
Cited Times(WOS):	https://www.webofscience.com/wos/woscc/full-record/WOS:001399960600001
Cited Times(Scopus):	https://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85213288196
Appears in Collections:	[陳保中] Journal Articles; [陳裕政] Journal Articles
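The abstract lists SelectKBest among the statistical filters and reports accuracy as R². The following is not the authors' code: it is a minimal, dependency-free sketch assuming SelectKBest was paired with a univariate F-test score (the idea behind scikit-learn's `SelectKBest` with `f_regression`; the abstract does not say which score function was used), plus the standard coefficient of determination. The helper names, predictor names, and toy values are hypothetical.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between one predictor column and the target."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def f_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Univariate F statistic, F = r^2 / (1 - r^2) * (n - 2)."""
    r2 = pearson_r(x, y) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect linear fit
    return r2 / (1.0 - r2) * (len(y) - 2)


def select_k_best(columns: List[List[float]], y: Sequence[float], k: int) -> List[int]:
    """Indices of the k columns with the highest F statistic, in original order."""
    scores = [f_score(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: six sites, four candidate predictors (not from the study).
names = ["industrial_area", "road_density", "random_noise", "wind_speed"]
columns = [
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],  # nearly linear in y
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],  # moderate positive association
    [2.0, 5.0, 1.0, 6.0, 3.0, 4.0],  # weak association
    [5.0, 4.0, 6.0, 2.0, 3.0, 1.0],  # fairly strong negative association
]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical SO2 values

selected = select_k_best(columns, y, k=2)
print([names[j] for j in selected])  # → ['industrial_area', 'wind_speed']
print(r_squared(y, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))  # → 1.0
```

Because the score depends on r², the negatively associated `wind_speed` column ranks as highly as a positive one. A filter of this kind scores each predictor in isolation and ignores interactions, which is one reason it can pick a different feature set than a model-based method such as random forest importance.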

Files in This Item:

File	Description	Size	Format	Views
ISI001399960600001.pdf		7290 KB	Adobe PDF	35

All items in NHRI are protected by their original copyright.