Peter D. Easton, Martin M. Kapons, Steven J. Monahan, Harm H. Schütt, Eric H. Weisbrod. Forecasting Earnings Using k-Nearest Neighbors. The Accounting Review, 2023. https://doi.org/10.2308/TAR-2021-0478
Heart disease is a non-communicable disease and the leading cause of death in Indonesia. The WHO predicted that heart disease would cause 11 million deaths in 2020. Unhealthy lifestyles and consumption patterns in modern society are why so many people develop this disease. Because many people know little about heart conditions and their potential dangers, heart attacks often occur before any preventive measures are taken. This study aims to build a system for predicting heart disease that helps prevent it and reduce the number of deaths it causes. Technology is widely used in the health sector, and machine learning is one of its most advanced applications. Machine learning can identify potential heart disease patients by applying the K-Nearest Neighbors (KNN) algorithm. The algorithm achieves an accuracy of 65.93%, which rises to 82.41% after z-score normalization is applied, showing that z-score normalization can noticeably improve the accuracy of KNN. The system is implemented as a website built with the Flask micro-framework, which makes development more efficient. Flask is a Python micro-framework that ships with few built-in tools and libraries, so it is more portable and uses fewer resources.
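To show why z-score normalization can change KNN predictions so much, here is a minimal sketch, assuming Euclidean distance and majority voting (the abstract specifies neither). Each feature is rescaled to (x - mean) / std using the training mean and population standard deviation, and the same parameters are applied to the query. The function names and the two toy features are hypothetical; the values are chosen so that the large-scale second feature dominates the unscaled distances.

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def zscore_fit(rows):
    """Per-feature mean and population std, estimated on training rows only."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return means, stds


def zscore_apply(row, means, stds):
    """Rescale one row to (x - mean) / std, feature by feature."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))


def knn_classify(train_x, train_y, q, k):
    """Majority vote among the k training rows closest to q (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], q))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [(1.0, 200.0), (1.1, 300.0), (0.0, 250.0), (0.1, 350.0)]
y = ["yes", "yes", "no", "no"]
q = (1.05, 340.0)

raw_pred = knn_classify(X, y, q, 3)  # distances dominated by feature 1
means, stds = zscore_fit(X)
Xz = [zscore_apply(r, means, stds) for r in X]
scaled_pred = knn_classify(Xz, y, zscore_apply(q, means, stds), 3)
print(raw_pred, scaled_pred)  # no yes
```

Without scaling, the query's nearest neighbors are picked almost entirely by the second feature; after standardization both features contribute comparably and the prediction flips.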
We combine the k-Nearest Neighbors (kNN) method with the local linear estimation (LLE) approach to construct a new estimator (LLE-kNN) of the regression operator when the regressor is functional and the response variable is a scalar with some observations missing at random (MAR). The resulting estimator inherits many of the advantages of both the kNN and LLE methods. This is confirmed by the established asymptotic results: pointwise and uniform almost complete consistency, with precise convergence rates. In addition, numerical studies were conducted (i) on simulated data and (ii) on a real dataset concerning sugar quality measured by fluorescence. These studies demonstrate the feasibility of the LLE-kNN estimator and its superiority over competing estimators.
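The core idea of a kNN-bandwidth local linear estimator can be sketched in a few lines. This is a simplification, not the paper's estimator: the regressor here is a scalar and |x - x0| stands in for the functional semi-metric. The tricube kernel is an assumption, the bandwidth is the distance from x0 to its k-th nearest observed point (as in LOESS, so that point itself gets zero weight), and MAR responses are handled by giving missing ones zero weight. The name `lle_knn` is hypothetical.

```python
from typing import Optional, Sequence


def tricube(u: float) -> float:
    """Tricube kernel K(u) = (1 - u^3)^3 on [0, 1), zero elsewhere."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0


def lle_knn(x0: float, xs: Sequence[float], ys: Sequence[Optional[float]], k: int) -> float:
    """Local linear estimate of E[Y | X = x0] with a kNN bandwidth.

    Responses equal to None are treated as missing (indicator delta_i = 0)
    and contribute nothing, i.e. only complete cases enter the fit.
    """
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if len(obs) < k:
        raise ValueError("fewer observed responses than k")
    # kNN bandwidth: distance from x0 to its k-th nearest observed regressor.
    h = sorted(abs(x - x0) for x, _ in obs)[k - 1]
    if h == 0.0:
        raise ValueError("zero bandwidth; increase k")
    # Weighted least squares of y on (1, x - x0); the intercept estimates m(x0).
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in obs:
        w = tricube(abs(x - x0) / h)
        d = x - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("degenerate local design; increase k")
    return (s2 * t0 - s1 * t1) / det


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[3] = ys[7] = None  # two responses missing
print(lle_knn(4.5, xs, ys, k=5))  # a linear trend is reproduced: about 10.0
```

A local linear fit reproduces linear trends exactly, which is one reason LLE reduces bias compared with a plain kNN average near boundaries and on sloped regions.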
This work has been supported by the Spanish National Research Project TIN2014-57251-P and the Andalusian Research Plan P11-TIC-7765. J. Maillo and S. Ramirez hold FPU scholarships from the Spanish Ministry of Education. I. Triguero held a BOF postdoctoral fellowship from Ghent University during part of the development of this work.
The k-Nearest Neighbors classifier is a simple, effective, and widely known method in data mining. Applying this model directly to big data is infeasible because of time and memory constraints. Several distributed alternatives based on MapReduce have been proposed to let this method handle large-scale data. However, their performance can be further improved with designs suited to newer technologies. In this work we provide a new solution, based on Spark, for exact k-nearest neighbor classification. We exploit Spark's in-memory operations to classify large numbers of unseen cases against a large training dataset. The map phase computes the k nearest neighbors in different splits of the training data. Multiple reducers then determine the definitive neighbors from the lists produced in the map phase. The key point of this proposal lies in how the test set is managed: it is kept in memory when possible; otherwise, it is split into the minimum number of chunks and one MapReduce is applied per chunk, using Spark's caching to reuse the previously partitioned training set. In our experiments we study the differences between the Hadoop and Spark implementations on datasets with up to 11 million instances, showing the scaling-up capabilities of the proposed approach. An open-source Spark package resulting from this work is available.
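The map and reduce phases described above can be illustrated without a cluster. The sketch below is a plain-Python simulation, not the Spark API or the authors' package: each training split plays the role of one map task, and the reduce step merges the per-split candidate lists. Round-robin partitioning, Euclidean distance, and all function names are assumptions. The result is exact because the global k nearest neighbors of a test point must appear among the k nearest of whichever split contains them.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, str]   # (features, class label)
Neighbor = Tuple[float, str]  # (distance, class label)


def map_phase(split: Sequence[Example], test: Sequence[Point], k: int) -> List[List[Neighbor]]:
    """For every test point, the k nearest candidates within this split only."""
    return [heapq.nsmallest(k, ((math.dist(q, x), y) for x, y in split)) for q in test]


def reduce_phase(partials: List[List[List[Neighbor]]], k: int) -> List[List[Neighbor]]:
    """Merge per-split candidate lists into the definitive k neighbors per test point."""
    n_test = len(partials[0])
    return [heapq.nsmallest(k, (nb for part in partials for nb in part[i])) for i in range(n_test)]


def majority_label(neighbors: List[Neighbor]) -> str:
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def mapreduce_knn(train: List[Example], test: List[Point], k: int, n_splits: int) -> List[str]:
    splits = [train[i::n_splits] for i in range(n_splits)]  # hypothetical round-robin partitioning
    partials = [map_phase(s, test, k) for s in splits]      # one simulated map task per split
    return [majority_label(nb) for nb in reduce_phase(partials, k)]


train = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
         ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b")]
test = [(0.5,), (11.5,), (4.0,)]
print(mapreduce_knn(train, test, k=3, n_splits=2))  # ['a', 'b', 'a']
```

Each map task emits only k candidates per test point, so the data shuffled to the reducers grows with the number of splits times k rather than with the training-set size.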
The state budget of Indonesia is an instrument the government uses to manage spending in order to achieve economic development goals. From 1984 to 2019, Indonesia's state budget ran a deficit, with expenditure exceeding revenue. This contributed to a declining trade balance, a fall in gross domestic product (indicating that the country's economic resources were weakening), and rising government debt to finance the deficit. This study applies the k-Nearest Neighbors classification algorithm to predict Indonesia's state budget deficit, using k-fold cross-validation to select the optimal number of nearest neighbors. The resulting model predicts whether the budget deficit will decrease or increase with an accuracy of 63%. This accuracy is obtained with the 9 nearest neighbors (k = 9), the value found most appropriate for this study.
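Choosing k by k-fold cross-validation can be sketched as follows. The abstract does not give the number of folds, the candidate values of k, or the features, so this sketch assumes 5 folds assigned by striding over (already shuffled) data, Euclidean distance, and a caller-supplied list of odd candidate k values; the one-dimensional "down"/"up" toy data and all function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, q, k):
    """Majority label among the k training points closest to q."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k, n_folds):
    """Fraction of points classified correctly when each fold is held out once."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(data)


def select_k(data, candidates, n_folds=5):
    """Return the candidate k with the best CV accuracy (first one on ties)."""
    scores = {k: cv_accuracy(data, k, n_folds) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


# Hypothetical toy data: two well-separated groups of years.
data = [((float(i),), "down") for i in range(10)] + [((100.0 + i,), "up") for i in range(10)]
best_k, scores = select_k(data, [1, 3, 5, 7])
print(best_k, scores)
```

Odd candidate values avoid voting ties between two classes; on real data the accuracy curve over k is usually not flat, and the chosen k is the peak of that curve.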
COVID-19 has become the largest pandemic in recent history. This study develops three models of the COVID-19 epidemic process based on statistical machine learning and evaluates their forecasts. The models are based on the Random Forest, K-Nearest Neighbors, and Gradient Boosting methods. The models were evaluated for the adequacy and accuracy of their incidence forecasts over horizons of 3, 7, 10, 14, 21, and 30 days. The study used data on new COVID-19 cases in Germany, Japan, South Korea, and Ukraine. These countries were selected because they show different dynamics of the COVID-19 epidemic process and their governments applied different control measures to contain the pandemic. The simulation results showed that the K-Nearest Neighbors and Gradient Boosting models are accurate enough for practical use. Public health agencies can use the models and their predictions to address various pandemic containment challenges; the study examines which challenges can be addressed depending on the forecast horizon.
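The abstract does not describe how the K-Nearest Neighbors model is fed. One common way to use kNN for incidence forecasting, assumed here rather than taken from the study, is direct multi-step forecasting on lagged windows: each window of the last p daily values is paired with the value h days later, and the forecast is the mean target of the k windows closest to the most recent one. The function name, p, and the periodic toy series are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def knn_forecast(series: Sequence[float], p: int, h: int, k: int) -> float:
    """Direct h-step-ahead kNN forecast from the last p observations."""
    series = list(series)
    n = len(series)
    # Training pairs: window ending at day t  ->  value at day t + h.
    pairs = [(series[t - p + 1 : t + 1], series[t + h]) for t in range(p - 1, n - h)]
    if len(pairs) < k:
        raise ValueError("series too short for this p, h and k")
    query = series[n - p :]
    nearest = sorted(pairs, key=lambda pair: math.dist(pair[0], query))[:k]
    return mean(target for _, target in nearest)


# Hypothetical weekly-like pattern with period 4, repeated 10 times.
toy = [1.0, 2.0, 3.0, 4.0] * 10
print(knn_forecast(toy, p=4, h=1, k=3), knn_forecast(toy, p=4, h=7, k=3))  # 1.0 3.0
```

A separate model is built for each horizon (for example 3, 7, or 30 days), which avoids feeding one-step predictions back in; longer horizons leave fewer training pairs, which is one reason accuracy typically degrades as the horizon grows.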
General elections are one of the means by which a democratic system chooses the people's representatives who will sit in the legislative bodies. Popularity is not the main factor that can guarantee a legislative candidate's success in being elected. Rather than relying on popularity, the right approach during an election is to determine a strategy. One such strategy is choosing the electoral district. The problem in choosing an electoral district is a regional mismatch between legislative candidates and their electoral districts. It is therefore necessary to consider the background data of previous legislators who succeeded in their electoral districts. Data mining is defined as the process of discovering patterns in data, and the K-Nearest Neighbors algorithm is a data mining algorithm that classifies an object based on the training data closest to it. This study builds a system that identifies electoral districts where prospective legislative candidates in Jawa Barat (West Java) have potential to succeed. The result is a system that recommends such electoral districts for prospective legislative candidates, with an accuracy of 85.62% on the test data. Keywords: general elections, legislative candidates, electoral districts, data mining, k-nearest neighbors
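Turning the K-Nearest Neighbors vote into a ranked list of recommended districts can be sketched as follows. The abstract does not list the background attributes or the distance measure; this sketch assumes categorical attributes compared with Hamming distance (the count of attributes that differ), and the attribute values, district labels "D1"/"D2", and function names are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of categorical attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def recommend_districts(history, candidate, k, top_n=3):
    """Rank districts by how many of the k most similar past winners won there.

    history: list of (attribute tuple, district won) for previous legislators.
    """
    nearest = sorted(history, key=lambda rec: hamming(rec[0], candidate))[:k]
    votes = Counter(district for _, district in nearest)
    return [district for district, _ in votes.most_common(top_n)]


# Hypothetical records: (party, education level, home region) -> district won.
history = [
    (("A", "S1", "Bandung"), "D1"),
    (("A", "S2", "Bandung"), "D1"),
    (("B", "S1", "Bogor"), "D2"),
    (("B", "S2", "Bogor"), "D2"),
    (("A", "S1", "Bogor"), "D2"),
]
print(recommend_districts(history, ("A", "S1", "Bandung"), k=3))  # ['D1', 'D2']
```

Returning the vote ranking instead of only the winning label gives the candidate fallback options, while the top entry is exactly the ordinary kNN classification.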