Bootstrap Inference for K-Nearest Neighbour Matching Estimators
In: IZA Discussion Paper No. 5361
SSRN
In: Nonresponse in survey research : proceedings of the Eighth International Workshop on Household Survey Nonresponse, 24-26 September 1997, pp. 285-298
The author develops an imputation method based on both a multivariate regression model and nearest neighbour hot decking. The method is applied successfully to a ratio-scale variable that contains a large number of unknown zero values. The results are compared with those obtained by random hot decking. The author also attempts to estimate variances that take into account the fact that some of the data are imputed. This leads to an additional variance component, referred to as imputation variance. The first part of the paper discusses imputation methods and strategies in general. The author also develops a diagnostic test for the quality of imputations, which checks how often the same source is used to impute missing values. (ICEÜbers)
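As an illustration of nearest neighbour hot decking and of the diagnostic that counts how often one source is reused, here is a minimal Python sketch. It is not the author's implementation: it assumes the regression predictions are already available, matches on the absolute difference in predicted value, and uses made-up numbers. The names `nn_hot_deck`, `donors`, and `recipients` are hypothetical.

```python
from collections import Counter

def nn_hot_deck(donors, recipients):
    """Impute each recipient from the donor with the closest predicted value.

    donors: list of (predicted_value, observed_value) pairs for respondents.
    recipients: list of predicted values for records with a missing value.
    Returns (imputed_values, usage), where usage counts how often each
    donor index was used as the source of an imputed value.
    """
    imputed = []
    usage = Counter()
    for pred in recipients:
        # Nearest donor by absolute distance in predicted value
        # (ties go to the donor that appears first).
        j = min(range(len(donors)), key=lambda i: abs(donors[i][0] - pred))
        imputed.append(donors[j][1])
        usage[j] += 1
    return imputed, usage

# Hypothetical data: note the donor with an observed zero value.
donors = [(1.0, 0.0), (2.0, 2.5), (5.0, 4.8)]
recipients = [1.1, 1.9, 2.2, 4.0]
imputed, usage = nn_hot_deck(donors, recipients)
print(imputed)             # imputed values, one per recipient
print(usage.most_common()) # donors ranked by how often they were reused
```

A donor count that is much higher than the others would suggest that a few records dominate the imputed values, which is the kind of pattern the diagnostic described above is meant to reveal.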
SSRN
Working paper
SSRN
Working paper
In: IZA Discussion Paper No. 4239
SSRN
In: Iraqi journal of science, S. 1036-1045
ISSN: 0067-2904
E-mail is an efficient and reliable data exchange service. Spam messages are unwanted e-mails that are sent indiscriminately in bulk, usually for commercial purposes. Obfuscated image spamming is one of the newer tricks for bypassing text-based and Optical Character Recognition (OCR)-based spam filters. Image spam detection based on visual image features has the advantage of reducing computational cost while improving performance. In this paper, an image spam detection scheme is presented. Suitable image processing techniques were used to capture the image features that can differentiate spam images from non-spam ones. Weighted k-nearest neighbor, a simple yet powerful machine learning algorithm, was used as the classifier. The results confirm the effectiveness of the proposed scheme, which was evaluated on two datasets. The first is a real benchmark dataset, while the second is a realistic, modern, and more challenging dataset collected from social media and several publicly available image spam datasets. The obtained accuracy was 99.36% on the benchmark dataset and 91% on the proposed dataset.
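The weighted k-nearest neighbor classifier mentioned in this abstract can be sketched as follows. This is only an illustration under assumptions: the abstract does not state the weighting scheme or the features, so the sketch uses inverse-distance weights, Euclidean distance, and two made-up numeric features per image. The function name `weighted_knn_predict` is hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest neighbours.

    train: list of (feature_vector, label) pairs.
    eps avoids division by zero when the query coincides with a training point.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        # Closer neighbours contribute a larger vote.
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)

# Hypothetical two-dimensional image features.
train = [
    ((0.0, 0.0), "ham"),
    ((0.1, 0.2), "ham"),
    ((0.5, 0.5), "ham"),
    ((1.0, 1.0), "spam"),
    ((0.9, 1.1), "spam"),
]
print(weighted_knn_predict(train, (0.95, 1.0)))
print(weighted_knn_predict(train, (0.05, 0.1)))
```

With plain majority voting every neighbour counts equally; the weighting lets a few very close neighbours outvote a larger number of distant ones.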
In: Journal of development economics, Vol. 145, p. 102460
ISSN: 0304-3878
In: Journal of development economics, Vol. 145
ISSN: 0304-3878
World Affairs Online
In: PIER Working Paper No. 14-028
SSRN
Working paper
In: American economic review, Vol. 108, No. 11, pp. 3154-3169
ISSN: 1944-7981
The National Resident Matching Program seeks a stable matching of medical students to teaching hospitals. With couples, stable matchings need not exist. Nevertheless, for any student preferences, we show that each instance of a matching problem has a "nearby" instance with a stable matching. The nearby instance is obtained by perturbing the capacities of the hospitals. In this perturbation, aggregate capacity is never reduced and can increase by at most four. The capacity of each hospital never changes by more than two. (JEL C78, D47, I11, J41, J44)
In: Natural hazards and earth system sciences: NHESS, Vol. 2, No. 3/4, pp. 247-253
ISSN: 1684-9981
Abstract. This paper presents two avalanche forecasting applications, NXD2000 and NXD-REG, which were developed at the Swiss Federal Institute for Snow and Avalanche Research (SLF). Although both are based on the nearest neighbour method, they are targeted at different scales. NXD2000 is used to forecast avalanches on a local scale. It is operated by avalanche forecasters responsible for snow safety at snow sport areas, villages or cross-country roads. The area covered ranges from 10 km² to 100 km², depending on the climatological homogeneity. It provides the forecaster with the ten days most similar to a given situation. The avalanches observed on these days are an indication of the actual avalanche danger. NXD-REG is used operationally by the Swiss avalanche warning service for regional avalanche forecasting. The nearest neighbour approach is applied to the data sets of 60 observer stations. The results of each station are then compiled into a map of current and future avalanche hazard. Evaluation by cross-validation has shown that the model can reproduce the official SLF avalanche forecasts on about 52% of days.
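The idea of retrieving the most similar past days can be sketched in a few lines of Python. This is not the NXD2000 code: the two features per day, the unscaled Euclidean distance, and all values are assumptions made for illustration, and the function name `most_similar_days` is hypothetical. An operational system would presumably scale or weight its variables.

```python
import math

def most_similar_days(history, today, k=10):
    """Return the k past days whose feature vectors are closest to `today`.

    history: list of dicts with keys 'date', 'features' (tuple of floats)
    and 'avalanches' (number of avalanches observed on that day).
    """
    ranked = sorted(history, key=lambda day: math.dist(day["features"], today))
    return ranked[:k]

# Hypothetical records: (new snow in cm, air temperature in deg C).
history = [
    {"date": "day-A", "features": (30.0, -5.0), "avalanches": 2},
    {"date": "day-B", "features": (5.0, -12.0), "avalanches": 0},
    {"date": "day-C", "features": (28.0, -4.0), "avalanches": 1},
    {"date": "day-D", "features": (0.0, -15.0), "avalanches": 0},
]
nearest = most_similar_days(history, today=(29.5, -4.5), k=2)
# Share of the retrieved days on which at least one avalanche was observed.
share = sum(day["avalanches"] > 0 for day in nearest) / len(nearest)
print([day["date"] for day in nearest], share)
```

The forecaster then reads the avalanche observations of the retrieved days as an indication of the current danger, as the abstract describes.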
In: CAIE-D-23-01018
SSRN
In: Blätter der DGVFM, Vol. 27, No. 4, pp. 647-664
ISSN: 1864-0303
In: Iraqi journal of science, pp. 4987-5003
ISSN: 0067-2904
Data mining plays an important role in healthcare by discovering hidden relationships in large datasets, especially in the diagnosis of breast cancer, a leading cause of death worldwide. In this paper, two algorithms, decision tree and K-Nearest Neighbour, are applied to diagnose breast cancer grade in order to reduce the risk to patients. For the decision tree with feature selection, the Gini index gives an accuracy of 87.83%, while entropy gives an accuracy of 86.77%. In both cases, Age appeared as the most influential parameter, particularly when Age < 49.5, and Ki67 appeared as the second most influential parameter. For K-Nearest Neighbour, the value of K was selected on the basis of the minimum error rate and the maximum test accuracy, giving an accuracy of 86.24%; the Euclidean distance metric was used. The models suggest that Grade 2 is the most prevalent breast cancer grade. As future work, a comparative study of supervised and unsupervised data mining algorithms could be performed.
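The two split criteria compared in this abstract, the Gini index and entropy, can be illustrated with a short Python sketch. The patient rows below are invented for illustration and are not the paper's data; only the threshold Age < 49.5 is taken from the abstract. The helper names `gini`, `entropy`, and `split_impurity` are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(rows, threshold, impurity):
    """Size-weighted impurity of the two groups produced by age < threshold."""
    left = [label for age, label in rows if age < threshold]
    right = [label for age, label in rows if age >= threshold]
    n = len(rows)
    return sum(len(part) / n * impurity(part) for part in (left, right) if part)

# Hypothetical (age, grade) rows.
rows = [(35, "G2"), (42, "G2"), (48, "G2"), (55, "G3"), (61, "G3"), (67, "G2")]
labels = [label for _, label in rows]
for name, impurity in (("gini", gini), ("entropy", entropy)):
    before = impurity(labels)
    after = split_impurity(rows, 49.5, impurity)
    print(name, round(before, 4), round(after, 4))
```

A decision tree picks, at each node, the split that lowers the chosen impurity measure the most; the two criteria usually agree on the best split but can differ, which is consistent with the slightly different accuracies reported above.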
In: Statistika: statistics and economy journal, Vol. 103, No. 2, pp. 226-234
ISSN: 1804-8765
The k-Nearest Neighbour method is a popular nonparametric technique for solving classification and regression problems without having to make potentially restrictive a priori assumptions about the functional form of the statistical relationship under investigation. The purpose of this paper was to demonstrate that the scope of this method can be extended in a way that enables the simultaneous consideration of continuous, ordered discrete, and unordered discrete explanatory variables. An exemplary application to a publicly available dataset demonstrated the feasibility of the proposed approach.
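One way to let a nearest neighbour method handle continuous, ordered discrete, and unordered discrete explanatory variables together is a Gower-style distance. The Python sketch below uses that technique as a stand-in; the abstract does not say how the paper combines the variable types, so this should not be read as the paper's approach. All names (`mixed_distance`, `knn_regress`, `kinds`, `ranges`) and data are hypothetical.

```python
def mixed_distance(a, b, kinds, ranges):
    """Gower-style distance averaged over variables of mixed type.

    kinds: per-variable type, one of 'continuous', 'ordered', 'unordered'.
    ranges: per-variable range used for scaling (ignored for 'unordered').
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "unordered":
            total += 0.0 if x == y else 1.0   # simple mismatch indicator
        else:
            total += abs(x - y) / r           # scaled absolute difference
    return total / len(kinds)

def knn_regress(train, query, kinds, ranges, k=2):
    """Predict by averaging the responses of the k nearest training rows."""
    nearest = sorted(
        train, key=lambda row: mixed_distance(row[0], query, kinds, ranges)
    )[:k]
    return sum(y for _, y in nearest) / k

kinds = ("continuous", "ordered", "unordered")
ranges = (10.0, 2, None)  # value range of each variable; None where unused
train = [
    ((1.0, 0, "a"), 10.0),
    ((2.0, 1, "a"), 12.0),
    ((9.0, 2, "b"), 30.0),
    ((8.0, 2, "b"), 28.0),
]
print(knn_regress(train, (1.5, 0, "a"), kinds, ranges, k=2))
```

Ordered discrete levels are coded as integer ranks here so that the distance between levels reflects their order, while unordered categories only register as equal or different.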