As the internet becomes part of our daily routine, online news reading has grown rapidly in popularity. Fake news can become a major issue for the public and for government bodies (especially politically), so authentication is necessary. It is essential to flag fake news before it goes viral and misleads society. In this paper, various Natural Language Processing techniques, together with a number of classifiers, are used to assess news content for credibility. The technique can further be applied to related tasks such as plagiarism detection and checking criminal records.
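The abstract does not name its features or classifiers, so the following is only an illustrative sketch of the kind of pipeline it describes: TF-IDF features fed to several off-the-shelf classifiers. The toy documents and labels are placeholders.

```python
# Hypothetical fake-news classification pipeline: TF-IDF features plus two
# common classifiers. Dataset and model choices are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["Breaking: celebrity endorses miracle cure", "Parliament passes budget bill"]
labels = [1, 0]  # 1 = fake, 0 = genuine (toy stand-in data)

for clf in (LogisticRegression(max_iter=1000), MultinomialNB()):
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    pipeline.fit(texts, labels)  # with a real corpus, report cross-validated accuracy
```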
Stochastic Gradient Descent (SGD, or 1-SGD in our notation) is probably the most popular family of optimisation algorithms used in machine learning on large data sets, owing to its efficiency with respect to the number of complete passes over the training set (epochs). Various authors have worked on data or model parallelism for SGD, but there is little work on how SGD fits with the memory hierarchies ubiquitous in HPC machines. Standard practice suggests randomising the order of training points and streaming the whole set through the learner, which results in extremely low temporal locality of access to the training set and thus, for large data sets, makes minimal use of the small, fast layers of an HPC memory hierarchy. Mini-batch SGD with batch size n (n-SGD) is often used to control the noise on the gradient and make convergence smoother and easier to identify, but this can reduce learning efficiency with respect to epochs compared to 1-SGD, while exhibiting the same extremely low temporal locality. In this paper we introduce Sliding Window SGD (SW-SGD), which exploits temporal locality of training-point access to combine the advantages of 1-SGD (epoch efficiency) with those of n-SGD (smoother, more easily identified convergence) by leveraging HPC memory hierarchies. We give initial results on part of the Pascal dataset showing that memory hierarchies can be used to improve SGD performance.
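A minimal numpy sketch of the sliding-window idea, assuming logistic-regression gradients, a window of 32 points and a fixed learning rate (none of which the abstract specifies): each update averages the gradient over the newest point plus the window of most recently touched points, which are likely still resident in fast memory.

```python
# SW-SGD sketch: gradient averaged over the newest point and a sliding
# window of recently visited points. Loss and hyperparameters are assumed.
from collections import deque
import numpy as np

def sw_sgd(X, y, window=32, lr=0.1, epochs=5, seed=0):
    """Sliding-window SGD for logistic regression; y holds 0/1 labels."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    cache = deque(maxlen=window)              # indices of recently touched points
    for _ in range(epochs):
        for i in rng.permutation(len(X)):     # randomised streaming order, as in 1-SGD
            cache.append(i)
            idx = np.fromiter(cache, dtype=int)
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
            w -= lr * X[idx].T @ (p - y[idx]) / len(idx)  # window-averaged gradient
    return w
```

Like n-SGD, each step averages over several points (smoothing the gradient), but like 1-SGD every point still enters the model once per epoch.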
Content-Based Image Retrieval has recently become a widely popular and efficient search and indexing approach for knowledge seekers. The use of images by e-commerce sites and by product and service industries is no longer new. Travel and tourism are the largest service industries in India. Every year people visit tourist places and upload pictures of their visits to social networking sites or share them via mobile devices with friends and relatives. Classification of monuments is helpful to hoteliers developing new hotels with state-of-the-art amenities, to travel service providers, to restaurant owners, to government agencies for security, and so on. The proposed system extracts features from images of Indian monuments visited by tourists and classifies them using a linear Support Vector Machine (SVM). The system is divided into three main phases: preprocessing, feature-vector creation and classification. The extracted features are based on Local Binary Pattern, Histogram, Co-occurrence Matrix and Canny Edge Detection methods. Once the feature vector has been constructed, classification is performed using a linear SVM. A database of 10 popular Indian monuments was generated with 50 images per class. The proposed system is implemented in MATLAB and achieves very high accuracy; it was also tested on other popular benchmark databases.
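The original system is in MATLAB; as a hedged Python illustration of two of the four named feature types, the sketch below concatenates a Local Binary Pattern histogram with a grey-level histogram and feeds the result to a linear SVM. The LBP settings (P=8, R=1, uniform) and bin counts are assumptions, not the authors' configuration.

```python
# Illustrative LBP + histogram feature vector for monument classification.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def monument_features(gray_image):
    """Concatenate an LBP histogram and a grey-level histogram (assumed settings)."""
    lbp = local_binary_pattern(gray_image, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    gray_hist, _ = np.histogram(gray_image, bins=32, range=(0, 256), density=True)
    return np.concatenate([lbp_hist, gray_hist])

# X = np.stack([monument_features(img) for img in training_images])
# clf = LinearSVC().fit(X, labels)  # labels: one of the 10 monument classes
```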
Abstract - News is information about events occurring at a location, presented in text or visual form. News can be found on various news portals and in print media. Each news item is generally grouped into broad categories such as economics, politics and sports. The problem addressed here is how to group news data, which typically runs to thousands of characters, into more specific categories. This problem can be solved by applying text mining with a classification algorithm to obtain a model that represents each news category. One classification algorithm robust enough for text classification is the Support Vector Machine (SVM). This study uses 510 news items, restricted to 3 news categories. The SVM algorithm achieves its highest accuracy of 88% with parameter value C = 1 and a linear kernel, with the test and training data split 90% and 10%. Keywords: Classification, News, Support Vector Machine, ...
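A hedged reconstruction of the reported setup with scikit-learn: TF-IDF features, a linear-kernel SVM with C = 1, and a 90/10 split (assuming the 90% share refers to training data). The toy documents stand in for the 510 Indonesian news articles.

```python
# Three-category news classification with a linear SVM, C = 1 (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

docs = ["stocks rally on earnings", "election results announced", "team wins final"] * 10
cats = ["economy", "politics", "sports"] * 10

X = TfidfVectorizer().fit_transform(docs)
X_tr, X_te, y_tr, y_te = train_test_split(X, cats, test_size=0.1, random_state=0)
clf = SVC(kernel="linear", C=1).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))
```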
Modeling land-use change is a prerequisite to understanding the complexity of land-use-change patterns. This paper presents a novel method to model urban land-use change using support-vector machines (SVMs), a new generation of machine learning algorithms used in classification and regression domains. An SVM modeling framework has been developed to analyze land-use change in relation to various factors such as population, distance to roads and facilities, and surrounding land use. As land-use data are generally unbalanced, in the sense that the unchanged data overwhelm the changed data, traditional methods are incapable of classifying relatively minor land-use changes with high accuracy. To circumvent this problem, an unbalanced SVM has been adopted by enhancing the standard SVMs. A case study of Calgary land-use change demonstrates that the unbalanced SVMs can achieve high and reliable performance for land-use-change modeling.
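One common way to realise an "unbalanced SVM" with off-the-shelf tools is to weight the minority (changed) class more heavily; the sketch below uses scikit-learn's class_weight option on synthetic stand-in data and is an illustration of the general idea, not the authors' exact enhancement of the standard SVM.

```python
# Class-weighted SVM for imbalanced land-use-change data (illustrative).
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy stand-in: ~5% "changed" cells vs ~95% "unchanged" cells.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)  # up-weights minority class
```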
Abstract—The coronavirus pandemic has confronted humankind for over a year, and it does not look like it will end anytime soon. Indonesia is one of the countries most affected by the pandemic, with millions of confirmed cases; hence the government has enforced strict rules on the use of face masks in public areas. For this reason, detecting whether people in public areas are wearing a face mask is needed. Automatic face mask detection is a classification problem, so a Support Vector Machine (SVM) can be applied. This study aims to build an automatic face mask detector using a multi-kernel support vector machine. The proposed method combines various kernels into a single kernel equation. The results show that the proposed method performs well: average sensitivity was 83.67%, specificity 82.40%, precision 82.00%, accuracy 82.93% and F1-score 82.77%, better than every single-kernel SVM experiment run with the same process and dataset.
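A minimal sketch of combining kernels into a single kernel function: a convex mix of an RBF and a polynomial kernel passed to scikit-learn as a custom kernel. The mixing weight and the choice of component kernels are assumptions; the abstract does not specify the paper's exact combination.

```python
# Multi-kernel SVM sketch: convex combination of two standard kernels.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def combined_kernel(A, B, alpha=0.5):
    # K = alpha*K_rbf + (1-alpha)*K_poly is still a valid (PSD) kernel.
    return alpha * rbf_kernel(A, B) + (1 - alpha) * polynomial_kernel(A, B)

# X: flattened face-region features, y: 1 = mask, 0 = no mask (placeholders).
X, y = np.random.rand(40, 64), np.random.randint(0, 2, 40)
clf = SVC(kernel=combined_kernel).fit(X, y)
```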
PT Telekomunikasi Indonesia Tbk. (PT Telkom Tbk.) is a state-owned enterprise (Badan Usaha Milik Negara, BUMN) engaged in telecommunications and network services in the territory of Indonesia. PT Telkom Tbk. is claimed to be the largest telecommunications company, with 15 million fixed-telephone customers and 104 million cellular customers. It is one of the BUMNs whose shares are currently owned by the Government of Indonesia (52.56%), with the remaining 47.44% owned by the public, the Bank of New York and domestic investors. In 2017, PT Telkom Tbk. experienced satellite interference that triggered stock price changes. Forecasting is therefore needed to help capital market players establish a basis for strategic decisions that can give them an advantage. The forecasting method used is the Support Vector Machine (SVM), one of many methods that can be applied to a variety of problems, including forecasting. Using the Grid Search method, the best parameter-optimisation results on the training data are obtained to predict the test data. The forecast stock prices lie in the range Rp3152 to Rp3615 for the period 01-15 August 2018. Based on the MAPE values, the variables open, high, low and close yield very good forecasting results, with values below 10%. Keywords: PT Telkom Tbk., SVM, Grid Search, MAPE
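An illustrative sketch of grid-searched support vector regression for one of the price series (e.g. close): the lag-window features, parameter grid and placeholder prices are assumptions about the setup, not the paper's exact configuration.

```python
# Grid-searched SVR for one-step-ahead stock price forecasting (illustrative).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

prices = np.random.rand(120) * 500 + 3100                     # placeholder close prices
X = np.array([prices[i:i + 5] for i in range(len(prices) - 5)])  # 5-day lag window
y = prices[5:]

grid = GridSearchCV(SVR(), {"C": [1, 10, 100], "gamma": [0.001, 0.01, 0.1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.predict(X[-1:]))                # one-step-ahead forecast
```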
Abstract. This paper explores the use of the Support Vector Machine (SVM) as a data exploration tool and a predictive engine for spatio-temporal forecasting of snow avalanches. Based on the historical observations of avalanche activity, meteorological conditions and snowpack observations in the field, an SVM is used to build a data-driven spatio-temporal forecast for the local mountain region. It incorporates the outputs of simple physics-based and statistical approaches used to interpolate meteorological and snowpack-related data over a digital elevation model of the region. The interpretation of the produced forecast is discussed, and the quality of the model is validated using observations and avalanche bulletins of the recent years. The insight into the model behaviour is presented to highlight the interpretability of the model, its ability to produce reliable forecasts for individual avalanche paths and its sensitivity to input data. Estimates of prediction uncertainty are obtained with ensemble forecasting. The case study was carried out using data from the avalanche forecasting service in the Lochaber region of Scotland, where avalanches are forecast on a daily basis during the winter months.
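A hedged sketch of one way the ensemble-based uncertainty estimate could work: train several SVMs on bootstrap resamples and read the spread of their outputs as prediction uncertainty. Features and labels here are placeholders, not the paper's data.

```python
# Bootstrap ensemble of SVMs; spread of probabilities ~ prediction uncertainty.
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

X, y = np.random.rand(200, 6), np.random.randint(0, 2, 200)  # meteo/snowpack stand-ins
members = []
for seed in range(10):
    Xb, yb = resample(X, y, random_state=seed)   # bootstrap resample of training data
    members.append(SVC(probability=True).fit(Xb, yb))

probs = np.stack([m.predict_proba(X[:1])[:, 1] for m in members])
print(probs.mean(), probs.std())  # mean avalanche forecast and its ensemble spread
```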
Corona Virus Disease 2019 (COVID-19) is a new viral disease that originated in 2019 [6]; Indonesia reported its first COVID-19 case on 2 March 2020. Various attempts have been made by the government, such as temporary lockdowns or cordoning off areas suspected of being at risk of community spread. As a source of information, the internet has changed substantially; social media, for example, are communication tools that are very popular among internet users today. Users can update their status, send messages, and exchange socio-economic opinions and political views about their place of residence or their country. This paper deals with Indonesian sentiment towards the performance of the Indonesian Ministry of Health, using the social media platform Twitter for the analysis. Tweets were studied to gauge public opinion and were extracted using two prominent keywords, "terawan" and "menkes", from June 15th to September 19th, 2020. A total of 200 tweets were considered for the analysis. This study successfully implemented the SVM algorithm for sentiment analysis of tweets about the performance of the Indonesian Ministry of Health during the COVID-19 crisis, using 200 tweets of which 172 were training data and 28 were testing data. Besides the amount of data, other factors affect accuracy, namely the choice of kernel and the number of classes used. The results show that the linear kernel has the best accuracy, precision and recall compared with the other kernels: 75% accuracy, 78.4% precision and 75% recall. The polynomial, Gaussian and sigmoid kernels have identical accuracy, precision and recall: 60.71% accuracy, 36.86% precision and 60.71% recall.
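A minimal reconstruction of the kernel comparison described above, evaluating linear, polynomial, RBF (Gaussian) and sigmoid kernels on the same split; the toy tweets stand in for the 200 labelled tweets, and the 172/28 split is approximated with test_size=0.14.

```python
# Kernel comparison for tweet sentiment classification (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

tweets = ["great response by menkes", "slow handling of the crisis"] * 20
labels = ["positive", "negative"] * 20

X = TfidfVectorizer().fit_transform(tweets)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.14, random_state=0)
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, accuracy_score(y_te, clf.predict(X_te)))
```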
With the explosive growth of ocean data, using ocean observation data to analyze the pycnocline is of great significance in the military field. However, owing to natural factors, ocean hydrological data are often incomplete, and predicting hydrological conditions from partial data has become a hot topic in marine science. In this paper, building on the traditional statistical-analysis literature, we propose a machine-learning process for handling ocean hydrological data under big-data conditions. Based on the traditional pycnocline-gradient determination method, the open Argo data set is analyzed, and the local characteristics of the pycnocline are verified from several aspects in combination with current pycnocline research. Most importantly, the combination of kernel functions and the support vector machine (SVM) is extended to nonlinear learning using ideas from machine learning and convex optimization. On this basis, a known pycnocline training set is used to train an accurate model that predicts the pycnocline in unknown domains. In the specific steps, the paper combines the classification problem with the regression problem and determines the proportions of the training and test sets by polynomial regression. Feature scaling of the input data then accelerates gradient convergence, and a grid-search algorithm with variable step size is proposed to determine the hyperparameters C and gamma of the SVM model. The prediction results are analyzed with a confusion matrix to assess the accuracy of the variable-step GridSearch-SVM and compared against the traditional SVM and similar algorithms. Finally, the two features with the greatest influence on the marine pycnocline are identified by a learning-based feature-ranking algorithm.
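A hedged sketch of a variable-step grid search for C and gamma, read here as a coarse-to-fine search: scan a coarse logarithmic grid, then refine with smaller steps around the best cell. The grids and the refinement factor are assumptions, not the paper's exact algorithm.

```python
# Coarse-to-fine ("variable step size") grid search over SVM hyperparameters.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def variable_step_grid_search(X, y, refine=4):
    # Coarse pass: wide logarithmic grids over C and gamma.
    Cs, gammas = np.logspace(-2, 3, 6), np.logspace(-4, 1, 6)
    _, C0, g0 = max((cross_val_score(SVC(C=C, gamma=g), X, y, cv=3).mean(), C, g)
                    for C in Cs for g in gammas)
    # Fine pass: smaller steps in a neighbourhood of the coarse optimum.
    fine_Cs = np.linspace(C0 / 2, C0 * 2, refine)
    fine_gs = np.linspace(g0 / 2, g0 * 2, refine)
    return max((cross_val_score(SVC(C=C, gamma=g), X, y, cv=3).mean(), C, g)
               for C in fine_Cs for g in fine_gs)
```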
Purpose: This paper aims to explore the application of big data techniques based on an α-support vector machine-stochastic gradient descent (α-SVMSGD) algorithm in third-party logistics, to obtain the valuable information hidden in logistics big data and to help logistics enterprises make more reasonable planning schemes.
Design/methodology/approach: A forgetting factor is introduced, without changing the algorithm's complexity, to give an algorithm called α-SVMSGD. The algorithm selectively deletes or retains historical data, which improves the classifier's adaptability to new real-time logistics data. Simulation results verify the application effect of the algorithm.
Findings: With increasing training iterations, the test error percentages of the gradient descent (GD) algorithm, the stochastic gradient descent (SGD) algorithm and the α-SVMSGD algorithm decrease gradually. In processing logistics big data, the α-SVMSGD algorithm retains the efficiency of SGD while ensuring that the descent direction approaches the optimal GD direction, and it can use a small amount of data to obtain more accurate results and enhance convergence accuracy.
Research limitations/implications: The threshold setting of the forgetting factor still needs improvement; setting thresholds for different data types in self-learning is a future research direction. The number of forgotten data points can be effectively controlled through big data processing technology to improve data support for the normal operation of third-party logistics.
Practical implications: The approach can effectively reduce the time spent on data mining, achieve rapid and accurate convergence on sample data without increasing sample complexity, improve the efficiency of logistics big data mining and reduce the redundancy of historical data, and it has reference value for the development of the logistics industry.
Originality/value: The proposed classification algorithm is feasible and highly convergent for third-party logistics big data mining. The α-SVMSGD algorithm has application value in real-time logistics data mining, but the design of the forgetting-factor threshold needs improvement; future work will study how to set thresholds for different data types in self-learning.
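One plausible reading of the forgetting factor, sketched below with a hinge-loss SGD update: samples are down-weighted exponentially with age, so the classifier tracks new logistics data while older data fade. The update rule, the value of α, and the ±1 label convention are assumptions, not the paper's exact α-SVMSGD derivation.

```python
# SVM-style SGD with an exponential forgetting factor (illustrative).
import numpy as np

def forgetting_sgd(X, y, alpha=0.95, lr=0.01, lam=1e-4):
    """Hinge-loss SGD; y holds labels in {-1, +1}, newest samples last in X."""
    w = np.zeros(X.shape[1])
    weight = 1.0
    for x_i, y_i in zip(X[::-1], y[::-1]):   # process newest samples first
        if y_i * (w @ x_i) < 1:              # hinge-loss subgradient step
            w += lr * weight * y_i * x_i
        w *= (1 - lr * lam)                  # L2 regularisation shrinkage
        weight *= alpha                      # older samples count for less
    return w
```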
This paper explores the linear and nonlinear forecastability of European football match scores using 1X2 and Asian Handicap odds data from the English Premier League. To this end, we compare the performance of a Poisson count regression with that of a nonparametric Support Vector Machine (SVM) model. Our descriptive analysis of the odds and match outcomes indicates that these variables are strongly interrelated in a nonlinear fashion. An interesting finding is that the size of the Asian Handicap appears to be a significant predictor of both home and away team scores. The modelling results show that while the SVM is only marginally superior on the basis of statistical criteria, it produces out-of-sample forecasts with much higher economic significance.
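An illustrative comparison in the spirit of the study: a Poisson count regression versus a support vector regressor, both predicting home goals from odds-derived features. The synthetic odds data and the two features are assumptions standing in for the 1X2 and Asian Handicap variables.

```python
# Poisson regression vs SVR on synthetic odds-derived features.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))        # e.g. log odds ratio, handicap size
goals = rng.poisson(np.exp(0.3 + 0.4 * X[:, 0]))  # synthetic home-goal counts

for model in (PoissonRegressor(), SVR()):
    model.fit(X[:250], goals[:250])
    print(type(model).__name__,
          mean_absolute_error(goals[250:], model.predict(X[250:])))
```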