The Future of International Data Transfers: Managing New Legal Risk with a 'User-Held' Data Model
In: The Computer Law and Security Review (2022, Forthcoming)
SSRN
SSRN
SSRN
In: Vanderbilt Journal of Entertainment & Technology Law, Forthcoming
SSRN
In: Harvard Journal of Law and Technology Digest [2021]
SSRN
Working paper
In: Journal of Contemporary Issues in Business and Government, Volume 27, Issue 2
ISSN: 2204-1990
User attributes are a person's demographic characteristics, such as income, education, job, age, gender and socioeconomic status (SES). User attributes play an important role in many research areas, such as sociology and education. Recently, companies have become increasingly interested in user attributes because they are also valuable to many emerging applications, such as personalized recommendation, customized marketing and precise advertising. For example, previous works leverage users' age, gender and occupation to improve the performance of personalized recommendation. The traditional way to collect user attributes is the manual survey, which is expensive and time-consuming. Many researchers therefore try to infer user attributes from various kinds of user-generated data, such as tweets or cellphone records. Compared with surveys, these machine-learning-based user attribute inference (UAI) methods are much quicker and cheaper. However, many challenges remain open: introducing new kinds of user-generated data sources into attribute inference; improving the accuracy of multiple-attribute prediction based on limited data sources; and improving the performance of user-attribute-enhanced (UAE) tasks with UAI methods. For the first challenge, SES inference based on human mobility data is chosen as a case study of introducing a new data source into UAI. The SES of a person or family reflects that entity's social and economic rank in society. This attribute can support applications such as bank loan decisions and provide measurable inputs for related studies of social stratification, social welfare and business planning. Traditionally, national statistical institutes estimate SES for a large population through a large number of household interviews. Recently, researchers have begun to estimate individual-level SES from people's social media data.
However, these methods cannot work if researchers cannot obtain people's cyberspace data. New data sources are therefore needed, especially widely recorded real-world user behavior such as human mobility. In this work, we leverage Smart Card Data (SCD) from public transport systems, which record the temporal and spatial mobility behavior of a large population of users. More specifically, we develop S2S, a deep learning-based method for estimating people's SES from their SCD. Essentially, S2S models two types of SES-related features, namely temporal-sequential features and general statistical features, and leverages deep learning for SES estimation. We evaluate our approach on a real dataset, Shanghai subway SCD, which involves millions of users. The results show that the proposed method can use mobility data for SES inference and clearly outperforms several state-of-the-art methods on various evaluation metrics. For the second challenge, home location-based inference of multiple socioeconomic attributes (SEAs) is selected as an example problem of improving the accuracy of multiple-attribute inference with limited input information. Inferring people's SEAs, including income, occupation and education level, is an important problem for applications like personalized recommendation and targeted advertising. Some methods have been proposed to estimate SEAs when users have rich information, such as tweet content over a long period. However, the accuracy of these methods may suffer if researchers can obtain only limited information about users (e.g., no or very few tweets). Besides, limited by budget and time, researchers may have to estimate as many attributes as possible from a limited data source. Multi-SEA inference based on limited information is even harder. Here we choose home location as an example of a limited data source.
The longitude and latitude of the home location are often used as a supporting data source in UAI work. The accuracy of existing methods is seriously affected if only users' home locations are available. In this work, we try to predict a person's income level, family income level, occupation type and education level from his/her home location. We collect people's home locations and socioeconomic attributes through a survey covering 9 provinces and 85 cities of China. Then we design new basic features by enriching home location with knowledge from real estate websites, government statistics websites, online map services, etc. To learn a shared representation from input features as well as attribute-specific representations for different SEAs, we propose a multi-task learning method with an attention mechanism, called H2SEA. The factorization machine-based embedding component of H2SEA can also generate additional interaction features based on the basic input features. Extensive experimental results show that the proposed H2SEA model outperforms alternative models for SEA inference on various evaluation metrics, such as AUC, F-measure and specificity. The first two works focus on improving the performance of UAI itself in different scenarios. In the final work, we expand the focus to improving UAE tasks with the help of UAI. There are two kinds of tasks that rely on user attributes. User-attribute-based (UAB) tasks cannot be carried out without user attributes. For UAE tasks, attributes are not necessary but can be used to enhance performance. The first two challenges show that designing an accurate UAI method requires considerable work, including data mining and model design. UAE researchers would therefore usually rather give up the benefits of UAI to lower the cost, especially if the missing rates of attributes are too high or many kinds of attributes are missing.
In this thesis, we take the collaborative filtering (CF) recommender system as a case study of UAE tasks. CF recommendation methods rely mainly on historical user-item interactions and may therefore suffer from the interaction sparsity problem. Some algorithms have thus been proposed to leverage user/item attributes (e.g., user location or item brand) to enhance recommendation performance. However, in real-world datasets, user/item attributes are often missing for reasons such as privacy concerns. CF recommender systems usually use unknown tags or zeros as simple substitutes for missing attributes instead of leveraging UAI. In the final work, we first conduct empirical experiments to quantify how recommendation performance is affected when simple substitutes are used for missing attributes. Then we discuss how UAI can alleviate the negative impact of missing attributes. Although recommendation and UAI are usually studied separately, we argue that both can be seen as graph node representation learning tasks based on node interactions. We develop a novel multi-task Attribute-Enhanced Graph Convolutional Network (AEGCN) method, which enhances recommendation through auxiliary UAI tasks. The auxiliary attribute inference tasks can send estimated attribute information to the recommendation task, improving recommendation performance with incomplete attributes. More specifically, we define recommendation and profiling on one user-item bipartite graph. The two kinds of tasks share one graph convolutional network (GCN) to learn the hidden user/item representations. The user/item representations are then used for profiling, while their combination is used to predict users' preferences for items. Extensive experimental results on three real-world datasets demonstrate that AEGCN is simple yet effective when attributes are missing.
Compared with attribute-enhanced CF models, AEGCN achieves comparable performance when the attributes are complete and significant improvements as the missing rate increases. This thesis chooses mobility-based SES prediction, home-based SEA prediction and the CF recommender system as case studies of three open challenges of UAI. The three challenges studied in this thesis belong to a general effort to expand UAI from single-attribute prediction to multi-attribute prediction and finally to a multi-task framework that includes both UAI and UAE tasks.
BASE
In: IASSIST quarterly: IQ, Volume 12, Issue 2, p. 3
ISSN: 2331-4141
Educating the Data User
SSRN
In: A Petrocelli Book
In: IASSIST quarterly: IQ, Volume 4, Issue 2, p. 29
ISSN: 2331-4141
User Services in a Data Library
In: Knight First Amendment Institute at Columbia University
SSRN
In this paper, a rural electrification peak load demand forecast model was developed based on readily available United Nations and World Bank data on electric power consumption in kWh per capita, along with the population and land mass data of the rural community. Furthermore, where no data on the land mass area are available, a web map application can be used to compute the land mass area in km² for the project coverage area. The rural community used as the case study was Orji town in Owerri North local government area in Imo State, Nigeria. The forecast results showed that the part of the Orji community considered in the study had a land mass area of 2.4760387 km², a population of 2898 in 2015 and a peak load demand of 81.14 kVA in the same year, with 45% of the population having access to electricity. In 2025, however, the same part of Orji town is forecast to have a population of 3971.791 and a peak load demand of 261.79 kVA, with 75% of the population having access to electricity.
BASE
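The rural electrification abstract above quotes its forecast figures but not the model's equations. As a rough illustration only, the sketch below back-computes two quantities those figures imply, under assumptions that are not taken from the paper: that the population grows at a constant compound annual rate, and that peak demand scales with the number of people who have access to electricity. All function names are hypothetical.

```python
# Minimal sketch, NOT the paper's model. Assumptions (not from the source):
#   1. population follows constant compound annual growth;
#   2. peak demand is proportional to the connected population.

def implied_annual_growth(pop_start, pop_end, years):
    """Compound annual growth rate that takes pop_start to pop_end."""
    return (pop_end / pop_start) ** (1.0 / years) - 1.0

def project_population(pop_start, rate, years):
    """Population after `years` of compound growth at `rate`."""
    return pop_start * (1.0 + rate) ** years

def kva_per_connected_person(peak_kva, population, access_fraction):
    """Peak demand divided by the number of people with electricity access."""
    return peak_kva / (population * access_fraction)

def peak_load_kva(population, access_fraction, kva_per_person):
    """Peak demand under the proportionality assumption above."""
    return population * access_fraction * kva_per_person

# Figures quoted in the abstract.
POP_2015, POP_2025 = 2898, 3971.791
PEAK_2015_KVA, PEAK_2025_KVA = 81.14, 261.79
ACCESS_2015, ACCESS_2025 = 0.45, 0.75

growth = implied_annual_growth(POP_2015, POP_2025, 10)
unit_2015 = kva_per_connected_person(PEAK_2015_KVA, POP_2015, ACCESS_2015)
unit_2025 = kva_per_connected_person(PEAK_2025_KVA, POP_2025, ACCESS_2025)

print(f"implied annual population growth: {growth:.4%}")
print(f"kVA per connected person, 2015: {unit_2015:.4f}")
print(f"kVA per connected person, 2025: {unit_2025:.4f}")
```

Under these assumptions the quoted populations imply growth of roughly 3.2% per year. The implied demand per connected person is higher in 2025 than in 2015, which suggests the paper's model also lets per-capita consumption rise over time, so a fixed per-person figure would not reproduce the 2025 forecast.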
This paper studies the issue of information security for smartphone users in Russia. The report analyses the regulations the state uses to prevent undeclared functionality and malicious programs in mobile phones in Russia; the law enforcement practice in this area; and the liability of legal entities, officials and individuals for non-compliance with standardization and information security requirements and for violation of the declaration of conformity. The paper develops proposals to improve state regulation of undeclared functionality in mobile devices that enables the collection of information, including confidential data. The report discusses specific ethical issues related to privacy, including compensation for damage resulting from the leakage of personal information, and develops proposals for legally ensuring the information security of mobile phone users. First, the report outlines the main actors, terms and concepts it uses. Second, it reviews the standards for mobile phone developers, although these offer no guarantee of complete information security. A peculiarity of Russia here is that the standards used in the field of information security are voluntary. Third, it examines how law enforcement agencies protect the user community. Here there is a potential danger that this protection may entail uncontrolled access by government agencies to confidential data.
BASE
In: International Free and Open Source Software Law Review; Vol 6, No 1 (2014); 51-60
This article gives a succinct introduction to the work of the Open Data User Group and the impact of demand-driven open data on the UK economy, society and political landscape. It highlights the global importance of transparency to societies and how the UK is at the forefront of this growing movement. It ends with a look into the future and at how Open Data could affect the lives of citizens and the activities of businesses.
BASE