Generating Data | Pollux - Fachinformationsdienst Politikwissenschaft

4212 results

Article (electronic) #46, 10 September 2024

Generating synthetic identifiers to support development and evaluation of data linkage methods

In: International journal of population data science: (IJPDS), Volume 9, Issue 5

Lam, Joseph; Boyd, Andy; Linacre, Robin; Blackburn, Ruth; Harron, Katie

ISSN: 2399-4908

We aimed to develop a framework for generating synthetic identifier datasets to support development and evaluation of data linkage methods. We evaluated whether replicating associations between attributes and identifiers improved the utility of the synthetic data for assessing linkage error.
We determined the steps required to generate synthetic identifiers that replicate the properties of real-world data collection. We generated synthetic versions of a large United Kingdom cohort study (the Avon Longitudinal Study of Parents and Children), according to the quality and completeness of identifiers recorded over several waves of the cohort. We evaluated the utility of the synthetic identifier data in terms of assessing linkage quality (false matches and missed matches).
Comparing data from two collection points in ALSPAC, we found within-person disagreement in identifiers (differences in recording due to both natural change and non-valid entries) in 18% of surnames and 12% of forenames. Rates of disagreement varied by maternal age and ethnic group. Synthetic data provided accurate estimates of linkage quality metrics compared with the original data (within 0.13-0.55% for missed matches and 0.00-0.04% for false matches). Incorporating associations between identifier errors and maternal age/ethnicity improved synthetic data utility.
Replicating dependencies between attribute values (e.g. ethnicity), values of identifiers (e.g. name), identifier disagreements (e.g. missing values, errors or changes over time), and their patterns and distribution structure enables generation of realistic synthetic data that can be used for robust evaluation of linkage methods.
Our framework provides a novel and generalisable mechanism for developing and benchmarking record linkage algorithms.

Article (electronic) #47, 1 July 2024

Generating synthetic identifiers to support development and evaluation of data linkage methods

In: International journal of population data science: (IJPDS), Volume 9, Issue 1

Lam, Joseph; Boyd, Andy; Linacre, Robin; Blackburn, Ruth; Harron, Katie

ISSN: 2399-4908

Introduction: Careful development and evaluation of data linkage methods is limited by researcher access to personal identifiers. One solution is to generate synthetic identifiers, which do not pose equivalent privacy concerns, but can form a 'gold-standard' linkage algorithm training dataset. Such data could help inform choices about appropriate linkage strategies in different settings.
Objectives: We aimed to develop and demonstrate a framework for generating synthetic identifier datasets to support development and evaluation of data linkage methods. We evaluated whether replicating associations between attributes and identifiers improved the utility of the synthetic data for assessing linkage error.
Methods: We determined the steps required to generate synthetic identifiers that replicate the properties of real-world data collection. We then generated synthetic versions of a large UK cohort study (the Avon Longitudinal Study of Parents and Children; ALSPAC), according to the quality and completeness of identifiers recorded over several waves of the cohort. We evaluated the utility of the synthetic identifier data in terms of assessing linkage quality (false matches and missed matches).
Results: Comparing data from two collection points in ALSPAC, we found within-person disagreement in identifiers (differences in recording due to both natural change and non-valid entries) in 18% of surnames and 12% of forenames. Rates of disagreement varied by maternal age and ethnic group. Synthetic data provided accurate estimates of linkage quality metrics compared with the original data (within 0.13-0.55% for missed matches and 0.00-0.04% for false matches). Incorporating associations between identifier errors and maternal age/ethnicity improved synthetic data utility.
Conclusions: We show that replicating dependencies between attribute values (e.g. ethnicity), values of identifiers (e.g. name), identifier disagreements (e.g. missing values, errors or changes over time), and their patterns and distribution structure enables generation of realistic synthetic data that can be used for robust evaluation of linkage methods.

Open Access #48, 1997

Generating and Demodulating

The views expressed in this report are those of the author and do not reflect the official policy or position of the Department of Defense or the United States Government. This paper discusses a technique for modulating and demodulating M-ary FSK using an FFT-based modem typical of Coded Orthogonal Frequency Division Modulation (COFDM) systems. COFDM is one of the more promising spectrally efficient, high data rate modulation techniques for line-of-sight communications between mobile platforms. This paper will show that legacy FSK radios like the AN/MRC-142 (binary FSK modulation at 144, 288, and 576 kbps) used by the U.S. Marine Corps could be easily implemented in a COFDM modem originally designed for higher data rates and better spectral efficiency than the legacy radio. In addition, the performance of such an implementation is analyzed in detail and shown to result in negligible performance degradation (0.05 dB or less). Digital processing speed requirements are analyzed and shown to be similar to those of digital implementations of conventional FSK receivers. MATLAB code is included that simulates the modem. This report was sponsored by NCCOSC RDTE Division. Approved for public release; distribution is unlimited.

BASE

Article (electronic) #49, 2022

Application of Generative Adversarial Networks (GANs) for Generating Synthetic Data and in Cybersecurity

Chenna, Sankalp

SSRN

Article (electronic) #50, 2022

Using Big Data for Generating Firm-Level Innovation Indicators – A Literature Review

In: ZEW - Centre for European Economic Research Discussion Paper No. 22-007

Rammer, Christian; Es-Sadki, Nordine

SSRN

Article (electronic) #51, 1 December 2004

Repeat receipts: a device for generating visible data in market research focus groups

In: Qualitative research, Volume 4, Issue 3, pp. 285-309

Puchta, Claudia; Potter, Jonathan; Wolff, Stephan

ISSN: 1741-3109

Market research focus groups generate three types of data: first, representatives of commissioning companies or organizations watch the group from behind a one-way mirror; second, they receive a video of the group discussion; third, they are given a report of the focus group. This article analyses how the required data are interactionally produced to be visible for the people behind the one-way screen, for the video and for the report. It describes the phenomenon of repeat receipts as a central device for producing visible data. Repeat receipts are sequences where the moderator repeats participants' contributions, typically with intonational cues that mark completion. Repeat receipts have several functions. They can (a) highlight central market-research relevant terms from participants' responses; (b) strip off rhetorical relations by repeating utterances in a decontextualized manner; (c) summarize contributions by repeating contributions of different authors as if of one voice; (d) cover conflict by repeating potentially contradictory contributions as discrete statements; (e) socialize responding by providing templates for the required contributions. Repeat receipts help shape the focus group interaction to generate visible data for the overhearing audience, the video and the report. The article ends with a comparison of repeats in market research focus groups, standardized surveys and news interviews.

Article (electronic) #52, 1 April 2015

Generating Investment Strategies Using Multiobjective Genetic Programming And Internet Term Popularity Data

In: Analele ştiinţifice ale Universităţii Alexandru Ioan Cuza din Iaşi: Annals of the "Alexandru Ioan Cuza" University of Iasi. Ştiinţe economice = Economic Sciences Section, Volume 62, Issue 1, pp. 55-62

Jakubéci, Martin

ISSN: 2068-8717

Abstract
Searching for stock-picking strategies can be modelled as a multiobjective optimization problem. The objectives are mostly profit and risk. Because these objectives conflict, we have to find Pareto optimal solutions. Multiobjective genetic programming (MOGP) can be used to find tree-based solutions, using evolutionary operators. The advantage is that this algorithm can combine any number of inputs and generate complex models. Recent research shows that the popularity of different terms on the internet can be used to enhance the models. This paper deals with a SPEA2 MOGP implementation, which uses Google Trends and Wikipedia popularity to find stock investment strategies.

Article (electronic) #53, March 2020

Generating invariants using design and data-centric approaches for distributed attack detection

In: International journal of critical infrastructure protection: IJCIP, Volume 28, p. 100341

Umer, Muhammad Azmi; Mathur, Aditya; Junejo, Khurum Nazir; Adepu, Sridhar

ISSN: 1874-5482

Article (electronic) #54, 27 November 2023

Generating synthetic data from administrative health records for drug safety and effectiveness studies

In: International journal of population data science: (IJPDS), Volume 8, Issue 1

Ayilara, Olawale F.; Platt, Robert W.; Dahl, Matt; Coulombe, Janie; Gonzalez Ginestet, Pablo; Chateau, Dan; Lix, Lisa M.

ISSN: 2399-4908

Introduction: Administrative health records (AHRs) are used to conduct population-based post-market drug safety and comparative effectiveness studies to inform healthcare decision making. However, the cost of data extraction and the challenges of privacy and securing approvals can make it difficult for researchers to conduct methodological research in a timely manner using real data. Generating synthetic AHRs that reasonably represent the real-world data is beneficial for developing analytic methods and training analysts to rapidly implement study protocols. We generated synthetic AHRs using two methods and compared these synthetic AHRs to real-world AHRs. We described the challenges associated with using synthetic AHRs for real-world studies.
Methods: The real-world AHRs comprised prescription drug records for individuals with healthcare insurance coverage in the Population Research Data Repository (PRDR) from Manitoba, Canada for the 10-year period from 2008 to 2017. Synthetic data were generated using the Observational Medical Dataset Simulator II (OSIM2) and a modification (ModOSIM). Synthetic and real-world data were described using frequencies and percentages. Agreement of prescription drug use measures in PRDR, OSIM2 and ModOSIM was estimated with the concordance coefficient.
Results: The PRDR cohort included 169,586,633 drug records and 1,395 drug types for 1,604,734 individuals. Synthetic data for 1,000,000 individuals were generated using OSIM2 and ModOSIM. Sex and age group distributions were similar in the real-world and synthetic AHRs. However, there were significant differences in the number of drug records and number of unique drugs per person for OSIM2 and ModOSIM when compared with PRDR. For the average number of days of drug use, concordance with the PRDR was 16% (95% confidence interval [CI]: 12%-19%) for OSIM2 and 88% (95% CI: 87%-90%) for ModOSIM.
Conclusions: ModOSIM data were more similar to PRDR than OSIM2 data on many measures. Synthetic AHRs consistent with those found in real-world settings can be generated using ModOSIM. Synthetic data will benefit rapid implementation of methodological studies and data analyst training.

Article (electronic) #55, 15 July 2021

Estimating latent traits from expert surveys: an analysis of sensitivity to data-generating process

In: Political science research and methods: PSRM, Volume 11, Issue 2, pp. 384-393

Marquardt, Kyle L.; Pemstein, Daniel

ISSN: 2049-8489

Abstract: Models for converting expert-coded data to estimates of latent concepts assume different data-generating processes (DGPs). In this paper, we simulate ecologically valid data according to different assumptions, and examine the degree to which common methods for aggregating expert-coded data (1) recover true values and (2) construct appropriate coverage intervals. We find that the mean and both hierarchical Aldrich–McKelvey (A–M) scaling and hierarchical item-response theory (IRT) models perform similarly when expert error is low; the hierarchical latent variable models (A–M and IRT) outperform the mean when expert error is high. Hierarchical A–M and IRT models generally perform similarly, although IRT models are often more likely to include true values within their coverage intervals. The median and non-hierarchical latent variable models perform poorly under most assumed DGPs.

Article (electronic) #56, 2020

Generating reliable tourist accommodation statistics: Bootstrapping regression model for overdispersed long-tailed data

In: Journal of Tourism, Heritage & Services Marketing, Volume 6, Issue 2, pp. 30-37

Van Truong, Nguyen; Shimizu, Tetsuo; Kurihara, Takeshi; Choi, Sunkyung

Purpose: Few studies have applied count data analysis to tourist accommodation data. This study investigated the characteristics of tourist accommodation data and sought the best-fitting models for estimating population totals from them.
Methods: Based on data for 10,503 hotels, obtained from a nationwide Japanese survey, the bootstrap resampling method was applied for re-randomisation of the data. Training and test sets were derived by randomly splitting each of the bootstrap samples. Six count models were fitted to the training set and validated with the test set. Bootstrap distributions for parameters of significance were used for model evaluation.
Results: The outcome variable (number of guests) was found to be heterogeneous, overdispersed and long-tailed, with excessive zero counts. The hurdle negative binomial and zero-inflated negative binomial models outperformed the other models. The accuracy (se) of the estimation of total guests with training sets that ranged from 5% to 85% was from 3.7 to 0.4, respectively. The results appear slightly overestimated.
Implications: Findings indicated that the integration of the bootstrap resampling method and count regression provides a statistical tool for generating reliable tourist accommodation statistics. The use of the bootstrap would help to detect and correct the bias of the estimation.

Article (electronic) #57, 2020

Generating Reliable Tourist Accommodation Statistics: Bootstrapping Regression Model for Overdispersed Long-Tailed Data

In: Journal of Tourism, Heritage & Services Marketing (JTHSM), 2020, Vol. 6, No. 2, pp. 30-37, DOI: 10.5281/zenodo.3837608

Van Truong, Nguyen; Shimizu, Tetsuo; Kurihara, Takeshi; Choi, Sunkyung

SSRN

Article (electronic) #58, 1 November 2024

Extracting Data from Medical Records for Monitoring Diseases and Generating Medical Alerts

In: Romanian Journal of Military Medicine, Volume 127, Issue 6, pp. 448-454

Vîrgolici, Oana; Bologa, Ana R.; Costache, Raluca S.; Costache, Andrei C.; Vîrgolici, Bogdana

Affiliations: Academy of Economic Studies, Bucharest; Carol Davila University of Medicine and Pharmacy, Bucharest; Academy of Romanian Scientists, Bucharest

ISSN: 2501-2312

Background: Automated data processing means creating and implementing technology that processes data automatically. Such a computer tool is recommended for doctors because it supports their everyday work, assists in medical diagnosis, and enhances patient care. The aim of this paper is to propose a software tool that extracts the values of selected parameters from blood test sheets in order to generate medical alerts and monitor chronic disease. Methods: An application, written in Python, was developed. Results: The tool automatically extracted the values of glucose, triglycerides, HDL-cholesterol, total cholesterol, and LDL-cholesterol from medical sheets (text-based file or graphic file, respectively), saved them in a database, and retrieved and plotted the most recent values of these parameters; alerts were generated according to metabolic syndrome criteria and the Framingham risk score. Conclusions: This tool contributes to the management of the medical process, saving precious time and helping the doctor detect current or future health problems.

Article (electronic) #59, 29 July 2016

A note on generating correlated matched-pair binary data through conditional linear family

In: Communications in statistics. Theory and methods, Volume 46, Issue 16, pp. 8059-8068

Zhou, Ming; Yang, Zhao

ISSN: 1532-415X

Article (electronic) #60, 1 June 2009

Income-Generating Efficiency of Public Sector Banks in India: An Application of Data Envelopment Analysis

In: Artha Vijnana: Journal of The Gokhale Institute of Politics and Economics, Volume 51, Issue 2, p. 103

Kumar, Sunil; Gulati, Rachita
