I first became interested in Biostatistics when the SARS epidemic broke out

I first became interested in Biostatistics when the SARS epidemic broke out in China in 2003. During this period, society was in a huge panic: people wore masks and used vinegar and medicine to protect themselves from the disease. Everyone would flinch when someone coughed or sneezed. As a primary school student, the only thing I could do was write encouraging letters to the doctors and nurses working in the hospital. It was a very scary time in my life. This experience gave me the determination to devote myself fully to creating interdisciplinary and innovative solutions to disease, and to disseminating and teaching science effectively.

My undergraduate education was at Southern Medical University, where I majored in Bioinformatics from September 2012 to June 2016. Although my overall GPA was not very high, I ranked in the top two in many of my major courses. In my last two years as an undergraduate, I took courses such as Bioinformatics, Genetics, Research Training in Bioinformatics, Statistical Methods in Bioinformatics, and Programming Language in Bioinformatics, which systematically acquainted me with theoretical knowledge of biomedicine and information science as well as the basic methods of Biostatistics and computational biology. Through this coursework, I was amazed at the seemingly inexhaustible data and yearned to unveil the information and significance behind it.

After my undergraduate degree, I worked as a bioinformatics engineer at Beijing Novogene Technology Corporation. The main responsibility of our department was to provide personalized analysis of Next Generation Sequencing (NGS) data for professors and researchers from universities around the world. NGS technologies generate large amounts of sequencing data in a relatively short time, which enables a wide range of genetic analysis applications and accelerates advances in research, clinical, and applied markets.
In my job, I assisted many professors and researchers in completing personalized NGS analyses, such as virus integration detection, Patient-Derived Xenograft model analysis, and comparison of structural variation calling tools. This work gave me in-depth knowledge of the principles and procedures of statistical analysis of sequencing data, and awakened me to the importance and appeal of independent learning in research projects. I also made great progress in my computing skills, including R, Python, HTML, and Linux. In addition to my regular work, I was selected as one of the instructors of an NGS analysis workshop for researchers and professors in Macao, China. Bolstered by these experiences, I am fully confident about my future exploration of sequencing data analysis. However, I gradually realized that NGS analysis is just a technology, and that what is really useful is the statistical thinking behind it. Thus, in order to become a user of big data, not just a follower, I came to Georgetown University to continue my studies in Biostatistics and obtained my master's degree in December 2018.

As a graduate student, I took several courses, including Probability and Sampling, Statistical Inference, Linear Model, Categorical Data Analysis, Survival Analysis, Machine Learning for Bioinform and Data Science. Through my studies, I became deeply fascinated by statistics. With its rigorous theory, statistics has significantly influenced biological research. It is the art of making informed decisions from limited and uncertain samples. I also believe I have gained a better understanding of how to combine mathematical statistics, applied statistics, and statistical computing to analyze problems in various biological fields. Owing to my strong grades, I became a student research assistant in my second and third semesters. This academic experience has meant the most to me in my educational career.
My professor and I developed a novel method to estimate the functional structure of the dose-response of drug combinations from dose-response data for single drugs and pathway knowledge. Such integrated network prediction with experimental validation is a new trend in the field of combinatorial prediction. The work is therefore innovative in bringing forth a framework for selecting drug interactions based on multiple biological pathways and single dose experimental data. For me, this was a genuine scientific exploration, during which I made many attempts and failed multiple times, but eventually surmounted the difficulties and succeeded. Through this project, I learned and applied many mathematical and statistical methods in depth, including the Functional ANOVA Model, the Quasi-Monte Carlo Method, the Expectation Conditional Maximization (ECM) algorithm, and the Structural Equation Model. We will further validate the method in triple negative breast cancer (TNBC) cell lines and expect to find an effective drug combination.

Furthermore, to develop and apply my academic knowledge and software skills in Biostatistics, I participated in other biostatistics projects. In the project "Landscape of Genome-Wide Transcriptional Regulation of Repetitive Elements in Tumorigenesis," I designed and applied bioinformatics algorithms for integrative analysis of repetitive elements and transcript expression using high-throughput sequencing data. In the project "Breast Cancer Therapy and Circulating Tumor Cells: Survival Study," I compared progression-free survival and overall survival among groups using four tests: the log-rank test, the Wilcoxon test, the Tarone-Ware test, and the Peto-Prentice test. In the project "An Analysis of Associations of Smoking and Hormone Levels," I examined the effect of tobacco smoke exposure on each hormone using pairwise tests and generalized linear models.
Through these projects, I have not only acquired theoretical knowledge of mathematical statistics, applied statistics, and statistical computing, but also developed my ability to perform statistical analysis and apply statistical tools. I have also laid a solid foundation in programming and statistical software, such as SAS, R, MATLAB, and LaTeX.

After an in-depth evaluation of my personal characteristics, educational background, and research and practical experience, I have confirmed that Biostatistics is my foremost interest. I believe that my work and academic experience in Bioinformatics and Biostatistics have prepared the ground for my future study. Pursuing further study in Biostatistics at the UCSD Division of Biostatistics and Bioinformatics is my short-term goal. Indeed, I believe that the reputation of UCSD, coupled with the top-notch faculty and academic excellence of the Biostatistics program, will prepare me well to stand out in future competition. As for my long-term goal, I would like to be a biostatistician at a top university and devote my life to creating interdisciplinary and innovative solutions to complex epidemic and genetic diseases. I know only too well that I may not uncover all the secrets, but I still hope to look into the essence of life through constant research and exploration. To achieve my academic pursuit and fulfill my career goal, I wish to be considered for the PhD program in Biostatistics at the UCSD Division of Biostatistics and Bioinformatics. I would like to delve into fields such as semiparametric and nonparametric statistics, Bayesian statistics, machine learning, bioinformatics, and computational biology. Well-constructed courses such as Mathematical Statistics, Biostatistical Methods, Advanced Multivariate Methods, Bayesian Methods, Time Series Analysis, and Data Mining and Predictive Analytics, as well as the Biostatistics Rotations, will deepen my hands-on knowledge of statistics and analysis.
I am therefore confident that my well-considered decision to choose your program will help me realize my dreams, and I am sure that I can make UCSD proud of me as a biostatistician someday.