Big Data Assignment 2


BIG DATA AND ANALYTICS
ASSIGNMENT REPORT

Pardeep (30375852)
Shubham Gupta (30366431)
Malkeet Singh (30375831)

Table of Content
Task 1
Task 2
Task 3
Task 4 (combination of Tasks 2 and 3, which are explained together)
Task 5
Task 6 Summary
Task 7 Reflections
References

Task 1
A dataset may link to data in other datasets. In this task we created a dataset from the given data source and recorded the name, data type and length of each variable. We then determined the classification and category of the data stored in the data source and drew on the readmit-historical library to build our dataset. The aim is that any record from the past can be retrieved quickly whenever it is needed. This also supports surveys for the organisation, which can use them to correct mistakes or problems faced in the past and to work better.
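The data dictionary step described above (name, data type, length and classification for each variable) can be sketched in Python. This is only an illustration, not the SAS workflow used in the assignment: the sample rows, the `build_dictionary` helper, the 8-byte length for numeric variables and the rule that numeric variables are measures while character variables are categories are all assumptions made for the example.

```python
# Hypothetical sample rows; the real data source has many more variables.
rows = [
    {"PATIENT_AGE": 67, "GENDER": "F", "ICU_DAYS": 3},
    {"PATIENT_AGE": 54, "GENDER": "M", "ICU_DAYS": 2},
]

def build_dictionary(rows):
    """Return one entry per variable: name, datatype, length, classification."""
    entries = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        numeric = all(isinstance(v, (int, float)) for v in values)
        entries.append({
            "name": name,
            "datatype": "Numeric" if numeric else "Character",
            # Assumed convention: numerics take 8 bytes,
            # character length is the longest value seen.
            "length": 8 if numeric else max(len(str(v)) for v in values),
            # Assumed rule: numeric -> Measure, character -> Category.
            "classification": "Measure" if numeric else "Category",
        })
    return entries

data_dictionary = build_dictionary(rows)
for entry in data_dictionary:
    print(entry["name"], entry["datatype"], entry["length"], entry["classification"])
```

A simple rule like this only gives a first draft; identifiers and codes stored as numbers would still need to be reviewed by hand.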
Visualization brings change slowly and takes time, but the changes it brings are valuable and remain usable in the future. There are many visualisation techniques, built on different charts such as line charts and bar charts; using them makes the data interactive and easy for everyone to understand. The main benefits are:
- They are faster to process than other techniques, because the visualizations take less time to produce.
- They make it easy to work with the historical data and to access it when it is needed.
- They connect the operations performed by users with the results, which is very helpful for businesses.
- They are easy to use once learned and are safer than other techniques available in the market, which encourages wider use.
- They make it easy to interact with the data provided by the user; information about any individual in the data can be accessed quickly, which can help the business increase its sales.
- They present the data in a well-organised form that is easy to understand and access.
- Machine learning is a powerful approach with few limits, but we consider it less safe, so we recommend using these visualizations.

Task 2 & 4
Q1. Data set of table: data dictionary for the data source group.
Name                    Datatype   Length  Classification  Description
ENCOUNTER_KEY           Numeric    8       Measure         Encounter ID
PATIENT_NUMBER          Numeric    8       Measure         Patient ID
GENDER                  Character  1       Category        Gender
RACE_CD                 Character  6       Category        Race
PATIENT_AGE             Numeric    8       Measure         Age
DIAGNOSIS_GROUP         Character  4       Category        Diagnosis Group
ICD9_TARGET             Numeric    8       Measure         International Classification of Diseases Code
MS_DRG_CODE             Numeric    8       Measure         Diagnosis Related Group Code
MS_DRG_DESC             Character  68      Category        Diagnosis Related Group Description
DRG_APR_CODE            Character  5       Category        All Patient Refined Diagnosis Related Group Code
DRG_APR_DESC            Character  70      Category        All Patient Refined Diagnosis Related Group Description
DRG_APR_SEVERITY        Character  1       Category        All Patient Refined Diagnosis Related Group Severity
DIAGNOSIS_SUBCAT_CODE   Numeric    8       Measure         Diagnosis Subcategory Code
DIAGNOSIS_SUBCAT_DESC   Character  24      Category        Diagnosis Subcategory Description
DIAGNOSIS_ICD_CODE      Numeric    8       Measure         Diagnosis International Classification of Disease Code
DIAGNOSIS_LONG_DESC     Character  148     Category        Diagnosis Long Description
PROCEDURE_SUBCAT_CODE   Character  2       Category        Procedure Subcategory Code
PROCEDURE_SUBCAT_DESC   Character  24      Category        Procedure Subcategory Description
PROCEDURE_ICD_CODE      Character  5       Category        Procedure International Classification of Disease Code
PROCEDURE_LONG_DESC     Character  81      Category        Procedure Long Description
DX_CODE                 Numeric    8       Measure         Diagnosis Code
DX_GROUP                Character  69      Category        Diagnosis Group
DOCTOR                  Character  8       Measure         Doctor
ADMIT_DATE              Date       8       Category        Admission Date
DISCHARGE_DATE          Date       8       Category        Discharge Date
READMIT_DATE            Date       8       Category        Readmission Date
READMIT_DISCHARGE_DATE  Date       8       Category        Readmission Discharge Date
READMIT_LOS             Numeric    8       Measure         Readmission Length of Stay
ADMIT_LOS               Numeric    8       Measure         Original Admission Length of Stay
ICU_DAYS                Numeric    8       Measure         Number of Days in the Intensive Care Unit
DEPARTMENT              Character  15      Category        Hospital Department
DISCHARGED_TO           Character  24      Category        Place of Discharge
NUM_VISITS              Numeric    8       Measure         Number of Visits
STANDARD_ORDERS_USED    Character  1       Category        Whether Standard Orders Were Used
NUM_CHRONIC_COND        Numeric    8       Measure         Number of Chronic Conditions
DISCH_NURSE_ID          Character  8       Measure         Discharge Nurse ID
ADMIT_MTH               Numeric    8       Measure         Admission Month
READMIT_MTH             Numeric    8                       Readmission Month
ORDER_SET_USED          Numeric    8       Measure         Number of Order Sets Used
ORDER_TOTAL_CHARGES     Currency   8       Measure         Order Total Charges in Dollars
READMITTED              Numeric    8       Measure         Whether the Patient was Readmitted
OPERATION_COUNT         Numeric    8       Measure         Number of Operations
HOSPITAL                Character  7       Category        Hospital
ZIP                     Numeric    8       Measure         Zipcode
STATECODE               Character  2       Category        State
CITY                    Character  22      Category        City
COUNTY_NAME             Character  15      Category        County
X                       Numeric    8       Measure         Longitude
Y                       Numeric    8       Measure         Latitude
REGION                  Character  9       Category        Region

Here we created a data dictionary from the given data. From the data library we recorded the name, type, length and classification of each variable. We built the dictionary by combining that dataset with the historical dataset, which gives a better dictionary for understanding the results.

Q2. Analysis of the average number of ICU days with respect to diagnosis group and gender?
We created a dashboard showing each diagnosis group by gender (male and female), with the average number of days that men and women respectively spend in the ICU, as shown in the above chart and especially in the crosstab. Both genders have almost the same values, so they spend almost the same number of days in the ICU (about 3 days).

Q3. Least and most common diagnosis group for every region?
The least and most common groups for each region are shown in the above crosstab.

Q4. Least and most popular diagnosis group for every region?
Heart failure is the most common disease because it has the highest number of patients. The least common are Bronchiectasis and Bronchopneumonia, which have the fewest patients.

Q5. The top five hospital departments by number of patients?
The top five departments are Heart, General Med, Hosp 46, Oncology and Transplant, as shown in the above dashboard with different visualizations.

Q6. Top three regions having the maximum number of female patients?
After filtering, the top three regions by number of female patients are region 3, region 8 and region 11. Region 11 is at the top, followed by region 8 and then region 3.

Q7. Top 5 places to which the maximum number of patients are discharged?
The top 5 places to which patients are discharged are Routine Discharge, Home health academy, skilled nursing, other death and hospice (home), as clearly shown in the above charts.

Q8. Top 3 regions for the black race?
The top 3 regions with respect to the black race are region 11, region 3 and region 8, with region 11 in the top position and region 8 and region 3 second and third respectively.

Q9. Top 5 hospitals having asthma patients?
The top 5 hospitals by number of asthma patients are Hosp 13, Hosp 18, Hosp 28, Hosp 35 and Hosp 8, as clearly shown in the above figures or charts.

Q10. Months in which the maximum number of patients are admitted to hospitals?
November to March are the most active months for patient visits, and July to November are the least active months, as shown in the bar graph.

Q11. Top 3 regions in which patients spend the maximum number of days in hospital?
The top three regions by number of days spent in hospital are region 9, region 4 and region 7, as shown in the above dashboard.

Q12. Top 10 cities having the maximum number of patients?
The top cities by number of patients are Delray Beach, Miami, Hobe Sound, Fort Lauderdale, Lake Worth, Orlando, Zellwood and Miami Beach.

Q13. Show the trend of patients who were admitted between October 2011 and June 2012, by region and gender?
In this question we analyse the trend of male and female patients admitted from October 2011 to June 2012, shown with an automatic chart and a tabular visualization. Regions 8 and 9 show the largest ups and downs in the visualization.

Q14. The most and the least popular months for Q9 at the same time.
This task shows the asthma patients with respect to the most and least popular months. The female graph is above about 1100 and the male graph is at about 800.

Q15. Show the trend for patients who were diagnosed with CHF between January 2012 and June 2012?
This task covers the CHF diagnoses between January 2012 and June 2012. The number of patients admitted is highest in March, and overall the graph fluctuates.

Q16. Show the trend for all the diagnosis groups over the year, month by month?
The diagnosis groups are AMI, CHF and COPD. CHF has the maximum value and COPD has the least value.

Q17. Display the top 5 departments that performed the maximum number of operations in different months
The top 5 departments in terms of number of operations are general medicine, heart, hosp40, oncology and transplant, with the order changing from month to month.

Q18. Display the best predictor for heart disease?
The most appropriate predictor of heart disease has ID 0 and a count value of 192.

Q19. Display all the hospitals in our database on the map
In the above geomap, the dots represent the hospitals that have patients, placed in their respective cities. A geomap is used because locations can only be shown on maps.

Q20. Make a record of all the patients in the form of clusters
For the cluster analysis, the patient data was taken from the data visualization.
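How cluster IDs come to be assigned automatically can be illustrated with a small k-means sketch in Python. This assumes a k-means-style method, which this report does not confirm for the SAS tool; the patient ages, the starting centres and the `kmeans_1d` helper are all made up for the example.

```python
def kmeans_1d(values, centres, rounds=10):
    """Assign each value to its nearest centre, then move each
    centre to the mean of the values assigned to it."""
    centres = list(centres)
    ids = []
    for _ in range(rounds):
        # The cluster ID of a value is the index of its closest centre.
        ids = [min(range(len(centres)), key=lambda c: abs(v - centres[c]))
               for v in values]
        for c in range(len(centres)):
            members = [v for v, i in zip(values, ids) if i == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = sum(members) / len(members)
    return ids, centres

ages = [22, 25, 27, 61, 64, 70]  # hypothetical patient ages
cluster_ids, centres = kmeans_1d(ages, centres=[20, 80])
print(cluster_ids)  # one cluster ID per patient
print(centres)      # the average age in each cluster
```

Each patient ends up with the ID of the group whose average is closest, so a cluster can be read as "patients who look alike on the chosen variables".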
Cluster IDs are formed automatically. As students, we could not fully interpret the clusters, because that requires more advanced knowledge.

Task 3 & 4
Q1. Top 3 departments in the top 3 cities?
The above dashboard shows the top three departments in the cities using different visualizations.

Q2. Gender ratio in the top 5 cities?
Gender ratio in different cities.

Q3. Top 5 hospitals having the maximum number of doctors?
The top 5 hospitals by number of doctors working there.

Q4. Top 3 hospitals by diagnosis group?
Diagnosis is performed well in these three hospitals.

Task 5
The cluster analysis is done on the patient information we have, and the cluster ID reflects information about the patients with respect to age. The clusters are shown in different colours, which makes them easier to read for data analytics. The cluster matrix shows circles in different colours. As students, we could not fully interpret the cluster analysis with our current knowledge. All the hospitals have almost the same number of patients, as the bar chart shows, so the differences between hospitals are not clearly visible there. That is why we need a line chart or crosstab, in which the differences are easy to identify.

Task 6 Summary of report
- Records of the disease analysis can easily be retrieved from the historical data with the SAS tool, which helps in getting more detail about the work.
- With these analyses we can examine the gender ratio across hospitals in different cities and even across the country.
- Everything held in the database or the big data can be accessed easily with the help of these analyses.
- While building visualisations in the SAS tool we can predict disease, and the best doctors or hospitals can be suggested to the patient, so that the patient can be treated by professionals and live peacefully.
- After analysing the given historical data, we can say that females suffer from the different diseases more than men. Even within the black race, females are more affected.
- The geomap shows the related information with patient numbers in different cities.
- Finally, we recommend that the organisation work harder to improve the situations or areas where more people are suffering from diseases.

Task 7 The Reflection
In this assignment we were given the readmit-historical data of the US Health and Human Services, and we needed to analyse the data to make it easy to understand. We used the SAS analysis tool provided on the Teradata University Network, which we accessed on its site to analyse the data. After the analysis, big data that was difficult to read or understand became easy to understand, even for new employees.
We used different visualizations from SAS analytics because people understand and remember data more easily by looking at images. So we used different types of charts, tables, crosstabs, geomaps, clusters, etc., to make the analysis easy to follow.
The whole work went smoothly under the guidance of our tutor, who helped at every point where we were stuck and did not know what to do.
We have three members in our team. The work done by each team member is as follows:
- Pardeep wrote the report, helped with all the questions and contributed additional questions 1 and 2 himself.
- Shubham made the presentation, which was well explained by all the team members, and also helped in solving the questions.
- Malkeet helped in delivering the presentation smoothly and helped in solving some questions.
We got stuck at times while doing the work, but with the help of the team we completed our task. We used the Teradata University Network for the data visualization.
Thanks to our teacher and university for giving us the opportunity to do this assignment, through which we learned about big data. Assignments of this type will help us in our jobs.

REFERENCES
7 Benefits of Data Visualization – DZone Big Data. (2019). from
Analytics – Key Attributes, Scope and Advantages. (2019). from