Bitsight Technologies - Boston, MA
posted 2 months ago
As part of the security analytics team within the data science team, the Data Analyst supports thought leadership by exploring datasets, conducting insightful data analysis, and generating intuitive visualizations. The role will also help multiple internal stakeholders understand datasets, patterns, and trends.
The Data Analyst will conduct Exploratory Data Analysis (EDA) using EDA packages in Python or R to explore and understand datasets. This work includes identifying and removing or replacing missing or incomplete values, deduplicating across multiple datasets, and providing statistical descriptions of data fields. In addition, the Data Analyst will devise and implement data analytics approaches for aligning external datasets to the company's data by identifying commonalities and combining different references to the same entities. The role requires the use of data science and statistics software tools in Python or R to conduct statistical analyses and statistical significance tests, and to evaluate correlation strengths across multiple datasets and data fields.
The Data Analyst will also use SQL, Python, and pandas or R to construct time- and space-efficient queries to probe and distill large datasets, leveraging deep knowledge of query languages and approaches. Familiarity with Big Data storage and computation platforms and tools (e.g., Amazon S3 and EMR, PySpark) is essential for working efficiently with large and complex datasets. Visualization tools and software packages (e.g., Tableau, seaborn) will be used to graph large, complex datasets in a simple, intuitive format that presents findings clearly and concisely to stakeholders.
The Data Analyst will collaborate with the company's Thought Leadership team to generate content for blogs, white papers, reports, and other external-facing materials. The Data Analyst will also work with the Strategic Partnerships team by combining internal and partner-provided datasets and making value propositions about the combined data. The position is based in a fixed location; however, 100% telecommuting is permitted.