The Kuaishou livestream analytics team is hiring a business data analysis intern; first- and second-year master's students and fourth-year undergraduates are preferred.
In addition, this position is also open via referral for experienced hires; interested candidates are welcome to apply or reach out with questions!
Working hours (hard requirement): full-time internship, on site 5 working days per week, for a continuous period of at least 6 months; outstanding performers may be offered a full-time position.
Location: Kuaishou headquarters, Xi'erqi.
Defining a product's core metrics is extremely important: a well-chosen core metric can effectively point the product in the right direction.
1. Traffic source: where visits to the site come from, for example users arriving from Zhihu or from Weibo. Mainly used to analyze the effectiveness of each promotion channel.
2. PV: PV (page view) is the number of page views or hits, i.e. how many times pages are loaded; every page load counts as one PV.
3. UV: UV (unique visitor) is the number of distinct visitors. Within a single day, a visitor with a given independent IP is counted only on their first visit to the site; further visits on the same day are not counted again. The PV/UV ratio reflects product stickiness to some extent: a higher ratio usually means higher stickiness.
4. IP count: the number of distinct IP addresses from which the site was visited within one day. The IP count may differ from UV (it can be larger, smaller, or equal).
5. DAU / MAU: daily active users (DAU) and monthly active users (MAU) reflect how active, and how sticky, the users of a site or app are.
6. Next-day / next-month retention: the share of users who come back the next day or the next month, reflecting the retention rate of the site or app.
7. User retention rate: the ratio of users who still qualify as valid users within a given period to the number of users actually acquired; also simply called user retention.
8. Conversion rate / churn rate: conversion rate measures the proportion of users who move from one step of a flow to the next. Churn rate is an equally important metric: churn rate = churned users / total users.
9. Bounce rate: the percentage of visits in which the user landed on the site and left after viewing only one page, out of all visits. The higher the bounce rate, the less engaging the site.
10. Exit rate: for a specific page, the percentage of visits that left the site from that page, out of all visits to that page. Bounce rate applies only to landing pages (the first page of a visit), whereas exit rate applies to any page from which a visit ends.
11. Time spent: how much time users spend in the product each day. For games or social products, longer usage generally means users like the product more; conversely, shorter usage time usually indicates weaker stickiness.
12. ARPU: Average Revenue Per User over a given period, ARPU = total revenue / number of users (see the short sketch after this list).
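Most of these metrics reduce to simple ratios over raw counts. Here is a minimal Python sketch, with made-up numbers purely for illustration:

```python
# Hypothetical daily counts, purely for illustration.
page_views = 120_000      # PV: total page loads
unique_visitors = 30_000  # UV: distinct visitors
churned_users = 1_500     # users lost during the period
total_users = 50_000      # user base for the period
total_revenue = 25_000.0  # revenue for the period

pv_uv_ratio = page_views / unique_visitors  # rough stickiness signal
churn_rate = churned_users / total_users    # churn rate = churned / total users
arpu = total_revenue / total_users          # ARPU = revenue / users

print(f"PV/UV: {pv_uv_ratio:.1f}  churn: {churn_rate:.1%}  ARPU: {arpu:.2f}")
```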
The rapid development of AI has taken machine learning to new heights. Here we take the opportunity to review the most popular machine learning (ML) and Python libraries of 2017, in the hope that you will find your go-to tools for the months ahead among them.
We couldn't make this list without reserving the top spot for a tool that was only released early this year, but has the power to affect the workflow of every Python developer, especially now that it has become the officially recommended tool on Python.org for managing dependencies!
Pipenv, originally started as a weekend project by the awesome Kenneth Reitz, aims to bring ideas from other package managers (such as npm or yarn) into the Python world. Forget about installing virtualenv, virtualenvwrapper, managing requirements.txt files and ensuring reproducibility with regards to versions of dependencies of the dependencies (read here for more info about this). With Pipenv, you specify all your dependencies in a Pipfile — which is normally built by using commands for adding, removing, or updating dependencies. The tool can generate a Pipfile.lock file, enabling your builds to be deterministic, helping you avoid those difficult to catch bugs because of some obscure dependency that you didn’t even think you needed.
Of course, Pipenv comes with many other perks and has great documentation, so make sure to check it out and start using it for all your Python projects, as we do at Tryolabs :)
If there is a library whose popularity has boomed this year, especially in the Deep Learning (DL) community, it’s PyTorch, the DL framework introduced by Facebook this year.
PyTorch builds on and improves the (once?) popular Torch framework, especially since it’s Python based — in contrast with Lua. Given how people have been switching to Python for doing data science in the last couple of years, this is an important step forward to make DL more accessible.
Most notably, PyTorch has become one of the go-to frameworks for many researchers because of its implementation of the novel Dynamic Computational Graph paradigm. When writing code with other frameworks like TensorFlow, CNTK or MXNet, one must first define something called a computational graph. This graph specifies all the operations that the code will run, and is later compiled and potentially optimized by the framework so that it can run even faster, in parallel on a GPU. This paradigm is called the Static Computational Graph, and it is great because you can leverage all sorts of optimizations and the graph, once built, can potentially run on different devices (since execution is separate from building). However, in many tasks such as Natural Language Processing the amount of "work" to do is often variable: you can resize images to a fixed resolution before feeding them to an algorithm, but you cannot do the same with sentences, which come in variable lengths. This is where PyTorch and dynamic graphs shine: since you can use standard Python control flow in your code, the graph is defined as it executes, giving you a lot of freedom that is essential for several tasks.
Of course, PyTorch also computes gradients for you (as you would expect from any modern DL framework), is very fast, and extensible, so why not give it a try?
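To make the dynamic-graph idea concrete, here is a minimal sketch using the 2017-era Variable API; the tensor sizes and the random loop length are made up for illustration:

```python
import random

import torch
from torch.autograd import Variable  # not needed in later PyTorch versions

x = Variable(torch.randn(5, 3), requires_grad=True)
W = Variable(torch.randn(3, 3), requires_grad=True)

# Ordinary Python control flow defines the graph as the code runs:
# the number of matrix multiplications can differ on every forward pass.
h = x
for _ in range(random.randint(1, 3)):
    h = h.mm(W).clamp(min=0)

loss = h.sum()
loss.backward()        # gradients flow through whatever graph was actually built
print(W.grad.size())
```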
It might sound crazy, but Facebook also released another great DL framework this year.
The original Caffe framework has been widely used for years, and is known for unparalleled performance and a battle-tested codebase. However, recent trends in DL made the framework stagnate in some directions. Caffe2 is the attempt to bring Caffe into the modern world.
It supports distributed training, deployment (even on mobile platforms), the newest CPUs and CUDA-capable hardware. While PyTorch may be better suited for research, Caffe2 is suitable for large-scale deployments, as seen at Facebook.
Also, check out the recent ONNX effort. You can build and train your models in PyTorch, while using Caffe2 for deployment! Isn’t that great?
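As a rough illustration of that workflow, here is a hedged sketch of exporting a PyTorch model to ONNX so that a Caffe2 backend can pick it up; the toy model and file name are made up:

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# A toy model standing in for whatever you trained in PyTorch.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
dummy_input = Variable(torch.randn(1, 10))  # example input that traces the graph

# Export to the ONNX interchange format; Caffe2 (among other backends)
# can then load "model.onnx" for serving.
torch.onnx.export(model, dummy_input, "model.onnx", export_params=True)
```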
Last year, Arrow, a library that aims to make your life easier while working with datetimes in Python, made the list. This year, it is the turn of Pendulum.
One of Pendulum's strengths is that it is a drop-in replacement for Python's standard datetime class, so you can easily integrate it with your existing code and leverage its functionality only when you actually need it. The authors have taken special care to ensure timezones are handled correctly, making every instance timezone-aware and UTC by default. You will also get an extended timedelta to make datetime arithmetic easier.
Unlike other existing libraries, it strives to have an API with predictable behavior, so you know what to expect. If you are doing any non-trivial work involving datetimes, this will make you happier! Check out the docs for more.
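A minimal sketch of what that looks like in practice (the timezone names are just examples):

```python
import pendulum

now = pendulum.now("Europe/Paris")  # timezone-aware out of the box
later = now.add(days=2, hours=3)    # readable datetime arithmetic

print(later.in_timezone("UTC"))     # painless timezone conversion
print(later.diff_for_humans())      # e.g. "in 2 days"
```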
You are doing data science, for which you use the excellent available tools in the Python ecosystem like Pandas and scikit-learn. You use Jupyter Notebooks for your workflow, which is great for you and your colleagues. But how do you share the work with people who do not know how to use those tools? How do you build an interface so people can easily play around with the data, visualizing it in the process? It used to be the case that you needed a dedicated frontend team, knowledgeable in Javascript, for building these GUIs. Not anymore.
Dash, announced this year, is an open source library for building web applications, especially those that make good use of data visualization, in pure Python. It is built on top of Flask, Plotly.js and React, and provides abstractions that free you from having to learn those frameworks and let you become productive quickly. The apps are rendered in the browser and are responsive, so they are usable on mobile devices.
If you would like to know more about what is possible with Dash, the Gallery is a great place for some eye-candy.
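To give a feel for the API, here is a minimal, hypothetical Dash app using the 2017-era package layout; the layout contents are made up:

```python
import dash
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash(__name__)

# The whole UI is declared in Python; Dash renders it with React in the browser.
app.layout = html.Div([
    html.H1("Hello Dash"),
    dcc.Graph(
        id="example-chart",
        figure={"data": [{"x": [1, 2, 3], "y": [4, 1, 2], "type": "bar"}]},
    ),
])

if __name__ == "__main__":
    app.run_server(debug=True)
```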
There are many libraries in Python for doing data science and ML, but few are built specifically for data points that are metrics evolving over time (such as stock prices or measurements obtained from instruments).
PyFlux is an open source library in Python built specifically for working with time series. The study of time series is a subfield of statistics and econometrics, and its goals include describing how time series behave (in terms of latent components or features of interest) and predicting how they will behave in the future.
PyFlux allows for a probabilistic approach to time series modeling, and has implementations for several modern time series models like GARCH. Neat stuff.
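A hedged sketch of what fitting one of those models might look like; the synthetic returns series is made up for illustration, and the exact API may differ between PyFlux versions:

```python
import numpy as np
import pandas as pd
import pyflux as pf

# Synthetic daily returns standing in for a real financial time series.
returns = pd.Series(np.random.randn(500) * 0.01, name="returns")

model = pf.GARCH(data=returns, p=1, q=1)  # GARCH(1,1) volatility model
result = model.fit()                      # maximum likelihood estimation by default
result.summary()                          # parameter estimates and standard errors
```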
It is often the case that you need to make a Command Line Interface (CLI) for your project. Beyond the traditional argparse, Python has some great tools like click or docopt. Fire, announced by Google this year, has a different take on solving this same problem.
Fire is an open source library that can automatically generate a CLI for any Python project. The key here is automatically: you almost don’t need to write any code or docstrings to build your CLI! To do the job, you only need to call a Fire method and pass it whatever you want turned into a CLI: a function, an object, a class, a dictionary, or even pass no arguments at all (which will turn your entire code into a CLI).
Make sure to read the guide so you understand how it works with examples. Keep it on your radar, because this library can definitely save you a lot of time in the future.
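A minimal sketch; the greet function and its flags are made up for illustration:

```python
import fire

def greet(name="world", shout=False):
    """Return a greeting for `name`."""
    message = f"Hello, {name}!"
    return message.upper() if shout else message

if __name__ == "__main__":
    fire.Fire(greet)  # exposes e.g.: python greet.py --name=Ada --shout
```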
In an ideal world, we would have perfectly balanced datasets and we would all train models and be happy. Unfortunately, the real world is not like that, and certain tasks come with heavily imbalanced data. For example, when predicting fraud in credit card transactions, you would expect that the vast majority of the transactions (+99.9%?) are actually legit. Training ML algorithms naively will lead to dismal performance, so extra care is needed when working with these types of datasets.
Fortunately, this is a studied research problem and a variety of techniques exist. Imbalanced-learn is a Python package which offers implementations of some of those techniques, to make your life much easier. It is compatible with scikit-learn and is part of scikit-learn-contrib projects. Useful!
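A hedged sketch of one of those techniques, oversampling the minority class with SMOTE (newer imbalanced-learn versions use fit_resample, older ones fit_sample); the dataset is synthetic:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic, heavily imbalanced binary classification data.
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)
print("before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```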
When you need to search for some text and replace it with something else, as is standard in most data-cleaning work, you usually turn to regular expressions. They will get the job done, but sometimes the number of terms you need to search for is in the thousands, and then regular expressions can become painfully slow to use.
FlashText is a better alternative just for this purpose. In the author’s initial benchmark, it improved the runtime of the entire operation by a huge margin: from 5 days to 15 minutes. The beauty of FlashText is that the runtime is the same no matter how many search terms you have, in contrast with regexp in which the runtime will increase almost linearly with the number of terms.
FlashText is a testimony to the importance of the design of algorithms and data structures, showing that, even for simple problems, better algorithms can easily outdo even the fastest CPUs running naive implementations.
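A minimal sketch of the FlashText API; the keywords and sentence are just examples:

```python
from flashtext import KeywordProcessor

processor = KeywordProcessor()
processor.add_keyword("Big Apple", "New York")  # map "Big Apple" -> "New York"
processor.add_keyword("NCR", "Delhi")

text = "I love the Big Apple and the NCR region."
print(processor.replace_keywords(text))  # "I love the New York and the Delhi region."
print(processor.extract_keywords(text))  # ["New York", "Delhi"]
```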
Disclaimer: this library was built by Tryolabs’ R&D area.
Images are everywhere nowadays, and understanding their content can be critical for several applications. Thankfully, image processing techniques have advanced a lot, fueled by the advancements in DL.
Luminoth is an open source Python toolkit for computer vision, built using TensorFlow and Sonnet. Currently, it supports object detection out of the box, in the form of the Faster R-CNN and SSD models.
But Luminoth is not only an implementation of some particular models. It is built to be modular and extensible, so customizing the existing pieces or extending it with new models to tackle different problems should be straightforward, with as much code reuse as possible. It provides tools for easily doing the engineering work that is needed when building DL models: converting your data (in this case, images or videos) to an adequate format for your data pipeline (TensorFlow's tfrecords), doing data augmentation, running the training on one or multiple GPUs (distributed training will be a must when working with large datasets), running evaluation metrics, easily visualizing things in TensorBoard, and deploying your trained model with a simple API or browser interface so people can play around with it.
Moreover, Luminoth has straightforward integration with Google Cloud's ML Engine, so even if you don't own a powerful GPU, you can train in the cloud with a single command, just as you would on your own local machine.
If you are interested in learning more about Luminoth and the features of its latest version, you can read this blog post and watch the video of our talk at ODSC.
Bonus: watch out for these
You may have never heard of the libvips library. In that case, you should know that it is an image processing library, like Pillow or ImageMagick, and supports a wide range of formats. However, compared to other libraries, libvips is faster and uses less memory. For example, some benchmarks show it to be about 3x faster than ImageMagick while using roughly 15x less memory. You can read more about why libvips is nice here.
PyVips is a recently released Python binding for libvips, which is compatible with Python 2.7-3.6 (and even PyPy), easy to install with pip and drop-in compatible with the old binding, so if you are using that, you don’t have to modify your code.
If you are doing any sort of image processing in your app, this is definitely something to keep an eye on.
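A minimal sketch of the binding in action; the file names are placeholders:

```python
import pyvips

# Stream the image sequentially to keep memory usage low.
image = pyvips.Image.new_from_file("input.jpg", access="sequential")
thumb = image.resize(0.25)            # shrink to 25% in each dimension
thumb.write_to_file("thumbnail.jpg")
```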
Disclaimer: this library was built by Tryolabs.
Sometimes, you need to automate some actions on the web. Be it when scraping sites, doing application testing, or filling out web forms to perform actions on sites that do not expose an API, automation is always necessary. Python has the excellent Requests library, which allows you to perform some of this work, but unfortunately (or not?) many sites make heavy client-side use of Javascript. This means that the HTML code that Requests fetches, in which you could be trying to find a form to fill for your automation task, may not even contain the form itself! Instead, it will be something like an empty div that gets generated in the browser by a modern frontend library such as React or Vue.
One way to solve this is to reverse-engineer the requests that the Javascript code makes, which will mean many hours of debugging and fiddling around with (probably) uglified JS code. No thanks. Another option is to turn to libraries like Selenium, which let you programmatically interact with a web browser and run the Javascript code. With this, the problem goes away, but it is still much slower than using plain Requests, which adds very little overhead.
Wouldn’t it be cool if there was a library that let you start out with Requests and seamlessly switch to Selenium, only adding the overhead of a web browser when actually needing it? Meet Requestium, which acts as a drop-in replacement for Requests and does just that. It also integrates Parsel, so writing all those selectors for finding the elements in the page is much cleaner than it would otherwise be, and has helpers around common operations like clicking elements and making sure stuff is actually rendered in the DOM. Another time saver for your web automation projects!
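A hedged sketch of that workflow, based on Requestium's documented usage; the URLs, selectors and chromedriver path are placeholders:

```python
from requestium import Session

s = Session(webdriver_path="./chromedriver", browser="chrome", default_timeout=15)

# Fast, plain Requests-style call, with Parsel selectors on the response.
response = s.get("http://example.com")
title = response.xpath("//title/text()").extract_first()

# Only when JavaScript rendering is really needed, hand over to the browser.
s.transfer_session_cookies_to_driver()
s.driver.get("http://example.com/js-heavy-page")
s.driver.ensure_element_by_xpath("//button[@id='load-more']").click()
```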
You like the awesome API of scikit-learn, but need to do work using PyTorch? Worry not, skorch is a wrapper which will give PyTorch an interface like sklearn. If you are familiar with those libraries, the syntax should be straightforward and easy to understand. With skorch, you will get some code abstracted away, so you can focus more on the things that really matter, like doing your data science.
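A hedged sketch of what that looks like, loosely following skorch's own examples; the module and data below are made up:

```python
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from skorch import NeuralNetClassifier

class SimpleNet(nn.Module):
    """A tiny classifier standing in for your real PyTorch module."""
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(20, 10)
        self.out = nn.Linear(10, 2)

    def forward(self, X):
        X = F.relu(self.dense(X))
        return F.softmax(self.out(X), dim=-1)

# Synthetic data; skorch expects float32 features and int64 labels here.
X = np.random.randn(1000, 20).astype(np.float32)
y = np.random.randint(0, 2, size=1000).astype(np.int64)

net = NeuralNetClassifier(SimpleNet, max_epochs=5, lr=0.1)
net.fit(X, y)              # sklearn-style fit...
print(net.predict(X[:5]))  # ...and predict
```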
What an exciting year! If you know of a library that deserves to be on this list, make sure you mention it in the comments below. There are so many good developments that it’s hard to keep up. As usual, thanks to everybody in the community for such great work!
Finally, don't forget to subscribe to our newsletter so that you don't miss out on future editions of this post or our ML-related content.
I recently came across a comment piece in Nature, "Why it is not a 'failure' to leave academia", which discusses exactly the topic of changing careers after finishing a PhD and offers several very pertinent pieces of advice to doctoral students who are considering such a move.
I hold an engineering PhD myself. After graduating and returning to China, I eventually made my way, through several twists and turns, into data science work in the internet industry, and there was no shortage of hardship and setbacks along the road. Indeed, many people have told me that, having earned a PhD from a well-known university, it would be a pity not to do research or work in academia. What I want to say to them is this: as long as you can recognize your true interests in time, and can now and in the future do work you genuinely love, it is worth it no matter how high the cost or how hard the process. Besides, the academic training from my three-plus years of doctoral study and my experience abroad are among the most valuable assets of my life, and not wasted at all.
To quote the original article:
A PhD is highly valuable
This leaves us with one last aspect of the culture of failure and its effect on doctoral students and postdocs: the widespread misconception that a PhD is useful training only for academic research. Or, in other words, if you leave academia, your mum will think that you’ve wasted your time doing a PhD. You might even have wondered about that yourself.
We know that most PhD graduates eventually go on to other careers, but have they all wasted their time? Absolutely not. The skills you are acquiring (or have acquired) during a PhD are highly sought by employers beyond academic science. You are incredibly resilient, hard-working and motivated. You make decisions based on evidence, you can interpret data, you can communicate complex concepts clearly, you are an effective team player and you can prioritize tasks. And you have a degree to prove all of this.
You have every reason to be positive about your job prospects.
Personally, I won’t regret having done my PhD, regardless of my future career.
Problem description: during data analysis, requirement changes and similar situations often mean you have to add columns to a Hive table and then re-run scheduled jobs to backfill historical data (insert overwrite). After doing so, however, the newly added columns all show up as null when connecting from Tableau.
Fix (patch the Hive metastore directly). Step 1: in the metastore's SDS table, find the column-descriptor ID (CD_ID; all columns of a table share one CD_ID value) that was newly assigned after the columns were added, e.g.: SELECT * FROM SDS WHERE LOCATION LIKE '%table_name%'. Step 2: the SDS table shows both the newly assigned CD_ID and the old CD_ID still referenced by the existing partitions; update the old value to the new one, e.g.: UPDATE SDS SET CD_ID = <new CD_ID found above> WHERE CD_ID = <old CD_ID>.
Alternatively: drop the existing partitions and backfill the historical data again, or drop the table, recreate it, and then backfill.