A Productive Journey
Chien-Lan Hsueh 2022-07-23
Do you think you’ll continue to use R going forward?
Certainly, I will continue to use R in my future work. If it were solely about programming, R would not be my choice. However, data analysis and research involve more than programming, and I benefit a lot from R’s documentation and its community. So when I develop a new analysis routine or modeling workflow, I like to use R. Once the workflow is well tested and ready to scale up, I help our software team port it to other languages for better scalability, speed, and cost efficiency.
What things are you going to do differently in practice now that you’ve had this course?
I had only used interactive notebooks when coding in Python. For R, I worked in plain R scripts and pulled them together with source(), much like #include headers and preprocessor macros in C/C++. This let me load or run code dynamically depending on the task, but it also made my code hard to share with others.
In this course, I started using R Notebooks more, although they still feel buggy compared to JupyterLab notebooks. Now that I have learned how to knit nested R Markdown documents with parameters and how to assign variables in different environments (scopes), I can see this approach replacing my original workflow and saving me an extra step when I share my work.
What areas of statistics/data science are you thinking about exploring further?
I would like to learn more in depth about reinforcement learning and neural networks. My main responsibility at work is to find the right combination of process parameters using as few experimental runs as possible. On top of mechanistic models, we rely on statistical models to speed up the R&D cycle. Currently we use random forests and SVMs, but quite often we cannot find good “target labels” to train these supervised learning models. Our preliminary work using CNNs to classify various wafer patterns shows very promising results. Taking ST563 Statistical Learning will be my next step, and I am really excited about it.