TDS readers come from a staggering variety of professional and educational backgrounds, and join our community at different points in their data science journeys. We’re especially proud to support data professionals at the earliest stages of their careers, but we also know that there’s no universal definition for what “beginner” means these days.
With that in mind, this week we’re presenting some of the best recent additions to our Getting Started column—the space where we collect standout articles that also happen to be patiently explained, and that don’t require extensive or specialized knowledge of the topic at hand.
Our picks reflect the growing move towards early-career specialization: they range from machine learning project design to a data-engineering tutorial. Since none of us can be an expert in everything, their thematic diversity means that more advanced data scientists will almost certainly find something new and interesting to explore as well.
- Get familiar with a popular machine learning framework. If you’ve been tinkering with gradient-boosted algorithms, chances are high you’ve run into LightGBM. In case you could use some guidance on how to make the most of it, Leonie Monigatti’s introduction to the most essential LightGBM parameters is clear, actionable, and well illustrated.
- It’s never too early to think about ML project design. Of all the (many) industry buzzwords that have come into circulation in the past few years, MLOps seems to have one of the longest shelf lives. Chayma Zatout’s deep dive is a good starting point if you’re not sure how this concept might relate to your day-to-day workflows, and how to apply its principles to your current projects.
- Ease your way into building a solid pipeline. Apache Airflow might be a common tool for data engineering teams, but as Aashish Nair points out, its ubiquity doesn’t make its terminology, features, and quirks any less daunting. To help, Aashish presents a Python-based demo that walks readers through the process of creating a simple Airflow pipeline.
- Neural networks from the ground up. It’s all but impossible to understand the major strides we’re seeing in AI research without a firm grasp of neural networks. Dr. Roi Yehoshua’s overview of perceptrons—“one of the earliest computational models of neural networks”—is a gentle entryway into the topic, and covers the basic concepts before moving to a Python implementation.
- Streamline your learning process with better notes. Regardless of the data science topic you’d like to focus on in coming weeks, a good note-taking practice can make a real difference. Madison Hunter’s new post presents a six-step roadmap to more effective studying and better retention.
- Get up to speed with an up-and-coming Python library. If you learned how to work with DataFrames using Pandas—a likely scenario for many data scientists!—you may or may not be happy to know that a new library, Polars, has been gaining a lot of traction in recent months thanks to its high-speed performance. David Hundley’s latest post is geared towards Pandas-trained people who’d like to explore Polars’ benefits.
Thank you for your time and your support this week! That support allows us to publish excellent stories every day, including those we recommend for a special Boost on Medium, a program we’re thrilled to take part in.
If you enjoy the articles you read on TDS (and want to gain unlimited access to our archive), consider becoming a Medium member. Students: now’s a particularly great time to join, as many of you can enjoy a substantial discount on memberships.
Until the next Variable,