And you don't need to code to get started.

Today's data world uses many open source tools, from data engineering to data science and deep learning. This is a unique opportunity to grow your career in multiple ways. In a recent DataTalks.Club podcast episode, Merve Noyan shared how she went from baby steps on GitHub to Developer Advocate Engineer at HuggingFace. Inspired by the talk, I want to reflect on how open source has helped me throughout my career so far, up to Staff Data Engineer.
🍼 Baby steps in Open Source
Reproducible issue
Contributing to open source can be scary. Where do you start with an unknown codebase, a different way of working, and a lot of automation during the PR?
Well, you don't have to code to take your first steps!
The first time I dealt with open source was because I had an issue with a Python package. After no luck on StackOverflow, I decided to look at the GitHub issues. A similar issue was already there but with poor context, so I commented with detailed steps to reproduce it. One day later, the maintainer released a fix 🎉
Have a problem with a library? Here's what I usually do:
1. Go directly to the GitHub repo and check the documentation.
2. If there's nothing helpful in the documentation, I search the issues. Don't forget to remove the default "open" filter and search through all issues. Most of the time, you must dig into the closed issues to find relevant information.
3. If there's no existing or related issue, I open one, and I take the time to provide all the information needed to reproduce the problem.
This process is so underrated but so valuable for the maintainer. Having multiple data points of a problem with clear steps on how to reproduce is 50% of the work towards a solution.
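As an illustration of what "all the information needed to reproduce" can look like, here is a minimal Python sketch that assembles an issue report from the environment details and the reproduction steps. The function name `build_issue_report`, the report layout, and the package name in the demo are all assumptions made for this example, not a format any project requires; many repositories ship their own issue template, which should take precedence.

```python
import platform
import sys
from importlib import metadata


def build_issue_report(package: str, steps: list[str], expected: str, actual: str) -> str:
    """Return a plain-text bug report with environment details and repro steps.

    Hypothetical helper: the layout below is only one reasonable way to
    structure a report.
    """
    try:
        # Look up the installed version of the package the issue is about.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"

    lines = [
        f"Package: {package} {version}",
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.platform()}",
        "",
        "Steps to reproduce:",
    ]
    # Number the steps so the maintainer can refer to them individually.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", f"Expected: {expected}", f"Actual: {actual}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # "some-package" is a placeholder; replace it with the library you are reporting on.
    print(
        build_issue_report(
            "some-package",
            ["Install the package in a fresh virtual environment", "Run the snippet from the README"],
            expected="The snippet prints a result",
            actual="It raises an exception",
        )
    )
```

Pasting the output into the issue, together with the full traceback and the smallest script that triggers the problem, gives the maintainer most of what they need to start investigating.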
Promotion and documentation support
There are other ways to get involved without coding:
Updating documentation
Helping on StackOverflow
Sharing the project on social media (Twitter/LinkedIn)
These will also allow you to exchange knowledge and meet incredible people online.
🦸 Next Level
Your first coding tutorial
Before committing to someone else's code, why not share your knowledge through a “hello world” project?
This is less scary because you are in control of everything, and it doesn't need to be crazy in terms of features. The main goal is to teach something. It can be a blog or a video, but it's always better to have the code repository pushed somewhere. Here are some personal examples of written coding tutorials and videos I did.
Bonus: reach out to the creator of whatever you are covering; sometimes, they will be super happy to re-share and highlight your work!
Your first library
It doesn't need to be a great library that millions of users will download. It can be something you created to solve a specific problem you encountered. If you face a challenge, chances are high that someone else faces the same one.
It doesn't even need to be a library. It can be a framework, a code snippet, or a boilerplate. That's what I did with a PySpark boilerplate. I wanted a simple boilerplate I could reuse across different projects. Nothing perfect or fancy, but it solves a problem I have.
Your first Pull Request (PR)
Now that you've been contributing solo, you are ready to look at someone else's project. It's worth looking at the “good first issue” label on GitHub and starting the discussion before implementing anything.
It can be frustrating to have your PR rejected because it's not in line with the project's design decisions. Merve Noyan highlighted that maintainers will always be happy to discuss with you, as they respect your time and commitment to the project.
Multiple seasonal events promote open source contributions.
Here are a few of them:
Contribution sprints: many open source projects have dedicated contribution sprints where maintainers focus their time on onboarding and helping new contributors.
Hacktoberfest
Google Summer of Code
📣 Promote your project
Nobody wants to git clone and read your README to set up your project. The last mile would be to deploy your project so users can easily use it.
If your project is a library, push it to the appropriate place (e.g., PyPI for Python).
For other kinds of projects, there are a couple of platforms that can help you:
Kaggle provides a notebook runtime to show off your projects
HuggingFace Spaces offers a simple way to host ML demo apps
Streamlit turns data scripts into web apps in a few minutes
🌟 From contributing to Open Source to landing your dream job
There's a great secret about doing work in public: it's public. Anyone can look it up. It could also speed up technical interviews, as you may have already proven your abilities through some PRs.
Some companies also offer you the opportunity to do a public PR on an open source project they own. It's great because your coding test stays visible for your other interviews.
🚀 Go contribute!
There has never been a better opportunity to contribute to Open Source.
There are tons of projects.
There are many platforms that lower the technical barrier to deploying and showcasing your work.
And everything you do will be public, which is gold for future reference.
So don't hesitate, and make the leap!
Mehdi OUAZZA aka mehdio 🧢