Sunday, December 22, 2024

What is `Open Source`?


It has never been easier to explore data science, and schooling is optional.

Mark Twain is often credited with the quip that he never let schooling interfere with his education, and I highly recommend this approach for data science students of all ages. A degree can help you get a job, but only competence enables science.

Our current data science renaissance is largely fueled by open-source software and programming languages: free, accessible tools that let anyone apply cutting-edge technologies to real-world data at little to no cost.

With both students and professionals in mind, I focus on the following two open-source programming languages on this site:
  • Python: Known for its simplicity and a strong ecosystem of libraries, Python is a top choice for data manipulation, analysis, machine learning, and visualization.
  • R: Favored in academic and research sectors for statistical computing and graphics, R excels in data visualization and manipulation, with a large collection of packages developed and supported by the R community.

Other programming languages are used in data science, but these two are enough. I’m likely to leave Docker containers, AWS, and Arch Linux out of it completely. Forget Tableau, Spotfire, Excel, and Power BI for now, too. I understand that, like favorite ice cream flavors, some people have these preferences, and if you work for a giant company willing to pay the licensing costs, so be it. But my focus here is on getting things done without paid tools.

Also, some data scientists, particularly legacy statisticians, love SAS. Like, really love it, a lot. There is even a free, non-commercial build of SAS that students can access, and building macros in it can be fun, but it is an unnecessary complication for my purposes here.

This website and my feed on X feature open-source content I think is worth your time.

I learned everything I needed to know to use chained AI agents overseen by an LLM supervisor for $13 on Udemy.com from this course. (If it is listed for a higher price, wait for it to go on sale.)

One of the greatest assets of open-source software and programming languages is the extensive community that contributes packages to improve them, and there are many places to look up answers and ask questions when you run into issues. Several LLMs are also trained on this open-source content.

Finally, as I continue adding content to this site, I will publish code to GitHub, a popular code-hosting platform offering a cornucopia of resources to explore.

Additional explanations of what "open source" offers can be found at the following links:

Jimmy Fisher


