
    10 Command-Line Tools Every Data Scientist Should Know

    By Oliver Chambers, October 12, 2025 (7 min read)

     

    # Introduction

     
    Although modern data science is dominated by Jupyter notebooks, Pandas, and graphical dashboards, these don’t always give you the level of control you might need. Command-line tools, on the other hand, may not be as intuitive as you’d like, but they’re powerful, lightweight, and much faster at the specific jobs they’re designed for.

    For this article, I’ve tried to strike a balance between utility, maturity, and power. You’ll find some classics that are nearly unavoidable, along with more modern additions that fill gaps or optimize performance. You could even call this a 2025 version of the essential CLI tools list. For those who aren’t familiar with CLI tools but want to learn, I’ve included a bonus section with resources in the conclusion, so scroll all the way down before you start adding these tools to your workflow.

     

    # 1. curl

     
    curl is my go-to for making HTTP requests like GET, POST, or PUT; downloading files; and sending/receiving data over protocols such as HTTP or FTP. It’s ideal for retrieving data from APIs or downloading datasets, and you can easily integrate it with data-ingestion pipelines to pull JSON, CSV, or other payloads. The best thing about curl is that it’s pre-installed on most Unix systems, so you can start using it immediately. However, its syntax (especially around headers, body payloads, and authentication) can be verbose and error-prone. When you’re interacting with more complex APIs, you may prefer an easier-to-use wrapper or Python library, but knowing curl is still a real plus for quick testing and debugging.
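As a minimal sketch of the download use case, the snippet below wraps curl in a small helper. The function name `fetch` and the file names are made up for illustration. To keep the example runnable offline, it reads from a local `file://` URL, which curl also supports; in practice you would pass an `https://` endpoint instead.

```shell
# fetch URL OUTPUT_FILE  (hypothetical helper, not part of curl)
#   --fail        exit non-zero on HTTP errors (4xx/5xx) instead of saving the error page
#   --silent      hide the progress meter
#   --show-error  still print a message when the transfer fails
#   --location    follow redirects
fetch() {
  curl --fail --silent --show-error --location \
       --header "Accept: application/json" \
       --output "$2" "$1"
}

# Offline demonstration: a local file stands in for a remote endpoint.
workdir=$(mktemp -d)
printf '{"status": "ok"}\n' > "$workdir/source.json"
fetch "file://$workdir/source.json" "$workdir/download.json"
cat "$workdir/download.json"
```

The `--fail` flag matters in pipelines: without it, an HTTP error page can be saved as if it were your dataset.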

     

    # 2. jq

     
    jq is a lightweight JSON processor that lets you query, filter, transform, and pretty-print JSON data. With JSON being a dominant format for APIs, logs, and data interchange, jq is indispensable for extracting and reshaping JSON in pipelines. It acts like “Pandas for JSON in the shell.” Its biggest advantage is a concise language for dealing with complex JSON, but learning its syntax can take time, and very large JSON files may require extra care with memory management.
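To make the query, filter, and reshape steps concrete, here is a small sketch. The JSON document is invented for the example and inlined so nothing has to be downloaded.

```shell
# Hypothetical API response, inlined so the example is self-contained.
response='{"results":[{"name":"iris","rows":150,"tags":["classic"]},{"name":"mnist","rows":70000,"tags":["vision","classic"]}]}'

# Pretty-print the whole document.
printf '%s\n' "$response" | jq '.'

# Filter: keep datasets with more than 1000 rows; -r prints raw strings without quotes.
large=$(printf '%s\n' "$response" | jq -r '.results[] | select(.rows > 1000) | .name')
echo "$large"

# Reshape: turn each record into a CSV row (name, row count, number of tags).
csv=$(printf '%s\n' "$response" | jq -r '.results[] | [.name, .rows, (.tags | length)] | @csv')
echo "$csv"
```

The pipe inside the jq program works like a shell pipe: each stage receives the output of the previous one.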

     

    # 3. csvkit

     
    csvkit is a suite of CSV-centric command-line utilities for transforming, filtering, aggregating, joining, and exploring CSV files. You can select and reorder columns, subset rows, combine multiple files, convert from one format to another, and even run SQL-like queries against CSV data. csvkit understands CSV quoting semantics and headers, making it safer than generic text-processing utilities for this format. Being Python-based means performance can lag on very large datasets, and some complex queries may be easier in Pandas or SQL. If you need speed and efficient memory usage, consider the csvtk toolkit.
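A short sketch of a few csvkit commands follows, assuming csvkit is installed (for example via pip). The file `sales.csv` and its columns are invented for the example. Note the quoted field containing a comma, which is exactly the case where naive text tools go wrong.

```shell
workdir=$(mktemp -d)
cat > "$workdir/sales.csv" <<'EOF'
region,product,units
north,"Widget, large",12
south,Gadget,7
north,Gadget,5
EOF

# List the column names with their positions.
csvcut -n "$workdir/sales.csv"

# Select two columns, then keep rows whose region contains "north".
north=$(csvcut -c region,units "$workdir/sales.csv" | csvgrep -c region -m north)
echo "$north"

# SQL-like aggregation; csvsql names the table after the file (here: sales).
csvsql --query "SELECT region, SUM(units) AS total FROM sales GROUP BY region ORDER BY region" \
  "$workdir/sales.csv"
```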

     

    # 4. awk / sed

     
    Link (sed): https://www.gnu.org/software/sed/manual/sed.html
    Classic Unix tools like awk and sed remain irreplaceable for text manipulation. awk is powerful for pattern scanning, field-based transformations, and quick aggregations, while sed excels at text substitutions, deletions, and transformations. These tools are fast and lightweight, making them ideal for quick pipeline work. However, their syntax can be non-intuitive. As logic grows, readability suffers, and you may want to migrate to a scripting language. Also, for nested or hierarchical data (e.g., nested JSON), these tools have limited expressiveness.
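The following sketch shows one typical job for each tool on a small, made-up log file. The field layout (date, method, path, status, latency in ms) is an assumption of the example.

```shell
workdir=$(mktemp -d)
cat > "$workdir/requests.log" <<'EOF'
2025-10-01 GET /api/items 200 120
2025-10-01 POST /api/items 500 340
2025-10-02 GET /api/users 200 80
2025-10-02 GET /api/items 200 100
EOF

# awk: quick aggregation. Mean of field 5 (latency) grouped by field 2 (method).
# Iteration order over an awk array is unspecified, so the result is sorted.
means=$(awk '{ total[$2] += $5; count[$2]++ }
  END { for (m in total) printf "%s %.1f\n", m, total[m] / count[m] }' \
  "$workdir/requests.log" | sort)
echo "$means"

# sed: substitution and deletion. Rewrite the path prefix, drop lines with status 500.
cleaned=$(sed -e 's#/api/#/v2/#' -e '/ 500 /d' "$workdir/requests.log")
echo "$cleaned"
```

Using `#` as the delimiter in the sed substitution avoids having to escape the slashes in the paths.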

     

    # 5. parallel

     
    GNU parallel speeds up workflows by running multiple processes in parallel. Many data tasks are “mappable” across chunks of data. Say you have to run the same transformation on hundreds of files: parallel can spread the work across CPU cores, speed up processing, and manage job control. You must, however, be mindful of I/O bottlenecks and system load, and quoting/escaping can be tricky in complex pipelines. For cluster-scale or distributed workloads, consider resource-aware schedulers (e.g., Spark, Dask, Kubernetes).
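As a rough sketch of the “same command on many files” pattern, the example below counts lines in four generated files, two at a time. It assumes GNU parallel specifically; the moreutils package ships a different program with the same name and different syntax.

```shell
workdir=$(mktemp -d)
for i in 1 2 3 4; do
  seq 1 $((i * 10)) > "$workdir/part$i.txt"
done

# {} is replaced by each argument listed after :::
# -j 2 caps concurrency at two jobs; --keep-order prints results in input order
# rather than in completion order.
counts=$(parallel --keep-order -j 2 'wc -l < {}' ::: "$workdir"/part*.txt)
echo "$counts"
```

Single-quoting the command keeps the `<` redirection from being interpreted by the calling shell, so each job performs its own redirect.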

     

    # 6. ripgrep (rg)

     
    ripgrep (rg) is a fast recursive search tool designed for speed and efficiency. It respects .gitignore by default and skips hidden and binary files, which helps make it significantly faster than traditional grep. It’s ideal for quick searches across codebases, log directories, or config files. Because it ignores certain paths by default, you may need to adjust flags to search everything, and it isn’t always available by default on every platform.
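The default-ignore behavior is easy to see in a small experiment. This sketch creates one visible and one hidden directory (the names are arbitrary) and compares a default search with one that passes `--hidden`.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/.cache"
printf 'learning_rate = 0.01\n' > "$workdir/src/train.py"
printf 'learning_rate = 0.5\n'  > "$workdir/.cache/old.py"

# Default search: the hidden .cache directory is skipped.
default_hits=$(rg --files-with-matches 'learning_rate' "$workdir" | wc -l | tr -d ' ')

# --hidden also searches dotfiles and dot-directories.
# (--no-ignore would additionally bypass .gitignore rules.)
all_hits=$(rg --hidden --files-with-matches 'learning_rate' "$workdir" | wc -l | tr -d ' ')

echo "default: $default_hits file(s), with --hidden: $all_hits file(s)"
```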

     

    # 7. datamash

     
    datamash provides numeric, textual, and statistical operations (sum, mean, median, group-by, etc.) directly in the shell via stdin or files. It’s lightweight and useful for quick aggregations without launching a heavier tool like Python or R, which makes it ideal for shell-based ETL or exploratory analysis. But it’s not designed for very large datasets or complex analytics, where specialized tools perform better. Also, grouping on very high-cardinality keys may require substantial memory.
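A minimal group-by sketch, assuming GNU datamash is installed. The input is tab-separated, which is datamash’s default, and the two-column data is invented for the example.

```shell
# Column 1 is a group key, column 2 a numeric value.
data=$(printf 'a\t10\nb\t4\na\t20\nb\t6\n')

# -s sorts the input first (grouping expects sorted keys); -g 1 groups by column 1.
summary=$(printf '%s\n' "$data" | datamash -s -g 1 sum 2 mean 2)
echo "$summary"

# Without -g, the operation runs over the whole column.
overall_median=$(printf '%s\n' "$data" | datamash median 2)
echo "$overall_median"
```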

     

    # 8. htop

     
    htop is an interactive system monitor and process viewer that provides live insight into CPU, memory, and I/O usage per process. When running heavy pipelines or model training, htop is extremely useful for monitoring resource consumption and identifying bottlenecks. It’s more user-friendly than traditional top, but being interactive means it doesn’t fit well into automated scripts. It may also be missing on minimal server setups, and it doesn’t replace specialized performance tools (profilers, metrics dashboards).

     

    # 9. git

     
    git is a distributed version control system essential for tracking changes to code, scripts, and small data assets. For reproducibility, collaboration, branching experiments, and rollback, git is the standard. It integrates with deployment pipelines, CI/CD tools, and notebooks. Its drawback is that it’s not meant for versioning large binary data, for which Git LFS, DVC, or specialized systems are better suited. The branching and merging workflow also comes with a learning curve.
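The branching-experiments workflow can be sketched in a throwaway repository. The file name, commit messages, and identity below are placeholders for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
git init --quiet
git config user.name "Example User"        # placeholder identity for this throwaway repo
git config user.email "user@example.com"

# Commit a baseline.
printf 'threshold = 0.5\n' > config.py
git add config.py
git commit --quiet -m "Add baseline config"

# Try a change on its own branch so the baseline stays untouched.
git checkout --quiet -b experiment
printf 'threshold = 0.7\n' > config.py
git commit --quiet -am "Raise threshold"

# Switch back to the previous branch; the baseline version of the file returns.
git checkout --quiet -
cat config.py
```

If the experiment pays off, `git merge experiment` brings it into the main line; if not, the branch can simply be deleted.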

     

    # 10. tmux / screen

     
    Terminal multiplexers like tmux and screen let you run multiple terminal sessions in a single window, detach and reattach sessions, and resume work after an SSH disconnect. They’re essential if you need to run long experiments or pipelines remotely. While tmux is recommended for its active development and flexibility, its config and keybindings can be tricky for newcomers, and minimal environments may not have it installed by default.
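A minimal sketch of the detach-and-reattach workflow with tmux. The session name `demo-long-job` is arbitrary, and `sleep 300` stands in for a real training run or pipeline.

```shell
# Start a detached session (-d) with a name (-s) that runs a long command.
tmux new-session -d -s demo-long-job 'sleep 300'

# The session keeps running in the background; list what is active.
tmux list-sessions

# To look at it interactively (for example after reconnecting over SSH):
#   tmux attach -t demo-long-job
# Inside tmux, press Ctrl-b then d to detach again without stopping the job.

# Clean up the demo session.
tmux kill-session -t demo-long-job
```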

     

    # Wrapping Up

     
    If you’re getting started, I’d recommend mastering the “core four”: curl, jq, awk/sed, and git. These are used everywhere. Over time, you’ll discover domain-specific CLIs like SQL clients, the DuckDB CLI, or Datasette to slot into your workflow. For further learning, check out the following resources:

    1. Data Science at the Command Line by Jeroen Janssens
    2. The Art of Command Line on GitHub
    3. Mark Pearl’s Bash Cheatsheet
    4. Communities like the unix & command-line subreddits often surface useful tricks and new tools that will expand your toolbox over time.

     
     

    Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the book “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
