Here’s How I Built an MCP to Automate My Data Science Job

Image by Ideogram

Most of my days as a data scientist look like this:

Stakeholder: “Can you tell us how much we made in advertising revenue in the last month and what share of that came from search ads?”
Me: “Run an SQL query to extract the data and hand it to them.”
Stakeholder: “I see. What’s our revenue forecast for the next 3 years?”
Me: “Consolidate data from multiple sources, speak to the finance team, and build a model that forecasts revenue.”

Tasks like the above are ad hoc requests from business stakeholders. They take around 3–5 hours to complete and are usually unrelated to the core project I'm working on.

When data-related questions like these come in, they often require me to push the deadlines of current projects or work extra hours to get the job done. And this is where AI comes in.

Once AI models like ChatGPT and Claude became available, the team’s efficiency improved, as did my ability to respond to ad hoc stakeholder requests. AI dramatically reduced the time I spent writing code, generating SQL queries, and even collaborating with other teams to get the information I needed. After AI code assistants like Cursor were integrated with our codebases, efficiency improved even further. Tasks like the ones I described above could now be completed twice as fast as before.

Recently, when MCP servers started gaining popularity, I thought to myself:

Can I build an MCP that automates these data science workflows further?

I spent two days building this MCP server, and in this article, I’ll break down:

The results and how much time I’ve saved with my data science MCP
Resources and reference materials used to create the MCP
The basic setup, APIs, and services I integrated into my workflow

# Building a Data Science MCP

If you don't already know what an MCP is, it stands for Model Context Protocol, an open protocol that lets you connect a large language model to external tools and services.
This video is a great introduction to MCPs.

// The Core Problem

The problem I wanted to solve with my new data science MCP was:

How do I consolidate information that’s scattered across various sources and generate results that stakeholders and team members can use directly?

To accomplish this, I built an MCP with three components, as shown in the flowchart below:

Image by Author | Mermaid

// Component 1: Query Bank Integration

As a knowledge base for my MCP, I used my team’s query bank (which contains questions, a sample query that answers each question, and some context about the tables).

When a stakeholder asks me a question like this:

What percentage of advertising revenue came from search ads?

I no longer have to look through multiple tables and column names to write a query. Instead, the MCP searches the query bank for a similar question. It picks up context about the relevant tables and adapts the sample queries to my specific question. All I need to do is call the MCP server, paste in my stakeholder’s request, and I get a relevant query in a few minutes.
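As an illustration, the query bank lookup could be sketched as a plain string-similarity search. This is a minimal sketch under assumptions, not the actual implementation: the `QueryBankEntry` fields, the `find_similar_entries` helper, and the sample table names are all hypothetical, and `difflib` stands in for whatever matching the real server uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class QueryBankEntry:
    question: str       # natural-language question stored in the bank
    sample_sql: str     # sample query that answers it
    table_context: str  # notes about the tables involved


def find_similar_entries(request, bank, top_k=1):
    """Return the top_k bank entries whose question is closest to the request."""
    def score(entry):
        # Ratio in [0, 1]; higher means the two strings are more alike.
        return SequenceMatcher(None, request.lower(), entry.question.lower()).ratio()

    return sorted(bank, key=score, reverse=True)[:top_k]


# Hypothetical bank contents, for illustration only.
BANK = [
    QueryBankEntry(
        "How much advertising revenue came from search ads last month?",
        "SELECT SUM(revenue) FROM ad_revenue WHERE ad_type = 'search'",
        "ad_revenue: one row per ad, per day",
    ),
    QueryBankEntry(
        "How many users signed up last week?",
        "SELECT COUNT(*) FROM signups",
        "signups: one row per new user",
    ),
]

best = find_similar_entries(
    "What percentage of advertising revenue came from search ads?", BANK
)[0]
print(best.sample_sql)
print(best.table_context)
```

The matched entry's sample query and table notes would then be handed to the model as context, so it adapts a known-good query instead of guessing table and column names.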

// Component 2: Google Drive Integration

Product documentation is usually stored in Google Drive, whether it's a slide deck, document, or spreadsheet.

I connected my MCP server to the team’s Google Drive so it has access to all our documentation across dozens of projects. This helps it quickly extract data and answer questions like:

Can you tell us how much we made in advertising revenue in the last month?

I also indexed these documents by extracting specific keywords and titles, so the MCP only has to match the query against the keyword list rather than opening hundreds of pages at once.

For example, if someone asks a question related to “mobile video ads,” the MCP will first search the document index to identify the most relevant files before reading through them.
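The indexing idea can be sketched as a small inverted index that maps each keyword to the documents mentioning it. This is a simplified illustration, assuming the document text has already been pulled from Drive; the `build_index` and `rank_documents` helpers, the stopword list, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Words too common to be useful as index keys (a small illustrative list).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "our", "we"}


def tokenize(text):
    """Lowercase the text and return its non-stopword alphanumeric tokens."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs):
    """Map each keyword to the set of document titles that mention it."""
    index = defaultdict(set)
    for title, text in docs.items():
        for token in tokenize(title + " " + text):
            index[token].add(title)
    return index


def rank_documents(index, query):
    """Rank titles by how many distinct query keywords they match."""
    hits = defaultdict(int)
    for token in set(tokenize(query)):
        for title in index.get(token, ()):
            hits[title] += 1
    # Most matches first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda t: (-hits[t], t))


# Hypothetical documents standing in for Drive files.
DOCS = {
    "Mobile Video Ads Launch Plan": "Rollout of video ads on mobile apps.",
    "Search Ads Revenue Review": "Monthly revenue from search ads.",
    "Hiring Plan": "Headcount for the analytics team.",
}

index = build_index(DOCS)
print(rank_documents(index, "mobile video ads"))
```

Only the top-ranked files would then be opened and read in full, which keeps the amount of text sent to the model small.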

// Component 3: Local Document Access

This is the simplest component of the MCP: a local folder that the MCP searches through. I add or remove files as needed, which lets me layer my own context, information, and instructions on top of my team’s projects.
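A local folder search along these lines can be written with the standard library alone. The sketch below is an assumption about how such a search might work, not the original code: the `search_local_docs` helper, the file suffixes, and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path


def search_local_docs(folder, terms, suffixes=(".md", ".txt")):
    """Return (file name, matched terms) pairs for text files mentioning any term."""
    wanted = [t.lower() for t in terms]
    results = []
    for path in sorted(Path(folder).rglob("*")):
        # Skip directories and anything that is not a plain-text note.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        matched = [t for t in wanted if t in text]
        if matched:
            results.append((path.name, matched))
    return results


# Demo against a throwaway folder so the example runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "q3_notes.md").write_text("Q3 video ad supply notes", encoding="utf-8")
    Path(folder, "todo.txt").write_text("book a meeting room", encoding="utf-8")
    print(search_local_docs(folder, ["video", "demand"]))
```

Matching here is a simple case-insensitive substring test, which is usually enough for a handful of personal notes.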

# Summary: How My Data Science MCP Works

Here's an example of how my MCP currently works to answer ad hoc data requests:

A question comes in: “How many video ad impressions did we serve in Q3, and how much ad demand do we have relative to supply?”
The document retrieval MCP searches our project folder for “Q3,” “video,” “ad,” “demand,” and “supply,” and finds relevant project documents
It then retrieves specific details about the Q3 video ad campaign, its supply, and its demand from team documents
It searches the query bank for similar questions on ad serves
It uses the context obtained from the documents and query bank to generate an SQL query about Q3’s video campaign
Finally, the query is passed to a separate MCP that’s connected to Presto SQL, where it is executed automatically
I then gather the results, review them, and send them to my stakeholders
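The step that combines document context with query bank examples can be pictured as assembling one prompt for the model. The article does not show the actual format, so the following is only a sketch: `build_sql_prompt`, the section headings, and the table and column names in the demo are hypothetical.

```python
def build_sql_prompt(question, doc_snippets, bank_examples):
    """Assemble one prompt from the question, document context, and sample queries."""
    parts = ["You write Presto SQL. Use only the tables described below.", ""]
    parts.append("## Project context")
    parts.extend(f"- {snippet}" for snippet in doc_snippets)
    parts.append("")
    parts.append("## Similar questions and their queries")
    for bank_question, sql in bank_examples:
        parts.append(f"Q: {bank_question}")
        parts.append(f"SQL: {sql}")
    parts.append("")
    parts.append("## Question to answer")
    parts.append(question)
    return "\n".join(parts)


# Hypothetical context, standing in for what the retrieval steps return.
prompt = build_sql_prompt(
    question="How many video ad impressions did we serve in Q3?",
    doc_snippets=["Q3 video campaign impressions are logged in ad_impressions."],
    bank_examples=[
        (
            "How many ad impressions did we serve last month?",
            "SELECT COUNT(*) FROM ad_impressions WHERE month = '2024-06'",
        )
    ],
)
print(prompt)
```

Keeping the stakeholder's question last, after the context, makes it easy to review exactly what the model was given before the generated query is executed.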

# Implementation Details

Here is how I implemented this MCP:

// Step 1: Cursor Installation

I used Cursor as my MCP client. You can install Cursor from this link. It’s essentially an AI code editor that can access your codebase and use it to generate or modify code.

// Step 2: Google Drive Credentials

Almost all the documents used by this MCP (including the query bank) are stored in Google Drive.

To give your MCP access to Google Drive, Sheets, and Docs, you will need to set up API access:

Go to the Google Cloud Console and create a new project.
Enable the following APIs: Google Drive, Google Sheets, Google Docs.
Create credentials (OAuth 2.0 client ID) and save them in a file called credentials.json.
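Before wiring the file into a server, it can help to confirm that credentials.json looks like an OAuth client file. The check below is a small sketch, assuming the usual layout of a downloaded OAuth client file (a top-level `installed` or `web` key holding `client_id` and `client_secret`); the `describe_oauth_client` helper and the placeholder values are hypothetical.

```python
import json


def describe_oauth_client(raw_json):
    """Return (client type, client ID) from the text of an OAuth client file."""
    data = json.loads(raw_json)
    for client_type in ("installed", "web"):
        if client_type in data:
            section = data[client_type]
            missing = [k for k in ("client_id", "client_secret") if k not in section]
            if missing:
                raise ValueError(f"credentials file is missing: {', '.join(missing)}")
            return client_type, section["client_id"]
    raise ValueError("expected a top-level 'installed' or 'web' key")


# Placeholder content; a real file comes from the Google Cloud Console.
SAMPLE = json.dumps(
    {
        "installed": {
            "client_id": "1234-example.apps.googleusercontent.com",
            "client_secret": "not-a-real-secret",
        }
    }
)
print(describe_oauth_client(SAMPLE))
```

In practice you would read the text from credentials.json on disk and keep that file out of version control, since it contains a client secret.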

// Step 3: Set Up FastMCP

FastMCP is an open-source Python framework used to build MCP servers. I followed this tutorial to build my first MCP server using FastMCP.

(Note: This tutorial uses Claude Desktop as the MCP client, but the steps are applicable to Cursor or any AI code editor of your choice.)

With FastMCP, you can create the MCP server with Google integration (sample code snippet below):

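from fastmcp import FastMCP  # assumes the standalone fastmcp package is installed

mcp = FastMCP("team-data-assistant")

@mcp.tool()
def search_team_docs(query: str) -> str:
    """Search team documents in Google Drive."""
    # get_google_services() is a helper that is not shown here; it returns the Drive client first.
    drive_service, _ = get_google_services()
    # Your search logic here
    return f"Searching for: {query}"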
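One possible shape for the search logic is to turn the user's keywords into a Drive search string. The sketch below builds only that string; `build_drive_query` is a hypothetical helper, and it assumes the Drive API search syntax (`fullText contains`, `trashed = false`, `mimeType =`) with single quotes escaped by a backslash.

```python
def build_drive_query(terms, mime_type=None):
    """Build a Drive search string that matches files containing any of the terms."""
    if not terms:
        raise ValueError("at least one search term is required")

    def quote(value):
        # Single quotes inside a value are escaped with a backslash.
        return "'" + value.replace("'", "\\'") + "'"

    clauses = [f"fullText contains {quote(term)}" for term in terms]
    query = "(" + " or ".join(clauses) + ") and trashed = false"
    if mime_type:
        query += f" and mimeType = {quote(mime_type)}"
    return query


print(build_drive_query(["mobile", "video ads"]))
print(build_drive_query(["Q3"], mime_type="application/vnd.google-apps.spreadsheet"))
```

Inside the tool, the resulting string would typically be passed as the `q` argument of the Drive client's `files().list(...)` call, with the returned file names and IDs formatted into the tool's response.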

// Step 4: Configure the MCP

Once your MCP is built, you can configure it in Cursor. To do this, navigate to Cursor’s Settings window → Features → Model Context Protocol. Here, you will see a section where you can add an MCP server. When you click on it, a file called mcp.json will open, where you can add the configuration for your new MCP server.

This is an example of what your configuration should look like:

{
  "mcpServers": {
    "team-data-assistant": {
      "command": "python",
      "args": ["path/to/team_data_server.py"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "path/to/credentials.json"
      }
    }
  }
}

After saving your changes to the JSON file, you can enable this MCP and start using it within Cursor.
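If you manage several servers, you may prefer to add entries to mcp.json with a script rather than by hand. The following is a minimal sketch that works on the file's text, assuming the `mcpServers` layout shown above; `add_mcp_server` and the second server name are hypothetical.

```python
import json


def add_mcp_server(config_text, name, command, args, env=None):
    """Return mcp.json text with one server entry added or replaced."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    if env:
        servers[name]["env"] = dict(env)
    return json.dumps(config, indent=2)


# An existing config with one (hypothetical) server already registered.
existing = '{"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}'

updated = add_mcp_server(
    existing,
    name="team-data-assistant",
    command="python",
    args=["path/to/team_data_server.py"],
    env={"GOOGLE_APPLICATION_CREDENTIALS": "path/to/credentials.json"},
)
print(updated)
```

Existing entries are left untouched, and calling the function again with the same name simply overwrites that one entry.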

# Final Thoughts

This MCP server was a simple side project I decided to build to save time on my personal data science workflows. It isn't groundbreaking, but it solves my immediate pain point: spending hours answering ad hoc data requests that take time away from the core projects I'm working on. I believe that a tool like this merely scratches the surface of what's possible with generative AI and represents a broader shift in how data science work gets done.

The traditional data science workflow is moving away from:

Spending hours finding data
Writing code
Building models

The focus is shifting away from hands-on technical work, and data scientists are now expected to look at the bigger picture and solve business problems. In some cases, we’re expected to oversee product decisions and step in as a product or project manager.

As AI continues to evolve, I believe that the lines between technical roles will become blurred. What will remain relevant is the skill of understanding business context, asking the right questions, interpreting results, and communicating insights. If you are a data scientist (or an aspiring one), there is no question that AI will change the way you work.

You have two choices: you can either adopt AI tools and build solutions that shape this change for your team, or let others build them for you.

Natassha Selvaraj is a self-taught data scientist with a passion for writing. Natassha writes on everything data science-related, a true master of all data topics. You can connect with her on LinkedIn or check out her YouTube channel.
