Leao2005

All Star
Supporter
Joined
Dec 14, 2019
Messages
2,654
Reputation
140
Daps
7,171
@Macallik86 @greenvale @Serious @Secure Da Bag @brehsimilitude @Rawtid @King45 @RareAirBorne

Need y'all's advice

I have a project with my team where we would be doing web scraping of stores that my company works with.

Our idea at the moment is to use web scraping to see whether retailers are adding or closing stores (permanent or temporary closures). We are doing this so we don't have to ask/rely on the retailers to report said openings/closings, so that real life matches our data.


Two questions:
Can we do this in Python?

And off the top of your heads, how feasible is this? Should we scrape Google Maps data, the retailers' websites, etc.?


I'm currently learning Python and will try learning web scraping within the upcoming weeks.

Please give me y'all's thoughts.
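The idea above can be sketched in Python with just the standard library. Everything here is an assumption for illustration: the HTML, the `store` class, and the `data-status` attribute stand in for a real retailer's store-locator page, which you would fetch with `urllib.request` (or the `requests` library) instead of hardcoding.

```python
# Minimal sketch: pull store names and open/closed status out of a
# store-locator page using only Python's stdlib HTML parser.
from html.parser import HTMLParser

# Stand-in for a fetched page; real markup will differ per retailer.
SAMPLE_HTML = """
<ul class="stores">
  <li class="store" data-status="open">Main St #101</li>
  <li class="store" data-status="closed">Oak Ave #102</li>
</ul>
"""

class StoreParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stores = []      # collected (name, status) tuples
        self._status = None   # status of the <li> we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "store":
            self._status = attrs.get("data-status", "unknown")

    def handle_data(self, data):
        # Only record text while inside a store <li>, skipping whitespace.
        if self._status is not None and data.strip():
            self.stores.append((data.strip(), self._status))
            self._status = None

parser = StoreParser()
parser.feed(SAMPLE_HTML)
print(parser.stores)  # [('Main St #101', 'open'), ('Oak Ave #102', 'closed')]
```

In practice most people reach for BeautifulSoup or Scrapy instead of raw `HTMLParser`, but the shape of the job is the same: fetch, locate the repeating element, extract the fields.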
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,674
Reputation
1,503
Daps
21,870
@tunes757
Python/scraping are not my forte at all, so take my advice as more brainstorming/(un)educated guesses than expertise...

Python is the main language I hear about re: scraping, so it's good that you are already picking it up. In terms of sources to use, my guess is to start with the most trustworthy sources and work your way down. I would trust Google Maps the least, imo, because anyone can throw something on there. The order I'd look at things:
  1. Company website
  2. Governmental Records (especially if the focus is local)
  3. Crowdsourced content (Facebook/Google Maps/etc)
If the data is very well formatted, and your dept is already heavy in the Microsoft suite, then Excel/Power BI might be the most user-friendly option.

If it is a local initiative only, you might get some traction using active business licenses. Here's an example of Subway restaurants in my city, from a feed that is updated daily. I think that since it shows "current/active", locations should drop off when a company closes, for example, but that's the type of data definition you need to get concrete answers on if you find a similar source.

If you are scraping a popular website (e.g. Facebook/Google Maps), I would probably look for tutorials online for reference/reusable code instead of trying to reinvent the wheel.
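The business-license angle can be filtered in a few lines once the feed is fetched. The field names and status codes below mirror Socrata-style open-data portals but are assumptions here; check the data dictionary of whatever feed you actually find, and fetch the real JSON with `urllib.request`.

```python
import json

# Hypothetical slice of a city business-license feed (field names and
# status codes are illustrative, not from any real portal).
FEED = json.loads("""
[
  {"doing_business_as_name": "SUBWAY #1", "license_status": "AAI"},
  {"doing_business_as_name": "SUBWAY #2", "license_status": "REV"},
  {"doing_business_as_name": "SUBWAY #3", "license_status": "AAI"}
]
""")

# Keep only active licenses; here "AAI" = active, "REV" = revoked.
active = [r["doing_business_as_name"]
          for r in FEED
          if r["license_status"] == "AAI"]
print(active)  # ['SUBWAY #1', 'SUBWAY #3']
```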
 

Leao2005

All Star
Supporter
Joined
Dec 14, 2019
Messages
2,654
Reputation
140
Daps
7,171
Yea, the initiative is countrywide sadly
 

greenvale

Superstar
Supporter
Joined
Aug 1, 2017
Messages
6,353
Reputation
1,960
Daps
24,777
Reppin
Delaware
Wish I could help. My python skills aren’t that good but I’m in grad school and got a python course next semester.

Let us know what you end up doing
 
Joined
Mar 11, 2022
Messages
536
Reputation
166
Daps
2,400
@tunes757
I don't have much expertise in scraping, as I've only done it for ad hoc analysis and projects, so my answer is similar to @Macallik86's. Python is generally the easiest to get started with, and scraping is no different, so yes, you can do this. But I can't speak to the scalability, and I know scraping can be brittle depending on the complexity of your scraping logic and the source being scraped.

If you have to scrape many sources that are designed differently, then it might be easier to get access to the back end; otherwise the solution could be tough to scale and maintain. Like any solution, there's a tradeoff. You can use a source like Google that makes it easy to collect the data but may not be as accurate, or you can use sources like company websites that may be more difficult to collect from but will also be more accurate. Hard to say more without knowing all the parameters of the project and its goals.
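One way to keep the brittle part contained: have each per-site scraper boil down to a plain set of store IDs, then detect openings and closings by diffing snapshots. The IDs below are made up; the compare step stays identical no matter which site the IDs came from, so only the extraction layer has to change when a site redesigns.

```python
# Snapshot diff: openings/closings fall out of two set subtractions.
yesterday = {"store-101", "store-102", "store-103"}  # prior scrape
today     = {"store-101", "store-103", "store-104"}  # latest scrape

opened = today - yesterday   # present now, absent before
closed = yesterday - today   # absent now, present before

print(sorted(opened), sorted(closed))  # ['store-104'] ['store-102']
```

Temporary vs. permanent closures would need an extra signal (e.g. a status field on the page), since a bare diff can't tell the two apart.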
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,674
Reputation
1,503
Daps
21,870
Glad they rolled it back. I had a panic last week when I couldn't run any Macro-enabled reports. Took me 30 minutes to figure out that I had to tell Excel to trust any files coming from my company's shared drive.
 

Sauce Mane

Superstar
Joined
Mar 11, 2022
Messages
3,739
Reputation
1,750
Daps
27,830
Reppin
Brooklyn
Any advice on how to get an entry-level Data Analyst job? I have a Bachelor's in Economics, did a 4-month apprenticeship (about 200 hours) covering data analysis, Excel, SQL, Tableau, and data visualization, and earned the Google Data Analytics certificate and a financial modeling cert. I'm trying to get my foot in the door at this point and work my way up.
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,674
Reputation
1,503
Daps
21,870
Not a Data Analyst but I have a solid overlap in skillset. Some tips I'd say are:
  • Network network network. Attend local events that are in the same lane. Open Data meetups in your city, volunteer for Data Science for Social Good, attend local Tableau conferences if they're free, join a free group for Business Analysts, etc
  • Contribute online via open source groups. Make sure to put in work first to show your skillset and then once people take notice, you can bring up that you're looking for work. Too many people join these spaces with their hands out from jump and get curved
  • You probably already did it, but get your resume on job sites so that while you are actively grinding, you are still passively popping up in searches.
    • Make sure to look at a few articles/videos on keywords that Data Scientist recruiters are looking for to increase your odds.
      • Don't go too deep in the data scientist YouTube rabbit hole. There are many channels that will lead you to inaction instead of action
    • Make sure to reference your GitHub on your resume so that technically savvy people can easily see your handiwork
  • Read a book or two on marketing yourself for a job in your downtime for more ideas.
  • Keep applying to jobs! The initial job is always the hardest and then it gets easier from there.

It's not ideal, but a main takeaway is that a lot of companies want someone (read: anyone) to vouch for you so they know you aren't too green. It could be the fact that you were hired for an internship that gets you a sit down. It could be other contributors in an open-source project you're working on. It could be a local business that someone you know runs, etc. I'd keep grinding and tweaking my resume, while also looking for opportunities to find someone who can vouch for me.
 
Last edited:

greenvale

Superstar
Supporter
Joined
Aug 1, 2017
Messages
6,353
Reputation
1,960
Daps
24,777
Reppin
Delaware
Great advice above.

I'd just add learn SQL. It will only help you if you can demonstrate you can grab your own data.
 

MikelArteta

Moderator
Staff member
Supporter
Joined
Apr 30, 2012
Messages
253,289
Reputation
32,002
Daps
774,930
Reppin
Top 4

I work in big data; all I did was a data science certificate, and I got my gig.
 

Serious

Veteran
Supporter
Joined
Apr 30, 2012
Messages
80,271
Reputation
14,339
Daps
191,136
Reppin
1st Round Playoff Exits
Take whatever job you can get initially that's data-centric. After 6-12 months you should be able to parlay that into a better position.

Don't get frustrated thinking you need to know everything right away either. A lot of shyt comes with time and mentorship.

Lastly, don't be opposed to a cheap data science degree. I'm finishing up my masters, and trust me, it helps get your foot in the door. Because while a lot of people are "self studying", you have an institution backing your work. You have a curriculum detailing what you're capable of doing. You should be able to create a decent portfolio of your work as well. And lastly, there's a network associated with your school. Use it. There's a Discord at my uni where people tip off job openings. There are managers getting their masters for credentials, and you'll be surprised that they're hiring.

Plus in a lot of industries credentials matter. Masters looks better than a ba/bs or no degree in certain fields like healthcare.
 
Last edited:

Secure Da Bag

Veteran
Joined
Dec 20, 2017
Messages
41,326
Reputation
21,368
Daps
129,535
Using FOR XML & STUFF() I could have just exported all results per member into a single, comma-delimited column, then exported to Excel and split it up based on calendar year

That and window functions are the two things in SQL I haven't mastered yet. But with respect to FOR XML/STUFF, you know SQL Server 2016 released some new string functions? They should help ease string manipulation. And I just found out there's a FOR JSON too, but I haven't used it at all.
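For anyone following along without a SQL Server instance: the FOR XML PATH + STUFF() trick (and STRING_AGG, which arrived in SQL Server 2017) collapses rows into one delimited string, and SQLite's GROUP_CONCAT does the same thing, so the per-member-per-year export above can be sketched from Python's stdlib. The table and data here are made up.

```python
import sqlite3

# In-memory table standing in for the "all results per member" data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE results (member TEXT, year INT, score INT)")
con.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    ("ann", 2022, 10), ("ann", 2022, 12), ("ann", 2023, 9), ("bob", 2022, 7),
])

# GROUP_CONCAT plays the role of STUFF(... FOR XML PATH('')) / STRING_AGG:
# one comma-delimited column per member per calendar year. Note SQLite
# does not guarantee the order of values inside each concatenated group.
rows = con.execute("""
    SELECT member, year, GROUP_CONCAT(score, ',') AS scores
    FROM results
    GROUP BY member, year
    ORDER BY member, year
""").fetchall()
print(rows)
```

From there, splitting the `scores` column back apart in Excel is a plain text-to-columns step.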
 