Messing around with PostgreSQL
While attempting to look into quantitative stock analysis with a friend we looked at:
But before we could really get going on these tutorials we needed an SQL database. These helped:
- Install Postgres (Tutorial)
- SQL Queries (Tutorials)
- Python interfacing - SQLAlchemy (Tutorial)
Data I'll be digging into:
- SEC Financial Data (monthly)
- SEC Financial Datasets (Quarterly)
First successful data load!
A few quirks I learned and worked out after a couple of hours on Stack Overflow:
- Loading data from the query line is better, although the GUI is OK. Learning the syntax is the hard part. (See example in image)
- I get the impression loading each table one at a time is best practice. Turns out this data was huge, like 253 million rows for just one of the files. This became apparent when I tried to edit it in Notepad++, which I ended up killing after waiting 10 minutes. Postgres took 5 seconds to load it. Wow, this is powerful. (When I got it right, lol.)
- PostgreSQL is picky about inputs and data types, like extremely. I tried something like 50 query statements before I got one that worked.
- Permissions to access the file were an issue, likely due to Windows file permissions. Moving the file to another drive is what took it out of the Windows permissions ecosystem. (The alternative was to add Everyone permissions to the files.) Running as administrator didn't work.
- I ended up just deleting the header line manually because the header option only works properly for actual CSVs. My file was a .txt, and converting it to .csv was computationally infeasible... Big DATA!
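For what it's worth, here is a rough Python sketch of how the header and permissions quirks above could be sidestepped from a script. It is not what I actually ran. It assumes the .txt file is tab-delimited, that the target table already exists with columns in the same order as the file, and that the cursor comes from psycopg2 (only its `copy_expert(sql, file)` method is used). The names `peek`, `copy_sql`, and `load_without_header` are made up for illustration. Because `COPY ... FROM STDIN` sends the data from the client side, the Postgres server process never has to open the file itself.

```python
import itertools


def peek(path, n=5, encoding="utf-8"):
    """Return the first n lines of a file without reading the rest of it.

    Handy for checking the header and delimiter of a file that is far too
    big to open in a text editor.
    """
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return [line.rstrip("\r\n") for line in itertools.islice(f, n)]


def copy_sql(table):
    """Build a COPY ... FROM STDIN statement for a tab-delimited text file.

    Assumptions: `table` is a trusted name (it is not escaped here), and
    empty fields should be loaded as NULL, hence NULL ''.
    """
    return f"COPY {table} FROM STDIN WITH (FORMAT text, NULL '')"


def load_without_header(cursor, table, path):
    """Stream `path` into `table`, skipping the first (header) line.

    `cursor` is assumed to be a psycopg2 cursor. The file is opened in
    binary mode and read in chunks by copy_expert, so it is never held
    in memory all at once.
    """
    with open(path, "rb") as f:
        f.readline()  # consume the header so COPY never sees it
        cursor.copy_expert(copy_sql(table), f)
```

With a real connection this would be called as something like `load_without_header(conn.cursor(), "num", "num.txt")` followed by `conn.commit()`, where the table and file names are placeholders. Running `peek` first is a cheap way to see what the columns look like before guessing at data types.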