Tuesday, August 18, 2020
Show HN: Mp3 to Text https://ift.tt/3iSOC64
Show HN: Mp3 to Text https://ift.tt/314IdhQ August 18, 2020 at 09:33PM
Launch HN: Synth (YC S20) – Realistic, synthetic test data for your app https://ift.tt/3iUnFyy
Launch HN: Synth (YC S20) – Realistic, synthetic test data for your app

Hey! Christos, Damien and Nodar here and we're the co-founders of Synth ( https://getsynth.com ) - Synth is an API which allows you to quickly and easily provision test databases with realistic data with which to test your application.

We started our company about a year ago, after working at a quantitative hedge fund in London where we built models to trade US equities. Strangely, instead of spending time developing models or building the trading system, a large portion of our time was spent on just sourcing and on-boarding datasets to train and feed our models. The process of testing datasets and on-boarding them was archaic; one data provider served us XML files over FTP which we then had to spend weeks transforming for our models to ingest.

A different provider asked us to spin up our own database and then sent us a binary which was used to load the data. We had to whitelist their API IP address and set up a cron job to make sure the dataset was never out of date. The binary required interactive input so it couldn't be scripted, or rather it could be, but you'd need something to mock the interactive params. All this took a junior developer on the team a good 3-4 days to figure out and set up. Furthermore, after our trial expired we decided we didn't actually need this dataset, so those 3-4 days were essentially wasted.

Our frustration around the status quo in data distribution is what drove us to start our company.

We spent the first 6 months building a privacy-aware query engine (think Presto but with built-in privacy primitives), but software developers we talked to would frequently divert the topic to the lack of high-quality, sanitised testing data during the software development lifecycle. It was strange - most of us developers and data scientists constantly use some sort of testing data for different reasons.
Maybe you want a local development environment which is representative of production but clean from customer data. Or a staging environment which contains a much smaller, representative database so that tests run faster. You could want the dataset to be much bigger to test how your application scales. Maybe you want to share your database with third-party contractors who you don't necessarily trust. Whichever way you put it, it's strange that for a problem most of us face every day, we have no idiomatic solution. We write bespoke scripts and pipelines which often break. They are time-consuming to write and maintain, and every time your schema changes you need to update them manually. Or we get lazy and copy/paste production.

We finally listened to all this feedback, dropped the previous product, and built Synth instead. Synth is a platform for provisioning databases with completely synthetic data.

The way Synth works can be broken into 3 main steps. You first download our CLI tool (a bunch of Python wrapped up in a container) and point it at your database to create a model (we host the models on the Synth platform). This model encodes your schema and foreign key relationships, as well as a semantic representation of your types. We currently use simple regular expressions to classify the semantic types (for example an address or license plate). The whole model is represented as a JSON object - if the classifier gets something wrong you can easily change the semantic type.

Once the model has been created, the next step is to train the model. Under the hood we use a combination of copulas and deep-learning models to model the distributions and correlations in your dataset (the intuition here is that it's much more useful for developers to have realistic data than to just sample from a random number generator).

The final step is to use the trained model to generate synthetic data.
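
To make the first step more concrete, here is a minimal sketch of regex-based semantic type classification that emits a JSON model. It is not Synth's actual code: the pattern set, the 90% match threshold, the function names (`classify_column`, `build_model`) and the shape of the JSON object are all assumptions made for illustration.

```python
import json
import re

# Hypothetical patterns; the post does not list the expressions Synth uses.
SEMANTIC_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uk_postcode": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    "ipv4_address": re.compile(r"\d{1,3}(\.\d{1,3}){3}"),
}


def classify_column(values, threshold=0.9):
    """Return the first semantic type that fully matches at least
    `threshold` of the non-empty values, or "unknown"."""
    values = [v for v in values if v]
    if not values:
        return "unknown"
    for name, pattern in SEMANTIC_PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            return name
    return "unknown"


def build_model(table, columns):
    """Build a JSON-serialisable model mapping column name -> semantic type.
    (Assumed layout; the real model also encodes schema and foreign keys.)"""
    return {
        "table": table,
        "columns": {
            name: {"semantic_type": classify_column(vals)}
            for name, vals in columns.items()
        },
    }


model = build_model("users", {
    "contact": ["ann@example.com", "bob@example.org"],
    "postcode": ["SW1A 1AA", "EC1A 1BB"],
    "notes": ["called twice", "prefers email"],
})
# A misclassified column can be corrected by editing the JSON by hand.
print(json.dumps(model, indent=2))
```

Because the model is plain JSON, overriding a wrong guess is just a matter of changing the `semantic_type` string for that column before training.
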
You can either sample directly from the model or we can spin up a database for you and fill it with as much data as you need. The generation step samples from the trained model to create realistic data, as well as utilising bespoke generators for sensitive fields (credit card numbers, names, addresses, etc.).

You can run the entire lifecycle in a single command - you point the CLI tool at your database (currently Postgres, MySQL and MsSQL) and in ~1 minute you get an IP address and credentials to your new database with completely synthetic data.

We're long-time fans of HN and are eagerly looking forward to feedback from the community (especially criticism). We've made a free version available for this week so you can try it with no strings attached. We hope some of you will find Synth useful. If you have any questions we'll be around throughout the day. Also feel free to get in touch via the site. Thanks!

~ Christos, Damien & Nodar August 18, 2020 at 08:09PM
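
As an illustration of what a "bespoke generator" for a sensitive field might look like, the sketch below produces fake credit card numbers that pass the Luhn checksum, so they look structurally valid without being copied from real data. The post does not say how Synth generates such fields; the function names, the `"4"` prefix and the 16-digit length are assumptions chosen for the example.

```python
import random


def luhn_check_digit(partial):
    """Compute the Luhn check digit for a digit string that does not
    yet include its check digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_luhn_valid(number):
    """Check whether the last digit is the correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]


def fake_card_number(rng, prefix="4", length=16):
    """Generate a random digit string with a valid Luhn check digit.
    The prefix and length are hypothetical defaults, not Synth settings."""
    n_random = length - len(prefix) - 1
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(n_random))
    return body + luhn_check_digit(body)


rng = random.Random(42)  # seeded so repeated runs give the same fake data
cards = [fake_card_number(rng) for _ in range(3)]
for card in cards:
    print(card, is_luhn_valid(card))
```

A generator like this never reads the production column at all, which is the point for sensitive fields: the output only has to satisfy the format checks an application applies.
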
Show HN: Nice Ice – A widget for collecting user feedback with one LoC https://ift.tt/348W48K
Show HN: Nice Ice – A widget for collecting user feedback with one LoC https://niceice.io August 18, 2020 at 06:41PM
Show HN: ProgressKer The all-in-one progress tracker app for your daily routine https://ift.tt/3g6qPO9
Show HN: ProgressKer The all-in-one progress tracker app for your daily routine https://ift.tt/2EcpFn2 August 18, 2020 at 02:56PM
Show HN: RGB Color Spectrum Visualization Tool https://ift.tt/349lrat
Show HN: RGB Color Spectrum Visualization Tool https://ift.tt/3iOY1v7 August 18, 2020 at 11:31AM
Show HN: Made in India CSS https://ift.tt/34cS5rQ
Show HN: Made in India CSS https://ift.tt/2EePjHy August 18, 2020 at 01:05PM
Show HN: WizAtHome – Work from Home Wellness Management https://ift.tt/316nKsZ
Show HN: WizAtHome – Work from Home Wellness Management https://ift.tt/3hkwhOq August 18, 2020 at 11:04AM
Show HN: Chrome extension: Gives Ctrl+F like find results using GloVe vectors https://ift.tt/31a71Fv
Show HN: Chrome extension: Gives Ctrl+F like find results using GloVe vectors https://ift.tt/1Tx74hR August 18, 2020 at 07:28AM
Show HN: Convert Kubernetes resources to helm charts with Palinarus https://ift.tt/3azeQHR
Show HN: Convert Kubernetes resources to helm charts with Palinarus https://ift.tt/3iK8BU8 August 18, 2020 at 02:29AM
Show HN: Lorempdf.com – Create sample PDFs quick and easy https://ift.tt/2E2Sdzp
Show HN: Lorempdf.com – Create sample PDFs quick and easy https://ift.tt/2E1pzii August 18, 2020 at 05:25AM
Show HN: Dropbase 2.0 – Turn your offline files into live databases, instantly https://ift.tt/2Y7KqXY
Show HN: Dropbase 2.0 – Turn your offline files into live databases, instantly https://ift.tt/3ehHn4I August 18, 2020 at 12:38AM
Show HN: I'm building a cloud cost tool for Terraform https://ift.tt/2E0IJVw
Show HN: I'm building a cloud cost tool for Terraform https://ift.tt/3dus4p8 August 18, 2020 at 12:10AM